SneakerNet Transfers ==================== Privilege Level Required: Administrator. Version: 2.1.0 and above. Database Requirement: ``d8934c52bac5`` Introduction ------------ SneakerNet transfers are asynchronous transfers that can be managed by the librarian. A SneakerNet transfer is a transfer of data from one site to another using human power; i.e. the data is moved by hand (for instance, on a USB stick, hard drives, etc.). A generic SneakerNet transfer occurs in the following steps: - A clone of a sub-set of data is created on an external device. - A manifest of the data is created. - The data is physically transferred to the destination site. - The manifest is used to ingest the data into the destination site. - A callback from the destination to the source occurs to confirm the transfer has completed successfully. Specifically within the librarian, SneakerNet transfers have the following steps: - If this is the first time using SneakerNet, add a new store to the librarian that represents the device you would like to use to SneakerNet data. If the store already exists, make sure it is enabled using the administrator endpoint ``set_store_state``. - Set up a ``CreateLocalClone`` background task on the source librarian to create a copy of the data to be transferred. - Register the remote librarian with the source librarian and vice-versa. - Use the ``get_store_manifest`` client operation to create a manifest of the cloned store. There are a few helpful options here: ``create_outgoing_transfers`` creates an ``OutgoingTransfer`` object for each file in the store to the ``destination_librarian``, ``disable_store`` disables the store on the source librarian before generating the manifest (to ensure no new data is added to the store and to allow the device to be swapped out), and ``mark_local_instances_as_unavailable`` marks all instances of the file on the new store as unavailable. - This store manifest can then be saved to the device to be moved along with the data. It is recommended that you back up (and potentially version control) the manifests. - Move the device to the destination site. - Use the ``ingest_store_manifest`` client operation to ingest the data into the destination librarian. At this point, the data is only staged on the librarian, and is not yet available on the store or to users. - The ``RecieveClone`` background task on the destination librarian will create a ``File`` and ``Instance`` for each file in the store. Afterwards, the destination librarian will use its database entry for the source librarian to callback. As part of processing this callback, the source librarian will mark its ``OutgoingTransfer`` as complete and create a ``RemoteInstance`` for each file that has been successfully transferred. Below, we have a step-by-step guide to performing a SneakerNet transfer using the librarian command-line interface. Step 1: Adding or enabling a store ----------------------------------- For more information on adding a store, see `Stores <./Stores.rst>`_. It is crucial to mark SneakerNet stores as 'non-ingestible' (i.e. set ``ingestible: false`` in the configuration file), otherwise they themselves will ingest new data passed to the librarian. There are three main states that are important for stores: 1. ``ingestible``: Whether or not 'fresh' files (those sent from uploads or from clones) can be added to the store. 2. ``enabled``: Whether or not the store is currently marked as available for use. All stores start out enabled, but may be disabled when they are full, or a disk is being swapped out. 3. ``available``: This is an internal state that is tracked, irrespective of ``ingestible`` or ``enabled`` which indicates whether the physical device is available for recieving commands. For local stores, this is generally forced to be true. If your store is starting out disabled, you will need to enable it by using the ``set_store_state`` endpoint. This can be easily accomplished using the command-line utility: .. code:: $ librarian set-store-state local-librarian --store local-store --enabled Store local-store state set to enabled. This sets a store called ``local-store`` on a librarian (as defined in ``~/.hl_config.cfg``) to be enabled. If the store is already enabled, this will still go through. If you need to know what stores are available on the librarian, you can use the following command-line wrapper to ``get_store_list``: .. code:: $ librarian get-store-list local-librarian local-store (local) [599.5 GB Free] - Ingestable - Available - Enabled Which will print out helpful information about all attached stores to the librarian. As these things are generally meant to be transparent to regular users of the librarian, these endpoints require administrator privileges. Step 2: Background tasks and remote librarians ---------------------------------------------- There are two core background tasks that are used in SneakerNet transfers: ``CreateLocalClone`` and ``ReceiveClone``. The first is used at the source site to create a complete clone of the data ingested into the librarian, and the latter is used to ingest the data into the destination librarian. More information on background task scheduling is available in the `Background Tasks <./Background.rst>`_ section. At each librarian site, you will also need to register the remote librarian using the command-line tools. This will also generally involve account provision on both librarians, as callbacks are required. To provision a new account, you will need to use the ``create_user`` endpoint, which can be accessed through the command-line tool: TODO: THIS SHOULD BE COMPLETED IN RESPONSE TO ISSUE #61. Once the appropriate accounts are provisioned, you will need to register them with their respective librarians. This can be done with the ``add_librarian`` endpoint: .. code:: $ librarian add-librarian local-librarian \ --librarian remote-librarian \ --url http://remote-librarian \ --port 5000 \ --authenticator username:password \ # This is encrypted by the server. This will try to ping the server. If you do not want that to happen (for instance, if the server is not currently available), you can use the ``--do-not-check-connection`` option to skip the check. You can list the librarians with ``get-librarian-list`` and remove a librarian with ``remove-librarian``. Transfer matching is always done by name, not by database row ID, so you should be more than able to remove and re-add librarians without any issues. Step 3: Creating a store manifest --------------------------------- Once one of your SneakerNet stores are filled up, you can create a manifest of the store using the ``get_store_manifest`` endpoint. This process will also disable the store on the source librarian, create outgoing transfers, and mark local instances as unavailable, ready for the disk to be replaced. .. code:: $ librarian get-store-manifest local-librarian \ --store local-clone --create-outgoing-transfers \ --disable-store --mark-instances-as-unavailable \ --output /path/to/manifest.json The file will be saved as a serialized json object. It is strongly recommended that you back up this file, as it is the only unique record of the data that is being transferred. It should also likely be packaged with the SneakerNet transfer for easy ingestion on the other side. .. note:: Safety Note It may be worth disabling the store manually first, then generating a manifest with none of the extra options turned on (i.e. no ``--create-outgoing-transfers`` or ``--mark-instances-as-unavailable``) at first. You can then re-run the command to do these things, safe in the knowledge you have an already existing backup of the store manifest. Step 4: Moving the data ----------------------- You will then need to move the data to the destination site. This is generally done by physically moving the device to the destination site. It is recommended that you also move the manifest file with the data, as it will be required for the next step, as well as sending this (considerably smaller amount of data) over the network. Step 5: Ingesting the store manifest ------------------------------------ Once the data has been moved to the destination site, you will need to ingest the data into the librarian. This is done using the ``ingest_store_manifest`` endpoint: .. code:: $ librarian ingest-manifest local --manifest ./test_manifest.json --store-root=/path/to/sneaker/device/store Ingesting manifest: 100%|███████████████████████████████| 4/4 [00:00<00:00, 31.48it/s] Successfully ingested 3/4 files, 1/4 already existed. If this fails, you can always try again (as long as the root cause is fixed!) as the librarian will not ingest the same file twice. You will need to have the optional library ``tqdm`` installed to see the progress bar. Note that this does not necessarily mean that the files are available on the destination librarian right away. You will need to wait until the ``ReceiveClone`` background task has completed, and the source librarian has received the callback from the destination librarian.