Globus Connectivity
When you are not forced into using SneakerNet functionality, you will want to move files over the internet. This can be accomplished by hooking into Globus, a high-speed data transfer service.
This guide assumes you have:
- Two librarians already set up and ready to transfer files.
- Two globus endpoints, one at the source librarian and one at the destination librarian.
- A globus account that can transfer files between those two endpoints.
The transfers between these two librarians will be automated. As such, it is recommended that you set up some observability of your librarians before you begin. This is how you will monitor your transfers.
A crucial quirk of the librarian system is that your globus endpoint
must have the exact same absolute filesystem structure as the one that
the librarian sees. For example, a path like /my/file/location must be
/my/file/location on the globus endpoint too; no globus roots are allowed.
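As a minimal sketch of this rule, the helper below (a hypothetical name, not part of the librarian) returns the path to use on the globus endpoint for a given librarian path. Under the constraint described above, that is simply the same absolute path, so the only real work is rejecting relative paths.

```python
from pathlib import PurePosixPath


def globus_path_for(local_path: str) -> str:
    """Return the endpoint path for a librarian path (hypothetical helper).

    No globus roots are allowed, so the endpoint path is the identical
    absolute path that the librarian sees.
    """
    path = PurePosixPath(local_path)
    if not path.is_absolute():
        raise ValueError(f"librarian paths must be absolute, got {local_path!r}")
    return str(path)


endpoint_path = globus_path_for("/my/file/location")
```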
Inter-librarian transfers happen asynchronously. In an A to B transfer:
- Within the send_clone task:
  + A calls up B to batch-stage up to N files.
  + B creates N staging directories.
  + B responds with the globus endpoint ID to send files to.
  + A creates an item in the ‘send queue’ linked to these transfers.
- Within the consume_queue task:
  + A picks up all available ‘send queue’ tasks.
  + Each task is shipped off in a single globus transfer (i.e. each transfer is responsible for sending up to N files).
  + Up to M globus tasks can be active in the globus-managed queue at once; M is set so that it does not go over the globus-imposed limit of 100.
- Within the check_consumed_queue task:
  + Each active globus task is checked to confirm its status. Transfer of actual bytes is handled by globus running in a separate process.
  + If the globus task is complete, A calls up B and marks all files as STAGED.
- Within the recv_clone task:
  + Once the incoming files are staged, B ingests them.
  + B calls back to A to register a ‘Remote Instance’ of each file.
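The source-side half of this flow can be illustrated with a small in-memory simulation. This is only a sketch of the batching logic, not the librarian's implementation: the class, the function bodies, and the QUEUED and IN_FLIGHT status names are assumptions made for illustration (only STAGED is named in this guide), and no globus calls are made.

```python
from collections import deque
from dataclasses import dataclass

GLOBUS_TASK_LIMIT = 100  # the globus-imposed limit on active tasks quoted above


@dataclass(eq=False)  # compare by identity so remove() targets the exact item
class SendQueueItem:
    """One 'send queue' entry: up to N files sent as a single globus transfer."""
    files: list
    status: str = "QUEUED"  # hypothetical status name


def send_clone(files, batch_size):
    """Split outgoing files into send-queue items of at most batch_size (N) files."""
    return [
        SendQueueItem(files=files[i:i + batch_size])
        for i in range(0, len(files), batch_size)
    ]


def consume_queue(queue, active, max_active):
    """Start queued items while fewer than max_active (M) transfers are active."""
    if max_active > GLOBUS_TASK_LIMIT:
        raise ValueError("M must not exceed the globus task limit")
    while queue and len(active) < max_active:
        item = queue.popleft()
        item.status = "IN_FLIGHT"  # hypothetical status name
        active.append(item)


def check_consumed_queue(active, is_complete):
    """Mark completed transfers as STAGED and drop them from the active list."""
    staged = [item for item in active if is_complete(item)]
    for item in staged:
        item.status = "STAGED"
        active.remove(item)
    return staged


# Seven files with N=3 give three batches; with M=2 only two start at once.
queue = deque(send_clone([f"file_{i}" for i in range(7)], batch_size=3))
active = []
consume_queue(queue, active, max_active=2)
staged = check_consumed_queue(active, is_complete=lambda item: True)
```

In this run one batch is left waiting in the queue after the first pass, which is the behaviour the M limit is meant to produce.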
Required Variables
To enable globus connectivity, on the source side you will need to provide (in the server settings JSON config):
- globus_enable needs to be set to true.
- globus_client_id needs to be set to the UUID for your globus account with associated privileges.
- globus_client_native_app: a boolean describing whether to use a Native App (true) or a Confidential App (false) for the client. Usually, the Native App is used (set this to true).
- globus_local_endpoint_id: the endpoint UUID of the source librarian.
- globus_client_secret_file: your authorization API key from globus.
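A quick pre-flight check can catch a missing key before the server starts. The sketch below assumes the five variables sit at the top level of the settings JSON; the helper name and every value in the example are placeholders, not real identifiers.

```python
import json

# The five source-side variables listed above.
REQUIRED_GLOBUS_KEYS = (
    "globus_enable",
    "globus_client_id",
    "globus_client_native_app",
    "globus_local_endpoint_id",
    "globus_client_secret_file",
)


def missing_globus_settings(settings: dict) -> list:
    """Return the required globus keys that are absent from the settings."""
    return [key for key in REQUIRED_GLOBUS_KEYS if key not in settings]


# Placeholder values only; substitute the UUIDs and secret for your account.
example_settings = json.loads("""
{
  "globus_enable": true,
  "globus_client_id": "00000000-0000-0000-0000-000000000000",
  "globus_client_native_app": true,
  "globus_local_endpoint_id": "11111111-1111-1111-1111-111111111111",
  "globus_client_secret_file": "<placeholder>"
}
""")

missing = missing_globus_settings(example_settings)
```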
On the destination side, you will also need to create a configuration for an asynchronous transfer manager in your destination store.
All that is required here is the destination_endpoint, which is the
UUID of the globus endpoint for the destination librarian.
Required Background Tasks
At a minimum, you will need send_clone, consume_queue, and
check_consumed_queue on the source side, and recv_clone on the
destination side. More information on background tasks is available on
the appropriate page.
Hypervisors
Sometimes things go wrong. They will go wrong if you are running applications on the internet. Every now and then a callback will fail, or a transfer will get stuck… To automatically deal with these problems, we have ‘hypervisors’. There are two types of hypervisors:
- incoming_transfer_hypervisor: run on the destination side; takes the usual parameters plus age_in_days.
- outgoing_transfer_hypervisor: run on the source side; takes the usual parameters plus age_in_days.
If an incoming or outgoing transfer passes the age specified here,
a call to the opposing librarian is made to query the status. If
there is a mismatch, the problem is handled gracefully. It is strongly
recommended that you let outgoing transfers age out sooner (i.e. with a
smaller age_in_days) than incoming transfers, as with the push-based
librarian it is easier to handle things gracefully in this way.
The most common case, for example, is when a callback from B to A fails because of network interruption. Here, the outgoing transfer hypervisor will find it (as it will still be STAGED on A), call up B, find that the instance exists, and register a remote instance on A. The transfer is then marked as complete.
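The failed-callback case can be sketched in a few lines. This is an illustrative model only: the dictionary layout, the function names, and the COMPLETED status are assumptions, and the `remote_has_instance` flag stands in for the real call from A to B.

```python
from datetime import datetime, timedelta, timezone


def is_stale(started: datetime, age_in_days: float, now: datetime) -> bool:
    """True once a transfer is older than the hypervisor's age_in_days."""
    return now - started > timedelta(days=age_in_days)


def reconcile_outgoing(transfer: dict, remote_has_instance: bool) -> dict:
    """Resolve an outgoing transfer that is still STAGED on A.

    If B reports that it already holds the instance, the callback was lost:
    register the remote instance on A and mark the transfer complete.
    """
    if transfer["status"] == "STAGED" and remote_has_instance:
        transfer["remote_instance_registered"] = True
        transfer["status"] = "COMPLETED"  # hypothetical status name
    return transfer


started = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
transfer = {"status": "STAGED", "remote_instance_registered": False}

# A nine-day-old transfer is stale at age_in_days=7, so it gets reconciled.
if is_stale(started, age_in_days=7, now=now):
    transfer = reconcile_outgoing(transfer, remote_has_instance=True)
```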
Enabling and Disabling Transfers
There may be points in time when you want to shut down transfers to specific machines. Whilst this can always be performed by editing the configuration files and restarting the server, that is not always optimal.
Instead, you can use the following command-line script inside a container (i.e. with direct access to the database):
    librarian-change-transfer-status [-h] --librarian LIBRARIAN [--enable] [--disable]

    Change the status of an external librarian, to enable or disable transfers.

    options:
      -h, --help            show this help message and exit
      --librarian LIBRARIAN
                            Name of the librarian to change the status of.
      --enable              Enable the librarian.
      --disable             Disable the librarian.
Or using the client:
    librarian get-librarian-list [-h] [--ping] CONNECTION-NAME

    Get a list of librarians known to the librarian.

    positional arguments:
      CONNECTION-NAME  Which Librarian to talk to; as in ~/.hl_client.cfg.

    options:
      -h, --help       show this help message and exit
      --ping           Ping the librarians to check they are up.
to find information about the connected librarians, and to set their properties:
    librarian set-librarian-transfer [-h] [--name NAME] [--enabled] [--disabled] CONNECTION-NAME

    Set the transfer state of a librarian.

    positional arguments:
      CONNECTION-NAME  Which Librarian to talk to; as in ~/.hl_client.cfg.

    options:
      -h, --help       show this help message and exit
      --name NAME      The name of the librarian to set the transfer state of.
      --enabled        Set the librarian to enabled for transfers.
      --disabled       Set the librarian to disabled for transfers.
These client tools require an administrator account to use.
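If you script these operations, it can help to build the command lines in one place so the enable and disable flags are never passed together. The helpers below are a sketch using only the flags shown in the help text above; the function names, the librarian name `site_b`, and the connection name `local` are hypothetical. The commands are only assembled here, not executed.

```python
import shlex


def change_transfer_status_cmd(librarian: str, enable: bool) -> list:
    """Arguments for the in-container script; exactly one state flag is used."""
    flag = "--enable" if enable else "--disable"
    return ["librarian-change-transfer-status", "--librarian", librarian, flag]


def set_librarian_transfer_cmd(connection_name: str, name: str, enabled: bool) -> list:
    """Arguments for the client equivalent, which needs an administrator account."""
    flag = "--enabled" if enabled else "--disabled"
    return ["librarian", "set-librarian-transfer", "--name", name, flag, connection_name]


# Hypothetical names: disable transfers to 'site_b' via the 'local' connection.
script_cmd = change_transfer_status_cmd("site_b", enable=False)
client_cmd = set_librarian_transfer_cmd("local", "site_b", enabled=False)
client_cmd_text = shlex.join(client_cmd)
```

Either list can be handed to `subprocess.run` in an environment where the corresponding tool is installed.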