Stores¶
Stores in the librarian are abstractions over various storage systems. The most common of these is a local POSIX filesystem, but there is a completely abstract system within the librarian for interacting with any generic storage system (such as a block or key-value store).
Because stores are such a foundational part of the librarian, they are
defined in the configuration file, as pointed to by
$LIBRARIAN_CONFIG_PATH.
The configuration file is a JSON file, and the stores are defined in the
add_stores key. The value of this key is a dictionary that is
de-serialized to a StoreMetadata class. In the same block, you will
need to defined the transfer managers that can be used for data
ingress and egress. Documentation on how to configure the stores
is available under the stores themselves, and the transfer managers
likewise.
All stores have three core properties:
store_name: the name of the store, used in various places.store_type: the type of the store (must be"local").ingestable: whether or not new files can be placed on this store. Usually you only want one ingestible store per librarian so all of your files are placed on the same disk. Most non-ingestible stores are backup devices or sneaker drives.
You will end up with a JSON configuration like this:
"add_stores": [
{
"store_name": "store",
"store_type": "local",
"ingestable": true,
"store_data": {
"staging_path": "/tmp/store/libstore/staging",
"store_path": "/tmp/store/libstore/store",
"report_full_fraction": 1.0,
"group_write_after_stage": true,
"own_after_commit": true,
"readonly_after_commit": true
},
"transfer_manager_data": {
"local": {
"available": true,
"hostnames": [
"compute-0.0.local",
"example-librarian-hostname"
]
}
},
"asynchronous_transfer_manager_data": {
"global": {
"available": true,
"destination_endpoint": "ACBS-2223-4242481ABB-88888"
}
}
}
]
If you change the JSON configuration of a store, you will need to migrate it. You can do this with the migration script:
Local Stores¶
Currently the only implemented storage service in the librarian is a
‘local’ store. This type of store is a wrapper around a POSIX-compatible
filesystem. Associated with this store is a required set of data arguments,
in the store_data section:
staging_path: the path to the staging diretory (this must exist).store_path: the path to the final store location. This is where all files ingested into the librarian will live.report_full_fraction: the fraction of the total space on the device at which to treat this store as full. This is by default 1.0, but can be useful to leave a little space on, for example, sneaker drives. With this set to 0.5, an 8 GB storage device would report full after ingesting up to 4 GB of data.group_write_after_stage: whether to explicitly chmod staging directories to 775 after creation.own_after_commit: whether to explicitly chown files after they are committed to the store location.readonly_after_commit: whether to chmod files in the store directory to 444 after ingestion is complete.
Transfer Managers¶
Transfer managers are used to ingest files into your librarian from scratch, and for other synchronous transfer mechanisms like the clone transfers to other drives. There is currently only one type of (synchronous) transfer manager: local.
Local¶
The local transfer manager is a wrapper around a simple filesystem-level copy.
‘Access control’ is then performed using the usual unix filesystem permissions;
by using group_write_after_stage, one can only write into the staging directories
if they are in the appropriate unix group.
Alongside available, which simply sets whether or not this store can have
files added to it, there is one parameter: hostnames. This is the list of
hostnames that can copy to the librarian, enforced by the client. As such, this
is simply informational, and can be useful when you have different stores
that are available from different machines.
One natural consequence of this setup is that the absolute paths of files as seen by the cilent must be the same as those seen by the librarian. For instance, when bind mounting a store into a container, the source and destination paths must be identical.
Asynchronous Transfer Managers¶
Asynchronous transfer managers are used to transfer data between librarians. There are three: local, rsync, and globus. The only one that is recommended to be used in production is the globus transfer manager, for which there is a dedicated page.