Transferring Data Using rclone
Rclone is an open-source command-line utility for managing files in cloud storage. It allows users to copy or sync files between local storage and a cloud storage provider such as Google Drive, Dropbox, or OneDrive, in either direction. This is useful for creating synced backups, for example.
Please note that rclone transfers may be slow for directories with large numbers of files. Alternative backup programs like Borg, Kopia, or Restic may be better solutions in those cases.
Due to security risks, please be mindful of the type of information being transferred. Where possible, omit all information that may be considered confidential. For examples of confidential information that requires additional consideration, visit https://sites.usc.edu/trojansecure/information-data-security/.
0.0.1 Loading the rclone module
Begin by logging in to a CARC cluster. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.
On CARC clusters, rclone is available as a software module. To use rclone, first load the corresponding module:
module load rclone
After loading the rclone module, all rclone commands become available. A list of current commands can be found in the official Rclone docs.
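As a quick check that the module loaded correctly, a minimal sketch like the following can be used. The function name `require_rclone` is a hypothetical helper introduced here for illustration and is not part of rclone or the CARC environment.

```shell
# Hypothetical helper: report whether the rclone command is available in this shell.
require_rclone() {
    if command -v rclone >/dev/null 2>&1; then
        echo "rclone is available"
    else
        echo "rclone not found; try: module load rclone" >&2
        return 1
    fi
}

# Typical use on a login node:
#   module load rclone
#   require_rclone && rclone version
```

Running `rclone version` afterwards also shows which release the module provides, which helps when comparing against the official docs.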
0.0.2 Creating an rclone remote connection
Usage of each cloud service requires the setup of a remote connection, or “remote”, to that service.
To begin creating a new remote connection, run the command rclone config. You should see the following three options in your terminal:
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q>
Enter “n” to continue with a new remote setup.
Rclone will then prompt you one-by-one to enter details about the new remote like the following:
- name
- Storage
- client_id
- client_secret
- scope
It will then prompt you to choose some configuration settings and then a method to grant permission for rclone to access your cloud storage account.
There may be a different set of steps to follow depending on the storage provider. See more information for each provider by consulting the official Rclone docs.
0.0.2.1 Example: Connecting to Google Drive
The following example shows how to set up a Google Drive remote connection. This process requires installing Rclone on your local computer, configuring the remote on your local computer, and then copying the resulting configuration file to your CARC home directory. See the official Rclone docs for more details.
Installing Rclone
First install Rclone on your local computer. See instructions on the Rclone downloads page.
Configuring remote
Once installed, enter rclone config to configure a new remote. Enter “n” to create a new remote and follow the prompts.
- name: Name the remote something informative, e.g., “google-drive”.
- Storage: After you choose a name, the next prompt is a long numbered menu of storage providers. Find Google Drive and enter the associated number.
- client_id and client_secret: The next prompts are for the Google Application Client ID (client_id) and the Google Application Client Secret (client_secret). Press “Enter” at both prompts to leave them blank and accept the default values. Note that rclone’s default client ID is shared by all rclone users, which may result in slow performance, so creating your own client ID is recommended. Instructions for this can be found in the official Rclone docs.
- scope: Choose “1” to allow access to all files.
- service_account_file: Press “Enter” to leave this blank and use interactive login.
- Edit advanced config? (y/n): Enter “n”.
- Use auto config?: Enter “y”. Your browser should then open a new page. Log in to your Google Account if needed and authorize the rclone app for access.
- Configure this as a Shared Drive (Team Drive)?: Enter “n” if it is a personal drive. Enter “y” if it is a shared drive. You should then see a configuration complete message and a copy of the configuration settings.
- Keep this "google-drive" remote?: Enter “y” if all the information was entered correctly.
Enter “q” to exit the main rclone config menu.
Copying the configuration file
Find the config file location by running the following command:
rclone config file
Note the file location and then copy the file to your home directory on CARC systems. For example, using rsync:
rsync ~/.config/rclone/rclone.conf ttrojan@hpc-transfer1.usc.edu:~/.config/rclone/
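The rsync command above may fail if the ~/.config directory does not yet exist in your CARC home directory. The following is a minimal sketch, run from your local computer, that creates the directory first and then copies the file. It assumes ssh access to the same host used for rsync; `push_rclone_config` is a hypothetical helper name.

```shell
# Hypothetical helper: copy a local rclone config file to a remote home directory.
# Arguments: <local config path> <user@host>
push_rclone_config() {
    local_conf="$1"
    remote_host="$2"

    # Create the destination directory first, including any missing parents.
    ssh "$remote_host" 'mkdir -p ~/.config/rclone' || return 1

    # Remote paths without a leading slash are relative to the home directory.
    rsync "$local_conf" "$remote_host:.config/rclone/rclone.conf" || return 1

    # Restrict the file to its owner, since it contains access tokens.
    ssh "$remote_host" 'chmod 600 ~/.config/rclone/rclone.conf'
}

# Example, using the account and host from the command above:
#   push_rclone_config ~/.config/rclone/rclone.conf ttrojan@hpc-transfer1.usc.edu
```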
Once copied to the correct location, you can then test the connection from CARC systems by listing files in the remote. For example:
rclone ls "google-drive:"
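If the listing fails, it can help to first confirm that rclone sees the copied configuration at all. The sketch below uses the `rclone listremotes` command for this; `remote_exists` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a remote with the given name is configured.
remote_exists() {
    # "rclone listremotes" prints one remote per line, each followed by a colon.
    # -x matches whole lines, -F treats the name as a fixed string, -q suppresses output.
    rclone listremotes | grep -qxF "$1:"
}

# Example:
#   remote_exists google-drive && rclone lsd "google-drive:"
```

Here `rclone lsd` lists only the top-level directories, which is usually faster than listing every file with `rclone ls` on a large drive.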
0.0.3 Navigating remote cloud storage
Rclone can be used to list, create, or delete files on the remote connection, similar to using commands to view local files.
For example, to view the files in a ProjectDocs folder on your “google-drive” remote:
rclone ls "google-drive:ProjectDocs"
To create a new subfolder in ProjectDocs:
rclone mkdir "google-drive:ProjectDocs/Test"
To delete a file in the remote ProjectDocs folder:
rclone deletefile "google-drive:ProjectDocs/test.txt"
0.0.4 Copying files
Rclone’s copy command copies files from a source to a destination (e.g., from CARC systems to your Google Drive storage). It skips files that are identical in the source and destination, comparing them by size and modification time or by checksum.
The copy command takes the form of:
rclone copy "source:sourcepath" "destination:destinationpath"
Options:
- --update skips any file whose copy in the destination has a newer modification time than the source file. This ensures that the newest version of each file remains available in the cloud.
- --verbose gives information about every file being transferred. This can create a lot of screen output but may be helpful to diagnose problems.
- --progress shows progress as percentage completed and total elapsed time. This can be useful for large data transfers.
- --no-traverse prevents rclone from traversing the entire destination directory when copying files. If the remote destination is very large and you are only copying a small number of files from the source, this can save a lot of time.
- Filters like --max-age <time> or --max-size <size> make the copying process more efficient by avoiding copying or traversing unwanted files. More details about filtering can be found in the official Rclone docs.
The entire list of copy options can be found in the official Rclone docs.
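The options above can be combined in a single command. As a minimal sketch, the following wraps one such combination in a function that copies only recently modified files; `copy_recent` and its default age of 7 days are illustrative assumptions, not recommendations from CARC.

```shell
# Hypothetical helper: copy only recently modified files to a remote.
# Arguments: <source dir> <remote:path> [max age, default 7d]
copy_recent() {
    src="$1"
    dest="$2"
    max_age="${3:-7d}"

    # --update      skip files that are newer in the destination
    # --no-traverse avoid listing the whole destination
    # --max-age     only consider files modified within this period
    # --progress    show transfer progress
    rclone copy --update --no-traverse --max-age "$max_age" --progress "$src" "$dest"
}

# Example, using paths from this guide:
#   copy_recent /scratch1/ttrojan/files "google-drive:ProjectDocs" 24h
```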
An example command to copy files from a CARC scratch directory to the ProjectDocs folder in your Google Drive is:
rclone copy --update "/scratch1/ttrojan/files" "google-drive:ProjectDocs"
where “google-drive” is the name of the remote connection to your Google Drive.
If the destination folder (ProjectDocs) does not exist, rclone will create it.
To check if the files have transferred properly, you can manually list files at the remote location with rclone ls "destination:destinationpath".
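Beyond listing files by hand, rclone also provides a check command that compares a source and destination. The following is a minimal sketch of using it after a copy; `verify_transfer` is a hypothetical helper name, and the messages it prints are made up for illustration.

```shell
# Hypothetical helper: compare a source with a destination after a copy.
# Arguments: <source> <destination>
verify_transfer() {
    src="$1"
    dest="$2"

    # "rclone check" compares sizes and, where available, hashes, and exits
    # with a non-zero status if differences are found. --one-way only requires
    # that every source file exists in the destination, which suits "copy"
    # because the destination may hold extra files.
    if rclone check --one-way "$src" "$dest"; then
        echo "verified: all source files are present in $dest"
    else
        echo "differences found between $src and $dest" >&2
        return 1
    fi
}

# Example:
#   verify_transfer /scratch1/ttrojan/files "google-drive:ProjectDocs"
```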
Additionally, rclone copy can be used within a regularly executed bash script to emulate a scheduled “backup” to your cloud service.
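A minimal sketch of such a script is shown below. The variable names, the log file location, and the crontab entry in the final comment are all assumptions made for illustration; check which scheduling tools are available on your system before relying on them.

```shell
#!/bin/bash
# Sketch of a backup script. SOURCE_DIR, REMOTE_DEST, and LOG_FILE are
# placeholder values that can be overridden from the environment.
SOURCE_DIR="${SOURCE_DIR:-/scratch1/ttrojan/files}"
REMOTE_DEST="${REMOTE_DEST:-google-drive:ProjectDocs}"
LOG_FILE="${LOG_FILE:-$HOME/rclone-backup.log}"

run_backup() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting backup of $SOURCE_DIR" >> "$LOG_FILE"

    # --log-file and --log-level send rclone's own messages to the same log.
    if rclone copy --update --log-file "$LOG_FILE" --log-level INFO \
        "$SOURCE_DIR" "$REMOTE_DEST"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup finished" >> "$LOG_FILE"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup failed" >> "$LOG_FILE"
        return 1
    fi
}

# In a real script, load the module and call the function at the end:
#   module load rclone
#   run_backup
# A crontab entry such as this one (an assumption) would run it daily at 02:00:
#   0 2 * * * /path/to/backup.sh
```

Because the function uses copy rather than sync, files deleted from the source remain in the cloud destination.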
0.0.5 Synchronizing files
The difference between copying files and synchronizing files is that copy creates duplicates from a source to a destination, but sync creates a replica of the source at the destination. In other words, if files are deleted from the source, synchronizing the source and destination will delete files from the destination as well. Copying will never delete files in the destination.
The sync command takes the form of:
rclone sync "source:sourcepath" "destination:destinationpath"
An example command to sync a CARC project directory to the ProjectDocs folder in your Google Drive is:
rclone sync "/project/ttrojan_123/files" "google-drive:ProjectDocs"
where “google-drive” is the name of the remote connection to your Google Drive.
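Because sync can delete files in the destination, it is worth previewing the changes before running it for real. The following minimal sketch uses rclone's --dry-run flag for this; `preview_sync` is a hypothetical helper name.

```shell
# Hypothetical helper: show what a sync would change without changing anything.
# Arguments: <source> <destination>
preview_sync() {
    # --dry-run reports the copies and deletions that would be made but
    # performs none of them; --verbose lists each affected file.
    rclone sync --dry-run --verbose "$1" "$2"
}

# Example, using the paths from the command above:
#   preview_sync /project/ttrojan_123/files "google-drive:ProjectDocs"
# After reviewing the output, run the same sync without --dry-run.
```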
Additional options can be added similar to the copy command. The entire list of sync options can be found in the official Rclone docs.