Transferring Data Using rclone

Last updated March 05, 2024

Rclone is an open-source, command-line utility for managing files in cloud storage. It lets users copy or sync files between local storage and a cloud storage provider such as Google Drive, Dropbox, or OneDrive, in either direction. This is useful, for example, for creating synced backups.

Please note that rclone transfers may be slow for directories with large numbers of files. Alternative backup programs like Borg, Kopia, or Restic may be better solutions in those cases.

Due to security risks, please be mindful of the type of information being transferred. Where possible, omit all information that may be considered confidential. For examples of confidential information that requires additional consideration, visit https://sites.usc.edu/trojansecure/information-data-security/.

0.0.1 Loading the rclone module

Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.

On CARC clusters, rclone is available as a software module. To use rclone, first load the corresponding module:

module load rclone

After loading the rclone module, all rclone commands become available. A list of current commands can be found in the official Rclone docs.

0.0.2 Creating an rclone remote connection

Usage of each cloud service requires the setup of a remote connection, or “remote”, to that service.

To begin the process of creating a new remote connection, use the command rclone config. You should see the following three options in your terminal:

No remotes found - make a new one 
n) New remote 
s) Set configuration password 
q) Quit config 
n/s/q>

Enter “n” to continue with a new remote setup.

Rclone will then prompt you, one at a time, for details about the new remote, such as the following:

  • name
  • Storage
  • client_id
  • client_secret
  • scope

It will then prompt you to choose some configuration settings and then a method to grant permission for rclone to access your cloud storage account.

The steps may differ depending on the storage provider. Consult the official Rclone docs for provider-specific instructions.

0.0.2.1 Example: Connecting to Google Drive

The following example shows how to set up a Google Drive remote connection. This process requires installing Rclone on your local computer, configuring the remote on your local computer, and then copying the resulting configuration file to your CARC home directory. See the official Rclone docs for details.

Installing Rclone

First install Rclone on your local computer. See instructions on the Rclone downloads page.

Configuring remote

Once installed, enter rclone config to configure a new remote. Enter “n” to create a new remote and follow the prompts.

name: Name the remote something informative, e.g., “google-drive”.

Storage: After choosing a name, the next prompt should be a long numbered menu with different storage solutions. Look for Google Drive and enter the associated number.

client_id and client_secret: The next prompts are Google Application Client ID (client_id) and Google Application Client Secret (client_secret). To accept the default values, leave both prompts blank by pressing “Enter”. Note that rclone’s default client ID is shared by all users of rclone, which can result in slow performance, so creating your own client ID is recommended for regular use. Instructions for this can be found in the Rclone docs for Google Drive.

scope: Choose “1” to allow access to all files.

service_account_file: Leave this blank by pressing “Enter” to use interactive login.

Edit advanced config? (y/n): Enter “n”.

Use auto config?: Enter “y”.

Your browser should then open a new page. Log in to your Google Account if needed and authorize the rclone app for access.

Configure this as a Shared Drive (Team Drive)?: Enter “n” if it is a personal drive. Enter “y” if it is a shared drive.

You should then see a configuration complete message and a copy of the configuration settings.

Keep this "google-drive" remote?: Enter “y” if all the information was entered correctly.

Enter “q” to exit the main rclone config menu.

Copying the configuration file

Find the config file location by running the following command:

rclone config file

Note the file location and then copy the file to your home directory on CARC systems. For example, using rsync:

rsync ~/.config/rclone/rclone.conf ttrojan@hpc-transfer1.usc.edu:~/.config/rclone/
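The rsync command above assumes that the ~/.config/rclone directory already exists on the CARC side. A minimal sketch that creates the directory first is shown below. The helper name push_rclone_config is hypothetical, and the restrictive 700 permission on the directory is a suggested precaution (the configuration file contains an access token), not a documented CARC requirement.

```shell
# Hypothetical helper: copy a local rclone config file to a remote host,
# creating the target directory first (assumption: it may not exist yet).
push_rclone_config() {
    local conf="$1"      # local path reported by `rclone config file`
    local target="$2"    # e.g. ttrojan@hpc-transfer1.usc.edu

    # Create the directory and keep it private, since the config holds a token.
    ssh "$target" 'mkdir -p ~/.config/rclone && chmod 700 ~/.config/rclone' || return 1

    # Copy the config file into place.
    rsync "$conf" "$target:~/.config/rclone/"
}

# Example usage:
# push_rclone_config ~/.config/rclone/rclone.conf ttrojan@hpc-transfer1.usc.edu
```

Each step opens its own connection, so you may be asked to authenticate twice.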

Once copied to the correct location, you can then test the connection from CARC systems by listing files in the remote. For example:

rclone ls "google-drive:"
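Because rclone ls lists every file recursively, it can take a while on a large drive. As a lighter first check, the sketch below confirms that the remote name appears in the copied configuration (rclone listremotes) and then lists only the top-level directories (rclone lsd). The helper name check_remote is hypothetical.

```shell
# Hypothetical helper: verify that a remote is configured, then list its
# top-level directories as a quick connection test.
check_remote() {
    local name="$1"
    # `rclone listremotes` prints one configured remote per line, ending in ":".
    if rclone listremotes | grep -qxF -- "${name}:"; then
        rclone lsd "${name}:"
    else
        echo "remote '${name}' not found in rclone config" >&2
        return 1
    fi
}

# Example usage:
# check_remote google-drive
```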

Rclone can be used to list, create, or delete files on the remote connection, similar to using commands to view local files.

For example, to view the files in a ProjectDocs folder on your “google-drive” remote:

rclone ls "google-drive:ProjectDocs"

To create a new subfolder in ProjectDocs:

rclone mkdir "google-drive:ProjectDocs/Test"

To delete a file in the remote ProjectDocs folder:

rclone deletefile "google-drive:ProjectDocs/test.txt"

0.0.4 Copying files

Rclone’s copy command copies files from a source to a destination (e.g., from CARC systems to your Google Drive storage). It skips files that are already identical at the destination, comparing by size and modification time or by checksum.

The copy command takes the form of:

rclone copy "source:sourcepath" "destination:destinationpath"

Options:

  • The --update flag skips any file whose copy at the destination has a newer modification time than the source file. This ensures that a newer version in the cloud is not overwritten by an older one.
  • --verbose gives information about every file being transferred. This can create a lot of screen output but may be helpful to diagnose problems.
  • --progress shows progress through percentage completed and total elapsed time. This can be useful for large data transfers.
  • The --no-traverse flag prevents rclone from traversing the entire destination directory when copying files. If the remote destination is very large and you are only copying a small number of files from the source, this can save a lot of time.
  • Filters like --max-age <time> or --max-size <size> can make the copying process more efficient by avoiding copying or traversing unwanted files. More details about filtering can be found in the Rclone filtering docs.

The entire list of copy options can be found in the Rclone docs for the copy command.
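Several of the options above can be combined in one command. The following sketch wraps such a combination in a hypothetical helper named copy_recent. The values 7d and 1G are illustrative assumptions, chosen only to show the syntax.

```shell
# Hypothetical helper: copy only recent, moderately sized files.
copy_recent() {
    local src="$1"
    local dest="$2"
    # --update      skip files that are newer at the destination
    # --progress    show live transfer progress
    # --max-age 7d  only consider files modified within the last 7 days
    # --max-size 1G only consider files smaller than 1 GiB
    rclone copy --update --progress \
        --max-age 7d --max-size 1G \
        "$src" "$dest"
}

# Example usage:
# copy_recent "/scratch1/ttrojan/files" "google-drive:ProjectDocs"
```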

An example command to copy files from a CARC scratch directory to the ProjectDocs folder in your Google Drive is:

rclone copy --update "/scratch1/ttrojan/files" "google-drive:ProjectDocs"

where “google-drive” is the name of the remote connection to your Google Drive.

If the destination folder (ProjectDocs) does not exist, rclone will create it.

To check if the files have transferred properly, you can manually list files at the remote location with rclone ls "destination:destinationpath".
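As an alternative to inspecting a listing by eye, rclone check compares the files in a source and a destination and reports any that are missing or differ. A minimal sketch follows; the helper name verify_transfer is hypothetical, and the --one-way flag is used on the assumption that extra files at the destination should not count as errors.

```shell
# Hypothetical helper: confirm that every source file exists and matches
# at the destination after a copy.
verify_transfer() {
    local src="$1"
    local dest="$2"
    # --one-way checks only that source files are present and match at the
    # destination; files that exist only at the destination are ignored.
    rclone check --one-way "$src" "$dest"
}

# Example usage:
# verify_transfer "/scratch1/ttrojan/files" "google-drive:ProjectDocs"
```

A non-zero exit status indicates that at least one file is missing or differs.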

Additionally, rclone copy can be used within a regularly executed bash script to emulate a scheduled “backup” to your cloud service.
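A sketch of such a script is shown below. The script name backup.sh, the function name run_backup, and the default log location are all hypothetical. The script also assumes that rclone is already on the PATH (for example, after module load rclone).

```shell
#!/usr/bin/env bash
# Hypothetical backup script (backup.sh): copy a directory to a remote
# and append the outcome to a log file.
# Usage: backup.sh <source> <remote:path> [logfile]

run_backup() {
    local src="$1"
    local dest="$2"
    local log="${3:-$HOME/rclone-backup.log}"

    echo "$(date '+%Y-%m-%d %H:%M:%S') starting: $src -> $dest" >> "$log"
    # --update avoids overwriting files that are newer at the destination;
    # --verbose records each transferred file in the log.
    if rclone copy --update --verbose "$src" "$dest" >> "$log" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') finished" >> "$log"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED" >> "$log"
        return 1
    fi
}

# Run only when a source and destination are supplied on the command line.
if [ "$#" -ge 2 ]; then
    run_backup "$@"
fi
```

How the script is run on a schedule depends on which scheduling mechanisms the system permits.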

0.0.5 Synchronizing files

The difference between copying files and synchronizing files is that copy only adds or updates files at the destination, whereas sync makes the destination an exact replica of the source. In other words, any file at the destination that is not present in the source (for example, because it was deleted from the source) will be deleted from the destination as well. The source is never modified. Copying will never delete files in the destination.

The sync command takes the form of:

rclone sync "source:sourcepath" "destination:destinationpath"

An example command to sync a CARC project directory to the ProjectDocs folder in your Google Drive is:

rclone sync "/project/ttrojan_123/files" "google-drive:ProjectDocs"

where “google-drive” is the name of the remote connection to your Google Drive.

As with the copy command, additional options can be added. The entire list of sync options can be found in the Rclone docs for the sync command.
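Because sync can delete files at the destination, it is worth previewing a sync before running it. The sketch below uses rclone's --dry-run flag, which reports what would be copied or deleted without changing anything, and then asks for confirmation. The helper name preview_then_sync and the prompt wording are hypothetical.

```shell
# Hypothetical helper: preview a sync, then apply it only after confirmation.
preview_then_sync() {
    local src="$1"
    local dest="$2"
    local answer

    # Show what would change without modifying the destination.
    rclone sync --dry-run "$src" "$dest" || return 1

    printf 'Apply these changes? [y/N] '
    read -r answer
    if [ "$answer" = "y" ]; then
        rclone sync "$src" "$dest"
    else
        echo "sync cancelled"
    fi
}

# Example usage:
# preview_then_sync "/project/ttrojan_123/files" "google-drive:ProjectDocs"
```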

0.0.6 Additional resources

Rclone
Rclone docs