Storage File Systems

Last updated March 11, 2024

All CARC users are assigned three directories on three file systems where they can store files and run programs:

  • /home1
  • /project
  • /scratch1

Researchers using the cryo-EM instruments at USC also have the option of requesting a dedicated cryo-EM storage directory on /cryoem2.

These are global file systems, meaning you can access them from any Discovery, Endeavour, or transfer node. You can list the directories available to you, along with your storage usage on each, by entering the myquota command.

The following table provides an overview of the file system use recommendations:

File system | Disk space | File recovery (snapshots) | Purpose
/home1 | 100 GB per user | Yes | Personal files, configuration files, software
/project | Default of 5 TB per project (can be increased in 5 TB increments), shared among group members | Yes | Shared files, data files, software
/cryoem2 | Requested in 5 TB increments, shared among group members | No | Designated for cryo-EM research
/scratch1 | 10 TB per user | No | Temporary files and high-performance I/O

CARC also provides the Cold Storage System for data archiving purposes.

Sensitive data

Currently, CARC systems do not support the use or storage of sensitive data. If your research work includes sensitive data, including but not limited to HIPAA-, FERPA-, or CUI-regulated data, see our Secure Computing Compliance Overview or contact us at carc-support@usc.edu before using our systems.

Home file system (/home1)

The /home1 file system has a total capacity of 136 TB, running NFS/ZFS on dedicated storage machines. It consists of personal directories for CARC users. Your home directory has a quota of 100 GB of disk space and 1.91 million files. It is intended for storing personal files, configuration files, and software. I/O-intensive jobs should not be run directly from your home directory.

When you log in, you will always start in your home directory, which is located at:

/home1/<username>

Use the cd command to quickly change to your home directory from another directory.
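For example, entering cd with no arguments returns you to your home directory from anywhere on the cluster:

cd
pwd

The pwd command then prints your current location, /home1/<username>.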

We keep two weeks of snapshots of the files in your home directory. You can think of these snapshots as semi-backups: if you accidentally delete data, we can recover it as long as it was captured by a snapshot within the past two weeks. Data that was created and then deleted between snapshots (within a one-day period) cannot be recovered. For this reason, you should always keep extra backups of your important data and other files.

If you need to recover a deleted file, please contact the CARC team by submitting a ticket and we will determine if a snapshot of the file exists.

Project file system (/project)

The /project file system has a total capacity of 8.4 PB and consists of directories for different research project groups. It offers high-performance, parallel I/O, running ZFS/BeeGFS on dedicated storage machines. The default quota for each project directory is 5 TB of disk space and 30 million files.

A project’s PI must request a project storage allocation via the CARC User Portal. Each PI can request up to 10 TB of storage across their project(s) at no cost. If more than 10 TB is needed, a PI can request additional storage space in 5 TB increments at a cost of $40/TB/year. For more information on storage quotas and pricing, see the Accounts and Allocations page.

Each project member has access to their group’s project directory, where they can store data, scripts, and related files and install software. The project directory should be used for most of your CARC work, and it’s also where you can collaborate with your research project group. Users affiliated with multiple CARC projects will have access to multiple project directories so they can easily share their files with the appropriate groups.

Project directories are located at:

/project/<PI_username>_<id>

Here, <PI_username> is the username of the project owner and <id> is a two- or three-digit project ID number (e.g., ttrojan_123).

You can list your project directories and storage usage by entering the command myquota. You can also find the project ID and directory path on the project page in the User Portal.

Tip: You can create an alias to quickly change to your project directory. For example, if the user ttrojan adds the line alias cdp="cd /project/ttrojan_123" to their ~/.bashrc file, the alias cdp will be defined each time they log in and can be used as a shortcut for switching to their project directory.
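As a sketch, using the example alias above (replace ttrojan_123 with your own project directory), you could append the line to your ~/.bashrc and reload it in your current session:

echo 'alias cdp="cd /project/ttrojan_123"' >> ~/.bashrc
source ~/.bashrc
cdp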

To create your own subdirectory within your project’s directory, enter a command like the following:

mkdir /project/<PI_username>_<id>/<username>

If needed, you can change the permissions of this subdirectory using a chmod command.
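For example, a common choice is to give group members read and list access to the subdirectory while reserving write access for yourself; the exact mode depends on how your group collaborates:

chmod 750 /project/<PI_username>_<id>/<username>

Mode 750 grants you full access, grants group members read and execute (list) access, and removes all access for users outside the group.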

We keep two weeks of snapshots of the files in your project directories. You can think of these snapshots as semi-backups: if you accidentally delete data, we can recover it as long as it was captured by a snapshot within the past two weeks. Data that was created and then deleted between snapshots (within a one-day period) cannot be recovered. For this reason, you should always keep extra backups of your important data and other files.

If you need to recover a deleted file, please contact the CARC team by submitting a ticket and we will determine if a snapshot of the file exists.

Cryo-EM file system (/cryoem2)

Cryo-EM research has a dedicated /cryoem2 file system with 1.4 PB of space. This storage option functions similarly to the project file system, and allocations are requested in the User Portal in the same way. See the Request a New Allocation user guide for instructions. Cryo-EM storage allocations can be requested in 5 TB increments.

Cryo-EM storage space does not have a free tier like the project file system. The current rate for cryo-EM storage is $40/TB/year. Users can request a decrease in their project storage allocation to offset costs, if needed.

Each /cryoem2 project member has access to their group’s cryo-EM directory, where they can store data, scripts, and related files and install software. This directory should be used for most of your cryo-EM work, and it’s also where you can collaborate with your research group.

Cryo-EM directories are located at:

/cryoem2/<PI_username>_<id>

Here, <PI_username> is the username of the project owner and <id> is a two- or three-digit project ID number (e.g., ttrojan_123).

You can list all your project directories and storage usage by entering the command myquota. You can also find the project ID and directory path on the project page in the User Portal.

To create your own subdirectory within your project’s directory, enter a command like the following:

mkdir /cryoem2/<PI_username>_<id>/<username>

Data stored in /cryoem2 is not backed up. Files stored here should be periodically backed up elsewhere to decrease the risk of data loss.
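For example, one way to back up files is with rsync, copying from your cryo-EM directory into your project directory (a sketch; the cryoem-backup destination name is just an illustration):

rsync -av /cryoem2/<PI_username>_<id>/<username>/ /project/<PI_username>_<id>/<username>/cryoem-backup/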

Scratch file system (/scratch1)

The /scratch1 file system offers high-performance, parallel I/O, running ZFS/BeeGFS on dedicated storage machines. /scratch1 has a total capacity of 1.6 PB. Each CARC user gets a personal directory in /scratch1. The quota for the scratch directory is 10 TB of disk space and 20 million files.

The scratch file system is intended for temporary and intermediate files, so it is not backed up. Files you need to keep should be periodically copied elsewhere to decrease the risk of data loss.
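For example, before a purge you could bundle and compress a finished set of results into your project directory (a sketch; the results directory and archive names are illustrations):

tar -czf /project/<PI_username>_<id>/<username>/results.tar.gz -C /scratch1/<username> results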

A data purge is conducted on the /scratch1 file system every 6 months, or whenever its used capacity exceeds 80%, to ensure fair and efficient use for all CARC users.

Your /scratch1 directory is located at:

/scratch1/<username>

Use the cds command to quickly change to your /scratch1 directory from another directory.

CARC Cold Storage System

The CARC Cold Storage System is intended for long-term (e.g., more than 5 years) storage of large data sets (TB to PB scale). CARC offers this system as an option for users to preserve and archive their data at a competitive rate. Users request a cold storage allocation in the User Portal. See the Request a New Allocation user guide for instructions.

Cold storage is a fee-based service platform at a current rate of $20/TB/year.

Cold storage is not intended as a system for frequently backing up and retrieving data; copying data into and out of cold storage is notably slower than on the other available file systems.

CARC’s Cold Storage System preserves one copy of the stored data in one location with no regularly performed data integrity checks. PIs interested in multiple copies of their data and integrity checks should use the USC Digital Repository for their data archiving needs instead. Please submit a help ticket and the CARC team will assist you in facilitating this service.

More details on this system can be found in the Research Data Preservation user guides.

Using /tmp space

For temporary files, each compute node has a local /tmp directory, implemented as a RAM-based file system (tmpfs). However, /tmp is restricted to 1 GB of space, shared among all jobs running on the same node. If more space is needed, you can instead use the local /dev/shm directory on each compute node, also implemented as a RAM-based file system (tmpfs), whose capacity is limited by the amount of memory you request for your job. You can also use your scratch directory for temporary files, but read/write speeds may be slower because files are saved to disk.
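To see how much of this space is currently available on a node, you can check both locations with df from within a job; the reported sizes vary by node and by your job's memory request:

df -h /tmp /dev/shm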

In your scripts and programs, you can explicitly define temporary directories. Most applications will also save temporary files to the location given by the TMPDIR environment variable, which by default is set to a unique /tmp directory for each job. To automatically redirect your temporary files to another location, set the TMPDIR environment variable. For example:

export TMPDIR=/scratch1/<username>

Include this line in your job scripts to set TMPDIR for batch jobs.
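For example, here is a minimal sketch of a batch job script that redirects temporary files to a job-specific scratch subdirectory. It assumes the Slurm scheduler and uses the $SLURM_JOB_ID variable to keep each job's temporary files separate; replace <username> with your own username:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=1:00:00

export TMPDIR=/scratch1/<username>/tmp_$SLURM_JOB_ID
mkdir -p "$TMPDIR"

# ... run your program here ...

rm -rf "$TMPDIR"    # clean up temporary files when the job finishes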

Limits on disk space and number of files

CARC clusters are shared resources. As a result, there are quotas on usage to help ensure fair access to all USC researchers as well as to maintain the performance of the file systems. There are quotas on both the amount of disk space used and the number of files stored.

To check your quotas, enter the myquota command. In the output, compare the used value with the quota or hard value for each file system; if used is approaching the limit, you will need to delete, compress, consolidate, and/or archive files.
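To find which subdirectories are using the most space, a du command like the following can help (shown here for a home directory; the same pattern works for project and scratch directories):

du -h --max-depth=1 /home1/<username> | sort -hr | head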

For project directories, PIs can also request an increase in disk space from the User Portal. For more information on storage quotas and pricing, see the Project and Allocation Management pages.

For scratch directories, quotas can be temporarily increased by request; please submit a ticket to the CARC team.

Please note that the quota for your home directory is fixed and cannot be increased.

The chunk files columns indicate how your files and directories are divided into chunks by the parallel file system, not the absolute number of files you have. Nonetheless, if you exceed this limit, you will need to reduce the number of files or request more space.

[ttrojan@discovery1 ~]$ myquota
/home1/ttrojan

TYPE        NAME           USED  QUOTA  OBJUSED  OBJQUOTA
POSIX User  ttrojan       1.03G   100G    37.5K     1.91M


/scratch1/ttrojan

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
       ttrojan|375879||   13.20 GiB|   10.00 TiB||   162363| 20000000

/project/ttrojan_120

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
   ttrojan_120| 32855||   16.92 GiB|    5.00 TiB||     1134| 30000000

If you exceed the limits, you will receive a “disk quota exceeded” or similar error.