Stata is a proprietary software package for statistics and data science.
Using Stata on CARC systems
You can use Stata in either interactive or batch modes. For either mode, first load the corresponding software module:
module load stata
Currently, this module loads Stata 16.1. If needed, you can have Stata interpret commands as a previous version would by entering the command version <#> within Stata (e.g., version 13). To check the version currently being used, enter version within Stata. For reproducibility purposes, it is also good practice to include a version statement like this in your Stata scripts (do-files), based on the version used to develop the script.
The Stata module provides multiple executables, but on CARC systems you will most likely want to use the stata-mp executable. The MP version of Stata supports large datasets and can use multiple cores for parallel computation. The current Stata license allows up to 8 cores.
Please note that we do not currently support the use of the Stata GUI on CARC systems.
Running Stata in interactive mode
After loading the module, enter stata-mp to start a new Stata session on the login node. Using Stata on the login node should be reserved for installing packages and other non-intensive work. More intensive work, such as exploring data, testing models, and debugging, should instead be done interactively on a compute node.
A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.
To run Stata interactively on a compute node, first use Slurm's salloc command to reserve job resources on a node:
user@discovery1:~$ salloc --time=2:00:00 --cpus-per-task=8 --mem=16GB --account=<project_ID>
salloc: Pending job allocation 24316
salloc: job 24316 queued and waiting for resources
salloc: job 24316 has been allocated resources
salloc: Granted job allocation 24316
salloc: Waiting for resource configuration
salloc: Nodes d05-08 are ready for job
Make sure to change the resource requests (the --time=2:00:00 --cpus-per-task=8 --mem=16GB --account=<project_ID> part after your salloc command) as needed, such as the number of cores and memory required. Also make sure to substitute your project ID, which is of the form <PI_username>_<id>. You can find your project ID in the CARC User Portal.
Once you are granted the resources and logged in to a compute node, load the module and then enter stata-mp:
user@d05-08:~$ module load gcc/8.3.0 stata
user@d05-08:~$ stata-mp

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   16.1   Copyright 1985-2019 StataCorp LLC
  Statistics/Data analysis            StataCorp
                                      4905 Lakeway Drive
     MP - Parallel Edition            College Station, Texas 77845 USA
                                      800-STATA-PC        https://www.stata.com
                                      979-696-4600        email@example.com
                                      979-696-4601 (fax)

Stata license: 20-user 8-core network, expiring 30 Jun 2022
Serial number: 555555555555
  Licensed to: Center for Advanced Research Computing
               University of Southern California

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000; see help set_maxvar.

.
Notice that the shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-08).
To run Stata scripts (do-files) from within Stata, use the do command (e.g., do script.do). Alternatively, to run Stata scripts from the shell, use the stata-mp -b do <script> command (e.g., stata-mp -b do script.do).
To exit the node and relinquish the job resources, enter exit to exit Stata and then enter exit again in the shell. This will return you to the login node:
. exit
user@d05-08:~$ exit
exit
salloc: Relinquishing job allocation 24316
user@discovery1:~$
Please note that compute nodes do not have access to the internet, so any data downloads or package installations should first be completed on the login or transfer nodes.
Running Stata in batch mode
In order to submit jobs to the Slurm job scheduler, you will need to use Stata in batch mode. There are a few steps to follow:
- Create a Stata script (do-file)
- Create a Slurm job script that runs the Stata script
- Submit the job script to the job scheduler using sbatch
Your Stata script should consist of the sequence of Stata commands needed for your analysis.
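As a minimal sketch of such a script, the shell commands below write a small do-file named script.do (the file name used in the examples on this page). The analysis itself is only illustrative: it assumes Stata's bundled auto example dataset, and the variables and model are placeholders for your own commands.

```shell
# Write an illustrative do-file; the analysis commands are placeholders.
cat > script.do <<'EOF'
* Pin the command interpreter to the version used to develop this script
version 16.1

* Load Stata's bundled example dataset (assumption: stands in for your data)
sysuse auto, clear

* Descriptive statistics and a simple linear model
summarize price mpg weight
regress price mpg weight
EOF

# Show what was written
cat script.do
```

Writing the do-file from the shell is just one option; you can equally create it in a text editor.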
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Stata, a Slurm job script should look something like the following:
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --account=<project_ID>

module purge
module load gcc/8.3.0 stata

stata-mp -b do script.do
Each line is described below:
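|Command or Slurm argument|Meaning|
|#!/bin/bash|Use Bash to execute this script|
|#SBATCH|Syntax that allows Slurm to read your requests (ignored by Bash)|
|--nodes=1|Use 1 compute node|
|--ntasks=1|Run 1 task (e.g., running a Stata script)|
|--cpus-per-task=8|Reserve 8 CPUs for your exclusive use|
|--mem=16GB|Reserve 16 GB of memory for your exclusive use|
|--time=1:00:00|Reserve resources described for 1 hour|
|--account=<project_ID>|Charge compute time to <project_ID>. If not specified, you may use up the wrong PI's compute hours|
|module purge|Clear environment modules|
|module load gcc/8.3.0 stata|Load the gcc and stata environment modules|
|stata-mp -b do script.do|Run script.do with stata-mp in batch mode|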
Make sure to adjust the resources requested based on your needs, but remember that requesting fewer resources generally leads to shorter queue times for your job. Please note that the current Stata license limits you to a maximum of 8 CPUs.
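One way to catch an oversized request before submitting is a small check like the sketch below. The function name check_stata_cpus is hypothetical, and the check only looks for a --cpus-per-task=N line written in the same form as the job script above; the limit of 8 comes from the license note in this guide.

```shell
# Hypothetical helper: warn when a job script requests more cores than the
# Stata license described in this guide allows (8).
check_stata_cpus() {
    local job_script="$1" max_cpus=8 requested
    # Extract N from the first "#SBATCH --cpus-per-task=N" line, if present
    requested=$(sed -n 's/^#SBATCH[[:space:]]*--cpus-per-task=\([0-9][0-9]*\).*/\1/p' "$job_script" | head -n 1)
    if [ -z "$requested" ]; then
        echo "no --cpus-per-task request found in $job_script"
        return 0
    fi
    if [ "$requested" -gt "$max_cpus" ]; then
        echo "warning: $requested CPUs requested, above the $max_cpus-core Stata license limit"
        return 1
    fi
    echo "ok: $requested CPUs requested"
}

# Example (assumes a job script named stata.job exists):
#   check_stata_cpus stata.job
```

The function returns a nonzero status when the request exceeds the limit, so it can also be used in a conditional before calling sbatch.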
You can develop Stata scripts and job scripts on your local computer and then transfer them to CARC storage, or you can use one of the available text editor modules (e.g., micro) to develop them directly on the cluster.
Save the job script as stata.job, for example, and then submit it to the job scheduler with Slurm's sbatch command:
user@discovery1:~$ sbatch stata.job
Submitted batch job 170554
To check the status of your job, enter squeue --me. For example:
user@discovery1:~$ squeue --me
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
170554      main stata.jo     user  R       3:07      1 d11-04
If there is no job status listed, then this means the job has completed.
The output of your Stata script will be saved to a log file, not the Slurm output file. In batch mode, Stata will automatically create a plain-text log file in the current working directory (e.g., script.log). As a result, you do not need to include log commands in your scripts. To view the contents of the log file, enter less <script>.log, and then enter q to exit the viewer.
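Because the results land in the log file, it can help to scan that file for errors once the job finishes. The sketch below assumes errors appear in the log in Stata's usual form, as a line such as r(111); following the error message. The function name stata_log_ok is hypothetical.

```shell
# Hypothetical helper: report whether a Stata log contains an error code.
# Assumes errors are logged as a line of the form "r(<number>);".
stata_log_ok() {
    local log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file"
        return 2
    fi
    if grep -Eq '^r\([0-9]+\);' "$log_file"; then
        echo "error found in $log_file:"
        # Print each error code with up to 3 preceding lines of context
        grep -E -B 3 '^r\([0-9]+\);' "$log_file"
        return 1
    fi
    echo "no error codes found in $log_file"
}

# Example (assumes script.log was produced by a batch run):
#   stata_log_ok script.log
```

A failed do-file may not be reflected in the exit status of stata-mp -b, so calling a check like this at the end of a job script is one way to make the Slurm job itself report the failure.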
For more information on running and monitoring jobs, see the Running Jobs guide.
Installing Stata packages
User-developed Stata packages can be installed from a login node using one of the Stata commands net install <package> or ssc install <package>, depending on the source of the package. These packages will be installed in your home directory by default.
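Package installation can also be scripted. The sketch below writes a do-file with a hypothetical name, install_packages.do, that installs gtools from SSC as an illustrative choice, and runs it only if stata-mp is available in the current shell.

```shell
# Write a do-file that installs packages (the package choice is illustrative).
cat > install_packages.do <<'EOF'
* Install the gtools package from SSC; "replace" updates an existing copy
ssc install gtools, replace
EOF

# Run it only where Stata is available (that is, after "module load stata").
# Do this on a login node, since compute nodes have no internet access.
if command -v stata-mp >/dev/null 2>&1; then
    stata-mp -b do install_packages.do
else
    echo "stata-mp not found; load the Stata module first"
fi
```

Keeping the install commands in a do-file gives you a record of which packages a project depends on.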
Storing temporary files
Loading the Stata module will automatically set the STATATMP directory, which is used for storing temporary files, to a /scratch2/<username>/stata directory. To use a different directory, set the STATATMP environment variable in your job script after loading the module:
export STATATMP=<dir>
where <dir> is the directory of your choice. You will get the best performance by using a directory in one of your /project, /scratch, or /scratch2 directories.
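As a sketch of how this might look in a job script, the lines below create a separate temporary directory for each job and point STATATMP at it. STATA_TMP_BASE is a hypothetical variable introduced here for illustration; you would set it to a location in your own /project or /scratch space, and it falls back to the current directory if unset.

```shell
# Base location for temporary files. STATA_TMP_BASE is a hypothetical
# variable; it defaults to the current directory when not set.
tmp_base="${STATA_TMP_BASE:-$PWD}"

# SLURM_JOB_ID is set by Slurm inside a job; "manual" is used outside one
job_tmp="$tmp_base/stata_tmp_${SLURM_JOB_ID:-manual}"

# Create the directory and tell Stata to use it
mkdir -p "$job_tmp"
export STATATMP="$job_tmp"
echo "Stata temporary files will be written to $STATATMP"
```

Using one directory per job keeps temporary files from concurrent jobs apart and makes them easy to remove afterwards.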
Parallel programming with Stata
If using the stata-mp executable, Stata will automatically use the requested number of cores from Slurm's --cpus-per-task option. This implicit parallelism does not require any changes to your code. The current Stata license allows up to 8 cores. For more information about stata-mp, see Stata's performance report.
There are also user-developed packages for Stata that provide additional capabilities. For example, the parallel package implements parallel for loops: https://github.com/gvegayon/parallel. In addition, the gtools package provides faster alternatives to some Stata commands when working with big data: https://github.com/mcaceresb/stata-gtools
On Linux systems, such as CARC systems, it is also good practice to set a maximum memory limit in your Stata scripts. For example:
set max_memory 16g
The value should be equal to or less than the total memory requested with Slurm's --mem option.
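To keep the two values in step, the limit can be derived from the job's memory request. The sketch below assumes Slurm exports SLURM_MEM_PER_NODE in megabytes inside jobs submitted with --mem. The function name stata_max_memory_line and the 10% headroom are illustrative choices, not CARC recommendations.

```shell
# Hypothetical helper: print a "set max_memory" line sized to the job's
# memory request. Assumes SLURM_MEM_PER_NODE holds the request in megabytes.
stata_max_memory_line() {
    local mem_mb="${SLURM_MEM_PER_NODE:-}"
    if [ -z "$mem_mb" ]; then
        echo "SLURM_MEM_PER_NODE is not set; run this inside a job that used --mem" >&2
        return 1
    fi
    # Use 90% of the request, leaving some headroom (an illustrative choice)
    echo "set max_memory $(( mem_mb * 9 / 10 ))m"
}

# Example (inside a Slurm job):
#   stata_max_memory_line
```

The printed line can then be copied into a do-file, or written to a small do-file that your main script runs first.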
If you have questions about or need help with Stata, please submit a help ticket and we will assist you.