Slurm Cheatsheet
A compact reference for Slurm commands and useful options, with examples.
0.0.1 Custom CARC Slurm commands
myaccount - View account information for user
noderes - View node resources
jobqueue - View job queue information
jobhist - View compact history of user’s jobs
jobinfo - View detailed job information
Each command has an associated help page (e.g., jobinfo --help).
0.0.2 Job submission
salloc - Obtain a job allocation for interactive use (docs)
sbatch - Submit a batch script for later execution (docs)
srun - Obtain a job allocation and run an application (docs)
Option | Description |
---|---|
-A, --account=<account> | Account to be charged for resources used |
-a, --array=<index> | Job array specification (sbatch only) |
-b, --begin=<time> | Initiate job after specified time |
-C, --constraint=<features> | Required node features |
--cpu-bind=<type> | Bind tasks to specific CPUs (srun only) |
-c, --cpus-per-task=<count> | Number of CPUs required per task |
-d, --dependency=<state:jobid> | Defer job until specified jobs reach specified state |
-m, --distribution=<method[:method]> | Specify distribution methods for remote processes |
-e, --error=<filename> | File in which to store job error messages (sbatch and srun only) |
-x, --exclude=<name> | Specify host names to exclude from job allocation |
--exclusive | Reserve all CPUs and GPUs on allocated nodes |
--export=<name=value> | Export specified environment variables (e.g., all, none) |
--gpus-per-task=<list> | Number of GPUs required per task |
-J, --job-name=<name> | Job name |
-l, --label | Prepend task ID to output (srun only) |
--mail-type=<type> | E-mail notification type (e.g., begin, end, fail, requeue, all) |
--mail-user=<address> | E-mail address |
--mem=<size>[units] | Memory required per allocated node (e.g., 16GB) |
--mem-per-cpu=<size>[units] | Memory required per allocated CPU (e.g., 2GB) |
-w, --nodelist=<hostnames> | Specify host names to include in job allocation |
-N, --nodes=<count> | Number of nodes required for the job |
-n, --ntasks=<count> | Number of tasks to be launched |
--ntasks-per-node=<count> | Number of tasks to be launched per node |
-o, --output=<filename> | File in which to store job output (sbatch and srun only) |
-p, --partition=<names> | Partition in which to run the job |
--signal=[B:]<num>[@time] | Signal job when approaching time limit |
-t, --time=<time> | Limit for job run time |
Examples:
# Request interactive job on debug node with 4 CPUs
salloc -p debug -c 4
# Request interactive job with V100 GPU
salloc -p gpu --ntasks=1 --gpus-per-task=v100:1
# Submit batch job
sbatch batch.job
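The contents of batch.job are not shown above. As a minimal sketch, a batch script combining several of the options from the table might look like the following. The resource values, the job name, and the program name in the final comment are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=00:10:00

# Use one OpenMP thread per allocated CPU (falls back to 1 outside Slurm)
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Job ${SLURM_JOB_ID:-unknown} using ${OMP_NUM_THREADS} thread(s)"

# Replace with the real workload, e.g.: srun ./my_program (hypothetical name)
```

The #SBATCH lines are read by sbatch at submission time and are plain comments to the shell, so the same options can also be given on the sbatch command line instead.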
0.0.3 Job management
squeue - View information about jobs in scheduling queue (docs)
Option | Description |
---|---|
-A, --account=<account_list> | Filter by accounts (comma-separated list) |
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
-l, --long | Show more available information |
--me | Filter by your own jobs |
-n, --name=<job_name_list> | Filter by job names (comma-separated list) |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-P, --priority | Sort jobs by priority |
--start | Show the expected start time and resources to be allocated for pending jobs |
-t, --states=<state_list> | Filter by states (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View your own job queue
squeue --me
# View your own job queue with estimated start times for pending jobs
squeue --me --start
# View job queue on specified partition in long format
squeue -lp epyc-64
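The --format option is listed above without an example. One possible use is a small wrapper that always prints your own jobs in a fixed column layout. The function name myjobs and the column widths are assumptions chosen for illustration; the % codes are standard squeue format specifiers.

```shell
# Show your own jobs in a compact custom layout.
# %i job ID, %P partition, %j job name, %T state, %M time used,
# %D node count, %R reason (pending jobs) or node list (running jobs)
myjobs() {
    squeue --me --format="%.10i %.12P %.20j %.10T %.10M %.6D %R" "$@"
}

# Usage (extra arguments are passed through to squeue):
#   myjobs
#   myjobs -p epyc-64
```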
scancel - Signal or cancel jobs, job arrays, or job steps (docs)
Option | Description |
---|---|
-A, --account=<account> | Restrict to the specified account |
-n, --name=<job_name> | Restrict to jobs with specified name |
-w, --nodelist=<hostnames> | Restrict to jobs using the specified host names (comma-separated list) |
-p, --partition=<partition> | Restrict to the specified partition |
-t, --state=<state> | Restrict to jobs in the specified state (e.g., pending, running) |
-u, --user=<username> | Restrict to the specified user |
Examples:
# Cancel specific job
scancel 111111
# Cancel all your own jobs
scancel -u $USER
# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek
# Cancel your own jobs in specified state
scancel -u $USER -t pending
sprio - View job scheduling priorities (docs)
Option | Description |
---|---|
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
-l, --long | Show more available information |
-n, --norm | Show the normalized priority factors |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View normalized job priorities for your own jobs
sprio -nu $USER
# View normalized job priorities for specified partition
sprio -nlp gpu
0.0.4 Job accounting
sacct - View job accounting data (docs)
Option | Description |
---|---|
-A, --account=<account_list> | Filter by accounts (comma-separated list) |
-X, --allocations | Show job allocations, but not job steps |
-a, --allusers | Show jobs for all users |
-E, --endtime=<time> | End of reporting period |
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
--name=<job_name_list> | Filter by job names (comma-separated list) |
-N, --nodelist=<hostnames> | Filter by host names (comma-separated list) |
-r, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-S, --starttime=<time> | Start of reporting period |
-s, --state=<state_list> | Filter by states (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View accounting data for specific job with custom format
sacct -j 111111 --format=jobid,jobname,submit,exitcode,elapsed,reqnodes,reqcpus,reqmem
# View compact accounting data for your own jobs for specified time range
sacct -X -S 2022-07-01 -E 2022-07-14
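For scripting, sacct output is easier to process without the header and with a fixed delimiter. The sketch below counts your jobs by final state over a date range. The function name job_states is a hypothetical helper, and -n (--noheader) and -P (--parsable2) are sacct options not listed in the table above.

```shell
# Count your own jobs by final state over a date range.
# -X: allocations only, -n: no header, -P: pipe-delimited output
job_states() {
    sacct -X -n -P -S "$1" -E "$2" --format=state | sort | uniq -c
}

# Usage:
#   job_states 2022-07-01 2022-07-14
```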
sacctmgr - View or modify account information (docs)
sacctmgr show associations
sacctmgr show user <username>
Option | Description |
---|---|
cluster=<clusters> | Filter by clusters (e.g., condo, discovery) |
format=<options> | Output format to display |
user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View your own associations with custom format
sacctmgr show associations user=$USER format=cluster,account,user,qos
sreport - Generate reports from accounting data (docs)
sreport cluster accountutilizationbyuser
sreport cluster userutilizationbyaccount
sreport job sizesbyaccount
sreport user topusage
Option | Description |
---|---|
-T, --tres=<resource_list> | Resources to report (e.g., cpu, gpu, mem, billing, all) |
clusters=<clusters> | Filter by clusters (e.g., condo, discovery) |
end=<time> | End of reporting period |
format=<options> | Output format to display |
start=<time> | Start of reporting period |
accounts=<account_list> | Filter by accounts (comma-separated list) |
users=<user_list> | Filter by users (comma-separated list) |
nodes=<hostnames> | Filter by host names (comma-separated list) (job reports only) |
partitions=<partition_list> | Filter by partitions (comma-separated list) (job reports only) |
printjobcount | Print number of jobs run instead of time used (job reports only) |
Examples:
# Report account utilization for specified user and time range
sreport cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 users=$USER
# Report account utilization by user for specified account and time range
sreport cluster userutilizationbyaccount start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
# Report job sizes for specified partition
sreport job sizesbyaccount partitions=epyc-64 printjobcount
# Report top users for specified account and time range
sreport user topusage start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
0.0.5 Partition and node information
sinfo - View information about nodes and partitions (docs)
Option | Description |
---|---|
-o, --format=<options> | Output format to display |
-l, --long | Show more available information |
-N, --Node | Show information in a node-oriented format |
-n, --nodes=<hostnames> | Filter by host names (comma-separated list) |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-t, --states=<state_list> | Filter by node states (comma-separated list) |
-s, --summarize | Show summary information |
Examples:
# View all partitions and nodes by state
sinfo
# Summarize node states by partition
sinfo -s
# View nodes in idle state
sinfo --states=idle
# View nodes for specified partition in long, node-oriented format
sinfo -lNp epyc-64
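The filtering and format options can be combined to answer quick questions such as how many nodes in a partition are currently idle. In this sketch the function name idle_nodes is a hypothetical helper, and -h (--noheader) is a sinfo option not listed in the table above.

```shell
# Count idle nodes in a partition.
# -h: no header, -N: one line per node, -t idle: filter by state,
# -o "%N": print only the node name
idle_nodes() {
    sinfo -h -N -p "$1" -t idle -o "%N" | sort -u | wc -l
}

# Usage:
#   idle_nodes epyc-64
```

The sort -u step guards against a node being listed more than once, which can happen in node-oriented output when a node belongs to several partitions.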
scontrol - View or modify configuration and state (docs)
scontrol show partition <partition>
scontrol show node <hostname>
scontrol show job <job_id>
Option | Description |
---|---|
-d, --details | Show more details |
-o, --oneliner | Show information on one line |
scontrol hold <job_list>
scontrol release <job_list>
scontrol show hostnames
Examples:
# View information for specified partition
scontrol show partition epyc-64
# View information for specified node
scontrol show node b22-01
# View detailed information for running job
scontrol show job 111111 -d
# View hostnames for job (one name per line)
scontrol show hostnames
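Inside a job, scontrol show hostnames is commonly used to expand the compact node list in SLURM_JOB_NODELIST (for example, b22-[01-03]) into individual host names. The sketch below picks the first host, which some multi-node workflows treat as a head node. The function name first_node is a hypothetical helper.

```shell
# Expand the job's node list into one hostname per line
# and print only the first one.
first_node() {
    scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1
}

# Usage inside a batch script:
#   echo "Head node: $(first_node)"
```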
0.0.6 Output environment variables
Variable | Description |
---|---|
SLURM_ARRAY_TASK_COUNT | Number of tasks in job array |
SLURM_ARRAY_TASK_ID | Job array task ID |
SLURM_CPUS_PER_TASK | Number of CPUs requested per task |
SLURM_JOB_ACCOUNT | Account used for job |
SLURM_JOB_ID | Job ID |
SLURM_JOB_NAME | Job Name |
SLURM_JOB_NODELIST | List of nodes allocated to job |
SLURM_JOB_NUM_NODES | Number of nodes allocated to job |
SLURM_JOB_PARTITION | Partition used for job |
SLURM_NTASKS | Number of job tasks |
SLURM_PROCID | MPI rank of current process |
SLURM_SUBMIT_DIR | Directory from which job was submitted |
SLURM_TASKS_PER_NODE | Number of job tasks per node |
Examples:
# Specify OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Specify MPI tasks
srun -n $SLURM_NTASKS ./mpi_program
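SLURM_ARRAY_TASK_ID is typically used to give each task of a job array its own input. One way to do this, sketched below, is to keep a list of inputs in a text file and select the line whose number matches the task ID. The function name input_for_task, the file inputs.txt, and the program name are all hypothetical.

```shell
# Print the line of a list file that corresponds to this array task.
# With `sbatch --array=1-10`, task 3 gets line 3 of the list.
input_for_task() {
    list_file="$1"
    task_id="${SLURM_ARRAY_TASK_ID:-1}"   # fall back to 1 outside a job array
    sed -n "${task_id}p" "$list_file"
}

# Usage inside a batch script:
#   ./my_program "$(input_for_task inputs.txt)"
```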