Slurm Cheatsheet

Last updated March 05, 2024

A compact reference for Slurm commands and useful options, with examples.

0.0.1 Custom CARC Slurm commands

myaccount - View account information for user
nodeinfo - View node information by partition, CPU/GPU model, and state
noderes - View node resources
myqueue - View job queue information for user
jobqueue - View job queue information
jobhist - View compact history of user’s jobs
jobinfo - View detailed job information

Each command has an associated help page (e.g., jobinfo --help).

0.0.2 Job submission

salloc - Obtain a job allocation for interactive use (docs)
sbatch - Submit a batch script for later execution (docs)
srun - Obtain a job allocation and run an application (docs)

Option                                Description
-A, --account=<account>               Account to be charged for resources used
-a, --array=<index>                   Job array specification (sbatch only)
-b, --begin=<time>                    Initiate job after specified time
-C, --constraint=<features>           Required node features
--cpu-bind=<type>                     Bind tasks to specific CPUs (srun only)
-c, --cpus-per-task=<count>           Number of CPUs required per task
-d, --dependency=<state:jobid>        Defer job until specified jobs reach specified state
-m, --distribution=<method[:method]>  Specify distribution methods for remote processes
-e, --error=<filename>                File in which to store job error messages (sbatch and srun only)
-x, --exclude=<name>                  Specify host names to exclude from job allocation
--exclusive                           Reserve all CPUs and GPUs on allocated nodes
--export=<name=value>                 Export specified environment variables (e.g., all, none)
--gpus-per-task=<list>                Number of GPUs required per task
-J, --job-name=<name>                 Job name
-l, --label                           Prepend task ID to output (srun only)
--mail-type=<type>                    E-mail notification type (e.g., begin, end, fail, requeue, all)
--mail-user=<address>                 E-mail address
--mem=<size>[units]                   Memory required per allocated node (e.g., 16GB)
--mem-per-cpu=<size>[units]           Memory required per allocated CPU (e.g., 2GB)
-w, --nodelist=<hostnames>            Specify host names to include in job allocation
-N, --nodes=<count>                   Number of nodes required for the job
-n, --ntasks=<count>                  Number of tasks to be launched
--ntasks-per-node=<count>             Number of tasks to be launched per node
-o, --output=<filename>               File in which to store job output (sbatch and srun only)
-p, --partition=<names>               Partition in which to run the job
--signal=[B:]<num>[@time]             Signal job when approaching time limit
-t, --time=<time>                     Limit for job run time

Examples:

# Request interactive job in the debug partition with 4 CPUs
salloc -p debug -c 4

# Request interactive job with V100 GPU
salloc -p gpu --ntasks=1 --gpus-per-task=v100:1

# Submit batch job
sbatch batch.job
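
For reference, a minimal batch script might look like the following. This is a sketch: the account, partition, resource values, and program name are placeholders to adjust for your own allocation.

#!/bin/bash

#SBATCH --account=ttrojan_123
#SBATCH --partition=epyc-64
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=01:00:00

# my_program is a placeholder for your application
./my_program

The --dependency option from the table above can chain submissions; for example:

# Submit a job that starts only after job 111111 completes successfully
sbatch --dependency=afterok:111111 batch.job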

0.0.3 Job management

squeue - View information about jobs in scheduling queue (docs)

Option                            Description
-A, --account=<account_list>      Filter by accounts (comma-separated list)
-o, --format=<options>            Output format to display
-j, --jobs=<job_id_list>          Filter by job IDs (comma-separated list)
-l, --long                        Show more available information
--me                              Filter by your own jobs
-n, --name=<job_name_list>        Filter by job names (comma-separated list)
-p, --partition=<partition_list>  Filter by partitions (comma-separated list)
-P, --priority                    Sort jobs by priority
--start                           Show the expected start time and resources to be allocated for pending jobs
-t, --states=<state_list>         Filter by states (comma-separated list)
-u, --user=<user_list>            Filter by users (comma-separated list)

Examples:

# View your own job queue
squeue --me

# View your own job queue with estimated start times for pending jobs
squeue --me --start

# View job queue on specified partition in long format
squeue -lp epyc-64
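
The -o, --format option accepts printf-style field codes (see the squeue man page for the full list); a sketch showing job ID, partition, name, state, elapsed time, node count, and nodelist or pending reason:

# View your own jobs with a custom output format
squeue --me -o "%.10i %.9P %.20j %.8T %.10M %.6D %R"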

scancel - Signal or cancel jobs, job arrays, or job steps (docs)

Option                       Description
-A, --account=<account>      Restrict to the specified account
-n, --name=<job_name>        Restrict to jobs with specified name
-w, --nodelist=<hostnames>   Restrict to jobs using the specified host names (comma-separated list)
-p, --partition=<partition>  Restrict to the specified partition
-t, --state=<state>          Restrict to jobs in the specified state
-u, --user=<username>        Restrict to the specified user

Examples:

# Cancel specific job
scancel 111111

# Cancel all your own jobs
scancel -u $USER

# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek

# Cancel your own jobs in specified state
scancel -u $USER -t pending
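
scancel can also send a specific signal to a job instead of terminating it, which pairs with the --signal submission option above; a sketch:

# Send SIGUSR1 to the batch shell of a running job
scancel --signal=USR1 --batch 111111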

sprio - View job scheduling priorities (docs)

Option                            Description
-o, --format=<options>            Output format to display
-j, --jobs=<job_id_list>          Filter by job IDs (comma-separated list)
-l, --long                        Show more available information
-n, --norm                        Show the normalized priority factors
-p, --partition=<partition_list>  Filter by partitions (comma-separated list)
-u, --user=<user_list>            Filter by users (comma-separated list)

Examples:

# View normalized job priorities for your own jobs
sprio -nu $USER

# View normalized job priorities for specified partition
sprio -nlp gpu

0.0.4 Job accounting

sacct - View job accounting data (docs)

Option                            Description
-A, --account=<account_list>      Filter by accounts (comma-separated list)
-X, --allocations                 Show job allocations, but not job steps
-a, --allusers                    Show jobs for all users
-E, --endtime=<time>              End of reporting period
-o, --format=<options>            Output format to display
-j, --jobs=<job_id_list>          Filter by job IDs (comma-separated list)
--name=<job_name_list>            Filter by job names (comma-separated list)
-N, --nodelist=<hostnames>        Filter by host names (comma-separated list)
-r, --partition=<partition_list>  Filter by partitions (comma-separated list)
-S, --starttime=<time>            Start of reporting period
-s, --state=<state_list>          Filter by states (comma-separated list)
-u, --user=<user_list>            Filter by users (comma-separated list)

Examples:

# View accounting data for specific job with custom format
sacct -j 111111 --format=jobid,jobname,submit,exitcode,elapsed,reqnodes,reqcpus,reqmem

# View compact accounting data for your own jobs for specified time range
sacct -X -S 2022-07-01 -E 2022-07-14
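
Fields such as Elapsed, TotalCPU, and MaxRSS can help gauge how efficiently a completed job used its allocation; a sketch:

# Compare requested and used resources for a completed job
sacct -j 111111 -o jobid,state,elapsed,totalcpu,reqmem,maxrss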

sacctmgr - View or modify account information (docs)

sacctmgr show associations
sacctmgr show user <username>

Option              Description
cluster=<clusters>  Filter by clusters (e.g., condo, discovery)
format=<options>    Output format to display
user=<user_list>    Filter by users (comma-separated list)

Examples:

# View your own associations with custom format
sacctmgr show associations user=$USER format=cluster,account,user,qos

sreport - Generate reports from accounting data (docs)

sreport cluster accountutilizationbyuser
sreport cluster userutilizationbyaccount
sreport job sizesbyaccount
sreport user topusage

Option                       Description
-T, --tres=<resource_list>   Resources to report (e.g., cpu, gpu, mem, billing, all)
clusters=<clusters>          Filter by clusters (e.g., condo, discovery)
end=<time>                   End of reporting period
format=<options>             Output format to display
start=<time>                 Start of reporting period
accounts=<account_list>      Filter by accounts (comma-separated list)
users=<user_list>            Filter by users (comma-separated list)
nodes=<hostnames>            Filter by host names (comma-separated list) (job reports only)
partitions=<partition_list>  Filter by partitions (comma-separated list) (job reports only)
printjobcount                Print the number of jobs run instead of time used (job reports only)

Examples:

# Report account utilization for specified user and time range
sreport cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 users=$USER

# Report account utilization by user for specified account and time range
sreport cluster userutilizationbyaccount start=2022-07-01 end=2022-07-14 accounts=ttrojan_123

# Report job sizes for specified partition
sreport job sizesbyaccount partitions=epyc-64 printjobcount

# Report top users for specified account and time range
sreport user topusage start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
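
The -T, --tres option from the table above can narrow these reports to other resources; for example, assuming gpu is a tracked resource as listed above:

# Report account GPU utilization by user for specified account and time range
sreport -T gpu cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 accounts=ttrojan_123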

0.0.5 Partition and node information

sinfo - View information about nodes and partitions (docs)

Option                            Description
-o, --format=<options>            Output format to display
-l, --long                        Show more available information
-N, --Node                        Show information in a node-oriented format
-n, --nodes=<hostnames>           Filter by host names (comma-separated list)
-p, --partition=<partition_list>  Filter by partitions (comma-separated list)
-t, --states=<state_list>         Filter by node states (comma-separated list)
-s, --summarize                   Show summary information

Examples:

# View all partitions and nodes by state
sinfo

# Summarize node states by partition
sinfo -s

# View nodes in idle state
sinfo --states=idle

# View nodes for specified partition in long, node-oriented format
sinfo -lNp epyc-64
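
Like squeue, sinfo accepts printf-style field codes with -o, --format (see the sinfo man page); a sketch showing partition, availability, time limit, node count, state, and nodelist:

# View partition summary with a custom output format
sinfo -o "%.12P %.5a %.11l %.6D %.6t %N"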

scontrol - View or modify configuration and state (docs)

scontrol show partition <partition>
scontrol show node <hostname>
scontrol show job <job_id>

Option          Description
-d, --details   Show more details
-o, --oneliner  Show information on one line

scontrol hold <job_list>
scontrol release <job_list>
scontrol show hostnames

Examples:

# View information for specified partition
scontrol show partition epyc-64

# View information for specified node
scontrol show node b22-01

# View detailed information for running job
scontrol -d show job 111111

# View hostnames for job (one name per line)
scontrol show hostnames
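
scontrol show hostnames expands Slurm's compact hostlist notation; run inside a job, it expands SLURM_JOB_NODELIST by default, and it also accepts an explicit hostlist expression (the hostnames here are illustrative):

# Expand a hostlist expression (one name per line)
scontrol show hostnames "b22-[01-04]"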

0.0.6 Output environment variables

Variable                Description
SLURM_ARRAY_TASK_COUNT  Number of tasks in job array
SLURM_ARRAY_TASK_ID     Job array task ID
SLURM_CPUS_PER_TASK     Number of CPUs requested per task
SLURM_JOB_ACCOUNT       Account used for job
SLURM_JOB_ID            Job ID
SLURM_JOB_NAME          Job name
SLURM_JOB_NODELIST      List of nodes allocated to job
SLURM_JOB_NUM_NODES     Number of nodes allocated to job
SLURM_JOB_PARTITION     Partition used for job
SLURM_NTASKS            Number of job tasks
SLURM_PROCID            MPI rank of current process
SLURM_SUBMIT_DIR        Directory from which job was submitted
SLURM_TASKS_PER_NODE    Number of job tasks per node

Examples:

# Specify OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Specify MPI tasks
srun -n $SLURM_NTASKS ./mpi_program
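
A common job array pattern pairs SLURM_ARRAY_TASK_ID with numbered input files; a sketch where the program and file naming are placeholders:

# Select the input file for this array task
./program input_${SLURM_ARRAY_TASK_ID}.txt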