Using GPUs

Some programs can take advantage of the unique hardware architecture in a graphics processing unit (GPU). GPUs can be used for specialized scientific computing work, including 3D modelling and machine learning. The CARC's Discovery cluster currently offers three different models of GPUs for use with your jobs. Condo Cluster Program users participating in the traditional purchase model have the option to include GPU nodes in their dedicated resources.

Requesting GPU resources

To request a GPU on Discovery's GPU partition, add the following line to your Slurm job script:

#SBATCH --partition=gpu 

Also add one of the following sbatch options to your Slurm job script to request the type and number of GPUs you'd like to use:

#SBATCH --gres=gpu:<number>


#SBATCH --gres=gpu:<gpu_type>:<number>


In these options, <number> is the number of GPUs requested per node, and
<gpu_type> is one of the following: k40, p100, or v100.

Use the chart below to determine which GPU type to specify:

GPU type   Max number of GPUs per node   GPU model
k40        2                             NVIDIA Tesla K40
p100       2                             NVIDIA Tesla P100
v100       2                             NVIDIA Tesla V100
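
As a rough illustration of how the GPU type and count combine, the following sketch builds a --gres value and rejects combinations that the chart does not list. The function name gres_option is hypothetical and is not part of Slurm or CARC tooling; the limit of 2 GPUs per node is taken from the chart.

```shell
# Hypothetical helper (not a Slurm or CARC tool): print a --gres option
# for a GPU type and per-node count, using the limits from the chart.
gres_option() {
    gpu_type="$1"
    gpu_count="$2"

    # Only the GPU types listed in the chart are accepted.
    case "$gpu_type" in
        k40|p100|v100) ;;
        *) echo "unknown GPU type: $gpu_type" >&2; return 1 ;;
    esac

    # The chart lists a maximum of 2 GPUs per node for every type.
    case "$gpu_count" in
        1|2) ;;
        *) echo "GPUs per node must be 1 or 2, got: $gpu_count" >&2; return 1 ;;
    esac

    printf '%s\n' "--gres=gpu:${gpu_type}:${gpu_count}"
}

gres_option v100 2   # prints --gres=gpu:v100:2
```

The printed value can be placed after #SBATCH in a job script or passed to sbatch on the command line.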

To see a live list of available GPUs, you can run the following command:

sinfo -o "%20P %5D %3c %6m %G" | grep -v null

In this format string, %P is the partition name, %D the number of nodes, %c the number of CPUs per node, %m the memory per node in MB, and %G the generic resources (GPUs). The grep -v null filter hides nodes that have no GPUs.

The maximum number of GPUs that can be used at one time per user, in one job or across multiple jobs, is 36.

System Unit (SU) charges

Each job will subtract from your project's allocated System Units (SUs) depending on the types of resources you request. For GPUs, the SU charge varies depending on the GPU model. The following table shows the SU charge for different GPU models for one hour:

GPU type   System Unit (SU) Charge

Loading corresponding modules

GPU-enabled software often requires the CUDA Toolkit or the cuDNN library. These are available as modules and can be found by running:

module spider cuda
module spider cudnn

Or to search for modules that contain 'cud' in the name, run:

module spider cud

Multiple versions are available. For example, to load a specific version of each module, run:

module load cuda/10.1.243
module load cudnn/8.0.2-10.1
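
After loading modules, it can help to confirm that the tools they provide are on your PATH before compiling or running anything. The sketch below assumes that the cuda module adds nvcc to PATH; require_cmd is a hypothetical helper name, not a command provided by the module system.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found; is the right module loaded?" >&2
    return 1
}

# Assumed usage after 'module load cuda/10.1.243':
# report the compiler version if nvcc is available.
if require_cmd nvcc; then
    nvcc --version
fi
```

Running module list also shows which modules are currently loaded.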

In addition, the newer NVIDIA HPC SDK with associated compilers, libraries, and other tools is available as a core module:

module load pgi-nvhpc

If you require a different version of one of these modules that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.

Compiling programs

After a cuda module is loaded, you can then use the nvcc command to compile a CUDA C/C++ program:

nvcc program.cu -o program

Enter nvcc --help for more information on the available compiler options.
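
For a concrete starting point, the sketch below writes a small CUDA C++ source file and compiles it with nvcc if the compiler is available. The file name program.cu, the kernel name add_one, and the use of a temporary directory are illustrative choices, not CARC conventions. Running the resulting binary requires a GPU node.

```shell
# Write a minimal CUDA C++ program to a scratch directory (illustrative only).
workdir="$(mktemp -d)"
src="$workdir/program.cu"

cat > "$src" <<'EOF'
#include <cstdio>

// Each GPU thread increments one array element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    // Allocate device memory and copy the input to the GPU.
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed (is a GPU available?)\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // Launch enough 128-thread blocks to cover all n elements.
    add_one<<<(n + 127) / 128, 128>>>(dev, n);

    // Copy the result back; this call waits for the kernel to finish.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
EOF

# Compile only when nvcc is on PATH (for example, after loading a cuda module).
if command -v nvcc >/dev/null 2>&1; then
    nvcc "$src" -o "$workdir/program"
else
    echo "nvcc not found; wrote source to $src" >&2
fi
```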

For the pgi-nvhpc module, in addition to nvcc, there are NVIDIA's HPC compilers nvc, nvc++, and nvfortran. For example, to compile a CUDA Fortran program:

nvfortran program.cuf -o program

One advantage of these HPC compilers is that they can provide GPU acceleration for standard C++ and Fortran programs that are not explicitly written for GPUs.

To compile programs on GPU nodes, you can use Slurm's salloc command for an interactive job:

salloc --partition=gpu --gres=gpu:1

Example Slurm job script

The following is an example Slurm job script for GPU jobs:


#!/bin/bash

#SBATCH --partition=gpu
#SBATCH --gres=gpu:k40:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --account=<project_id>

module purge
module load gcc/8.3.0
module load cuda/10.1.243

./program


Each line is described below:

Command or Slurm argument    Meaning
#!/bin/bash                  Use Bash to execute this script
#SBATCH                      Syntax that allows Slurm to read your requests (ignored by Bash)
--partition=gpu              Use Discovery's GPU partition (not required for Endeavour jobs)
--gres=gpu:k40:1             Reserve 1 K40 GPU
--nodes=1                    Use 1 node
--ntasks=1                   Run 1 task at a time
--cpus-per-task=8            Reserve 8 CPUs for your exclusive use
--mem=16GB                   Reserve 16 GB of memory for your exclusive use
--time=1:00:00               Reserve resources described for 1 hour
--account=<project_id>       Charge compute time to <project_id>. If not specified, you may use up the wrong PI's compute hours
module purge                 Clear environment modules
module load gcc/8.3.0        Load the gcc compiler environment module
module load cuda/10.1.243    Load the cuda environment module
./program                    Run program

Adjust the requested resources to match your needs, keeping in mind that requesting fewer resources should shorten your job's queue time.
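
Before submitting, a quick check that a job script contains the GPU directives described on this page can catch a common mistake. This is a minimal sketch: check_gpu_job is a hypothetical helper, and it only looks for --partition=gpu and --gres=gpu: written as literal #SBATCH lines. It does not recognize short options such as -p, it does not apply to Endeavour jobs (where the partition line is not required), and it validates nothing else.

```shell
# Hypothetical pre-submission check: does the job script request the GPU
# partition and at least one GPU via #SBATCH directives?
check_gpu_job() {
    job_script="$1"
    rc=0

    if ! grep -Eq '^#SBATCH[[:space:]]+--partition=gpu([[:space:]]|$)' "$job_script"; then
        echo "$job_script: missing '#SBATCH --partition=gpu'" >&2
        rc=1
    fi
    if ! grep -Eq '^#SBATCH[[:space:]]+--gres=gpu:' "$job_script"; then
        echo "$job_script: missing '#SBATCH --gres=gpu:...'" >&2
        rc=1
    fi
    return "$rc"
}
```

For example, check_gpu_job job.slurm && sbatch job.slurm submits the script only if both directives are present, where job.slurm is a placeholder file name.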

Additional resources

CUDA Toolkit
