Discovery Resource Overview

Last updated April 02, 2025

CARC’s general-use HPC cluster, Discovery, has over 20,000 CPU cores across 500 compute nodes available to researchers.

Discovery is a shared resource, so there are limits on the size and duration of jobs to ensure that everyone has a chance to run them. For details on these limits, see Running Jobs.

Slurm partitions

Discovery has several Slurm partitions, each serving a different purpose and containing different types of compute nodes. Each partition has a separate job queue. All of the partitions listed below are general-use partitions available to all researchers. The table below describes the intended purpose of each partition:

| Partition | Purpose |
| --- | --- |
| main | Serial and small-to-medium parallel jobs |
| epyc-64 | Serial and medium-to-large parallel jobs |
| gpu | Jobs requiring GPUs |
| oneweek | Long-running jobs (up to 7 days) |
| largemem | Jobs requiring larger amounts of memory (up to 1.5 TB) |
| debug | Short-running jobs for debugging purposes |
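
For illustration, a batch job is directed to one of these partitions with Slurm's --partition option. The following is a minimal sketch assuming a small CPU job on the main partition; the account, resource amounts, and program name are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=main        # choose a partition from the table above
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --cpus-per-task=8       # cores per task
#SBATCH --mem=16G               # memory per node
#SBATCH --time=02:00:00         # walltime limit (HH:MM:SS)
#SBATCH --account=<project_id>  # placeholder: your project/account ID

# Placeholder workload; replace with your own program
./my_program
```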

Node specifications

Each partition contains a different mix of compute nodes. The table below describes the available node types and the partitions in which they are located. Each node typically has two sockets with one multi-core processor per socket and an equal number of cores per processor. The number of nodes per partition varies and may change over time. In the table below, the CPUs/node column refers to logical CPUs, where 1 logical CPU = 1 core = 1 thread.

| CPU model | Microarchitecture | CPU frequency | CPUs/node | Memory/node | GPU model | GPUs/node | Partitions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| epyc-9534 | zen4 | 2.45 GHz | 64 | 748 GB | L40S | 3 | gpu |
| epyc-9354 | zen4 | 3.25 GHz | 64 | 1498 GB | | | largemem |
| epyc-7513 | zen3 | 2.60 GHz | 64 | 248 GB | | | main, epyc-64, largemem |
| epyc-7513 | zen3 | 2.60 GHz | 64 | 248 GB | A100 | 2 | gpu |
| epyc-7313 | zen3 | 3.00 GHz | 64 | 248 GB | A40 | 2 | gpu, debug |
| epyc-7542 | zen2 | 2.90 GHz | 64 | 248 GB | | | main |
| epyc-7282 | zen2 | 2.80 GHz | 32 | 248 GB | A40 | 2 | gpu |
| xeon-6130 | skylake_avx512 | 2.10 GHz | 32 | 184 GB | V100 | 2 | gpu |
| xeon-4116 | skylake_avx512 | 2.10 GHz | 24 | 185 GB | | | main, oneweek, debug |
| xeon-4116 | skylake_avx512 | 2.10 GHz | 24 | 89 GB | | | main, oneweek |
| xeon-2640v4 | broadwell | 2.40 GHz | 20 | 123 GB | P100 | 2 | gpu, debug |
| xeon-2640v4 | broadwell | 2.40 GHz | 20 | 60 GB | | | oneweek |

Use the noderes -c command to see a list of nodes and their configured resources. To see this information for a specific partition, add the partition filter option; for example, to see only the nodes in the gpu partition, use noderes -c -p gpu. For help, use noderes -h.
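
For quick reference, the noderes commands described above (the exact output format of this cluster utility may vary):

```bash
# List all nodes and their configured resources
noderes -c

# Restrict the listing to a single partition (here, the gpu partition)
noderes -c -p gpu

# Show help information
noderes -h
```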

A few standard commands provide more detailed node information. Use the lscpu command for details about a node’s CPUs. On nodes with GPUs, the nvidia-smi command and its various options provide information about the GPUs. Alternatively, after module load nvhpc, use the nvaccelinfo command to view information about GPUs. After module load gcc/13.3.0 hwloc, use the lstopo command to view a node’s topology.
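
The sketch below shows one way to run these tools interactively on a GPU node, assuming Slurm's salloc command and the module names listed above; the resource amounts are placeholders:

```bash
# Request a short interactive session on a GPU node (values are placeholders).
# On clusters where salloc opens a shell on the allocated node, the commands
# below run directly on that node; otherwise, prefix each with srun.
salloc --partition=gpu --gres=gpu:1 --cpus-per-task=8 --mem=16G --time=00:30:00

# CPU details (sockets, cores per socket, threads)
lscpu

# GPU details (model, memory, utilization)
nvidia-smi

# Alternative GPU report from the NVIDIA HPC SDK
module load nvhpc
nvaccelinfo

# Node topology (sockets, caches, NUMA layout)
module load gcc/13.3.0 hwloc
lstopo
```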

GPU specifications

The following table summarizes the specifications of each GPU model:

| GPU Model | Partitions | Architecture | Memory | Memory Bandwidth | Base Clock Speed | CUDA Cores | Tensor Cores | Single Precision Performance (FP32) | Double Precision Performance (FP64) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| L40S | gpu | ada lovelace | 48 GB | 864 GB/s | 1110 MHz | 18176 | 568 | 91.6 TFLOPS | 1.4 TFLOPS |
| A100 | gpu | ampere | 80 GB | 1.9 TB/s | 1065 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A100 | gpu | ampere | 40 GB | 1.6 TB/s | 765 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A40 | gpu, debug | ampere | 48 GB | 696 GB/s | 1305 MHz | 10752 | 336 | 37.4 TFLOPS | 584.6 GFLOPS |
| V100 | gpu | volta | 32 GB | 900 GB/s | 1230 MHz | 5120 | 640 | 14 TFLOPS | 7 TFLOPS |
| P100 | gpu, debug | pascal | 16 GB | 732 GB/s | 1189 MHz | 3584 | n/a | 9.3 TFLOPS | 4.7 TFLOPS |
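
To target a specific GPU model, a job typically requests it by GRES type in addition to the gpu partition. The following is a minimal sketch assuming the GRES type names match the lowercase model names (for example, a100); the exact names are site-configured, so verify them with noderes -c -p gpu:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1       # assumed GRES type name; verify with noderes -c -p gpu
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=01:00:00
#SBATCH --account=<project_id>  # placeholder: your project/account ID

# Confirm the assigned GPU model and memory from inside the job
nvidia-smi --query-gpu=name,memory.total --format=csv
```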