Discovery Resource Overview

Last updated March 04, 2024

CARC’s general-use HPC cluster Discovery has over 20,000 cores across 500 compute nodes available for researchers to use.

Discovery is a shared resource, so limits are placed on the size and duration of jobs to ensure that everyone has a chance to run them. For details on these limits, see Running Jobs.

Partitions and compute nodes

There are a few general-use Slurm partitions available on Discovery, each with its own job queue, open to all researchers. The table below describes the intended purpose of each partition; an example job script requesting a partition follows the table.

| Partition | Purpose |
|-----------|---------|
| main | Serial and small-to-medium parallel jobs (single node or multiple nodes) |
| epyc-64 | Serial and medium-to-large parallel jobs (single node or multiple nodes) |
| gpu | Jobs requiring GPU nodes |
| oneweek | Long-running jobs (up to 7 days) |
| largemem | Jobs requiring larger amounts of memory (up to 1 TB) |
| debug | Short-running jobs for debugging purposes |

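To target one of these partitions, specify it in your Slurm job script. The following is a minimal sketch of a single-node job; the account name, resource amounts, and program name are placeholders, and the job limits described in Running Jobs still apply.

```bash
#!/bin/bash
#SBATCH --partition=main        # one of the partitions listed above
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --account=<project_id>  # placeholder: replace with your project account

module purge
srun ./my_program               # placeholder: replace with your program
```
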
Each partition has a different mix of compute nodes; the table below lists the available nodes by partition. Each node typically has two sockets, each with one multi-core processor, and an equal number of cores per processor. The CPUs/node column refers to logical CPUs, where 1 logical CPU = 1 core = 1 thread.

| Partition | CPU model | CPU frequency | CPUs/node | GPU model | GPUs/node | Memory/node | Nodes |
|-----------|-----------|---------------|-----------|-----------|-----------|-------------|-------|
| main | epyc-7513 | 2.60 GHz | 64 | | | 256 GB | 61 |
| main | epyc-7542 | 2.90 GHz | 64 | | | 256 GB | 32 |
| main | xeon-2640v3 | 2.60 GHz | 16 | | | 64 GB | 32 |
| main | xeon-2640v4 | 2.40 GHz | 20 | | | 64 GB | 16 |
| main | xeon-4116 | 2.10 GHz | 24 | | | 94 GB | 39 |
| main | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 29 |
| main | xeon-2640v4 | 2.40 GHz | 20 | K40 | 2 | 64 GB | 45 |
| epyc-64 | epyc-7513 | 2.60 GHz | 64 | | | 256 GB | 78 |
| gpu | xeon-6130 | 2.10 GHz | 32 | V100 | 2 | 191 GB | 29 |
| gpu | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 38 |
| gpu | epyc-7282 | 2.80 GHz | 32 | A40 | 2 | 256 GB | 12 |
| gpu | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 17 |
| gpu | epyc-7513 | 2.60 GHz | 64 | A100 (40 GB) | 2 | 256 GB | 12 |
| gpu | epyc-7513 | 2.60 GHz | 64 | A100 (80 GB) | 2 | 256 GB | 12 |
| oneweek | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 10 |
| oneweek | xeon-2640v4 | 2.40 GHz | 20 | | | 64 GB | 35 |
| largemem | epyc-7513 | 2.60 GHz | 64 | | | 1024 GB | 4 |
| debug | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 2 |
| debug | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 1 |
| debug | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 1 |

Use the nodeinfo command for similar real-time information.
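
The nodeinfo command is provided on the cluster; if you prefer standard Slurm tooling, sinfo and scontrol report similar per-node details. The format string below is just one reasonable choice, and the node name is a placeholder.

```bash
# List node name, CPU count, memory (MB), and GPUs (GRES) in the gpu partition
sinfo --partition=gpu --Node --format="%n %c %m %G"

# Show full details for a single node (replace the placeholder with a real node name)
scontrol show node <node_name>
```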

A few commands provide more detailed node information. For CPUs, use the lscpu command. On nodes with GPUs, the nvidia-smi command and its various options report GPU details; alternatively, after module load nvhpc, use the nvaccelinfo command to view information about GPUs. After module load gcc/11.3.0 hwloc, use the lstopo command to view a node's topology. Examples are shown below.
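
For example, a minimal sequence of these commands, run from a shell on a compute node, might look like the following; the module versions are the ones named above and may change over time.

```bash
# CPU details (model, core counts, instruction set flags)
lscpu

# GPU details on GPU nodes
nvidia-smi
nvidia-smi --query-gpu=name,memory.total --format=csv

# GPU details via the NVIDIA HPC SDK
module load nvhpc
nvaccelinfo

# Node topology (sockets, cores, caches, memory)
module load gcc/11.3.0 hwloc
lstopo
```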

CPU microarchitectures and instruction set extensions

Different CPU models offer different instruction set extensions, which compiled programs can use to boost performance. The following table summarizes them:

| CPU model | Microarchitecture | Partitions | AVX | AVX2 | AVX-512 |
|-----------|-------------------|------------|-----|------|---------|
| xeon-2640v3 | haswell | main, debug | yes | yes | no |
| xeon-2640v4 | broadwell | main, gpu, debug | yes | yes | no |
| xeon-4116 | skylake_avx512 | main, oneweek, debug | yes | yes | yes |
| xeon-6130 | skylake_avx512 | gpu | yes | yes | yes |
| epyc-7542 | zen2 | epyc-64 | yes | yes | no |
| epyc-7513 | zen3 | epyc-64, gpu, largemem | yes | yes | no |
| epyc-7282 | zen2 | gpu | yes | yes | no |
| epyc-7313 | zen3 | gpu | yes | yes | no |

Use the lscpu command while logged in to a compute node to list all available CPU flags.
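
For example, the sketch below checks a node's vector extensions and compiles for a matching target; the GCC -march names (haswell, broadwell, skylake-avx512, znver2, znver3) correspond to the microarchitectures in the table above, and myprog.c is a placeholder source file.

```bash
# List the AVX-related CPU flags on the current node
lscpu | grep -o -w 'avx\|avx2\|avx512f' | sort -u

# Compile for the node type you plan to run on (example targets)
gcc -O3 -march=znver3 -o myprog myprog.c           # zen3 nodes (epyc-7513, epyc-7313)
gcc -O3 -march=skylake-avx512 -o myprog myprog.c   # skylake_avx512 nodes (xeon-4116, xeon-6130)

# Or build on a node of the target type and let GCC detect its features
gcc -O3 -march=native -o myprog myprog.c
```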

GPU specifications

The following is a summary table for GPU specifications:

| GPU Model | Partitions | Architecture | Memory | Memory Bandwidth | Base Clock Speed | CUDA Cores | Tensor Cores | Single Precision Performance (FP32) | Double Precision Performance (FP64) |
|-----------|------------|--------------|--------|------------------|------------------|------------|--------------|-------------------------------------|-------------------------------------|
| A100 | gpu | ampere | 80 GB | 1.9 TB/s | 1065 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A100 | gpu | ampere | 40 GB | 1.6 TB/s | 765 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A40 | gpu | ampere | 48 GB | 696 GB/s | 1305 MHz | 10752 | 336 | 37.4 TFLOPS | 584.6 GFLOPS |
| V100 | gpu | volta | 32 GB | 900 GB/s | 1230 MHz | 5120 | 640 | 14 TFLOPS | 7 TFLOPS |
| P100 | gpu, debug | pascal | 16 GB | 732 GB/s | 1189 MHz | 3584 | n/a | 9.3 TFLOPS | 4.7 TFLOPS |
| K40 | main, debug | kepler | 12 GB | 288 GB/s | 745 MHz | 2880 | n/a | 4.29 TFLOPS | 1.43 TFLOPS |
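
To request a specific GPU model in a job, Slurm's GRES syntax can be used. The sketch below assumes GRES type names that match the model names above (for example, a100, a40, v100, p100, k40); the actual names on the cluster may differ, so check nodeinfo or Running Jobs for the exact values.

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1      # assumed GRES type name; adjust to the cluster's actual names
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=01:00:00

# Confirm which GPU was allocated
nvidia-smi --query-gpu=name,memory.total --format=csv
```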