Discovery Resource Overview

Last updated March 04, 2024

CARC’s general-use HPC cluster Discovery has over 20,000 cores across 500 compute nodes available for researchers to use.

Discovery is a shared resource, so limits are placed on the size and duration of jobs to ensure that everyone has a chance to run them. For details on these limits, see Running Jobs.

Partitions and compute nodes

There are a few general-use Slurm partitions available on Discovery, each with its own job queue, open to all researchers. The table below describes the intended purpose of each partition; an example job script requesting a partition follows the table.

| Partition | Purpose |
|-----------|---------|
| main | Serial and small-to-medium parallel jobs (single node or multiple nodes) |
| epyc-64 | Serial and medium-to-large parallel jobs (single node or multiple nodes) |
| gpu | Jobs requiring GPU nodes |
| oneweek | Long-running jobs (up to 7 days) |
| largemem | Jobs requiring larger amounts of memory (up to 1 TB) |
| debug | Short-running jobs for debugging purposes |

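To target one of these partitions, specify it in your Slurm job script. The following is a minimal sketch of a single-node job; the account name, resource amounts, and program name are placeholders, and the job limits described in Running Jobs still apply.

```bash
#!/bin/bash
#SBATCH --partition=main        # one of the partitions listed above
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --account=<project_id>  # placeholder: replace with your project account

module purge
srun ./my_program               # placeholder: replace with your program
```
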
Each partition has a different mix of compute nodes; the table below lists the available nodes by partition. Each node typically has two sockets, each with one multi-core processor, and an equal number of cores per processor. The CPUs/node column refers to logical CPUs, where 1 logical CPU = 1 core = 1 thread.

| Partition | CPU model | CPU frequency | CPUs/node | GPU model | GPUs/node | Memory/node | Nodes |
|-----------|-----------|---------------|-----------|-----------|-----------|-------------|-------|
| main | epyc-7513 | 2.60 GHz | 64 | | | 256 GB | 61 |
| main | epyc-7542 | 2.90 GHz | 64 | | | 256 GB | 32 |
| main | xeon-2640v3 | 2.60 GHz | 16 | | | 64 GB | 32 |
| main | xeon-2640v4 | 2.40 GHz | 20 | | | 64 GB | 16 |
| main | xeon-4116 | 2.10 GHz | 24 | | | 94 GB | 39 |
| main | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 29 |
| main | xeon-2640v4 | 2.40 GHz | 20 | K40 | 2 | 64 GB | 45 |
| epyc-64 | epyc-7513 | 2.60 GHz | 64 | | | 256 GB | 78 |
| gpu | xeon-6130 | 2.10 GHz | 32 | V100 | 2 | 191 GB | 29 |
| gpu | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 38 |
| gpu | epyc-7282 | 2.80 GHz | 32 | A40 | 2 | 256 GB | 12 |
| gpu | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 17 |
| gpu | epyc-7513 | 2.60 GHz | 64 | A100 (40 GB) | 2 | 256 GB | 12 |
| gpu | epyc-7513 | 2.60 GHz | 64 | A100 (80 GB) | 2 | 256 GB | 12 |
| oneweek | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 10 |
| oneweek | xeon-2640v4 | 2.40 GHz | 20 | | | 64 GB | 35 |
| largemem | epyc-7513 | 2.60 GHz | 64 | | | 1024 GB | 4 |
| debug | xeon-4116 | 2.10 GHz | 24 | | | 192 GB | 2 |
| debug | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 1 |
| debug | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 1 |

Use the nodeinfo command for similar real-time information.
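
The nodeinfo command is provided on the cluster; if you prefer standard Slurm tooling, sinfo and scontrol report similar per-node details. The format string below is just one reasonable choice, and the node name is a placeholder.

```bash
# List node name, CPU count, memory (MB), and GPUs (GRES) in the gpu partition
sinfo --partition=gpu --Node --format="%n %c %m %G"

# Show full details for a single node (replace the placeholder with a real node name)
scontrol show node <node_name>
```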

A few commands provide more detailed node information. For CPUs, use the lscpu command. On nodes with GPUs, the nvidia-smi command and its various options report GPU details; alternatively, after module load nvhpc, use the nvaccelinfo command to view information about GPUs. After module load gcc/11.3.0 hwloc, use the lstopo command to view a node's topology. Examples are shown below.
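
For example, a minimal sequence of these commands, run from a shell on a compute node, might look like the following; the module versions are the ones named above and may change over time.

```bash
# CPU details (model, core counts, instruction set flags)
lscpu

# GPU details on GPU nodes
nvidia-smi
nvidia-smi --query-gpu=name,memory.total --format=csv

# GPU details via the NVIDIA HPC SDK
module load nvhpc
nvaccelinfo

# Node topology (sockets, cores, caches, memory)
module load gcc/11.3.0 hwloc
lstopo
```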

CPU microarchitectures and instruction set extensions

Different CPU models offer different instruction set extensions, which compiled programs can use to boost performance. The following table summarizes them:

| CPU model | Microarchitecture | Partitions | AVX | AVX2 | AVX-512 |
|-----------|-------------------|------------|-----|------|---------|
| xeon-2640v3 | haswell | main, debug | yes | yes | no |
| xeon-2640v4 | broadwell | main, gpu, debug | yes | yes | no |
| xeon-4116 | skylake_avx512 | main, oneweek, debug | yes | yes | yes |
| xeon-6130 | skylake_avx512 | gpu | yes | yes | yes |
| epyc-7542 | zen2 | epyc-64 | yes | yes | no |
| epyc-7513 | zen3 | epyc-64, gpu, largemem | yes | yes | no |
| epyc-7282 | zen2 | gpu | yes | yes | no |
| epyc-7313 | zen3 | gpu | yes | yes | no |

Use the lscpu command while logged in to a compute node to list all available CPU flags.
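
For example, the sketch below checks a node's vector extensions and compiles for a matching target; the GCC -march names (haswell, broadwell, skylake-avx512, znver2, znver3) correspond to the microarchitectures in the table above, and myprog.c is a placeholder source file.

```bash
# List the AVX-related CPU flags on the current node
lscpu | grep -o -w 'avx\|avx2\|avx512f' | sort -u

# Compile for the node type you plan to run on (example targets)
gcc -O3 -march=znver3 -o myprog myprog.c           # zen3 nodes (epyc-7513, epyc-7313)
gcc -O3 -march=skylake-avx512 -o myprog myprog.c   # skylake_avx512 nodes (xeon-4116, xeon-6130)

# Or build on a node of the target type and let GCC detect its features
gcc -O3 -march=native -o myprog myprog.c
```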

GPU specifications

The following is a summary table for GPU specifications:

| GPU Model | Partitions | Architecture | Memory | Memory Bandwidth | Base Clock Speed | CUDA Cores | Tensor Cores | Single Precision Performance (FP32) | Double Precision Performance (FP64) |
|-----------|------------|--------------|--------|------------------|------------------|------------|--------------|-------------------------------------|-------------------------------------|
| A100 | gpu | ampere | 80 GB | 1.9 TB/s | 1065 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A100 | gpu | ampere | 40 GB | 1.6 TB/s | 765 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
| A40 | gpu | ampere | 48 GB | 696 GB/s | 1305 MHz | 10752 | 336 | 37.4 TFLOPS | 584.6 GFLOPS |
| V100 | gpu | volta | 32 GB | 900 GB/s | 1230 MHz | 5120 | 640 | 14 TFLOPS | 7 TFLOPS |
| P100 | gpu, debug | pascal | 16 GB | 732 GB/s | 1189 MHz | 3584 | n/a | 9.3 TFLOPS | 4.7 TFLOPS |
| K40 | main, debug | kepler | 12 GB | 288 GB/s | 745 MHz | 2880 | n/a | 4.29 TFLOPS | 1.43 TFLOPS |
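
To request a specific GPU model in a job, Slurm's GRES syntax can be used. The sketch below assumes GRES type names that match the model names above (for example, a100, a40, v100, p100, k40); the actual names on the cluster may differ, so check nodeinfo or Running Jobs for the exact values.

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1      # assumed GRES type name; adjust to the cluster's actual names
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=01:00:00

# Confirm which GPU was allocated
nvidia-smi --query-gpu=name,memory.total --format=csv
```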