Discovery Resource Overview
CARC's general-use HPC cluster Discovery has over 21,000 cores across 600 compute nodes available for researchers to use.
Note: Discovery is a shared resource, so there are limits in place on size and duration of jobs. This ensures that everyone has a chance to run jobs. For details on the limits, see Running Jobs.
For general CARC system specifications, see our High Performance Computing page.
Partitions and compute nodes
There are a few Slurm partitions available on Discovery, each with a separate job queue. These are general-use partitions available to all researchers. The table below describes the intended purpose for each partition:
Partition | Purpose |
---|---|
main | Serial and small-to-medium parallel jobs (single node or multiple nodes) |
epyc-64 | Medium-to-large parallel jobs (single node or multiple nodes) |
gpu | Jobs requiring GPU nodes |
oneweek | Long-running jobs (up to 7 days) |
largemem | Jobs requiring larger amounts of memory (up to 1 TB) |
debug | Short-running jobs for debugging purposes |
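To run on a particular partition, name it in the `#SBATCH --partition` directive of your job script. The script below is a minimal sketch; the resource values and program name are illustrative:

```sh
#!/bin/bash
#SBATCH --partition=main      # one of: main, epyc-64, gpu, oneweek, largemem, debug
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00

./my_program                  # hypothetical executable
```

Submit with `sbatch`; requests that exceed a partition's size or time limits may be rejected at submission time.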
Each partition has a different mix of compute nodes. The table below describes the available nodes by partition. Each node typically has two sockets with one multi-core processor each and an equal number of cores per processor. In the table below, the CPUs/node column refers to logical CPUs such that 1 logical CPU = 1 core = 1 thread.
Partition | CPU Model | CPUs/node | GPU model | GPUs/node | Memory (GB)/node | Nodes |
---|---|---|---|---|---|---|
main | xeon-4116 | 24 | - | - | 94 | 39 |
main | xeon-4116 | 24 | - | - | 192 | 29 |
main | xeon-2640v4 | 20 | - | - | 64 | 16 |
main | xeon-2640v4 | 20 | k40 | 2 | 64 | 45 |
main | xeon-2640v3 | 16 | - | - | 64 | 32 |
epyc-64 | epyc-7542 | 64 | - | - | 256 | 32 |
epyc-64 | epyc-7513 | 64 | - | - | 256 | 139 |
gpu | xeon-6130 | 32 | v100 | 2 | 191 | 29 |
gpu | xeon-2640v4 | 20 | p100 | 2 | 128 | 38 |
gpu | epyc-7513 | 64 | a100 (80 GB) | 2 | 256 | 12 |
gpu | epyc-7513 | 64 | a100 (40 GB) | 2 | 256 | 12 |
gpu | epyc-7313 | 32 | a40 | 2 | 256 | 17 |
gpu | epyc-7282 | 32 | a40 | 2 | 256 | 12 |
oneweek | xeon-4116 | 24 | - | - | 192 | 10 |
oneweek | xeon-2640v4 | 20 | - | - | 64 | 35 |
largemem | epyc-7513 | 64 | - | - | 1024 | 4 |
debug | xeon-4116 | 24 | - | - | 192 | 2 |
debug | xeon-2640v4 | 20 | p100 | 2 | 128 | 1 |
debug | epyc-7313 | 32 | a40 | 2 | 256 | 1 |
This table was last updated on February 26, 2024.
Note: Use the `nodeinfo` and `gpuinfo` commands for similar real-time information.
There are a few commands you can use for more detailed node information. The `lscpu` command provides details about a node's CPUs. On nodes with GPUs, the `nvidia-smi` command and its various options provide information about the GPUs; alternatively, after `module load nvhpc`, use the `nvaccelinfo` command. After `module load gcc/11.3.0 hwloc`, use the `lstopo` command to view a node's topology.
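Put together, a quick inspection session might look like the following. This is a sketch: the `salloc` options are illustrative, and the module versions match those mentioned above:

```sh
# Request a short interactive session on a GPU node (options illustrative)
salloc --partition=gpu --gres=gpu:1 --time=00:30:00

lscpu                        # CPU model, core/socket counts, caches, flags
nvidia-smi                   # GPU model, driver version, memory usage
module load nvhpc
nvaccelinfo                  # detailed GPU device attributes
module load gcc/11.3.0 hwloc
lstopo                       # node topology: sockets, caches, NUMA, PCIe devices
```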
CPU microarchitectures and instruction set extensions
Different CPU models also offer different CPU instruction set extensions. Compiled programs can use these extensions to boost performance. The following is a summary table:
CPU model | Microarchitecture | Partitions | AVX | AVX2 | AVX-512 |
---|---|---|---|---|---|
xeon-2650v2 | ivybridge | oneweek | ✓ | ||
xeon-2640v3 | haswell | main, debug | ✓ | ✓ | |
xeon-2640v4 | broadwell | main, gpu, debug | ✓ | ✓ | |
xeon-4116 | skylake_avx512 | main | ✓ | ✓ | ✓ |
xeon-6130 | skylake_avx512 | gpu | ✓ | ✓ | ✓ |
epyc-7542 | zen2 | epyc-64 | ✓ | ✓ | |
epyc-7513 | zen3 | epyc-64, gpu, largemem | ✓ | ✓ | |
epyc-7282 | zen2 | gpu | ✓ | ✓ | |
epyc-7313 | zen3 | gpu | ✓ | ✓ |
Use the `lscpu` command while logged in to a compute node to list all available CPU flags.
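Compilers can target these extensions with an architecture flag. The following is a sketch using GCC; the microarchitecture names follow GCC's `-march` spelling, and the right flag depends on which node types the binary will actually run on:

```sh
# Tune for the CPU of the node you are building on:
gcc -O3 -march=native -o my_program my_program.c

# Or target a specific node type from the table above:
gcc -O3 -march=skylake-avx512 -o my_program my_program.c   # xeon-4116, xeon-6130
gcc -O3 -march=znver3 -o my_program my_program.c           # epyc-7513, epyc-7313
```

Note that a binary built for a newer microarchitecture may fail with an illegal-instruction error on older nodes, so match the flag to the partitions you submit to.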
GPU specifications
The following is a summary table for GPU specifications:
GPU Model | Partitions | Architecture | Memory | Memory Bandwidth | Base Clock Speed | CUDA Cores | Tensor Cores | Single Precision Performance (FP32) | Double Precision Performance (FP64) |
---|---|---|---|---|---|---|---|---|---|
A100 | gpu | ampere | 80 GB | 1.9 TB/s | 1065 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
A100 | gpu | ampere | 40 GB | 1.6 TB/s | 765 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
A40 | gpu | ampere | 48 GB | 696 GB/s | 1305 MHz | 10752 | 336 | 37.4 TFLOPS | 584.6 GFLOPS |
V100 | gpu | volta | 32 GB | 900 GB/s | 1230 MHz | 5120 | 640 | 14 TFLOPS | 7 TFLOPS |
P100 | gpu, debug | pascal | 16 GB | 732 GB/s | 1189 MHz | 3584 | n/a | 9.3 TFLOPS | 4.7 TFLOPS |
K40 | main, debug | kepler | 12 GB | 288 GB/s | 745 MHz | 2880 | n/a | 4.29 TFLOPS | 1.43 TFLOPS |
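To request a particular GPU model rather than any available GPU, include the model in the `--gres` option. This is a sketch; the model labels (`a100`, `a40`, `v100`, `p100`, `k40`) are an assumption about how the cluster's GRES configuration names them:

```sh
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1    # one A100; omit the model (gpu:1) to accept any GPU
```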