# Discovery Resource Overview
CARC’s general-use HPC cluster, Discovery, has over 20,000 cores across 500 compute nodes available to researchers.
Discovery is a shared resource, so limits are placed on the size and duration of jobs to ensure that everyone has a chance to run them. For details on these limits, see the Running Jobs page.
## Partitions and compute nodes
Discovery offers a few general-use Slurm partitions, each with a separate job queue, available to all researchers. The table below describes the intended purpose of each partition, and a sample job script follows it:
Partition | Purpose |
---|---|
main | Serial and small-to-medium parallel jobs (single node or multiple nodes) |
epyc-64 | Serial and medium-to-large parallel jobs (single node or multiple nodes) |
gpu | Jobs requiring GPU nodes |
oneweek | Long-running jobs (up to 7 days) |
largemem | Jobs requiring larger amounts of memory (up to 1 TB) |
debug | Short-running jobs for debugging purposes |
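Jobs are submitted to a partition with the Slurm `--partition` option. The following is a minimal sketch of a batch script targeting the main partition; the program name and resource values are placeholders to adjust for your own job:

```bash
#!/bin/bash
#SBATCH --partition=main      # one of: main, epyc-64, gpu, oneweek, largemem, debug
#SBATCH --nodes=1             # single-node job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # CPU cores for the task
#SBATCH --mem=16G             # memory for the job
#SBATCH --time=1:00:00        # walltime limit (HH:MM:SS)

# Placeholder: replace with your actual program
./my_program
```

Submit the script with `sbatch`, substituting your script’s file name.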
Each partition has a different mix of compute nodes. The table below describes the available nodes by partition. Each node typically has two sockets, each with one multi-core processor, and an equal number of cores per processor. The CPUs/node column refers to logical CPUs, where 1 logical CPU = 1 core = 1 thread.
Partition | CPU model | CPU frequency | CPUs/node | GPU model | GPUs/node | Memory/node | Nodes |
---|---|---|---|---|---|---|---|
main | epyc-7513 | 2.60 GHz | 64 | — | — | 256 GB | 61 |
main | epyc-7542 | 2.90 GHz | 64 | — | — | 256 GB | 32 |
main | xeon-2640v3 | 2.60 GHz | 16 | — | — | 64 GB | 32 |
main | xeon-2640v4 | 2.40 GHz | 20 | — | — | 64 GB | 16 |
main | xeon-4116 | 2.10 GHz | 24 | — | — | 94 GB | 39 |
main | xeon-4116 | 2.10 GHz | 24 | — | — | 192 GB | 29 |
main | xeon-2640v4 | 2.40 GHz | 20 | K40 | 2 | 64 GB | 45 |
epyc-64 | epyc-7513 | 2.60 GHz | 64 | — | — | 256 GB | 78 |
gpu | xeon-6130 | 2.10 GHz | 32 | V100 | 2 | 191 GB | 29 |
gpu | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 38 |
gpu | epyc-7282 | 2.80 GHz | 32 | A40 | 2 | 256 GB | 12 |
gpu | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 17 |
gpu | epyc-7513 | 2.60 GHz | 64 | A100 (40 GB) | 2 | 256 GB | 12 |
gpu | epyc-7513 | 2.60 GHz | 64 | A100 (80 GB) | 2 | 256 GB | 12 |
oneweek | xeon-4116 | 2.10 GHz | 24 | — | — | 192 GB | 10 |
oneweek | xeon-2640v4 | 2.40 GHz | 20 | — | — | 64 GB | 35 |
largemem | epyc-7513 | 2.60 GHz | 64 | — | — | 1024 GB | 4 |
debug | xeon-4116 | 2.10 GHz | 24 | — | — | 192 GB | 2 |
debug | xeon-2640v4 | 2.40 GHz | 20 | P100 | 2 | 128 GB | 1 |
debug | epyc-7313 | 3.00 GHz | 32 | A40 | 2 | 256 GB | 1 |
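As a sketch of how this table maps to resource requests, an interactive session on one of the largemem nodes above could be requested as follows; the CPU, memory, and time values are illustrative only:

```bash
# Request part of a largemem node (epyc-7513, 64 CPUs, 1024 GB) interactively
salloc --partition=largemem --cpus-per-task=16 --mem=500G --time=2:00:00
```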
Use the `nodeinfo` command for similar real-time information.
There are a few commands you can use for more detailed node information. The `lscpu` command provides information about CPUs. On nodes with GPUs, the `nvidia-smi` command and its various options provide information about GPUs. Alternatively, after `module load nvhpc`, use the `nvaccelinfo` command to view information about GPUs. After `module load gcc/11.3.0 hwloc`, use the `lstopo` command to view a node’s topology.
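For example, on a compute node:

```bash
nodeinfo        # real-time summary of node availability by partition
lscpu           # CPU model, core counts, and instruction set flags
nvidia-smi      # GPU models, memory, and utilization (GPU nodes only)

module load nvhpc
nvaccelinfo     # GPU details as reported by the NVIDIA HPC SDK

module load gcc/11.3.0 hwloc
lstopo          # node topology: sockets, cores, caches, NUMA layout
```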
## CPU microarchitectures and instruction set extensions
Different CPU models offer different instruction set extensions, which compiled programs can use to boost performance. The following table summarizes the extensions available on each CPU model:
CPU model | Microarchitecture | Partitions | AVX | AVX2 | AVX-512 |
---|---|---|---|---|---|
xeon-2640v3 | haswell | main | ✓ | ✓ | |
xeon-2640v4 | broadwell | main, gpu, oneweek, debug | ✓ | ✓ | |
xeon-4116 | skylake_avx512 | main, oneweek, debug | ✓ | ✓ | ✓ |
xeon-6130 | skylake_avx512 | gpu | ✓ | ✓ | ✓ |
epyc-7542 | zen2 | main | ✓ | ✓ | |
epyc-7513 | zen3 | main, epyc-64, gpu, largemem | ✓ | ✓ | |
epyc-7282 | zen2 | gpu | ✓ | ✓ | |
epyc-7313 | zen3 | gpu, debug | ✓ | ✓ | |
Use the `lscpu` command while logged in to a compute node to list all available CPU flags.
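For instance, the following sketch checks a node’s AVX-family flags and then compiles for a chosen microarchitecture; the source file name is a placeholder, and the `-march` values correspond to the microarchitecture column above:

```bash
# List the AVX-family flags supported on the current node
lscpu | grep -oE 'avx[a-z0-9_]*' | sort -u

# Compile for the oldest microarchitecture the job may land on: haswell (AVX2)
# binaries run on every node type above, while -march=skylake-avx512 binaries
# require the xeon-4116 or xeon-6130 nodes
gcc -O3 -march=haswell -o mycode mycode.c
```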
## GPU specifications
The following is a summary table for GPU specifications:
GPU Model | Partitions | Architecture | Memory | Memory Bandwidth | Base Clock Speed | CUDA Cores | Tensor Cores | Single Precision Performance (FP32) | Double Precision Performance (FP64) |
---|---|---|---|---|---|---|---|---|---|
A100 | gpu | ampere | 80 GB | 1.9 TB/s | 1065 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
A100 | gpu | ampere | 40 GB | 1.6 TB/s | 765 MHz | 6912 | 432 | 19.5 TFLOPS | 9.7 TFLOPS |
A40 | gpu, debug | ampere | 48 GB | 696 GB/s | 1305 MHz | 10752 | 336 | 37.4 TFLOPS | 584.6 GFLOPS |
V100 | gpu | volta | 32 GB | 900 GB/s | 1230 MHz | 5120 | 640 | 14 TFLOPS | 7 TFLOPS |
P100 | gpu, debug | pascal | 16 GB | 732 GB/s | 1189 MHz | 3584 | n/a | 9.3 TFLOPS | 4.7 TFLOPS |
K40 | main | kepler | 12 GB | 288 GB/s | 745 MHz | 2880 | n/a | 4.29 TFLOPS | 1.43 TFLOPS |
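To request a specific GPU model from the table, specify its type with the `--gres` option. The type strings below follow common Slurm GRES naming (e.g., `a100`) and are an assumption; confirm the exact names on Discovery before relying on them:

```bash
# Request one A100 GPU on the gpu partition (GPU type string assumed)
salloc --partition=gpu --gres=gpu:a100:1 --time=1:00:00

# On the allocated node, confirm the GPU model and memory
nvidia-smi --query-gpu=name,memory.total --format=csv
```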