Discovery Resource Overview

The Center for Advanced Research Computing's general-use high-performance computing cluster, Discovery, provides over 400 compute nodes for running jobs.

For general CARC system specifications, see our System Information page.

Discovery Cluster Overview video

Partitions and compute nodes

There are a few Slurm partitions available on Discovery, each with a separate job queue. These are general-use partitions available to all researchers. The table below describes the intended purpose for each partition.

| Partition | Purpose |
|---|---|
| main | Serial and small-to-medium parallel jobs (single node or multiple nodes) |
| epyc-64 | Medium-to-large parallel jobs (single node or multiple nodes) |
| oneweek | Long-running jobs (up to 7 days) |
| debug | Short-running jobs for debugging purposes |
| largemem | Jobs requiring larger amounts of memory (up to 1 TB) |
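A partition is selected with Slurm's `--partition` option. As a hedged sketch, a minimal job script might look like the following (the resource values and the `hostname` placeholder are illustrative, not taken from this page):

```shell
#!/bin/bash
#SBATCH --partition=debug     # one of: main, epyc-64, oneweek, debug, largemem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00       # a short test run, suitable for the debug partition

# Replace with your own program; hostname is just a placeholder command.
hostname
```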

Each partition has a different mix of compute nodes. The table below describes the available nodes in each partition. Typically, each node has two sockets with one processor each and an equal number of cores per processor; in the table below, the CPUs column refers to logical CPUs, where 1 logical CPU = 1 core = 1 thread. Also note that the maximum memory available to jobs on a node is a few GB less than listed in the Memory column, because some memory is reserved for system overhead; use the --mem=0 option to request all available memory on a node.

| Partition | Nodes | CPUs | Memory (GB) | CPU type | CPU freq | GPU type |
|---|---|---|---|---|---|---|
| debug | 1 | 16 | 63 | xeon-2665 | 2.40 GHz | k20 |
| debug | 1 | 20 | 128 | xeon-2640v4 | 2.40 GHz | p100 |
| debug | 5 | 16 | 63 | xeon-2650v2 | 2.60 GHz | None |
| epyc-64 | 32 | 64 | 256 | epyc-7542 | 2.90 GHz | None |
| main | 18 | 16 | 63 | xeon-2640v3 | 2.60 GHz | k40 |
| main | 29 | 32 | 191 | xeon-6130 | 2.10 GHz | v100 |
| main | 40 | 20 | 128 | xeon-2640v4 | 2.40 GHz | p100 |
| main | 45 | 20 | 63 | xeon-2640v4 | 2.40 GHz | k40 |
| main | 60 | 16 | 63 | xeon-2640v3 | 2.60 GHz | None |
| main | 82 | 20 | 64 | xeon-2640v4 | 2.40 GHz | None |
| main | 39 | 24 | 94 | xeon-4116 | 2.10 GHz | None |
| main | 41 | 24 | 192 | xeon-4116 | 2.10 GHz | None |
| oneweek | 35 | 16 | 63 | xeon-2650v2 | 2.60 GHz | None |
| oneweek | 3 | 16 | 128 | xeon-2650v2 | 2.60 GHz | None |
| oneweek | 2 | 16 | 256 | xeon-2650v2 | 2.60 GHz | None |
| largemem | 3 | 40 | 1031 | xeon-4850 | 2.00 GHz | None |

Note: This information is current as of March 19, 2021. Use the sinfo2 command for similar information.
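As noted above, --mem=0 requests all available memory on a node. A hedged sketch of an interactive request using this option (the partition and time values are illustrative):

```shell
# Request an interactive allocation with all available memory on one node.
salloc --partition=debug --nodes=1 --mem=0 --time=00:30:00
```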

Job limits

Discovery is a shared resource, so we put limits on the size and duration of jobs to ensure everyone has a chance to run jobs:

| Queue (or partition) | Maximum run time | Maximum concurrent cores | Maximum GPUs | Maximum jobs or job steps (running or pending) |
|---|---|---|---|---|
| main | 48 hours | 1,200 | 36 | 5,000 |
| epyc-64 | 48 hours | 1,200 | n/a | 5,000 |
| oneweek | 168 hours | 208 | n/a | 50 |
| debug | 30 minutes | 48 | 4 | 5 |
| largemem | 168 hours | 120 | n/a | 10 |
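Slurm accepts run times in days-hours:minutes:seconds form, so, for example, the 168-hour oneweek limit could be requested in a job script like this (a sketch; the directives below are the only assumed content):

```shell
#SBATCH --partition=oneweek
#SBATCH --time=7-00:00:00   # 7 days = 168 hours, the oneweek maximum
```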

Jobs also depend on your project account allocations, and each job will subtract from your project's allocated System Units (SUs) depending on the types of resources you request:

| Resource reserved for 1 hour | SUs charged |
|---|---|
| 1 CPU/core/thread | 1 |
| 1 GB memory | 0.25 |
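Based on the rates above, the SU charge for a job can be estimated by hand. A small shell sketch (the job size here is hypothetical):

```shell
# Hypothetical job: 8 CPUs and 32 GB of memory reserved for 24 hours.
cpus=8
mem_gb=32
hours=24

# 1 SU per CPU-hour plus 0.25 SU per GB-hour; scale by 100 to stay in
# integer arithmetic (0.25 SU = 25 hundredths of an SU).
su_x100=$(( (cpus * 100 + mem_gb * 25) * hours ))
echo "Estimated charge: $(( su_x100 / 100 )) SUs"
```

For this example, the charge is 8 × 24 = 192 SUs for CPUs plus 32 × 0.25 × 24 = 192 SUs for memory, or 384 SUs in total.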

For GPUs, the SU charge varies depending on the GPU model. The table below shows the SU charge for different GPU models for one hour:

| GPU model | SU charge per hour |
|---|---|

Note: SUs are charged based on resources that you request, not what is actually used. Be sure not to request more resources than your program requires.

You can use the myaccount command to see your available and default account allocations and usage for each:

ttrojan@discovery2:~$ myaccount

      User              Account             Def Acct                  QOS
---------- -------------------- -------------------- --------------------
   ttrojan                acct1                acct1               normal

account usage: acct1
Top 10 Users 2019-08-13T00:00:00 - 2020-08-12T23:59:59 (31622400 secs)
Usage reported in Percentage of Total
  Cluster     Login     Proper Name         Account     Used   Energy
--------- --------- --------------- --------------- -------- --------
discovery   ttrojan         ttrojan           acct1   10.03%    0.00%

The user ttrojan has used 10.03% of their allocation on their default account acct1.
