Using Anaconda

Anaconda is a package and environment manager primarily used for open-source data science packages for the Python and R programming languages. It also supports other programming languages like C, C++, FORTRAN, Java, Scala, Ruby, and Lua.

Using Anaconda on CARC systems

Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.

To use Anaconda, first load the corresponding module:

module load anaconda3
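
If more than one version of the Anaconda module is installed on the cluster, you can list the available versions and load a specific one instead (the versions shown will vary over time):

module avail anaconda3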

Included in all versions of Anaconda, Conda is the package and environment manager that installs, runs, and updates packages and their dependencies. Many Conda packages are pre-installed with Anaconda in the base Conda environment, including the popular data science packages for Python like pandas, NumPy, SciPy, matplotlib, and scikit-learn.
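
To see exactly which packages and versions are included in the base environment, you can list them once the module is loaded (the grep filter shown is just an example for checking one package):

conda list -n base
conda list -n base | grep -i numpy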

To use these packages or to create your own Conda environment, initialize your shell to use Conda:

conda init bash
source ~/.bashrc

This modifies your ~/.bashrc file so that Conda is ready to use every time you log in. The base environment should now be activated, which will be reflected in your shell prompt:

(base) user@discovery1:~$

By default, the base environment will now automatically be activated every time you log in. To disable this, change the Conda config:

conda config --set auto_activate_base false

This will create a ~/.condarc file. Read more about Conda configuration in the Conda documentation.
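
For reference, the ~/.condarc file created by the command above will contain a line like the following:

auto_activate_base: false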

Installing Conda environments and packages

You can create new Conda environments in your home or project directory. Conda environments are isolated project environments designed to manage distinct package requirements and dependencies for different projects.

To create a new Conda environment in your home directory, enter:

conda create --name <env_name>

where <env_name> is the name you want for your environment. Then activate the environment:

conda activate <env_name>

Once activated, you can then install packages to that environment:

conda install <pkg>

Please note that a version of the main application you are using (e.g., Python or R) is installed in the Conda environment, so the module versions of these should not be loaded when the Conda environment is activated.
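
Because of this, it is often convenient to request a specific version of the main application, along with any packages you already know you need, when creating the environment. The environment name, Python version, and packages below are placeholders; substitute your own:

conda create --name myenv python=3.10 numpy pandas
conda activate myenv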

To deactivate the active environment, enter:

conda deactivate

You can also create a new environment in your project directory instead by using the --prefix option. For example:

conda create --prefix /project/<project_id>/<env_name>

where <project_id> is your project's account ID of the form <PI_username>_<id>. Then activate the environment:

conda activate /project/<project_id>/<env_name>
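
When a prefix-based environment is activated, its full path appears in your shell prompt by default. If you prefer to show only the environment's directory name, you can optionally change the prompt template using Conda's env_prompt setting:

conda config --set env_prompt '({name}) '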

To view a list of all your Conda environments, enter:

conda env list

To remove a Conda environment, enter:

conda env remove --name <env_name>
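
For an environment created with the --prefix option, specify its path instead of a name when removing it:

conda env remove --prefix /project/<project_id>/<env_name>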

Running Anaconda in interactive mode

After loading the Anaconda module and activating your Conda environment, the main application you are using with your environment can be run interactively on a login node by simply entering the associated command. Using Anaconda on a login node should be reserved for setting up environments and installing packages. Conversely, using Anaconda interactively on a compute node is useful for more intensive work like exploring data, testing models, and debugging.

A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.

To use Anaconda interactively on a compute node, first use Slurm's salloc command to reserve job resources on a node:

(base) user@discovery1:~$ salloc --time=2:00:00 --cpus-per-task=8 --mem=16GB --account=<project_id>
salloc: Pending job allocation 22658
salloc: job 22658 queued and waiting for resources
salloc: job 22658 has been allocated resources
salloc: Granted job allocation 22658
salloc: Waiting for resource configuration
salloc: Nodes d11-35 are ready for job

Make sure to change the resource requests (the --time=2:00:00 --cpus-per-task=8 --mem=16GB --account=<project_id> part after your salloc command) as needed, such as the number of cores and memory required. Also make sure to substitute your project ID, which is of the form <PI_username>_<id>. You can find your project ID in the CARC User Portal.

Once you are granted the resources and logged in to a compute node, activate your environment and then enter the relevant command (e.g., python):

(base) user@d11-35:~$ module purge
(base) user@d11-35:~$ conda activate /project/ttrojan_123/env
(env) user@d11-35:~$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Notice that the shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d11-35).

To exit the compute node and relinquish the job resources, enter exit() to exit Python and then enter exit in the shell. This will return you to the login node:

>>> exit()
(env) user@d11-35:~$ exit
exit
salloc: Relinquishing job allocation 22658
(base) user@discovery1:~$

Please note that compute nodes do not have access to the internet, so any data downloads or package installations should be completed on the login or transfer nodes, either before the interactive job or concurrently in a separate shell session.
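
For example, if you discover that a package is missing while working on a compute node, you can open a second shell session on a login node and install it into your environment by path, without activating the environment (the package name is a placeholder):

module load anaconda3
conda install --prefix /project/<project_id>/<env_name> <pkg>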

Running Anaconda in batch mode

To submit jobs to the Slurm job scheduler, you will need to run the main application associated with your Conda environment (e.g., Python or R) in batch mode. There are a few steps to follow:

  1. Create an application script
  2. Create a Slurm job script that runs the application script
  3. Submit the job script to the job scheduler with sbatch

Your application script should consist of the sequence of commands needed for your analysis.

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job using Anaconda, a Slurm job script should look something like the following:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --account=<project_id>

module purge

eval "$(conda shell.bash hook)"

conda activate /project/ttrojan_123/env

python script.py

Each line is described below:

Command or Slurm argument | Meaning
------------------------- | -------
#!/bin/bash | Use Bash to execute this script
#SBATCH | Syntax that allows Slurm to read your requests (ignored by Bash)
--nodes=1 | Use 1 compute node
--ntasks=1 | Run 1 task (e.g., running a Python script)
--cpus-per-task=8 | Reserve 8 CPUs for your exclusive use
--mem=16GB | Reserve 16 GB of memory for your exclusive use
--time=1:00:00 | Reserve the requested resources for 1 hour
--account=<project_id> | Charge compute time to <project_id>; you can find your project ID in the CARC User Portal
module purge | Clear environment modules
eval "$(conda shell.bash hook)" | Initialize the shell to use Conda
conda activate /project/ttrojan_123/env | Activate your Conda environment
python script.py | Use python to run script.py

Make sure to adjust the resources requested based on your needs, but remember that requesting fewer resources typically leads to less queue time for your job. Note that to fully utilize the requested resources, especially multiple CPU cores, you may need to explicitly modify your application script to use them, as in the example below.
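
For example, rather than hard-coding a core count, you can pass the number of allocated CPUs to your script through Slurm's SLURM_CPUS_PER_TASK environment variable. The --n-cores argument below is only a placeholder; your script would need to read such a value and use it when setting up parallel work:

python script.py --n-cores "$SLURM_CPUS_PER_TASK"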

You can develop application scripts and job scripts on your local machine and then transfer them to the cluster, or you can use one of the available text editor modules (e.g., micro) to develop them directly on the cluster.

Save the job script as py.job, for example, and then submit it to the job scheduler with Slurm's sbatch command:

user@discovery1:~$ sbatch py.job
Submitted batch job 10002

To check the status of your job, enter squeue --me. For example:

user@discovery1:~$ squeue --me
         JOBID PARTITION     NAME     USER     ST    TIME  NODES NODELIST(REASON)
         10002      main   py.job     ttrojan   R    3:07      1 d11-04

If there is no status listed for the job, it has ended (either completed or failed).

The results of the job will be logged and, by default, saved to a plain-text file of the form slurm-<jobid>.out in the same directory where the job script was submitted from. To view the contents of this file, enter less slurm-<jobid>.out, and then enter q to exit the viewer.

For more information on running and monitoring jobs, see the Running Jobs guide.

Additional resources

If you have questions about or need help with Anaconda, please submit a help ticket and we will assist you.

Anaconda
Anaconda documentation
Conda documentation
Python user guide
R user guide
