Using Python

Python is an open-source, general purpose programming language.

HPC with Python video

Using Python on CARC systems

Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.

You can use Python in either interactive or batch modes. In either mode, first load the corresponding software module:

module load python

This loads the default version, currently 3.9.2, and is equivalent to module load python/3.9.2. If you require a different version, specify the version of Python when loading. For example:

module load python/3.7.6

To see all available versions of Python, enter:

module spider python

The Python modules depend on the gcc/8.3.0 module, which is loaded by default when logging in. This module needs to be loaded first because Python was built with the GCC 8.3.0 compiler.

If it is not already loaded (e.g., after a module purge), load the gcc module before loading a python module:

module purge
module load gcc/8.3.0
module load python/3.9.2

Alternatively, enter module load usc and then load a python module.

Installing a different version of Python

If you require a different version of Python that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.

Pre-installed packages

Many popular Python packages have already been installed and are available to use after loading one of the Python modules. Use the pip list command to view them. You can install other Python packages that you need in your home or project directories (see the section on installing packages below).

Jupyter notebooks

Please note that we do not currently support the use of Jupyter notebooks on CARC systems.

Running Python in interactive mode

After loading the module, enter python to start a new interactive Python session. On a login node, interactive Python use should be reserved for light tasks like installing packages; for more intensive work, such as exploring data, testing models, and debugging, run Python interactively on a compute node.

A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.

To run Python interactively on a compute node, first use Slurm's salloc command to reserve job resources on a node:

user@discovery1:~$ salloc --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16GB --account=<project_id>
salloc: Pending job allocation 24737
salloc: job 24737 queued and waiting for resources
salloc: job 24737 has been allocated resources
salloc: Granted job allocation 24737
salloc: Waiting for resource configuration
salloc: Nodes d05-04 are ready for job

Make sure to change the resource requests (the --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16GB --account=<project_id> part after your salloc command) as needed, such as the number of cores and memory required. Also make sure to substitute your project ID, which is of the form <PI_username>_<id>. You can find your project ID in the CARC User Portal.

Once you are granted the resources and logged in to a compute node, load the modules and then enter python:

user@d05-04:~$ module load usc python/3.9.2
user@d05-04:~$ python
Python 3.9.2 (default, Mar 19 2021, 09:12:17)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Notice that the shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-04).

To run Python scripts from within Python, use the command exec(open('script.py').read()). Alternatively, to run Python scripts from the shell, use the python script.py command.

To exit the node and relinquish the job resources, enter exit in Python and then enter exit in the shell. This will return you to the login node:

>>> exit()
user@d05-04:~$ exit
exit
salloc: Relinquishing job allocation 24737
user@discovery1:~$

Please note that compute nodes do not have access to the internet, so any data downloads or package installations should be completed on the login or transfer nodes, either before the interactive job or concurrently in a separate shell session.

Running Python in batch mode

To use Python in batch mode, there are a few steps to follow:

  1. Create a Python script
  2. Create a Slurm job script that runs the Python script
  3. Submit the job script to the job scheduler using sbatch

Your Python script should consist of the sequence of Python commands needed for your analysis. The python command, available after a Python module has been loaded, runs Python scripts, and it can be used in the shell and in Slurm job scripts.
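For reference, a minimal Python script might look like the following sketch; the file name script.py and its contents are hypothetical placeholders for your own analysis code:

# script.py -- a minimal example analysis script (hypothetical)
import platform

def main():
    # Report which Python version is running the job
    print("Running Python", platform.python_version())
    # Replace the lines below with your actual analysis
    total = sum(i * i for i in range(10))
    print("Sum of squares 0-9:", total)

if __name__ == "__main__":
    main()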

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Python, a Slurm job script should look something like the following:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --account=<project_id>

module purge
module load gcc/8.3.0
module load python/3.9.2

python script.py

Each line is described below:

Command or Slurm argument | Meaning
#!/bin/bash | Use Bash to execute this script
#SBATCH | Syntax that allows Slurm to read your requests (ignored by Bash)
--nodes=1 | Use 1 compute node
--ntasks=1 | Run 1 task (e.g., running a Python script)
--cpus-per-task=8 | Reserve 8 CPUs for your exclusive use
--mem=16GB | Reserve 16 GB of memory for your exclusive use
--time=1:00:00 | Reserve the resources described for 1 hour
--account=<project_id> | Charge compute time to <project_id>. You can find your project ID in the CARC User Portal
module purge | Clear environment modules
module load gcc/8.3.0 | Load the gcc compiler environment module
module load python/3.9.2 | Load the python environment module
python script.py | Use python to run script.py

Make sure to adjust the resources requested based on your needs, but remember that requesting fewer resources typically results in less queue time for your job. Note that to fully utilize the resources, especially the number of cores, you may need to explicitly change your Python code to do so (see the section on parallel programming below).

You can develop Python scripts and job scripts on your local machine and then transfer them to the cluster, or you can use one of the available text editor modules (e.g., micro) to develop them directly on the cluster.

Save the job script as python.job, for example, and then submit it to the job scheduler with Slurm's sbatch command:

user@discovery1:~$ sbatch python.job
Submitted batch job 13587

To check the status of your job, enter squeue --me. For example:

user@discovery1:~$ squeue --me
         JOBID PARTITION     NAME     USER     ST    TIME  NODES NODELIST(REASON)
        170552      main python.j     user      R    1:01      1 d05-04

If there is no job status listed, then this means the job has completed.

The results of the job will be logged and, by default, saved to a file of the form slurm-<jobid>.out in the same directory where the job script is located. To view the contents of this file, enter less slurm-<jobid>.out, and then enter q to exit the viewer.

For more information on job status and running jobs, see the Running Jobs user guide.

Installing Python packages

After loading a Python module (in this case, version 3), to install packages in your home directory, enter:

pip install <package_name> --user

By default, Python will install local (i.e., user) packages in your home directory (~/.local/lib/python3.9/site-packages).
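If you want to confirm where user-level packages will be installed for the Python version you currently have loaded, you can check the user site directory from within Python (an optional sanity check, not required for installation):

# Print the user site-packages directory used by 'pip install --user'
import site
print(site.getusersitepackages())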

To install Python packages in a library other than the default, you can use the --target option with pip. For example, to install a package in a project directory, enter:

pip install <package_name> --target /project/<project_id>/python/pkgs

where <project_id> is your project's ID. You can find your project ID in the CARC User Portal.

To load packages from this location, make sure your PYTHONPATH environment variable includes this directory:

export PYTHONPATH=/project/<project_id>/python/pkgs:${PYTHONPATH}

To automatically set this variable when logging in to the cluster, add this line to your ~/.bashrc. Additionally, add this line to your Slurm job scripts that depend on the packages installed in this location.
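Once PYTHONPATH is set, the directory is added to Python's module search path. As a quick check, assuming you used the example path above, you can list the matching entries from within Python:

# Confirm the project package directory is on Python's module search path
import sys
print([p for p in sys.path if "python/pkgs" in p])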

You can also create project-specific package environments using virtual environments. To create a virtual environment, navigate to the directory where you want it to be installed, such as your home or project directory, and enter:

python -m venv <env_name>

where <env_name> is the name of your environment. This will create an <env_name> subdirectory in the current directory. To activate the environment, enter:

source <env_name>/bin/activate

This will be reflected in your shell prompt:

(<env_name>) user@discovery1:~$

Now when you install packages, they will automatically be installed in your <env_name> environment and directory (e.g., ./<env_name>/lib/python3.9/site-packages). Additionally, add a similar activation line to any Slurm job scripts that use this environment, making sure to use the absolute path to <env_name>/bin/activate.

To deactivate the environment, enter deactivate.
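To confirm which environment a session or job is actually using, you can inspect the interpreter's prefix from within Python; inside an activated virtual environment, it points to the <env_name> directory:

# Show the base directory of the Python environment currently in use
import sys
print(sys.prefix)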

Parallel programming with Python

Python uses only one thread by default, but it also supports both implicit and explicit parallel programming to enable full use of multi-core processors and compute nodes. This also includes the use of shared memory on a single node or distributed memory on multiple nodes. On CARC systems, 1 thread = 1 core = 1 logical CPU (requested with Slurm's --cpus-per-task option).

Parallelizing your code to use multiple cores or nodes can reduce the runtime of your Python jobs, but the speedup does not necessarily increase in a proportional manner. The speedup depends on the scale and types of computations that are involved. Furthermore, sometimes using a single core is optimal. There is a cost to setting up parallel computation (e.g., modifying code, communications overhead, etc.), and that cost may be greater than the achieved speedup, if any, of the parallelized version of the code. Some experimentation will be needed to optimize your code and resource requests (optimal number of cores and amount of memory). Also keep in mind that your project account will be charged CPU-minutes based on the cores reserved for a job, even if all those cores are not actually used during the job.

Implicit parallelism

Implicit parallelism is based on multi-threading, so you do not need to explicitly call for parallel computation in your Python code. Multi-threaded Python packages and functions will automatically detect the number of available threads and use them. As a result, requesting multiple cores in your Slurm jobs with the --cpus-per-task option enables implicit parallelism via automatic multi-threading.
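As a sketch, assuming NumPy is among the pre-installed packages and is linked against a multi-threaded BLAS library, a matrix multiplication like the one below can use the cores requested with --cpus-per-task without any parallel code on your part; if needed, the thread count can be capped with environment variables such as OMP_NUM_THREADS before Python starts:

# implicit_example.py (hypothetical) -- multi-threaded linear algebra via NumPy
import numpy as np

# The BLAS backend typically spreads this matrix multiplication
# across the available threads automatically
a = np.random.rand(4000, 4000)
b = np.random.rand(4000, 4000)
c = a @ b
print("Result shape:", c.shape)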

Explicit parallelism

Explicit parallelism means explicitly calling for parallel computation in your Python code, either in relatively simple ways or potentially in more complex ways depending on the tasks to be performed. Many Python packages exist for explicit parallelism, designed for different types of tasks that can be parallelized.

The main Python packages for explicit parallelism are summarized in the following table:

Package | Purpose
threading | For explicit multi-threading (I/O-bound tasks)
multiprocessing | For explicit multi-processing (CPU-bound tasks)
concurrent.futures | For asynchronous evaluations and workflows
mpi4py | For interfacing to MPI libraries
Cython | For interfacing to C or C++ code
h5py | For parallel I/O
pyslurm | For launching jobs via Slurm
dask | For asynchronous evaluations and workflows
snakemake | For workflows

Please review the linked documentation above for examples and more information about how to use these packages and their functions.
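As an illustration of explicit parallelism with the multiprocessing package, the following sketch distributes a CPU-bound function across a pool of worker processes sized to the cores requested from Slurm; the simulate function is a hypothetical placeholder, and SLURM_CPUS_PER_TASK is the environment variable Slurm sets inside a job when --cpus-per-task is specified:

# multiprocessing_example.py (hypothetical) -- explicit multi-processing
import os
from multiprocessing import Pool

def simulate(seed):
    # Placeholder for a CPU-bound task; replace with your own computation
    total = 0
    for i in range(1, 1_000_000):
        total += (seed * i) % 7
    return total

if __name__ == "__main__":
    # Match the worker count to the cores reserved for the job
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", 1))
    with Pool(processes=n_workers) as pool:
        results = pool.map(simulate, range(32))
    print("Completed", len(results), "tasks with", n_workers, "workers")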

Additional resources

If you have questions about or need help with Python, please submit a help ticket and we will assist you.

Python website
Python documentation
SciPy
Anaconda user guide

Tutorials:

Programming with Python
Python tutorial
SciPy tutorials

Web books:

Think Python
SciPy Lectures

CARC Python workshop materials:

HPC with Python
