HPC with Python

Last updated March 05, 2024

Python is an open-source, general-purpose programming language.

HPC with Python video

0.0.1 Using Python on CARC systems

Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.

Python can be used in either interactive or batch modes. In either mode, first load a corresponding software module:

module purge
module load gcc/11.3.0
module load python/3.11.3

Other versions of Python are available. To see all available versions of Python, enter:

module spider python

Different versions of Python may require different dependency modules.

The Python modules depend on a gcc module. This module needs to be loaded first because Python was built with the GCC compiler. Loading the module also ensures that any Python packages installed from source are built using the same version of GCC.

0.0.1.1 Installing a different version of Python

If you require a different version of Python that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.

Alternatively, you could:

  • Install Python with Conda.
  • Use a Singularity container with Python installed.
  • Install a different version of Python from source within one of your directories.

0.0.1.2 Installing Python packages

You can install Python packages that you need in one of your directories (see the section on installing packages below).

0.0.1.3 Integrated development environments

JupyterLab, VSCode, and other integrated development environments (IDEs) can be used on compute nodes via our CARC OnDemand service. To install Jupyter kernels, see our guide here.

0.0.2 Running Python in interactive mode

Using Python on a login node should be reserved for installing packages. A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.

To run Python interactively on a compute node, follow these two steps:

  1. Reserve job resources on a node using salloc
  2. Once resources are allocated, load the required modules and enter python
[user@discovery1 ~]$ salloc --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16G --account=<project_id>
salloc: Pending job allocation 24737
salloc: job 24737 queued and waiting for resources
salloc: job 24737 has been allocated resources
salloc: Granted job allocation 24737
salloc: Waiting for resource configuration
salloc: Nodes d05-04 are ready for job

Change the resource requests (the --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16G --account=<project_id> part after your salloc command) as needed, such as the number of cores and memory required. Also substitute your project ID; enter myaccount to view your available project IDs.

Once you are granted the resources and logged in to a compute node, load the modules and enter python:

[user@d05-04 ~]$ module load gcc/11.3.0 python/3.11.3
[user@d05-04 ~]$ python
Python 3.11.3 (main, May 15 2023, 13:07:50) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-04).

To run Python scripts from within Python, use the command exec(open('script.py').read()). Alternatively, to run Python scripts from the shell, use the python script.py command.

To exit the node and relinquish the job resources, enter exit() in Python and then enter exit in the shell. This will return you to the login node:

>>> exit()
[user@d05-04 ~]$ exit
exit
salloc: Relinquishing job allocation 24737
[user@discovery1 ~]$

0.0.3 Running Python in batch mode

To submit jobs to the Slurm job scheduler, use Python in batch mode:

  1. Create a Python script
  2. Create a Slurm job script that runs the Python script
  3. Submit the job script to the job scheduler using sbatch

Your Python script should consist of the sequence of Python commands needed for your analysis or modeling. The python command, available after a Python module has been loaded, runs Python scripts, and it can be used in the shell and in Slurm job scripts.
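As a minimal illustration, a script.py might look like the following (the file name, data, and computation here are purely hypothetical stand-ins for your own analysis):

```python
# script.py -- a minimal, illustrative analysis script (hypothetical example)
import statistics

def summarize(values):
    """Return the sample mean and standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    mean, sd = summarize(data)
    print(f"mean={mean}, sd={sd:.2f}")
```

Anything printed by the script will appear in the job's output file (see below).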

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Python, a Slurm job script should look something like the following:

#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
module load gcc/11.3.0
module load python/3.11.3

python script.py

Each line is described below:

Command or Slurm argument     Meaning
#!/bin/bash                   Use Bash to execute this script
#SBATCH                       Syntax that allows Slurm to read your requests (ignored by Bash)
--account=<project_id>        Charge compute time to <project_id>; enter myaccount to view your available project IDs
--partition=main              Submit job to the main partition
--nodes=1                     Use 1 compute node
--ntasks=1                    Run 1 task (e.g., running a Python script)
--cpus-per-task=8             Reserve 8 CPUs for your exclusive use
--mem=16G                     Reserve 16 GB of memory for your exclusive use
--time=1:00:00                Reserve resources described for 1 hour
module purge                  Clear environment modules
module load gcc/11.3.0        Load the gcc compiler environment module
module load python/3.11.3     Load the python environment module
python script.py              Use python to run script.py

Adjust the resources requested based on your needs, but keep in mind that requesting fewer resources typically results in less queue time for your job. Note that to fully utilize the resources, especially the number of cores, you may need to explicitly change your Python code to do so (see the section on parallel programming below).

Develop and edit Python scripts and job scripts to run on CARC clusters:

  • on your local computer, then transfer the files to one of your directories on CARC file systems;
  • with the Files app available on our OnDemand service; or
  • with one of the available text editor modules (nano, micro, vim, or emacs).

Save the job script as python.job, for example, and then submit it to the job scheduler with Slurm’s sbatch command:

[user@discovery1 ~]$ sbatch python.job
Submitted batch job 13587

To check the status of your job, enter myqueue. If no job status is listed, the job has completed.

The results of the job will be logged and, by default, saved to a file of the form slurm-<jobid>.out in the same directory where the job script is located. To view the contents of this file, enter less slurm-<jobid>.out, and then enter q to exit the viewer.

For more information on job status and running jobs, see the Running Jobs user guide.

0.0.4 Installing Python packages

After loading a Python module, to install packages in your home directory, enter:

pip install <package_name> --user

By default, Python will install local (i.e., user) packages in your home directory (e.g., ~/.local/lib/python3.11/site-packages).
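To confirm exactly where --user installs go for the Python version you currently have loaded, you can ask Python itself:

```python
# Print the per-user site-packages directory used by `pip install --user`
import site

print(site.getusersitepackages())  # e.g., ~/.local/lib/python3.11/site-packages
```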

To install Python packages in a library other than the default, you can use the --target option with pip. For example, to install a package in a project directory, enter something like the following:

pip install <package_name> --target /project/ttrojan_123/python/pkgs/3.11

To load packages from this location, prepend this directory to your PYTHONPATH environment variable:

export PYTHONPATH=/project/ttrojan_123/python/pkgs/3.11:$PYTHONPATH

To automatically set this variable when logging in to the cluster, add this line to your ~/.bashrc.

You can also create project-specific package environments using virtual environments. To create a virtual environment, navigate to the directory where you want it to be installed, such as your home or project directory, and enter:

python -m venv <env_name>

where <env_name> is the name of your environment. This will create an <env_name> subdirectory in the current directory. To activate the environment, enter:

source ./<env_name>/bin/activate

This will be reflected in your shell prompt:

(<env_name>) [user@discovery1 ~]$

Now when you install packages using pip, they will automatically be installed in your <env_name> environment and directory (e.g., ./<env_name>/lib/python3.11/site-packages).

To deactivate the environment, enter deactivate.

Alternatively, you can use pipx to create isolated package environments.

You can update pip itself with:

pip install pip --upgrade --user

Note that pip stores package cache files in your home directory. Enter pip cache purge to clear the cache and free up storage space.

0.0.4.1 Loading dependency modules

Some Python packages have system dependencies, and the modules for these dependencies should be loaded before starting Python and installing the packages. For example, the mpi4py package requires an MPI library, such as openmpi. In this case, load the associated module with module load openmpi and then enter pip install mpi4py --user. For some packages, you may also need to specify header and library locations for dependencies when installing.

To search for available modules for dependencies, use the module keyword <keyword> command, replacing <keyword> with the name of the dependency. If you cannot find a necessary module, please submit a help ticket and we will install it for you.

0.0.5 Parallel programming with Python

Python uses only one core by default, but it also supports both implicit and explicit parallel programming to enable full use of multi-core processors and compute nodes. This also includes the use of shared memory on a single node or distributed memory on multiple nodes. On CARC systems, 1 thread = 1 core = 1 logical CPU (requested with Slurm’s --cpus-per-task option).

Parallelizing your code to use multiple cores or nodes can reduce the execution time of your Python jobs, but the speedup does not necessarily increase in a proportional manner. The speedup depends on the scale and types of computations that are involved. Furthermore, sometimes using a single core is optimal. There is a cost to setting up parallel computation (e.g., modifying code, communications overhead, etc.), and that cost may be greater than the achieved speedup, if any, of the parallelized version of the code. Some experimentation will be needed to optimize your code and resource requests (optimal number of cores and amount of memory). Also keep in mind that your project account will be charged CPU-minutes based on the cores reserved for a job, even if all those cores are not actually used during the job.

0.0.5.1 Implicit parallelism

Some Python packages and their functions use implicit parallelism via multi-threading, so that you do not need to explicitly call for parallel computation in your Python code. Multi-threaded Python packages and functions typically automatically detect and use the available number of cores. As a result, requesting multiple cores in your Slurm jobs with the --cpus-per-task option will enable implicit parallelism via automatic multi-threading.
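Some threaded libraries detect all cores on the node rather than only those Slurm allocated to your job, so it can help to set thread counts explicitly. One common approach (a sketch, not the only way) is to read Slurm's SLURM_CPUS_PER_TASK environment variable and set the standard threading variables before importing a numerical library:

```python
# Match BLAS/OpenMP thread counts to the cores Slurm allocated to this job.
# SLURM_CPUS_PER_TASK is set by Slurm inside a job; falling back to 1 keeps
# the script safe when run outside a job (e.g., on a login node).
import os

n_cores = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

# These variables are typically read at import time by common threaded
# libraries (OpenMP, OpenBLAS, MKL), so set them before importing e.g. numpy.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = str(n_cores)

print(f"Using {n_cores} core(s)")
```

The same variables can instead be exported in your Slurm job script before the python command runs.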

0.0.5.2 Explicit parallelism

Explicit parallelism means explicitly calling for parallel computation in your Python code, either in relatively simple ways or potentially in more complex ways depending on the tasks to be performed. Many Python packages exist for explicit parallelism, designed for different types of tasks that can be parallelized.

The main Python packages for explicit parallelism are summarized in the following table:

Package             Purpose
threading           For explicit multi-threading (I/O-bound tasks)
multiprocessing     For explicit multi-processing (CPU-bound tasks)
Numba               For JIT-compiled code
Cython              For interfacing to C or C++ code
mpi4py              For interfacing to MPI libraries
h5py                For parallel I/O
pyslurm             For launching jobs via Slurm
concurrent.futures  For asynchronous evaluations and workflows
dask                For asynchronous evaluations and workflows
snakemake           For workflows

Please review the linked documentation above for examples and more information about how to use these packages and their functions.
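As a brief sketch of explicit multi-processing with the standard-library multiprocessing package (the worker function and inputs here are hypothetical placeholders for real CPU-bound work):

```python
# Parallel map over inputs with a process pool sized from the Slurm allocation.
import os
from multiprocessing import Pool

def square(x):
    """CPU-bound work stands in here; replace with your own function."""
    return x * x

if __name__ == "__main__":
    # Use the cores Slurm reserved (--cpus-per-task); default to 1 elsewhere.
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    with Pool(processes=n_workers) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

For this pattern, set --cpus-per-task in your job script to the number of worker processes you want, keeping --ntasks=1.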

For more information about high-performance computing with Python, see our workshop materials for HPC with Python as well as the resources linked below.

0.0.6 Additional resources

If you have questions about or need help with Python, please submit a help ticket and we will assist you.

Tutorials:

Web books:

CARC Python workshop materials: