HPC with Python
Python is an open-source, general-purpose programming language.
0.0.1 Using Python on CARC systems
Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.
Python can be used in either interactive or batch modes. In either mode, first load a corresponding software module:
module purge
module load gcc/11.3.0
module load python/3.11.3
Other versions of Python are also available. To see all of them, enter:
module spider python
Different versions of Python may require different dependency modules.
The Python modules depend on a gcc module. This module needs to be loaded first because Python was built with the GCC compiler. Loading the module also ensures that any Python packages installed from source are built using the same version of GCC.
0.0.1.1 Installing a different version of Python
If you require a different version of Python that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.
Alternatively, you could:
- Install Python with Conda.
- Use a Singularity container with Python installed.
- Install a different version of Python from source within one of your directories.
0.0.1.2 Installing Python packages
You can install Python packages that you need in one of your directories (see the section on installing packages below).
0.0.1.3 Integrated development environments
JupyterLab, VSCode, and other integrated development environments (IDEs) can be used on compute nodes via our CARC OnDemand service. To install Jupyter kernels, see our guide on Jupyter kernels.
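As a minimal sketch of one common approach (your exact workflow may differ; see the guide above), you can register a kernel from a virtual environment using the ipykernel package. The environment name env1 is a hypothetical example, and virtual environments are covered in the installing packages section below:
module load gcc/11.3.0 python/3.11.3
source ./env1/bin/activate
pip install ipykernel
python -m ipykernel install --user --name env1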
0.0.2 Running Python in interactive mode
Using Python on a login node should be reserved for installing packages. A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.
To run Python interactively on a compute node, follow these two steps:
- Reserve job resources on a node using salloc
- Once resources are allocated, load the required modules and enter python
[user@discovery1 ~]$ salloc --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16G --account=<project_id>
salloc: Pending job allocation 24737
salloc: job 24737 queued and waiting for resources
salloc: job 24737 has been allocated resources
salloc: Granted job allocation 24737
salloc: Waiting for resource configuration
salloc: Nodes d05-04 are ready for job
Change the resource requests (the --time=1:00:00 --ntasks=1 --cpus-per-task=8 --mem=16G --account=<project_id> part after the salloc command) as needed, such as the number of cores and amount of memory required. Also substitute your project ID; enter myaccount to view your available project IDs.
Once you are granted the resources and logged in to a compute node, load the modules and enter python:
[user@d05-04 ~]$ module load gcc/11.3.0 python/3.11.3
[user@d05-04 ~]$ python
Python 3.11.3 (main, May 15 2023, 13:07:50) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-04).
To run Python scripts from within Python, use the command exec(open('script.py').read()). Alternatively, to run Python scripts from the shell, use the python script.py command.
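For example, if script.py contains the following hypothetical test code:
import platform
print("Hello from Python", platform.python_version())
then both exec(open('script.py').read()) in the interpreter and python script.py in the shell will print the version of Python being run.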
To exit the node and relinquish the job resources, enter exit() in Python and then enter exit in the shell. This will return you to the login node:
>>> exit()
[user@d05-04 ~]$ exit
exit
salloc: Relinquishing job allocation 24737
[user@discovery1 ~]$
0.0.3 Running Python in batch mode
To submit jobs to the Slurm job scheduler, use Python in batch mode:
- Create a Python script
- Create a Slurm job script that runs the Python script
- Submit the job script to the job scheduler using sbatch
Your Python script should consist of the sequence of Python commands needed for your analysis or modeling. The python command, available after a Python module has been loaded, runs Python scripts; it can be used both in the shell and in Slurm job scripts.
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Python, a Slurm job script should look something like the following:
#!/bin/bash
#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00
module purge
module load gcc/11.3.0
module load python/3.11.3
python script.py
Each line is described below:
Command or Slurm argument | Meaning
---|---
#!/bin/bash | Use Bash to execute this script
#SBATCH | Syntax that allows Slurm to read your requests (ignored by Bash)
--account=<project_id> | Charge compute time to <project_id>; enter myaccount or check the CARC user portal to view your available project IDs
--partition=main | Submit the job to the main partition
--nodes=1 | Use 1 compute node
--ntasks=1 | Run 1 task (e.g., running a Python script)
--cpus-per-task=8 | Reserve 8 CPUs for your exclusive use
--mem=16G | Reserve 16 GB of memory for your exclusive use
--time=1:00:00 | Reserve the resources described for 1 hour
module purge | Clear environment modules
module load gcc/11.3.0 | Load the gcc compiler environment module
module load python/3.11.3 | Load the python environment module
python script.py | Use python to run script.py
Adjust the resources requested based on your needs, but remember that requesting fewer resources typically leads to less queue time for your job. Note that to fully utilize the resources, especially the number of cores, you may need to explicitly modify your Python code to do so (see the section on parallel programming below).
Develop and edit Python scripts and job scripts to run on CARC clusters:
- on your local computer, then transfer the files to one of your directories on CARC file systems;
- with the Files app available on our OnDemand service; or
- with one of the available text editor modules (nano, micro, vim, or emacs).
Save the job script as python.job, for example, and then submit it to the job scheduler with Slurm's sbatch command:
[user@discovery1 ~]$ sbatch python.job
Submitted batch job 13587
To check the status of your job, enter myqueue. If no job status is listed, the job has completed.
The results of the job will be logged and, by default, saved to a file of the form slurm-<jobid>.out in the same directory where the job script is located. To view the contents of this file, enter less slurm-<jobid>.out, and then enter q to exit the viewer.
For more information on job status and running jobs, see the Running Jobs user guide.
0.0.4 Installing Python packages
After loading a Python module, to install packages in your home directory, enter:
pip install <package_name> --user
By default, Python will install local (i.e., user) packages in your home directory (e.g., ~/.local/lib/python3.11/site-packages).
To install Python packages in a location other than the default, you can use the --target option with pip. For example, to install a package in a project directory, enter something like the following:
pip install <package_name> --target /project/ttrojan_123/python/pkgs/3.11
To load packages from this location, ensure your PYTHONPATH environment variable includes this directory:
export PYTHONPATH=/project/ttrojan_123/python/pkgs/3.11:$PYTHONPATH
To set this variable automatically when logging in to the cluster, add this line to your ~/.bashrc.
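To verify that the directory was added, you can print Python's module search path (a quick sanity check, assuming the export above):
python -c "import sys; print(sys.path)"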
You can also create project-specific package environments using virtual environments. To create a virtual environment, navigate to the directory where you want it to be installed, such as your home or project directory, and enter:
python -m venv <env_name>
where <env_name> is the name of your environment. This will create an <env_name> subdirectory in the current directory. To activate the environment, enter:
source ./<env_name>/bin/activate
This will be reflected in your shell prompt:
(<env_name>) [user@discovery1 ~]$
Now when you install packages using pip, they will automatically be installed in your <env_name> environment and directory (e.g., ./<env_name>/lib/python3.11/site-packages).
To deactivate the environment, enter deactivate.
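Putting these steps together, a typical session might look like the following, where the environment name env1 and the package numpy are hypothetical examples (pip's output is omitted):
[user@discovery1 ~]$ module load gcc/11.3.0 python/3.11.3
[user@discovery1 ~]$ python -m venv env1
[user@discovery1 ~]$ source ./env1/bin/activate
(env1) [user@discovery1 ~]$ pip install numpy
(env1) [user@discovery1 ~]$ deactivate
[user@discovery1 ~]$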
Alternatively, you can use pipx to create isolated package environments.
You can update pip itself with:
pip install pip --upgrade --user
Note that using pip creates unnecessary package cache files in your home directory. Enter pip cache purge to clear the cache and free up storage space.
0.0.4.1 Loading dependency modules
Some Python packages have system dependencies, and the modules for these dependencies should be loaded before starting Python and installing the packages. For example, the mpi4py package requires an MPI library, such as openmpi. In this case, load the associated module with module load openmpi and then enter pip install mpi4py --user. For some packages, you may also need to specify header and library locations for dependencies when installing.
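For example, an mpi4py installation might look like the following sketch, where the module versions shown are illustrative and you should load the versions actually available on the cluster:
module purge
module load gcc/11.3.0 python/3.11.3 openmpi
pip install mpi4py --user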
To search for available modules for dependencies, use the module keyword <keyword> command, replacing <keyword> with the name of the dependency. If you cannot find a necessary module, please submit a help ticket and we will install it for you.
0.0.5 Parallel programming with Python
Python uses only one core by default, but it also supports both implicit and explicit parallel programming to enable full use of multi-core processors and compute nodes. This includes the use of shared memory on a single node or distributed memory across multiple nodes. On CARC systems, 1 thread = 1 core = 1 logical CPU (requested with Slurm's --cpus-per-task option).
Parallelizing your code to use multiple cores or nodes can reduce the execution time of your Python jobs, but the speedup is not necessarily proportional to the number of cores used; it depends on the scale and types of computations involved. Furthermore, sometimes using a single core is optimal: there is a cost to setting up parallel computation (e.g., modifying code, communication overhead, etc.), and that cost may be greater than any speedup achieved by the parallelized code. Some experimentation will be needed to optimize your code and resource requests (the optimal number of cores and amount of memory). Also keep in mind that your project account will be charged CPU-minutes based on the cores reserved for a job, even if all those cores are not actually used during the job.
0.0.5.1 Implicit parallelism
Some Python packages and their functions use implicit parallelism via multi-threading, so that you do not need to explicitly call for parallel computation in your Python code. Multi-threaded Python packages and functions typically detect and use the available number of cores automatically. As a result, requesting multiple cores in your Slurm jobs with the --cpus-per-task option will enable implicit parallelism via automatic multi-threading.
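Note that some multi-threaded libraries detect all cores on a node rather than only the cores allocated to your job, so it can be safer to pin thread counts explicitly in your job script. The following is a minimal sketch assuming OpenMP-style threading and the common BLAS backends; which variable applies depends on how your packages were built:
# Pin thread counts to the CPUs allocated by Slurm.
# SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
python script.py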
0.0.5.2 Explicit parallelism
Explicit parallelism means explicitly calling for parallel computation in your Python code, either in relatively simple ways or potentially in more complex ways depending on the tasks to be performed. Many Python packages exist for explicit parallelism, designed for different types of tasks that can be parallelized.
The main Python packages for explicit parallelism are summarized in the following table:
Package | Purpose
---|---
threading | For explicit multi-threading (I/O-bound tasks)
multiprocessing | For explicit multi-processing (CPU-bound tasks)
Numba | For JIT-compiled code
Cython | For interfacing to C or C++ code
mpi4py | For interfacing to MPI libraries
h5py | For parallel I/O
pyslurm | For launching jobs via Slurm
concurrent.futures | For asynchronous evaluations and workflows
dask | For asynchronous evaluations and workflows
snakemake | For workflows
Please review the linked documentation above for examples and more information about how to use these packages and their functions.
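As a simple illustration of explicit parallelism, the following sketch uses the standard library's multiprocessing package; the square function and its inputs are hypothetical stand-ins for a CPU-bound task, and the worker count is read from Slurm's allocation:
import os
from multiprocessing import Pool

def square(x):
    # Stand-in for a CPU-bound computation.
    return x * x

if __name__ == "__main__":
    # Use the CPUs allocated by Slurm (--cpus-per-task), defaulting to 1.
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", 1))
    with Pool(processes=n_workers) as pool:
        results = pool.map(square, range(100))
    print(results[:10])
Run this with python script.py inside a job that requests multiple CPUs; pool.map splits the inputs across the worker processes.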
For more information about high-performance computing with Python, see our workshop materials for HPC with Python as well as the resources linked below.
0.0.6 Additional resources
If you have questions about or need help with Python, please submit a help ticket and we will assist you.
Tutorials:
Web books:
CARC Python workshop materials: