Horovod

Last updated March 06, 2024

Horovod is an open-source toolkit, originally developed at Uber, that facilitates distributed deep learning while requiring minimal modifications to existing TensorFlow and PyTorch code. It is available under the Apache 2.0 license (https://github.com/horovod/horovod).

Popular deep learning packages such as TensorFlow and PyTorch run, by default, on a single GPU or at most a single node. As demand for more complex models grows, this limitation makes it difficult for research teams to scale their models and test them in real-life applications.
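
To give a sense of what those minimal modifications look like, here is a hedged sketch of a PyTorch training loop adapted for Horovod. The toy linear model, the random data, the base learning rate of 0.01, and the helper name `scaled_learning_rate` are illustrative assumptions, not part of the benchmark used on this page. The `horovod.torch` calls follow the pattern commonly shown in Horovod's own examples. Scaling the learning rate by the number of workers is a common convention rather than a requirement.

```python
def scaled_learning_rate(base_lr, world_size):
    """Scale the single-GPU learning rate linearly with the number of workers."""
    return base_lr * world_size


def main():
    # Imported here so the file can be read or imported without a GPU stack.
    import torch
    import horovod.torch as hvd

    hvd.init()  # start Horovod; one process is launched per GPU
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU
    device = torch.device("cuda" if use_cuda else "cpu")

    model = torch.nn.Linear(10, 1).to(device)  # toy model, illustration only
    optimizer = torch.optim.SGD(
        model.parameters(), lr=scaled_learning_rate(0.01, hvd.size()))

    # Wrap the optimizer so gradients are averaged across all workers.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Start every worker from rank 0's weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for step in range(5):
        x = torch.randn(32, 10, device=device)  # synthetic batch
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        if hvd.rank() == 0:  # print from a single process only
            print(f"step {step}: loss {loss.item():.4f}")
```

`main()` is not called automatically. In an environment with Horovod installed, add a `main()` call at the end of the file and launch one process per GPU, for example with the `srun` command used for the benchmark on this page.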

0.0.1 Installing Horovod with PyTorch

To install and test Horovod with PyTorch, create a new Conda environment:

module purge
module load usc
module load nccl/2.12.12-1
module load cuda/11.6.2
module load cmake/3.23.2
module load conda/23.3.1
mamba create -n horovod-pytorch
mamba activate horovod-pytorch
mamba install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia -y
module unload conda
HOROVOD_NCCL_HOME=/spack/2206/apps/linux-centos7-x86_64_v3/gcc-11.3.0/nccl-2.12.12-1-e6klrou HOROVOD_GPU_ALLREDUCE=NCCL pip install --no-cache-dir horovod
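
After the build finishes, it can be worth confirming that Horovod imports cleanly and was compiled with NCCL support, since `HOROVOD_GPU_ALLREDUCE=NCCL` is a build-time setting. The following is a minimal sketch, assuming it is run with the `horovod-pytorch` environment active. The helper name `horovod_build_info` is hypothetical; `nccl_built()` and `mpi_built()` are query functions exposed by `horovod.torch`.

```python
def horovod_build_info():
    """Report whether Horovod's PyTorch extension imports and which backends it has."""
    try:
        import horovod.torch as hvd
    except ImportError as exc:
        # Horovod (or its PyTorch extension) is not importable in this environment.
        return {"installed": False, "error": str(exc)}
    return {
        "installed": True,
        "nccl": bool(hvd.nccl_built()),  # should be True after an NCCL build
        "mpi": bool(hvd.mpi_built()),
    }


info = horovod_build_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `nccl` is reported as `False`, the `pip install` step probably did not pick up the NCCL settings and may need to be repeated with `--no-cache-dir`.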

Download the following script to run a benchmark test with synthetic images on PyTorch:

wget https://raw.githubusercontent.com/horovod/horovod/master/examples/pytorch/pytorch_synthetic_benchmark.py

To launch the computations, request one node with two P100 GPUs:

salloc --partition=gpu --gres=gpu:p100:2 --time=01:00:00 --exclusive

Once the allocation is granted, load the modules, activate the environment, and run the benchmark:

module purge
module load usc
module load nccl/2.12.12-1
module load cuda/11.6.2
module load conda/23.3.1

conda activate horovod-pytorch

srun --mpi=pmix_v2 -n 2 python pytorch_synthetic_benchmark.py > out
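
Once the run finishes, the throughput figures written to `out` can be condensed into speedup and parallel efficiency, the quantities a scaling plot typically shows. The sketch below assumes the benchmark reports throughput in images per second and that you have already read one total value per GPU count from your own runs. The function name `scaling_metrics` and the sample numbers are hypothetical placeholders, not results from Discovery.

```python
def scaling_metrics(throughput_by_gpus):
    """Return {n_gpus: (speedup, efficiency)} relative to the smallest GPU count."""
    base_gpus = min(throughput_by_gpus)
    base_rate = throughput_by_gpus[base_gpus]
    metrics = {}
    for n_gpus, rate in sorted(throughput_by_gpus.items()):
        speedup = rate / base_rate
        # Ideal speedup is n_gpus / base_gpus; efficiency is the fraction achieved.
        efficiency = speedup / (n_gpus / base_gpus)
        metrics[n_gpus] = (speedup, efficiency)
    return metrics


# Placeholder throughputs in images/sec, NOT measured results.
example = {1: 100.0, 2: 190.0}
for n_gpus, (speedup, efficiency) in scaling_metrics(example).items():
    print(f"{n_gpus} GPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency close to 100% means that adding GPUs raises throughput almost proportionally; lower values point to communication or input overhead.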

The strong scaling plot below shows the results of these tests on Discovery nodes with two NVIDIA P100 GPUs:

0.0.2 Additional resources

If you have questions about or need help with Horovod, please submit a help ticket and we will assist you.