Launcher is a utility for performing simple, data parallel, high-throughput computing (HTC) workflows on clusters, massively parallel processor (MPP) systems, workgroups of computers, and personal machines. It is designed for running large collections of serial or multi-threaded applications in a single batch job. Launcher can be used as an alternative to job arrays and to pack many short-running jobs into one batch job.
With Launcher, you can run a sequence of defined jobs within a single batch job even when you have more jobs to run than the requested number of processors. The number of available processors simply determines the upper limit on the number of jobs that can be run at the same time.
Using Launcher on CARC systems
You can use Launcher by loading the corresponding software module:
```
module load launcher
```
Launcher is not a compiled program. Instead, it's a set of Bash and Python scripts, so you can use the Launcher module with any software tree available on CARC systems.
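Loading the module sets environment variables used by Launcher's scripts; in particular, the job script below assumes `LAUNCHER_ROOT` points to the Launcher installation directory. You can verify this after loading the module:

```bash
# Should print the Launcher installation path set by the module
echo "$LAUNCHER_ROOT"
```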
Running Launcher in batch mode
In order to submit jobs to the Slurm job scheduler, you will need to use Launcher in batch mode. There are a few steps to follow:
- Create a launcher job file that contains jobs to run (one job per line)
- Create a Slurm job script that requests resources, configures Launcher, and runs the launcher job file
- Submit the job script to the job scheduler using Slurm's `sbatch` command
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Launcher, a Slurm job script should look something like the following:
```bash
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --time=1:00:00
#SBATCH --account=<project_ID>

module purge
module load launcher
module load usc hwloc

export LAUNCHER_DIR=$LAUNCHER_ROOT
export LAUNCHER_RMI=SLURM
export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
export LAUNCHER_SCHED=interleaved
export LAUNCHER_BIND=1
export LAUNCHER_WORKDIR=$PWD
export LAUNCHER_JOB_FILE=simulations.txt

$LAUNCHER_DIR/paramrun
```
Each line is described below:
|Command or Slurm argument|Meaning|
|---|---|
|`#!/bin/bash`|Use Bash to execute this script|
|`#SBATCH`|Syntax that allows Slurm to read your requests (ignored by Bash)|
|`--nodes=1`|Use 1 compute node|
|`--ntasks=16`|Run 16 tasks|
|`--cpus-per-task=1`|Reserve 1 CPU per task for your exclusive use|
|`--mem=0`|Reserve all memory on a node for your exclusive use|
|`--time=1:00:00`|Reserve resources described for 1 hour|
|`--account=<project_ID>`|Charge compute time to `<project_ID>`. You can find your project ID in the CARC User Portal|
|`module purge`|Clear environment modules|
|`module load launcher`|Load the `launcher` environment module|
|`module load usc hwloc`|Load the `usc` and `hwloc` environment modules|
|`export LAUNCHER_DIR=$LAUNCHER_ROOT`|Set Launcher root directory|
|`export LAUNCHER_RMI=SLURM`|Use Slurm plugin|
|`export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins`|Set plugin directory|
|`export LAUNCHER_SCHED=interleaved`|Use interleaved scheduling option|
|`export LAUNCHER_BIND=1`|Bind tasks to cores using `hwloc`|
|`export LAUNCHER_WORKDIR=$PWD`|Set working directory for job|
|`export LAUNCHER_JOB_FILE=simulations.txt`|Specify launcher job file to use|
|`$LAUNCHER_DIR/paramrun`|Run the Launcher jobs|
Make sure to adjust the resources requested based on your needs, but remember that requesting fewer resources typically leads to less queue time for your job. Note that this example is for serial applications.
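For multi-threaded applications, one approach (a sketch, not verified CARC guidance; adjust for your application) is to reserve multiple CPUs per task so that each Launcher job gets a dedicated set of cores. For example, for jobs that each run 4 threads:

```bash
#SBATCH --nodes=1
#SBATCH --ntasks=4          # Launcher runs up to 4 jobs at a time
#SBATCH --cpus-per-task=4   # each job gets 4 cores for its threads

# For OpenMP programs, match the thread count to the cores reserved
export OMP_NUM_THREADS=4
```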
In this example, the `simulations.txt` file may contain many lines like the following:
```
./sim 3 4 5 >& job-$LAUNCHER_JID.log
./sim 6 4 7 >& job-$LAUNCHER_JID.log
./sim 1 9 2 >& job-$LAUNCHER_JID.log
```
The same simulation program `sim` is run each time but with different parameter values for each run.
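If you have many parameter combinations, you can generate the job file with a short shell loop rather than writing it by hand. A minimal sketch, assuming the same illustrative `sim` program and made-up parameter values:

```bash
# Write one sim command per line; \$LAUNCHER_JID stays literal in the file
# so that Launcher expands it to the job's ID (1, 2, 3, ...) at run time
> simulations.txt
for a in 3 6 1; do
  for b in 5 7 2; do
    echo "./sim $a 4 $b >& job-\$LAUNCHER_JID.log" >> simulations.txt
  done
done
```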
Launcher will schedule each line in the job file as a job on one of the requested tasks (processors). In this serial example, the number of tasks equals the number of processors: 16. So 16 jobs will run at a time until all of the jobs have completed.
In this example, the output of each job is also saved to a unique log file. For example, the `job-1.log` file would contain the output for the first line in the job file.
You can develop job scripts and launcher job files on your local machine and then transfer them to the cluster, or you can use one of the available text editor modules (e.g., `micro`) to develop them directly on the cluster.
Save the job script as `launcher.job`, for example, and then submit it to the job scheduler with Slurm's `sbatch` command:

```
user@discovery1:~$ sbatch launcher.job
Submitted batch job 13589
```
To check the status of your job, enter `squeue --me`. For example:

```
user@discovery1:~$ squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             13589      main launcher     user  R       1:01      1 d05-04
```
If there is no job status listed, then this means the job has completed.
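To confirm how a completed job ended, you can query Slurm's accounting database with the `sacct` command, using the job ID reported by `sbatch`:

```
user@discovery1:~$ sacct -j 13589 --format=JobID,JobName,State,Elapsed
```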
The results of the job will be logged and, by default, saved to a file of the form `slurm-<jobid>.out` in the same directory where the job script is located. To view the contents of this file, enter `less slurm-<jobid>.out`, and then enter `q` to exit the viewer. In this example, each launcher job also has its own unique log file, and you can enter `less job-<number>.log` (e.g., `less job-1.log`) to view them.
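Because each launcher job writes its own log, you can also scan all of them at once. For example, to list any logs containing the word "error" (the search pattern depends on what your program actually prints):

```bash
grep -il "error" job-*.log
```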
For more information on job status and running jobs, see the Running Jobs user guide.
If you have questions about or need help with Launcher, please submit a help ticket and we will assist you.