Picard
Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. When using it on a Slurm cluster, the key is to understand both how to run Java applications and how to submit jobs to Slurm. Here’s a short guide on how to use Picard on a Slurm cluster:
0.0.1 Load the Picard module to use it in interactive mode
module purge
module load usc
module load openjdk
module load picard
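Before starting a long interactive run, it can be worth checking that the module loads actually put the tools on your PATH. Below is a minimal sketch using the Bash builtin command -v. The helper name require_cmd is hypothetical and is not provided by any module.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns 0 if every command is found, 1 if any is missing.
require_cmd() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd -> $(command -v "$cmd")"
    else
      echo "missing: $cmd (did the module load succeed?)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running require_cmd java after the module load commands should report where java was found. A nonzero exit status means at least one command is missing.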
0.0.2 Prepare Your Picard Command
Prepare the command you want to run, replacing /path/to/picard.jar with the location of picard.jar on your system. For example:
java -jar /path/to/picard.jar MarkDuplicates INPUT=sample.bam OUTPUT=sample.marked_duplicates.bam METRICS_FILE=metrics.txt
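When you run the same tool on several samples, it helps to build the command in one place. The sketch below stores the MarkDuplicates command in a Bash array and prints it for review without running it. PICARD_JAR, SAMPLE, and the per-sample metrics file name are hypothetical conventions, not something the picard module defines. Set PICARD_JAR to the actual location of picard.jar.

```shell
#!/bin/bash
# Sketch: assemble the MarkDuplicates command as a Bash array so each
# argument stays intact (even with spaces) and the sample name is set once.
# PICARD_JAR and SAMPLE are hypothetical variables; the jar path is a placeholder.
PICARD_JAR="${PICARD_JAR:-/path/to/picard.jar}"
SAMPLE="${SAMPLE:-sample}"

MARKDUP_CMD=(
  java -jar "$PICARD_JAR" MarkDuplicates
  "INPUT=${SAMPLE}.bam"
  "OUTPUT=${SAMPLE}.marked_duplicates.bam"
  "METRICS_FILE=${SAMPLE}.metrics.txt"
)

# Dry run: print the command for review instead of executing it.
printf '%s ' "${MARKDUP_CMD[@]}"
echo
```

Once the printed command looks right, "${MARKDUP_CMD[@]}" runs it exactly as shown.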
0.0.3 Or Write a Slurm Batch Script and Submit It to the Cluster
Create a batch script (run_picard.sh) for Slurm with the necessary directives:
#!/bin/bash
#SBATCH --job-name=picard-markdup
#SBATCH --ntasks=1
#SBATCH --mem=4G # Adjust memory request. Picard can be memory-intensive.
#SBATCH --time=02:00:00 # Adjust time as needed.
#SBATCH --output=picard_%j.out # %j will be replaced with the job ID
#SBATCH --account=ttrojan_123
#SBATCH --partition=main
module purge
module load usc
module load openjdk
module load picard
mkdir -p tmp # Create the temporary directory used by TMP_DIR below
java -jar /path/to/picard.jar MarkDuplicates \
INPUT=sample.bam \
OUTPUT=sample.marked_duplicates.bam \
METRICS_FILE=metrics.txt \
TMP_DIR=tmp # Optional: directory for Picard's temporary files
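If you process many samples, writing one batch script per sample by hand gets tedious. Below is a minimal sketch, assuming you want the same directives as the script above and only the sample name, memory, and time limit change. The function write_picard_job and the run_picard_<sample>.sh naming are hypothetical, and /path/to/picard.jar is still a placeholder to replace.

```shell
#!/bin/bash
# Sketch: generate a per-sample MarkDuplicates batch script.
# write_picard_job and the run_picard_<sample>.sh naming are hypothetical;
# /path/to/picard.jar is a placeholder for the real jar location.
write_picard_job() {
  local sample="$1"
  local mem="${2:-4G}"
  local walltime="${3:-02:00:00}"
  local script="run_picard_${sample}.sh"

  # Unquoted EOF: ${sample}, ${mem}, ${walltime} are substituted now,
  # %j is left for Slurm, and \\ becomes a literal continuation backslash.
  cat > "$script" <<EOF
#!/bin/bash
#SBATCH --job-name=picard-markdup-${sample}
#SBATCH --ntasks=1
#SBATCH --mem=${mem}
#SBATCH --time=${walltime}
#SBATCH --output=picard_%j.out
#SBATCH --account=ttrojan_123
#SBATCH --partition=main

module purge
module load usc
module load openjdk
module load picard

mkdir -p tmp
java -jar /path/to/picard.jar MarkDuplicates \\
  INPUT=${sample}.bam \\
  OUTPUT=${sample}.marked_duplicates.bam \\
  METRICS_FILE=${sample}.metrics.txt \\
  TMP_DIR=tmp
EOF
  echo "$script"   # print the file name so the caller can pass it to sbatch
}
```

For example, sbatch "$(write_picard_job sample 8G 04:00:00)" writes run_picard_sample.sh and submits it in one step.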
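Adjust the #SBATCH directives according to your needs:
--job-name: A name for your job.
--ntasks: Number of tasks. Picard tools such as MarkDuplicates run as a single, largely single-threaded process, so this should be 1.
--mem: The amount of memory you expect the job to use.
--time: The maximum wall-clock time your job may run before Slurm terminates it.
--output: The file where Slurm writes your job's standard output and error; %j is replaced with the job ID.
--account: The Slurm account (project) charged for the job.
--partition: The partition (queue) the job is submitted to.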
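Requesting memory with --mem does not by itself tell Java how much heap to use. The JVM picks its own default maximum, which need not match the request. A common approach is to pass -Xmx explicitly, a little below the Slurm request, so that non-heap JVM memory still fits. The sketch below assumes the job was submitted with --mem. In that case Slurm sets SLURM_MEM_PER_NODE (in megabytes) inside the job. The helper name java_heap_flag and the 80% default are assumptions, not Picard or Slurm defaults.

```shell
#!/bin/bash
# Sketch: derive a Java -Xmx flag from the memory Slurm granted the job.
# SLURM_MEM_PER_NODE (MB) is set by Slurm when --mem is used;
# the default 80% fraction is an assumed margin for non-heap JVM memory.
java_heap_flag() {
  local fraction="${1:-80}"
  local mem_mb="${SLURM_MEM_PER_NODE:-}"
  if ! [[ "$mem_mb" =~ ^[0-9]+$ ]]; then
    echo "SLURM_MEM_PER_NODE not set; was --mem requested?" >&2
    return 1
  fi
  echo "-Xmx$(( mem_mb * fraction / 100 ))m"
}
```

Inside the batch script, the java line would then begin java "$(java_heap_flag)" -jar /path/to/picard.jar MarkDuplicates. With --mem=4G, Slurm reports 4096 MB and the function prints -Xmx3276m.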
0.0.4 Submit Your Job
Use the sbatch command to submit your batch script to the Slurm scheduler:
sbatch run_picard.sh
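By default, sbatch prints a line such as "Submitted batch job <id>". With --parsable it prints only the ID, optionally followed by ";<cluster>". Keeping the ID makes the job easy to follow afterwards. Below is a minimal sketch that handles both forms. The helper name job_id_from_sbatch is hypothetical.

```shell
#!/bin/bash
# Sketch: extract the numeric job ID from sbatch output, accepting both
# "Submitted batch job 12345" and the --parsable form "12345[;cluster]".
job_id_from_sbatch() {
  local out="$1"
  out="${out##* }"   # keep the last space-separated word
  out="${out%%;*}"   # drop an optional ";cluster" suffix
  if [[ "$out" =~ ^[0-9]+$ ]]; then
    echo "$out"
  else
    echo "could not parse job ID from: $1" >&2
    return 1
  fi
}

# On the cluster, for example:
#   jobid=$(job_id_from_sbatch "$(sbatch --parsable run_picard.sh)")
#   squeue -j "$jobid"                                     # pending or running
#   sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS  # after it ends
```

The MaxRSS column from sacct shows the job's peak memory use, which is a useful guide when adjusting --mem for later runs.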
0.0.5 Collect Your Results
After the job has completed, the job's log is in the file specified by --output in your Slurm script (picard_<jobid>.out). Any files Picard was set to produce are written alongside it (e.g., sample.marked_duplicates.bam and metrics.txt).
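Besides the BAM file, MarkDuplicates writes a metrics file (metrics.txt above). Its first table includes PERCENT_DUPLICATION, the fraction of mapped sequence marked as duplicate. The sketch below extracts one column from that table with awk. It assumes the usual Picard layout: a "## METRICS CLASS" line followed by a tab-separated header row and one row per library. Only the first library is reported. picard_metric is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print one column (default PERCENT_DUPLICATION) from the first row
# of the first metrics table in a Picard metrics file.
# Assumes "## METRICS CLASS", then a tab-separated header row, then values.
picard_metric() {
  local file="$1"
  local column="${2:-PERCENT_DUPLICATION}"
  awk -F'\t' -v col="$column" '
    /^## METRICS CLASS/ { state = 1; next }   # table starts on the next line
    state == 1 {                              # header row: locate the column
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      state = 2; next
    }
    state == 2 {                              # first value row
      if (idx) { print $idx; found = 1 }
      exit
    }
    END { exit !found }                       # nonzero status if not found
  ' "$file"
}
```

For example, picard_metric metrics.txt prints the duplication fraction, and picard_metric metrics.txt ESTIMATED_LIBRARY_SIZE prints another column from the same row.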