Picard

Last updated November 04, 2023

Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. Using it on a Slurm cluster requires two things: knowing how to run a Java application and knowing how to submit a job to Slurm. This short guide covers both:

0.0.1 Load the Picard module to use it in interactive mode

module purge
module load usc
module load openjdk
module load picard
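
Before going further, it can help to confirm that the loaded modules actually put the expected tools on your PATH. The following is a minimal sketch; require_cmd is a hypothetical helper name and is not part of the module system or Picard.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "command not found: $1" >&2
    return 1
}

# After the module load commands, java should be available:
# require_cmd java && java -version
```

Running module list also shows which modules are currently loaded.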

0.0.2 Prepare Your Picard Command

Write out the command you want to run. For example, to mark duplicate reads in sample.bam:

java -jar picard.jar MarkDuplicates INPUT=sample.bam OUTPUT=sample.marked_duplicates.bam METRICS_FILE=metrics.txt
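
When the same command is run over several samples, deriving the output name from the input name keeps the files consistent. This is a minimal sketch, assuming inputs end in .bam; marked_name is a hypothetical helper that follows the naming used in the example above.

```shell
# Hypothetical helper: sample.bam -> sample.marked_duplicates.bam
marked_name() {
    # ${1%.bam} strips a trailing ".bam" from the first argument.
    printf '%s\n' "${1%.bam}.marked_duplicates.bam"
}

marked_name sample.bam
```

The result could then be passed as the OUTPUT= value, for example OUTPUT="$(marked_name sample.bam)".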

0.0.3 Or write a Slurm batch script and submit it to the cluster

Create a batch script (run_picard.sh) for Slurm with the necessary directives:

#!/bin/bash
#SBATCH --job-name=picard-markdup
#SBATCH --ntasks=1
#SBATCH --mem=4G               # Adjust memory request. Picard can be memory-intensive.
#SBATCH --time=02:00:00        # Adjust time as needed.
#SBATCH --output=picard_%j.out # %j will be replaced with the job ID
#SBATCH --account=ttrojan_123
#SBATCH --partition=main

module purge
module load usc
module load openjdk
module load picard

java -jar /path/to/picard.jar MarkDuplicates \
     INPUT=sample.bam \
     OUTPUT=sample.marked_duplicates.bam \
     METRICS_FILE=metrics.txt \
     TMP_DIR=tmp             # Specify a temporary directory if needed.

Adjust the #SBATCH directives according to your needs:

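--job-name: Set a name for your job.
--ntasks: Number of tasks. Picard usually runs in a single thread, so this should be 1.
--mem: Set the amount of memory you expect the job to use.
--time: The maximum amount of time your job will run before it is terminated.
--output: The file where Slurm will write the output of your job. %j is replaced with the job ID.
--account: The project account the job is charged to. Replace ttrojan_123 with your own.
--partition: The partition the job is submitted to.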
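
The JVM picks its own default maximum heap size, which does not necessarily match the Slurm memory request, so it can be useful to set -Xmx explicitly. The sketch below assumes Slurm exports SLURM_MEM_PER_NODE in megabytes when --mem is given. The 80% headroom and the 4096 fallback are illustrative assumptions, and heap_mb is a hypothetical helper name.

```shell
# Hypothetical helper: print a heap size in MB derived from the Slurm request.
heap_mb() {
    # SLURM_MEM_PER_NODE is assumed to hold the --mem request in MB;
    # fall back to 4096 (the 4G of the example script) outside a job.
    mem_mb="${SLURM_MEM_PER_NODE:-4096}"
    # Leave about 20% for JVM memory outside the heap (assumed ratio).
    echo $(( mem_mb * 80 / 100 ))
}

# Print the resulting command line for review instead of running it.
echo "java -Xmx$(heap_mb)m -jar /path/to/picard.jar MarkDuplicates"
```

Inside the batch script, the printed command would replace the plain java -jar line, followed by the same INPUT, OUTPUT, and METRICS_FILE arguments.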

0.0.4 Submit Your Job

Use the sbatch command to submit your batch script to the Slurm scheduler:

sbatch run_picard.sh
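
sbatch reports the ID of the new job on standard output, typically as "Submitted batch job <id>". Capturing that ID makes the job easier to monitor. This is a minimal sketch that assumes that message format; job_id_from_sbatch is a hypothetical helper.

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation message.
job_id_from_sbatch() {
    # ${1##* } removes everything up to and including the last space,
    # leaving the final word of the message.
    printf '%s\n' "${1##* }"
}

job_id_from_sbatch "Submitted batch job 123456"
```

On the cluster this could be used as job_id=$(job_id_from_sbatch "$(sbatch run_picard.sh)"), followed by squeue -j "$job_id" to check the state of the job.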

0.0.5 Collect Your Results

After the job has completed, its log output is in the file specified by --output in your Slurm script (picard_<jobid>.out in the example above). The files Picard was set to produce (e.g., sample.marked_duplicates.bam and metrics.txt) are written to the paths given in the command.
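
A quick check that the expected files exist and are not empty can catch a failed run early. This is a minimal sketch; check_outputs is a hypothetical helper, and the file names in the example call are the ones used in the batch script above.

```shell
# Hypothetical helper: return non-zero if any listed file is missing or empty.
check_outputs() {
    status=0
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call after the job has finished:
# check_outputs sample.marked_duplicates.bam metrics.txt
```

A non-empty file does not prove the run succeeded, so the Slurm output file is still worth reading for errors reported by Picard.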