Picard

Last updated November 04, 2023

Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. Using it on a Slurm cluster involves two things: running a Java application and submitting jobs to Slurm. This short guide covers both:

1 Load the Picard Module for Interactive Use

module purge
module load usc
module load openjdk
module load picard
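If a module fails to load, later commands can fail with confusing errors such as "java: command not found". Before you run Picard, you can confirm that the tools you need are on your PATH. The sketch below defines a hypothetical helper, check_tools (not part of Picard or the module system), that reports any missing command and returns non-zero. It assumes a bash shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Picard or the module system):
# report each command that is not on PATH; return non-zero if any is missing.
check_tools() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, after the module load commands above:
#   check_tools java && java -version
```

On most clusters you should work interactively on a compute node rather than a login node. You can request one with salloc, for example `salloc --account=ttrojan_123 --partition=main --mem=4G --time=01:00:00`, replacing the placeholder account with your own.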

2 Prepare Your Picard Command

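Prepare the command you want to run. For example, the following command marks duplicate reads in sample.bam and writes duplication metrics to metrics.txt: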

java -jar /path/to/picard.jar MarkDuplicates INPUT=sample.bam OUTPUT=sample.marked_duplicates.bam METRICS_FILE=metrics.txt
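If you run MarkDuplicates on several BAM files, a small shell function can derive the output and metrics file names from the input name. This is a sketch under assumptions. The names markdup and PICARD_JAR are hypothetical, the default 4g heap is an assumption, and the naming scheme extends the sample.marked_duplicates.bam pattern above. Note that it writes sample.metrics.txt instead of metrics.txt.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the MarkDuplicates example above.
# PICARD_JAR is an assumed variable: point it at your picard.jar.
PICARD_JAR="${PICARD_JAR:-/path/to/picard.jar}"

markdup() {
    local input="$1"
    local heap="${2:-4g}"          # assumed default Java heap size
    local prefix="${input%.bam}"   # sample.bam -> sample

    java "-Xmx${heap}" -jar "$PICARD_JAR" MarkDuplicates \
        INPUT="$input" \
        OUTPUT="${prefix}.marked_duplicates.bam" \
        METRICS_FILE="${prefix}.metrics.txt"
}

# Example:
#   markdup sample.bam
#   (writes sample.marked_duplicates.bam and sample.metrics.txt)
```

Passing the heap size explicitly with -Xmx keeps Java's memory use predictable. This matters once the same command runs inside a Slurm job with a fixed memory limit.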

3 Alternatively, Write a Slurm Batch Script and Submit It to the Cluster

Create a batch script (run_picard.sh) for Slurm with the necessary directives:

#!/bin/bash
#SBATCH --job-name=picard-markdup
#SBATCH --ntasks=1
#SBATCH --mem=4G               # Adjust memory request. Picard can be memory-intensive.
#SBATCH --time=02:00:00        # Adjust time as needed.
#SBATCH --output=picard_%j.out # %j will be replaced with the job ID
#SBATCH --account=ttrojan_123  # Replace with your project account.
#SBATCH --partition=main

module purge
module load usc
module load openjdk
module load picard

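# Keep the Java heap (-Xmx) below the --mem request so the JVM's own overhead also fits.
# Replace /path/to/picard.jar with the location of picard.jar on your system.
java -Xmx3g -jar /path/to/picard.jar MarkDuplicates \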
     INPUT=sample.bam \
     OUTPUT=sample.marked_duplicates.bam \
     METRICS_FILE=metrics.txt \
     TMP_DIR=tmp             # Specify a temporary directory if needed.
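TMP_DIR=tmp is resolved relative to the job's working directory. Some users prefer a private temporary directory for each job, which is removed when the script exits. The sketch below assumes that you pass a base directory, for example a scratch file system with enough space (the location depends on your cluster). It names the directory with SLURM_JOB_ID, which Slurm sets inside batch jobs. make_job_tmp is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a private temporary directory for one job.
# $1 is the base directory (an assumption; choose a location with enough space).
make_job_tmp() {
    local base="${1:-$PWD}"
    # SLURM_JOB_ID is set by Slurm inside a job; fall back to the shell's PID otherwise.
    mktemp -d "${base}/picard_tmp.${SLURM_JOB_ID:-$$}.XXXXXX"
}

# Example inside run_picard.sh:
#   tmpdir="$(make_job_tmp "$PWD")" || exit 1
#   trap 'rm -rf "$tmpdir"' EXIT    # remove the directory when the script exits
#   then pass TMP_DIR="$tmpdir" to Picard instead of TMP_DIR=tmp
```

mktemp -d adds a random suffix, so two jobs that share a base directory never write into the same temporary directory.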

Adjust the #SBATCH directives according to your needs:

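--job-name: A name for your job, shown in the queue.
--ntasks: Number of tasks. Picard usually runs as a single thread, so this should be 1.
--mem: The amount of memory you expect the job to use. If you set a Java heap size with -Xmx, request more than that so the JVM's overhead also fits.
--time: The maximum wall time for the job. Slurm terminates the job when this limit is reached.
--output: The file where Slurm writes the job's standard output and standard error. %j is replaced with the job ID.
--account: The project account the job is charged to.
--partition: The partition (queue) the job is submitted to.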
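Instead of hard-coding the heap size, a batch script can derive it from the memory Slurm actually granted. When --mem is used, Slurm sets SLURM_MEM_PER_NODE (in megabytes) inside the job. The sketch below is a hypothetical helper that gives Java 80% of that amount. The 80% ratio and the 1024 MB fallback are assumptions you may want to tune.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a Java heap size (in MB) from the Slurm memory request.
# SLURM_MEM_PER_NODE is set by Slurm, in MB, when --mem is given.
# The 80% ratio and the 1024 MB fallback are assumptions.
java_heap_mb() {
    local mem_mb="${SLURM_MEM_PER_NODE:-1024}"
    echo $(( mem_mb * 80 / 100 ))
}

# Example inside run_picard.sh:
#   heap_mb="$(java_heap_mb)"
#   java "-Xmx${heap_mb}m" -jar /path/to/picard.jar MarkDuplicates \
#        INPUT=sample.bam OUTPUT=sample.marked_duplicates.bam \
#        METRICS_FILE=metrics.txt TMP_DIR=tmp
```

With --mem=4G, Slurm reports 4096 MB, so the helper returns 3276 and Java gets about 3.2 GB. The rest of the allocation is left for memory the JVM uses outside the heap.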

4 Submit Your Job

Use the sbatch command to submit your batch script to the Slurm scheduler:

sbatch run_picard.sh
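sbatch normally prints a line such as "Submitted batch job 123456". For scripting, its --parsable option prints only the job ID (followed by ";cluster" on multi-cluster setups). You can then pass that ID to squeue or sacct. submit_job is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a batch script and print only its job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_job() {
    local out
    out="$(sbatch --parsable "$@")" || return 1
    echo "${out%%;*}"
}

# Example:
#   jobid="$(submit_job run_picard.sh)"
#   squeue -j "$jobid"                                      # while pending or running
#   sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS   # after it finishes
```

MaxRSS in the sacct output shows the peak memory used by each job step. It is a useful guide when you adjust --mem for later runs.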

5 Collect Your Results

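After the job has completed, Slurm's log (standard output and standard error) is in the file set by --output, here picard_<jobid>.out. The files Picard was told to write, here sample.marked_duplicates.bam and metrics.txt, are in the directory you submitted the job from, because relative paths in the script resolve against it.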
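MarkDuplicates writes its metrics as a tab-separated table below a line beginning with "## METRICS CLASS". A short awk script can pull one column, such as PERCENT_DUPLICATION, out of metrics.txt. percent_duplication is a hypothetical helper. It assumes the standard MarkDuplicates metrics layout (a header row followed by one row per library, ending at a blank line) and prints the value for each library.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print LIBRARY and PERCENT_DUPLICATION for each library
# in a MarkDuplicates metrics file (tab-separated table after "## METRICS CLASS").
percent_duplication() {
    awk -F '\t' '
        /^## METRICS CLASS/ { in_table = 1; next }   # the header row follows this line
        in_table && col == 0 {                        # header row: locate the columns
            for (i = 1; i <= NF; i++) {
                if ($i == "LIBRARY") lib = i
                if ($i == "PERCENT_DUPLICATION") col = i
            }
            next
        }
        in_table && col > 0 {
            if ($0 == "" || $0 ~ /^##/) exit          # end of the metrics table
            print $lib "\t" $col
        }
    ' "$1"
}

# Example:
#   percent_duplication metrics.txt
```

Despite its name, PERCENT_DUPLICATION is reported as a fraction, so a value of 0.123 means 12.3% of mapped sequence was marked as duplicate.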