Picard
Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. Using it on a Slurm cluster requires two things: knowing how to run a Java application and knowing how to submit jobs to Slurm. This short guide covers both:
1 Load the Picard module to use it in interactive mode
module purge
module load usc
module load openjdk
module load picard
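To confirm that the modules loaded as expected, a quick check along the following lines can help. This is a minimal sketch: the helper name have_cmd is hypothetical, and the check only reports whether java is on the PATH.

```shell
# Hypothetical helper: succeed if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    java -version    # prints the JVM version
else
    echo "java not found; check that the openjdk module is loaded" >&2
fi
```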
2 Prepare Your Picard Command
Prepare the command you want to run, replacing picard.jar with the path to the Picard JAR on your system. For example:
java -jar picard.jar MarkDuplicates INPUT=sample.bam OUTPUT=sample.marked_duplicates.bam METRICS_FILE=metrics.txt
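When the same command is run on several samples, it can help to derive the output names from the input file. The sketch below assumes the input name ends in .bam and contains no spaces. The function name markdup_args and the naming scheme for the metrics file are illustrative choices, not Picard conventions. It only prints the arguments so they can be reviewed before running.

```shell
# Hypothetical helper: print MarkDuplicates arguments for one input BAM.
# Output names are derived by stripping the .bam suffix from the input.
markdup_args() {
    local input="$1"
    local prefix="${input%.bam}"
    printf 'INPUT=%s OUTPUT=%s.marked_duplicates.bam METRICS_FILE=%s.metrics.txt\n' \
        "$input" "$prefix" "$prefix"
}

# Review the arguments first.
markdup_args sample.bam

# Then pass them to Picard (left unquoted on purpose so they split into words):
# java -jar picard.jar MarkDuplicates $(markdup_args sample.bam)
```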
3 Alternatively, Write a Slurm Batch Script
Create a batch script (run_picard.sh) for Slurm with the necessary directives:
#!/bin/bash
#SBATCH --job-name=picard-markdup
#SBATCH --ntasks=1
#SBATCH --mem=4G # Adjust memory request. Picard can be memory-intensive.
#SBATCH --time=02:00:00 # Adjust time as needed.
#SBATCH --output=picard_%j.out # %j will be replaced with the job ID
#SBATCH --account=ttrojan_123
#SBATCH --partition=main
module purge
module load usc
module load openjdk
module load picard
java -jar /path/to/picard.jar MarkDuplicates \
INPUT=sample.bam \
OUTPUT=sample.marked_duplicates.bam \
METRICS_FILE=metrics.txt \
TMP_DIR=tmp # Specify a temporary directory if needed.
Adjust the #SBATCH directives according to your needs:
--job-name: Set a name for your job.
--ntasks: Number of tasks. Usually Picard runs in a single thread, so this should be 1.
--mem: Set the amount of memory you expect the job to use.
--time: The maximum amount of time your job will run before it is terminated.
--output: The file where Slurm will write the output of your job.
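The --mem request limits the whole job, while the JVM heap is set separately with Java's -Xmx option. One common approach is to give the JVM somewhat less than the Slurm allocation so that non-heap memory has headroom. The sketch below assumes Slurm exports SLURM_MEM_PER_NODE (in megabytes) when --mem is used. The 80% fraction and the helper name heap_mb are arbitrary illustrative choices.

```shell
# Hypothetical helper: print 80% of a memory allocation given in MB
# (integer arithmetic, rounded down).
heap_mb() {
    echo $(( $1 * 80 / 100 ))
}

# Fall back to 4096 MB (the --mem=4G above) when not running under Slurm.
alloc_mb="${SLURM_MEM_PER_NODE:-4096}"
xmx="-Xmx$(heap_mb "$alloc_mb")m"
echo "JVM heap option: $xmx"

# Inside the batch script the option would go before -jar, for example:
# java "$xmx" -jar /path/to/picard.jar MarkDuplicates ...
```

If the job is killed for exceeding its memory limit, raising --mem or lowering the fraction are the two knobs to try.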
4 Submit Your Job
Use the sbatch command to submit your batch script to the Slurm scheduler:
sbatch run_picard.sh
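sbatch normally replies with a line of the form "Submitted batch job <ID>". The sketch below extracts the job ID from that line and uses it to check the job's state with squeue. The helper name job_id_from is hypothetical, and the Slurm commands run only where sbatch is installed.

```shell
# Hypothetical helper: extract the job ID from sbatch's reply,
# assumed to look like "Submitted batch job 123456".
job_id_from() {
    echo "$1" | awk '{ print $4 }'
}

if command -v sbatch >/dev/null 2>&1; then
    reply=$(sbatch run_picard.sh)
    job_id=$(job_id_from "$reply")
    squeue -j "$job_id"    # show the job while it is pending or running
fi
```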
5 Collect Your Results
After the job has completed, you will find the job's console output in the file specified by --output in your Slurm script, as well as any files Picard was set to produce (e.g., sample.marked_duplicates.bam).
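A quick sanity check after the job finishes is to confirm that each expected file exists and is not empty. The sketch below is one way to do that. The helper name check_outputs is hypothetical, and the file names are the ones used in the example above.

```shell
# Hypothetical helper: report any listed file that is missing or empty.
# Returns non-zero if at least one file fails the check.
check_outputs() {
    local rc=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

if check_outputs sample.marked_duplicates.bam metrics.txt; then
    echo "all expected outputs are present"
fi
```

A non-empty file does not guarantee a successful run, so it is still worth reading the Slurm output file for error messages.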