BCFtools

Last updated November 04, 2023

BCFtools is a set of utilities for variant calling and for manipulating files in the Variant Call Format (VCF) and its binary counterpart, BCF. On a CARC cluster, BCFtools is typically run in batch jobs, where the commands execute on the cluster's compute nodes.

Here is a short guide to get you started with BCFtools on a SLURM cluster:

0.0.1 Load the BCFtools module to use it in interactive mode

module purge
module load usc
module load bcftools
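To confirm that the module loaded, you can check that the bcftools executable is on your PATH. The following is a minimal sketch; the helper name check_bcftools is hypothetical and is not part of BCFtools or the CARC software stack.

```shell
# Hypothetical helper: report whether bcftools is available in this shell.
check_bcftools() {
    if command -v bcftools >/dev/null 2>&1; then
        # Print only the first line of the version banner.
        bcftools --version | head -n 1
    else
        echo "bcftools not found; check that the module loaded" >&2
        return 1
    fi
}
```

Run check_bcftools after the module load commands; a non-zero exit status means the executable was not found.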

0.0.2 Or write a SLURM batch script and submit it to the cluster

#!/bin/bash

#SBATCH --job-name=bcftools_job
#SBATCH --output=bcftools_job_%j.out
#SBATCH --error=bcftools_job_%j.err
#SBATCH --time=01:00:00
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --account=ttrojan_123

# Load the bcftools module
module purge
module load usc
module load bcftools

# Your bcftools command here, for example:
bcftools stats input.vcf > output.stats

# End of script

0.0.3 Submitting Your Job

Save the batch script to a file (bcftools_job.slurm in this example), then use the sbatch command to submit it to the SLURM scheduler.

sbatch bcftools_job.slurm
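If you want to keep track of the job after submitting it, one option is to capture the job ID and pass it to squeue. The sketch below assumes the standard sbatch --parsable option; the helper name submit_job is hypothetical.

```shell
# Hypothetical helper: submit a batch script and print only the job ID.
# With --parsable, sbatch prints "jobid" (or "jobid;cluster") instead of a sentence.
submit_job() {
    sbatch --parsable "$1" | cut -d ';' -f 1
}

# Example (run on a login node):
#   jobid=$(submit_job bcftools_job.slurm)
#   squeue -j "$jobid"    # show the job while it is pending or running
```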

0.0.4 Collecting Results

After the job completes, the results (statistics, filtered VCF files, and so on) are in the files your commands wrote. For the example script above, the statistics are in output.stats, and anything the job printed to standard output or standard error is in bcftools_job_<jobid>.out and bcftools_job_<jobid>.err, where <jobid> is the SLURM job ID.

cat output.stats
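Once a job has finished, its elapsed time and peak memory use can help you choose --time and --mem values for later runs. The following sketch wraps the standard sacct command; the helper name job_usage is hypothetical, and whether accounting data is available depends on how the cluster is configured.

```shell
# Hypothetical helper: summarize a finished job's run time and peak memory.
job_usage() {
    sacct -j "$1" --format=JobID,JobName,Elapsed,MaxRSS,State
}

# Example: job_usage 1234567    (replace 1234567 with your own job ID)
```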

0.0.4.1 Additional Tips

Always test your BCFtools commands interactively (if possible) on a small dataset before submitting a batch job.

For long-running jobs, make sure to set an appropriate wall time in the #SBATCH --time directive.

If your job requires more memory, adjust the #SBATCH --mem directive accordingly.

Be considerate of shared resources and request only what you need.

Use job arrays when you need to run the same BCFtools command on many files.
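As an illustration of the job array tip, the sketch below runs bcftools stats on one file per array task. It is not a CARC-provided script: vcf_list.txt is a hypothetical text file with one uncompressed .vcf path per line, the array range 1-10 assumes that file has ten lines, and the helper nth_line is introduced here only for the example. The account and partition are copied from the batch script above.

```shell
#!/bin/bash

#SBATCH --job-name=bcftools_array
#SBATCH --output=bcftools_array_%A_%a.out
#SBATCH --time=01:00:00
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --account=ttrojan_123
#SBATCH --array=1-10

# Hypothetical helper: print line N of a file (N is the first argument).
nth_line() {
    sed -n "${1}p" "$2"
}

# Only do the work when Slurm has set an array task ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    module purge
    module load usc
    module load bcftools

    # Each task picks the line of vcf_list.txt that matches its task ID.
    input=$(nth_line "$SLURM_ARRAY_TASK_ID" vcf_list.txt)

    # Write one stats file per input, named after the input file.
    bcftools stats "$input" > "$(basename "$input" .vcf).stats"
fi
```

Each array task gets its own output log (%A is the array's job ID and %a is the task index), and the time and memory requests apply to each task separately.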