VCFtools

Last updated November 04, 2023

VCFtools is a suite of command-line tools for analyzing data in VCF (Variant Call Format) files. On a SLURM-managed cluster, you can run it interactively after loading its module, or you can submit it to the scheduler as a batch job. This guide covers the basics of both approaches.

0.0.1 Prepare Your Environment

Make sure you are in the directory where your VCF files are located, or move the VCF files to your current working directory.

For more information on transferring data, see the Transferring Research Data user guide.
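
As a quick check before running anything, the following sketch lists the VCF files in a directory. The helper name list_vcf_files is hypothetical, and it assumes your files end in .vcf or .vcf.gz.

```shell
# Hypothetical helper: print the VCF files (.vcf or .vcf.gz) found directly
# inside a directory. Defaults to the current working directory.
list_vcf_files() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort
}

# Example: list VCF files in the current directory.
list_vcf_files .
```

If nothing is printed, change to the directory that holds your data or copy the files over first.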

0.0.2 Load the vcftools module to use it in interactive mode

module purge
module load usc
module load vcftools
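
To confirm that loading the module put the vcftools executable on your PATH, a small check like the one below may help. The function name require_cmd is hypothetical; command -v is a standard shell builtin.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; load its module first" >&2
        return 1
    fi
}

# After `module load vcftools`, this should report the install location:
#   require_cmd vcftools
```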

0.0.3 Or write a SLURM batch script and submit it to the cluster

Create a script file (e.g., vcftools_job.slurm) to specify the SLURM job directives and the vcftools command you want to run.

#!/bin/bash

#SBATCH --job-name=vcftools_job
#SBATCH --output=vcftools_job_%j.out
#SBATCH --error=vcftools_job_%j.err
#SBATCH --time=01:00:00
#SBATCH --partition=main
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --account=ttrojan_123

# Load vcftools module
module purge
module load usc
module load vcftools

# Your vcftools command here
vcftools --vcf input_data.vcf --freq --out output_data

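Replace input_data.vcf with the name of your input file, output_data with the desired output file prefix, and ttrojan_123 with your own project account. Adjust the time and memory requests to suit the size of your data.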
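
If your input is gzip-compressed, VCFtools reads it with --gzvcf rather than --vcf. The sketch below picks the flag from the file extension. The function name vcf_input_flag is hypothetical, and the extension check is an assumption about how your files are named.

```shell
# Hypothetical helper: choose the VCFtools input flag from the file extension.
vcf_input_flag() {
    case "$1" in
        *.vcf.gz) echo "--gzvcf" ;;
        *.vcf)    echo "--vcf" ;;
        *)        echo "unrecognized VCF extension: $1" >&2; return 1 ;;
    esac
}

# Example use inside the batch script (the input name is a placeholder):
#   input=input_data.vcf.gz
#   vcftools "$(vcf_input_flag "$input")" "$input" --freq --out output_data
```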

0.0.4 Submit Your Job

Submit your SLURM job using the sbatch command.

sbatch vcftools_job.slurm
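
On success, sbatch normally prints a confirmation of the form "Submitted batch job <jobid>". A minimal sketch for capturing that ID so you can monitor the job is shown here; the helper name parse_job_id is hypothetical.

```shell
# Hypothetical helper: pull the job ID out of sbatch's confirmation message,
# which normally reads "Submitted batch job <jobid>".
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example on the cluster:
#   job_id=$(sbatch vcftools_job.slurm | parse_job_id)
#   squeue -j "$job_id"    # show the job's current state in the queue
```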

0.0.5 Check the Output

Once your job has completed, check the output and error logs it generated.

cat vcftools_job_*.out
cat vcftools_job_*.err
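
Besides the SLURM logs, the --freq example writes its results to output_data.frq and a run log to output_data.log. The sketch below checks that both files exist and are non-empty; the function name check_freq_outputs is hypothetical.

```shell
# Hypothetical helper: report whether the expected --freq outputs exist for a
# given --out prefix. Returns non-zero if either file is missing or empty.
check_freq_outputs() {
    local prefix="$1"
    local status=0
    local f
    for f in "$prefix.frq" "$prefix.log"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}

# Example, run from the directory where the job wrote its output:
#   check_freq_outputs output_data
```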

0.0.6 Post-Processing

After VCFtools finishes, its result files are named with the prefix you passed to --out. These files are the starting point for any further analysis or data handling your project requires.
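
As one illustration of post-processing, the sketch below counts the sites per chromosome in a .frq file produced by --freq. It assumes the usual tab-delimited layout with a single header line and the chromosome name in the first column. The function name count_sites_per_chrom is hypothetical.

```shell
# Hypothetical helper: count sites per chromosome in a VCFtools .frq file.
# Skips the header line and prints "<chrom><TAB><count>", sorted by name.
count_sites_per_chrom() {
    awk -F'\t' 'NR > 1 { n[$1]++ }
                END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$1" | sort
}

# Example:
#   count_sites_per_chrom output_data.frq
```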