FastQC

Last updated November 04, 2023

FastQC is a widely used quality-control tool for high-throughput sequencing data. This short guide shows how to run FastQC on a SLURM-managed cluster, either interactively or as a batch job.

0.0.1 Prepare Your Data

Ensure your sequencing data (usually in .fastq or .fastq.gz format) is uploaded to the cluster. You can use scp or an SFTP client to transfer files from your local machine to the cluster.

More information on transferring data can be found in our Research Data Management user guides.
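As a rough illustration of the transfer step, the sketch below checks each compressed file with gzip -t and then copies it with scp. The function name upload_fastq, the username ttrojan, and the host cluster.example.edu are placeholders chosen for this example, not actual cluster addresses; substitute your own login node and destination path.

```shell
# Hypothetical helper: verify gzip integrity locally, then copy to the cluster.
# REMOTE and REMOTE_DIR are placeholders; replace them with your own values.
REMOTE="ttrojan@cluster.example.edu"
REMOTE_DIR="/path/to/your/data"

upload_fastq() {
    local f
    for f in "$@"; do
        # gzip -t exits non-zero if the archive is truncated or corrupt
        if ! gzip -t "$f"; then
            echo "Skipping corrupt or incomplete file: $f" >&2
            continue
        fi
        scp "$f" "${REMOTE}:${REMOTE_DIR}/"
    done
}

# Example call (run on your local machine):
# upload_fastq sample_R1.fastq.gz sample_R2.fastq.gz
```

The integrity check only applies to .fastq.gz files; uncompressed .fastq files can be passed to scp directly.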

0.0.2 Load the FastQC Module for Interactive Use

module purge
module load usc
module load fastqc
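Once the modules are loaded, a quick interactive check might look like the following sketch. The helper name fastqc_quick_check and the file name in the example call are placeholders; --version is a standard FastQC option that prints the installed version.

```shell
# Hypothetical helper: confirm FastQC is available, then run it on one file.
fastqc_quick_check() {
    # command -v fails if the module has not been loaded in this shell
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found; load the fastqc module first" >&2
        return 1
    fi
    fastqc --version   # show which version the module provides
    fastqc "$1"        # reports are written next to the input file by default
}

# Example call:
# fastqc_quick_check your_data_file.fastq.gz
```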

0.0.3 Alternatively, Write a SLURM Batch Script

Create a batch script (fastqc_analysis.sh) for your FastQC job. Use your preferred text editor on the cluster.

#!/bin/bash
#SBATCH --job-name=fastqc_analysis
#SBATCH --output=fastqc_%j.out
#SBATCH --error=fastqc_%j.err
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00
#SBATCH --account=ttrojan_123

module purge
module load usc
module load fastqc

# Go to the directory where the data files are located; stop if it does not exist
cd /path/to/your/data || exit 1

# Run FastQC on all gzipped FASTQ files in the folder
fastqc *.fastq.gz

# Or for a single file
# fastqc your_data_file.fastq.gz

Replace /path/to/your/data with the actual path to your data on the cluster, and ttrojan_123 with your own project account. Adjust the remaining SLURM directives (#SBATCH), such as the memory and time limits, to suit the size of your data.
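If the folder holds many files, one possible adjustment is to request several CPU cores and let FastQC process one file per thread. The sketch below assumes the batch script header also contains a line such as #SBATCH --cpus-per-task=4 (a value chosen only for illustration). The helper name run_fastqc_parallel and the fastqc_results directory are hypothetical; --threads and --outdir are standard FastQC options. Memory use grows with the number of threads, so --mem may need to be raised accordingly.

```shell
# Sketch: run FastQC with one thread per allocated core.
# Assumes the script header also requests cores, e.g.:
#   #SBATCH --cpus-per-task=4      (illustrative value)
run_fastqc_parallel() {
    local outdir="$1"; shift
    # SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; fall back to 1
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    # FastQC does not create the output directory itself
    mkdir -p "$outdir"
    fastqc --threads "$threads" --outdir "$outdir" "$@"
}

# Example call inside the batch script, after the module load and cd lines:
# run_fastqc_parallel fastqc_results *.fastq.gz
```

Writing reports to a separate directory keeps them apart from the raw data, which can make them easier to download later.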

0.0.4 Submit the Job

Submit the job to the SLURM scheduler using the sbatch command:

sbatch fastqc_analysis.sh
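To keep track of the job after submission, something along these lines may help. The helper name submit_and_show is hypothetical; sbatch --parsable, squeue --jobs, and sacct are standard SLURM commands, although the accounting fields available can vary between clusters.

```shell
# Hypothetical helper: submit a script and show its queue entry.
submit_and_show() {
    local script="$1"
    local jobid
    # --parsable makes sbatch print only the job ID (plus ";cluster" on multi-cluster setups)
    jobid=$(sbatch --parsable "$script") || return 1
    jobid="${jobid%%;*}"   # drop the optional ";cluster" suffix
    echo "Submitted job $jobid"
    squeue --jobs "$jobid"
}

# Example call:
# submit_and_show fastqc_analysis.sh
#
# After the job ends, accounting data can be queried with, for example:
# sacct --jobs <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS
```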

0.0.5 Collect Results

Once the job is complete, you will find the FastQC reports in your data directory. For each input file, FastQC generates an HTML report and a ZIP archive (for example, sample_fastqc.html and sample_fastqc.zip for sample.fastq.gz). You can use scp to transfer the HTML reports back to your local machine and view them in any web browser.
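A minimal sketch of the download step is shown below. The helper name download_reports, the username ttrojan, and the host cluster.example.edu are placeholders for this example; replace them with your own login details and data path.

```shell
# Hypothetical helper: copy FastQC HTML reports from the cluster to a local folder.
# REMOTE is a placeholder; replace it with your own username and login node.
REMOTE="ttrojan@cluster.example.edu"

download_reports() {
    local remote_dir="$1"
    local dest="${2:-.}"    # default to the current local directory
    mkdir -p "$dest"
    # The quoted glob is expanded on the remote side, not by the local shell
    scp "${REMOTE}:${remote_dir}/*_fastqc.html" "$dest/"
}

# Example call (run on your local machine):
# download_reports /path/to/your/data fastqc_reports
```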