Structural Bioinformatics

Last updated July 05, 2023

To support USC researchers wishing to utilize CARC resources in structural biology and bioinformatics, we created a conda environment that contains popular tools used in the field.

Currently we have the following software packages installed:

  • PyMOL (open source edition)
  • VMD
  • ChimeraX
  • AlphaFold2

These tools are free for non-commercial, academic research purposes.

If your group requires other tools not listed here, please submit a help ticket and we will do our best to accommodate your needs.

0.0.1 PyMOL, VMD, ChimeraX

To use the installed packages, follow our guide for Building a Customized Conda Environment. Once the environment is built, activate the environment for the software package you want by running one of the following commands in the terminal:

conda activate /spack/conda/envs/pymol-open-source

conda activate /spack/conda/envs/vmd

conda activate /spack/conda/envs/chimerax
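The three activation commands above can also be wrapped in a small lookup function. This is only a sketch: the function name carc_env_path is hypothetical and is not something CARC provides, and the paths are simply the ones listed in this guide.

```shell
#!/bin/bash
# Hypothetical helper (not provided by CARC): print the conda environment
# path for a given tool, using the paths listed in this guide.
carc_env_path() {
    case "$1" in
        pymol)    echo "/spack/conda/envs/pymol-open-source" ;;
        vmd)      echo "/spack/conda/envs/vmd" ;;
        chimerax) echo "/spack/conda/envs/chimerax" ;;
        *)
            echo "unknown tool: $1" >&2
            return 1
            ;;
    esac
}

# Example usage (requires conda to be set up in your shell):
#   conda activate "$(carc_env_path vmd)"
```

Keeping the paths in one place means a script that switches between tools only needs the tool name, and an unrecognized name fails with an error instead of activating the wrong environment.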

0.0.2 Data visualization using Artemis

CARC’s cloud platform, Artemis, allows you to take advantage of a graphical interface by creating a virtual machine.

Before using Artemis, you must request an allocation for it. See the Request a New Allocation user guide for instructions.

Follow the step-by-step how-to guide to create a virtual machine on our Artemis platform with the following settings:

  • Ubuntu 22.04 template
  • 4-8 CPU cores
  • 8-16 GB of memory

All other parameters set to default.

Access the remote desktop application by selecting the monitor icon next to the VM name to launch VNC, or follow the instructions to use RDP.

Once the remote desktop has launched, open the terminal in the VM and activate the conda environment for the software package of your choice. All CARC storage systems are accessible directly within the VM.
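Once you are in the VM terminal, you may want to confirm that the VM was created with at least the lower bounds suggested above (4 CPU cores, 8 GB of memory). The following is a minimal sketch, assuming a Linux VM such as the Ubuntu template; the function name vm_meets_minimum is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed when the given core count and memory (in GB)
# meet the lower bounds recommended in this guide (4 cores, 8 GB).
vm_meets_minimum() {
    local cores="$1" mem_gb="$2"
    [ "$cores" -ge 4 ] && [ "$mem_gb" -ge 8 ]
}

# Example usage inside the VM (Linux only): read the core count with nproc
# and the total memory, rounded to whole GB, from /proc/meminfo.
#   mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1048576 + 0.5}' /proc/meminfo)
#   vm_meets_minimum "$(nproc)" "$mem_gb" && echo "VM size OK"
```

The memory value is rounded rather than truncated because the operating system usually reports slightly less than the nominal amount assigned to the VM.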

A secure USC connection is required to access CARC systems, including Artemis. See the Connecting to a USC VPN quick-start guide if you are trying to access CARC systems off-campus.

0.0.3 AlphaFold2

AlphaFold2 is an artificial intelligence (AI) program developed by DeepMind (a subsidiary of Alphabet) that predicts protein structures. The program is designed as a deep learning system.

AlphaFold2 at CARC is installed from the authors' GitHub repository and is licensed under the Apache License 2.0.

CARC provides a conda environment containing AlphaFold2 and the full set of databases used by the program.

Get started using the following instructions:

  1. Log in with ssh ttrojan@discovery.usc.edu.
  2. Prepare the temporary directory and set up conda (do this only once; replace ttrojan with your uscnetid):
module purge
module load conda
mamba init bash
mkdir /scratch1/ttrojan/tmp
  3. Log out and log in again.
  4. Prepare the job script (replace ttrojan and ttrojan_123 with your uscnetid and your account ID).

Sample job script:

#!/bin/bash
#SBATCH --account=ttrojan_123
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:p100:1
#SBATCH --mem=32GB
#SBATCH --time=24:00:00
module purge
eval "$(conda shell.bash hook)"
conda activate /spack/conda/alphafold/

export TMPDIR=/scratch1/ttrojan/tmp
python /spack/conda/alphafold/alphafold/run_alphafold.py \
		--fasta_paths=/path/to/the/input/file/input.fa \
		--model_preset=multimer \
		--data_dir=/project/biodb/alphafold_data \
		--bfd_database_path=/project/biodb/alphafold_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
		--uniprot_database_path=/project/biodb/alphafold_data/uniprot/uniprot.fasta \
		--uniref90_database_path=/project/biodb/alphafold_data/uniref90/uniref90.fasta \
		--uniref30_database_path=/project/biodb/alphafold_data/uniref30/UniRef30_2021_03 \
		--pdb_seqres_database_path=/project/biodb/alphafold_data/pdb_seqres/pdb_seqres.txt \
		--mgnify_database_path=/project/biodb/alphafold_data/mgnify/mgy_clusters_2018_12.fa \
		--template_mmcif_dir=/project/biodb/alphafold_data/pdb_mmcif/mmcif_files/ \
		--obsolete_pdbs_path=/project/biodb/alphafold_data/pdb_mmcif/obsolete.dat \
		--max_template_date=2022-12-12 \
		--output_dir=/scratch1/ttrojan/output \
		--use_gpu_relax=TRUE  
  5. Submit the job script to a GPU partition.

sbatch my-alphafold-job.sh
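The placeholder replacement described in the job-script step can be scripted instead of edited by hand. The following is a minimal sketch, assuming your uscnetid and account ID contain only letters, digits, and underscores; the function name fill_placeholders, the file name template.sh, and the values jdoe and jdoe_456 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a job script on stdin and write it to stdout
# with the ttrojan_123 and ttrojan placeholders replaced.
#   $1 = your uscnetid, $2 = your account ID
fill_placeholders() {
    local netid="$1" account="$2"
    # Replace the longer placeholder first so that "ttrojan_123" is not
    # partially rewritten by the "ttrojan" substitution.
    sed -e "s/ttrojan_123/${account}/g" -e "s/ttrojan/${netid}/g"
}

# Example usage (jdoe, jdoe_456, and template.sh are made-up names):
#   fill_placeholders jdoe jdoe_456 < template.sh > my-alphafold-job.sh
```

Review the generated file before submitting it, since the substitution is purely textual and also rewrites any other occurrence of ttrojan in the script.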

AlphaFold2 uses only one GPU per job. We recommend requesting a single P100 GPU to complete your jobs.
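Before submitting, it can help to confirm that the input file is readable and contains the number of sequences you expect, since the sample script uses the multimer model preset. The following is a minimal sketch that counts FASTA header lines; the function name count_fasta_records is hypothetical, and the check does not validate the sequences themselves.

```shell
#!/bin/bash
# Hypothetical helper: print the number of sequences in a FASTA file by
# counting header lines (lines that start with ">").
count_fasta_records() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "cannot read FASTA file: $fasta" >&2
        return 1
    fi
    # grep -c exits with status 1 when the count is 0; "|| true" keeps
    # that case from being treated as a failure.
    grep -c '^>' "$fasta" || true
}

# Example usage with the placeholder path from the sample job script:
#   count_fasta_records /path/to/the/input/file/input.fa
```

A count that differs from the number of chains you intended to model usually points to a missing or malformed header line in the input file.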

0.0.4 Additional resources

If you have questions or need help, please submit a help ticket and we will assist you.