Structural Bioinformatics
To support USC researchers wishing to utilize CARC resources in structural biology and bioinformatics, we created a conda environment that contains popular tools used in the field.
Currently we have the following software packages installed:
- PyMOL (open-source edition)
- VMD
- ChimeraX
- AlphaFold2
These tools are free for non-commercial, academic research purposes.
If your group requires other tools not listed here, please submit a help ticket and we will do our best to accommodate your needs.
PyMOL, VMD, ChimeraX
To use the installed packages, first follow our guide for Building a Customized Conda Environment. Then activate the environment for the software package you want by running one of the following commands in the terminal:
conda activate /spack/conda/envs/pymol-open-source
conda activate /spack/conda/envs/vmd
conda activate /spack/conda/envs/chimerax
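Each command above activates a separate environment, so run only the one that matches the tool you need. A minimal sketch of that choice follows; the helper name `carc_env_prefix` is hypothetical (it is not part of CARC's tooling) and simply maps a tool name to the environment prefix listed above.

```shell
# Hypothetical helper: map a tool name to the environment prefix listed above.
carc_env_prefix() {
    case "$1" in
        pymol)    echo "/spack/conda/envs/pymol-open-source" ;;
        vmd)      echo "/spack/conda/envs/vmd" ;;
        chimerax) echo "/spack/conda/envs/chimerax" ;;
        *)        echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Example: print the activation command for VMD.
# prints: conda activate /spack/conda/envs/vmd
echo "conda activate $(carc_env_prefix vmd)"
```

To switch tools within the same session, run `conda deactivate` before activating another environment.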
Data visualization using Artemis
CARC’s cloud platform, Artemis, allows you to take advantage of a graphical interface by creating a virtual machine.
Before using Artemis, you must request an allocation for it. See the Request a New Allocation user guide for instructions.
Follow the step-by-step how-to guide to create a virtual machine on our Artemis platform with the following settings:
- Ubuntu 22.04 template
- 4-8 CPU cores
- 8-16 GB of memory
All other parameters set to default.
Access the remote desktop application by selecting the monitor icon next to the VM name to launch VNC, or follow the instructions to use RDP.
Once the remote desktop has launched, open a terminal in the VM and activate the conda environment for the software package of your choice. All CARC storage systems are accessible directly within the VM.
A secure USC connection is required to access CARC systems, including Artemis. See the Connecting to a USC VPN quick-start guide if you are accessing CARC systems from off campus.
AlphaFold2
AlphaFold2 is an artificial intelligence (AI) program developed by DeepMind (a subsidiary of Alphabet) that predicts protein structures. It is designed as a deep learning system.
AlphaFold2 at CARC is installed from the authors' GitHub repository and is licensed under the Apache License 2.0.
CARC provides a conda environment containing AlphaFold2 and a full set of the databases used by the program.
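Before committing to a long run, it can be useful to confirm that the database locations are visible from your session. The following is a minimal sketch, not a CARC-provided tool: the function names `db_present` and `check_dbs` are hypothetical, and the two example paths are taken from the sample job script in this guide. Some database arguments are filename prefixes rather than single files, so the check also accepts any file that starts with the given path.

```shell
# Return 0 if the path exists, or if it is a database prefix
# (for example .../UniRef30_2021_03) with at least one file starting with it.
db_present() {
    [ -e "$1" ] && return 0
    for f in "$1"*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print every missing path; return non-zero if any path is missing.
check_dbs() {
    missing=0
    for p in "$@"; do
        if ! db_present "$p"; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with two locations from the sample job script.
# Away from CARC systems this prints "missing: ..." lines.
check_dbs \
    /project/biodb/alphafold_data/uniref90/uniref90.fasta \
    /project/biodb/alphafold_data/uniref30/UniRef30_2021_03 || true
```

If a path is reported as missing on a CARC login node, submitting a help ticket is likely a better next step than editing the path by guesswork.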
Get started using the following instructions:
- Log in:
ssh ttrojan@discovery.usc.edu
- Prepare the temporary directory and set up conda (do this only once; replace ttrojan with your USC NetID):
module purge
module load conda
mamba init bash
mkdir /scratch1/ttrojan/tmp
- Log out and log in again.
- Prepare the job script (replace ttrojan with your USC NetID and ttrojan_123 with your account ID).
Sample job script:
#!/bin/bash
#SBATCH --account=ttrojan_123
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:p100:1
#SBATCH --mem=32GB
#SBATCH --time=24:00:00
module purge
eval "$(conda shell.bash hook)"
conda activate /spack/conda/alphafold/
export TMPDIR=/scratch1/ttrojan/tmp
python /spack/conda/alphafold/alphafold/run_alphafold.py \
--fasta_paths=/path/to/the/input/file/input.fa \
--model_preset=multimer \
--data_dir=/project/biodb/alphafold_data \
--bfd_database_path=/project/biodb/alphafold_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniprot_database_path=/project/biodb/alphafold_data/uniprot/uniprot.fasta \
--uniref90_database_path=/project/biodb/alphafold_data/uniref90/uniref90.fasta \
--uniref30_database_path=/project/biodb/alphafold_data/uniref30/UniRef30_2021_03 \
--pdb_seqres_database_path=/project/biodb/alphafold_data/pdb_seqres/pdb_seqres.txt \
--mgnify_database_path=/project/biodb/alphafold_data/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=/project/biodb/alphafold_data/pdb_mmcif/mmcif_files/ \
--obsolete_pdbs_path=/project/biodb/alphafold_data/pdb_mmcif/obsolete.dat \
--max_template_date=2022-12-12 \
--output_dir=/scratch1/ttrojan/output \
--use_gpu_relax=TRUE
- Submit the job script to a GPU partition.
sbatch my-alphafold-job.sh
AlphaFold2 utilizes only one GPU. We recommend using a single P100 GPU to complete your jobs.
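Because a job can wait in the queue before it starts, a quick check of the input on the login node may catch simple mistakes early. The sketch below is one possible pre-flight step under stated assumptions: `preflight` is a hypothetical helper (not part of AlphaFold2 or CARC tooling), it only verifies that the FASTA file is readable, counts its sequences, and creates the output directory. The warning about fewer than two sequences reflects the common use of the multimer preset for multi-chain input and is a hint, not a hard rule.

```shell
# Hypothetical pre-flight helper.
# Usage: preflight INPUT_FASTA OUTPUT_DIR
# Prints the number of sequences found; returns non-zero on failure.
preflight() {
    fasta=$1
    outdir=$2
    if [ ! -r "$fasta" ]; then
        echo "cannot read input: $fasta" >&2
        return 1
    fi
    # Count FASTA records (header lines start with '>').
    nseq=$(grep -c '^>' "$fasta" || true)
    if [ "$nseq" -lt 2 ]; then
        echo "warning: $nseq sequence(s) found; the multimer preset is typically used with multi-sequence input" >&2
    fi
    # Create the output directory if it does not exist yet.
    mkdir -p "$outdir" || return 1
    echo "$nseq"
}

# Example, using the placeholder paths from the sample job script:
#   preflight /path/to/the/input/file/input.fa /scratch1/ttrojan/output \
#       && sbatch my-alphafold-job.sh
```

Chaining the check with `&&` means the job is submitted only when the check succeeds.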
Additional resources
If you have questions or need help, please submit a help ticket and we will assist you.