Life Sciences Computing

Last updated January 09, 2023

The Center for Advanced Research Computing (CARC) provides access to life sciences resources, such as reference genomes and protein and nucleotide sequence databases.

In 2021, CARC completed a collaboration between Amgen, Dornsife, and ITS to establish access to two cryogenic electron microscopy (cryo-EM) instruments for USC researchers. The microscopes utilize a comprehensive data management and computational processing platform developed by ITS and CARC.

0.0.1 Cryo-EM microscopes

USC has two cryo-EM microscopes available for use: Krios with the K3 direct detection camera and Glacios with the Falcon 4 direct detection camera, both manufactured by Thermo Fisher. Both instruments are housed in the Michelson Center for Convergent Bioscience building in the Core Center for Excellence in Nano Imaging (CNI) at USC’s University Park campus.

The comprehensive computational environment for cryo-EM data processing includes:

  • Automation of data extraction and transfer to CARC storage.
  • Automation of data delivery to Amgen’s cloud storage.
  • GPU cluster system deployment.
  • Development of a cryo-EM data pre-processing platform using the Pegasus Workflow Management System.
  • Development of a cryo-EM user portal with an integrated Slack user notification feature.

The high degree of automation in data processing, extraction, and transfer is a significant benefit for researchers using the microscopes, and it is not typically available in cryo-EM workflows. In particular, the Slack integration lets researchers view pre-processed images from the microscopes in near real time, which is valuable for monitoring.

Detailed information on creating a cryo-EM project and using the microscopes can be found in the Cryo-EM user guide.

0.0.2 Other resources

Life sciences resources are available on both the Discovery and Endeavour clusters. Users can access them either by copying the desired path from the Bio Resources user guide or by going to /project/biodb/resourcename and choosing the desired resource. Listed below are some of the popular resources we offer. If you need a specific resource that is not currently listed, please submit a help ticket and we will try to make it available to you.
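
As an illustration of the second access route, the following is a minimal Python sketch for browsing the resource directory from a cluster login or compute node. It assumes only the /project/biodb base path given above. The helper names list_resources and resource_path are hypothetical and not part of any CARC tooling, and the sketch makes no assumption about which resource names actually exist under that directory.

```python
from pathlib import Path

# Base directory named in the text above. The subdirectory names are not
# assumed here; they are discovered by listing the directory.
BIODB_ROOT = Path("/project/biodb")


def list_resources(root=BIODB_ROOT):
    """Return the sorted names of the directories directly under root."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not available on this system")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def resource_path(name, root=BIODB_ROOT):
    """Return the path of one resource directory, or raise if it is missing."""
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(f"no resource named {name!r} under {root}")
    return path
```

Calling list_resources() on a cluster node would return the available resource directory names, and resource_path(name) would return the full path to pass to an analysis tool. On a machine where /project/biodb is not mounted, both helpers raise FileNotFoundError.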

0.0.2.1 Genomes

A set of ready-to-use reference sequences and annotations for commonly analyzed organisms, sourced from iGenomes.

0.0.2.2 GenBank

The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.

0.0.2.3 Genome Taxonomy Database (GTDB)

The Genome Taxonomy Database (GTDB) is an initiative to establish a standardized microbial taxonomy based on genome phylogeny.

0.0.2.4 Pfam database

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

0.0.2.5 TIGRFAMs

TIGRFAMs is a resource consisting of curated multiple sequence alignments, hidden Markov models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.

0.0.2.6 UniProt

The Universal Protein Resource (UniProt) is a collaboration between the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and the Protein Information Resource (PIR).