Data Science & Analysis Resources

Last updated January 12, 2025

The Center for Advanced Research Computing (CARC) offers data science support for USC researchers through extensive guides, user consultations, and workshops.

0.0.1 Data science at CARC

There are several ways to run data science scripts on CARC systems. Researchers can submit job scripts that run Anaconda in batch mode, use an Apptainer or Docker container, or run an interactive JupyterLab session on CARC OnDemand.

Our Data Science user guides provide details for each method, as well as information on how to use Apptainer. For questions and more individualized support, submit a help ticket and one of our Research Facilitators will assist you.

Check our workshops to see the list of classes available and our current schedule.

CARC supports several popular packages for data science and data analysis. More details on each of the packages listed here can be found on our Popular Data Science Packages page. If you need a specific resource that is not currently listed, please submit a help ticket and we will try to make it available to you.

0.0.2.0.1 TensorFlow

TensorFlow is an open-source deep learning framework developed by Google that provides a comprehensive ecosystem of tools, libraries, and resources for building and deploying machine learning models.

0.0.2.0.2 PyTorch

PyTorch is a popular open-source deep learning framework widely used for research and production applications.

0.0.2.0.3 Keras

Keras is a high-level deep learning API that acts as a front-end for various deep learning frameworks, including TensorFlow and PyTorch.

0.0.2.0.4 Scikit-learn

Scikit-learn is a versatile and widely used machine learning library in Python that provides a rich set of algorithms and tools for tasks such as classification, regression, clustering, and dimensionality reduction.
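As a rough illustration of the classification task mentioned above, the following minimal sketch fits a logistic regression model to a tiny, made-up dataset. The data values and variable names are assumptions chosen for illustration, not part of any CARC guide.

```python
from sklearn.linear_model import LogisticRegression

# Made-up one-feature training data: small values are class 0, large values class 1.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Fit a logistic regression classifier with default settings.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict the class of two unseen points, one from each side of the gap.
predictions = model.predict([[1.5], [8.5]])
print(predictions)
```

Most scikit-learn estimators follow the same `fit` and `predict` pattern, so swapping in a different model usually changes only the import and the constructor call.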

0.0.2.0.5 Pandas

Pandas is a powerful data manipulation and analysis library for Python that provides easy-to-use data structures and data analysis tools.
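A short sketch of what those data structures look like in practice: the example below builds a small DataFrame and computes a per-group mean. The column names and values are hypothetical and serve only to illustrate the idea.

```python
import pandas as pd

# Hypothetical measurements; the column names are illustrative only.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "value": [1.0, 3.0, 10.0, 20.0],
    }
)

# Mean of `value` within each group, returned as a Series indexed by group.
group_means = df.groupby("group")["value"].mean()
print(group_means)
```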

0.0.2.0.6 Matplotlib

Matplotlib is a widely used data visualization library in Python that provides a flexible and comprehensive set of tools for creating static, animated, and interactive visualizations.

0.0.2.0.7 Seaborn

Seaborn is a statistical data visualization library built on top of Matplotlib that provides a high-level interface for creating attractive and informative statistical graphics.

0.0.2.0.8 SciPy

SciPy is a powerful library for scientific and technical computing in Python that provides a collection of modules for mathematical algorithms, optimization, integration, signal processing, statistics, and more.
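To give a feel for two of the modules named above, here is a minimal sketch that integrates a simple function numerically and minimizes another. The functions and integration limits are arbitrary choices made for illustration.

```python
from scipy import integrate, optimize

# Numerically integrate x**2 from 0 to 1 (the exact value is 1/3).
# quad returns the estimate and an estimate of the absolute error.
area, abs_error = integrate.quad(lambda x: x**2, 0.0, 1.0)

# Minimize (x - 2)**2 over a scalar x (the minimum is at x = 2).
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(f"integral estimate: {area:.6f}")
print(f"minimizer estimate: {result.x:.6f}")
```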