Data and Tools for Cancer Genomics

  • The Cancer Genome Atlas (TCGA): TCGA is a comprehensive collection of multi-dimensional cancer genomics data covering multiple cancer types.

  • International Cancer Genome Consortium (ICGC): ICGC provides high-quality genomic and clinical data from various cancer projects worldwide.

  • Gene Expression Omnibus (GEO): GEO is a public repository hosted by the National Center for Biotechnology Information (NCBI) containing a vast collection of gene expression data, including cancer datasets.

  • European Genome-phenome Archive (EGA): EGA is a repository for secure storage and sharing of human genetic and phenotypic data, including cancer datasets.

  • National Cancer Institute (NCI) Genomic Data Commons (GDC): GDC is the NCI data portal for a wide range of cancer genomics datasets, including TCGA. Much of the data is open access, while individual-level data such as raw sequence reads requires authorized access.

  • OncoLnc: OncoLnc is a web resource that provides survival analysis and expression correlation for genes of interest across multiple cancer datasets.

  • UCSC Cancer Genomics Browser: The UCSC Cancer Genomics Browser offers a comprehensive collection of cancer genomics data integrated with genomic annotations. It has since been succeeded by UCSC Xena.

  • GREIN (GEO RNA-seq Experiments Interactive Navigator): GREIN is an interactive web platform with user-friendly options for exploring and analyzing GEO RNA-seq data. It is powered by a back-end computational pipeline, GREP2 (GEO RNA-seq experiments processing pipeline), which retrieves datasets from GEO and reprocesses them uniformly. More than 6,000 datasets have already been processed this way.

  • GEPIA2: GEPIA2 is a web-based tool for analyzing gene expression data in cancer. It stands for Gene Expression Profiling Interactive Analysis 2 and is an updated version of the original GEPIA tool. GEPIA2 allows users to explore gene expression patterns, perform survival analyses, and visualize gene expression data across various cancer types.

  • UALCAN: UALCAN is a web-based platform that provides interactive and comprehensive analysis of cancer transcriptome data. It enables users to explore gene expression patterns, perform survival analyses, and compare gene expression between tumor and normal samples across different cancer types. UALCAN utilizes data from The Cancer Genome Atlas (TCGA) to facilitate cancer research and provide insights into tumor biology.

  • cBioPortal for Cancer Genomics: cBioPortal hosts a large collection of cancer genomics datasets, allowing users to explore and visualize the data.

  • ONCOMINE: ONCOMINE is a powerful web-based platform for the analysis and visualization of cancer transcriptomic data. It provides researchers with access to a vast collection of publicly available gene expression datasets derived from cancer studies. ONCOMINE allows users to explore gene expression patterns, identify potential biomarkers, and compare gene expression between different cancer types or subtypes.

Guideline for Bioconductor Users

Bioconductor is an open-source and open-development software project that provides a comprehensive collection of bioinformatics and computational biology tools in the R programming language. It focuses on the analysis and comprehension of high-throughput genomic data, including DNA sequencing, RNA sequencing, microarray analysis, proteomics, and more.

Required software

Prework

Before attending any workshop, please have the following installed and configured on your machine:

  • A recent version of R
  • A recent version of RStudio
  • The most recent release of Bioconductor and the other packages used in the courses

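Install the latest release of R, then get Bioconductor by starting R and entering the following commands. The version is pinned to Bioconductor 3.16, which pairs with R 4.2; change the version string to match your R release.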

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.16")

  • Ensure you can knit R Markdown documents:

    • Open RStudio and create a new R Markdown document.
    • Save the document and check that you are able to knit it.
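
The prework checklist can be verified from the R console. The snippet below is a minimal sketch, not an official check: it assumes R 4.2.0 as the minimum (the release paired with Bioconductor 3.16) and assumes that knitting needs the rmarkdown and knitr packages. The variable names are illustrative.

```r
# Assumed minimum R version for Bioconductor 3.16; adjust for other releases.
required_r <- "4.2.0"
r_ok <- getRversion() >= required_r
message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " is older than",
        " the assumed minimum ", required_r)

# Packages assumed to be needed for installing Bioconductor and knitting documents.
pkgs <- c("BiocManager", "rmarkdown", "knitr")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# With BiocManager installed and a network connection, BiocManager::version()
# reports the Bioconductor release in use.
```

Any package reported as FALSE can be installed with install.packages() before the workshop.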

Install Bioconductor Packages

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

Install specific packages, e.g., “GenomicFeatures” and “AnnotationDbi”, with:

BiocManager::install(c("GenomicFeatures", "AnnotationDbi"))

The install() function (in the BiocManager package) has arguments that change its default behavior; type ?BiocManager::install for further help.
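
As an illustration of those arguments, the sketch below installs only the packages that are not yet present and passes update = FALSE and ask = FALSE so that install() neither updates other packages nor prompts. The helper install_missing() and its installer parameter are hypothetical names introduced here for illustration; they are not part of BiocManager.

```r
# Hypothetical helper: install only the packages that are not already available.
# `installer` defaults to BiocManager::install and is only evaluated when
# something actually needs installing.
install_missing <- function(pkgs, installer = BiocManager::install) {
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!present]
  if (length(to_install) > 0) {
    # update = FALSE: leave other installed packages untouched
    # ask = FALSE: do not prompt interactively
    installer(to_install, update = FALSE, ask = FALSE)
  }
  invisible(to_install)
}

# Example call (commented out so nothing is installed by accident):
# install_missing(c("GenomicFeatures", "AnnotationDbi"))
```

The function returns the names of the packages it tried to install, which makes it easy to log what changed on a machine.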

R Packages for RNA-seq and Single-cell RNA-seq Analysis

  • DESeq2: DESeq2 is a widely used package for differential gene expression analysis in RNA-seq data.
  • edgeR: edgeR is another popular package for differential gene expression analysis in RNA-seq data.
  • limma: limma is a package commonly used for the analysis of microarray and RNA-seq data, particularly for differential expression analysis.
  • Ballgown: Ballgown is a package for differential expression analysis and visualization of transcriptome assembly data.
  • DEXSeq: DEXSeq is specifically designed for the detection of differential exon usage in RNA-seq data.
  • NOISeq: NOISeq is a package for non-parametric analysis of differential expression in RNA-seq data.
  • clusterProfiler: clusterProfiler is a package for functional enrichment analysis of gene clusters derived from RNA-seq data.
  • GenomicFeatures: GenomicFeatures provides tools for working with genomic features, such as gene models, and is useful for annotating RNA-seq data.
  • Seurat: Seurat is a package for single-cell RNA-seq data analysis, allowing exploration and visualization of cellular heterogeneity.
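
To give a feel for how these packages are used, here is a minimal sketch of a differential expression run with DESeq2. The count matrix is simulated with base R purely for illustration (200 genes, 3 control and 3 treated samples, no true difference between groups), and the object names are assumptions. The DESeq2 step only runs if the package is installed.

```r
set.seed(1)

# Simulated counts: each gene has its own mean, shared across all six samples.
n_genes <- 200
gene_means <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(
  rnbinom(n_genes * 6, mu = rep(gene_means, 6), size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)

# Sample table: row names must match the column names of the count matrix.
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  row.names = colnames(counts)
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ condition)
  dds <- DESeq2::DESeq(dds)      # size factors, dispersions, model fit
  res <- DESeq2::results(dds)    # treated vs control by default factor order
  print(head(res[order(res$padj), ]))
}
```

With real data, the simulated matrix would be replaced by a gene-by-sample table of raw (unnormalized) counts and the sample table by the experiment's metadata.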

Blogs for R Programming, Statistics, and Data Analysis

Videos

  • July 9, 2023: I’m thrilled to announce that the highly anticipated videos from our workshop on “R for Research: Fundamentals of R - Part 1” are now available for viewing. Whether you missed the live event or want to revisit the session, these videos are your gateway to mastering the basics of the R programming language for data analysis and research. Join us as we explore the foundations of R and learn essential skills to enhance your data analysis capabilities. Don’t wait any longer; dive into the videos today and take your research to new heights! Check it out:

  • April 12, 2023: Watch this informative 2-hour workshop on how NASA Earth Observing Data can help improve public health. Discover how we can use these data to monitor our environment and identify potential health risks. Learn about the different ways NASA Earth Observing Data can benefit our communities and keep us safe. Check it out: