Genomic Data Commons: Expanded access to large-scale cancer genomic data

The University of Chicago in Illinois, US, is collaborating with the US National Cancer Institute (NCI) to establish the nation's most comprehensive computational facility for storing cancer genomic data generated through NCI-funded research programs.

The NCI Genomic Data Commons (GDC) will expand access for scientists, speeding up research and leading to faster discoveries for patients. Providing an interactive system for researchers, the GDC will make the data easier to use. It will also provide resources to facilitate the identification of subtypes of cancer as well as potential therapeutic targets.

"The Genomic Data Commons has the potential to transform the study of cancer at all scales," says Robert Grossman, director of the GDC project and professor in the department of medicine at the University of Chicago. "It supplies the data so that any researcher can test their ideas, from comprehensive big data studies to genetic comparisons of individual tumors, to identify the best potential therapies for a single patient."

NCI has funded a number of large research projects that have collected genomic data on tumor types from more than 10,000 patients. However, the data from these studies is scattered across different locations and stored in different formats, making it challenging for researchers to perform analyses. The situation will become more problematic as genome-sequencing technology continues to evolve and datasets become larger and more complex.

The Genomic Data Commons is a first-of-its-kind facility - a comprehensive system to store and harmonize data from NCI-funded research programs in a single repository. Video courtesy the University of Chicago in Illinois, US.

According to an Institute of Medicine report, there is an urgent need for a system to store, harmonize, and analyze existing cancer genomics data - roughly 20 petabytes of information, which is 10 times as much as all of the publications currently housed in US academic research libraries.

Data democracy

To make raw and processed genomic data broadly accessible, the GDC will provide an expandable, modern informatics framework. Through a data storage and analysis approach similar to that used by companies such as Google and Facebook, it will harmonize and centralize existing NCI datasets.

Additionally, the GDC will eliminate a major chokepoint, streamlining researchers' access to data regardless of their institution's size or budget. This will democratize access to the material and enable collaborative efforts between scientists that were previously unfeasible.

"With the GDC, the pace of discovery shifts from slow and sequential to fast and parallel," says Conrad Gilliam, dean at the University of Chicago biological sciences division. "Discovery processes that today would require many years, millions of dollars, and the coordination of multiple research teams could literally be performed in days, or even hours."

Establishing the GDC represents a key step toward the development of precision medicine - targeted treatments tailored to individual patients. Once fully developed, it will provide an interactive system for researchers and clinicians to upload their cancer genomics data and use it to identify the molecular subtype of cancer and potential therapeutic targets. Genetic data will be linked to extensive clinical information from patients and their responses to treatment.

"The availability of high-quality genomic data and associated clinical annotations is extremely important because this information can be combined and mined repeatedly to make new discoveries," says Louis Staudt, director of NCI's Center for Cancer Genomics.

Foundation for the cloud

The GDC also creates a foundation for future cloud-based technologies, allowing researchers to analyze large-scale datasets and perform experiments remotely. The open-source software being developed could become a model for data-intensive research efforts for other diseases such as Alzheimer's and diabetes, which desperately need similar large-scale, data-driven approaches to develop cures.

"The GDC is absolutely needed," says Jean Zenklusen, director of The Cancer Genome Atlas (TCGA) program office at NCI. "The current scale of the data is such that mostly big institutes with large bioinformatics cores are the only ones who have been able to take advantage of the huge amount of genetic data that is being amassed daily. NCI's goal for the GDC is to be a resource for all investigators to generate hypotheses and make new discoveries from the data."

The GDC builds upon the Bionimbus Protected Data Cloud, a pilot cloud-based system developed by Grossman that was the first to be approved by the National Institutes of Health to hold cancer genomic data from projects such as TCGA.

Copyright © 2021 Science Node ™