
Piecing together the genomic puzzle with NCGAS

Speed read
  • The genome holds the answer to many biological questions.
  • The National Center for Genome Analysis Support (NCGAS) provides computational tools to life scientists.
  • Life scientists gain new insights through genomic analysis.

Looking to build a body? Simple: All the information you need is right there in the genome. Looking to heal a body? Again, all the information you need is right there in the genome.

Of course it’s not that simple, but scientists are getting closer every day. We now know that many of our diseases stem from something broken within the genome. But looking inside the genome requires specialized tools, and specially trained experts to use them.

A supercomputer is one of the most important tools scientists require to understand the genome. But many life scientists, outstanding in their fields though they may be, often remain novices when transplanted to a computational environment.

Building bridges. The Bridges supercomputer is one of the supercomputing resources the NSF offers to life scientists. It is specially designed for large-memory tasks such as genomic analysis. Courtesy Pittsburgh Supercomputing Center.

Because so few life scientists funded by the National Science Foundation (NSF) were using the national cyberinfrastructure, the NSF helped create liaison centers such as the National Center for Genome Analysis Support (NCGAS).

“We felt there was a need to serve NSF researchers, researchers who were doing genomics and didn’t have resources to use,” recalls Thomas Doak, principal investigator for NCGAS. “But it’s not just the crusty old genomics-y guys. It’s ecologists, people who work on North Atlantic copepods or endangered fish in the Colorado Plateau – these folks can learn a lot about their organism by doing some genomics.”

Led by the Indiana University (IU) Pervasive Technology Institute, NCGAS is a collaboration with the Texas Advanced Computing Center, the San Diego Supercomputer Center, and the Pittsburgh Supercomputing Center (PSC).


The NSF has recently extended its investment in NCGAS, asking the center to continue helping scientists make use of the vast quantities of genomic data now available.

Vast indeed: genomic datasets can easily reach the multi-terabyte range, which is one reason researchers look for systems with a lot of memory. Examples include Mason, the cluster NCGAS operates at Indiana University with 500 GB of memory per node, and the 3 TB large-memory nodes on the Bridges supercomputer at PSC.

Solving the puzzle

Genomic analysis is a labor-intensive process. It begins with technicians isolating DNA in labs around the world. The DNA is then fragmented and sent to sequencing centers, where the fragments are read, converted into digital sequence data, and uploaded to a server.

“So you have a massive jigsaw puzzle which someone’s taken apart and you have to put back together,” says Doak. “If you have enough little pieces, you can certainly do that—but you can’t do it by yourself. You have to come up with very sophisticated computer programs that will do that work and figure out which computational tool is best for your problem. That’s something we try to help with.”
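To make Doak's jigsaw analogy concrete, here is a toy sketch of greedy overlap assembly in Python. It is only an illustration, not the method NCGAS or any particular assembler uses: real assemblers must cope with sequencing errors, repeats, and billions of reads. The function names, the minimum overlap of 3 bases, and the assumption that no read is fully contained in another are ours. The idea is simply to keep joining the two pieces whose ends overlap the most until nothing more fits.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` bases count; otherwise returns 0.
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that everything from that point to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the resulting contigs (one string if everything joined up).
    Assumes no read is contained inside another read.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining pieces fit together
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping "puzzle pieces" cut from the made-up sequence GATTACAGGCTT.
print(greedy_assemble(["AGGCTT", "GATTAC", "TACAGG"]))
```

Run on the three shuffled fragments, the sketch joins GATTAC and TACAGG on their shared TAC, then attaches AGGCTT on the shared AGG, recovering the original sequence. Greedy merging is easy to follow but can go wrong on repetitive DNA, which is part of why choosing the right assembly tool for a given genome matters.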

National treasure. Nearly 500 institutions in more than 40 states take advantage of NCGAS services. Courtesy Thomas Doak.

Researchers such as James Denvir and Swanthana Rekulapally of Marshall University in West Virginia saw their research accelerate with NCGAS assistance. Using the 3-terabyte large-memory nodes on Bridges, they reassembled the 1 billion DNA bases of the Narcissus flycatcher genome in 6.6 hours, nearly five times faster than was possible on the systems they had used before.

NCGAS resources also played a key part in research on the North Atlantic copepod, a major food source for whales and a pillar of the global food chain.

Researchers from Cornell partnered with NCGAS at PSC to help build the Non-human Primate Reference Transcriptome Resource, a collection of transcriptome data from more than 13 species.

NCGAS is proving its value to the community, as the NSF's reinvestment indicates. Its services are in use across the US and beyond, and are free to any NSF-funded researcher.

“Whether for modeling systems or to research ecological or conservation issues, we cast a pretty broad net and lots of different sorts of scientific questions fall our way,” says Doak. “We’re here to help biologists solve their problem, whatever their problem might be. People who never thought about looking at a DNA sequence are suddenly realizing it’s really helpful in whatever they’re studying.”


Copyright © 2021 Science Node ™
