- The genome holds the solution to many bio-physical questions.
- The National Center for Genome Analysis Support (NCGAS) provides computational tools to life scientists.
- Life scientists gain new insights through genomic analysis.
Looking to build a body? Simple: All the information you need is right there in the genome. Looking to heal a body? Again, all the information you need is right there in the genome.
Of course it’s not that simple, but scientists are getting closer every day. We now know that many of our diseases stem from something broken within the genome. But looking inside the genome takes particular tools and particularly trained experts to handle them.
A supercomputer is one of the most important tools scientists require to understand the genome. But many life scientists, outstanding in their fields though they may be, remain novices when transplanted to a computational environment.
Because so few life scientists funded by the National Science Foundation (NSF) were using the national cyberinfrastructure, the NSF helped create liaison centers like the National Center for Genome Analysis Support (NCGAS).
“We felt there was a need to serve NSF researchers, researchers who were doing genomics and didn’t have resources to use,” recalls Thomas Doak, principal investigator for NCGAS. “But it’s not just the crusty old genomics-y guys. It’s ecologists, people who work on North Atlantic copepods or endangered fish in the Colorado Plateau – these folks can learn a lot about their organism by doing some genomics.”
The NSF has recently extended its investment in NCGAS, asking the center to continue helping scientists make use of the vast quantities of genomic information now available.
Vast indeed: genomic datasets can easily soar into the multi-terabyte range, which is one reason researchers look for resources with a lot of memory. Examples include Mason, the 500GB-per-node cluster NCGAS operates at Indiana University, and the 3TB nodes on the Bridges supercomputer at the Pittsburgh Supercomputing Center (PSC).
Solving the puzzle
Genomic analysis is a labor-intensive process, beginning with technicians isolating DNA in labs all across the world. From these labs, the DNA is fragmented and sent on to sequencing centers where it is digitized and uploaded onto a server.
“So you have a massive jigsaw puzzle which someone’s taken apart and you have to put back together,” says Doak. “If you have enough little pieces, you can certainly do that—but you can’t do it by yourself. You have to come up with very sophisticated computer programs that will do that work and figure out which computational tool is best for your problem. That’s something we try to help with.”
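Doak's jigsaw analogy can be made concrete with a toy example. The sketch below is illustrative only and is not the software NCGAS or its users run: it greedily merges the two fragments that share the longest suffix-to-prefix overlap until no overlaps remain. The function names, the `min_len` threshold, and the sample fragments are all assumptions made for this illustration. Production assemblers rely on graph-based methods and must also cope with sequencing errors, repeats, and reverse-complement strands, none of which this sketch attempts.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        # Compare every ordered pair and remember the best overlap found.
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return pieces  # nothing left to merge
        # Join the two pieces, writing the shared region only once.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)


# Hypothetical fragments of a made-up 15-base sequence, given out of order.
print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]))
# → ['ATGGCGTACGTTAGC']
```

Even this toy version compares every fragment against every other fragment on each pass, which hints at why real datasets with millions of pieces call for specialized programs and machines with very large memory.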
Researchers such as James Denvir and Swanthana Rekulapally of Marshall University in West Virginia saw their research accelerate with NCGAS assistance. They used Bridges’ 3-terabyte large-memory nodes to reassemble the 1 billion DNA bases of the Narcissus flycatcher genome in 6.6 hours. This is nearly five times faster than previous systems allowed.
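For a rough sense of scale, the reported figures can be turned into back-of-envelope numbers. The sketch below uses only the values quoted in the article. Treating "nearly five times" as exactly five is an assumption, so the implied earlier runtime is an upper bound, and the per-second rate counts assembled bases only, not the far larger volume of raw reads an assembler actually processes.

```python
BASES = 1_000_000_000    # "1 billion DNA bases", as reported
HOURS_ON_BRIDGES = 6.6   # reported assembly time on Bridges
SPEEDUP = 5              # assumption: "nearly five times" rounded up to 5


def bases_per_second(bases: int, hours: float) -> float:
    """Average assembled bases per second over the whole run."""
    return bases / (hours * 3600)


def implied_previous_hours(hours: float, speedup: float) -> float:
    """Runtime on the earlier system implied by the stated speedup."""
    return hours * speedup


print(round(bases_per_second(BASES, HOURS_ON_BRIDGES)))             # → 42088
print(round(implied_previous_hours(HOURS_ON_BRIDGES, SPEEDUP), 1))  # → 33.0
```

In other words, a job that would have taken somewhat under 33 hours on earlier systems finished in well under a working day.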
NCGAS resources also played a key part in research into the North Atlantic copepod, the major food source for whales, and a pillar of the global food chain.
Researchers from Cornell partnered with NCGAS at PSC to help construct the Non-human Primate Reference Transcriptome Resource, an assemblage of genomes from more than 13 different species.
NCGAS is proving its value to the community, as the NSF reinvestment indicates. Its services are in use across the US and beyond, and are free to any NSF-funded researcher.
“Whether for modeling systems or to research ecological or conservation issues, we cast a pretty broad net and lots of different sorts of scientific questions fall our way,” says Doak. “We’re here to help biologists solve their problem, whatever their problem might be. People who never thought about looking at a DNA sequence are suddenly realizing it’s really helpful in whatever they’re studying.”