- Sequencing the genome of Arabica coffee aids research leading to more robust plants
- Genome browsers allow plant scientists to integrate and visualize agricultural knowledge with highly detailed genomic data
- National Center for Genome Analysis Support (NCGAS) enables state-of-the-art genomic research collaborations among institutions
Imagine your morning cup of coffee: a rich, steaming elixir of energy, happiness, and the will to face the rest of your day.
Today the beans in your cup are most likely Coffea arabica. Compared with C. canephora, the ‘robusta’ species most commonly used in powdered products, C. arabica is generally considered the more flavorful coffee. But the plants are prone to devastating diseases, including coffee rust.
A form of fungus, coffee rust can attack suddenly and ruin a farm’s entire crop. And climate change only makes it worse. As night-time temperatures rise in coffee-growing countries, even high-altitude growing areas are vulnerable to crippling die-offs of this perennial plant that should be productive over spans of 10 to 20 years.
A world without coffee -- it’s unthinkable. The millions of workers employed in the industry would lose their livelihoods, and international productivity would take a nosedive.
Which is why scientists like Keithanne Mockaitis are working to sequence and annotate the entire genome of C. arabica and share the findings with other researchers.
“Our hope is that this information will be used by plant breeders to quickly and efficiently select for traits that will help C. arabica survive,” says Mockaitis.
Cracking the coffee code
Thanks to ambitious ventures like the Human Genome Project (1990-2003), scientists’ ability to work with strings of DNA has come a long way in a few short years.
“Technology advances in both data generation and in the software for assembly have just exploded in the last ten years, and now we’re able to tackle the sequencing and assembly of more complex things,” says Mockaitis.
Arabica coffee is one of those complexities. Unlike the simpler robusta plant, a version of which was sequenced in 2014, C. arabica is tetraploid, meaning it has four copies of each of its chromosomes.
That adds up to a lot of data that must be extracted and then put together like the pieces of a puzzle. Since there’s no existing technology that can read a chromosome from A to Z, researchers must instead chop the DNA into pieces and then sequence the pieces.
Mockaitis’s collaborators write and continually improve software for assembling the puzzles of large, complex genomes. In this case, Aleksey Zimin of Johns Hopkins University is the DNA puzzle master of the coffee genome.
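To make the puzzle analogy concrete, here is a minimal Python sketch of the basic idea: overlapping fragments are merged wherever the end of one matches the start of another. It is a toy illustration, not the software used on the coffee project. The function names, the example fragments, and the three-letter minimum overlap are all assumptions made for this example, and real assemblers must also cope with sequencing errors, repeated regions, and, in arabica's case, four copies of each chromosome.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three made-up fragments cut from the same short stretch of DNA.
fragments = ["ATGGCGTAC", "CGTACGTTA", "CGTTAGC"]
print(greedy_assemble(fragments))  # ['ATGGCGTACGTTAGC']
```

Comparing every fragment with every other one, as this sketch does, quickly becomes impractical for the vast number of fragments in a real plant genome, which is one reason the work depends on specialized software and high performance computing.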
Mockaitis’s group at Indiana University (IU) puts together the RNA sequences to identify the ‘active’ parts of the genome and marks up the final reference with as much detail as possible. This allows plant researchers to learn the location of genes and how much they differ among coffee types. In addition, Coffea species other than the tetraploid arabica are also being sequenced for comparison studies.
Throughout the project, experts from the National Center for Genome Analysis Support (NCGAS) install advanced genomics software on IU’s high performance computing systems and expand their training for tackling the challenges of genomic research.
Thanks to a $1.5 million grant from the National Science Foundation (NSF), NCGAS makes consulting expertise from its staff, computational time, storage of data and experimental results, and customization of open source software available at no cost to researchers with current NSF funding.
“The computing power, storage, file systems, and software support needed for projects like this are enormous,” says Mockaitis. “It's pretty rare that a biology department or a smaller university would be able to do any of this on their own—that’s who NCGAS was developed for.”
Mockaitis’s projects also use national resources provided by the Extreme Science and Engineering Discovery Environment (XSEDE).
“We have changed and developed our strategies for computing and storage over time, as new resources become available,” says Mockaitis. “The expertise of NCGAS has been absolutely essential to customizing and troubleshooting these resources to serve our projects.”
Out of the lab and onto the farm
When Mockaitis and her team have sequenced, assembled, and annotated a genome, they disseminate the information in a visualization known as a genome browser.
“This is the age of big data in biology, and that's exactly why places like NCGAS exist,” says Mockaitis.
The browser is an interactive website, hosted by NCGAS, offering access to genome sequence data integrated with a large collection of aligned annotations. Plant breeders who have a knowledge of classical genetics can use the genome browser to relate the new information to known DNA markers and map them together.
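As a rough illustration of what relating a known marker to the annotated genome involves, the Python sketch below looks up which annotated features cover a given position. Everything in it is hypothetical: the `Feature` record, the chromosome and gene names, and the coordinates are invented for the example and do not come from the coffee genome or from the browser itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    chrom: str   # chromosome or scaffold name
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str    # label shown for the annotation


def features_at(features: list[Feature], chrom: str, pos: int) -> list[Feature]:
    """Return every annotated feature that covers the given position."""
    return [f for f in features
            if f.chrom == chrom and f.start <= pos <= f.end]


# Invented annotations, for illustration only.
annotations = [
    Feature("chr1", 1_000, 2_000, "geneA"),
    Feature("chr1", 5_000, 7_500, "geneB"),
    Feature("chr2", 1_200, 1_800, "geneC"),
]

# A marker at position 1,500 on chr1 falls inside geneA.
hits = features_at(annotations, "chr1", 1_500)
print([f.name for f in hits])
```

A genome browser performs this kind of lookup visually and at scale, layering many annotation tracks over the same coordinates.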
Mockaitis works with collaborators at Cornell University and with Cenicafe, a coffee research center in Colombia, a major coffee-growing country. Scientists there use the browser to zoom in on parts of the genome and propose experiments to test different ideas. The collaboration overall provides a wealth of data that allows scientists to design new research approaches.
“Providing a high quality sequence reference doesn't really answer any one question by itself,” says Mockaitis. “It kicks off a whole new wave of research into coffee.”