Behold the geranium: mainstay of the home garden. These colorful blooms are natural mutants, evolving many times faster than their plant peers, according to Robert Jansen, professor of biology at The University of Texas at Austin. "The degree of change in this group is off the charts," he says.
Geraniums (the Geraniaceae family) are unusual for a few reasons. First, the organization of the chloroplast genome (and of the genes within it) is highly rearranged in comparison to other plants. Second, the rates of change for certain gene sequences, especially some functional groups of genes, are highly elevated in both the chloroplast and mitochondrial genomes.
Geraniums are one of only two plant groups known to have such mutable genomes, making them a model system for scientific study. "We use evolution for lots of purposes agriculturally," Jansen says. "We select for certain features in crop plants to have bigger ears of corn or bigger tomatoes. If you don't understand the genes that are involved and how they work, it's hit or miss with regard to whether you're doing the right thing."
Jansen teamed with Jeff Palmer at Indiana University and Jeff Mower at the University of Nebraska to sequence the genomes of dozens of species of geranium. With a grant from the National Science Foundation through the Plant Genome Research Program, the scholars began applying next-generation sequencing methods and computational analysis to better understand why geraniums have evolved to be so radically different from other plants.
Currently in year two of a five-year study, the researchers are gathering, assembling and analyzing sequence data with the assistance of the Ranger supercomputer at the Texas Advanced Computing Center (TACC). In the coming months, they will sequence genomes from dozens of geraniums as well as closely related rosids, whose evolutionary rates are normal. They will compare the genes involved in recombination and DNA repair in geraniums to those in their relatives to identify key differences that may be causing unchecked mutations.
The technologies that researchers use to sequence and analyze genetic data are only a few years old, and the scale of the information is massive. Before Jansen and collaborators could start interpreting the genomic data, they needed to determine the most efficient way to gather it. "We first went through the literature to see what everybody thought we should do and there was absolutely no consensus," says Jansen.
A comparative analysis determined that the Illumina HiSeq 2000 platform (a next-generation sequencer) in tandem with Trinity (a leading assembly tool) achieved the most accurate and efficient results on Ranger. The researchers also determined that roughly 40% of the sequence data was enough to reach a plateau of useful information for assembling a complete transcriptome. "We had no idea how much data we needed, and the more data you have to gather the more expensive it is," Jansen says.
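To make the idea of a plateau concrete, here is a minimal Python sketch of how such a point might be located. It assumes the reads were subsampled at several fractions and that some measure of assembly completeness was recorded for each subsample. The function name `plateau_fraction`, the 2% tolerance and the toy numbers are all hypothetical, invented for illustration; they are not the team's method or measurements from the study.

```python
def plateau_fraction(fractions, values, tol=0.02):
    """Return the smallest data fraction whose metric is within
    `tol` (relative) of the value obtained with the full dataset.

    `fractions` and `values` are parallel lists sorted by fraction;
    `values` holds any completeness metric that grows with more data.
    """
    if len(fractions) != len(values) or not fractions:
        raise ValueError("need two non-empty lists of equal length")
    target = (1 - tol) * values[-1]  # "close enough" to the full-data result
    for fraction, value in zip(fractions, values):
        if value >= target:
            return fraction
    return fractions[-1]


# Toy, made-up numbers: transcripts recovered at each subsampling level.
toy_fractions = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
toy_transcripts = [5200, 8100, 9850, 9920, 9970, 10000]
print(plateau_fraction(toy_fractions, toy_transcripts))
```

A stricter tolerance pushes the plateau toward larger fractions, so in practice the threshold is a cost trade-off rather than a fixed rule.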
Supercomputers like TACC's Ranger speed up sequence analysis by breaking the process down into small chunks and distributing the data to thousands of computer processors working together.
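The divide-and-distribute pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it counts k-mers (short fixed-length substrings of reads, a common early step in de novo assemblers), it uses a handful of threads on one machine as a stand-in for thousands of processors, and the function names are hypothetical. It does not describe how the team's pipeline on Ranger is actually organized.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, n_chunks):
    """Split a list of reads into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def count_kmers(chunk, k):
    """Count every length-k substring in one chunk of reads."""
    counts = Counter()
    for read in chunk:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def parallel_kmer_counts(reads, k, n_chunks=4):
    """Count k-mers chunk by chunk, then merge the partial results."""
    chunks = split_into_chunks(reads, n_chunks)
    total = Counter()
    # Threads stand in for processors here; a real cluster job would
    # hand each chunk to a separate process or compute node.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total += partial
    return total


reads = ["ACGT", "CGTA", "GTAC"]  # toy reads, not real data
print(parallel_kmer_counts(reads, k=2, n_chunks=2))
```

Because each chunk is counted independently and the partial counts are simply added, the result is the same however the reads are divided, which is what lets the work spread across many processors.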
"For each species that we're looking at, we have to assemble billions of short reads into a complete genome, or into complete transcriptomes. This takes lots of memory and space. The bottom line in our case - we could not do it without TACC."
Beyond the specific evolutionary history of the geranium, the researchers hope to discover basic facts about evolution. They speculate that the elevated rates of change in this group might have something to do with genes that are involved in DNA repair and recombination.
"Experimental evidence demonstrates that if you mutate the recombination genes, you can generate instability in the genome," Jansen says. "We're hoping to uncover evidence that this phenomenon is related to those classes of genes."