- Global collaboration examines diversification of plant species across 1-billion-year history
- More complete picture of evolutionary innovation disentangles relationships between species
- Massive scope of project demanded new computational tools and supercomputers to process genetic sequences
In the culmination of a nine-year research project, an international consortium of nearly 200 plant scientists has released gene sequences for more than 1,100 plant species. Scientists and a supercomputer at the University of Arizona played a key role in the effort.
The One Thousand Plant Transcriptomes Initiative, or 1KP, is a global collaboration to examine the diversification of plant species, genes, and genomes across the more than 1-billion-year history of green plants dating back to the ancestors of flowering plants and green algae.
The project is by far the biggest effort to decipher genomes across the kingdom of plants, according to Mike Barker, associate professor in the University of Arizona Department of Ecology and Evolutionary Biology.
Until now, plant scientists had generated reference genomes from a relative handful of plant representatives, including Arabidopsis, the "fruit fly" of plant genetics, rice, a fern, and a moss. But those were mere dots of light in a largely dark tree comprising about 400,000 species of known land plants alone, Barker said.
"You could say we have turned on the lights in the dark corridors of the plant tree of life where we haven't been able to look before," he said. "We went from a few light bulbs lighting up isolated rooms to 1,500."
"In the tree of life, everything is interrelated,” said Gane Ka-Shu Wong, professor in the University of Alberta Department of Biological Sciences. "And if we want to understand how the tree of life works, we need to examine the relationships between species. That’s where genetic sequencing comes in."
The consortium's recent paper reveals the timing of whole genome duplications and the origins, expansions and contractions of gene families contributing to fundamental genetic innovations enabling the evolution of green algae, mosses, ferns, conifer trees, flowering plants, and all other green plant lineages.
The history of how and when plants secured the ability to grow tall and to make seeds, flowers, and fruits provides a framework for understanding plant diversity around the planet, including annual crops and long-lived forest tree species.
"Our inferred relationships among living plant species inform us that over the billion years since an ancestral green algal species split into two separate evolutionary lineages – one including flowering plants, land plants, and related algal groups and the other comprising a diverse array of green algae – plant evolution has been punctuated with innovations and periods of rapid diversification” said James Leebens-Mack, professor of plant biology in the University of Georgia Franklin College of Arts and Sciences.
He added, "In order to link what we know about gene and genome evolution to a growing understanding of gene function in flowering plant, moss, and algal organisms, we needed to generate new data to better reflect gene diversity among all green plant lineages."
The study inspired a community effort to gather and sequence diverse plant lineages derived from terrestrial and aquatic habitats on a global scale. More than 100 taxonomic specialists contributed material from field and living collections around the world.
By sequencing and analyzing genes from a broad sampling of plant species, researchers are better able to reconstruct gene content in the ancestors of all crops and model plant species, and gain a more complete picture of the gene and genome duplications that enabled evolutionary innovations.
Nearly a decade ago, Wong organized private funding through the Somekh Family Foundation as well as support from the Government of Alberta and a sequencing commitment from BGI in Shenzhen, China, to launch 1KP.
Once the project was operational, additional resources came from other ongoing projects, including iPlant (now CyVerse), a national project providing computational infrastructure and data science training for life sciences research funded by the US National Science Foundation and housed at the University of Arizona.
The massive scope of the project demanded development and refinement of new computational tools for sequence assembly and phylogenetic analysis. The research team behind the decade-long project used supercomputers to process the genetic sequences from plant samples and map the data onto more than a half-million "family trees" showing the relationship among gene families.
Barker said the University of Arizona stood out within the collaboration in that several undergraduate students were at the core of the project, most notably Thomas Kidder, Sally Galuska and Chris Reardon, all of whom graduated with degrees in bioinformatics. They worked closely with Zheng Li, a doctoral student in Barker's lab, to analyze hundreds of thousands of gene trees.
"It would be difficult to do these analyses anywhere else," Barker said. "The high-performance computer facilities at the University of Arizona and CyVerse made analyses of this scale possible in the first place."
"1KP was one of the grand challenge initiatives that CyVerse was originally designed to support and enable," said Ramona Walls, CyVerse senior science informatician. "Researchers often take these large-scale, computationally intensive projects for granted now, forgetting how difficult it was to do something of this nature when 1KP started. Seeing 1KP come to fruition is almost like watching your kid graduate from college. It is especially gratifying to have all of the data publicly available in one place on the CyVerse Data Commons, because it means that even more science can continue to come out of the project."
The timing of 244 whole genome duplications across the green plant tree of life was one of the interrelated research focuses of the project. By comparing genetic sequences across gene trees, researchers can disentangle relationships between plant species.
"If you have a genome duplication that happened a long time ago in evolution, you expect to see traces of it in the descendants from that lineage," Barker said. "Every flowering plant that you encounter on a daily basis, outside or on your dinner plate, has traces of ancient genome duplications in its genome.
"Perhaps the biggest surprise of our analyses was the near absence of whole genome duplications in the algae,” he said. “Building on nearly 20 years of research on plant genomes, we found that the average flowering plant genome has nearly four rounds of ancestral genome duplication dating as far back as the common ancestor of all seed plants more than 300 million years ago. We also find multiple rounds of genome duplication in fern lineages, but there is little evidence of genome doubling in algal lineages.”
In addition to genome duplications, the expansion of key gene families has contributed to the evolution of multicellularity and complexity in green plants, including evolutionary innovations such as the origin of the seed and later the flower.
Having a large, cohesive dataset in one place provides plant scientists everywhere with a valuable roadmap to navigate complications in ongoing and future research projects and offers practical applications in seemingly unrelated fields, Barker explained.
Among the unexpected byproducts of the sequencing collaboration, which has made its data publicly available throughout the project, has been the discovery of photosensitive molecules that greatly expanded the toolbox of neuroscientists who study brain functions in other organisms.
Sequences, sequence alignments, and tree data are available through the CyVerse Data Commons.