Scientists have collected monthly samples of the deep ocean since 1988
New project corrals these genomic and environmental data sets
Labs around the world will conduct analyses to monitor climate change
Scientists have been making monthly observations of the physical, biological, and chemical properties of the ocean since 1988. Now, thanks to the Hurwitz Lab at the University of Arizona (UA), researchers around the world have greater access than ever before to the information collected at remote ocean sampling sites.
Led by Bonnie Hurwitz, assistant professor of biosystems engineering at UA, the Hurwitz Lab corrals big data sets into a more searchable form to help scientists study microorganisms – bacteria, fungi, algae, viruses, protozoa – and how they relate to each other, their hosts and the environment.
The lab is building a data infrastructure on top of CyVerse to integrate information from diverse data stores, in collaboration with the broader cyber community. The goal is to give people the ability to use data sets that span a range of storage servers, all in one place.
"One of the exciting things my lab is funded for is Planet Microbe, a three-year project through the National Science Foundation (NSF), to bring together genomic and environmental data sets coming from ocean research cruises," Hurwitz said. "Samples of water are taken using an instrument called a CTD that measures salinity, temperature, depth, and other features to create a scan of ocean conditions across the water column."
As the CTD descends into the ocean, bottles are triggered at different depths to collect water samples for a variety of experiments including sequencing the DNA/RNA of microbes. The moment each sample leaves the ship is often the last time these valuable and varied data appear together.
The first phase of the project focuses on the Hawaii Ocean Time-series and the Bermuda Atlantic Time-series Study. At both locations, samples are collected along an ocean transect at a range of depths, from the surface to the deep ocean.
The readings taken at each depth stream out to data banks around the world. Different labs conduct the analyses, but the Hurwitz Lab reunites all of the data sets, including data from these long-term ecological sites used to monitor climate and ocean change.
"Oceanographers have different tool kits. They are collecting data on ship to observe both the ocean environment and the genetics of microbes to understand the role they play in the ocean," Hurwitz said. "We are including these data in a very simple web-based platform where users can run their own analyses and data pipelines to use the data in new ways."
Although the project is still in its first year, the first data have already been released on the iMicrobe platform, which connects users with computational resources for analyzing and visualizing the data.
The platform's bioinformatics tools let researchers analyze the data in ways that may not have been possible when the data were collected, and compare these global ocean data sets with new data as they become available.
"We're plumbers, actually, creating the pipelines between the world's oceanographic data sets. We're trying to enable scientists to access data from the world's oceans," Hurwitz said.
A larger mission
In addition to their Planet Microbe work, Hurwitz and her team work with the three organizations that store and synchronize the world's "omics" data (genomics, proteomics): the European Bioinformatics Institute, the National Center for Biotechnology Information and the DNA Data Bank of Japan, along with other partners.
"We are working with the National Microbiome Collaborative, a national effort to bring together the world's data in the microbiome sciences, from human to ocean and everything in between," Hurwitz said.
"Having those data sets captured and searchable is great," Hurwitz said. "They are so big they can't be housed in any one place. The infrastructure allows you to search across these areas."
"If we want to start looking at things together in a holistic manner, we need to be able to remotely access data that are not on our servers. We are essentially indexing the world's data and becoming a search engine for microbiome sciences."
By reconnecting 'omics data with environmental data from oceanographic cruises, Hurwitz and her team are speeding discoveries about environmental changes affecting the marine microbes that produce roughly half of the oxygen we breathe.
These data could eventually be used to predict how the oceans will respond to change and to specific environmental conditions.
"Our researchers can not only use a $30 million supercomputer at XSEDE (Extreme Science and Engineering Discovery Environment) supported by the NSF for running analyses," Hurwitz said, "they also have access to modern big data architectures through a simple computer interface."
"We're trying to understand where all the data are and how we can sync them," Hurwitz said. "How data are structured and assembled together has been like the Wild West. We're figuring it out."