Feature - Ecological forecasting in NEON
Massive independent networks of environmental and ecological data stations distributed across the globe could launch environmental science into the petascale era, transforming the way scientists look at our planet.
In the United States, the National Ecological Observatory Network (NEON) is poised to begin construction later this year.
"What NEON is about is measuring the effects of climate change, land use change, and invasive species on continental scale ecology. And we're doing that in order to enable ecological forecasting," said Michael Keller, the chief of science at NEON.
Ecological forecasting, like weather forecasting, uses extensive data sets over large areas and periods of time to refine models that can in turn be used to predict ecological outcomes. Unlike weather forecasting, however, ecological forecasting involves running simulations to answer questions like, "What will happen to the ecosystem if we manage it in a particular fashion?"
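To make the "what if we manage it this way?" idea concrete, here is a toy sketch of a management-scenario simulation. It is purely illustrative and not NEON's or any forecaster's method: the discrete logistic growth model, the fixed annual harvest fraction, the parameter values, and the `simulate` function are all hypothetical assumptions chosen for clarity.

```python
def simulate(pop0, growth_rate, capacity, harvest_rate, years):
    """Project a population forward under a fixed annual harvest fraction.

    Hypothetical toy model: discrete logistic growth, then harvesting.
    """
    pop = pop0
    trajectory = [pop]
    for _ in range(years):
        # Logistic growth toward the carrying capacity...
        pop = pop + growth_rate * pop * (1 - pop / capacity)
        # ...followed by removal of the harvested share (the "management" choice).
        pop = pop * (1 - harvest_rate)
        trajectory.append(pop)
    return trajectory


# Compare three hypothetical management scenarios over 30 years.
scenarios = {"no harvest": 0.0, "light harvest": 0.1, "heavy harvest": 0.4}
for name, rate in scenarios.items():
    final = simulate(pop0=100.0, growth_rate=0.5, capacity=1000.0,
                     harvest_rate=rate, years=30)[-1]
    print(f"{name}: {final:.0f}")
```

Running the same model under different management inputs and comparing the outcomes is the essence of a scenario question; real ecological forecasts would calibrate far richer models against long-term observations like those NEON plans to collect.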
The project, which is funded by the National Science Foundation, will regularly take over 2,000 measurements in each of the 20 climatic regional domains distributed across the contiguous 48 states, Alaska, Hawaii, and Puerto Rico over a period of 30 years.
"That scale is very difficult for the investigator to maintain over a long time because their research funds come in four- or five-year blocks," Keller said.
Although the network will include more than 12,000 sensors, there is more to NEON than sensors, according to Bob Tawa, NEON's director of computing.
"NEON will be gathering data from sensors, as well as field data collected by field crews, data from various analysis labs (genomic, chemical, taxonomic, etc.) and existing data (e.g., land use) from sources such as USGS," Tawa said.
NEON will also be sponsoring experiments.
"We have one in our initial configuration," Keller said. "The STREON experiment looks at the effects of nutrient additions on stream ecosystems, and within the STREON experiment, we'll also be looking at the effects of excluding top-level predators."
He added, "We want to, under controlled conditions, see what that does to stream function, the biodiversity within the stream, etc."
Rather than devote resources to developing equipment and software, NEON will take advantage of existing, off-the-shelf, cutting-edge technology. Nor will NEON attempt to be an end-to-end ecological forecasting solution.
"NEON will collect the data. NEON will provide infrastructure for investigators to conduct experiments or add additional measurements," Keller said. However, "NEON is not the agency that's going to be responsible for the predictions. Our goal is to enable those predictions by providing the data."
Other agencies are expected to build on that data. On 8 February, for example, the US Commerce Department announced its proposal for a new national climate service, to be managed by the National Oceanic and Atmospheric Administration (NOAA). NOAA and NEON already have plans to integrate their data.
Managing the data avalanche
To bring order to the massive amounts of data NEON will handle, its computing team has defined five processing levels, numbered 0 through 4, through which the data will pass.
When data is first collected and archived, it is considered Level 0. This raw data will be available to anyone upon request, but it will not be posted online. Next, the data is calibrated, converted into the applicable units, vetted for obvious errors, and then given a final quality check by the relevant NEON scientists. This Level 1 data is made available online, and is likely to be of most interest to scientists in the directly related domain. Level 1 data is then transformed into Level 2 (gap-filled) and Level 3 (gridded) data, as applicable for each particular data stream.
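The early stages of that pipeline can be sketched for a single sensor stream. This is a minimal illustration under stated assumptions, not NEON's actual processing code: the raw-count calibration constants, the plausible-range check standing in for "obvious errors", linear interpolation as the gap-filling method, and the `to_level1` / `to_level2` function names are all hypothetical. Gridding (Level 3) and the human quality check are not shown.

```python
# Hypothetical calibration for one temperature sensor (assumed values, not NEON's).
COUNTS_PER_DEGREE = 100          # raw counts per degree Celsius
OFFSET_C = -40.0                 # degrees Celsius at a raw count of zero
PLAUSIBLE_RANGE = (-60.0, 60.0)  # values outside this are treated as obvious errors


def to_level1(raw_counts):
    """Level 0 -> Level 1: calibrate to Celsius and blank out implausible values."""
    low, high = PLAUSIBLE_RANGE
    level1 = []
    for count in raw_counts:
        if count is None:
            level1.append(None)  # a missing transmission stays missing
            continue
        celsius = count / COUNTS_PER_DEGREE + OFFSET_C
        level1.append(celsius if low <= celsius <= high else None)
    return level1


def to_level2(level1):
    """Level 1 -> Level 2: fill interior gaps by linear interpolation."""
    filled = list(level1)
    valid = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled  # leading and trailing gaps are left as None


raw = [6000, 6100, None, 6300, 20000, 6500]  # one dropout, one spike
level1 = to_level1(raw)
level2 = to_level2(level1)
print(level1)  # [20.0, 21.0, None, 23.0, None, 25.0]
print(level2)  # [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
```

Keeping each level as a separate, pure function mirrors the idea that every level is a distinct, reproducible product derived from the one below it.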
Finally, the data streams will be combined and assimilated in various ways to create more readily meaningful Level 4 data.
"NEON is not just putting out basic data. We won't just give people a list of the species that we encountered or the number of times that we captured a mouse," Keller said. "We'll take that data and put it through some well-understood analytical tools, and give them value added information."
For example, a person who isn't a specialist in mouse populations may find it difficult to make sense of the basic data. The Level 4 data, however, will organize that data across time or area to make it meaningful for non-specialists. That makes it possible for specialists in related fields to use the information, and for laypeople to understand what it means.
At each stage of processing, metadata will be added either by the automated processing software or by scientists annotating the data. The history of what happens to the data over time will also be logged.
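As a rough illustration of turning raw tallies into something a non-specialist can read, the sketch below converts hypothetical mouse-capture records into a yearly, effort-corrected index. The records, the "captures per 100 trap-nights" index, and the `captures_per_100_trap_nights` function are assumptions for illustration only; they are not one of NEON's defined Level 4 products.

```python
from collections import defaultdict

# Hypothetical field records: (year, site, mice captured, trap-nights of effort).
records = [
    (2012, "site-A", 12, 200),
    (2012, "site-B", 8, 100),
    (2013, "site-A", 30, 200),
    (2013, "site-B", 10, 100),
]


def captures_per_100_trap_nights(rows):
    """Summarise raw capture tallies as a yearly index corrected for trapping effort."""
    captures = defaultdict(int)
    effort = defaultdict(int)
    for year, _site, caught, trap_nights in rows:
        captures[year] += caught
        effort[year] += trap_nights
    return {year: 100 * captures[year] / effort[year] for year in sorted(captures)}


print(captures_per_100_trap_nights(records))
```

Dividing by effort matters because a raw count of captures rises simply when more traps are set; the corrected index lets a reader see that, in this made-up example, the rate doubled from 2012 to 2013.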
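One way to picture metadata accumulation and history logging together is a record that carries its own append-only processing log. This is a minimal sketch under assumed names (`DataRecord`, `apply_step`, and the log fields are hypothetical, not NEON's schema); it simply shows that each step, whether run by software or by an annotating scientist, leaves a timestamped entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataRecord:
    """A data stream plus its metadata and an append-only processing history."""
    values: list
    level: int = 0
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def apply_step(record, step_name, func, actor, new_level=None, notes=None):
    """Run one processing step and log what happened, who did it, and when."""
    record.values = func(record.values)
    if new_level is not None:
        record.level = new_level
    if notes:
        record.metadata.update(notes)  # annotations become part of the metadata
    record.history.append({
        "step": step_name,
        "actor": actor,  # e.g. the automated pipeline or a scientist
        "level": record.level,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return record


rec = DataRecord(values=[1.0, 2.0, 3.0], metadata={"sensor": "hypothetical-T1"})
apply_step(rec, "calibrate", lambda v: [x * 2 for x in v], actor="pipeline", new_level=1)
apply_step(rec, "annotate", lambda v: v, actor="scientist", notes={"comment": "checked"})
for entry in rec.history:
    print(entry["step"], entry["actor"], entry["level"])
```

Because the history is only ever appended to, anyone later using the data can trace how a published value was derived from the raw measurements.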
NEON expects to publish 118 L4 data 'products.'
"We have estimated that we will need roughly 1200 core days of compute to process 103 out of the 118 L4 data products that are currently defined," said Tawa. The team plans to request the processing power for the remaining 15 data products from TeraGrid or an individual supercomputing site.
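The figures Tawa quotes can be unpacked with some simple arithmetic. The sketch below uses only the numbers stated (1200 core days, 103 products in the estimate, 118 defined); the even split across products and the perfect parallel scaling in `wall_clock_days` are simplifying assumptions, not claims about how NEON's workload actually divides.

```python
CORE_DAYS_TOTAL = 1200       # estimate quoted for the covered products
PRODUCTS_IN_ESTIMATE = 103
PRODUCTS_DEFINED = 118


def wall_clock_days(cores):
    """Days of elapsed time, assuming (optimistically) perfect parallel scaling."""
    return CORE_DAYS_TOTAL / cores


core_hours_total = CORE_DAYS_TOTAL * 24
avg_core_days = CORE_DAYS_TOTAL / PRODUCTS_IN_ESTIMATE  # assumes an even split
remaining = PRODUCTS_DEFINED - PRODUCTS_IN_ESTIMATE

print(core_hours_total)         # 28800
print(round(avg_core_days, 1))  # 11.7
print(remaining)                # 15
print(wall_clock_days(100))     # 12.0
```

In other words, the estimate is about 28,800 core-hours, or roughly 12 days on a 100-core cluster if the work parallelized perfectly.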
Constructing NEON will take at least six years, so the project is still in its early phases. According to Tawa, the team is currently evaluating JBoss and Liferay as potential technology platforms, and MetaMatrix as a data abstraction layer.
Data standards will be a major issue in designing the NEON cyberinfrastructure.
"Given the variety of data types that we are ingesting and the community that we expect to service, we plan on providing data in a number of standard formats," said Tawa. "However, those are yet to be defined."
The team is now setting up a technical working group of experts from outside NEON to study this issue and advise on data publishing standards. The group's decisions will be crucial to interoperability with other networks around the world.
"We are working to make our measurements as compatible as possible with existing measurements from similar organizations," Keller said. "We'll have to put more effort into coordination at our national borders, but we'd like to see a coordinated effort towards interoperability around the world."
Similar initiatives exist in Australia, China, Europe, and Canada, to name a few. Together, they may be able to foresee the future of our planet's biosphere.
-Miriam Boon, iSGTW