
Feature - Ecological forecasting in NEON

TOP: NEON's proto-tower just north of Boulder, Colorado, where the project is testing equipment. The site is already producing a stream of real data.

BOTTOM: Hongyan Luo conducts tests at the base of NEON's test proto-tower.

Images courtesy of NEON, Inc.

Massive independent networks of environmental and ecological data stations distributed across the globe could launch environmental science into the petascale era, transforming the way scientists look at our planet.

In the United States, the National Ecological Observatory Network is poised to begin construction later this year.

"What NEON is about is measuring the effects of climate change, land use change, and invasive species on continental scale ecology. And we're doing that in order to enable ecological forecasting," said Michael Keller, the chief of science at NEON.

Ecological forecasting, like weather forecasting, uses extensive data sets over large areas and periods of time to refine models that can in turn be used to predict ecological outcomes. Unlike weather forecasting, however, ecological forecasting involves running simulations to answer questions like, "What will happen to the ecosystem if we manage it in a particular fashion?"
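To make the idea concrete, here is a minimal sketch of what a scenario-based ecological forecast can look like: a toy logistic population model run under two hypothetical management choices. The function, parameters, and numbers are purely illustrative and are not NEON code or NEON data.

```python
# Toy illustration of scenario-based ecological forecasting (not NEON code):
# a logistic population model projected under two hypothetical management scenarios.
def simulate_population(n0, growth_rate, carrying_capacity, harvest_rate, years):
    """Project a population forward, removing a fraction each year."""
    n = n0
    trajectory = [n]
    for _ in range(years):
        n = n + growth_rate * n * (1 - n / carrying_capacity)  # logistic growth
        n = max(n * (1 - harvest_rate), 0)                      # management action
        trajectory.append(n)
    return trajectory

# Compare outcomes of two management choices over a 30-year horizon.
light_harvest = simulate_population(500, 0.3, 2000, 0.05, 30)
heavy_harvest = simulate_population(500, 0.3, 2000, 0.25, 30)
print(f"Year 30 population, light harvest: {light_harvest[-1]:.0f}")
print(f"Year 30 population, heavy harvest: {heavy_harvest[-1]:.0f}")
```

With long-term, large-area data sets of the kind NEON will provide, the parameters of models like this can be fitted and tested rather than guessed.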

The project, which is funded by the National Science Foundation, will regularly take over 2,000 measurements in each of 20 eco-climatic domains distributed across the contiguous 48 states, Alaska, Hawaii, and Puerto Rico over a period of 30 years.

"That scale is very difficult for the investigator to maintain over a long time because their research funds come in four year blocks or five year blocks," Keller said.

Although more than 12,000 sensors will be distributed across the network, there is more to NEON than sensors, according to Bob Tawa, NEON's director of computing.

"NEON will be gathering data from sensors, as well as field data collected by field crews, data from various analysis labs (genomic, chemical, taxonomic, etc.) and existing data (e.g., land use) from sources such as USGS," Tawa said.

NEON will also be sponsoring experiments.

"We have one in our initial configuration," Keller said. "The STREON experiment looks at the effects of nutrient additions on stream ecosystems, and within the STREON experiment, we'll also be looking at the effects of excluding top-level predators."

He added, "We want to, under controlled conditions, see what that does to stream function, the biodiversity within the stream, etc."

NEON by the numbers:
  • Nationwide sensors - 12,000
  • Eco-climatic domains - 20
  • Permanent core sites - 20
  • Project lifetime - 30 years
  • Relocatable sites - 40
  • Construction cost - $430 million
  • Projected peak employees - 250
  • Yearly data - 170 terabytes
  • Basic data products (per site) - 542
  • Synthesized data products (per site) - 118
  • Computational resources needed each year - over 1200 core-days

Rather than devote resources to developing its own equipment and software, NEON will take advantage of existing, cutting-edge off-the-shelf technology. Nor will NEON attempt to be an end-to-end ecological forecasting solution.

"NEON will collect the data. NEON will provide infrastructure for investigators to conduct experiments or add additional measurements," Keller said. However, "NEON is not the agency that's going to be responsible for the predictions. Our goal is to enable those predictions by providing the data."

For example, on 8 February the US Commerce Department announced its proposal for a new national climate service, to be managed by the National Oceanic and Atmospheric Administration (NOAA). NOAA and NEON already have plans to integrate their data.

Managing the data avalanche

To bring order to the massive amounts of data NEON will be handling, NEON's cyberinfrastructure has to be organized. That's why the computing team has defined a series of data levels, Level 0 through Level 4, through which the data will pass.

When data is first collected and archived, it is considered Level 0. This raw data will be available to anyone upon request, but it will not be posted online. Next, the data is calibrated, transformed into the applicable units, vetted for obvious errors, and then undergoes a final quality check by the relevant NEON scientists. This Level 1 data is made available online, and is likely to be of the most interest to scientists in the directly related domain. Then this Level 1 data is transformed into Level 2 (gap filled) and Level 3 (gridded) data as applicable for that particular data stream.

Finally, the data streams will be combined and assimilated in various ways to create more readily meaningful Level 4 data.
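The progression from raw readings to gridded products can be sketched in a few lines of code. The following is an illustration of the Level 0 to Level 3 steps described above; the function names, calibration coefficients, and quality thresholds are hypothetical, not NEON's actual processing pipeline.

```python
import numpy as np

# Illustrative sketch of the Level 0 -> 3 progression described above
# (function names, calibration values, and QC thresholds are hypothetical).
def to_level1(raw_counts, gain, offset, valid_range=(-40.0, 60.0)):
    """Level 0 -> 1: apply calibration and flag obviously bad values."""
    calibrated = raw_counts * gain + offset            # convert to physical units
    bad = (calibrated < valid_range[0]) | (calibrated > valid_range[1])
    calibrated[bad] = np.nan                           # mark failures for review
    return calibrated

def to_level2(level1):
    """Level 1 -> 2: fill short gaps by linear interpolation."""
    idx = np.arange(level1.size)
    good = ~np.isnan(level1)
    return np.interp(idx, idx[good], level1[good])

def to_level3(level2, bin_size=4):
    """Level 2 -> 3: aggregate onto a regular (here, temporal) grid."""
    trimmed = level2[: level2.size - level2.size % bin_size]
    return trimmed.reshape(-1, bin_size).mean(axis=1)

raw = np.array([512, 515, 9999, 520, 518, 530, 525, 540], dtype=float)
level3 = to_level3(to_level2(to_level1(raw, gain=0.01, offset=-2.0)))
```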

"NEON is not just putting out basic data. We won't just give people a list of the species that we encountered or the number of times that we captured a mouse," Keller said. "We'll take that data and put it through some well-understood analytical tools, and give them value added information."

For example, a person who isn't a specialist in mouse populations may find it difficult to make sense of the basic data. The Level 4 products, however, will organize that data over time or area to make it meaningful for non-specialists, so that specialists in related fields can make use of the information and laypeople can understand what it means.

At each stage of processing, metadata will be added either by the automated processing software or by scientists annotating the data. The history of what happens to the data over time will also be logged.
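One simple way to picture this provenance logging is a record appended at every processing step. The field names below are illustrative only, not NEON's metadata schema.

```python
from datetime import datetime, timezone

# Hypothetical provenance record appended at each processing step,
# so the history of a data stream can be reconstructed later.
def log_step(provenance, step, actor, note=""):
    provenance.append({
        "step": step,                                   # e.g. "calibration", "gap-fill"
        "actor": actor,                                 # software version or scientist
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,
    })
    return provenance

history = []
log_step(history, "calibration", "pipeline-v0.1", "applied sensor coefficients")
log_step(history, "qa-review", "domain scientist", "flagged spike on day 12")
```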

NEON expects to publish 118 L4 data 'products.'

"We have estimated that we will need roughly 1200 core days of compute to process 103 out of the 118 L4 data products that are currently defined," said Tawa. They plan to request the processing power for the remaining 16 data products from TeraGrid or an individual supercomputing site.

Constructing NEON will take at least six years, which puts the project in its early phases. According to Tawa, they are currently looking at JBoss and Liferay as potential technology platforms, and MetaMatrix as a data abstraction layer.

NEON has partitioned the US into 20 eco-climatic domains, pictured above. The domains are the result of a new statistically rigorous analysis performed by William Hargrove and Forrest Hoffman using the now-defunct Stone Soupercomputer at Oak Ridge National Laboratory. Hargrove and Hoffman used national data sets to generate the eco-climatic variables for multivariate geographic clustering algorithms. Each domain generated by their analysis represents different regions of vegetation, landform, climate, and ecology.

Within each domain, NEON plans to place one core site (the blue circles), selected to represent unmanaged wildland conditions. Two relocatable sites per domain (the red triangles) will collect data related to the effects of human land management on ecosystems.

Image courtesy of NEON, Inc.
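The general flavor of multivariate geographic clustering can be illustrated with a short sketch: standardize eco-climatic variables for each map cell, then partition the cells into groups. The snippet below uses k-means on synthetic data as a stand-in; it is not Hargrove and Hoffman's actual method, data, or code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Minimal sketch in the spirit of multivariate geographic clustering
# (synthetic variables; hypothetical, not the original analysis).
rng = np.random.default_rng(0)
n_cells = 5000
features = np.column_stack([
    rng.normal(12, 8, n_cells),      # mean annual temperature (deg C)
    rng.gamma(2, 400, n_cells),      # annual precipitation (mm)
    rng.uniform(0, 3000, n_cells),   # elevation (m)
    rng.normal(0.2, 0.05, n_cells),  # soil nitrogen fraction
])

# Standardize so no single variable dominates, then partition into 20 domains.
scaled = StandardScaler().fit_transform(features)
domains = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(scaled)
print(np.bincount(domains))          # cells assigned to each eco-climatic domain
```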

Data standards will be a major issue in designing the NEON cyberinfrastructure.

"Given the variety of data types that we are ingesting and the community that we expect to service, we plan on providing data in a number of standard formats," said Tawa. "However, those are yet to be defined."

At present, they are setting up a technical working group of experts from outside NEON to study this very issue and provide advice on data publishing standards. The decisions they make will be crucial in enabling interoperability with other networks across the world.

"We are working to make our measurements as compatible as possible with existing measurements from similar organizations," Keller said. "We'll have to put more effort into coordination at our national borders, but we'd like to see a coordinated effort towards interoperability around the world."

Similar initiatives exist in Australia, China, Europe, and Canada, to name a few. Together, they may be able to foresee the future of our planet's biosphere.

-Miriam Boon, iSGTW
