With the global population set to reach over 9.5 billion by the middle of this century, it is vital that modern large-scale data analysis techniques are put to use to help ensure that land is farmed in the most efficient way possible. Using the insights gained from data will also play a vital role in mitigating the effects of climate change on the global supply of food.
'Data Intensive Techniques to Boost the Real-Time Performance of Global Agricultural Data Infrastructures', or SemaGrow, is a research project that aims to bring big data technologies to agricultural researchers.
As more and more data is published online, exciting new opportunities are arising to create added value by combining and cross-indexing heterogeneous datasets at a large scale. To make the most of these opportunities, agricultural researchers need access to infrastructure that is not only efficient, responsive, and scalable, but also sufficiently flexible and robust to welcome data in a wide variety of forms. The SemaGrow project seeks to develop the scalable, efficient, and robust data services needed to take full advantage of modern data-intensive inter-disciplinary science and to reshape the way that data analysis techniques are applied to the heterogeneous data cloud.
Farming the data
The SemaGrow project, which is funded through the European Commission's Seventh Framework Programme (FP7), carries out fundamental research to develop a technology stack for querying and combining big agri-food datasets in a linked open-data environment. The basis of the SemaGrow stack is the federation of SPARQL endpoints. We have developed algorithms to combine these and enable customization at the infrastructure layer, which improves service performance over the grid.
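To make the idea of federating SPARQL endpoints concrete, the following is a minimal sketch that uses the standard SERVICE keyword from SPARQL 1.1 Federated Query to send one graph pattern to several endpoints and merge the answers. It is not the SemaGrow algorithm, which the article does not detail; the function name `build_federated_query` and the `example.org` endpoint URLs are hypothetical and chosen only for illustration.

```python
from typing import List


def build_federated_query(endpoints: List[str], pattern: str, limit: int = 10) -> str:
    """Build a SPARQL 1.1 query that sends one graph pattern to several endpoints.

    Each endpoint is wrapped in its own SERVICE clause and the per-endpoint
    results are combined with UNION. This is an illustrative helper, not part
    of any SemaGrow API.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # One "{ SERVICE <url> { pattern } }" group per endpoint.
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT * WHERE {{\n{body}\n}}\nLIMIT {limit}"


# Hypothetical endpoints; replace with real SPARQL endpoint URLs before use.
query = build_federated_query(
    ["http://example.org/bibliography/sparql", "http://example.org/forestry/sparql"],
    "?doc <http://purl.org/dc/terms/title> ?title",
    limit=5,
)
print(query)
```

In this hand-written form the client has to know which endpoints to ask. A federation layer takes over that routing, so the client can pose a single query against the combined data.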
The SemaGrow services and tools are organized around three use cases testing different aspects of the technology:
- The first use case, developed in close collaboration with the Trees4Future project, covers heterogeneous data collections and streams that support agricultural and forestry-modeling research.
- The second focuses on 'reactive data analysis' and is designed to help users of the AGRIS bibliographic network of the Food and Agriculture Organization of the United Nations find complete and accurate data when compiling reports from fragmented information sources.
- Finally, building on top of Agro-Know's Agriculture Discovery Space, we showcase reactive resource discovery with the real-time discovery of relevant multimedia resources to support the creation of agricultural educational activities.
More specifically, with the support of SemaGrow's semantic search tools, Agro-Know (a fast-growing SME with a clear focus on knowledge-intensive technology innovation for agriculture, food, and biodiversity) has enhanced its data platform and created a demonstrator to showcase the capabilities of reactive resource discovery. The SemaGrow improvements are reflected both in the APIs that the Agro-Know data platform exposes and in the front-end discovery application, where diverse data sources - with specialized educational and research content - can be easily searched, accessed, and interlinked. The Agro-Know data platform has also been made more robust, and more of its processes have been automated.
With the SemaGrow project now in its third and final year, the key technologies have been developed and integrated into a number of 'demonstrators'. The Agricultural Discovery Space (ADS) demonstrator, for instance, has been designed - based on requirements collected from stakeholders within the Global Food Safety Partnership (GFSP) network - to test and evaluate the SemaGrow project outcomes in a real usage scenario (see diagram). It provides an interface for querying and browsing multiple data sources. Over the coming months, the SemaGrow demonstrators will be further enhanced and tested with stakeholders from across the agricultural, food, environmental, and biodiversity science spectrum.
The SemaGrow technology was also recently demonstrated at the GODAN (Global Open Data for Agriculture and Nutrition) 2015 meeting, which took place in Wageningen, the Netherlands, in January. A hackathon, entitled 'Future Food Hack', was co-located with the meeting. Two of the hackathon challenges made use of the SemaGrow SPARQL endpoints and stack APIs.