The data deluge is coming. In fact, in many research disciplines it is already upon us. The era of 'big data' poses enormous challenges to researchers across almost all fields of endeavor, from natural scientists to humanities researchers and from citizens to policy makers. However, big data also presents a wealth of opportunities, especially in today's global, interconnected world.
With scientific data output alone growing at a staggering 30% per year, it is vital that researchers come together to build the social and technical bridges required to enable open sharing of data. The organization charged with achieving this is the Research Data Alliance (RDA), which is supported by funding bodies from Australia, Europe, and the US. Having only recently celebrated its first anniversary, the RDA has already grown to include over 1,500 members. "The growth has been precipitous," says Francine Berman, co-chair of the RDA Council. "Our community is expanding in both scope and numbers and our organization is evolving."
"I think we're on a very exciting cusp of a change in how research is done," says John Wood, Berman's fellow co-chair. "It's not just about data, but the democratization of science." Wood explains that the RDA's vision is to enable researchers from across the globe to openly share data across technologies and disciplines, so as to tackle the grand challenges of the 21st century, such as disease, malnutrition, and climate change.
Berman and Wood were speaking at the third plenary meeting of the RDA, which was held in Dublin, Ireland, last month. The 497 attendees at the event engaged in discussion on a wide variety of related topics, ranging from the role of publishers and persistent identifiers to heritage data and legal interoperability. Data applications discussed also included geospatial information, marine observation, food production, and urban quality of life indicators.
"Research practices have changed substantially over the last five-to-ten years," says Australia's chief scientist, Ian Chubb, who spoke during the opening plenary session of the event. "Today, things are far more global." During his speech, he emphasized the role that data sharing has to play in addressing global public health issues and drew particular attention to a working group within the RDA focusing on the interoperability of data relating to wheat crops. He argues that the growing global population and shifting rainfall patterns due to climate change make this a vital area of work. "Global challenges can only be solved by global research endeavor," adds Chubb.
Tony Hey, vice president of Microsoft Research, also gave a keynote address on the second day of the event. He cited work by researchers from the Harvard-Smithsonian Center for Astrophysics, Massachusetts, US, that shows that 44% of URL links embedded in papers published by the American Astronomical Society in 2001 were broken just a decade later. However, things are improving, notes Hey: "The emphasis on data is long overdue," he says. "Data is becoming a first class citizen, it is no longer something that you don't look after."
Time to build an ark for the humanities?
Milena Žic Fuchs also gave a keynote address on the final day of the event. A prominent researcher in the field of cognitive linguistics, Fuchs has served as Minister of Science and Technology in her native Croatia and she recently chaired the Standing Committee for the Humanities of the European Science Foundation (ESF). During her speech, she outlined the important role humanities research has to play in tackling society's grand challenges. "Digital humanities offers scholars new and productive ways of exploring old questions and developing new ones," says Fuchs. Yet, she warns: "There is still a huge lack of awareness of the existence and importance of research infrastructures in the humanities."
Fuchs proposes the creation of a European bibliographic database to improve the visibility of research and facilitate sharing. "It would produce new synergies in research, give insights to researchers from different domains, open up avenues for collaboration between domains, and make visible European research at a global level," she says. "This is especially necessary for addressing major issues within social-science and humanities disciplines, but also issues pertaining to wider topics of global dimension."
Read more about this in our recent interview with Fuchs, here.
During his speech, Hey focused on the interplay between data sharing and the drive towards increased open access, arguing that a "tipping point" in this movement has now been reached. He also spoke about the need for 'data scientists', who are not specialists in any given research discipline, but who instead have expertise in handling large amounts of data. Hey argues for a separation of roles between those who collect the data, those who explore the data through statistical and analytical methods, and those who manage and preserve the data. "Scientists currently try to do all of these things," he says.
The issue of data preservation also featured prominently in a panel discussion on data policy challenges. During this discussion, audience member Chris Greer of the US National Institute of Standards and Technology argued that while research funding pays for the gathering of scientific data, there is still not enough backing from funding bodies for long-term preservation. These concerns were echoed by many delegates at the event, who questioned the suitability of the project-based funding models prevalent in Australia, Europe, the US, and elsewhere for improving this situation. "Only a long-lived institution can provide long-term data preservation," explains panelist Ross Wilkinson, executive director of the Australian National Data Service. And, he argues, data must have value for an institution if that institution is expected to protect it.
Carlos Morais-Pires, who coordinates the area of scientific data e-infrastructures at the European Commission Directorate General for Communications, Networks, Content and Technology, was also a member of the panel. He echoed calls for greater involvement from both research communities and industry in forming data policy. Morais-Pires also highlighted the Open Data Charter, a document signed by world leaders during the UK presidency of the G8 last year. In this document, five principles for open data are established: open data by default, quality and quantity, useable by all, releasing data for improved governance, and releasing data for innovation.
While the publication of these principles, since adopted by many national and international bodies, certainly marked an important step in changing the way researchers deal with data, Morais-Pires and others at the event recognize that change cannot simply be prescribed - or even imposed - from the top down. Instead it is the research communities, those working at the coal face of the big data challenge, who will bring about the sea change necessary in how data is handled. The modern-day agora that is the RDA, with its community-led focus and fuelled by passionate argument, is undoubtedly the organization best placed to tackle the challenges posed by the data deluge. This work may help ensure that we, as a society, are able to equip ourselves with the tools of the big data era in our fight against the grand challenges of our century.
For more information about the RDA, read our in-depth interview with Mark Parsons, the organization's newly appointed secretary general, here.