Q&A: David de Roure talks data
David de Roure from the University of Southampton is the Chair of OMII-UK and was appointed National Strategic Director of e-Social Science towards the end of last year. We met him at the All Hands/IEEE e-Science conference, where he told us about his new role.
iSGTW: Can you tell us about your work as National Strategic Director of e-Social Science?
de Roure: e-Social Science is a program set up by the ESRC (Economic and Social Research Council) - it's e-science meets the social sciences.
The program had two phases. In the first phase, the "hub" in Manchester very successfully set up a number of projects, called "nodes," which had a role in terms of community interaction and working with the other nodes. What we need to do now is get those new techniques that have been established within those nodes out to the broader social science research community. That's my mission and I have a great team to help me.
The three things I'm really promoting are an approach based on methods, on international ways of working, and on the next generation of social scientists. My way of doing this is to work as a partner for projects that are both inside and outside the e-social science program. Sitting down with people, travelling around the country, and visiting other projects has been great. We're working together with them to get these methods out.
iSGTW: What do you see as some of the most important challenges the community faces?
de Roure: If we can get the metadata right, then data becomes a lot more reusable . . . it requires changing culture and practice. People who are working with a particular piece of data might not have the incentive to make it reusable by others and that is a deeper problem. In the long term, that's something we need to fix - we need to make sure we have the reward structures.
At the moment the culture is: if people have data, they don't really get a reward for making it reusable for others. There's been a lot of discussion about reward structures. We have a system about papers which doesn't take into account the data, software, blog posts, workflows, all these other new things that come out of e-science and e-social science.
These issues are not just technical, they're social issues as well. There's a famous quote from Jim Gray that says "may all your problems be technical," because we can deal with them. It's the social problems that are difficult.
iSGTW: We've heard you're working on a project called SALAMI (Structural Analysis of Large Amounts of Music Information). Could you tell us a bit more about it?
de Roure: For me, a lot of my life is spent helping researchers in other disciplines - I've been doing that for 15 years and that's what I enjoy doing. It's very important, therefore, that sometimes I am the researcher using the tools, so I can see it from both sides.
The SALAMI project is to do with music and e-science, but I'm on the researcher end of it.
It came out of a program called the 'Digging into Data Challenge' and it's supported by four funding agencies across three countries: JISC in the UK, the National Endowment for the Humanities and the National Science Foundation in the US, and the Social Sciences and Humanities Research Council from Canada. They're putting together this program that takes existing datasets and really does something with them. And I think that's very, very important. People are so good at collecting and curating data but I think we don't spend enough time working out how to understand it and what to do with it - this project really addresses that.
In the SALAMI project, for example, we're going to take all the music on The Internet Archive and thousands of supercomputing hours that have been donated to us by NCSA (the US National Center for Supercomputing Applications). We will then analyze all the music to build a resource for musicologists, based on the structure of the music. It's a great example of a program - it's multi-disciplinary, it's multi-country. Even just at the level of funding, working across that number of agencies with a peer review process that they're all happy with is quite an achievement.
iSGTW: In September last year you went on a fact-finding mission to the US. Can you share some of your insights from the trip with us?
de Roure: I went out with Malcolm Atkinson on a mission to find out about the use of research data - asking scientists who were actually using the data what their current practices are, what's going to happen in the future, what works and what doesn't work.
We spent three-and-a-half weeks seeing different institutions every day and we came back with our heads spinning with information. One of the things that we learned is that, in the US, the libraries are much more involved in e-science than in this country. For example, they've funded two new datanet projects, DataONE and the Data Conservancy, and there are many figures who come from the libraries and the repositories world.
Another discovery is that there is a real acknowledgement in the US that data, and the understanding of data, are supported. We've always been very data-centric in UK e-science, but I think we need to build up a more comprehensive understanding. If ten percent less money was spent on hardware and that money was put into training people in software and techniques for understanding data, then actually we'd make more progress in e-science.
That's quite a controversial statement because there's a lot of commitment to buying big computers and having the ones with the flashiest lights and the biggest number of flops. It's a difficult thing to talk about because money going into software or training doesn't necessarily take the same route as money going into hardware and building infrastructures. That shift is already occurring in the US - there's increasing investment in data - so we feel strongly that we need more of this in the UK.
-Manisha Lalloo, GridTalk, for iSGTW. See GridCast for more.