iSGTW speaks to Milena Žic Fuchs ahead of her keynote talk at the Research Data Alliance (RDA) Third Plenary Meeting in Dublin, Ireland, later this month. A prominent researcher in the field of cognitive linguistics, Fuchs has served as Minister of Science and Technology in her native Croatia and she recently chaired the Standing Committee for the Humanities of the European Science Foundation (ESF).
Why, in your opinion, is it important to encourage the sharing of research data across disciplines?
There is a global trend towards inter-, multi-, and trans-disciplinarity. This means that sharing data across research domains should be encouraged. Tackling the societal challenges outlined by the European Commission in Horizon 2020 will be impossible without this. Also, there's a whole host of research areas that have yet to be built and the questions that researchers are dealing with are changing. New 'cultures of knowledge' will have to be established to answer these questions.
What do you believe to be the greatest barriers to increased sharing? And how do you think these barriers can be overcome?
As a researcher I'm a very multidisciplinary person, but I'd like to speak specifically from the point of view of the humanities. I think there are basically two main barriers:
One relates to the sharing of the human potential that the humanities in Europe can offer. Or, to put it another way: it's important to make visible what researchers in all disciplines of the humanities are actually doing. Unfortunately, much of the work that goes on today in Europe is not visible. In the humanities we lack databases that showcase the different disciplines and topics that humanities researchers are dealing with. Such databases need to show - by country, language, etc. - what the freshest opinions, attitudes, and research results on major issues are.
Humanities scholars very often publish in their national languages and will continue to do so in the future; it is impossible to impose on them the so-called 'English-bias'. Europe lacks an integrated database of published journals in various national languages. A database of this kind could be a sort of 'who's who' within a particular field of research.
Of course, sophisticated databases do exist in quite a number of countries, including small countries like my own, Croatia. We have a very sophisticated open-access database on all domains of science and this has been around for about 15 years now. Every paper in this database has an abstract in English at the very beginning with full information about the author. With this database, it's very easy to identify who the leading names are in a particular field and find out exactly what they do. A couple of years ago, the European Science Foundation suggested the integration of all such existing databases in Europe for humanities and social science publications. This would showcase what research is going on and would provide colleagues from other domains with the opportunity to identify where researchers that could be of special interest to them are located.
The second barrier relates to the actual research data. There's still a lot of work to be done on this in the humanities. At the level of ESFRI, the European Strategy Forum on Research Infrastructures, there are two big humanities projects: CLARIN, the Common Language Resources and Technology Infrastructure, and DARIAH, the Digital Research Infrastructure for the Arts and the Humanities. In very brief terms, CLARIN is focused on integrating language data across Europe, while DARIAH is more focused on increasing the visibility at the European level of national research related to cultural heritage, digital arts, and so on. These two projects are, to my mind, a move in the right direction. Both are being developed and are close to what is known within ESFRI as the implementation phase. The focus now needs to be on two things: one is filling in the gaps where no data exists and the other is connecting data where it does exist but lives a life of its own in an unconnected place. Of course, it's important to remember that these challenges also exist in other domains.
Given your recent role as chair of the Standing Committee for the Humanities at the ESF, do you have any insight as to why issues around 'big data' and data sharing are so often perceived as primarily being the preserve of the natural sciences? Is this perception justified?
The perception was that the humanities did not have these challenges. This is something that I, as chair, encountered quite often. However, it's simply not true.
The Standing Committee for the Humanities published a science policy briefing on digital humanities, which was the first of its kind. This pinpointed all the different kinds of data that you can find within the realm of the digital humanities. We created this document because we considered it essential that colleagues from other domains become aware of what exists in the humanities. This is not saying that every single database in the humanities is something that a natural scientist would reach for, but it showcases both the more discipline-oriented research infrastructures and the ones that could be of interest to colleagues from other domains.
We also produced a report, which I will be talking about in Dublin, concerned with how to best showcase the potential of the researchers in the domain of the humanities. Unfortunately, the ESF began to wind down at the same time this report came out and I can only hope that other bodies that take care of the humanities at a European level will take up the report and develop it further, while keeping this basic idea that databases of these kinds enable better communication in the cross-border world that we live in today.
In your career as a researcher, have you ever encountered situations where a lack of sharing of research data has been a problem for you personally?
Yes, here's a concrete example: some of my research is geared towards the language of ICT and shows that communication technologies have very special effects on languages. Part of this work was done on text messaging and when I started this work back in the year 2000, there were only a couple of corpora of text messages available. I literally had to build up my own for the research! The situation has improved greatly in the years since, but integration between corpora and between languages is still lacking. My vision is to have a huge database of all corpora of text messages in all possible languages in Europe, but we are still a far cry from anything like this. Researchers have to specifically know where such corpora exist. This is a type of big data problem: the integration and creation of new data is of exceptional importance for the humanities.
Is it fair to say that you feel things are continuing to improve in this area?
Generally speaking, I think the humanities are moving in the right direction, but the problem of low visibility remains. There is often a lack of awareness in other domains of the importance of the research done on data of various kinds within the humanities. I think that more concentrated efforts should be made to bring the digital reality of the humanities to other domains.
The RDA Third Plenary Meeting will be held in Dublin, Ireland, from 26-28 March, 2014.