Data volumes are exploding, and the need to store and share data quickly, reliably, and securely, while also making it discoverable, is increasingly important. Deliberate planning and execution are necessary to properly collect, curate, manage, protect, and disseminate the data that are the lifeblood of the modern research enterprise.
To better understand the current state of research data management, Globus reached out to several research computing leaders at institutions around the world to gain insights into their approaches and their recommendations for others. We wanted to explore how the data management tools, services, and processes deployed at their institutions help the research community spend less time on technological hurdles and accelerate their time to science.
At these and many other leading institutions, new methodologies and technologies are being developed and adopted to create a frictionless environment for scientific discovery. Self-service data portals and point-and-click data management tools such as Globus are being made available to allow researchers to spend more time on science and less on technology.
Education, documentation, and a data management plan can also help researchers spend more time on science. Hackathons, webinars, and researcher resource books are just some of the ways that institutions are educating researchers on best practices.
We present here some excerpts from our discussions on the importance of data management to science, the state of the craft, some of the challenges being faced, and ways to educate researchers and implement best practices. Our conversations yielded lots of valuable advice and interesting tidbits, and we are publishing more complete write-ups of these on the Globus blog.
Sharon Broude-Geva, Director of Advanced Research Computing, University of Michigan
“As a community, we first identified the need for reproducibility and data sharing, while understanding the wariness and concerns surrounding both practices. People have been thinking about data management for a long time, and eventually (if not already) will be forced to implement it by the funding agencies, so this discussion can no longer be postponed. Initially, it was all about funding and storage resources. Clearly, it is now a much bigger issue and requires institutional commitment.
Three key questions have emerged: What data are we managing? What does management mean? And who is responsible for this at an institutional level? Even if there is a clear assignment of institutional responsibility, this is not a problem to be solved by one part of an institution. It requires involvement from university stakeholders across disciplines, including groups that are responsible for the oversight of the research enterprise, General Counsel, Privacy and Compliance groups, Libraries, IT providers, and very importantly, the researchers themselves.
At U-M, the Provost and Vice President for Research created a task force for Public Access to Research Data in an effort to continue the discussions started at the AAU/APLU workshops on that topic, and we continue towards broadening the available guidance, training, and resources on campus.”
Rajendra Bose, Director of Research Computing, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University
“Having a dedicated data engineer as part of an award with an appropriate background, who can serve as a bridge between the technology and the research world. Someone to work almost daily and closely with lab members whose PIs have 'bought into' piloting with new tools is the kind of effort and situation needed to really learn whether a brand new approach or tool will eventually be adopted or not.
It takes much more than simply developing something and announcing on a web page or through email that something is available for use. In some cases, because researchers are often already fluent in sophisticated data analysis software and scientific instruments and procedures, we may be overestimating their willingness to jump in and explore new tools.”
Nick Jones, Director, New Zealand eScience Infrastructure
“As a national provider, we look at the interplay between the central national infrastructure and the institutions regarding the flows of data across the research ecosystem. As a research system we’re increasingly adopting a research data management perspective, assessing whether there is a shared understanding of practices, workflows, and governance. These are all the components to make up a healthy ecosystem for research data management.
Our focus at NeSI is on active data and data workflows; we don’t spend a lot of time on data repositories from an archival or pure publication perspective. We are focusing on HPC and interactive computing use cases such as Jupyter notebooks where researchers need to move data in and out of campuses, on and off different facilities, with a strong focus on moving data as little as possible and implementing the right controls. In a national context we have a particular focus in creating an environment that is inclusive of indigenous Māori interests in research.”
Matthew Harvey, Research Computing Service Manager, Imperial College, UK
“Our value is in facilitating research. Until 2017 our college did not have a coordinated central strategy for research data management. There were many independent solutions targeted at ‘sync-and-share’ (like Dropbox and Box). In 2018 we designed a centralized institutional research data storage platform. It needed to be fast and directly accessible and as close to ubiquitous as possible. We also needed some way to move data between external collaborators. Access modalities included direct presentation to the compute environment, CIFS to people’s machines and Globus for external collaborators. We are promoting Globus Connect Personal for people in regions with high latency or who are unable to access VPN.”
Doug Jennewein, Senior Director of Research Technology, Arizona State University
“Data management is as important as advanced computing. And not storage but data, let’s make that distinction. Data must be a first-class citizen. At ASU we have a Research Data Management office and a digitally focused data repository. This is a strength and an investment that the institution takes seriously. While many researchers employ advanced computing, virtually all university researchers have data management needs.
Look at what we did in computing: the need for guidance and expertise in the practice of advanced computing has given rise to an emerging job family of advanced computing facilitators. The same thing needs to happen with data management. There needs to be a redefined job category and you need to educate users to show them: Here is how you move data, and here is where Globus transfer and SFTP make sense. Yes, computing is where the analysis happens, but there are these other two pieces – networking and data – that are equally challenging.”
For the complete interviews, please visit Globus.org.