- It’s not easy for research teams to share and discuss large amounts of data
- SeedMeLab combines a content management system with file-sharing capability
- Enhanced discoverability and knowledge retention accelerate productivity
Supercomputers are invaluable to modern science. Research teams submit scientific problems to high-performance computing clusters and anxiously await the results. But when those results come in, the researchers face a new problem.
The new data and results need to be reviewed, assessed, and discussed—often by large research teams distributed around the globe. To make matters worse, computed results are not always easy to access. Files can be spread across many folders and systems with limited or no access. And the more people you have who want to work with the data, the bigger the headache.
That’s why the San Diego Supercomputer Center (SDSC) at UC San Diego developed a cloud service and software to help researchers share, access, and discover data. SeedMeLab helps research groups retain knowledge by consolidating information and making it more consumable.
“We found there was a gap in the infrastructure available to scientists,” says Amit Chourasia, principal investigator for SeedMeLab. “SeedMeLab fills that gap by making data easy to discover, display, and discuss.”
“There is more to data than just files and folders,” says Chourasia. “Accessing a file and folder is not sufficient. It is just the first step.”
According to Chourasia, making the most of data involves five Ds: description, discussion, display, discovery, and dissemination. Description and discussion are two of the most important.
Description includes metadata for a file or folder, but may also encompass a researcher’s thoughts about the file. By adding descriptions directly to the file storage system, that additional knowledge becomes searchable. Consolidating the information in one location makes it more consumable by the rest of the research team and accelerates productivity.
Chourasia says that discussion of files usually takes place over email, between the person who created the data and the PI or other stakeholders. But the person who creates the data is often a postdoc or student who leaves the research group after a few years. The files may remain behind, but the context is lost when the creator moves on. Preserving discussion alongside the data helps research groups retain knowledge and onboard new members smoothly.
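To make the idea of keeping description and discussion next to the data concrete, here is a minimal sketch in Python. It is not SeedMeLab's implementation or API: the `FileRecord`, `Comment`, and `search` names, the file paths, and the sample text are all hypothetical, chosen only to show how notes stored with a file become searchable along with it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """One discussion entry attached to a file (hypothetical model)."""
    author: str
    text: str


@dataclass
class FileRecord:
    """A file plus the context that usually lives in email (hypothetical model)."""
    path: str
    description: str = ""
    comments: List[Comment] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Combine path, description, and discussion into one lowercase string.
        parts = [self.path, self.description]
        parts.extend(comment.text for comment in self.comments)
        return " ".join(parts).lower()


def search(records: List[FileRecord], term: str) -> List[str]:
    """Return paths of records whose path, description, or comments mention term."""
    needle = term.lower()
    return [r.path for r in records if needle in r.searchable_text()]


# Illustrative data only; these files and notes are invented for the example.
records = [
    FileRecord(
        path="runs/2021-03/density.h5",
        description="Electron density from the high-intensity run",
        comments=[Comment("student", "Boundary artifacts appear after step 400")],
    ),
    FileRecord(path="runs/2021-03/notes.txt", description="Scratch notes"),
]

print(search(records, "artifacts"))
```

In this sketch a search for a word that appears only in a comment still finds the file, which is the point of storing discussion with the data: the context stays attached after its author has left the group.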
To do all of these things, SeedMeLab combines a content management system with file-sharing capability. The creators chose the open-source Drupal framework because they wanted to build on a proven, scalable, and sustainable foundation.
“The great thing is that description and discussion capabilities are already part of content management systems, so we could focus on inventing the file system piece of it,” says Chourasia. “We turn the tables and make data the star. In blogging, one writes the story and then links it to the data. In our case, we do the opposite by presenting data first and then describing it, which is more natural for researchers.”
And because SeedMeLab is a set of distributable building blocks, each organization or service provider can decide how to compose, customize, and use them.
“Research groups can stand up their own instances,” says Chourasia. “They can brand it as their own, and they decide who gets access to it. There is a deep level of control, not limited to read and write, but also over who can do what on the site.”
The Laser Plasma Lab at UC San Diego has been using SeedMeLab since 2017. Since then, the group has grown from a handful of researchers to over a dozen. The lab now shares data with collaborators in Europe and Japan and has produced dozens of papers. Almost all of their work has used SeedMeLab in some fashion, accelerating their productivity.
CIPRES, a public resource for the reconstruction of large phylogenetic trees, has also turned to SeedMeLab for help. The CIPRES gateway offers access to XSEDE’s high-performance computing resources through a browser interface, but doesn’t have file-sharing capability. The CIPRES team used SeedMeLab’s building blocks to develop CIPRES Share, a place where all CIPRES users can transfer and share results and even make them public.
This connection to science gateways led SeedMeLab to become an affiliate member of the Science Gateways Community Institute (SGCI). The SeedMeLab team turned to SGCI’s services to improve the product through usability, sustainability, and marketing-oriented engagements.
“We consulted with SGCI to help us identify ways to proliferate and promote SeedMeLab and make it sustainable,” says Chourasia. “We are hoping to achieve sustainability in a multifaceted way that includes distributing open-source software that leverages the vibrant Drupal ecosystem, offering a fully managed SeedMeLab subscription service for research teams, and developing project-specific customizations and porting key capabilities back to our core.”
After receiving several years of feedback, SeedMeLab developers feel they now have a much better understanding of the needs of the research community. Their next priority is to transform SeedMeLab into a customizable data repository system that enables publishing. They also want to improve integration with science gateways and create more tools to support it.
“As a cyberinfrastructure provider, we have a very good understanding of where the friction points are,” says Chourasia. “I think this is an important contribution to our community. To provide data accessibility and discovery in a meaningful, expressive, and consumable way—that’s a huge gap where we can contribute to enable tangible scientific and societal impacts.”