A decade of research data management innovation

Last year marked the 10th anniversary of Globus, which launched in 2010 as the “Globus Online” service to connect researchers and make large-scale data transfer accessible to anyone with a laptop and an internet connection.

It would have been difficult to predict back then that Globus would become an essential service for over 150,000 researchers around the world. Users in 80 countries have moved over one exabyte of data and 100 billion files, and the service has evolved into a platform that enables universities, national laboratories, government facilities, and commercial organizations to securely manage data throughout the research lifecycle.

Battling COVID-19 with Science and Technology

Since January 2020, nearly 80 research groups from across the United States and around the world have used protein crystallography at the Advanced Photon Source (APS), a Department of Energy (DOE) Office of Science User Facility at Argonne National Laboratory, to learn more about the structures of the proteins that make up SARS-CoV-2, the virus that causes COVID-19.

The more researchers know about these protein structures, the better equipped they are to find ways to treat the disease. Researchers use the ultrabright, high-energy X-ray light generated at the APS to determine the virus’s structure at the atomic level.

Globus gives the facility a fast, reliable, and secure way for researchers to transfer data and share it with collaborators. Researchers working remotely can move massive datasets quickly and view sample data from the beamlines in near real time to judge its quality.

A pipeline links the Structural Biology Center (SBC) for macromolecular crystallography at the APS with the Argonne Leadership Computing Facility (ALCF), another user facility at Argonne. The pipeline begins with Globus transferring images from the APS to the ALCF’s Theta supercomputer.

The images are then analyzed and processed with funcX, a function-as-a-service platform that dispatches individual tasks to available compute nodes. With funcX, the facility can spin up hundreds of compute nodes and route each computation to the right system for the job, giving researchers compute capacity on demand, where and when they need it.

The pipeline also uses funcX to extract metadata about hits, identify crystal diffraction, and generate visualizations showing both the sample and the hit locations. The raw data, metadata, and visualizations are then published to a portal hosted at the ALCF, where they are indexed and made searchable for reuse.
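The transfer, analyze, and publish flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Globus or funcX code: the transfer step is only a placeholder for a Globus transfer, a local `ThreadPoolExecutor` stands in for funcX dispatching tasks to compute nodes, and the names `transfer_images`, `find_hits`, and `publish`, along with the bright-pixel "hit" rule, are all hypothetical.

```python
"""Toy model of a transfer -> analyze -> publish pipeline.

All names are hypothetical. A local thread pool stands in for funcX
dispatching tasks to remote compute nodes; nothing here calls Globus.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    pixels: list[int]  # toy stand-in for a detector frame


def transfer_images(images: list[Image]) -> list[Image]:
    # Placeholder for a Globus transfer from the beamline to the supercomputer.
    return list(images)


def find_hits(image: Image, threshold: int = 100) -> dict:
    # Hypothetical hit test: count bright pixels as a proxy for diffraction spots,
    # and return a small metadata record for the frame.
    bright = sum(1 for p in image.pixels if p >= threshold)
    return {"name": image.name, "bright_pixels": bright, "hit": bright >= 3}


def publish(records: list[dict]) -> dict[str, dict]:
    # Placeholder for publishing to a searchable portal: index records by name.
    return {r["name"]: r for r in records}


def run_pipeline(images: list[Image], workers: int = 4) -> dict[str, dict]:
    staged = transfer_images(images)
    # Dispatch one analysis task per image to the worker pool; pool.map
    # returns results in the same order as the input images.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(find_hits, staged))
    return publish(records)


if __name__ == "__main__":
    frames = [
        Image("frame_001", [5, 120, 130, 140, 7]),
        Image("frame_002", [3, 4, 101, 2, 1]),
    ]
    index = run_pipeline(frames)
    hits = [name for name, rec in index.items() if rec["hit"]]
    print(hits)  # ['frame_001']
```

The point of the structure is that the analysis function is written once and handed to an executor. In the real pipeline that role is played by funcX, which sends each call to a suitable remote system rather than to a local thread.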

“Globus speeds up the whole process. We have been able to achieve data transfer speeds of 700 megabytes per second through Globus and keep pace with data collection. Researchers can now see sample data coming out of the beamlines as the experiment is running and determine whether the data is good or bad. This allows the scientists to focus on more meaningful data,” says Ian Foster, Globus co-founder and director of Argonne’s Data Science and Learning Division.

Globus Enables Scientific Discovery

Globus has helped enable many major scientific breakthroughs. It was an important building block of the Large Hadron Collider Computing Grid, providing the fast, reliable file transfer needed to process the data that led to the discovery of the Higgs boson. The Higgs boson is a critical piece of the Standard Model of particle physics, which underpins our current understanding of the fundamental particles and forces of nature.

IceCube, the world’s largest neutrino detector, is another of the many scientific facilities that depend on Globus for data transfer. The IceCube detector has enabled scientists, for the first time, to trace the origin of a neutrino, a ghostly subatomic particle, that traveled 3.7 billion light-years to Earth. IceCube uses Globus to move and archive data among Madison, Wisconsin; NERSC in California; and DESY in Zeuthen, near Berlin.

Today, Globus services are used across many disciplines and projects. Many core facilities, including sequencing centers, cryo-EM facilities, and fMRI centers, rely on Globus both to give their users access to data and to distribute data to them. Globus simplifies research data management and helps researchers pursue data-driven discovery.

