- Open Science Grid logged over one billion compute hours in 2016
- Weeks with 30 million compute hours are becoming the norm
- Designed for large data flows, OSG also manages opportunistic computing runs
Serving researchers across a wide variety of scientific disciplines, the Open Science Grid (OSG) weaves the national fabric of distributed high throughput computing.
“We just had a record week recently of over 30 million hours (close to 32.8 million) and the trend is pointing to frequent 30 million-hour weeks — it will become typical,” says Scott Teige, manager of OSG’s Grid Operations Center at Indiana University (IU).
“To reach 32.8 million, we need 195,000 cores running 24/7 for a week.”
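The figure Teige cites is easy to verify: cores multiplied by the hours in a week. A minimal sketch in Python, using only the numbers from the quote above:

```python
# Sanity-check the quoted core-hours arithmetic:
# cores running around the clock for one week.
cores = 195_000
hours_per_week = 24 * 7  # 168 hours

core_hours = cores * hours_per_week
print(f"{core_hours:,} core-hours")  # 32,760,000 -- close to 32.8 million
```

Running the numbers gives 32.76 million core-hours, matching the record week Teige describes.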
Teige's job is to keep things running smoothly. The OSG Grid Operations Center provides operational support for users, developers, and system administrators. The center also handles real-time monitoring and problem tracking, grid service maintenance, security incident response, and information repositories.
Big and small
Where is all this data coming from? Teige explains that the largest amount of data is coming from the experiments associated with the Large Hadron Collider (LHC), for which the OSG was originally designed.
But the LHC is just part of the story. There are plenty of CPU cycles to go around, so opportunistic use has become a much larger focus. When OSG resources are not busy, scientists from many disciplines use those hours to revolutionize their science.
For example, the Structural Protein-Ligand Interactome (SPLINTER) project at the Indiana University School of Medicine predicts the interactions of thousands of small molecules with thousands of proteins, using the three-dimensional structure of each bound protein-compound complex.
By using the OSG, SPLINTER finds a quick and efficient solution to its computing needs — and develops a systems biology approach to target discovery.
The opportunistic resources deliver millions of CPU hours in a matter of days, greatly reducing simulation time. This allows researchers to identify small molecule candidates for individual proteins, or new protein targets for existing FDA-approved drugs and biologically active compounds.
“We serve virtual organizations (VOs) that may not have their own resources,” says Teige. “SPLINTER is a prime example of how we partner with the OSG to transform research — our resources alone cannot meet their needs.”
Because Teige's group is based at Indiana University, much of the OSG operational infrastructure runs out of the IU Data Center. And because IU is an Extreme Science and Engineering Discovery Environment (XSEDE) resource, the university also handles submissions to the OSG.
That means scientists and researchers nationwide can connect both to XSEDE's collection of integrated digital resources and services and to OSG's opportunistic resources.
“We operate information services that determine the states of resources, which informs how jobs are submitted,” says Teige. “We operate the various user interfaces like the GOC homepage, support tools, and the ticket system. We also operate a global file system called Oasis, where files are deposited to become available for use within a reasonably short time span. And we provide certification services for the user community.”
From LHC big data to smaller opportunistic research computing needs, Teige's team makes sure the OSG delivers the support researchers depend on, so discovery moves forward reliably and transparently.