
Opportunistic compute cycles extend the long tail of science

The Open Science Grid (OSG) all-hands meeting wrapped up this month at the SLAC National Accelerator Laboratory near the Stanford University campus in California, US. The annual meeting brings together members of the US ATLAS Computing Facility, the CMS Tier-2 and Tier-3 facilities, and members of campus grids and communities from across the country.

LHC plans to double CPU resources over the coming two years. Image courtesy OSG Grid Operations Center. Cover image courtesy Amber Harmon.

Supported by the US National Science Foundation and the US Department of Energy, the OSG is built and operated by a consortium of 90 universities, national laboratories, scientific collaborations, and software developers. It is a national, shared cyberinfrastructure used by physicists at the Large Hadron Collider (LHC), as well as engineers and scientists in fields like biology, nanotechnology, medicine, earth sciences, chemistry, and astrophysics.

The OSG does not own computational resources, but instead enables the opportunistic use and sharing of resources (including software and services) by users and resource providers alike. For many scientists and campuses facing funding cuts, increased operational costs, and ever more ambitious scientific pursuits, this is welcome news.

It's estimated that CPU and storage resources on the grid will more than double over the next few years, especially after the LHC begins its second round of data taking and processing at higher energy levels in 2015. "As a result we'll see not only increased use of OSG by both ATLAS and CMS experiments, but also increased opportunity for the long tail of science," says Lothar Bauerdick, OSG executive director.

"We want to deliver distributed high-throughput computing capabilities to campuses and researchers, but it has to be easy," says Rob Gardner, a senior fellow at the Computation Institute at the University of Chicago, and senior research associate at Fermi National Accelerator Laboratory, near Chicago, Illinois, US. "Easy to the point of someone actually getting an account up and running and submitting thousands of jobs all within 30 minutes," notes Gardner, who introduced OSG Connect last year.

From climate science and combating concussion to virtual screening and epigenetics, the long tail of science on OSG continues to extend. Sustained growth, however, requires the right people to address issues that technology can't solve. Using computational resources is only a small part of what scientists spend their time on; any emotional energy spent on learning or managing resources leaves less to devote to their research.

GenBank at NCBI, in collaboration with partners in Europe and Japan, is the world's largest annotated collection of publicly available DNA sequences. GenBank contains 170 million sequences from 280,000 different species. Image courtesy National Library of Medicine.

"We're trying not only to impact what they can do with computing, but also what they can discover in their research," says Lauren Michael, a research computing facilitator at the University of Wisconsin-Madison, US.

Bioinformaticists, in particular, will benefit from OSG compute cycles in coming years. In 2012, nearly 2.5 million life scientists a day ran the Basic Local Alignment Search Tool (BLAST) to compare biological sequences against the National Center for Biotechnology Information (NCBI) databases, and those numbers are still climbing.

Rob Quick, OSG Grid Operations Center manager, gave an all-hands overview of the Galaxy web portal maintained by the National Center for Genome Analysis Support. The portal now allows BLAST job submissions that run on OSG.

"These BLAST jobs are CPU (not memory) intensive," explains Quick. "High-performance clusters with dedicated memory are not ideal for running these jobs; an active cluster can actually slow them down. They are very much suited to running on opportunistic cycles in OSG."
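
As a rough illustration of why such jobs fit opportunistic cycles: a batch of query sequences can be cut into independent pieces, and each piece can run wherever a core happens to be free. The sketch below is not the Galaxy portal's or OSG's actual mechanism. The function names (`parse_fasta`, `chunk_records`, `to_fasta`) are hypothetical, and the code only splits FASTA-formatted text; the BLAST invocation and the job submission are left out.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size; each list is one independent job."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


def to_fasta(chunk: Iterable[Record]) -> str:
    """Serialize one chunk back to FASTA text, ready to ship with a job."""
    return "".join(f">{header}\n{seq}\n" for header, seq in chunk)


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    queries = ">q1\nACGTAC\n>q2\nGGCC\n>q3\nTTAA\n"
    for i, chunk in enumerate(chunk_records(parse_fasta(queries), 2)):
        print(f"job {i}: {len(chunk)} sequence(s)")
```

Because no chunk depends on any other, a chunk that lands on a slow or preempted core delays only itself, which is the property that makes this kind of workload tolerant of borrowed cycles.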

Today OSG delivers more than 2 million CPU hours per day and moves a petabyte of data, much of it over ESnet and Internet2. In fact, this month marked a new weekly production record of 17 million compute hours. Over the next decade, data- and compute-intensive science will become the norm and not just the exception, as scientists from every domain place new and greater demands on campus technology and infrastructure providers.
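
To put those figures on a more familiar scale, they can be converted into the average number of cores kept busy around the clock. This is a back-of-the-envelope sketch and not an OSG accounting method: it assumes one CPU hour (or compute hour) means one core running for one hour, and the helper name `average_busy_cores` is made up for the example.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def average_busy_cores(cpu_hours: float, wall_hours: float) -> float:
    """Average number of cores in use, assuming 1 CPU hour = 1 core for 1 hour."""
    return cpu_hours / wall_hours


# Figures quoted in the article.
daily_cores = average_busy_cores(2_000_000, HOURS_PER_DAY)
record_week_cores = average_busy_cores(17_000_000, HOURS_PER_WEEK)

if __name__ == "__main__":
    print(f"typical day: about {daily_cores:,.0f} cores")
    print(f"record week: about {record_week_cores:,.0f} cores")
```

Under that assumption, 2 million CPU hours per day corresponds to roughly 83,000 cores busy at all times, and the record week of 17 million hours to roughly 101,000.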

Many campuses will look harder at best practices and move toward solutions such as OSG, which provides not only economies of scale, but also a community focused on sharing resources for the greatest good and the greatest discoveries.


Copyright © 2018 Science Node ™
