
iSGTW Feature - DZero first consumer of opportunistic storage in OSG

Feature - Opportunistic storage increases grid job success rate


Processed events by DZero per week from May to October 2008. The vertical scale goes to 12M events.

Image courtesy of DZero.

The DZero high-energy physics experiment at Fermilab, an Open Science Grid user, typically submits 60,000-100,000 simulation jobs per week at 23 sites. The experiment's application executables make many requests for input data in quick succession. Because the processing sites lacked local storage, until recently much of DZero's simulated data had to be transferred in real time over the wide area network, leading to high latencies, job timeouts and job failures.

OSG worked with member institutions to allow DZero to use opportunistic storage, that is, idle storage on shared machines, at several sites. This represents the first successful deployment of opportunistic storage on OSG, and opens the door for other OSG Virtual Organizations. With allocations of up to 1 TB at sites where it processes jobs, DZero has increased its job success rate from roughly 30% to upwards of 85%.

Hosting storage resources is often tricky, especially for smaller grid sites, both in terms of hardware and professional expertise, says Abhishek Singh-Rana, coordinator of the Virtual Organizations group in OSG, which helps science communities achieve good results using the OSG. For this reason, the VO group negotiated with the larger OSG science communities, US ATLAS and US CMS, to allow other OSG communities to use their storage resources opportunistically. So far, DZero has used storage at six US-LHC Tier-2 sites, and is looking for more.

Tape robot

Image courtesy of Fermilab.

Opportunities

Work to improve DZero's job efficiency began in early July, and by early August the experiment was producing about 3.7M events per week. By the second week of September, production reached a record 11.0M events, a 130% increase over its average weekly OSG production rate for the past year.

DZero's success demonstrates the OSG's commitment to establishing relationships with its user communities in order to benefit all members.

"We are committed to the goals of the OSG, and that includes the development of opportunistic resources," says Ken Bloom, manager of the CMS Tier-2 centers in the US. "When the OSG works well, all VOs can benefit. If we can help get opportunistic storage working for DZero, then maybe DZero sites will make some of their storage opportunistically available to CMS, and if we can make good use of that, the reward will be well worth the effort."

-Marcia Teckenbrock, Open Science Grid

The OSG is continuing to work with its stakeholders and resource providers to improve the mechanism for using opportunistic storage. CDF and SBGrid have also expressed an interest in using opportunistic storage in the future. See the recommendations based on the DZero use scenario for how OSG sites can enable opportunistic storage.

Added 16 October 2008:

DZero's push for storage local to processing nodes on the grid was pioneered by Joel Snow of Langston University and Fermilab. Snow studied the low-efficiency (high failure rate) problem, determined that local storage elements would be the key to improving efficiency, and worked with OSG to implement a solution.


Copyright © 2019 Science Node ™
