Feature - Case Study: Einstein@OSG


A screenshot of the Einstein@Home screensaver. Image courtesy of Einstein@Home.

For over five years, volunteers have been lending their computers' spare cycles to the Laser Interferometer Gravitational Wave Observatory (LIGO) and GEO-600 projects via the BOINC application Einstein@Home. Now a new application wrapper, dubbed "Einstein@OSG," brings the application to the Open Science Grid.

Today, although Einstein@OSG has been running for only six months, it is already the top contributor to Einstein@Home, processing about 10 percent of jobs.

"The Grid was perfectly suitable to run an application of this type," said Robert Engel, lead developer and production coordinator for the Einstein@OSG project. "BOINC would benefit from every single CPU that we would provide for it. Increasing the number of CPUs by 1000 really results in 1000 times more science getting done."

Getting Einstein@Home to run on a grid was not without difficulties. Normally, a volunteer would download and install the application. The application would constantly download data, analyze it, and then return the results. In short, each instance of Einstein@Home has a permanent home on a volunteer's computer.

The same process would not work on the Grid. Grid jobs cannot run indefinitely, so each instance of Einstein@OSG was given a time limit.

"Once the time limit is up, the Einstein@Home application exits, followed by the Einstein@OSG application, which will save all results to an external storage location," Engel explained. "The next time Einstein@OSG starts, it likely starts on a different cluster node which may use a different architecture."

Next, the Einstein@OSG application detects changes in the environment, such as the architecture, location, version of software, or network connectivity, and then compiles any missing software 'on-the-fly.' After a final check to verify that all requirements for Einstein@Home are met, it starts up. The results from the previous run are loaded from the remote storage location, and Einstein@Home picks up where it left off.
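The stop-and-resume cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Einstein@OSG code: the function names, the `checkpoint.json` file, and the use of a local directory to stand in for the external storage location are all hypothetical.

```python
import json
import platform
import time
from pathlib import Path


def environment_fingerprint():
    """Collect a few facts about the current node (an illustrative subset)."""
    return {"arch": platform.machine(), "system": platform.system()}


def run_with_time_limit(storage, time_limit_s, total_units, work_fn=lambda unit: None):
    """Resume from the last checkpoint, work until the limit, then save.

    `storage` is a directory standing in for the external storage location.
    Returns (units completed so far, whether the environment changed).
    """
    storage = Path(storage)
    checkpoint = storage / "checkpoint.json"  # hypothetical file name

    # Load results from the previous run, if there was one.
    state = {"done": 0, "env": None}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())

    # Detect whether this run landed on a different kind of node.
    env = environment_fingerprint()
    env_changed = state["env"] is not None and state["env"] != env
    # A real wrapper would rebuild any missing software here if env_changed.

    # Work until everything is done or the time limit is reached.
    deadline = time.monotonic() + time_limit_s
    while state["done"] < total_units and time.monotonic() < deadline:
        work_fn(state["done"])
        state["done"] += 1

    # Save progress so the next run can pick up where this one left off.
    state["env"] = env
    storage.mkdir(parents=True, exist_ok=True)
    checkpoint.write_text(json.dumps(state))
    return state["done"], env_changed
```

Calling `run_with_time_limit` repeatedly against the same storage directory continues from the saved unit count rather than starting over, which is the property the wrapper needs when each grid job may start on a different node.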

An application on a grid will encounter software and hardware issues much more frequently than a desktop application such as Einstein@Home, according to Engel. This is because grids are much more complex, and deal with an extremely high volume of jobs.

Because the average Einstein@Home user encounters an error only every couple of months, it's practical for her to handle it manually. With Einstein@OSG running on up to 10,000 cores, however, errors occur every couple of minutes. Fixing these by hand simply isn't practical, so the Einstein@OSG team eventually automated the process.

"It was only because of that mechanism that we were able to scale up," Engel said. "A computer never gets tired looking for errors and fixing them, unlike me, who likes to sleep at night and spend time with his family."
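The article does not describe how the automated mechanism works internally. As a rough sketch of the general idea, the Python below maps each kind of error to a recovery action and hands a job to a human only after repeated failures. The error categories, action names, and retry limit are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical error categories and recovery actions (not from Einstein@OSG).
RECOVERY_ACTIONS = {
    "missing_library": "rebuild_software",
    "storage_unreachable": "retry_later",
    "node_failure": "resubmit_elsewhere",
}


def choose_action(error_kind, attempts, max_attempts=3):
    """Pick a recovery action, escalating once automatic retries run out."""
    if attempts >= max_attempts:
        return "escalate_to_operator"
    # Unknown error kinds are also escalated rather than guessed at.
    return RECOVERY_ACTIONS.get(error_kind, "escalate_to_operator")


def handle_errors(errors):
    """Decide an action for each (job_id, error_kind) pair, in order."""
    attempts = Counter()
    decisions = []
    for job_id, kind in errors:
        action = choose_action(kind, attempts[(job_id, kind)])
        attempts[(job_id, kind)] += 1
        decisions.append((job_id, kind, action))
    return decisions
```

With errors arriving every couple of minutes, a lookup of this kind lets routine failures resolve without anyone watching, and only the unusual ones reach an operator.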

The number of clusters running Einstein@OSG is plotted on the horizontal axis; the total number of CPU cores across all clusters is plotted on the vertical axis. Each rectangle represents one week between June 2009 and February 2010. The color indicates how much work was accomplished that week, ranging from blue (the least) to red (the most). The dates of three arbitrarily chosen weeks are written in white to illustrate how the amount of work, as well as the number of clusters and cores, has increased over time.

Image courtesy of Einstein@OSG.

Before Engel began work on Einstein@OSG, he was a member of a team led by Thomas Radke at the Max Planck Institute for Gravitational Physics. Radke's team created a wrapper for Einstein@Home compatible with the German Grid Initiative (D-Grid) in 2006. Part of Engel's contribution was the design of a user interface that allows one person to effectively monitor and control thousands of Einstein@Home applications.

"Back then it consisted of a command line tool that would summarize all activities on the Grid on a single terminal page," Engel said. Now the tool records activities and uses that historical data to create error statistics. Those and other statistics are displayed on an internal webpage.
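As a minimal sketch of how recorded activity could be turned into error statistics, the Python below computes a per-cluster failure rate and formats it as one summary line per cluster. The event fields (`cluster`, `status`) and the output layout are assumptions for illustration; the article does not describe the tool's actual data format.

```python
from collections import Counter


def error_statistics(events):
    """Return the fraction of failed jobs per cluster.

    `events` is an iterable of dicts with hypothetical keys
    'cluster' (a name) and 'status' ('ok' or anything else for a failure).
    """
    totals = Counter()
    failures = Counter()
    for event in events:
        totals[event["cluster"]] += 1
        if event["status"] != "ok":
            failures[event["cluster"]] += 1
    return {cluster: failures[cluster] / totals[cluster] for cluster in totals}


def summary_lines(stats):
    """Format one line per cluster, sorted by name, for a terminal page."""
    return [f"{cluster:<12} {rate:6.1%}" for cluster, rate in sorted(stats.items())]
```

Keeping the raw events and deriving the statistics on demand means the same history can feed both a single-page terminal summary and a web page.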

Unfortunately, the wrapper created by Radke's team could not simply be repurposed to run on OSG.

"OSG and the German grid are different," Engel said. For example, "in Germany the entire grid depends on Globus."

Engel and his team examined their options for getting Einstein@Home onto OSG, and concluded that the best option was Condor-G, a sort of hybrid of Condor and Globus. But implementing Condor-G would have required a great deal of work, delaying the launch of Einstein@Home on OSG.

That's why Engel's team opted to implement Globus' GRAM first, which took only two weeks of work, before starting on a Condor-G solution. It's a good thing they kept going, because they soon discovered a serious issue with GRAM.

"It doesn't go up in scale very well," Engel said. "If you try to run more than 100 jobs on a given resource, you'll bring down that resource."
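One generic way to live with a per-resource limit like the one Engel describes is to cap how many jobs are sent to each resource. The Python sketch below is purely illustrative and is not what the team did (they moved to Condor-G); the function name and data layout are hypothetical, and the default cap of 100 is taken from the quote above.

```python
def plan_submissions(pending, running, per_resource_cap=100):
    """Spread pending jobs across resources without exceeding the cap.

    `pending` is the number of jobs waiting to be submitted.
    `running` maps each resource name to its current job count.
    Returns (jobs to submit per resource, jobs left waiting).
    """
    plan = {}
    remaining = pending
    # Fill the least-loaded resources first.
    for resource in sorted(running, key=running.get):
        if remaining <= 0:
            break
        free_slots = max(per_resource_cap - running[resource], 0)
        to_submit = min(free_slots, remaining)
        if to_submit:
            plan[resource] = to_submit
            remaining -= to_submit
    return plan, remaining
```

For example, with 250 jobs pending and three resources already running 90, 0, and 100 jobs, the plan sends 100 jobs to the idle resource and 10 to the first, and leaves 140 waiting.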

Still, Engel said that given the chance to do it over, he would again implement GRAM first. "It meant that for a year, we could run jobs on OSG."

The Condor-G version went live in September 2009, and it has rapidly picked up steam. "On a typical day, we are running between 5000 and 8000 jobs at any time," Engel said. "Before that we were running approximately 500."

Watch this video to learn more about LIGO and GEO-600, the experiments that are supported by Einstein@Home!

Video courtesy of the American Museum of Natural History.

-Miriam Boon, iSGTW
