Feature: Virtualization - key for LHC physics volunteer computing

BOINC is versatile enough that even mobile phones can do volunteer computing. Image courtesy BOINC.

In 2006, the team that built LHC@home was given a challenge by Wolfgang von Rueden, then IT Division Leader at CERN: look at the use of the volunteer computing project BOINC (Berkeley Open Infrastructure for Network Computing) for simulating events of interest in the Large Hadron Collider.

It presented a demanding problem.

The software environments used by LHC experiments such as ATLAS, ALICE, CMS, and LHCb are very large.

Furthermore, all LHC physics software development is done under Scientific Linux, whereas most volunteer PCs run under Windows. Porting the large and rapidly changing software is not practical, so another approach was needed.

The solution?

Marrying volunteer computing with the CernVM virtual image management system, then under development. It would enable the practical use of thousands of volunteer PCs as an additional grid resource to run increased amounts of event simulation for the LHC experiments.

CERN had begun harnessing volunteer computing in 2004 to aid the LHC project through the LHC@home system, which allowed CERN accelerator engineers to run intensive simulations to check the stability of the twin proton beams circulating in the LHC machine.

More than 50,000 members of the public volunteered spare cycles of their PCs to carry out these calculations, with their effort managed by the open source BOINC system, which was originally developed for the well-known SETI@home project.

Enter CernVM

Since then, almost all the BOINC-related effort at CERN has been carried out by student interns during the summer. Two such students surveyed the potential of virtualization for solving the problem.

Instead of sending a ported executable to the volunteer PC, an LHC Linux environment containing the exact version of the physics code is sent and executed under a hypervisor. The results were promising: virtualization entailed a CPU overhead of only a few percent.
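
As a rough illustration of the approach, here is a minimal Python sketch of what the volunteer side might look like: fetch a Linux appliance once, then boot it headless under a hypervisor (here VirtualBox's VBoxManage command-line tool). The VM name and image URL are invented for illustration; this is a sketch of the general idea, not the actual BOINC-CernVM implementation.

    import subprocess
    import urllib.request

    IMAGE_URL = "https://example.cern.ch/images/lhc-worker.ova"  # hypothetical
    VM_NAME = "lhc-worker"                                       # hypothetical

    def fetch_image(path="lhc-worker.ova"):
        # Download the Linux appliance once; the physics code runs inside
        # the VM, so nothing needs porting to the volunteer's host OS.
        urllib.request.urlretrieve(IMAGE_URL, path)
        return path

    def run_job():
        ova = fetch_image()
        # Import the appliance and boot it headless under the hypervisor.
        subprocess.run(["VBoxManage", "import", ova], check=True)
        subprocess.run(["VBoxManage", "startvm", VM_NAME,
                        "--type", "headless"], check=True)
        # The VM then fetches a simulation task, runs it, and uploads results.

    if __name__ == "__main__":
        run_job()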

However, fresh challenges arose due to the very large size of the virtual image required to contain all the LHC experiment software packages (about 8-10 gigabytes). Further complicating things, the packages changed frequently, with each change requiring a full image reload.

CernVM, launched in 2008 by Predrag Buncic, offered a general solution to the problem of virtual image management for physics computing. Instead of loading each running virtual machine with the full image containing all the code and libraries for an LHC experiment, only a basic "thin appliance" of about 100 megabytes is loaded initially, and further image increments are loaded as needed. Updates to images are made automatically by the CernVM system, which keeps up-to-date versions of all image modules in a repository for each supported LHC experiment. The resulting working images are typically less than 1 gigabyte in size and are cached by the virtual machines, minimizing access to the CernVM repository until changes appear.
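
In essence, this is an on-demand cache: a software module is fetched from the repository only the first time it is needed, and served from local storage thereafter. The short Python sketch below illustrates that general pattern; the repository URL and module name are hypothetical, and CernVM's actual image management is of course far more sophisticated.

    import os
    import urllib.request

    REPO = "https://example.cern.ch/cernvm-repo"   # hypothetical repository
    CACHE = os.path.expanduser("~/.cernvm-cache")  # local module cache

    def get_module(name):
        """Return the local path of a software module, fetching it once."""
        os.makedirs(CACHE, exist_ok=True)
        local = os.path.join(CACHE, name)
        if not os.path.exists(local):        # cache miss: load the increment
            urllib.request.urlretrieve(f"{REPO}/{name}", local)
        return local                         # cache hit: no repository access

    # A job touching only part of the full 8-10 gigabyte software stack
    # pulls just the modules it uses, keeping the working image small.
    simulation_lib = get_module("geant4-9.2.tar.gz")  # hypothetical module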

Since 2008, successive students and volunteers have produced a working prototype of a combined BOINC-CernVM system. In parallel, the CernVM team has successfully linked CernVM to two of the LHC experiments' job production systems (ATLAS's PanDA and ALICE's AliEn). This allows the ATLAS and ALICE experiments to access CernVM "Clouds" without changing their production scripts or procedures.
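
Conceptually, each CernVM instance behaves like a grid worker node running a "pilot" loop: it asks the experiment's central job queue for work, runs whatever it receives inside the standard LHC environment, and repeats. The Python sketch below assumes a made-up HTTP endpoint and job format; PanDA and AliEn each have their own, richer protocols.

    import json
    import subprocess
    import time
    import urllib.request

    QUEUE = "https://example.cern.ch/queue/next-job"  # hypothetical endpoint

    def pilot_loop():
        while True:
            with urllib.request.urlopen(QUEUE) as resp:
                job = json.load(resp)      # e.g. {"cmd": [...]} or null
            if not job:
                time.sleep(60)             # queue is empty: back off
                continue
            # Run the production command unchanged; the experiment's
            # scripts see an ordinary Linux worker node.
            subprocess.run(job["cmd"], check=True)

    if __name__ == "__main__":
        pilot_loop()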

Exploiting all this work would therefore allow "Volunteer Clouds" of BOINC PCs to be harnessed as an additional source of processing power for the LHC experiments. The work was presented at the Advanced Computing and Analysis Techniques in Physics Research (ACAT) conference in Jaipur, India, yesterday (February 23, 2010).

-Ben Segal, CERN, and Seth Bell, iSGTW. For more, see the entry on Indico.