"As a layman, I'd say we have it." It was with these words that CERN's director general, Rolf Heuer, last month announced the discovery of a particle consistent with the Higgs boson, the long-sought-after corner stone of particle physics' standard model. The scientific results upon which Heuer based his statement - taken from two experiments involved, ATLAS and CMS - are now set to be published in the upcoming issue of Physics Letters B.What many people outside of particle physics may not know is that distributed computing played a crucial role in the race towards this discovery.
"Particle physics is nowadays an international and highly data-intensive field of science and it requires a massive international computing effort," said Roger Jones, ATLAS physicist and collaboration board chair of the Worldwide LHC Computing Grid (WLCG), the organization that supplies this huge computing effort. Founded in 2002, today the WLCG involves the collaboration of over 170 computing centers in 36 countries, making it the largest scientific computing grid in the world.
Of its three tiers, Tier 0 is located at CERN, Switzerland, next to the ATLAS experiment. It has a capacity of about 68,000 cores, just under a third of the grid's total capacity of approximately 235,000 cores. Tier 0 is linked with the Tier 1 centers, which are typically regional research institutes, and each of those is connected with a series of Tier 2 computer centers, mostly situated in universities. The bandwidth used is impressive: 1.5 to 2 gigabytes per second flow continuously from CERN to the Tier 1 centers, and the worldwide flow of LHC-related data is 7.5 to 10 gigabytes per second.
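To put those figures in perspective, the short Python sketch below works out Tier 0's share of the grid's cores and converts the quoted sustained rates into daily volumes. It is only a back-of-the-envelope illustration: the core counts and rates are the approximate values quoted above, the function name `daily_volume_tb` is made up for this example, and decimal units (1 terabyte = 1,000 gigabytes) are assumed.

```python
# Approximate figures quoted in the article.
TIER0_CORES = 68_000
GRID_CORES = 235_000

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_gb_per_s):
    """Convert a sustained rate in GB/s to TB/day (assumes 1 TB = 1000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


# Fraction of the grid's cores that sit in Tier 0.
tier0_share = TIER0_CORES / GRID_CORES
print(f"Tier 0 share of cores: {tier0_share:.1%}")

# Daily volumes implied by the quoted ranges.
for label, rates in (("CERN to Tier 1", (1.5, 2.0)), ("Worldwide", (7.5, 10.0))):
    low, high = (daily_volume_tb(r) for r in rates)
    print(f"{label}: {low:.1f} to {high:.1f} TB per day")
```

Under these assumptions, Tier 0 holds roughly 29% of the cores, and the CERN-to-Tier-1 flow alone amounts to about 130 to 173 terabytes every day.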
The process of handling particle physics data can be broken down into three main parts: first, reconstructing the raw data from the detectors; second, producing simulations of what the theory predicts should be seen in the detector (this is the most data-intensive part); and third, the physics analysis itself.
"In 2011 and 2012 ATLAS alone generated roughly six petabytes of raw data and a similar amount of derived data," said Jamie Boyd, data preparation coordinator for ATLAS. "This is orders of magnitude more than previous particle physics experiments." The challenge is not unique to ATLAS, but shared also by CMS, its sister experiment on the LHC. "CMS is today able to routinely sustain weekly data traffic of about 1.5 petabytes of data over a complex topology of Tier centers," said Daniele Bonacorsi, deputy CMS computing coordinator. "The smooth handling of large volumes of data has been crucial for the LHC experiments to explore their physics potential".
The data volume is exceptionally high because ATLAS, for instance, has about 100 million readout channels. Proton-proton collisions produce a high number of particles, requiring more computing power to reconstruct them. In addition, the LHC was unprecedentedly successful in 2012, resulting in what physicists call 'pile-up' beyond the design level of the machine.
Add to that the pressure to produce a result in time for the biggest particle physics conference of the year, the ICHEP, held in Melbourne, Australia, and the race was well and truly on.
Of course, the physicists were not entirely taken by surprise, as they had been preparing for the demands of LHC operation for some time. "We reduced the CPU time to reconstruct one event to 25 seconds per event," said ATLAS computing coordinator Hans von der Schmitt. "From 2010 to 2012 we increased the capacity of the WLCG by 50%." But when the crunch came in the first half of this year, even this was not enough. "At one point we had to borrow access to 2,000 more CPUs in Tier 0 to keep up with the LHC, and currently many of the funding agencies are providing us with about 20% more capacity than they had originally pledged."
Following last month's announcement, physicists are now trying to more closely identify the "Higgs-like" particle. "Because the next step in the science requires us to look for differential distributions in the data, we will be needing much more of it," said Boyd. This translates into even more computing capacity, but scientists also hope to make the existing capacity more efficient.
"We would like to change the computing models to require fewer copies of the data world-wide, and to minimize the number of bytes needed to store a physics event," said Von der Schmitt. CMS also has some performance-improving tricks up its sleeve. "We are working on more dynamic data placement tactics and extending use of remote access techniques," said Bonacorsi. Specialized processing units (originally developed for computer graphics) and multi-threading will also help.
Physicists are not-so-secretly hoping that their more detailed analyses will reveal something unusual about the Higgs boson, leading to new physics. Who knows? But one thing is for sure: without distributed computing, their scientific advances would not be possible.