In the early 2010s, excitement in the physics community mounted as two laboratories, hosting the world’s two highest-energy particle accelerators, seemed on the verge of discovering the Higgs boson.
Predicted in 1964, the boson was the last undiscovered particle of the Standard Model (SM) and would explain how all particles that make up the world get their mass.
Then, on July 4, 2012, after a half-century search, the announcement came out of the European Organization for Nuclear Research (CERN), a laboratory built on the Franco-Swiss border, near Geneva. The finding validated the technology behind the discovery, CERN’s Large Hadron Collider (LHC), a $4.75 billion particle accelerator.
And today, nine years later, the future of high-energy physics remains closely tied to the LHC. Although the Higgs is often presented as a solution to many problems in physics, in reality, it both raised new ones and left many unanswered — such as the question of dark matter’s existence, the hierarchy problem, the Universe’s matter-antimatter asymmetry, and other beyond-the-SM physics.
Investigating these open questions requires higher energies, more collisions, bigger data, and closer analysis of the Higgs. CERN’s Maria Girone spoke to us about the computing challenges of this research, three weeks after her keynote talk on the matter at PRACEdays21.
She is the chief technology officer at CERN openlab, a public-private partnership that works with leading tech companies to drive innovation in the computing technologies needed for high-energy physics.
Behind the scenes in high-energy physics
One of the most complex technologies ever built, the LHC’s 17-mile tunnel of superconducting magnets and mammoth detectors winds beneath the Swiss countryside, into France, and back again. Thousands of magnets along its circumference guide beams of protons traveling at just shy of the speed of light along collision paths.
Those collisions produce a maelstrom of new particles and decaying byproducts, which are detected at one of four points by the LHC’s particle detectors. At the time of the Higgs discovery, the LHC generated millions of collisions per second; today it generates about one billion.
However, the challenge isn’t just in building the technology, imagining the experiments, and producing the particle collisions. It’s in capturing, disentangling, and analyzing the enormous amounts of data they produce.
“The detectors contain 150 million sensors which survey the data out 40 million times per second. So, filtering and data selection are very important steps in getting the data online within milliseconds from the event collisions,” Girone says.
Prior to the LHC’s first run, scientists expected to see an average of 25 simultaneous collisions each time the beams cross, a figure known as pileup. Researchers must disentangle and reconstruct these events, sifting through billions of data points, to study individual collisions, a step crucial to both measuring Higgs decays and hunting for unknown particles.
“Today, the pileup is already between 40 to 50, with records beyond 50. But after upgrades to the LHC, the pileup will be around 200,” Girone says.
The reconstruction process brings immense computing and storage needs, on the scale of CERN’s 1 million processors, which are distributed across 161 sites in 42 countries.
Physics is one of the most data-intensive fields. CERN, for example, is already in exascale science, with one exabyte of data stored across its distributed infrastructure, the Worldwide LHC Computing Grid (WLCG).
Back when CERN first published designs for the LHC in 1995, 13 years before the first run, it also launched several data processing initiatives. And throughout its history, the organization has continued to modernize and innovate its code and infrastructure in parallel with technological advancements. But soon, the needs in particle physics will outstrip the gains in technology.
“Even if we are assuming that some of the annual improvements that are coming from technology are factored in — typically what we have been seeing up until now is 10 to 20 percent annual improvement — even if we factor this in, we will have a resource gap. Technology will fall short by about a factor of six to 10 in processing and three to five in storage,” Girone says.
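To put the quoted improvement rates in perspective, the short Python sketch below compounds a 10 and a 20 percent annual gain and then combines the result with the gap factors Girone cites. The six-year horizon, the function names, and the reading of the "gap" as demand divided by the capacity that steady improvement delivers are illustrative assumptions, not CERN's own model.

```python
def compounded_gain(annual_rate: float, years: int) -> float:
    """Capacity multiplier after `years` of steady annual improvement."""
    return (1.0 + annual_rate) ** years


def implied_demand_growth(gain: float, gap: float) -> float:
    """Demand growth implied if technology still falls short by `gap`.

    Assumption: gap = demand / (today's capacity * gain).
    """
    return gain * gap


YEARS = 6  # assumed horizon, chosen for illustration only

for rate, gap in ((0.10, 6), (0.20, 10)):
    gain = compounded_gain(rate, YEARS)
    demand = implied_demand_growth(gain, gap)
    print(f"{rate:.0%}/year over {YEARS} years: x{gain:.2f} capacity, "
          f"gap x{gap} -> roughly x{demand:.0f} processing demand")
```

Under these assumptions, steady improvement alone yields roughly a 1.8- to 3-fold capacity increase, far short of what a six- to tenfold residual gap in processing implies.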
The upgraded LHC, or the High-Luminosity LHC, will be ready in 2027. In addition to the expected increase in pileup, it will also boost the LHC’s luminosity by a factor of 10 and the average yearly production of Higgs bosons from 3 million to 15 million. Given this, CERN estimates that, in total, the LHC has only collected about two percent of the data it will collect.
“When we move to the HL-LHC, the experiments will produce about one exabyte of data per year,” Girone says, compared to an annual figure of around 80 petabytes collected in recent years. “And, overall, the biggest computing challenge lays in realizing the scientific potential of this data. Processing, analyzing, storing, and providing access to such big quantities of data — exabytes of data — by a global research community that is distributed in the world and made of thousands of people is an enormous challenge.”
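As a rough sense of scale, the sketch below compares the two annual volumes mentioned here and converts each into an average sustained data rate. It assumes decimal SI units (1 petabyte = 10^15 bytes) and spreads the data evenly over a full calendar year, which the accelerator's real running schedule does not do; the names are hypothetical.

```python
PB = 10**15  # bytes per petabyte (decimal SI units, an assumption)
EB = 10**18  # bytes per exabyte
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_gb_per_s(bytes_per_year: int) -> float:
    """Average rate in GB/s if the data arrived evenly all year."""
    return bytes_per_year / SECONDS_PER_YEAR / 10**9


recent = 80 * PB  # annual volume collected in recent years
hl_lhc = 1 * EB   # expected annual volume at the HL-LHC

print(f"Growth factor: x{hl_lhc / recent:.1f}")
print(f"Recent average rate: {average_rate_gb_per_s(recent):.2f} GB/s")
print(f"HL-LHC average rate: {average_rate_gb_per_s(hl_lhc):.1f} GB/s")
```

Even as a year-round average, one exabyte per year corresponds to tens of gigabytes every second, about 12.5 times the recent figure.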
One of the ways in which CERN is working to tackle these challenges is by partnering in multi-science collaborations like RAISE and EGI-ACE, with particular focus on horizon technologies in supercomputing and AI. Since CERN’s user community spans over 12,000 researchers from 85 countries, its diversity in science, culture, and partnerships, spanning industry and academia, is no surprise.
With the support of this research and new technologies, the nature of the search itself is set to change. “In a way, the LHC will be transitioning from looking for needles in a haystack to producing stacks of needles,” Girone says. “That then will help us to possibly uncover new phenomena, or in any case, to understand known phenomena with greater precision.”