If you ask, "hey man, shall we go for a beer later?" how many of your friends reply with something like, "maybe tomorrow, I really have to baby-sit my ntuples tonight"?
Welcome to the happy and slightly geeky particle physics community. Happy, because I think we are a species of truly fortunate people, despite our constant skepticism and restlessness. And, most of us were already slightly geeky anyway, even before starting to baby-sit ntuples.
In particle physics, an ntuple is a standard way of storing data. Only the variables useful for a certain analysis enter the ntuple, as 'columns' of a table, while events are listed as rows. Producing ntuples is one of the obvious steps towards analysis optimization, and we need distributed computing resources specifically to reduce the large, large amount of data produced by our experiments. This is where the grid becomes essential to us - it's where we sit, we type, we run jobs, we send and retrieve stuff, we type again, and we check, we wait, send again, and wait again. In a word: where we baby-sit.
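To make the table picture concrete, here is a minimal Python sketch of what building an ntuple amounts to. The event records and the variable names (`run`, `dimuon_mass`, `muon_pt`, `n_tracks`) are invented for illustration; real ntuples are produced with dedicated analysis software, not plain dictionaries.

```python
# Toy events: each dictionary stands for one recorded event.
# All variable names and values here are hypothetical.
events = [
    {"run": 1, "dimuon_mass": 5366.0, "muon_pt": 2100.0, "n_tracks": 48},
    {"run": 1, "dimuon_mass": 5012.5, "muon_pt": 950.0, "n_tracks": 112},
    {"run": 2, "dimuon_mass": 5371.2, "muon_pt": 3300.0, "n_tracks": 63},
]


def make_ntuple(events, columns):
    """Keep only the requested variables: one row per event, one column per variable."""
    return [tuple(event[name] for name in columns) for event in events]


# Only the variables this (imaginary) analysis cares about become columns.
columns = ("dimuon_mass", "muon_pt")
ntuple = make_ntuple(events, columns)

for row in ntuple:
    print(dict(zip(columns, row)))
```

The point of the reduction is visible even in this toy: every event keeps its row, but the variables nobody asked for never make it into the table.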
My job-on-the-grid baby-sitting task is related to the search for evidence of B-meson to muon decays (Bs → µµ) at the LHCb detector at the Large Hadron Collider - "the biggest scientific experiment ever attempted", said Brian 'rock-star physicist' Cox. The LHC is housed in a 27-kilometer (17-mile) circular tunnel, excavated under the Swiss-French border near Geneva, Switzerland. Above are cows, sunflowers and vineyards, awesome climbing spots and gentle ski-slopes. Below are giant machines, unique works of art and the technology we need to accelerate protons and heavy ions almost to the speed of light.
The accelerator complex that culminates with the LHC is in fact more complicated than a single tunnel. Everything starts from a tiny hydrogen bottle and develops into multiple accelerating segments. The proton beams circulate in opposite directions, moving in a vacuum and guided by superconducting magnets. They meet at only four points. This is where the collisions happen and our eyes come sharply into focus.
Every collision creates new particles. The detectors used to look at particles for the four big CERN experiments are placed at these collision points. The LHCb experiment is named after the 'beauty' or b-quark and it was designed to detect the decays of B mesons - particles made of a quark (either 'up', 'down', 'strange' or 'charm') and an antiquark, namely an anti-b quark.
To B or not to B - why do we care about B mesons?
The Standard Model describes the fundamental interactions between particles, but in all its beauty and coherence, it still leaves open questions. One of the unsolved puzzles is related to the origin and evolution of the universe that we know today. Why is it mainly made of what we call matter, rather than having matter and antimatter in equal amounts, as it was assumed to be at the beginning of its life? The answer to this problem might lie beyond the Standard Model, in other models that we call 'new physics'.
B meson decays are especially convenient if you're looking for new physics. Take the decay of a B meson containing a 'strange' quark (Bs) into two muons (µ), elementary particles similar to electrons. According to the Standard Model, only about three Bs mesons out of a billion are expected to decay into two muons. Being so rare, the Bs → µµ decay is a powerful probe of new physics - measuring more than the expected number of events would give us a glimpse of it.
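The counting argument is simple arithmetic: the expected number of signal decays is the number of Bs mesons produced times the probability of the decay. Here is a rough Python sketch. The branching fraction is the "about three out of a billion" quoted above, while the number of produced mesons is a made-up figure chosen only for illustration.

```python
# "About three out of a billion", as quoted in the text.
BRANCHING_FRACTION = 3e-9


def expected_decays(n_bs_mesons, branching_fraction=BRANCHING_FRACTION):
    """Expected number of Bs -> mu mu decays among n_bs_mesons produced."""
    return n_bs_mesons * branching_fraction


# Hypothetical sample size, not an LHCb figure.
n_produced = 1e11
print(f"expected Bs -> mu mu decays: {expected_decays(n_produced):.0f}")
```

This ignores everything a real detector does to the signal, so the number of decays actually reconstructed would be smaller. The logic of the search stays the same: count, and compare with the expectation.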
It's all about searching and counting. Observing the decay and measuring its rate accurately is extremely important. As a Bs → µµ hunter, you will need to find its signal in a big stack of data collected by the experiment. And by 'big' what I mean is 'enormous'. Just consider that the LHC deals with roughly 600 million collisions per second.
The data from these collisions is reduced to a few hundred interesting events per second, thanks to a combination of hardware and software selections that we call 'the trigger system'. What is left still needs to be investigated in detail to look for our Bs → µµ needles in the data haystack.
This is where the grid comes to the rescue. First of all, we need it to process, analyze and store the huge amount of data the LHC produces. Don't forget that the distributed computing grid was originally conceived for this exact purpose. In our case, it means storing all triggered data that might contain a Bs → µµ candidate and running dedicated algorithms designed to produce smart ntuples.
But we don't only use the grid to process acquired data: we need it to simulate data too. Every analysis needs to validate its techniques on simulated samples - in our jargon, those are called 'Monte Carlos' after the homonymous (having the same name) class of computational methods.
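For a flavor of what validating on simulation means, here is a toy Monte Carlo in Python. It is not LHCb simulation software: the Gaussian mass peak, its position and width, the mass window and the sample size are all assumptions picked for illustration. The idea is to generate fake signal events with known properties, apply a selection, and check how many survive.

```python
import random


def simulate_masses(n_events, mean=5366.0, width=25.0, seed=42):
    """Generate toy dimuon masses from a Gaussian; all parameters are hypothetical."""
    rng = random.Random(seed)  # fixed seed so the toy sample is reproducible
    return [rng.gauss(mean, width) for _ in range(n_events)]


def selection_efficiency(masses, low, high):
    """Fraction of simulated events that fall inside the mass window [low, high]."""
    passed = sum(1 for m in masses if low <= m <= high)
    return passed / len(masses)


masses = simulate_masses(100_000)

# A window of two widths on either side of the peak should keep
# roughly 95% of a Gaussian signal.
efficiency = selection_efficiency(masses, 5366.0 - 50.0, 5366.0 + 50.0)
print(f"selection efficiency: {efficiency:.3f}")
```

Because we know exactly what went into the simulated sample, we know what the selection should give back. If the result disagrees with the expectation, the problem is in the technique and not in nature.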
Now, back to the starting point: the reason why I'm not going for a beer tonight is my Monte Carlo job's baby-sitting task. We are studying how well certain selections improve our statistical techniques for the Bs → µµ search. We need to analyze big samples and to run multiple algorithms on top of each other. But luckily, I can aim for just one night of grid jobs. All things considered, that is super convenient. And I might be free for a beer tomorrow already.
A version of this article first appeared on the EGI website.