Feature - PanDA lets scientists stay cool

Feature - PanDA makes huge job sets more bearable

Simplified view of core PanDA architecture. Image courtesy of BNL.

Keeping track of huge job sets processed on hundreds of compute clusters around the world through the LHC Computing Grid might send the most organized of logical thinkers into a tizzy. The PanDA (Production and Distributed Analysis) system, developed for the ATLAS collaboration at the Large Hadron Collider, lets scientists stay cool while it takes charge of distributing jobs, collecting results and managing workflow.

An important feature of PanDA is that it allows the user to submit one job, called a pilot job, which coordinates a series of jobs that the user has put together and configured. When launched, the pilot job contacts the PanDA server, which in turn locates available resources and sends the collected jobs to run based on their relative priorities. The pilot system manages the workflow efficiently, providing a quick response time. And it frees users from tedious decision-making, said Kaushik De, a PanDA developer and University of Texas physicist.
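The priority-based hand-off described above can be pictured with a toy model. This is only a minimal sketch, not PanDA's actual code or API: the `JobServer` class, the `run_pilot` function, and the job and site names are all hypothetical, and it assumes that a higher number means a higher priority.

```python
import heapq
from itertools import count


class JobServer:
    """Toy stand-in for a central server that holds queued jobs (hypothetical)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: equal priorities leave in submission order

    def submit(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def request_job(self):
        """Called by a pilot that has a free slot; returns a job name or None."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


def run_pilot(server, site):
    """A pilot asks the server for work and reports which job landed at its site."""
    job = server.request_job()
    return None if job is None else (site, job)


server = JobServer()
server.submit("simulation-A", priority=1)
server.submit("analysis-B", priority=5)
server.submit("simulation-C", priority=1)

# Four pilots ask for work, but only three jobs are queued.
dispatched = [run_pilot(server, site) for site in ("site-1", "site-2", "site-3", "site-4")]
print(dispatched)
```

In this sketch the highest-priority job goes to the first pilot that asks, jobs with equal priority leave in the order they were submitted, and a pilot that finds the queue empty simply gets nothing to run.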

PanDA was initially developed in 2005 for U.S.-based ATLAS production and analysis on the Open Science Grid, but it has since been adopted by the global ATLAS collaboration as its primary system for distributed processing. ATLAS uses three different grid systems - OSG, EGEE and NorduGrid - and PanDA serves as the interface to all of them.

ATLAS has also developed a separate data management system, variously called DQ2 or DDM (for "Distributed Data Management"), that catalogs the tens of millions of ATLAS files distributed worldwide at hundreds of storage locations. PanDA works seamlessly with DQ2/DDM to match user jobs to the input data they require, either by sending the job to where the data already resides or by moving the data to where the job will run.
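One way to picture this matching of jobs to data is the small sketch below. It is an illustration under stated assumptions, not the real PanDA or DQ2/DDM logic: the `choose_site` function, the catalog layout, and the site and dataset names are all hypothetical. It prefers a site with free slots that already holds the inputs, and otherwise picks the free site that needs the fewest datasets transferred.

```python
def choose_site(inputs, catalog, free_slots):
    """Return (site, datasets_to_transfer) for a job, or (None, []) if no site is free.

    inputs     -- list of dataset names the job needs
    catalog    -- dict mapping dataset name -> set of sites holding a copy
    free_slots -- dict mapping site name -> number of free job slots
    """
    # Only sites with spare capacity are candidates; sorting keeps ties deterministic.
    candidates = sorted(site for site, slots in free_slots.items() if slots > 0)
    if not candidates:
        return None, []

    def missing(site):
        # Datasets that would have to be copied to this site before the job can run.
        return [d for d in inputs if site not in catalog.get(d, set())]

    # Fewest missing datasets wins; zero missing means the job goes to the data.
    best = min(candidates, key=lambda site: len(missing(site)))
    return best, missing(best)


catalog = {
    "dataset-1": {"site-A", "site-B"},
    "dataset-2": {"site-B"},
}
free_slots = {"site-A": 4, "site-B": 0, "site-C": 2}

# The data is already at a free site, so the job is sent to the data.
print(choose_site(["dataset-1"], catalog, free_slots))

# The only site holding everything is full, so one dataset must be moved.
print(choose_site(["dataset-1", "dataset-2"], catalog, free_slots))
```

A real broker would weigh many more factors, such as transfer cost, site reliability and queue length; the sketch only shows the two directions in which jobs and data can be brought together.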

This panda may not be so productive or widely distributed, but he knows his priorities. Image courtesy of sxc.hu.

At the moment, PanDA's jobs produce and analyze simulated data, which physicists can use to fine-tune their analyses in preparation for real data once the LHC is operational.

As of January 2009, PanDA had processed more than 25 million simulated data jobs. Its current daily load is about 50,000 data production jobs plus between 3,000 and 5,000 analysis jobs. Once real data starts coming in, scientists expect the total to approach 500,000 jobs a day.
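To put those figures side by side, the short calculation below adds up the current daily load and compares it with the projected one. It uses only the numbers quoted above and treats the rounded estimates as exact, so the result is a rough scale factor rather than a forecast.

```python
production_per_day = 50_000            # data production jobs per day
analysis_low, analysis_high = 3_000, 5_000  # analysis jobs per day (quoted range)
projected_per_day = 500_000            # expected load once real data arrives

# Current total daily load, low and high end of the quoted range.
total_low = production_per_day + analysis_low
total_high = production_per_day + analysis_high

# Growth factor from today's load to the projected load.
growth_low = projected_per_day / total_high
growth_high = projected_per_day / total_low

print(f"current: {total_low:,} to {total_high:,} jobs/day")
print(f"projected growth: {growth_low:.1f}x to {growth_high:.1f}x")
```

In other words, the projected load is roughly nine times what the system handles today.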

"PanDA makes it possible to use huge amounts of computing resources distributed all over the world," Kaushik De said. "Without a system like PanDA, it would be almost impossible for physicists to do the type of large-scale processing necessary to analyze their data and quickly get results."

-Amelia Williamson, for iSGTW
