Feature - How to run a million jobs
At SC08, several experts organized an informal session to share information on up-and-coming solutions for expressing, managing, and executing "megajobs." They also discussed ways of repackaging work to avoid megajobs altogether.
Here iSGTW shares the latest ideas and developments about megajobs with its readers, and plans to follow up with articles on various mentioned technologies and trends in the coming months. Contributions are welcome.
Biting off a megajob-it's a lot to chew
As large systems surpass 200,000 processors, more scientists are running "megajobs", thousands to millions of identical or very similar, but independent, jobs executed on separate processors. From biology, physics, chemistry and mathematics to genetics, mechanical engineering, economics and computational finance, researchers want an easy way to specify and manage many jobs, arrange inputs, and aggregate outputs. They want to readily identify successful and failed jobs, repair failures, and get on with the business of research. System administrators need effective ways to process large numbers of jobs for multiple users.
As tools and resources change, people describe their computing jobs differently, says Ben Clifford of the University of Chicago. A decade ago, scientists submitted big jobs with big inputs and outputs. Now, due to the availability of massive processor farms, jobs tend to be numerous and small-seconds long with only kilobyte-scale data files.
Some older, well-established job management systems are extremely feature-rich, but their high overhead in scheduling and persistency makes them inefficient for executing many short jobs on many processors. Others have been developed specifically for the data-intensive, loosely-coupled, high throughput computing (HTC) grid model. These newer systems work well up to many thousands of jobs, short or long. Still newer ones, like Falkon and Gracie, which aim to scale even higher, have yet to achieve wide-scale deployment.
As job management systems change, so do applications. Ioan Raicu and Ian Foster, both of the University of Chicago and Argonne National Laboratory, have defined a class of applications called Many Tasks Computing (MTC). An MTC application is composed of many tasks, both independent and dependent, that are (in Foster's words) "communication-intensive but not naturally expressed in Message Passing Interface," referring to a standard for setting up communications between parallel jobs. In contrast to high throughput computing, MTC uses many computing resources over short periods of time to accomplish many computational tasks. Megajobs naturally fit in both the HTC and MTC class of applications.
Some computer systems are undergoing extensions to support megajobs, HTC and MTC; for example, IBM provides a new high throughput, grid-style mode on the Blue Gene/P supercomputer. Raicu and his colleagues have implemented MTC support on this new platform and others via Falkon, a fast, scalable and lightweight task execution framework from the University of Chicago.
"Users don't start with the idea of running a million jobs," Clifford says. "They start with some high level application, and it's just convenient to describe it as a million jobs." If they can break an application into separately schedulable, restartable, relocatable 'application procedures', he says, then they just need a tool to describe how the pieces connect. Then the jobs are easy to run.
Clifford has helped develop Swift, a highly scalable scripting language/engine to manage procedures composed of many loosely-coupled components that take the place of megajobs. In addition to describing jobs, Swift sports clever mechanisms to reward well-behaved, high-performing sites while penalizing slow or failing sites. It also throttles job submission as needed, and controls file transfers to ensure adequate performance.
Also in the megajob medicine cabinet are Falkon and Gracie, a framework for executing massive independent tasks in parallel on grid resources.
Indigestion? Rethink, repackage …
Some experts think that many research questions can be answered using the same grid-type computing model, but using a different approach than megajobs.
David Abramson of Monash University wants the grid community to help scientists find smarter ways to specify problems by focusing on the scientific questions in the first place. He cites his Nimrod family of tools that provides sophisticated methods to search for good solutions as opposed to all solutions. Users can ask complex questions such as "Which parameter values will minimize the output of my model?" This helps shield the user from the complexity of managing lots of independent jobs.
John McGee from the Renaissance Computing Institute notes that a number of workload management systems on Open Science Grid including PanDA, the OSG MatchMaker, and Condor DAGMan, are used extensively to repackage the total computational workload into chunks that are optimized for the infrastructure and scheduling overhead.
According to Gregor von Laszewski of the Rochester Institute of Technology, many supercomputing applications exist that manage millions of "tasks" internally. The differentiation between "tasks" and "jobs" is essential, he says, as many users do not care about the concept of a "job". They just want to get their work done. He believes that in the future scientists will benefit from tools that allow them to bypass the technical difficulties of mapping their work to jobs-tools that transparently perform the mapping between millions of tasks and traditional queuing systems.
In the end, a balanced diet
"The answer will not be a single tool," von Laszewski says, "but a suite of tools addressing a variety of solutions needed to simplify access to millions of science services working in coordination." As an example, he cites the CoG Kit and its workflow engine used internally in Swift to provide much of the functionality for scalable services, noting that it also can be used directly by scientists to simplify their job management.
To simplify access and avoid congestion, the session participants recognize that more work is needed to leverage tools and services that integrate megajobs with existing queuing systems and high performance interconnects, and to allow loosely-coupled applications to communicate directly between processors. In the meantime, researchers have some remedies to choose from.
-Anne Heavey and Amelia Williamson, iSGTW, and session participants David Abramson, Ioan Raicu and Gregor von Laszewski
Editor's note 5 December: small corrections and clarifications made to original article.