iSGTW Feature - Grid usability on the up


This plot shows a marked increase in the success rate of test jobs run on GridPP over time.
Image courtesy of GridPP

Being a grid user isn't always straightforward. You might have 10,000 CPUs and terabytes of disk at your fingertips, but can you get your job to work on all of them, or even on any of them?

Although there are many more grid users than there used to be, getting started on a grid, and getting it to do what you want, is still not for the faint-hearted.

Fortunately, there are people trying to make it easier. One of them is Steve Lloyd of Queen Mary, University of London.

As part of his work on the ATLAS experiment, Lloyd has been sending jobs to grids such as GridPP and EGEE for years.

Many of his jobs went off without a hitch, but Lloyd found that some just kept failing, even when they were sent to a working site that passed all of the grid's tests.

Six months ago, he decided to find out why. And as Chair of the GridPP collaboration in the UK, he was in a position to get problems fixed.

The inner guts of the ATLAS detector. These giant bits and pieces will soon be sending streams of data to computers around the world.
Image courtesy of CERN

The result of Lloyd's work is a suite of three test jobs that run hourly on sites in the UK particle physics grid. The tests range in complexity from submitting a "Hello World" job to analyzing a file of particle physics data using the latest ATLAS software.

Lloyd explains: "When I first started running these tests, their success rate was only around 50%. I'd get a massive range of problems: broken resource brokers, difficulties with the information system, proxy certificates timing out, sites that didn't have the latest version of the ATLAS software, and even sites without the required compiler."

Using the detailed log files provided by Lloyd's test jobs, and with the help of the GridPP Deployment Team, each grid site got to the bottom of its problems.

Lloyd's test jobs now succeed 90% of the time, which gives him some hope for future grid users.

"I used to wonder how users would ever be able to analyze the ATLAS data on the grid. Now I'm more hopeful, but we've still got a lot of work to do."

Lloyd's experience shows that things don't always go smoothly, even for experienced grid users. But things are on the up.

- Sarah Pearce, GridPP