Opinion - Avalanche warning: the new challenges of the data grid
How will we manage these data? Where will we store them? The new "data grid" brings with it certain responsibilities that must be borne by the users.
Managing storage for a large user base is not a new problem, but grids greatly amplify the scale.
Allocating storage resources is more complicated than managing compute clusters. When a compute job finishes, its CPUs are simply returned to the common pool; its output, however, must be stored and cannot be deleted indiscriminately.
Laws, limits, allocations and restraint
Even small storage quotas, allocated to thousands of users, will add up over time to a serious problem.
One solution is to put time limits on storage allocation, but this does not help users who want to store data indefinitely.
Another solution is to charge for storage space, but this is not in the spirit of grids. Note that this is more a management issue (of people, not of systems) than a technical one.
Take a look at your desktop
Many tools exist to manage large pools of data, but these do not indicate when the data can be safely deleted. What is a facility manager to do when faced with hundreds of thousands (if not millions) of files owned by thousands of users, to which he has no immediate access? Users are much less likely to worry about disk usage at a site halfway around the world than they are at their university IT centre. We have all been guilty of not cleaning up our desktop; we just go out and buy a larger disk!
Policies must be put in place to keep the explosion of data in check. I fear that this can only be done by making most storage space of the "scratch" or "temporary" variety, which will put restrictions on users. Exceptions can be made for large projects such as the LHC, which aggregate the needs of a multitude of users, but the number of these exceptions must be kept reasonable, or system administrators risk spending all of their time contacting users to clear space.
One year filled in three months
An example of the problem can be found at TRIUMF, Canada's national laboratory for particle and nuclear physics, which houses one of ten Tier-1 centres for the ATLAS experiment at the LHC. After an expansion of the centre in August 2007, one year's storage allocation was filled in about three months!
The main point of this comment is to convince users that it is their responsibility to control their use of storage on the grid. Resource providers and system administrators supply the tools for data management (directory structures, file movement and so on), but users must act responsibly when managing their storage allocation. It is not reasonable for users to wait for an email asking them to delete files because the centre is running out of space.
Again, the scale of the problem is amplified by the grid. Whatever solution is adopted, it will inevitably require all of us to exercise restraint.
- Michel C. Vetterli, Simon Fraser University/TRIUMF