iSGTW Feature - Cruise control for eScience


The paper "Feedback-Controlled Resource Sharing for Predictable eScience" by Sang-min Park and Marty Humphrey of the University of Virginia figured in the list of finalists for the SC08 Best Student Paper. This article is based on Park's SC08 presentation on this work.

As dynamic, data-driven applications become more common on the grid, it's tougher to guarantee when any given job will run and, more importantly, when it will finish. For time-critical applications such as weather forecasting, coastal hazard prediction, and other disaster prevention, a late result is as bad as no result.

Standard batch-mode scheduling cannot predictably support time-dependent, adaptive computing demand, says Sang-min Park, a Ph.D. candidate at the University of Virginia. It can't change a job's completion time with fine granularity or cope reliably with unanticipated disturbances. Deadline-guaranteed processing is the key to success for time-critical applications.

At SC08, Park presented a feedback-controlled model for job scheduling. It works like cruise control. As long as you're driving on a flat, straight (and empty) road, the car runs at a constant speed. Now you hit a hill, which acts as a disturbance. A feedback mechanism kicks in to increase the power and maintain your speed. Inserting a data-driven job on a system is analogous to putting a few hills in the road. Park and his team want to supply the cruise control.

Park's model focuses on controlling a job's run time, not its wait time. It exploits virtual machine (VM) capabilities for isolating jobs on the same server and for reconfiguring the CPU scheduler at run time.

"We can actually create the illusion that multiple jobs running in one machine execute at varying CPU speeds," says Park. "For example, if two jobs share a 3 GHz CPU, we can assign 2 GHz to one job and 1 GHz to another. This flexible CPU setting becomes the 'actuator' function in our feedback loop."
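The arithmetic behind Park's example can be sketched in a few lines. This is only an illustration: it assumes a proportional-share scheduler driven by per-job weights, and the function `effective_speeds` and the job names are hypothetical, not part of Park's system.

```python
def effective_speeds(cpu_ghz, weights):
    """Split one CPU's capacity among jobs in proportion to their weights.

    Assumption: the VM scheduler shares the CPU proportionally to weight.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {job: cpu_ghz * w / total for job, w in weights.items()}


# Two jobs sharing a 3 GHz CPU with weights 2:1, as in the quote.
weights = {"job_a": 2, "job_b": 1}
speeds = effective_speeds(3.0, weights)
print(speeds)  # {'job_a': 2.0, 'job_b': 1.0}

# The "actuator" is simply a change of weights while the jobs are running.
weights["job_a"], weights["job_b"] = 1, 2
swapped = effective_speeds(3.0, weights)
```

Changing the weights while the jobs run is what lets a feedback loop speed one job up at the expense of another.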

A closed-loop system needs sensors as well. Sensor calls are placed at critical points in the application source code, for instance after a file has been processed.
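A sensor call of this kind might look like the following minimal sketch. The `ProgressSensor` class, its `report` method, and the file names are hypothetical stand-ins chosen for illustration; the article does not describe the actual sensor interface.

```python
import time


class ProgressSensor:
    """Hypothetical sensor: records how much of the job is done, and when."""

    def __init__(self, total_units):
        self.total_units = total_units
        self.done_units = 0
        self.samples = []                  # (elapsed_seconds, fraction_done)
        self._start = time.monotonic()

    def report(self, units=1):
        """Call at a critical point, e.g. after a file has been processed."""
        self.done_units += units
        fraction = self.done_units / self.total_units
        self.samples.append((time.monotonic() - self._start, fraction))
        return fraction


def process_files(files, sensor):
    for name in files:
        _ = name.upper()                   # stand-in for the real per-file work
        sensor.report()                    # sensor call placed after each file


sensor = ProgressSensor(total_units=4)
process_files(["a.dat", "b.dat", "c.dat", "d.dat"], sensor)
print(sensor.samples[-1][1])  # fraction of the job completed so far
```

Each sample pairs elapsed time with the fraction of work completed, which is the measurement a controller would compare against the desired progress.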

The concept of the feedback loop to control the dynamic behavior of a system. This is negative feedback, because the sensed value is subtracted from the desired value to create the error signal, which is amplified by the controller. (Image and caption courtesy of wikipedia.org, from "Control theory".)

A controller, the third essential piece, manipulates the VM's CPU scheduler so that the sensor reading always matches the 'reference', which is the "cruise control setting". Kept abreast of the application's progress by the sensor, the controller determines whether to throttle the actuator up or down. Currently, the user must code the controller.
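To make the loop concrete, here is a toy simulation of a user-coded controller. It is a sketch under stated assumptions, not Park's controller: it uses a simple proportional control law, an invented job whose work rate dips between steps 30 and 59 (the "hill"), and hypothetical names and gains throughout.

```python
def work_rate(step):
    """Fraction of the job finished per step at a full CPU share.

    Hypothetical disturbance: the job slows down between steps 30 and 59,
    e.g. because another job was inserted on the same server.
    """
    return 0.0125 if 30 <= step < 60 else 0.02


def make_p_controller(nominal_share=0.5, gain=10.0, lo=0.05, hi=1.0):
    """Proportional controller: CPU share grows with the progress shortfall."""
    def controller(reference, measured):
        error = reference - measured            # negative feedback
        share = nominal_share + gain * error
        return max(lo, min(hi, share))          # respect actuator limits
    return controller


def run_job(deadline_steps, controller=None, nominal_share=0.5):
    """Simulate one job and return the number of steps it took to finish."""
    progress, step = 0.0, 0
    while progress < 1.0:
        # Reference: desired progress so far (the "cruise control setting").
        reference = min(1.0, step / deadline_steps)
        if controller is None:
            share = nominal_share                    # open loop: fixed share
        else:
            share = controller(reference, progress)  # closed loop
        progress += share * work_rate(step)
        step += 1
    return step


open_loop = run_job(100)
closed_loop = run_job(100, make_p_controller())
print("fixed share finished after", open_loop, "steps")
print("feedback control finished after", closed_loop, "steps")
```

In this toy model the fixed-share job overshoots its 100-step deadline by roughly 12 steps, while the feedback-controlled job finishes within a step or two of it. The gain and share limits are arbitrary choices, and picking them well is the kind of tuning the user currently has to do by hand.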

Park and his colleagues have tested their model on both data-intensive and compute-intensive applications and achieved nine times higher accuracy than the batch-sharing best case, with less than 3% error on completion time.

With this promising result in hand, the team is now investigating a self-tuning controller.

"We've recently developed a runtime system that performs application modeling and control design without user effort," says Park. "Users put sensors in their application code and use our tools to automatically drive the application's progress to meet their deadlines."

-Anne Heavey, iSGTW
