iSGTW Feature - Flood of data helps prevent hurricane damage

These scientific visualizations of Hurricane Katrina were created at the LSU Laboratory of Creative Arts + Technology (LCAT) by the CCT sci-viz group.

New Orleans, seen from Lake Pontchartrain: LIDAR elevation, GOES-12 satellite imagery, and ADCIRC sea surface elevation with the levee system, August 26-31.

The LIDAR heightfield is color coded: yellow/green for land above sea level, blue at sea level, and violet below sea level. The land above sea level in New Orleans was formed by the Mississippi River naturally flooding and depositing sediment. The natural levee that surrounds the river can be seen in green, as can the Gentilly and Metairie ridges. The height of the storm surge is indicated by dark blue. The ADCIRC levee system is shown in pink.

Image courtesy of

When the National Hurricane Center issues a hurricane advisory, emergency teams have little time to predict the locations and effects of storm surges and waves in order to identify areas that should evacuate. A prototype system, known as the Southeastern Universities Research Association Coastal Ocean Observing and Prediction (SCOOP) program, promises to run hurricane computer simulations and produce results quickly.

The complex hurricane models crunch very large data sets, and the data must move from storage to compute resources and back as quickly and reliably as possible. The SCOOP program relies on the Stork Data Scheduler software package - so called because it delivers data - to manage data placement and movement. Developed by researchers at Louisiana State University and the University of Wisconsin-Madison, and made freely available for download, Stork allows researchers to efficiently store, access, share and transfer large data sets.
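Stork describes each data placement task in a ClassAd-style submit file. A minimal sketch of a transfer job (the host names and paths here are hypothetical, not from the SCOOP deployment):

```
[
  dap_type = "transfer";
  src_url  = "gsiftp://archive.example.edu/scoop/surge_input.nc";
  dest_url = "file:///scratch/scoop/surge_input.nc";
]
```

A job like this is handed to the scheduler with Stork's submission tool, which then owns the transfer: queuing it, picking the transport protocol, and retrying on failure.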

"There are 80 or more model configurations that we currently run very quickly for our hurricane predictions, and efficient and reliable data movement is extremely important," said Gabrielle Allen, SCOOP collaborator and associate professor at the Louisiana State University Center for Computation and Technology and Department of Computer Science. "Stork automatically chooses the best parameters and transport mechanisms so the data is transferred in the most efficient way."

Distributed computing has historically focused on managing computing resources rather than data. Now that research has become more data intensive, inefficient data movement often creates a major bottleneck. Stork works with high-level batch schedulers to schedule computation and data movement tasks synchronously in one integrated system.

New Orleans perspective, with MM5 and ADCIRC sea elevation and the levee system. MM5 vectors show how the surge enters New Orleans from the drainage canals connected to Lake Pontchartrain, and through the Intracoastal Waterway connected to Lake Borgne and the Mississippi River Gulf Outlet canal. Dark blue represents the height of the storm surge, which reached 15 feet in the Intracoastal Waterway and 8 feet in the drainage canals.

Image courtesy of

"Batch schedulers, such as Condor, specialize in scheduling computational tasks, but do not specifically consider data scheduling or data movement," said Tevfik Kosar, Stork project leader and LSU computer science assistant professor. "Stork allows users to schedule and optimize data transfer tasks within a basic batch scheduler environment."
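In Condor's DAGMan, this integration lets data placement jobs be declared alongside compute jobs in one workflow, with DATA nodes dispatched to Stork and JOB nodes to the batch scheduler. A sketch with illustrative file names (not an actual SCOOP workflow):

```
# hurricane.dag -- node and file names are illustrative
DATA  stage_in   stage_in.stork     # Stork fetches the model input data
JOB   surge_sim  surge_sim.condor   # Condor runs the surge simulation
DATA  stage_out  stage_out.stork    # Stork moves results back to storage
PARENT stage_in  CHILD surge_sim
PARENT surge_sim CHILD stage_out
```

The PARENT/CHILD edges guarantee the simulation never starts before its input transfer completes, so data movement is scheduled as a first-class task rather than hidden inside a job script.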

Stork automatically verifies that files transfer correctly, and its error recovery mechanism ensures completion of the data transfer even if the initial transfer fails. Stork developers plan to release an update this spring with several new features, including caching of multiple data transfer requests and up-front estimation of completion time for a requested data transfer.
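The verify-and-retry behavior described above can be illustrated in a few lines of Python. This is a simplified sketch of the general technique (checksum the source, re-copy until the destination matches), not Stork's actual implementation:

```python
import hashlib
import shutil

def file_checksum(path, algo="md5"):
    """Compute a checksum of a file's contents, reading in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def transfer_with_retry(src, dest, max_attempts=3):
    """Copy src to dest, verifying the checksum and retrying on mismatch.

    Returns the number of attempts used; raises if all attempts fail.
    """
    expected = file_checksum(src)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dest)
        if file_checksum(dest) == expected:
            return attempt
    raise RuntimeError(f"transfer of {src} failed after {max_attempts} attempts")
```

A real scheduler would use a transfer protocol instead of a local copy and would also re-queue the job with backoff, but the invariant is the same: a transfer only counts as done once the destination's checksum matches the source's.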

"Data-driven science applications, like the SCOOP hurricane scenario, illustrate the critical need for new tools that fundamentally support data management," Allen said. "It is clear that integrating data-centric technologies, such as Stork, into existing distributed compute resources such as the Open Science Grid or the TeraGrid, will lead to a fundamental shift in how we undertake research in science and engineering fields."

-Amelia Williamson, for iSGTW