iSGTW Feature - Results from the SC07 Challenges: Analytics, Bandwidth, Cluster and Storage


This year saw four Challenges showcasing high-performance computing resources at November's SC07 conference.
Image courtesy of Douglas Mansell

Every year, competitors in the Supercomputing Challenges battle it out to see who is fastest, cleverest and best.

November's SC07 Challenges were no exception. Find out who walked away with the blue ribbons:

Analytics Challenge

Bandwidth Challenge

Cluster Challenge

Storage Challenge

Analytics Challenge

First place in the SC07 Analytics Challenge went to a Globus-based application for identifying attacks on cyberinfrastructure. Called Angle, or the New Approach for Protecting Cyber-infrastructure, the application uses a data and compute cloud in which data is left in place and computations are performed where the data resides.

Cloud computing has been used in the past by companies such as Google, Yahoo, Amazon and Microsoft; however, these cloud infrastructures are by and large based on the standard Internet. In contrast, the Sector data cloud used by the Angle Project was based on wide-area high-performance networks and Globus grid technology for federating distributed resources, which made it easy to handle the large data sets produced by the project.

"Winning the Analytics Challenge shows the potential that second generation data and compute clouds have for changing the way we manage and compute with large distributed data," said Robert Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago, who led the team that developed Angle.

This team also included participants from Northwestern University, the University of Chicago, Argonne National Laboratory and the University of Southern California.

Data produced by Angle was analyzed using the Swift system for data-intensive, loosely coupled parallel programming, developed at the University of Chicago's Computation Institute. This allowed data analysis tasks to be distributed over multiple grid clusters.
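Swift programs are written in Swift's own scripting language, which the article does not show. As a rough Python analogue of the loosely coupled pattern described above, the sketch below hands independent analysis tasks to several clusters in round-robin order and collects the results. The cluster names and the `analyze` function are hypothetical placeholders, and local threads stand in for remote job submission.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical cluster names standing in for real grid sites.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def analyze(chunk_id: int, cluster: str) -> tuple[int, str, int]:
    """Placeholder analysis task returning (chunk_id, cluster, result).

    A real task would submit an analysis job for this data chunk to the
    named cluster; here it just squares the chunk ID.
    """
    return chunk_id, cluster, chunk_id * chunk_id


def run_loosely_coupled(chunk_ids: list[int]) -> list[tuple[int, str, int]]:
    # Assign independent tasks to clusters in round-robin order.
    assignments = list(zip(chunk_ids, cycle(CLUSTERS)))
    with ThreadPoolExecutor(max_workers=len(CLUSTERS)) as pool:
        # Tasks share no state, so they may finish in any order;
        # results are collected in submission order.
        futures = [pool.submit(analyze, c, cl) for c, cl in assignments]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for row in run_loosely_coupled(list(range(6))):
        print(row)
```

Because each task is independent, adding another cluster only means extending the list; no coordination between tasks is needed.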


Indiana University's Data Capacitor is a high-speed, high-bandwidth storage system for research computing that serves all IU campuses and NSF TeraGrid users. At peak performance, the Data Capacitor has an aggregate transfer rate of 14.5 gigabytes per second.
Image courtesy of Indiana University

Bandwidth Challenge

First place in this year's Bandwidth Challenge went to a team using Indiana University's Data Capacitor, a system designed to store and manipulate massive data sets.

The team achieved a peak transfer rate of 18.21 gigabits per second out of a possible maximum of 20 gigabits per second, nearly twice the peak rate of the nearest competitor.

The team achieved an overall sustained rate of 16.2 gigabits per second (roughly equivalent to sending 170 CDs of data per minute) using a transatlantic network path that included the Internet2, GÉANT and DFN research networks.
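As a quick check of these figures, the sketch below converts the sustained rate into CDs per minute and expresses the peak rate as a fraction of the 20 gigabit per second path. The 700 MB (700 × 10^6 bytes) CD capacity is an assumption; the article does not say which capacity its comparison used.

```python
# Assumption: one CD holds 700 MB (700 * 10**6 bytes).
CD_BYTES = 700 * 10**6


def gbps_to_cds_per_minute(gbps: float, cd_bytes: int = CD_BYTES) -> float:
    """Convert a rate in gigabits per second into CDs' worth of data per minute."""
    bits_per_minute = gbps * 1e9 * 60
    return bits_per_minute / 8 / cd_bytes


def link_utilization(peak_gbps: float, capacity_gbps: float) -> float:
    """Fraction of the available path capacity used at peak."""
    return peak_gbps / capacity_gbps


print(f"Sustained 16.2 Gbit/s: {gbps_to_cds_per_minute(16.2):.1f} CDs per minute")
print(f"Peak 18.21 Gbit/s: {link_utilization(18.21, 20):.1%} of the 20 Gbit/s path")
```

Under this assumption the sustained rate works out to about 174 CDs per minute; 650 MB discs would give about 187 and 700 MiB discs about 166, so the article's round figure of 170 falls within that range. The peak rate corresponds to roughly 91 percent of the available path capacity.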

"This project simultaneously pushed the limits of networking and storage technology while demonstrating a reproducible model for remote data management. Best of all, we did this using a variety of research applications that we support every day at Indiana University," said Data Capacitor and Bandwidth Challenge project leader Stephen Simms.

The Data Capacitor is powered by the open source Lustre file system and the Linux operating system. It is currently accessible to researchers through IU's participation in the TeraGrid.

The winning team was led by Indiana University, with partners from the Technische Universitaet Dresden, Rochester Institute of Technology, Oak Ridge National Laboratory and the Pittsburgh Supercomputing Center.


Cluster Challenge

A team of undergraduates from the University of Alberta, Canada, won the inaugural Supercomputing Cluster Challenge, a three-day cluster-building marathon.

Competing teams assembled small clusters on the exhibit floor, running benchmarks and applications selected by industry and high-performance computing veterans. Power consumption was limited: each team was allowed just a single 26-amp, 110-volt circuit.

Clusters were judged on the speed of benchmarks and throughput of application runs.

The University of Alberta's winning entry was a 64-core system built on 2.66 GHz Xeon processors, with 20 Gbit/s InfiniBand and 16 GB of memory, running Scientific Linux.
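The power cap translates into a concrete budget. The sketch below multiplies the circuit's current by its voltage and spreads the result across the winning system's 64 cores. It treats the circuit as a simple volts-times-amps limit and ignores power factor. The per-core figure is only an upper bound, because the same circuit presumably also had to feed memory, the InfiniBand interconnect, disks and fans.

```python
def circuit_power_watts(amps: float, volts: float) -> float:
    """Apparent power of a circuit, P = I * V (power factor ignored)."""
    return amps * volts


def watts_per_core(budget_w: float, cores: int) -> float:
    """Upper bound on power per core if the whole budget went to the cores."""
    return budget_w / cores


budget = circuit_power_watts(26, 110)
print(f"Power budget: {budget:.0f} W")
print(f"Upper bound per core: {watts_per_core(budget, 64):.1f} W")
```

A 26-amp, 110-volt circuit therefore caps each cluster at about 2,860 W, or under 45 W per core for a 64-core machine.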

The competition was designed to show how accessible clusters have become: the systems built by the student teams would have been considered top-of-the-line supercomputers just ten years ago.


Storage Challenge

The award for the most effective approach to using large-scale storage for high-performance computing went to a novel software framework called ParaMEDIC, or Parallel Metadata Environment for Distributed I/O and Computing.

The ParaMEDIC software was used to search the sequences of all completed microbial genomes to discover missing genes and speed future searches by generating a complete genome similarity tree. The ParaMEDIC software framework used a semantics-based approach to create a metadata representation that was four orders of magnitude smaller than the actual output data.

"Using ParaMEDIC, the entire genome similarity tree, corresponding to a petabyte of data, can fit into a 4-gigabyte iPod nano," said team member Pavan Balaji of Argonne National Laboratory.
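The article does not describe ParaMEDIC's internals. The toy sketch below only illustrates the general idea of sending compact, meaning-aware metadata (here, the IDs of matching sequences) instead of the full output, then regenerating that output at the storage site. The database, the scoring function and both "site" functions are hypothetical; this is not ParaMEDIC's actual method. A scheme like this pays off only if regeneration is much cheaper than the original search.

```python
# Hypothetical toy database, assumed to be available at both sites.
DATABASE = {
    "g1": "ACGTACGTAA",
    "g2": "TTTTACGTCC",
    "g3": "GGGGGGGGGG",
}


def similarity(a: str, b: str) -> int:
    """Toy score: number of positions where two sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def full_report(query: str, hit_id: str) -> str:
    """Verbose output line, standing in for a full alignment report."""
    seq = DATABASE[hit_id]
    return f"query={query} hit={hit_id} seq={seq} score={similarity(query, seq)}"


def direct_output(query: str, threshold: int) -> list[str]:
    """Baseline: produce the full output where it is computed."""
    return [full_report(query, gid) for gid, seq in DATABASE.items()
            if similarity(query, seq) >= threshold]


def compute_site(query: str, threshold: int) -> list[str]:
    """Search the whole database but keep only the hit IDs (the metadata)."""
    return [gid for gid, seq in DATABASE.items()
            if similarity(query, seq) >= threshold]


def storage_site(query: str, hit_ids: list[str]) -> list[str]:
    """Regenerate the full output from the metadata, revisiting only the hits."""
    return [full_report(query, gid) for gid in hit_ids]


query = "ACGTACGTCC"
metadata = compute_site(query, threshold=6)
regenerated = storage_site(query, metadata)
print("metadata sent:", metadata)
print("bytes of metadata:", len(",".join(metadata)))
print("bytes of full output:", len("\n".join(regenerated)))
```

The key property is that the regenerated output is identical to what the compute site would have produced directly, so nothing is lost by shipping only the much smaller metadata across the wide-area link.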

This entire task required many millions of CPU-hours of computational capability and generated a petabyte of uncompressed output. Since not many supercomputer centers provide both the computational and storage resources required for this task simultaneously, the research team relied on a worldwide supercomputer that aggregated the compute resources from various locations within the U.S. and the TSUBAME storage resources at the Tokyo Institute of Technology in Japan, with technical support from Sun Microsystems.

"In total, we relied on six U.S. supercomputing institutions and accessed over 12,000 processors across eight supercomputers. The ParaMEDIC framework then improved compute utilization from 10 percent to nearly 100 percent for the compute resources and storage bandwidth utilization from 0.04 percent to 90 percent for the storage resources," said Wu-chun Feng of Virginia Tech.
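The utilization figures in Feng's quote imply very different gains for compute and storage. The sketch below computes both improvement factors and the average number of processors per machine. "Nearly 100 percent" is rounded to 100 percent for simplicity, and because the quote says "over 12,000" processors, the average is a lower bound.

```python
def improvement_factor(before_pct: float, after_pct: float) -> float:
    """Ratio of utilization after ParaMEDIC to utilization before it."""
    return after_pct / before_pct


# Figures from the quote; "nearly 100 percent" is rounded up to 100 here.
compute_gain = improvement_factor(10.0, 100.0)
storage_gain = improvement_factor(0.04, 90.0)

# "Over 12,000 processors across eight supercomputers" (a lower bound).
min_avg_processors = 12_000 / 8

print(f"Compute utilization gain: about {compute_gain:.0f}x")
print(f"Storage bandwidth utilization gain: about {storage_gain:.0f}x")
print(f"Average processors per supercomputer: at least {min_avg_processors:.0f}")
```

Storage bandwidth utilization thus improved by a factor of roughly 2,250, compared with about tenfold for compute, which is why the storage side of the result stands out.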

The team comprised researchers from Argonne National Laboratory, Virginia Tech and North Carolina State University.