iSGTW Feature - A Fair Shake for Seismologists

Instantaneous surface velocity 75 seconds after the earthquake origin time. The warm colors leading from the fault region into the Los Angeles area illustrate how a chain of sedimentary basins can guide earthquake waves away from the fault where they originate into heavily populated regions in Southern California.

In the Midwestern United States, in the spring, there are weeks when you can't get through an episode of your favorite television show without an alert telling you that your county is under a tornado watch or, more invasive still, the local meteorologist interrupting programming to tell you to head to your basement. It saves lives, and it represents an incredible amount of simulation and data collection – even if you do have to scurry to the Web later to find out how "Lost" ended that week.

There may never be an equivalent for temblors, a local "earthquake man" breaking into prime time. But an ambitious group of more than 40 institutions, together called the Southern California Earthquake Center (SCEC), is "building earthquake modeling capabilities to transform seismology into a predictive science similar to climate modeling or weather forecasting," according to Phil Maechling, SCEC's information technology architect.

To bring that vision of seismology as a predictive science to life, SCEC has built a set of grid-based scientific workflow tools. A series of simulations based on these tools – TeraShake 1, TeraShake 2, and the most recent, CyberShake – began in 2004. They've run on TeraGrid resources across the country, including the National Center for Supercomputing Applications (NCSA) and the San Diego Supercomputer Center (SDSC). Similar calculations by a SCEC team at Carnegie Mellon University are also being run at the Pittsburgh Supercomputing Center.

While the goals are to build a computational platform for predictive seismology and to improve the "hazard curves" that building designers use to estimate the peak ground motions that will occur over the lifetime of a building, these simulations are already yielding significant results.

Surface cumulative peak velocity magnitude, comparing results from TeraShake 1 and TeraShake 2. TeraShake 1 predicted very large peak ground motions in the Los Angeles area while TeraShake 2, with its more complicated physics, predicted more modest motions.

TeraShake 2, for example, simulated a series of earthquakes along the San Andreas Fault. Run at NCSA and SDSC, it revealed a "striking contrast in ground motion between ruptures that started at the northwestern end of the fault and those that started at the southeastern end," says geologist Kim Olsen of San Diego State University. This effect is further influenced by a chain of sedimentary basins that run from the northern end of the fault to Los Angeles. In earthquakes that start at the southeast end, the basins trap seismic energy and channel it into the Los Angeles area. Studies of previous earthquakes in this region and others have confirmed the existence of this "wave guide" effect in nature.

These findings were presented at April 2006's annual meeting of the Seismological Society of America and at a SCEC conference in June. They were also recently published in Geophysical Research Letters.

"The success of our national initiatives in supercomputing depends on the integration of hardware, software, and wetware – that is, technical expertise – into an effective cyberinfrastructure. We have been leveraging our partnership with TeraGrid to promote this vertical integration," says Tom Jordan, SCEC's director and an earth sciences professor at the University of Southern California.

Shaking through
The process of building hazard curves includes three basic steps. First, Strain Green's Tensors (SGTs) are generated. The SGTs capture how earthquake waves propagate through the complex geological structure underneath a 500 kilometer by 500 kilometer by 40 kilometer deep region of Southern California. To calculate the SGTs, a velocity mesh is created that describes the region's geological structure as a grid of points, forming about 1.2 billion cubes 200 meters on a side. Second, the SGT data are used to calculate synthetic seismograms for "all, or almost all, possible earthquake variations, each of which has its own probability of occurring," says Maechling. Finally, these seismograms are summed into hazard curves, which predict the probability of strong ground motions occurring at a given site.
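As a sanity check on the mesh dimensions quoted above, the cube count follows from simple arithmetic on the region size and grid spacing (the variable names below are illustrative, not from SCEC's code):

```python
# Mesh size for the SGT calculation: a 500 km x 500 km x 40 km volume of
# Southern California discretized into cubes 200 meters on a side.
KM = 1000        # meters per kilometer
SPACING = 200    # cube edge length in meters

nx = 500 * KM // SPACING   # 2,500 cells east-west
ny = 500 * KM // SPACING   # 2,500 cells north-south
nz = 40 * KM // SPACING    # 200 cells in depth

total_cubes = nx * ny * nz
print(total_cubes)  # 1250000000 -- the "about 1.2 billion cubes" quoted above
```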

"These calculations really represent three distinct aspects of high-performance computing," he says. "The SGTs, which are figuring out the physics at more than a billion mesh points, represent highly parallel interaction between hundreds of nodes. That's capability computing. Extracting the synthetic seismograms is embarrassingly parallel – lots of small, unlinked jobs being run to get more than 100,000 seismograms. That's capacity computing. And all of this is very data-intensive computing, requiring large amounts of storage and very fast I/O. The TeraGrid helped us solve all three parts of this." The first two steps are combined into workflows for each physical site being simulated, each workflow comprising between 11 and 1,000 of the two-step analysis components. The final step of creating the hazard curves, meanwhile, is done in-house by SCEC.
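The "capacity computing" step – many small, independent jobs – can be sketched with a simple worker pool. Everything here is a hypothetical illustration of embarrassingly parallel work, not SCEC's actual seismogram code:

```python
from concurrent.futures import ProcessPoolExecutor

def synthetic_seismogram(rupture_id):
    """Stand-in for one independent seismogram job.

    In the real workflow each job would read precomputed SGT data for one
    rupture variation; here we just do some dummy arithmetic."""
    return rupture_id, sum(i * i for i in range(1000))

def run_capacity_jobs(rupture_ids):
    # Embarrassingly parallel: no communication between jobs, so they can
    # be farmed out to as many processors as the scheduler provides.
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(synthetic_seismogram, rupture_ids))

if __name__ == "__main__":
    # Stand-in for the >100,000 rupture variations mentioned above.
    results = run_capacity_jobs(range(100))
    print(len(results))  # 100
```

Because the jobs never talk to each other, throughput scales almost linearly with the number of processors – which is exactly what makes this workload a good fit for aggregated grid resources.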

A succession of successes
Predictive science always wants for more physics – models that more precisely capture reality, simulations that confirm that you're moving in the right direction. TeraShake 1, TeraShake 2, and CyberShake have progressively added more of that physics, making each simulation more physically realistic. TeraShake 1, for example, used what are known as kinematic source descriptions to simulate the source of the earthquakes and the velocity of the earthquake fault over time. Waves spread through the simulated ground, but this first simulation did not accurately simulate friction on the fault, what seismologists call fault slip. The simulations must obey well-known sliding friction laws, and these laws constrain the motions in some ways. "TeraShake 1 was an important simulation, and it captured some basic physics, but we knew that some aspects of it weren't realistic," says Maechling.

TeraShake 2 moved from a single TeraGrid site to orchestrated, simultaneous runs at SDSC and NCSA. It also went from the kinematic source descriptions, which are based on historical earthquake data, to source descriptions that were simulated using the physics of friction-based sliding at the fault. In TeraShake 1, before these refinements were added, the team saw some modeled earthquakes that simply could not exist in nature. TeraShake 2's improved models significantly complicated the process and increased the necessary computing power, but they eliminated many of these outliers.

"In adding this physics, you're not quite sure what [the simulations are] going to show," says Maechling. "TeraShake 1 predicted very large peak ground motions in the Los Angeles area while TeraShake 2 brought these motions down to earth. In this case, the results went in a positive direction [reducing the predicted impact that an earthquake would likely have]. But you just don't know. That's what makes this so exciting."

CyberShake, again run at NCSA and SDSC, uses yet more realistic physics. The hazard curves calculated by CyberShake tend to be significantly different from those issued by the U.S. Geological Survey, which are considered standard. If the CyberShake curves are correct, this type of calculation could significantly change the character of the national probabilistic earthquake hazard maps. Accordingly, the next step for the team will be to run more simulations with the CyberShake code as a base and to validate these simulations.

"We completed about 10 sites with CyberShake, and they are very promising. But we need to scale that up," says Maechling. That means increasing the frequency of the waves that propagate from the current 0.5 Hertz to about three Hertz. This is exceptionally taxing because the number of mesh points required to simulate the physics grows with the cube of the frequency: each doubling of the frequency demands eight times as many mesh points.
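The cost of that scaling comes from the grid spacing having to shrink in proportion to the shortest wavelength resolved, in each of three spatial dimensions. A quick back-of-the-envelope check (the function name is illustrative):

```python
def mesh_point_factor(f_old_hz, f_new_hz):
    # Grid spacing scales as 1/frequency in each of the three spatial
    # dimensions, so the mesh-point count scales as (f_new / f_old) ** 3.
    return (f_new_hz / f_old_hz) ** 3

print(mesh_point_factor(0.5, 1.0))  # 8.0   -- doubling the frequency
print(mesh_point_factor(0.5, 3.0))  # 216.0 -- the full jump from 0.5 to 3 Hz
```

(In practice the cost grows even faster, since a finer grid also forces smaller time steps; the cube law covers the mesh points alone.)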

That also means dramatically increasing the number of sites simulated. The team expects to need about 625 sites to create a comprehensive map of a relatively small region of Southern California. "We believe that our grid-based workflow tools, based on the Virtual Data System (VDS), will enable us to scale up our CyberShake calculations to the level necessary to calculate CyberShake-based hazard maps by supporting job scheduling, data transfers, and file management capabilities as a workflow," says Maechling.

TeraGrid plus
Simulations of that scale will require the computing power of the TeraGrid. CyberShake ran on 288 processors at NCSA, and each processor had 500 megabytes of RAM dedicated to it. NCSA also devoted more than 80 terabytes of storage space to the run, so that the team could stage data waiting for post-processing.

Moving from 0.5-hertz simulations to three-hertz simulations will easily push the amount of storage needed for a single site simulation above one petabyte. These runs also require systems that can handle high-volume input/output, so specialized I/O nodes on NCSA's Mercury system will continue to be crucial.

Sheer power is not enough, however. Large-scale collaborations like this one require the intimate relationships and expert services supplied by TeraGrid resource providers.

TeraShake 1 was based on a long-term collaboration between SDSC and SCEC. SDSC assisted with planning, code porting, and visualization, among other things. That relationship continues to this day.

The SCEC VDS-based workflow system handled job scheduling for the capacity runs using Condor glide-ins, which aggregate small jobs for queuing and then parcel them out to multiple processors once given access to the machine. Condor and the glide-in concept were developed at the University of Wisconsin in part under the auspices of NCSA.

NCSA gave the SCEC team dedicated time in its computing queues to debug the final implementation of the Condor glide-ins and to integrate them into the larger workflow. Tailored allocations that give computing time when and how it is needed are an NCSA specialty.

"SCEC wants to use seismological simulations in order to make socially relevant predictive statements about earthquake hazards in Southern California," says Maechling. "These kinds of collaborations between geoscientists and the high-performance computing community are essential to us reaching this goal."

Learn more at the SCEC Web site.

This article originally appeared on the NCSA Web site.

- J. William Bell, National Center for Supercomputing Applications