
Science DMZ: The fast path for science data

Speed read
  • A conversation with Larry Smarr about the Science DMZ
  • A Science DMZ optimizes network performance for science while leaving campus affairs safe behind their local area networks
  • Cooperation between the DOE and NSF is accelerating scientific discovery

Larry Smarr’s influence is felt throughout 21st century information technology. A founding director of both the National Center for Supercomputing Applications (NCSA) and the California Institute for Telecommunications and Information Technology (Calit2), Smarr is largely credited with bringing supercomputing out of the world of top-secret clearance and into an open academic environment.

We recently sat down with him to discuss the Science DMZ (Demilitarized Zone) concept.

The Science DMZ is the prevailing solution for managing the data deluge scientists are experiencing. What is the Science DMZ and what kind of gains will it bring to the research community?

It’s really a lot like what the US did in 1955 when Eisenhower said we’re going to make an interstate highway system to join up the city freeways. Well, we already had the city streets, and we had two-lane highways like Route 66 and US 40 — why did we need this redundant transportation infrastructure? Because we needed to drive from New York to San Francisco without a stoplight.

Tech traveler. A member of Disney's Tomorrowland generation, Larry Smarr was one of the first to use supercomputers to crack Einstein's gravitational wave theories, and was lead investigator on the NSF's OptiPuter project (http://www.optiputer.net). Largely responsible for making supercomputing accessible in the academic environment, Smarr has his finger on the pulse of US cyberinfrastructure. Courtesy Larry Smarr.

Due to that infrastructure investment, the US economy grew dramatically for the next 50 years. We still have the city streets, we still have two-lane highways, but if you want to get from point A to point B fast, or you want to take a big truck and move a lot of stuff, you use the city freeways and the interstate highway system.

On each of our campuses today, we have probably 30,000 - 40,000 end users of the commodity shared internet doing all kinds of things, from updating their Facebook page to writing scientific papers.

But now you’re having this scientific genomic and climate data as well as imaging data and Large Hadron Collider high-energy physics data pop up everywhere and guess what? It’s not megabytes, like photographs on Facebook — it’s gigabytes or terabytes, and in the case of climate and LHC research — it’s up to multiple petabytes of data.

So in response to this scientific and engineering big data challenge, the Department of Energy (DOE) realized you could design an innovative type of campus networking called the 'Science DMZ' architecture.

The Science DMZ is analogous to building the city freeway systems, whereas the campus-shared internet systems that support non-science missions are like the city streets. We need this Science DMZ network architecture because big data and routine applications have different bandwidth and performance requirements.

The US National Science Foundation (NSF) adopted the DOE Science DMZ approach and over the last four years they have funded over a hundred campuses to upgrade their connections to 100 Gigabits per second and to build an on-campus Science DMZ to specifically support large science data transfers.

As the next phase, the NSF recently funded the Pacific Research Platform (PRP) grant, for which I serve as Principal Investigator. The PRP uses CENIC/Pacific Wave, the research and education network that runs from California to Washington State, to link together over twenty campuses, each with a Science DMZ. In that way, the PRP is analogous to a regional version of the interstate highway system, linking together the campus Science DMZ freeways.


What are the challenges to implementing the Science DMZ?

The fundamental problem is that each campus has developed its own way of building its cyberinfrastructure, because every campus is a product of its history. The Science DMZ is the overall idea, but it’s been implemented in many different ways on many different campuses.

And so one of the challenges is ensuring that these different Science DMZs can work as effectively as possible together. Even though they were engineered in distinct ways on the different campuses, they need to be able to operate 'disk to disk' or 'scientist to scientist' as seamlessly as possible.

By funding the PRP and other regional versions over the next five years, the NSF will work out solutions to that sort of thing, and you’ve got to believe that the NSF is looking toward eventually getting a national system in place.

But I think the NSF is very wise not to say what that is now — rather, let’s discover what works the best, have multiple teams working on it, and then come together with the best practices that we can then extend nationally on all the research campuses in the country.  


How do you view the role of the NSF as it informs scientific discovery, economic viability, and overall strength of the nation?

One of the reasons why America is preeminent in research and technology today in the world is because the NSF competitively funds the very best ideas across all areas of science and engineering. It has meant that there is a constant stream of innovation to harvest, and much of this investment in basic research has led to multi-billion dollar industries.

What's a Science DMZ?

Developed about six years ago by engineers at the Energy Sciences Network (ESnet) and the National Energy Research Scientific Computing Center (NERSC), the Science DMZ refers to an operationally proven network architecture optimized for the transfer of large-scale scientific data.

The model includes recommended hardware devices, security policies, and network performance software which together provide the ideal environment for moving science data as efficiently as possible.

In practice, a Science DMZ creates an enclave on a campus network that is specially designed for science data (a vastly different data profile than a campus’ enterprise applications).

A Science DMZ recognizes that the networked applications a university needs to run, whether for science or for business, have different needs. By applying best practices for data management, the Science DMZ ensures that neither large science data transfers nor regular day-to-day university business applications are impeded.
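The kind of performance verification this architecture makes possible can be made concrete with a simple throughput check. The sketch below is illustrative only and is not part of the Science DMZ specification: the hostname (dtn.example-campus.edu) and the 5 Gbit/s floor are hypothetical placeholders, and it assumes iperf3 is installed on both data transfer nodes. Production Science DMZ deployments typically rely on dedicated perfSONAR measurement hosts for continuous monitoring rather than ad hoc scripts like this one.

```python
#!/usr/bin/env python3
"""Minimal sketch: check memory-to-memory throughput between two
Science DMZ data transfer nodes (DTNs) using iperf3."""

import json
import subprocess

REMOTE_DTN = "dtn.example-campus.edu"   # hypothetical remote data transfer node
EXPECTED_GBPS = 5.0                     # illustrative floor for a 10 Gbps path


def measure_throughput(host: str, seconds: int = 10, streams: int = 4) -> float:
    """Run an iperf3 test against the remote DTN and return received Gbit/s."""
    result = subprocess.run(
        ["iperf3", "-c", host, "-t", str(seconds), "-P", str(streams), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    # iperf3's JSON report summarizes the receiver-side rate under end.sum_received
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    gbps = measure_throughput(REMOTE_DTN)
    status = "OK" if gbps >= EXPECTED_GBPS else "DEGRADED"
    print(f"{REMOTE_DTN}: {gbps:.2f} Gbit/s ({status})")
```

A script like this only tests the network path; real disk-to-disk transfers also depend on storage and data transfer tooling on the DTNs, which is exactly why the Science DMZ model treats the DTN, the network, and the measurement infrastructure as one engineered system.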

In the case of Science DMZs, this was another example of the DOE developing an idea with its national laboratory system and then the NSF moving it to an even larger scale across all research campuses in the country.

We last saw that 30 years ago, when the DOE labs developed computational science and engineering, supercomputer centers, mass storage systems, and all of that. And through the mechanism of the NSF a few of us were able to bring those resources into the university system.

So I think this cooperation between the agencies — the NSF and the DOE — has made giant impacts on our society and is something that is not as well understood.

Our country would be a dismally smaller version of itself without the NSF. 


You’ve been in the supercomputing field a while, and see a lot of cutting-edge technology. Do you still find this sort of thing exciting?

Oh, absolutely! I am as excited today at 68 years old as I was when I first realized you could solve partial differential equations with supercomputers in the early 1970s. But now, in addition, we’re measuring massive datasets about the world and using supercomputers to analyze them. So I’m doubly excited, because we can couple the computational models with unprecedented amounts of data on supercomputers to better understand the world.

In addition, we stand at the brink of literally a whole new kind of computer science and engineering, which will be biologically-inspired. Right now, we design and engineer supercomputers and smartphones, but nature has been evolving sentient creatures with nervous systems for half a billion years. The biologically evolved brain is a million times more energy efficient than the exascale computer we will engineer over the next decade — we can’t let that kind of advantage go unused.

We’ve got to figure out how brain-computing works, because we’re basically at the end of the road with conventional human-engineered computers.
