
iSGTW Feature - Data for the people: how to share the data love

Feature - Data for the people: how to fast-track your network and share the data love

"You don't have to have a PhD to be interested in data and to want to analyze it." Bob Grossman is working to make large data sets easily accessible and publicly available, even to children.
Stock image from sxc.hu

So you've used a grid to split up your job, process it faster, then return your results. You now have a nice chunky terabyte of data. What do you do with it?

Bob Grossman, Director of the National Center for Data Mining at the University of Illinois, Chicago, U.S., says the answer is share, share, share.

"In terms of impact on society, the ability to use transparently other people's data is going to be transforming," Grossman says.

"It is about 'network effects'," he continues. "In the same way that a network becomes more interesting as more people join it, you can draw more interesting conclusions about your own data if you put it into the context of other people's data."

A fine notion in principle

But how can you get these network-busting bundles of new data to the people who need them?

Simple, says Grossman. You just send them, to everyone and anyone who might like to take a look.

"Our motivation for the last ten years has been to create a web for data, so it's easy to browse, explore and download it. The system we built, called DataSpace, still controls who can write data, but we encourage anyone in the world to read it."

Driven by this ultimate goal, Grossman turned his eye to the networks: could they distribute large sets of data across thousands of miles, and all without wasting a second? No, not really, not at all.

The network effect: the more telephones you have access to, the more useful your telephone becomes. The same can be said of data sets, says Bob Grossman.
Image courtesy of Derrick Coetzee

Grossman describes the old faithful TCP internet protocol, still going strong after nearly 25 years, as "a huge success story." But, he says, new versions of TCP just weren't coming out fast enough to solve his problem in good time.

"It was clear the network would change, but we didn't want to wait ten years for that to happen. So we built our own infrastructure instead."

Enter the fast lane

UDT, or UDP-based Data Transfer (where UDP is the User Datagram Protocol), is the result. Able to shoot data around the world at 10 gigabits per second, UDT compares well with the three or four megabits per second that standard TCP, as it was usually deployed, was achieving. "And if you're impatient like me…" jokes Grossman, "…I know which one I'd prefer."
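To put those two rates side by side, here is a rough back-of-the-envelope sketch of how long the "nice chunky terabyte" mentioned earlier would take to move at each speed. It assumes a decimal terabyte (10^12 bytes), a perfectly sustained rate, and no protocol overhead; the function name and the 3.5 megabit midpoint are illustrative choices, not figures from Grossman's team.

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal time to move size_bytes at a sustained rate, ignoring all overhead."""
    return size_bytes * 8 / rate_bits_per_s


TERABYTE = 10**12  # assumption: decimal terabyte (10^12 bytes)

# 10 gigabits per second, the UDT figure quoted in the article
udt_seconds = transfer_seconds(TERABYTE, 10e9)

# Assumption: 3.5 Mbit/s, the midpoint of the "three or four megabits" quoted for TCP
tcp_seconds = transfer_seconds(TERABYTE, 3.5e6)

print(f"UDT at 10 Gbit/s:  {udt_seconds / 60:.1f} minutes")
print(f"TCP at 3.5 Mbit/s: {tcp_seconds / 86400:.1f} days")
```

Under these idealized assumptions the gap is minutes versus weeks, which is the impatience Grossman is joking about.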

UDT has enjoyed much initial success, winning the annual Bandwidth Challenge held at the SC06 supercomputing conference last November by transporting 1.3 terabytes of Sloan Digital Sky Survey (SDSS) data from Chicago to Florida, with a sustained data transfer rate of eight gigabits per second.

For those keen on a more global challenge, UDT was used just last month to move 1.4 terabytes of SDSS data from Chicago all the way to Moscow. The transfer was complete in about 4.5 hours using a one gigabit per second link.
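The Moscow figures can be sanity-checked with a little arithmetic. The sketch below works out the average throughput implied by 1.4 terabytes in 4.5 hours and compares it with the one gigabit per second link. It assumes decimal terabytes and treats "about 4.5 hours" as exact, so the result is only approximate; the helper name is hypothetical.

```python
def effective_gbps(size_bytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second over the whole transfer."""
    return size_bytes * 8 / seconds / 1e9


# Chicago to Moscow: 1.4 terabytes in about 4.5 hours (figures from the article)
moscow_bytes = 1.4e12          # assumption: decimal terabytes
moscow_seconds = 4.5 * 3600    # assumption: "about 4.5 hours" taken as exact
moscow_gbps = effective_gbps(moscow_bytes, moscow_seconds)

LINK_GBPS = 1.0  # the one gigabit per second link quoted in the article
utilization = moscow_gbps / LINK_GBPS

print(f"Average rate: {moscow_gbps:.2f} Gbit/s")
print(f"Share of link capacity: {utilization:.0%}")
```

On these assumptions the transfer averaged roughly 0.7 gigabits per second, or about two thirds of the link's nominal capacity.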

Even more exciting, UDT is now an option for GridFTP.

This progress points in some interesting directions for Grossman and his team.

"We want to lower the cost of getting hold of other people's terabytes," he says. "I want to be able to find out, in just a few minutes, whether someone's data is going to be useful for my research."

When asked about the policy of some collaborations in restricting who can access their data, Grossman replied: "You don't have to have a PhD to be interested in data and to want to analyze it. And if you want to analyze it, you have to be able to touch it. We're building that infrastructure."

- Cristy Burne, iSGTW
