iSGTW Image of the week - DILIGENT crunches Flickr over EGEE

The DILIGENT team used the EGEE computing grid to process 37 million images from the online Flickr database in just 16 weeks. Approximately 1,000 grid jobs were submitted per day, with each job processing around 1,000 images.
Image courtesy of SAPIR

Ever wished for a more reliable way of searching for images on the Web?

The grid-enabled digital library project DILIGENT recently completed a data challenge on image feature extraction that brings us one step closer to just that: next-generation image searching.

Executed on the EGEE infrastructure, the recent DILIGENT challenge has created one of the world's largest collections of multimedia metadata to be made publicly available for research purposes.

37 million Flickr images in a flash

The DILIGENT team used the EGEE computing grid to process 37 million images from the online Flickr database in just 16 weeks. This computation generated approximately 112 million text and image objects (nearly 5 terabytes of data) containing more than 150 million extracted features. This is equivalent to an average processing capacity of over 300,000 images per day.
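The "over 300,000 images per day" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the challenge dates reported for the data challenge (16 June to 9 October 2007, counted inclusively) and treats the 37 million total as exact, and the function names are hypothetical.

```python
from datetime import date

# Figures quoted in the article; the 37 million total is a rounded value.
IMAGES_PROCESSED = 37_000_000
START = date(2007, 6, 16)
END = date(2007, 10, 9)


def challenge_days(start: date, end: date) -> int:
    """Calendar days in the challenge, counting both the first and last day."""
    return (end - start).days + 1


def images_per_day(images: int, days: int) -> float:
    """Average number of images processed per calendar day."""
    return images / days


days = challenge_days(START, END)
rate = images_per_day(IMAGES_PROCESSED, days)
print(f"{days} days, about {rate:,.0f} images per day")
```

Under these assumptions the average comes out at roughly 319,000 images per day, which is consistent with the article's "over 300,000".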

This unique collection will be used by the SAPIR project to develop new large-scale content-based data retrieval and automatic data classification techniques that combine both text and image content. These techniques aim to expand the limits of conventional search engines, which can only search the text associated with images and audiovisual content.

The computational load required to generate this massive data collection was outsourced to DILIGENT, and then delegated to the EGEE Pre-Production Service (PPS) Grid infrastructure via the gLite middleware. Approximately 1,000 grid jobs were submitted per day, with each job processing around 1,000 images. A total of 66,440 gLite jobs were submitted to the EGEE PPS resource broker and 44,333 of these jobs were successfully executed.
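The job counts above imply a success rate and an average yield per job. The following sketch works them out; it assumes that all 37 million images were handled by successfully executed jobs and that no image was counted twice, neither of which the article states explicitly.

```python
# Figures quoted in the article.
JOBS_SUBMITTED = 66_440
JOBS_SUCCEEDED = 44_333
IMAGES_PROCESSED = 37_000_000  # rounded total


def success_rate(succeeded: int, submitted: int) -> float:
    """Fraction of submitted jobs that executed successfully."""
    return succeeded / submitted


def images_per_job(images: int, jobs: int) -> float:
    """Average images per job, assuming every image came from a counted job."""
    return images / jobs


job_success = success_rate(JOBS_SUCCEEDED, JOBS_SUBMITTED)
per_job = images_per_job(IMAGES_PROCESSED, JOBS_SUCCEEDED)
print(f"Success rate: {job_success:.1%}")
print(f"Average images per successful job: {per_job:.0f}")
```

Roughly two thirds of the submitted jobs succeeded, and the average of a little over 800 images per successful job is in line with the stated "around 1,000 images" per job.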

The data challenge lasted for 116 days, from 16 June to 9 October 2007, and was organized in three phases. During the initial preparation phase, experimental jobs were submitted to some EGEE PPS sites to test the feature extraction application and optimize the number of images to process per day. The next two phases involved the actual execution of the data challenge, drawing on ten EGEE PPS sites that contributed their computational resources: University of Athens, Scuola Normale Superiore, ISTI-CNR, LIP, ESA-ESRIN, CERN, CESGA, University of Macedonia, Ben Gurion University, and CYFRONET. Four of these sites are maintained by DILIGENT partners.
