Preserving three decades of Tevatron data

High-tech tape robots. Image courtesy Fermilab.

No longer active, the Tevatron was host to the Collider Detector at Fermilab (CDF) and DZero experiments, and is recognized for the discovery of the top quark and for providing evidence for the existence of the Higgs boson, which was confirmed at CERN in 2012. Several years later, there is a continued effort to preserve the data resulting from the Tevatron's three-decade legacy.

The Run II Data Preservation system is expected to be sustainable through the year 2020. The project is progressing steadily, having successfully tested both the CDF and DZero pilot systems. Tape migration is continuing on schedule, and both the hardware and software infrastructures have been running since February 2012. One of the biggest misconceptions about data preservation is that only the data must be preserved on tape, when in fact the more difficult task is preserving the software and an environment in which it can run.

Willis Sakumoto, a senior scientist at Fermi National Accelerator Laboratory (Fermilab), confirms ongoing efforts to fully integrate CDF data into the Fermilab Intensity Frontier Structure and to provide Run II documentation within the scope of the project. These efforts include running compatibility validation tests for the transition from ROOT 4 to ROOT 5, as well as integrating the CERN Virtual Machine File System (CernVM-FS). "The project is well on its way to accomplishing its goal of handing off CDF analysis and documentation infrastructure to Fermilab Scientific Computing Division (FSCD) operations," Sakumoto says.

Michelle Brochmann, a student working on the DZero data preservation project, is also optimistic about the progress made thus far. "CernVM-FS facilitates cooperation among scientists by enabling them to access a consistent computational analysis environment," she says. The file system has some useful features: the software appears local despite being stored remotely, and files are accessed quickly because CernVM-FS uses optimized, existing HTTP infrastructure and fetches files from the remote server only as they are needed. "Fermilab has committed to help maintain the CernVM-FS for the next decade or so," adds Brochmann.
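
The fetch-on-demand idea can be illustrated with a small sketch. This is a toy model and not CernVM-FS code: the class name, the file names, and the in-memory dictionary standing in for the remote server are all hypothetical, chosen only to show how a file can look local while being copied from a remote source the first time it is read.

```python
from pathlib import Path
import tempfile


class LazyCachedStore:
    """Toy model of on-demand fetching with a local cache (not CernVM-FS itself)."""

    def __init__(self, fetch_remote, cache_dir):
        self.fetch_remote = fetch_remote  # callable: file name -> bytes
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.remote_fetches = 0  # counts how often the "server" was contacted

    def read(self, name):
        cached = self.cache_dir / name
        if not cached.exists():
            # First access: copy the file from the remote source into the cache.
            cached.write_bytes(self.fetch_remote(name))
            self.remote_fetches += 1
        # Every access, first or later, is served from the local copy.
        return cached.read_bytes()


# Hypothetical stand-in for a remote software repository.
REMOTE_FILES = {
    "analysis.cfg": b"release=example\n",
    "unused.lib": b"\x00" * 1024,
}


def fetch_from_remote(name):
    return REMOTE_FILES[name]


store = LazyCachedStore(fetch_from_remote, tempfile.mkdtemp())
first = store.read("analysis.cfg")   # triggers one remote fetch
second = store.read("analysis.cfg")  # served from the local cache
```

In this sketch the second read never touches the remote source, and the file that is never requested is never downloaded, which is the behavior the article describes in broad terms.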

Fermilab is home to the Tevatron, once the most powerful particle accelerator in the US and the second most powerful particle accelerator in the world. Image courtesy Fermi National Accelerator Laboratory (Fermilab).

Challenges the Run II Data Preservation team must overcome include a lack of new resources and manpower. Fortunately, scientists like Kenneth Herner and Bo Jayatilaka, who have worked on the DZero and CDF experiments respectively, recognize the value of this labor and the significance it could have for a scientist who may need to revisit a measurement or make new theoretical calculations. "This data has the potential to make new discoveries," says Jayatilaka.

The growing spread of digital science means that not only data but also software preservation is of critical importance to the long-term value of research outcomes. As the magnitude of the experiments, both in cost and in labor, increases, the need for a common forum of usable data is amplified. In response, projects such as the Data and Software Preservation for Open Science (DASPOS) and the Study Group for Data Preservation in High Energy Physics (DPHEP) are working to expand and improve data preservation technology.

Sakumoto is planning to integrate the use of cloud-based technology as a possible analysis solution. Regardless of the methodology chosen, the need for sustainable data preservation will continue to increase as science advances, experiments become less replicable, and data sets become more unique.

- Hanah Chang

Copyright © 2015 Science Node ™