No longer active, the Tevatron was host to the Collider Detector at Fermilab (CDF) and DZero experiments, and is recognized for the discovery of the top quark and for providing evidence for the existence of the Higgs boson, which was confirmed at CERN in 2012. Several years later, there is a continued effort to preserve the data resulting from the Tevatron's three-decade legacy.
The Run II Data Preservation system is expected to be sustainable through the year 2020. The project is progressing steadily, having successfully tested both the CDF and DZero pilot systems. Tape migration is continuing on schedule, and both the hardware and software infrastructures have been running since February 2012. One of the biggest misconceptions about data preservation is that it means only storing the data on tape, when in fact the more difficult task is preserving the software and an environment in which it can run.
Willis Sakumoto, a senior scientist at Fermi National Accelerator Laboratory (Fermilab), confirms ongoing efforts to fully integrate CDF data into the Fermilab Intensity Frontier Structure and provide Run II documentation within the scope of the project. These efforts include running compatibility validation tests for the transition from ROOT 4 to ROOT 5, as well as the integration of the CERN Virtual Machine File System (CernVM-FS). "The project is well on its way to accomplishing its goal of handing off CDF analysis and documentation infrastructure to Fermilab Scientific Computing Division (FSCD) operations."
Michelle Brochmann, a student working on the DZero data preservation project, is also optimistic about the progress made thus far. "CernVM-FS facilitates cooperation among scientists by enabling them to access a consistent computational analysis environment." The file system has some nice features: the software appears local despite being stored remotely, and files are accessed quickly because CernVM-FS uses optimized, existing HTTP infrastructure and fetches files from the remote server only as they are needed. "Fermilab has committed to help maintain the CernVM-FS for the next decade or so," adds Brochmann.
Challenges the Run II Data Preservation team must overcome include a lack of new resources and manpower. Fortunately, scientists like Kenneth Herner and Bo Jayatilaka - who have worked on the DZero and CDF experiments, respectively - recognize the value of the labor they are putting forth and the significance it could have for a scientist who may need to revisit a measurement or make new theoretical calculations. "This data has the potential to make new discoveries," says Jayatilaka.
The growing spread of digital science means that preserving not only data but also software is critical to the long-term value of research outcomes. As the magnitude of the experiments - both in cost and in labor - increases, the need for a common forum of usable data is amplified. In response, projects such as the Data and Software Preservation for Open Science (DASPOS) and the Study Group for Data Preservation in High Energy Physics (DPHEP) are working to expand and improve data preservation technology.
Sakumoto is planning to integrate cloud-based technology as a possible analysis solution. Regardless of the methodology chosen, the need for sustainable data preservation will continue to increase as science advances, experiments become less replicable, and data sets become harder to reproduce.
- Hanah Chang