Lack of long-term preservation plan threatens to leave key information inaccessible for future analysis.
Four months before the Tevatron shuts down for good, physicists at Fermilab's giant particle collider near Batavia, Illinois, are pulling out all the stops to collect every last bit of data that they can. But some worry about what will eventually happen to the trove of data — approaching 20 petabytes (20 × 10¹⁵ bytes) — amassed over the machine's 26-year life. Although there is funding to continue sifting the data for traces of the Higgs boson and other subatomic prizes for the next five years, so far there is no plan and no budget for preserving them in the longer term.
At a workshop on data preservation at Fermilab on 16–18 May, some physicists called for that to change, arguing that Tevatron data could prove useful as an independent check on its successor, the Large Hadron Collider (LHC), now operating at CERN, Europe's particle-physics lab near Geneva, Switzerland. If researchers suspect that the LHC has spotted new physics, particularly at the lower end of its energy range, the claim could be tested for consistency with Tevatron data, says Rob Roser, spokesman for the Collider Detector at Fermilab (CDF), one of the Tevatron's two principal experiments.
Although many fields of science, from genomics to astrophysics, put substantial resources into archiving data and making them publicly available, the norm in particle physics has, until recently, been very different. When the analysis of data from an experiment trickles to a halt, researchers typically move on. The data languish or are even destroyed to make storage space available for something else. When the Tevatron was built, "we did not think about data preservation", says Qizhong Li, computing coordinator for D0, the other main experiment at the Tevatron. "This is a rather new concept."
Both CDF and D0 expect to lose their dedicated computing infrastructure over the next five years. A gradual loss of knowledge about how to handle the complex data, which include raw detector readouts, reconstructed particle trajectories and higher-level analyses, could also present a serious hurdle to exploiting the data in the future.
Such neglect would be a mistake, says Cristinel Diaconu of the Centre for Particle Physics in Marseilles, France, who leads the H1 experiment at the Hadron–Electron Ring Accelerator (HERA) in Hamburg, Germany, which closed in 2007. "We always have new ideas that can be used to reanalyse data," he says. Diaconu, who is chair of an international study group on data preservation and long-term analysis in high-energy physics, estimates that good data preservation can increase the scientific potential of an experiment by 10% for less than a 1% increase in cost.
At BaBar, an experiment that produced B mesons at the Stanford Linear Accelerator Center in Menlo Park, California, until 2008, physicists have begun building a US$500,000 archival system that will save the raw data and software, and are also setting up virtual interfaces to run the older software on modern machines. "We have decided to save everything at least to 2018," says Tina Cartaro, BaBar's computing coordinator.
Other experiments are adopting cost-saving compromises. Collaborators on the HERA H1 experiment have decided that it is not necessary to keep all their seven attempts at reconstructing particle events from the raw data. "Our initial thinking was to keep everything, but we now think we will keep three iterations," says David South, computing coordinator for H1 at the Technical University of Dortmund in Germany.
The Tevatron can learn from those examples, says Roser. But Li says it is a tougher job than it would have been if data preservation had been planned for from the beginning. In addition, over the next five years, thousands of tapes' worth of data will somehow have to be migrated to newer, higher-density storage media, together with a suitable retrieval system.
The situation at the LHC is strikingly different. Computing specialists there are already working towards permanent archiving of the data, says Elizabeth Sexton-Kennedy of Fermilab, who works on computing systems for the CMS, one of the LHC experiments. In its short lifetime, the LHC has collected five times as much data as the Tevatron. All the raw data are being kept, although the CMS is saving space by deleting old attempts at reconstruction when they are surpassed by newer ones. "When we know things much better, we delete older knowledge," says Sexton-Kennedy. "It's the tension between the old and the new."
Particle physicist Siegfried Bethke of the Max Planck Institute for Physics in Munich, Germany, spent two years reconstructing unmaintained data from PETRA, a positron–electron collider that ran from 1979 to 1986 at the DESY accelerator in Hamburg. He told the data-preservation workshop that his experience shows how vital better planning is. "These data have cost a lot of money to the taxpayer and not conserving them would be a crime," he said.
Samuel Reich, E. Tevatron's legacy set to disappear. Nature 474, 16–17 (2011). https://doi.org/10.1038/474016a