To the Editor — For decades, particle colliders have exposed the fundamental building blocks of nature, most recently the Higgs boson, discovered at the Large Hadron Collider (LHC). In 2014, the Compact Muon Solenoid (CMS) experiment at the LHC took the unprecedented step of making a meaningful fraction of its data public. The CMS Open Data project (http://opendata.cern.ch/), now exceeding a petabyte of real and simulated collisions, has spawned several exploratory studies [1,2,3,4], including our recent search for new particles [5].
Why ‘unprecedented’? Collider datasets are huge and inherently complex. LHC proton collisions occur every 25 nanoseconds, and reconstructing the collision debris requires synthesizing information from hundreds of millions of readout channels. A filter (the ‘trigger’) discards all but the most interesting collisions, and accounting for its effects and those of the heterogeneous LHC detectors is challenging. The resources required to make such a complex dataset public and usable are substantial, but in short supply.
However, data from the LHC — whose successor is decades away — are priceless for future scientists and must be carefully archived, along with all necessary associated knowledge. As it is archived, the data should be made public, though not immediately. A delay of several years, enough for the experimenters who collected the data to perform thorough analyses, is appropriate; only those who spent years building the experiments have earned quick access. Furthermore, making LHC data ready for public use, with documentation and example code, requires significant funding and time.
But steady publication of LHC data has multiple benefits. First, it encourages prompt archiving, before collective memory fades and knowledge is lost. Second, other scientists can analyse the data while the LHC is still running, testing unconventional strategies and potentially leading to unexpected discoveries, new approaches and fruitful discussions. And third, as a by-product, these scientists can stress test the archiving methods; any deficiencies found are easier to fix now than later. In this way, public collider data can complement the overall LHC research effort. We, therefore, favour a slow but steady approach to full publication of the LHC experiments’ data; it is in the best interest of particle physics.
1. Larkoski, A., Marzani, S., Thaler, J., Tripathee, A. & Xue, W. Phys. Rev. Lett. 119, 132003 (2017).
2. Madrazo, C. F., Cacha, I. H., Iglesias, L. L. & de Lucas, J. M. Preprint at https://arxiv.org/abs/1708.07034 (2017).
3. Andrews, M., Paulini, M., Gleyzer, S. & Poczos, B. Preprint at https://arxiv.org/abs/1807.11916 (2018).
4. Lester, C. G. & Schott, M. Preprint at https://arxiv.org/abs/1904.11195 (2019).
5. Cesarotti, C., Soreq, Y., Strassler, M. J., Thaler, J. & Xue, W. Phys. Rev. D 100, 015021 (2019).