Published online 22 July 2008 | Nature | doi:10.1038/news.2008.967


Physicists brace themselves for LHC 'data avalanche'

Particle collider will produce 700 megabytes of data every second.

As physicists prepare to inject the first stream of particles into the Large Hadron Collider (LHC) in August, they are bracing themselves for a 'data avalanche' from the multi-billion-dollar particle accelerator.

The Large Hadron Collider will play host to millions of particle collisions every second. (Credit: CMS/CERN)

Speaking at the Euroscience Open Forum conference in Barcelona, Spain, on 20 July, the LHC team from CERN, the European particle-physics centre near Geneva, Switzerland, revealed some of the mind-bogglingly large numbers involved.

The LHC will slam together bunches of protons moving at close to the speed of light, producing around 600 million collisions per second. These collisions will generate 700 megabytes of data a second, and it will take hundreds of thousands of computer processors to analyse them. Were a year's worth of data from the LHC to be burned onto CDs, they would form a stack 20 kilometres high, the team says.
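The CD-stack figure can be checked on the back of an envelope. The minimal Python sketch below assumes (these values are not from the article) a CD capacity of 700 MB and a bare-disc thickness of 1.2 mm. With those numbers, each second of data at 700 MB/s fills exactly one CD, so the stack height is simply the running time multiplied by the disc thickness.

```python
# Back-of-envelope check of the "20 km stack of CDs" figure.
# Assumptions (not stated in the article): 700 MB per CD, 1.2 mm per bare disc.
DATA_RATE_MB_PER_S = 700.0   # quoted LHC data rate
CD_CAPACITY_MB = 700.0       # assumed CD capacity
CD_THICKNESS_M = 1.2e-3      # assumed disc thickness (no jewel case)


def stack_height_m(seconds: float) -> float:
    """Height in metres of a CD stack holding `seconds` of data at the quoted rate."""
    n_cds = DATA_RATE_MB_PER_S * seconds / CD_CAPACITY_MB
    return n_cds * CD_THICKNESS_M


def seconds_for_height(height_m: float) -> float:
    """Inverse: seconds of continuous data-taking needed to reach `height_m`."""
    n_cds = height_m / CD_THICKNESS_M
    return n_cds * CD_CAPACITY_MB / DATA_RATE_MB_PER_S


if __name__ == "__main__":
    t = seconds_for_height(20_000.0)
    # prints: 1.67e+07 s, about 193 days of continuous running
    print(f"{t:.3g} s, about {t / 86_400:.0f} days of continuous running")
```

Under these assumptions, a 20 km stack corresponds to roughly 1.7 × 10^7 seconds, or about 193 days, of continuous data-taking. The result scales directly with whatever disc capacity and thickness are assumed.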

The first element of the LHC system consists of radiation-toughened custom electronics sitting next to the detector. These sift through every item of collision data, using algorithms written by hundreds of physicists from across the world, to pick out just a few hundred collision events worth studying in more detail.

There's a lot riding on the process, says Pere Mató, one of the CERN physicists involved in solving the information technology challenges associated with the LHC. "Once you decide to reject something you can't recover it. The hardware has to be high quality, otherwise after years you realize 'oh, I was throwing away what I was looking for'."
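The selection stage described above can be caricatured as a one-way filter. The Python sketch below is a toy, not the LHC trigger: the single `energy_gev` value per event, the threshold cut and the exponential event spectrum are all hypothetical stand-ins for the far more elaborate algorithms the physicists write. What it does capture is Mató's point that anything the filter rejects is never passed on and cannot be recovered further downstream.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    event_id: int
    energy_gev: float  # hypothetical single summary quantity per collision


def toy_trigger(events: Iterable[Event], threshold_gev: float) -> Iterator[Event]:
    """Yield only events above a hypothetical energy threshold.

    Rejected events are simply never yielded: once discarded here,
    no later stage of the pipeline can see them again.
    """
    for ev in events:
        if ev.energy_gev > threshold_gev:
            yield ev


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100_000
    # Hypothetical event stream: exponentially distributed energies, mean 10 GeV.
    stream = (Event(i, rng.expovariate(1 / 10)) for i in range(n))
    kept = list(toy_trigger(stream, threshold_gev=50.0))
    print(f"kept {len(kept)} of {n} events")
```

Because the filter is a generator, events flow through it one at a time and only survivors are stored, which mirrors why the selection has to be right first time.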

After this, a second wave of computing kicks in, further sifting the data and allowing physicists to search for new particles created in the collisions. They hope to uncover evidence for the Higgs boson, a hypothetical particle believed to confer mass on other particles in the quantum zoo.

Once all of this data has been analysed, even more computing power is needed to make sense of it. Alongside watching actual collisions in the LHC, physicists will also be simulating around 25% of the events to compare theoretical predictions with what actually happens.

"The data analysis is like trying to run time backwards to understand if there was a Higgs there, or any kind of novel physics," says Gonzalo Merino of Port d’Informació Científica, a centre for scientific data processing situated in Bellaterra, near Barcelona. "We try to reproduce in silicon what nature is doing down in the detector."

The computing power for this is spread around the world and linked together in a distributed network called the CERN Grid. Just as the internet distributes information, grid systems distribute processing power.
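The scatter-and-gather idea behind a grid can be sketched in a few lines of Python. Here a `ThreadPoolExecutor` stands in for remote computing sites (an assumption for illustration only: a real grid ships jobs to separate machines around the world, and Python threads do not run CPU-bound work in parallel), and `analyse_chunk` is a hypothetical per-site job that simply counts events above a threshold. Each "site" processes its own slice of the data, and only the small partial results travel back to be combined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def analyse_chunk(chunk: Sequence[float], threshold: float) -> int:
    """Hypothetical per-site job: count events above a threshold."""
    return sum(1 for x in chunk if x > threshold)


def split(data: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split data into n_parts contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def grid_count(data: Sequence[float], n_sites: int, threshold: float) -> int:
    """Scatter chunks to worker 'sites', then gather and sum the partial counts."""
    if n_sites < 1:
        raise ValueError("need at least one site")
    chunks = split(data, n_sites)
    with ThreadPoolExecutor(max_workers=n_sites) as pool:
        partials = pool.map(analyse_chunk, chunks, [threshold] * n_sites)
        return sum(partials)


if __name__ == "__main__":
    events = [float(x) for x in range(1_000)]
    print(grid_count(events, n_sites=8, threshold=900.0))
```

Because each chunk is processed independently, the combined answer is the same whether the work is split across one site or many, which is what allows the load to be spread out in the first place.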

Lessons learned from building the grid and dealing with the 'data avalanche' will benefit other fields, such as astronomy and medicine, according to Merino. "Particle physics is not the only discipline where this explosion of data is happening," he says. "The computing challenge other fields are facing is similar to what we are facing." 
