Genetic data from patients will soon be flooding doctors' offices. Credit: Kyu Oh/iStockphoto

A computer program that predicts the effects of gene mutations has earned its author a doctorate, a stack of journal publications — and now a dancing wind-up toy named Molly.

Yana Bromberg, a bioinformatician at Rutgers University in New Brunswick, New Jersey, won the toy for her program, SNAP, in an experimental contest that culminated on 10 December in Berkeley, California. The competition, called the Critical Assessment of Genome Interpretation (CAGI), asks researchers to predict the biological effects of different mutations, and compares their results against unpublished experimental data.

The contest was conceived by Steven Brenner, a computational genomicist at the University of California, Berkeley, and John Moult, a computational biologist at the University of Maryland in Rockville. Their goal is to accelerate the development of software that can quickly interpret large amounts of genetic data — for example, the whole genome sequence of a tumour from a biopsy.

Data mountain

Such data is already flooding labs and will soon be hitting doctors' offices. "We've already got an enormous amount of data to contend with and we're struggling to make sense of it," says Moult. "I see CAGI as one mechanism to help with that process."

He helped to start a similar competition in 1994, to improve scientists' ability to determine the shapes of proteins from their amino-acid sequences. That effort, named the Critical Assessment of protein Structure Prediction (CASP), challenges scientists to predict protein structures that have been determined experimentally, but not yet published. The results are revealed at a biennial meeting in Pacific Grove, California.

CAGI works in a similar way. Instead of protein structures, however, Brenner, Moult and coordinator Susanna Repo, a postdoc in Brenner's lab, set several challenges that typically involved determining the biological effects of mutations in particular genes and the proteins they encode.

For instance, one challenge provided entrants with variants of the cancer-associated gene CHEK2 that had been uncovered, but not yet published, by a study of the gene in patients with cancer and in healthy people. CAGI participants were asked to determine whether each mutation came from a patient or a control.

Each team tackled these challenges differently. But their entries generally involved either predicting how a certain mutation changes the shape and function of a protein, or scouring genetic databases to determine the effects of similar mutations. "The ones that did best combined a large number of methods together," says Brenner.

Despite being hastily organized — some of the challenges were posted just a couple of weeks before their deadlines — CAGI drew more than 100 entries. About 40 people made the trip to Berkeley to learn the results and to collect prizes, which Brenner awarded to anyone who gave a talk on their approach.

Although the organizers were apprehensive about how the contest would work, "it went as well as it possibly could have", says Brenner. He and his team are still analysing the entries, and hope to reveal the official results in a peer-reviewed publication. On the basis of the success of the Berkeley workshop, they plan to hold the contest again within 2 years.

A challenge too far

There were a few hitches, however. One challenge proved so difficult to tackle at short notice that it generated no entries. In another, to predict the consequence of mutations in the tumour-suppressor gene P53, the relevant experiments became contaminated with mould, so the entries could not be compared against real data in time for the 10 December meeting.

Joost Schymkowitz and Frederic Rousseau's team at the Free University of Brussels worked on two of the problems. They fared better on one challenge than on the other, but Schymkowitz points out that failures can be as illuminating as successes because they highlight the shortcomings of particular approaches. "It makes you acutely aware of things you cannot do," he says.

Scott Kahn, chief information officer at the gene-analysis company Illumina in San Diego, California, who attended CAGI as an observer, says that the contest should help to speed up advances in genome prediction. "This does really focus effort in the community," he says.

Brenner, meanwhile, points out that protein-structure predictions improved greatly after CASP started. "Our hope is that same thing will happen here."