Researchers have developed software to predict whether certain genetic variants are harmful.

The effects of most mutations are unclear, especially for those in the 99% of the genome that does not code for proteins. Chris Tyler-Smith at the Sanger Institute in Hinxton, UK, and Mark Gerstein at Yale University in New Haven, Connecticut, and their colleagues started with non-coding regions that had been identified as functional in a large-scale genomics project. Using sequencing data from more than 1,000 people, they catalogued how these regions varied in healthy individuals.

The catalogue revealed patterns that probably mark harmful mutations, such as changes in DNA sequences to which regulatory proteins bind. The scientists incorporated the patterns into a predictive tool and applied it to genomes from cancer biopsies. The tool flagged nearly 100 non-coding variants that could contribute to the disease.

Science 342, 84 (2013)