Is it possible to predict just how harmful a mutation is without doing functional studies? Eric Stone and Arend Sidow have devised a method of predicting how a mutation might affect the function of a protein — using sequence information alone.

Their method, which the authors call multivariate analysis of protein polymorphism (MAPP), hinges on the idea that it is the mutations at evolutionarily conserved sites that impair protein function and lead to disease. The authors reasoned that, for each position in the protein, the variation between orthologous (and functionally similar) genes describes the range of physicochemical variation that is tolerated by the protein's function.

For a given protein, they aligned several orthologues and closely related paralogues, and from this estimated the physicochemical constraints on each amino-acid position using standard physicochemical scales such as hydropathy. From the mean and variance of each physicochemical property, the authors could estimate the magnitude of the constraints for each position in the protein.
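To make the idea of a per-position constraint concrete, here is a minimal sketch, not the authors' implementation. It assumes a single property (the Kyte-Doolittle hydropathy scale), an unweighted alignment, and a made-up toy alignment; the function name `column_constraints` is hypothetical, and every column is assumed to contain at least one residue.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale (one standard physicochemical scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def column_constraints(alignment):
    """Return a (mean, variance) hydropathy pair for each alignment column.

    Simplification: sequences are weighted equally and gaps are skipped.
    """
    stats = []
    for column in zip(*alignment):
        values = [KD[aa] for aa in column if aa in KD]  # ignore gap characters
        stats.append((mean(values), pvariance(values)))
    return stats

# Hypothetical toy alignment of four homologous sequences (illustration only).
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKLL"]
constraints = column_constraints(toy_alignment)

for position, (mu, var) in enumerate(constraints, start=1):
    print(f"position {position}: mean={mu:.3f} variance={var:.3f}")
```

In this sketch, a column with low variance stands for a tightly constrained position, whereas a column with high variance stands for a position that tolerates a wider range of hydropathy.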

They then determined how physicochemically dissimilar each protein variant was from the observed evolutionary variation, and called this measure a 'MAPP score'. A high MAPP score tells us that the protein variant is physicochemically different from the observed 'neutral' evolutionary variation and is therefore potentially deleterious.
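The scoring idea can be illustrated with a deliberately simplified, one-property stand-in for a MAPP score. This is a sketch under stated assumptions, not the published method: it measures how far a variant's hydropathy lies from the column mean, in units of the column's spread. The function `dissimilarity`, the `floor` parameter (a small value added to the variance so that perfectly conserved columns do not cause division by zero) and the example column are all hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def dissimilarity(column, variant, floor=0.1):
    """Standardized hydropathy distance of a variant residue from a column.

    `floor` is an assumed variance floor, not a value taken from the source.
    """
    values = [KD[aa] for aa in column if aa in KD]  # ignore gap characters
    spread = sqrt(pvariance(values) + floor)
    return abs(KD[variant] - mean(values)) / spread

# Hypothetical hydrophobic column observed across homologues.
column = "VIVL"
conservative = dissimilarity(column, "I")  # residue already seen in the column
radical = dissimilarity(column, "D")       # charged residue, never observed

print(f"I: {conservative:.2f}  D: {radical:.2f}")
```

Here the charged substitution scores far higher than the hydrophobic one, which mirrors the logic described above: the further a variant falls outside the tolerated range, the more likely it is to be deleterious.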

The authors then tested their method using empirical data from four published studies (on Escherichia coli, bacteriophage T4 and human immunodeficiency virus) that had assayed single-substitution protein variants for their degree of functional impairment. They found that the calculated MAPP scores of the protein variants tracked the measured impairment well: deleterious variants had high MAPP scores, and variants that retained function (the 'positive' class) had low MAPP scores. Furthermore, the MAPP scores were able to discriminate between intermediate and negative variants within the deleterious class of mutations.

Crucially, they also tested whether their MAPP scores could reflect the severity of disease-causing mutations. Variants of β-haemoglobin cause a range of anaemias, and the authors found that MAPP scores for human β-haemoglobin variants were quantitatively consistent with clinical severity. Intriguing correlations were obtained for cancer too. Mutations in the tumour suppressor p53 that drive tumour progression are generally observed more often in tumours than less-deleterious or neutral mutations, and, sure enough, the more common somatic p53 variants had higher MAPP scores.

The next, more ambitious step will be to extend the method to combine MAPP predictions for multiple disease loci or modifiers.