A recent study of the aetiology of complex diseases, and of the genetic mutations from which they spring, makes some surprising observations.

Finding the common DNA variants that contribute significantly to genetic risk for common diseases is a key goal for medical science [1]. However, while distinguishing mutations that cause disease from harmless single nucleotide polymorphisms (SNPs) is difficult enough for the relatively few genetic diseases that are inherited in a simple Mendelian manner, it is a daunting task for complex traits where pathology arises from the interactions of multiple genes with the environment. Point mutations exert a broad spectrum of effects on human health. The most fearsome are those that disrupt development: they cause embryonic morbidity and are seldom observed in postnatal disease. At the other end of the scale are base substitutions under few constraints, such as most common SNPs. In between fall most of the 1000 or so [2, 3] deleterious mutations that are carried by the average person. In a recent article in Proc. Natl. Acad. Sci. USA, Thomas and Kejariwal [4] show what types of coding mutations we should expect in complex diseases. They find that these amino-acid changes mostly fall outside of conserved regions and cannot readily be distinguished from the coding sequence variation seen between healthy individuals.

These results are surprising because most amino-acid substitutions associated with Mendelian diseases are of conserved, and thus presumably essential, amino acids [2, 3, 5, 6]. Hitherto, there has been little evidence that this would be any different for complex diseases. Thomas and Kejariwal give three possible explanations for their findings.

First, they suggest that coding mutations in complex disease might cause subtle and almost imperceptible alterations to molecular function. If this were true, researchers proposing molecular dysfunction in complex diseases would not be able to use sequence evolutionary information to prosecute their case. Mutations that are mildly deleterious are difficult enough to substantiate experimentally for Mendelian diseases; for complex traits that are multifactorial, it may well be impossible to detect the knock-on effects of such subtle alterations.

Thomas and Kejariwal also entertain the possibility that some of the 37 cases of coding SNPs they examined might not directly contribute to complex diseases. They felt that this explanation was unlikely because their findings remain significant even for a reduced number of cases, those about which they were most confident. Nevertheless, it cannot be discounted that many of these coding SNPs, instead of directly contributing to the complex disease, may merely be closely linked to the real disease-causing polymorphisms. These may lie, for example, in adjacent noncoding sequence that regulates transcription or translation. This suggestion should find favour among those advocating hunts for causative SNPs within regulatory regions, for example, King and Wilson [7] and Prokunina et al. [8].

The authors' final explanation is that lack of conservation may not rule out the functional importance of their disease-associated SNPs if these functions have been acquired only recently in primate evolution. Comparisons with more distantly related mammals might not show conservation if SNP sites have been evolving rapidly under adaptive pressures in our lineage. Adaptive evolution can be detected if KA/KS ratios [9] between mouse and human genes are greater than one. However, though the KA/KS ratios for complex disease genes are elevated relative to randomly selected genes, the ratios are still much less than one. In any case, the issue here is whether, out of all the many codons in a gene, a particular complex disease-associated SNP has been changing adaptively. This cannot be determined simply by comparing the entirety of one gene from one species with that from another. At best, the jury is still out on this question.
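The gene-wide test described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example values are hypothetical and are not taken from Thomas and Kejariwal's data, and the sketch assumes that KA and KS have already been estimated for the whole gene.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return the ratio of nonsynonymous (KA) to synonymous (KS) substitution rates."""
    if ks <= 0:
        # With no synonymous change the ratio is undefined.
        raise ValueError("KS must be positive to form a ratio")
    return ka / ks


def gene_wide_signal(ratio: float) -> str:
    """Apply the criterion from the text: a ratio above one suggests adaptive evolution."""
    if ratio > 1.0:
        return "consistent with adaptive evolution"
    return "no gene-wide signal of adaptive evolution"


# Hypothetical mouse-human estimates for a single gene (illustrative only).
example_ratio = ka_ks_ratio(ka=0.06, ks=0.60)
print(gene_wide_signal(example_ratio))
```

Because the ratio is averaged over every codon in the gene, a result below one says nothing about whether one particular codon has been changing adaptively, which is the limitation noted above.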

A re-examination of the same KA/KS data, however, indicates a functional bias in complex disease genes that was not noted in the original article. Secreted proteins and transmembrane molecules, such as receptors, are greatly over-represented among those encoded by the 32 complex disease genes that Thomas and Kejariwal analysed. These number 8 (25%) and 15 (47%) out of 32, respectively, far more than just the six of each type expected from their frequency (∼20%) in the human genome [10]. Transmembrane and secreted proteins evolve more rapidly than average [11]. This alone would explain much of the elevation in KA/KS ratios seen for complex disease genes. Transmembrane and secreted proteins would be prime suspects in complex disease as they tend to have restricted expression profiles: the dysfunction of disease genes commonly afflicts few organs or tissues, rather than being systemic [12].
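The expected counts quoted above follow from simple arithmetic, and the strength of the over-representation can be gauged with a one-sided binomial tail. The sketch below is a minimal illustration, not the analysis used in the article. It assumes that each of the 32 genes is an independent draw and that each protein class has a background frequency of exactly 20%; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_GENES = 32        # complex disease genes analysed (from the text)
BACKGROUND = 0.20   # approximate genome-wide frequency of each class (from the text)

# Expected number of genes of each class if disease genes were a random sample.
expected = N_GENES * BACKGROUND

observed_counts = {"secreted": 8, "transmembrane": 15}
for label, observed in observed_counts.items():
    tail = binomial_upper_tail(observed, N_GENES, BACKGROUND)
    print(f"{label}: observed {observed}, expected {expected:.1f}, "
          f"P(X >= {observed}) = {tail:.3g}")
```

Under these assumptions the two classes can be compared directly: the smaller the tail probability, the harder the observed count is to attribute to chance alone.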

Sites mutated in complex disease, according to Thomas and Kejariwal, are distinguished by their very ordinariness, being neither essential over long evolutionary time periods, nor different from healthy variation. If such data are representative of all complex diseases, it will be difficult in the future to associate single amino-acid changes with protein dysfunction and complex pathology. Fortunately, it seems that it may be much easier to construct a photofit picture of what an average complex disease gene might look like: it would encode a transmembrane or a secreted protein, and would be expressed in few tissues. Perhaps, like its Mendelian disease counterparts, it would also suffer more germline mutations than other genes [13, 14]. Investigators hunting the culprits of complex disease might be satisfied even with these few clues when narrowing down their lists of suspect genes.