The genes underlying mendelian disorders have, for the past several decades, been identified through positional cloning, a process of meiotic mapping, physical mapping and candidate-gene sequencing1. Recently, a proof-of-concept study showed that whole-exome sequencing combined with a filtering strategy can identify the gene underlying a mendelian disorder using a small number of affected individuals: it correctly identified the gene previously known to underlie Freeman-Sheldon syndrome2. Now, on page 30 of this issue, Michael Bamshad and colleagues3 use the same strategy to identify the gene underlying Miller syndrome, a mendelian disorder whose genetic basis was previously uncharacterized. Miller syndrome, also known as postaxial acrofacial dysostosis (MIM#263750), is a rare malformation syndrome whose features include cleft palate, absent digits and ocular anomalies, among others. Identifying the gene mutated in this disorder will improve diagnosis and provide a starting point for biological investigation, but the real advance of these two studies is the demonstration that this approach can be used to characterize the genetic basis of rare monogenic disorders.

Exome sequencing approach

Ng et al.3 sequenced the exomes of four individuals with Miller syndrome, including two siblings. The authors used targeted exome capture, enriching DNA by array hybridization to 164,000 targets defining the exome, and then sequenced the enriched DNA on a massively parallel short-read sequencer at 40-fold coverage. They applied a stepwise filter to the identified variants to select those likely to be implicated in the disorder (Fig. 1). First, they screened for genes containing nonsynonymous variants, splice site mutations or coding indels. Next, they excluded common variants by comparing the four exomes with those of eight control individuals (from HapMap and reported in ref. 2) and with the dbSNP database. Finally, they excluded variants that PolyPhen software predicted not to be damaging. Although Miller syndrome was suspected to be recessive, they tested both a dominant and a recessive model, and the recessive model fit the data best: eight candidate genes remained under the dominant model, but only one, DHODH, under the recessive model. Together, the four individuals with Miller syndrome carried six rare variants in DHODH. The excitement surrounding this study centers on the brute-force approach of exome sequencing combined with filtering that identified the disease-causing gene. This new approach will be critical for uncovering the genes underlying rare mendelian traits, especially where few individuals are available for study.
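The variant-level filter described above can be sketched in a few lines of Python. This is a minimal illustration, not Ng et al.'s pipeline: the `Variant` record, the functional-class labels and the `known_ids` set (standing in for dbSNP plus the eight control exomes) are hypothetical, and the PolyPhen prediction is assumed to be precomputed as a simple boolean.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Functional classes kept in step 1 (these label strings are hypothetical).
RETAINED_CLASSES = {"nonsynonymous", "splice_site", "coding_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_id: str        # hypothetical key, e.g. a "chrom:pos:ref>alt" string
    consequence: str       # functional class assigned by annotation
    polyphen_benign: bool  # True if PolyPhen predicts the change is not damaging


def filter_variants(variants: Iterable[Variant], known_ids: Set[str]) -> List[Variant]:
    """Stepwise filter for one individual's variant calls.

    `known_ids` stands in for dbSNP plus the control exomes: any variant
    found there is treated as common and discarded.
    """
    kept = []
    for v in variants:
        if v.consequence not in RETAINED_CLASSES:
            continue  # step 1: keep nonsynonymous, splice-site and coding-indel changes
        if v.variant_id in known_ids:
            continue  # step 2: drop variants seen in dbSNP or the control exomes
        if v.polyphen_benign:
            continue  # step 3: drop variants predicted not to be damaging
        kept.append(v)
    return kept


calls = [
    Variant("GENE_A", "v1", "nonsynonymous", False),
    Variant("GENE_A", "v2", "synonymous", False),
    Variant("GENE_B", "v3", "nonsynonymous", False),
    Variant("GENE_C", "v4", "splice_site", True),
]
print([v.variant_id for v in filter_variants(calls, known_ids={"v3"})])  # ['v1']
```

In the toy input, `v2` fails the functional-class step, `v3` is treated as known variation and `v4` is predicted benign, so only `v1` survives. The order of the steps does not change the result, since each is an independent exclusion.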

Figure 1: Exome sequencing and filtering strategy.

In Ng et al.3, the variants from the exome sequences of four individuals with Miller syndrome were first screened to select genes carrying two nonsynonymous, splice site or indel variants in each individual. These variants were then compared with the exome sequences of eight healthy controls2 and with dbSNP to exclude common variation, narrowing the list to likely candidate genes underlying this rare disorder.
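The gene-level step in the legend, requiring two qualifying variants in each affected individual, amounts to an intersection across individuals. The sketch below is an illustration under stated assumptions, not the authors' code: the input encoding and gene names are hypothetical, phase is ignored, and the dominant alternative (one qualifying variant per person) is included to show why it lets more genes through.

```python
from collections import Counter
from typing import List, Set


def candidate_genes(genes_by_individual: List[List[str]], model: str = "recessive") -> Set[str]:
    """Genes that satisfy the inheritance model in every affected individual.

    Each inner list holds one gene name per filtered rare allele in that person;
    a homozygous variant is assumed to be listed twice (hypothetical encoding).
    Recessive model: at least two such alleles in the gene.
    Dominant model: at least one.
    Phase is ignored, so two variants on the same haplotype would also count.
    """
    if model not in ("recessive", "dominant"):
        raise ValueError("model must be 'recessive' or 'dominant'")
    min_alleles = 2 if model == "recessive" else 1
    per_person = []
    for genes in genes_by_individual:
        counts = Counter(genes)
        per_person.append({g for g, n in counts.items() if n >= min_alleles})
    # A gene survives only if it passes the model in all affected individuals.
    return set.intersection(*per_person) if per_person else set()


affected = [
    ["GENE_A", "GENE_A", "GENE_B"],
    ["GENE_A", "GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_A", "GENE_B"],
]
print(sorted(candidate_genes(affected, "recessive")))  # ['GENE_A']
print(sorted(candidate_genes(affected, "dominant")))   # ['GENE_A', 'GENE_B']
```

Because the recessive model demands two hits per person, it is far more stringent than the dominant model, which is consistent with the text's contrast between one candidate gene and eight.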

Ng et al.3 interrogated four DNA samples with one technique (exome sequencing), but exome data could also be integrated with other datasets, for example, transcriptome sequences, proteomic data or methylation data. Combining an exome dataset with a transcriptome dataset, for instance, might reveal abnormal mRNA isoforms caused by a previously unrecognized deep intronic splicing variant, which would lie outside the captured exome. Ng et al.3 used only a minimal amount of linkage data in their analysis (under the recessive model, requiring the affected siblings to share the same two rare variants in the putative gene is itself a form of linkage analysis), but future analyses could make much greater use of linkage data. Although costs have not yet been delineated, exome capture and sequencing is clearly less expensive than whole-genome sequencing at high coverage, making this approach more practical for groups studying rare mendelian traits.
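The sibling-pair requirement can be expressed as a simple check. In this hypothetical sketch (the data layout, person labels and variant identifiers are illustrative assumptions), a candidate gene is kept only if both affected siblings carry exactly the same rare variants in it, as expected when affected siblings inherit the same disease alleles from the same parents under a recessive model.

```python
from typing import Dict, FrozenSet, Set

# person -> gene -> set of rare variant identifiers (hypothetical encoding)
VariantTable = Dict[str, Dict[str, FrozenSet[str]]]


def apply_sibling_requirement(candidates: Set[str], table: VariantTable,
                              sib_a: str, sib_b: str) -> Set[str]:
    """Keep candidate genes in which two affected siblings carry exactly the
    same non-empty set of rare variants. Siblings who inherited the same
    disease alleles from the same parents should share them, so this acts
    as a minimal linkage filter."""
    kept = set()
    for gene in candidates:
        a = table[sib_a].get(gene, frozenset())
        b = table[sib_b].get(gene, frozenset())
        if a and a == b:
            kept.add(gene)
    return kept


table = {
    "sib1": {"GENE_A": frozenset({"a1", "a2"}), "GENE_B": frozenset({"b1", "b2"})},
    "sib2": {"GENE_A": frozenset({"a1", "a2"}), "GENE_B": frozenset({"b3", "b4"})},
}
print(apply_sibling_requirement({"GENE_A", "GENE_B"}, table, "sib1", "sib2"))  # {'GENE_A'}
```

Here both siblings carry two rare variants in `GENE_B`, so it would pass a per-person recessive filter, but the variants differ between them and the shared-allele requirement removes it.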

New statistical metrics

The many ways in which genomic datasets could be combined in the search for genes underlying mendelian disorders suggest the need for new methods to assess statistical significance. Ng et al.3 found six rare variants in DHODH among four affected individuals and then sequenced three additional unrelated individuals with the disorder and a sibling in the second of the three initial families described above, for a total of 11 mutations found in six families. Together, these data provide convincing evidence that mutations in DHODH cause Miller syndrome. They also raise the question of how to assess significance, because some gene identifications made by exome sequencing with filtering will arise by chance alone, and a threshold is needed to separate these from true findings. Ng et al.3 provide one approach: they measured the average frequency of 'new variants' per gene across the genome, defined by comparing their newly sequenced exomes with the common variation in dbSNP. They squared that frequency to reflect the recessive model, cubed the result for the three initial kindreds, and applied a Bonferroni correction for 17,000 genes. This yielded a significance threshold of 1.5 × 10^−5. Challenges to this approach will arise as it is generalized to other disorders. In addition, as the number of sequenced exomes rises, dbSNP will become populated by uncommon variants, so filtering against it risks discarding the very rare variants being sought. We will need statistical metrics to distinguish false positives from true positives, and metrics that account for locus and etiologic heterogeneity, which may often go unrecognized. Until these are available, we will have to rely on simple measures of coincidence and on supplemental evidence such as animal models and functional studies.
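The arithmetic behind such a threshold can be sketched as follows. The per-gene frequency of new variants is not reported here, so the value in the example is hypothetical and the output is not meant to reproduce the published 1.5 × 10^−5. The sketch also assumes the Bonferroni correction is applied by multiplying the per-gene probability by the number of genes tested, which is equivalent to dividing the significance level by that number.

```python
def chance_probability(per_gene_freq: float, alleles: int = 2,
                       kindreds: int = 3, n_genes: int = 17_000) -> float:
    """Bonferroni (union-bound) estimate of the chance that at least one gene
    passes the filter in all kindreds by coincidence.

    per_gene_freq: average frequency of 'new' variants per gene (not given
    in this text; any value passed in is hypothetical).
    Squaring reflects the recessive model (two alleles), cubing reflects the
    three initial kindreds, and multiplying by the number of genes applies
    the correction (assumed form). The result is capped at 1.
    """
    if not 0.0 <= per_gene_freq <= 1.0:
        raise ValueError("per_gene_freq must be a probability")
    per_gene = (per_gene_freq ** alleles) ** kindreds
    return min(1.0, per_gene * n_genes)


# With a hypothetical per-gene frequency of 0.1: 0.1**6 * 17,000, about 0.017.
print(chance_probability(0.1))
```

The sixth-power dependence is the point of the exercise: halving the per-gene frequency of new variants lowers the chance estimate 64-fold, which is also why a dbSNP increasingly populated by rare variants would shift any such threshold.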

Secondary findings

Ng et al.3 also noted a fascinating finding: the affected sibling pair had a history of recurrent infections. It was difficult to determine whether this was an uncommon manifestation of Miller syndrome, a recessive contiguous gene syndrome or an unrelated disorder. Such complications continually bedevil clinicians who study and care for individuals with rare disorders, because knowledge of these disorders and their molecular pathophysiology is most often insufficient to distinguish among these possibilities. Ng et al.3 found that the siblings were compound heterozygotes for DNAH5, a known cause of primary ciliary dyskinesia, which manifests as bacterial infections of the respiratory tract4. The primary ciliary dyskinesia was therefore coincidental to the Miller syndrome; it has no implications for others with Miller syndrome, and its diagnosis facilitates care for the family under study.

This raises the question of how many clinically relevant mutations reside in these four exomes, and how many can be expected in other sequencing projects. Should individuals participating in a sequencing study receive all or part of their sequence? How should results from minors be handled, especially those concerning adult-onset conditions or carrier status? Who would analyze these sequences, and what tools would they use? Who will deliver these datasets to participants and interpret them and their clinical relevance? Is it appropriate to return results to patients in settings where care for the clinically relevant variants cannot be implemented? To answer these questions, we will need clinical and behavioral studies of participants in genome sequencing studies to understand their preferences and their ability to interpret these data, and we will need to explore different approaches to returning these data to participants and study how they use them. The task going forward is to rapidly explore the multifaceted challenges associated with this technology, so that we can not only discover the causes of rare diseases but also move toward a future in which whole-exome, and eventually whole-genome, sequences of individual patients lead to improvements in medical care.