Introduction

Whole-genome sequencing (WGS) has the potential to revolutionize our approach to clinical genetic diagnostics [1,2,3,4,5]. One proposed advantage of whole-exome sequencing (WES) and WGS is the opportunity for periodic reanalysis of the data in individuals not diagnosed on initial testing [2, 6,7,8,9,10]. The Genome Clinic at The Hospital for Sick Children (Toronto, Canada) is a longitudinal multifaceted research project designed to integrate WGS into mainstream clinical practice [1, 2]. In a previous study, we prospectively recruited 100 paediatric patients referred for a clinical genetics assessment and meeting criteria for chromosomal microarray analysis (CMA) [1]. We found that singleton WGS identified diagnostic variants in 34 participants. This represented a 4-fold increase in diagnostic rate over CMA alone (8%), and a >2-fold increase over all genetic tests ordered by the clinicians (13%). In only two cases did targeted genetic tests lead to diagnoses not detectable by WGS: microsatellite analysis of parents and offspring for UPD14 (heterodisomy) and a methylation test for Silver–Russell syndrome. We have now systematically re-annotated and reanalyzed the WGS data from our original study, 3 years after the initial annotation. Explanatory variants have been discovered in seven (10.9%) of 64 previously undiagnosed cases, thereby increasing the cumulative diagnostic yield of WGS in the study cohort to 41%. These results provide further support for WGS as a first-tier genetic test.

Subjects and methods

The prospective recruitment and phenotyping of the study participants is described in detail elsewhere [1]. Families were eligible for this study if the proband met clinical criteria for CMA. The study was approved by the Research Ethics Board at The Hospital for Sick Children, and informed written consent was obtained for each participant. WGS was done as a singleton (not trio) experiment, using standard methods [1]. The WGS data were initially annotated in 2014, with all analyses completed by the end of 2015. These data were deposited in the European Genome–Phenome Archive (www.ebi.ac.uk/ega/) under accession number EGAS00001001623. The primary aims of the study were to compare the diagnostic rate of WGS with that of CMA alone, and with that of all genetic testing ordered in the course of routine clinical practice.

WGS variant calls were re-annotated in February 2017 at The Centre for Applied Genomics (Toronto, Canada) using a custom pipeline [1, 2]. This used recent downloads from publicly available databases for allele frequency, gene function, and human disease association. Molecular and clinical geneticists examined variant files and prioritized clinically relevant nuclear DNA variants using the following parameters: (i) sequence quality, (ii) allele frequency, (iii) conservation and predicted impact on coding and non-coding sequence, (iv) presence in ClinVar [11] or Human Gene Mutation Database (HGMD) [12], (v) genic phenotype in Online Mendelian Inheritance in Man (OMIM) and Clinical Genomic Database (CGD) [13], (vi) zygosity and genetic mode of inheritance, and (vii) relevance to clinical phenotype provided. One variant had initially been identified using an alternative analysis method [14], and another was previously identified and included in a case series describing a novel disease gene [15]. Updated phenotype data were extracted from the medical record. Candidate variants were classified according to the American College of Medical Genetics and Genomics (ACMG) guidelines [16], discussed with the referring clinician, and designated as diagnostic by consensus. These variants were then confirmed by Sanger sequencing in a laboratory with Clinical Laboratory Improvement Amendments (CLIA)/College of American Pathologists (CAP) certification. Inheritance of variants was determined via targeted analysis of parental DNA samples.
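The prioritization parameters listed above can be illustrated as a simple filter. The following is a minimal sketch only, not the custom pipeline used in the study: the `Variant` fields, the function names, and both numeric cut-offs (`MIN_QUALITY`, `MAX_ALLELE_FREQ`) are hypothetical and were chosen purely for illustration. Criterion (vii), relevance to the clinical phenotype, is left to manual review.

```python
from dataclasses import dataclass

# All field names and cut-offs are illustrative assumptions,
# not the parameters of the pipeline used in the study.
MIN_QUALITY = 30.0       # hypothetical sequence quality threshold
MAX_ALLELE_FREQ = 0.01   # hypothetical population allele frequency ceiling


@dataclass
class Variant:
    gene: str
    quality: float            # (i) sequence quality
    allele_freq: float        # (ii) population allele frequency
    predicted_damaging: bool  # (iii) conservation / predicted impact
    in_clinvar_or_hgmd: bool  # (iv) listed in ClinVar or HGMD
    disease_gene: bool        # (v) gene has an OMIM or CGD phenotype
    zygosity_fits: bool       # (vi) zygosity matches the mode of inheritance


def passes_filters(v: Variant) -> bool:
    """Apply criteria (i) to (vi) as simple hard filters."""
    if v.quality < MIN_QUALITY or v.allele_freq > MAX_ALLELE_FREQ:
        return False
    has_evidence = v.predicted_damaging or v.in_clinvar_or_hgmd
    return has_evidence and v.disease_gene and v.zygosity_fits


def shortlist(variants):
    """Return the variants left for manual phenotype review, criterion (vii)."""
    return [v for v in variants if passes_filters(v)]
```

Because criterion (v) depends on the database version in use, a variant that fails this kind of filter at one annotation date may pass it at a later one.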

Results

New diagnostic variants were identified in seven (10.9%) of the 64 cases after reassessment of all sequence and structural variation in the WGS data (Table 1). All were single nucleotide variants (SNVs), and were successfully confirmed by Sanger sequencing. Five were designated as likely pathogenic or pathogenic using ACMG criteria [16]. The remaining two (in SMAD6 and ZNF711) were designated as variants of uncertain significance and returned to the families by the clinician as probable contributors to the respective proband’s phenotype. No diagnoses were made in these 64 study participants by any clinical genetic testing arranged in the interval period. No diagnoses were made by systematic reanalysis of the existing CMA data (data not shown). Thus, the seven new diagnoses increased the cumulative diagnostic yield of WGS in the entire study cohort to 41%, which represents a >5-fold increase over CMA and a >3-fold increase over all testing arranged in the course of routine clinical practice (Fig. 1).
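The yield figures quoted above follow from simple arithmetic, reproduced in the short sketch below. It assumes that the 64 reanalyzed cases are the 100 participants minus the 34 initial WGS diagnoses and the two diagnoses reached only by targeted testing; the variable names are illustrative.

```python
COHORT = 100         # participants in the original study
INITIAL_WGS = 34     # diagnoses from the initial WGS analysis
TARGETED_ONLY = 2    # diagnoses reached only by targeted tests
NEW_DIAGNOSES = 7    # diagnoses added by reanalysis
CMA_YIELD = 8        # per cent, CMA alone
CMA_PLUS_YIELD = 13  # per cent, all clinician-ordered testing

# Assumption: undiagnosed cases = cohort minus all earlier diagnoses.
reanalysed = COHORT - INITIAL_WGS - TARGETED_ONLY
reanalysis_rate = 100 * NEW_DIAGNOSES / reanalysed
cumulative_yield = 100 * (INITIAL_WGS + NEW_DIAGNOSES) / COHORT

fold_over_cma = cumulative_yield / CMA_YIELD
fold_over_cma_plus = cumulative_yield / CMA_PLUS_YIELD

print(f"reanalysed cases:      {reanalysed}")
print(f"reanalysis yield:      {reanalysis_rate:.1f}%")
print(f"cumulative WGS yield:  {cumulative_yield:.0f}%")
print(f"fold over CMA:         {fold_over_cma:.2f}")
print(f"fold over all testing: {fold_over_cma_plus:.2f}")
```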

Table 1 Seven diagnostic variants identified after reanalysis of whole-genome sequencing data
Fig. 1

Diagnostic yield in a prospective cohort study after systematic reanalysis of whole-genome sequencing data. Bar plot showing percentage of study participants (n = 100) with molecular diagnoses via chromosomal microarray analysis (CMA), all clinical genetic testing performed in this cohort (CMA+), and whole-genome sequencing (WGS). The CMA and CMA+ diagnostic yields are each significantly different (p < 0.0001) from the WGS diagnostic yield using a chi-square proportion test. Lighter blue colouring represents the new diagnoses made upon reanalysis of the WGS data.
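The comparison reported in the caption can be approximated with a Pearson chi-square test on two proportions. The sketch below is an illustration under assumptions rather than the authors' analysis: the source does not state whether a continuity correction was applied, so none is used here, and the function name is hypothetical. The counts are the yields reported for the 100 participants (8 for CMA, 13 for CMA+, 41 for WGS).

```python
import math


def chi_square_two_proportions(hits_a, hits_b, n_a, n_b):
    """Pearson chi-square test (1 d.f., no continuity correction)
    comparing the proportions hits_a/n_a and hits_b/n_b."""
    observed = [
        [hits_a, n_a - hits_a],
        [hits_b, n_b - hits_b],
    ]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [hits_a + hits_b, total - hits_a - hits_b]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


cma_stat, cma_p = chi_square_two_proportions(8, 41, 100, 100)
cma_plus_stat, cma_plus_p = chi_square_two_proportions(13, 41, 100, 100)
print(f"CMA  vs WGS: chi2 = {cma_stat:.2f}, p = {cma_p:.2e}")
print(f"CMA+ vs WGS: chi2 = {cma_plus_stat:.2f}, p = {cma_plus_p:.2e}")
```

Under these assumptions both p-values fall below 0.0001, in line with the caption.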

All seven variants were detected by the initial WGS experiments but not recognized as pathogenic. At the time of the first data annotation in 2014, five of the seven genes (AP3B2, HMGA2, KCNB1, SON, and WAC) were not recognized in OMIM to cause human disease [17,18,19,20,21,22]. In one case (SMAD6), the phenotypes of the individuals reported in the literature did not overlap the clinical presentation of our patient. Variants in SMAD6 have since been associated with craniosynostosis, in conjunction with incomplete penetrance [23]. For the SNV in ZNF711, there was felt to be insufficient evidence in support of pathogenicity at the time of the initial review. The identification of additional cases has now bolstered the argument for causality [24].
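One way to picture how such diagnoses surface on re-annotation is to compare the set of disease-associated genes between two annotation dates and flag undiagnosed cases carrying candidate variants in genes that gained an association. The sketch below is a hypothetical illustration, not the method used in this study; the function names, case identifiers, and gene symbols are placeholders and do not reflect actual database content.

```python
def newly_associated_genes(old_snapshot, new_snapshot):
    """Genes with a disease association in the newer annotation only."""
    return set(new_snapshot) - set(old_snapshot)


def cases_to_revisit(case_genes, old_snapshot, new_snapshot):
    """Map case ID to candidate genes that gained a disease association."""
    gained = newly_associated_genes(old_snapshot, new_snapshot)
    flagged = {}
    for case_id, genes in case_genes.items():
        hits = sorted(set(genes) & gained)
        if hits:
            flagged[case_id] = hits
    return flagged


# Placeholder data for illustration only.
old_genes = {"GENE_A", "GENE_B"}
new_genes = {"GENE_A", "GENE_B", "GENE_C"}
undiagnosed = {
    "case_1": ["GENE_C", "GENE_D"],
    "case_2": ["GENE_A"],
}
print(cases_to_revisit(undiagnosed, old_genes, new_genes))
```

A comparison of this kind addresses only newly recognized disease genes. Cases such as SMAD6 and ZNF711, where the gene was already described but the phenotype or the evidence changed, would still require manual review.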

Discussion

A diagnostic rate of ~10% after reanalysis is consistent with that of a previous study that reanalyzed singleton WES data after a 1–3 year period (10%; 4 of 40) [7]. Reassessment of pre-existing data can be performed rapidly relative to performing new genetic testing. Currently, a main advantage is the ability to immediately capitalize on the discovery of new disease genes. For example, since 2015 the first three probands have been reported in the literature with HMGA2 sequence variants and a phenotype resembling Silver–Russell syndrome [17, 18]. The phenotype of Case 1096 was notable for intrauterine growth restriction, short stature, and other features (Supplemental file). Clinical genetic testing included CMA, methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) for 11p15.5 gene dosage and H19 hypomethylation, short tandem repeat analysis with DNA markers on chromosome 7 for uniparental disomy, and sequencing of PIK3R1. All results were negative or normal. Reanalysis of his WGS data identified the novel loss-of-function variant in exon 5 of HMGA2 (Table 1). Databases used for clinical annotation lag behind the fast pace of the published literature, and many diagnostic laboratories cannot afford to frequently validate new pipelines with updated downloads from these databases. The variant could have been missed, for example, because as of December 2017 there is no phenotype associated with HMGA2 in either OMIM or CGD. This further emphasizes the importance of periodic reanalysis.

This study was not designed to compare WGS with WES. Although the SNVs in Table 1 could potentially have been found with WES, other diagnoses we have made with WGS were (or would have been) missed [1, 2]. In one study that re-annotated and reanalyzed six undiagnosed WES trios, it was necessary in some cases to add coverage to detect the causal variant [8]. There are several reasons why periodic reanalysis of WGS data may result in more diagnoses over time than WES, such as: (i) more uniform and more comprehensive coverage, including within the exome; (ii) our improving ability to interpret variation in regulatory regions, deep intronic regions, and non-coding DNA; and (iii) the superior detection of structural variation. More generally, advances in clinical annotation of genome-wide sequencing data [25], a trio (as opposed to singleton) design, and pairing WGS with ancillary RNA sequencing may all further increase the diagnostic yield in our cohort.

These findings highlight periodic reanalysis as yet another advantage of genomic sequencing in the diverse paediatric population meeting criteria for CMA. The revised diagnostic yield of 41% in this cohort is similar to that observed in the second WGS study from the Genome Clinic (42%), which involved a heterogeneous group of 103 patients recruited from non-genetic paediatric subspecialty clinics and where data were annotated in 2016 [2]. We recommend reanalysis of an individual’s genome-wide sequencing data every 1–2 years until diagnosis, or sooner if their phenotype evolves. This should be part of pre-test counselling. Detailed phenotyping and the opportunity for reverse-phenotyping are essential, as WGS is both hypothesis-free and hypothesis generating. One limitation of clinical WGS is the relative shortage of those trained to medically interpret a genome. Another major practical and financial consideration is long-term data storage. With decreasing costs of sequencing and advancements in sequencing technology, it may become cost effective in time to periodically re-sequence a banked DNA sample rather than store and reanalyze pre-existing WGS data. Regardless of these factors, the data suggest that utilization of WGS early in the diagnostic odyssey warrants further consideration in routine clinical genetics practice.