Introduction

Congenital anomalies and intellectual disability (CA/ID) comprise a vast, heterogeneous group of disorders, encompassing more than 3,000 different clinical entities, individually rare but collectively frequent. Most CA/ID are of genetic origin and incurred via Mendelian inheritance. Because the prevalence of each disorder is low and a large portion of the molecular bases of CA/ID are still unresolved, their diagnosis remains challenging. These chronic, early-onset disorders contribute significantly to morbidity, mortality, and health-care expenditure,1 and their etiologic diagnosis is essential for genetic counseling, prenatal testing, accurate follow-up, prevention of complications, and personalized treatment.2 The current standard of care for the diagnosis of CA/ID includes multiple clinical evaluations by specialized physicians, and countless paraclinical investigations such as imaging, metabolic, and biological tests, which are potentially invasive for patients. The genetic investigations include cytogenetic tests and successive single-gene testing, and more recently gene panels. This long and tedious traditional approach leaves approximately half of the families with no diagnosis.3

Next-generation sequencing (NGS) has revolutionized medical genetics by improving the chances of obtaining a molecular diagnosis for rare genetic diseases. NGS was initially applied in research, and different strategies were considered to implement NGS for diagnostic purposes. Whole-exome sequencing (WES) has shown an unprecedented success rate in the identification of disease-causing genes in projects ranging from tailored sequencing used to discover the molecular bases of a recognizable syndrome in a homogeneous group of patients, to the systematic application of pan-genomic sequencing in large heterogeneous cohorts.4 The usefulness of an unbiased sequencing approach has been highlighted in various heterogeneous disorders, including categories of CA/ID, such as syndromic ID,5 developmental delay (DD),6 autism,7 epilepsy,8 and congenital heart defects.9

Later, a more accurate interpretation of the data and a reduction in sequencing costs enabled the widespread implementation of WES in clinical practice. Between 2010 and 2015, about 555 genes implicated in Mendelian phenotypes were discovered using NGS. This has resulted in WES becoming the current standard of care for the diagnosis of highly heterogeneous rare disorders with suspected Mendelian inheritance,10 thus blurring the line between diagnosis and research. The widespread application of this test in cohorts of patients with undefined CA/ID allows a diagnostic yield ranging from 25 to 32% (refs. 11, 12, 13, 14). This diagnostic yield corresponds to the identification of a disease-causing variant in a gene previously implicated in a human disorder and with a published compatible phenotype. The sequencing strategy may vary from center to center with trio-based or proband-based WES. Although the diagnostic yield should not vary, the likelihood of identifying a candidate variant for a new disorder may depend on the phenotype and the chosen sequencing strategy.13, 15

This recent acceleration in the discovery of disease-causing genes makes it difficult for physicians to remain up to date with genetic medical knowledge. Initially, the routine use of WES demonstrated the limitations of usual phenotype-driven strategies, based on the clinical expertise of physicians in reference centers for rare diseases, especially in the following situations: (i) atypical presentations of known diseases making it hard to make the diagnosis at first sight; (ii) ultrarare diseases described in very few cases and therefore unknown to most specialists, and (iii) patients exhibiting a specific but only recently discovered phenotype. International data sharing is an efficient solution that overcomes these limitations. By catalyzing the identification of additional patients with similar phenotypic and genotypic profiles, initiatives such as the Matchmaker Exchange project16 allow fast and accurate phenotype matching to assess the clinical relevance of candidate variants and genes. Reanalyzing and reinterpreting clinical WES data from large research cohorts is also proving to be an effective way to reveal new disease-causing variants. In a clinical context, only three articles have assessed the relevance of reanalyzing data. The first focused on data reanalysis and reported an additional diagnostic yield of 10% in 40 patients.17 The second, in a series of 2,000 sequential cases submitted to Ambry Genetics for testing prior to 2016, showed that 5.6% of cases that initially received negative or candidate results were upgraded to positive/likely positive or uncertain in a characterized gene.18 The third one reported seven changes in the result for 14 reanalyzed cases performed by the molecular laboratory 12 to 18 months after the initial report, of which four resulted in a new definitive diagnosis.19 These observations led us to adapt our WES-based clinical practice and diagnostic process by setting up systematic reanalysis and international data sharing.

This retrospective study reports the results and consequences of implementing clinical WES in our current diagnostic practice, and of introducing a systematic reanalysis strategy of unsolved results combined with translational research for candidate genes in a cohort of 416 consecutive patients with CA/ID.

Materials and methods

Patients

From June 2013 to June 2016, WES was performed in 323 patients referred to the Reference Center for Congenital Anomalies and Malformative Syndromes in Dijon, France, for an etiological diagnosis and in 93 patients referred to the Orphanomix service (http://www.orphanomix.com/index.html) by other centers in France.

The inclusion criteria were (i) signs of ID, or of DD when the age of the patient (<6 years) did not permit a diagnosis of ID, or the presence of at least one congenital anomaly, with or without ID, of suspected genetic origin; (ii) a negative prior diagnostic workup; and (iii) informed consent of the patient or parents/guardians for inclusion. Fetuses with multiple malformations were not included in this study. Array-comparative genomic hybridization was systematically performed before WES in patients with DD, isolated or syndromic ID (associated with dysmorphism or one congenital anomaly), autism spectrum disorders, or pre- or postnatal malformations (two or more), as well as for the characterization of an anomaly detected by another cytogenetic method. In most patients with a convincing diagnostic etiology, a targeted genetic test (single-gene or gene panel) was first ordered. The prescription of WES or gene panel testing was discussed weekly by a group of trained physicians and depended on (i) the clinical and genetic heterogeneity of the suspected disorder and (ii) the availability, turnaround time, and cost of a targeted approach. The conduct of the pretest consultation has been detailed elsewhere.12 The local ethics committee approved this study.

Standardized deep phenotyping

Patients were separated into three major phenotype groups according to the clinical indication: neurodevelopmental disorders, CA without ID/DD, and neuromuscular disorders. The neurodevelopmental disorders group was divided into four subgroups: nonsyndromic ID, syndromic ID (defined as ID with CA and/or dysmorphism), epileptic encephalopathy (EE), and syndromic DD (Figure 1b). The proportion of patients with EE was higher during the first year. This initial overrepresentation of patients with EE can be explained by the work done in our center during the first year of inclusion, which focused on the diagnosis of EE by WES (ref. 12). Detailed phenotypic data were anonymously collected in the PhenomeCentral database (https://phenomecentral.org/) using the standardized Human Phenotype Ontology terms.

Figure 1

Overview of phenotype distribution in the 416 patients. (a) The phenotype distribution among the 416 patients. (b) The phenotype distribution among the 323 patients of the neurodevelopmental disorders subgroup. CA, congenital anomaly; ID, intellectual disability; PMD, psychomotor delay.

Whole-exome sequencing

Sequencing and bioinformatics analysis

In all index cases, libraries of genomic DNA samples were prepared using the Agilent Sureselect Human All Exon v5 kit (Agilent Technologies, Santa Clara, CA), and were sequenced on a HiSeq instrument (Illumina, San Diego, CA) according to the manufacturer’s recommendations for paired-end 76-bp reads. The bioinformatics pipeline, alignment processes, and quality procedures have been described elsewhere.12 Version 3.4–46 of the Genome Analysis Toolkit was used for this study. Among the 416 patients, 82 (19.7%) were analyzed during year 1, 119 (28.6%) during year 2, and 215 (51.7%) during year 3 (Figure 1a).

Copy-number variant detection

The in-house pipeline for copy-number variant (CNV) detection was developed in November 2015. CNV analysis was retrospectively applied to all patients. The procedure is detailed elsewhere12 and in the Supplementary Data online.

Variant interpretation strategy

The diagnostic interpretation of the filtered variants was done according to the American College of Medical Genetics and Genomics (ACMG) recommendations of 2008 and 2015 (refs. 20, 21) during the first 2 years and the third year of the study, respectively. The detailed diagnostic interpretation procedure has been reported elsewhere12 and is described in the Supplementary Data. The familial segregation study is also detailed in the Supplementary Data. The 56 genes on the list of medically actionable secondary findings defined by the ACMG were also studied and interpreted according to the ACMG recommendations available at the period of reanalysis.22 The results were returned to the patient when consent had been given.

Annual reanalysis

Negative and uncertain results were reanalyzed from the raw sequencing data stored as compressed fastq files (Supplementary Data). All variants of the final analysis file were interpreted. The interpretation first focused on variants previously and newly reported as pathogenic/probably pathogenic in public databases of clinical interest (ClinVar, http://www.ncbi.nlm.nih.gov/clinvar; DECIPHER, https://decipher.sanger.ac.uk/) or as affecting well-established human disease genes. The interpretation was then extended to all of the other variants, namely those not meeting the diagnostic interpretation criteria. For relevant variants presenting a good genotype–phenotype correlation, but reported in an insufficient number of patients (only one family, one single isolated population) or in several patients of a large cohort without clinical details, we actively searched for additional patients carrying variants in the same gene with a similar phenotype through national collaborations or international data sharing to confirm the genotype–phenotype relationship. This strategy was also used for atypical presentations or new phenotypes linked to an already known gene, but reported only once in the literature or presented at congresses. Reverse phenotyping and data sharing were widely used in these cases to gather and compare patients with the same mutated gene, look for common clinical features, and thus document recurrence. For variants in genes never associated with human disease, the ACMG interpretation criteria were only partially applicable. We relied on the evidence proposed by the ACMG guidelines, with particular attention to the encoded protein function, functional studies, animal models, and an intensive search for new patients with a similar phenotype carrying a variant in the same gene through a translational research approach23 (Figure 2).

Figure 2

Representation of interpretation approach during reanalysis. *Reported as pathogenic/probably pathogenic in public variant databases of clinical interest (ClinVar) or affecting a gene associated with a human disorder referenced in OMIM (http://www.omim.org) or manually curated via PubMed (https://www.ncbi.nlm.nih.gov/pubmed/); VUS, variant of unknown significance.
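
The interpretation order summarized in Figure 2 can be sketched as a small triage function. This is only an illustration under stated assumptions, not the laboratory's pipeline: the `Variant` fields, the track labels, and the placeholder gene names are hypothetical, and real database lookups (ClinVar, DECIPHER, OMIM) are replaced by boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                    # placeholder gene symbol
    reported_pathogenic: bool    # hypothetical flag: pathogenic/probably pathogenic in a public database
    known_disease_gene: bool     # hypothetical flag: gene already associated with a human disorder
    good_phenotype_match: bool   # hypothetical flag: convincing genotype-phenotype correlation
    well_documented: bool        # hypothetical flag: enough published patients to support the link


def triage(v: Variant) -> str:
    """Assign one variant to a reanalysis track, following the order given in the text."""
    if not v.good_phenotype_match:
        # Nothing to pursue now; the case is looked at again at the next reanalysis.
        return "deferred to next annual reanalysis"
    if v.reported_pathogenic or v.known_disease_gene:
        if v.well_documented:
            return "diagnostic interpretation"
        # Known gene, but too few patients or an atypical presentation.
        return "data sharing and reverse phenotyping"
    # Gene never associated with human disease.
    return "translational research (candidate gene)"
```

For example, `triage(Variant("GENE_X", False, False, True, False))` returns the candidate-gene track, which corresponds to the situation in which additional patients are sought through data sharing.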

Time to diagnosis

The time to diagnosis was calculated only for patients seen in our local center with a positive diagnostic result obtained after the first analysis. It corresponded to the overall duration of the diagnostic process, from the first consultation in our center to the date of the WES report.
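
As an illustration of this definition, the sketch below computes the time to diagnosis in months and its median over a set of cases. The dates are invented for the example, and the average month length of 30.44 days is an assumption of this sketch; the study does not state how months were counted.

```python
from datetime import date
from statistics import median

DAYS_PER_MONTH = 30.44  # assumption: average month length, not specified in the study


def months_between(first_consultation: date, wes_report: date) -> float:
    """Time from first consultation to WES report, expressed in months."""
    return (wes_report - first_consultation).days / DAYS_PER_MONTH


def median_time_to_diagnosis(cases) -> float:
    """Median over (first_consultation, wes_report) pairs of positive first-analysis results."""
    return median(months_between(start, end) for start, end in cases)


# Hypothetical cases, for illustration only.
example_cases = [
    (date(2013, 6, 1), date(2014, 6, 1)),
    (date(2014, 1, 1), date(2014, 10, 1)),
    (date(2015, 1, 1), date(2015, 7, 1)),
]
```

Calling `median_time_to_diagnosis(example_cases)` gives the median for the three invented cases; restricting the input to one inclusion year at a time would give a year-by-year comparison.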

Results

Patients

Between 1 June 2013 and 30 June 2016, 421 patients were referred. Five patients were removed from the analysis because of failed quality control of the sequencing data and the absence of biological samples to repeat the analysis. Overall, 323 patients (77.6%) were referred by seven geneticists from the Reference Center for Congenital Anomalies and Malformative Syndromes of Dijon, and 93 patients (22.4%) were referred by 31 geneticists from 18 French institutions. The 416 patients included 171 females (41%) and 245 males (59%) (sex ratio = 1.4), with 84.4% children (351 patients) and 15.6% adults (65 patients). The mean age of patients when samples were sent for sequencing was 6.7 years for children and 30.7 years for adults (global average: 10.5 years). Thirty-seven patients were born to consanguineous parents.

Molecular results

Over the 3 years and after two reanalyses, 128 of the initial 416 patients (30.8%) received a positive molecular diagnosis. The distribution of the Mendelian mode of inheritance and variant type are presented in Figure 3. Of the 79 autosomal-dominant conditions diagnosed, 64 (81%) arose as a result of a de novo variant, 11 (14%) were inherited, and four (5%) were undetermined due to a lack of parental samples. Among the 34 autosomal-recessive disorders, there were 16 (47.1%) cases of homozygosity, and 18 (52.9%) cases of compound heterozygosity. Of the 15 X-linked disorders, 8 (53%) were linked to a de novo variant and 7 (47%) were inherited. The three pathogenic or probably pathogenic CNVs included only deletions, which ranged from 0.637 to 4.5 kb (Supplementary Table S1a). The resolution of the array-comparative genomic hybridization was not sufficient to detect these CNVs. The 148 pathogenic or likely pathogenic single-nucleotide variants or CNVs occurred in 104 different genes.

Figure 3

Distribution of Mendelian mode of inheritance and variant type for cases with positive molecular diagnosis, variants of unknown significance, and secondary findings. AD, autosomal dominant; AR, autosomal recessive; CNV, copy-number variant.

In addition, 39/416 patients (9.4%) remained with an uncertain result (Figure 3).

Molecular results after the initial analysis

The global initial diagnostic yield was 25% (104/416), ranging from 22% in year 1 to 27.4% in year 3 (Figure 4). Six of the 416 patients had a medically actionable secondary finding in one of the 56 ACMG genes (Supplementary Table S1b) and 2 of these 6 patients harbored a secondary finding with no etiological diagnosis. The pathogenic and probably pathogenic variants, corresponding to a positive result, and the variants of secondary findings are being submitted to the ClinVar database (submission ID: SUB2609258).

Figure 4

Evolution of diagnostic yield regarding prospective reanalysis.

Annual reanalysis

The first annual reanalysis was performed for patients with a nonpositive result included in year 1 (64/82 patients) and in year 2 (92/119 patients), meaning 156 of the 416 (37.5%) patients. In the first reanalysis for year 1 patients, we obtained one additional positive result (1.2%) using a strict diagnostic approach and four (4.8%) by a translational research approach, leading to an additional diagnostic yield of 6% (5/82). For the first reanalysis for year 2 patients, we obtained 10 additional diagnoses by a diagnostic approach, corresponding to a supplemental diagnostic yield of 8% (10/119) and 2 (1.7%) by a translational research approach. In the second reanalysis for year 1 patients, we obtained one additional positive result (1.2%) using a strict diagnostic approach and six (7.3%) by a translational research approach (Figure 4). The results for the reanalysis of patients included in year 3 are not complete and not yet available because the period of 12 months was not finished at the time of submission.

Finally, among the 156 reanalyzed patients, 24 results (15%) became positive. Of these 24 positive results, 14 were initially negative, and 10 were initially uncertain because of the lack of data in the literature or because of the nondetection of a second event in recessive disorders. Twelve of the 24 cases were reassessed through usual diagnostic processes thanks to a recent publication, reconsideration of the first interpretation, or the detection of a CNV (Table 1). The other 12 cases were resolved through a translational research strategy using proactive international data sharing and publications of our team or in collaboration with other teams. This strategy allowed us to identify at least five novel genes, not previously associated with a human disease: KCNA2 (ref. 24), FIBP (ref. 25), AP3B2 (ref. 26), SLC13A5 (ref. 27), and MAB21L1 (ref. 28). We collected as much evidence as possible to fulfill the ACMG criteria and thus to consider these results as certain/diagnostic results.
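
The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (104 initial positives among 416 patients, 156 reanalyzed results, and the gains per reanalysis round); the variable and function names are illustrative.

```python
COHORT_SIZE = 416        # patients analyzed
INITIAL_POSITIVE = 104   # positive results after the first analysis
REANALYZED = 64 + 92     # nonpositive results from year 1 and year 2

# (reanalysis round, inclusion year, gained by diagnostic approach, gained by translational approach)
REANALYSIS_GAINS = [
    ("first", 1, 1, 4),
    ("first", 2, 10, 2),
    ("second", 1, 1, 6),
]


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def summarize() -> dict:
    """Total the reanalysis gains and derive the yields quoted in the text."""
    diagnostic = sum(row[2] for row in REANALYSIS_GAINS)
    translational = sum(row[3] for row in REANALYSIS_GAINS)
    gained = diagnostic + translational
    return {
        "diagnostic": diagnostic,
        "translational": translational,
        "initial_yield_pct": pct(INITIAL_POSITIVE, COHORT_SIZE),
        "reanalysis_gain_pct": pct(gained, REANALYZED),
        "final_yield_pct": pct(INITIAL_POSITIVE + gained, COHORT_SIZE),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The totals reproduce the 12 diagnostic and 12 translational cases, and the final yield matches the 128 of 416 patients (30.8%) with a positive molecular diagnosis.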

Table 1 Diagnoses obtained by reanalysis of WES data

Phenotype distribution and diagnostic yield per phenotype group

Over the 3 years, we observed a globally similar distribution of the phenotype groups with a majority of patients presenting neurodevelopmental disorders (n = 323), which is in keeping with the recruitment criteria for our cohort (Figure 1a).

The highest global diagnostic yield after two reanalysis procedures was obtained for the neuromuscular disorders group (33%), followed by the neurodevelopmental disorders group (31.5%) (Supplementary Figure S1). Chi-squared tests did not show any significant difference in the diagnostic yield between groups. We observed an increase in positive results at each reanalysis for neurodevelopmental disorders, and a clear improvement for syndromic ID and EE (Supplementary Figure S2).
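
For readers unfamiliar with the test, the sketch below shows how a Pearson chi-squared statistic for diagnostic yield across three groups could be computed. The counts are toy values chosen for illustration and are not the study's per-group data, which are given in the Supplementary Figures; the critical value 5.991 is the standard threshold for 2 degrees of freedom at a significance level of 0.05.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and result.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# TOY counts (positive, nonpositive) for three phenotype groups; NOT study data.
toy_table = [
    [30, 70],
    [20, 40],
    [10, 20],
]

CRITICAL_VALUE_DF2_ALPHA_05 = 5.991  # (3 - 1) * (2 - 1) = 2 degrees of freedom

toy_stat = chi_squared_statistic(toy_table)
toy_significant = toy_stat > CRITICAL_VALUE_DF2_ALPHA_05
```

With these toy counts the statistic stays well below the critical value, which is the pattern described in the text: yields differ between groups, but not significantly.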

Time to diagnosis

Over the 3 years, the median time to diagnosis was reduced by 9 months (Supplementary Figure S3). The reduction was greatest between the first and the second year (6 months), which is in keeping with the complete implementation of diagnostic WES in our current diagnostic process in June 2014.

Time devoted to research reanalysis by the laboratory and clinical teams

It took the bioinformaticians a few hours to update the in-house pipeline and reanalyze the raw data, but pipeline development for CNV detection required approximately 600 hours of work. After a negative diagnostic result, it took a trained interpreter on average three hours to analyze variants via a research process, plus a variable amount of time for multidisciplinary discussions. Supplementary Sanger validations should also be considered (one or two per patient; probably excessive because of the solo strategy), requiring additional time for technicians and other techniques. The physicians provided accurate, updated phenotype data through regular consultations, by asking patients and their families to return for reverse phenotyping or to extend the familial segregation, by answering numerous emails in the case of collaborations, and by collecting data for patients recruited through data sharing. Time was also spent explaining new results to patients. The time devoted to each patient varied widely, ranging from a few hours to a few days. Finally, the report update took an average of 30 minutes.

Discussion

This retrospective study reports the molecular results of proband-based clinical WES in 416 consecutive patients with CA/ID included over 3 years. It highlights the increase in diagnostic yield thanks to the yearly reanalysis and interpretation of raw WES data. We obtained a global diagnostic yield of 27.9% after two reanalyses, and 12 (2.9%) more cases were resolved through the combined diagnostic/research approach.

Considering the exponential evolution of genetic discoveries and the improvements in computer performance over the past 5 years, prospective reanalysis of nonpositive WES results appears manageable and worthwhile for undiagnosed patients. This strategy allowed us to obtain 24 (15%) additional positive diagnoses among the 156 nonpositive initial results. Recent publications allowed us to assign a positive result after reanalysis in five initially negative cases and two uncertain cases (4.5%). This principally concerned patients included in the second year and is consistent with other studies,17, 18, 19 one of which reported that 73% (70/96) of reclassifications were linked to the discovery of new gene–disease relationships described in the medical literature.18 It also allowed us to reclassify uncertain missense variants occurring in a human disease gene, in which only truncating mutations were reported as pathogenic in the first publications. Wiedemann–Steiner syndrome due to KMT2A variants, now with three published pathogenic missense variants, is a good example of this situation.29 In addition to this effective way to resolve unsolved cases, some variants became positive after reanalysis because of an initial misinterpretation, allowing four additional diagnoses (2.5%). This phenomenon has been underscored in several studies reporting the difficulty of determining the pathogenicity of a variant, considering the extremely heterogeneous evidence taken into account.18, 30, 31, 32 In 2016, Amendola et al.30 reported a rate of variant discordance between nine laboratories of 66% for 99 analyzed variants, despite the application of the ACMG–Association for Molecular Pathology guidelines. Among the differences found, 22% might affect medical management. One additional diagnosis (1%) was made possible thanks to the upgrade of the pipeline for detecting CNVs. Finally, a search of published data in the context of diagnostic reanalysis led to a positive diagnosis in 12/156 cases (7.7%), which is similar to recently published data in a large number of patients.17

The major reason for our high number of positive results after reanalysis was the active search for additional evidence of pathogenicity for strong candidate variants, leading to 12 additional positive diagnoses (7.7%), with nine publications by our team or collaborations with another team, including seven through the Matchmaker Exchange initiative24, 25, 26, 27, 33, 34 (Table 1). This tool has become essential to rapidly identify new patients carrying a variant in the same gene, thus enhancing the identification of the molecular basis of ultrarare disorders of suspected Mendelian inheritance. This integrated translational strategy blurs the line between a standardized diagnostic procedure and a dedicated research procedure, involving a research team for this task.18, 35

These results were directly linked to a complementary strategy that combined usual diagnosis techniques with translational research and showed the importance of (i) tools to aid variant classification such as ClinGen;36 (ii) double interpretation and regular multidisciplinary discussions in the laboratory, and also the sharing of practices between laboratories; (iii) data sharing for the available genomic data of affected patients, asymptomatic control individuals, and different ethnic populations; (iv) need for training; and (v) regular reanalysis.

For the 30.8% of patients given a positive molecular result, this step not only ended a diagnostic “odyssey,” but also provided precious and accurate information for medical care for patients and their families. Novel therapeutic options or recommendations for clinical care were introduced following positive WES diagnosis in nine patients. For example, a ketogenic diet was started in a girl carrying a variant in SLC2A1 (GLUT1 deficiency syndrome 1 (MIM 606777)). This diet stopped the seizures and clearly improved psychomotor development.

In the remaining nonpositive cases, prospective WES reanalysis should be continued because the causal variant(s) is probably located in genes not yet known to be associated with disease,10 or because the phenotype is characterized by multiple molecular diagnoses. In our cohort, we identified only one patient with a double molecular diagnosis, which included a homozygous variant in HPS3 associated with Hermansky–Pudlak syndrome, and a heterozygous variant in TTPA, explaining the ataxia present in his brother, who carried this variant in a homozygous state. Therefore, multiple diagnoses are probably underidentified and physicians must keep in mind that the entire phenotype is not necessarily explained by only one diagnosis. Indeed, our results were under the 5% found by Posey et al.37 in their retrospective study, which included 7,374 patients who underwent proband or trio-based diagnostic WES. Interpreters and physicians must be vigilant and make a critical appraisal of positive results, particularly for syndromes reported only once or twice.

This study highlighted that clinical WES, together with the regular reanalysis of nonpositive results, is a powerful tool in the diagnosis of CA/ID. Its early use in the diagnostic strategy reduces the time to diagnosis. This reduction is also explained by the disruption of our diagnostic reasoning, which was initially based on a phenotype-driven approach. In the case of ultrarare or atypical phenotypes, which are frequent in syndromic ID, reverse phenotyping is an important tool to help in the interpretation of WES results. It involves an accurate reexamination of the clinical data, to compare them with phenotypes described in the literature.38 The use of this genotype-driven approach, as a complement to the traditional phenotype-driven approach, allows the more efficient ordering of complementary exams, which will probably lead to a cost reduction. This has been illustrated by several medico-economic studies in subjects with a suspected genetic condition. All of these studies highlighted the ability of WES to establish a molecular diagnosis at a lower cost than is the case with classical techniques.39, 40

Although sequencing did not lead to additional costs, the organization of systematic reanalysis led to a sizeable workload at different steps of the process. First, bioinformatics updates and the reanalysis of raw data are quite fast, but because the software is constantly being updated, regular improvements of the pipeline are required to maximize technological performance. Second, there is a considerable increase in technical procedures that need to be automated. Finally, participation of clinicians is required for reverse phenotyping and data sharing, which are more and more frequently used in the research approach. This annual reanalysis strategy is a great advantage for patients, but seems difficult to integrate into the current activity of a clinical laboratory without major adaptations. Indeed, research reanalysis is time-consuming (on average three hours per patient), and could be systematized only if a sufficiently large research team works in partnership with the clinical laboratory.

In conclusion, our study underscores the considerable value of prospective reanalysis of WES data in patients with no diagnosis. For atypical and ultrarare disorders, the border between diagnosis and research is becoming increasingly thin, thus underlining the need for a translational integrated organization from diagnosis to research to provide careful and reliable interpretation of data. For the remaining negative results of solo WES despite recurrent reanalysis, a trio strategy, to identify de novo candidate variants, could be the first step before the prescription of whole-genome sequencing and other omics technologies to determine molecular etiologies that remain undetectable by WES. When the cost of whole-genome sequencing decreases, the whole-genome sequencing–first approach will be the best option for moving toward a single diagnostic test.