## Introduction

Genomic technologies have improved Mendelian disorder diagnosis, with whole genome sequencing (WGS) having the greatest diagnostic yield [1,2,3]. The higher cost of WGS and of long-term data storage remains a barrier to its routine implementation. Without public funding for genomic testing in most countries, diagnostic yields are balanced against budgetary limitations. The impact of coding variation on gene function identified through whole exome sequencing (WES) and WGS is well understood. The advantages of WGS for improving diagnostic yield are coding region coverage consistency, sequencing of newly-annotated coding regions, and improved detection sensitivity for structural variants (SVs), particularly copy number variants (CNVs) [2]. Interpreting genetic variation in non-coding regions identified primarily through WGS remains challenging, leading to a perceived lack of additional WGS utility compared to WES [4]; however, several reports have identified non-coding causes of Mendelian diseases [5,6,7].

While WGS increases the diagnostic yield over WES in Mendelian disorders, there are few studies exploring the degree of improvement. Such studies would assist in selecting the optimal clinical genomic investigation. A small number of studies have assessed WGS diagnostic yields in WES-negative Mendelian disorder cohorts, with diagnostic rates between 7% and 34% [8,9,10,11]. The increased diagnostic rate in these studies was due to CNV detection, improved coverage of difficult-to-sequence regions, and identification of pathogenic variants in non-coding regions and mitochondrial DNA. In addition to clinical impact, economic evaluation of a new technology is important before seeking scarce funding for its routine implementation into standard care. Here, we performed WGS in a WES-negative Mendelian cohort to determine the extent to which WGS increases the diagnostic yield over WES and impacts diagnostic costs.

## Subjects and methods

### Cohort ascertainment

Individuals (n = 91; 64 families) with undiagnosed suspected Mendelian disorders were recruited from genetics units in New South Wales (NSW), Australia, from 2013 to 2017. Affected individuals had undergone a range of diagnostic investigations such as chromosomal microarray (CMA) in those with intellectual disability (ID), and in some, targeted gene sequencing, but no WES or WGS prior to this study [12]. Original WES studies were performed at the Kinghorn Centre for Clinical Genomics (KCCG) and the NSW Health Pathology Randwick Genomics Laboratory (RGL), with one family sequenced at Radboud University Medical Centre Nijmegen (RUMC). 41% of the original KCCG and RGL WES cohort had diagnostic findings [12, 13]. Following completion of WES analysis, individuals who remained undiagnosed were recruited for WGS, resulting in 38 families with 59 affected individuals and 41 unaffected first-degree relatives.

### Genomic sequencing and bioinformatics analysis

#### Original WES

WES was performed from 2013–17. RGL WES was performed on the Ion Proton using the Ion AmpliSeq Library kit V2 and PI Chip V2. KCCG WES was performed on the Illumina HiSeq 2500 [12]. Family 12 had WES at RUMC on the SOLiD platform as described previously [14]. Accredited WES bioinformatics pipelines were utilised including GAIA at RGL [13], in-house methods at the RUMC, and Seave at KCCG [12]. CNV analysis was performed using Conifer [15] or XHMM [16].

#### WGS

DNA was extracted from EDTA blood or cultured fibroblasts (2 families). Sequencing was performed on probands and unaffected relatives between 2016 and 2017 on Illumina HiSeq X instruments, using libraries generated with either the KAPA Hyper PCR-free kit (36 families) or the TruSeq Nano DNA kit (2 families). Variants were called after alignment to the hs37d5 human reference genome [12] using a BWA/GATK best practices pipeline. Single nucleotide variants (SNVs) and small insertion/deletion (indel) variants were annotated using VEP, converted into GEMINI databases [17], and loaded into the web-based variant filtration platform, Seave [18]. Sample gender and relatedness quality checks were performed using KING (v1.4) [19] and PLINK (v1.90b1g) [20].

Homozygosity mapping was performed using ROHmer (Puttick et al., manuscript in preparation). Mitochondrial SNV/indel analysis used mity [21] which runs FreeBayes (unpublished data). SVs including CNVs were identified using ClinSV [22], combining discordantly-mapping read pairs, split-mapping reads, and depth of coverage changes. SVs were annotated with population allele frequencies derived from 500 healthy controls [23], the 1,000 Genomes Project [24], and for protein-coding gene overlap.

### WGS variant prioritisation and interpretation

Nuclear SNVs and indels were filtered, prioritised, and interpreted by a clinical geneticist with genomic analysis expertise. Variants were discarded if the minor allele frequency in population databases was >2% (autosomal recessive (AR) or X-linked recessive inheritance) or >0.1% (autosomal dominant (AD)), or if they had a predicted low impact on protein function. Candidate variant pathogenicity assessment was made using in silico prediction tools (SIFT [25], PolyPhen2 [26], PROVEAN [27], CADD [28]), and aggregate pathogenicity scores from Varcards [29]. Mitochondrial variants were filtered to known disease variants or overlapping phenotypes in MITOMAP [30]. SVs and CNVs were filtered by rarity, genotype-phenotype overlap, and family segregation.
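The frequency and impact filter described above can be sketched in a few lines. This is an illustrative sketch only, not the Seave implementation: the `Variant` fields, the impact labels, and the function name are hypothetical, and only the 2% and 0.1% cut-offs come from the text.

```python
from dataclasses import dataclass

# Cut-offs from the text: variants above 2% are discarded under recessive
# models, and above 0.1% under a dominant model.
MAF_CUTOFFS = {"AR": 0.02, "XLR": 0.02, "AD": 0.001}


@dataclass
class Variant:
    gene: str                 # hypothetical field names, for illustration only
    max_population_af: float  # highest allele frequency across population databases
    impact: str               # hypothetical labels: "HIGH", "MODERATE" or "LOW"


def passes_filter(variant: Variant, inheritance: str) -> bool:
    """Return True if the variant survives the frequency and impact filters."""
    if variant.max_population_af > MAF_CUTOFFS[inheritance]:
        return False  # too common for the assumed inheritance model
    if variant.impact == "LOW":
        return False  # predicted low impact on protein function
    return True


# A variant at 0.5% frequency is kept under a recessive model but not a dominant one.
example = Variant(gene="GENE_A", max_population_af=0.005, impact="HIGH")
kept_recessive = passes_filter(example, "AR")
kept_dominant = passes_filter(example, "AD")
```

A filter of this kind discards known pathogenic variants that are annotated as low impact, which is why cases such as the 5ʹUTR variant reported in the Results needed an extended analysis.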

Variants with genotype-phenotype correlation were reviewed for sequence quality in the Integrative Genomics Viewer (IGV) [31]. Candidate variants were classified by genetic pathologists utilising the American College of Medical Genetics (ACMG/AMP) guidelines and subsequently validated by Sanger sequencing, including family segregation, and reported if likely pathogenic/pathogenic [32].

### WES retrospective reanalysis

Retrospective WES reanalysis was performed on original WES data approximately 2 years following original WES analysis to determine if WGS diagnoses could be identified using contemporary techniques. If WGS-diagnosed variants were absent from WES reanalysis, an assessment was made of WES coverage over the critical region and the variant presence in VCF files.

### Health economic analysis

A health economic analysis was undertaken to understand the cost implications of genomic sequencing in Mendelian disorders. The incremental diagnostic and cost differences were analysed between the provision of WGS and WES for: (1) WES-negative individuals (38 families) and (2) individuals modelled as having had WES and WGS available with a contemporary analysis pipeline ab initio for the original 64 families (referred to as the simulated early genomic testing model).

1. Economic analysis for 38 WES-negative families. We calculated the incremental costs per additional WGS diagnosis when WES reanalysis was performed followed by WGS, and when only WGS was performed. As WES reanalysis on WES-negative cases is standard diagnostic care in the Australian healthcare system, WES reanalysis was the comparator for our analysis.

2. Economic analysis of the simulated early genomic testing model (64 families). Initial WES and WGS diagnostic rates were estimated using combined diagnoses from WES on the original cohort (64 families) and subsequently either contemporary WES reanalysis or WGS on the remaining 38 WES-negative families. We calculated the incremental costs per additional WGS diagnosis for contemporary WES followed by WGS and WGS alone, both compared to contemporary WES.
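Both models report an incremental cost per additional diagnosis. The text does not write this ratio out. Assuming the conventional incremental cost-effectiveness form, with $C$ the total testing cost of a pathway and $D$ the number of families it diagnoses (symbols introduced here for illustration only), it can be expressed as:

$$
\text{Incremental cost per additional diagnosis} = \frac{C_{\text{new}} - C_{\text{comparator}}}{D_{\text{new}} - D_{\text{comparator}}}
$$

Under this reading, the comparator is WES reanalysis in the WES-negative model and contemporary WES in the simulated early genomic testing model.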

#### Sensitivity analysis

Diagnostic laboratory WES and WGS costs were sought in May 2020. For uniformity, laboratories offering WES and WGS were initially included (six laboratories), then refined to three laboratories offering singleton and trio studies for both technologies (Centogene, PerkinElmer, and Victorian Clinical Genetics Service (VCGS)). For primary analysis, VCGS laboratory costs were utilised for a local cost applicable for our base costs. WES and WGS costs were determined for varying family referral combinations such as singleton, trio, sibling pair, and multigenerational affected families. Costs for WES and WGS included sequencing, bioinformatics, interpretation, and reporting costs, and for WES reanalysis, bioinformatics, interpretation, and reporting costs. Additional affected individuals in families with multiple affected individuals incurred additional reporting costs. A sensitivity analysis was performed using the lowest and highest costs across the three laboratories (Supplementary Table 1). Only WES and WGS test costs were analysed for a direct comparison.

A bootstrapping method was used to assess the uncertainty of the economic model results for incremental costs and presented as 95% confidence intervals (CI). Replicated datasets were created by drawing a random sample of the WGS families 1000 times with replacement. Analyses were performed in Microsoft Excel and SAS version 9.4.
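The family-level bootstrap can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Excel/SAS implementation: the record layout, the percentile method for the interval, and the toy costs are all hypothetical. Only the resampling of families with replacement, the 1000 replicates, and the 95% interval come from the text.

```python
import random


def incremental_cost_per_diagnosis(families):
    """Extra cost divided by extra diagnoses; None if there are no extra diagnoses."""
    extra_cost = sum(f["cost_new"] - f["cost_comparator"] for f in families)
    extra_dx = sum(f["dx_new"] - f["dx_comparator"] for f in families)
    if extra_dx <= 0:
        return None  # ratio undefined for this sample
    return extra_cost / extra_dx


def bootstrap_ci(families, n_replicates=1000, seed=0):
    """Percentile 95% CI from resampling families with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        sample = rng.choices(families, k=len(families))  # draw with replacement
        estimate = incremental_cost_per_diagnosis(sample)
        if estimate is not None:  # skip replicates with an undefined ratio
            estimates.append(estimate)
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper


# Toy data (hypothetical costs): 10 families, 3 gain a diagnosis on the new pathway.
toy_families = [
    {"cost_comparator": 500, "cost_new": 5000,
     "dx_comparator": 0, "dx_new": 1 if i < 3 else 0}
    for i in range(10)
]
point_estimate = incremental_cost_per_diagnosis(toy_families)
ci_lower, ci_upper = bootstrap_ci(toy_families)
```

Because the denominator is a small count of additional diagnoses, replicates that draw few diagnosed families give very large ratios. This is one plausible reason for upper confidence limits sitting far above the point estimates.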

## Results

### Cohort demographics

Proband ages ranged from newborn to 73 years, with half of paediatric age, more affected males (64%), and parental consanguinity in 18%. Twenty families had a single affected proband, most undergoing trio sequencing (17/20). Eighteen families had multiple affected probands, with most (14/18) undergoing WGS of two affected family members. Patient demographics are summarised in Supplementary Table 2. The average time between original WES and WGS analysis was 1.8 years (SD ± 0.4) at KCCG, 2.4 years (SD ± 1.0) at RGL; combined 2.1 years (SD ± 0.7).

### WGS diagnoses were made in one-third of WES-negative families

WGS-based analysis diagnosed 34% of the previously undiagnosed cohort with one diagnosis per family (13/38 families; Table 1). Diagnoses were made in well-characterised disease genes and were due to SNVs and indels in 12 families and a CNV in 1 family. The greatest proportions of diagnoses by disease category were haematological (2/2 families), skeletal (2/3 families), and ID (8/24 families; non-syndromic ID 3/7, syndromic ID 5/17) (Supplementary Fig. 1).

WGS diagnoses were not detected through the original WES (Table 1, Fig. 1) due to a previously unknown gene-disease association (23%), insufficient sequencing coverage (31%), the variant prioritisation pipeline (15%), the bioinformatics pipeline (23%), or CNV detection (8%).

### WGS had increased sensitivity for detecting structural variation

WGS data were evaluated to assess the impact of SVs on diagnostic yield. A 1.4 kb deletion encompassing part of exon 1 of RAB39B was identified in an X-linked ID family (Family 4, Table 1; Fig. 2A). The RAB39B deletion was validated in males using high-resolution CMA, adopting a lowered detection threshold of 4 probes from the standard 5 probes and a custom multiplex ligation-dependent probe amplification (MLPA). This deletion was missed on WES CNV analysis although visualisation of raw reads showed an absence of exon 1 coverage.

A family with Opitz G/BBB syndrome had a WGS-detected SV of uncertain significance. Prior MID1 sequencing and WES were negative. There was evidence for two linked SV duplication events involving an intron of MID1 on chromosome X and a region on chromosome 1 involving SDF4 without a disease association. The X-linked pedigree is consistent with co-location of the duplications on chromosome X segregating with disease (Fig. 2B, C). Studies investigating the impact of the SV on MID1 are in progress.

### Diagnoses made outside the standard variant analysis pipeline

Two diagnoses were made from bespoke analyses following initial negative routine variant prioritisation. Family 13 had a suspected X-linked or AD connective tissue disorder with features similar to Weill-Marchesani syndrome. Analysis for a shared candidate allele in an affected aunt and nephew was negative. Individual analyses were performed and, unexpectedly, a homozygous variant in ASPH (NM_004318.3:c.1695C > A; p.(Tyr565*)) associated with AR Traboulsi syndrome was identified in the aunt. This variant was present in the nephew in compound heterozygosity with a separate nonsense ASPH variant (NM_004318.3:c.1782G > A; p.(Trp594*)), demonstrating the presence of both homozygous and compound heterozygous ASPH alleles in the same family. Patient and pedigree review confirmed that their phenotype was consistent with Traboulsi syndrome, and that the aunt’s parents were third cousins. WES reanalysis confirmed that the family would have been diagnosed had this unusual mode of inheritance been considered.

Family 2 with AD macrothrombocytopaenia (Table 1) underwent an extended analysis to assess low impact variants in platelet disorder genes. This identified a previously reported pathogenic variant in the 5ʹUTR of ANKRD26 (NM_014915.2:c.−116C > T) with a consistent haematological disease phenotype [33]. Sanger sequencing confirmed the variant segregated with disease. WES reanalysis did not identify this variant due to absent coverage of the 5ʹUTR in the earlier capture system. Improved coverage in a newer, alternate WES platform means this diagnosis would most likely have been made using current WES technology, provided variants of predicted low impact were prioritised through the pipeline (Supplementary Fig. 2).

### WES reanalysis would have detected over half of the WGS diagnoses

WES reanalysis identified the diagnosis in 7/13 families where WGS provided a diagnosis (54%; total WES-negative cohort 7/38 families, 18%; Table 1; Fig. 3). Therefore, WGS provided an additional diagnostic yield following WES reanalysis of 19% of WES-negative families (6/31 remaining families). The majority of new WES diagnoses were due to improvements in the bioinformatics/variant filtering pipeline (4/7; Table 1: families 1, 6, 8, 9), followed by new disease gene identification (2/7; families 7, 11), and analysis outside the standard pipeline (1/7; family 13). In the families that remained undiagnosed after WES reanalysis (6/13 families), insufficient coverage of the diagnostic variant was the main cause (4/6; Table 1: families 2, 3, 5, 12), followed by one missed CNV (family 4) and one bioinformatics pipeline error (family 10).

### WGS versus WES as an initial genomic test

We estimated the diagnostic yield of WGS over a contemporary WES pipeline in the original cohort of 64 genomic-naïve families (Fig. 3). To do this, original WES diagnoses and additional diagnosed families from WGS or contemporary WES reanalysis were combined, assuming all original WES diagnoses would have been made with contemporary WES and WGS. From this analysis, a contemporary WES pipeline would have yielded a 52% diagnostic yield (33/64 families: 26 original WES families and 7 additional from contemporary WES pipeline). Had WGS been performed at the outset, 61% of families would have been diagnosed (39/64 families; 26 original WES, 13 additional WGS), resulting in an additional 9% yield over contemporary WES, unique to WGS.
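The yield figures in this and the preceding subsection follow directly from the family counts given in the text. The short sketch below reproduces that arithmetic; the variable and function names are illustrative, and percentages are rounded to the nearest integer as in the text.

```python
def yield_pct(diagnosed: int, total: int) -> int:
    """Diagnostic yield as a percentage, rounded to the nearest integer."""
    return round(100 * diagnosed / total)


# Family counts reported in the text.
ORIGINAL_FAMILIES = 64
ORIGINAL_WES_DIAGNOSED = 26
WGS_DIAGNOSED = 13         # among WES-negative families
REANALYSIS_DIAGNOSED = 7   # WGS diagnoses also found by WES reanalysis

wes_negative = ORIGINAL_FAMILIES - ORIGINAL_WES_DIAGNOSED      # 38
wgs_only = WGS_DIAGNOSED - REANALYSIS_DIAGNOSED                # 6
unsolved_after_reanalysis = wes_negative - REANALYSIS_DIAGNOSED  # 31

yields = {
    # WES-negative cohort
    "wgs_in_wes_negative": yield_pct(WGS_DIAGNOSED, wes_negative),
    "reanalysis_in_wes_negative": yield_pct(REANALYSIS_DIAGNOSED, wes_negative),
    "wgs_after_reanalysis": yield_pct(wgs_only, unsolved_after_reanalysis),
    # Simulated early genomic testing model
    "contemporary_wes_initial": yield_pct(
        ORIGINAL_WES_DIAGNOSED + REANALYSIS_DIAGNOSED, ORIGINAL_FAMILIES),
    "wgs_initial": yield_pct(
        ORIGINAL_WES_DIAGNOSED + WGS_DIAGNOSED, ORIGINAL_FAMILIES),
    "unique_to_wgs": yield_pct(wgs_only, ORIGINAL_FAMILIES),
}
```

Note that the same six WGS-only families give a 19% yield when the denominator is the 31 families unsolved after reanalysis, and a 9% yield when the denominator is the full cohort of 64 families.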

### Health economic analysis

#### Economic analysis for WES-negative families

The incremental cost per additional WGS diagnosis following WES reanalysis was AU$36,710 (£19,407; US$23,727) (95% CI: AU$20,607;$112,902) compared to WES reanalysis alone. For WGS alone, the incremental cost per additional diagnosis was AU$41,916 (£22,159; US$27,093) (95% CI: AU$22,790;$128,107) compared to WES reanalysis alone. This pathway conferred the greatest costs with the same diagnostic rate as the WES reanalysis followed by WGS pathway.

#### Economic analysis for the simulated early genomic testing model

WGS following initial WES (Table 2, ii): for each additional (incremental) WGS diagnosis over initial WES, the cost would be AU$36,710 (£19,407; US$23,727) (95% CI: AU$20,946;$112,942). Initial WGS (Table 2, iii): for each incremental initial WGS diagnosis, there would be an additional cost of AU$29,708 (£15,705; US$19,201) (95% CI: AU$16,612;$90,195) compared to initial WES. Thus, of the two WGS options, WGS as the initial test was the best value for money, producing the same diagnoses at a lower cost than WGS following WES.

#### Sensitivity analysis

There were substantial differences in genomic sequencing costs between laboratories, with more widely divergent WGS costs. Trio WGS costs ranged from AU$7,557 (£3,995; US$4,884) (VCGS laboratory) to AU$11,446 (£6,051; US$7,398) (Centogene laboratory), and trio WES ranged from AU$3,713 (£1,963; US$2,400) (PerkinElmer laboratory) to AU$4,345 (£2,297; US$2,808) (Centogene laboratory). Accordingly, we undertook a sensitivity analysis to determine what impacts the use of higher or lower WES and WGS costs may have on the results.

##### Economic analysis for WES-negative families

The incremental cost per additional WGS diagnosis following WES reanalysis ranged from AU$36,659 (£19,380; US$23,693) (lowest WGS cost) to AU$53,307 (£28,181; US$34,453) (highest WGS cost) when compared to WES reanalysis alone, with the cost of WES reanalysis unchanged. The sensitivity analysis showed that the conclusions for WES-negative families would not be altered by lower or higher genomic costs.

##### Economic analysis for the simulated early genomic testing model

One-way sensitivity analysis of the incremental cost for each additional initial WGS diagnosis compared to initial WES was performed for a range of available WES and WGS costs (Supplementary Fig. 3). The result is more sensitive to WGS costs, with the incremental cost ranging from AU$28,262 (£14,941; US$18,266) to AU$60,681 (£32,079; US$39,219) per additional WGS diagnosis, primarily driven by the wider range of commercially available costs for WGS compared to WES.
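A one-way sensitivity analysis varies one input across its range while holding the others at their base values. The sketch below shows the idea with a deliberately simplified model in which every family is costed as a trio. That is an assumption made here for illustration, as is the base WES cost of AU$4,000. The study costed each family by its actual referral structure, so this sketch does not reproduce the reported figures. The trio cost ranges are those quoted in the sensitivity analysis above.

```python
def incremental_cost(n_families, wes_cost, wgs_cost, extra_diagnoses):
    """Incremental cost per additional diagnosis, with one flat cost per family."""
    if extra_diagnoses <= 0:
        raise ValueError("extra_diagnoses must be positive")
    return n_families * (wgs_cost - wes_cost) / extra_diagnoses


def one_way_sensitivity(n_families, extra_diagnoses,
                        base_wes, base_wgs, wes_range, wgs_range):
    """Vary one cost at a time across its (low, high) range, other held at base."""
    return {
        "vary_wes": tuple(
            incremental_cost(n_families, c, base_wgs, extra_diagnoses)
            for c in wes_range),
        "vary_wgs": tuple(
            incremental_cost(n_families, base_wes, c, extra_diagnoses)
            for c in wgs_range),
    }


result = one_way_sensitivity(
    n_families=64,
    extra_diagnoses=6,
    base_wes=4000,            # hypothetical base WES cost (AU$), not from the text
    base_wgs=7557,            # VCGS trio WGS cost (AU$)
    wes_range=(3713, 4345),   # trio WES cost range (AU$)
    wgs_range=(7557, 11446),  # trio WGS cost range (AU$)
)
wes_span = abs(result["vary_wes"][1] - result["vary_wes"][0])
wgs_span = abs(result["vary_wgs"][1] - result["vary_wgs"][0])
```

Because this simplified model is linear in both costs, the span of each output is proportional to the width of the corresponding cost range, so the wider WGS range produces the larger swing, in line with the observation above.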

## Discussion

In this Mendelian disorder cohort, WGS resulted in a diagnosis in one third (34%; 13/38 families) of undiagnosed families who had previously had WES. However, when controlling for factors such as improvements to gene-disease knowledge and genomic pipelines through contemporary WES reanalysis, the WGS diagnostic yield reduced to 19% (6/31 remaining families). If WGS was applied as an initial test to our original cohort of 64 undiagnosed Mendelian families, the increased diagnostic yield unique to WGS was 9% relative to contemporary WES. The majority of the WGS diagnostic gains (4/6 diagnoses; Fig. 1C) were due to reduced WES coverage of critical regions that may be solved through an improved WES platform. Inspection of sequencing coverage in unrelated individuals using the newer Illumina NovaSeq 6000 ES Agilent CREv2 showed adequate coverage for variant identification in 3 of the 4 missed WES diagnoses (Supplementary Fig. 4). Although there was a low detection rate of pathogenic SVs in this study, this may increase with time as more clinically important SVs are characterised and thus influence WGS diagnostic yields over WES [34, 35].

### Solving the undiagnosed

Understanding why genomic diagnoses are missed can lead to alterations to genomic pipelines and improved Mendelian disorder diagnosis. A WGS diagnosis in a deceased foetus with suspected Raine syndrome followed multiple sequential non-informative investigations including prenatal CMA, FAM20C sequencing and MLPA, a craniosynostosis panel, and WES. On WGS, a de novo pathogenic FGFR2 variant (p.Y375C) was identified, diagnosing Beare-Stevenson syndrome, conferring a greatly reduced reproductive recurrence risk compared to the suspected AR disorder. The craniosynostosis panel had included FGFR2, but not the critical exon, and the missed WES diagnosis was due to a failure of the variant caller despite good sequencing coverage, which has been subsequently addressed.

Generic genomic filtering pipelines may rely on assumptions about inheritance patterns or predicted protein impacts. Failure to identify a molecular aetiology after a familial analysis should prompt consideration of an alternative analytical method, such as singleton proband analysis in the family with Traboulsi syndrome. Similarly, incorporating known Mendelian disorder disease-causing variants from ClinVar that are bioinformatically predicted to be of low impact improves variant detection. Accessing specialist gene-disease knowledge will be important for recognition of such variation.

WES reanalysis remains valuable in increasing diagnostic yields in unsolved cases, with an additional diagnostic rate of 11% (7 of 64 families) made over an approximate 2-year period [12]. However, reanalysis of WES obtained from older platforms may be ineffective in some unsolved individuals due to overall reduced sequencing coverage compared to contemporary platforms. There remains a diagnostic gap with WES for smaller SVs that is best approached through non-WES methodologies such as exon-level arrays or WGS. While contemporary WES coverage has improved, including slightly expanded coverage of non-coding regions containing pathogenic variation [5, 6, 36], WGS enables the unbiased detection of non-coding variants without the limitation of target enrichment based on potentially outdated gene annotations. Although less is understood about how non-coding region variation impacts biological function, there are numerous examples of deep intronic variation affecting gene splicing [7] and other pathogenic non-coding variants [5, 6] such as the 5ʹUTR ANKRD26 variant in this study. Proof of causation for novel non-coding variation is challenging, but higher throughput methodologies for functional studies may lower costs and improve understanding of such variation, making diagnostic reporting more feasible and increasing the importance of WGS [37]. While we have compared current diagnostic WES and WGS pipelines, there are a number of techniques such as improved splicing prediction tools [38] and RNAseq [36] that are not yet routinely available but have potential to further increase diagnostic rates over current WES and WGS.

### WES or WGS as an initial genomic diagnostic test?

Variants assessed for disease diagnosis are almost exclusively in coding regions and so it has been argued that a well performed contemporary WES study is a cost-effective screen and the best first-line methodology [12, 39]. However, we may be moving towards a time when WGS will be adopted as a first-line test [40]. The main limitation of WES is a lower sensitivity for detecting structural variation, particularly complex variation [39]. Further, when considering the maximum diagnostic yield alone, this study and others have shown that WGS boosts the diagnostic yield in WES-negative Mendelian disorder cohorts [8,9,10,11]. The magnitude of this diagnostic increase depends on the modernity of the WES approach relating to exome enrichment, analytic pipelines, and the likelihood of CNVs or the presence of an unusual genomic mechanism. There is evidence that small CNVs may be more important in Mendelian disease diagnosis than previously recognised [35] so the increased sensitivity of WGS for CNV detection is advantageous. The combination of WES with newer technological platforms such as long-read sequencing could result in an increased diagnostic sensitivity for CNVs without the higher costs of performing WGS.

Decisions about when to use WES and WGS remain important because there is a trade-off between the lower cost of WES and the higher diagnostic yield of WGS. To date, there have been few studies comparing the relative costs of WGS with WES or after WES reanalysis [41]. The economic analyses in this study show that the economic decision whether to use WES or WGS in part depends upon whether prior genomic testing has occurred. If additional diagnoses are sought when WES has been performed previously, the lowest cost use of resources is to perform WES reanalysis. However, to achieve maximal diagnoses, the most cost-effective strategy is to perform WGS after WES reanalysis, with an incremental cost per additional WGS diagnosis of AU$36,710 (£19,407; US$23,727) in this study. This strategy incurs a lower cost than performing WGS after original WES without WES reanalysis, with the same diagnostic yield.

For people who have not had genomic testing, the most cost-effective strategy for maximal diagnoses is to perform initial WGS, with an incremental cost of AU$29,708 (£15,705; US$19,201) per WGS diagnosis. However, acknowledging that some diagnoses will be missed and that not all jurisdictions have access to the required resources for WGS, the lowest cost pathways are to perform WES reanalysis in WES-negative individuals and initial WES in people who have not had genomic testing. It is important to note that the cost differentials between WES and WGS may be specific to this study cohort and that there is no universally acknowledged willingness-to-pay threshold for a diagnosis [42]. Further, the additional expenditure for each WGS diagnosis achieved may still result in downstream health and social cost savings, which, over a lifetime, may dwarf the costs of WGS [43].

The implications of diagnoses for families on quality of life outcomes, management change, access to reproductive technologies, eligibility for services, access to support groups and the impact on both health and social costs all need to be considered when allocating scarce resources. The economic analysis in this study lacks information about such outcomes that would provide information on quality adjusted life years (QALYs) and allow for a cost utility analysis. Further, we have not calculated the costs of additional investigations that may be incurred following a negative WES result compared to WGS. However, the economic analysis does provide important information about the financial resource implications of implementing WES and WGS, when considering those test costs alone.

In addition to balancing test cost and maximising diagnoses, the clinical scenario also influences genomic test choice. In settings where there is a high chance of intervention if a genomic diagnosis is made, it can be argued that WGS, with the maximal chance of diagnosis, should be chosen. Such scenarios may include acutely unwell children in neonatal or paediatric intensive care units (NICU/PICU) [44], or urgent reproductive situations such as an at-risk pregnancy. However, such decisions are not made in isolation, with availability and resourcing affecting both whether genomic testing can be offered and which test is chosen, even in urgent clinical scenarios.

WGS is the optimal genomic test choice to maximise the diagnostic rate in Mendelian disorders across all clinical scenarios. However, accepting a small reduction in diagnostic yield, WES with reanalysis confers the lowest costs. Whether WES or WGS is utilised will depend on the clinical scenario and local resourcing and availability.