Autosomal recessive (AR) conditions constitute a subgroup of human genetic disorders that are caused by defects of both copies of a gene located on the autosomes. Individuals affected with an AR disorder often inherit disease-causing alleles from asymptomatic carrier parents. Such disease-causing alleles mostly arose as de novo variants in an ancestor and became “founder pathogenic variants” in a specific population by escaping the purifying selection that often acts strongly against deleterious alleles causing dominant diseases.1

AR conditions are caused by complete or near-complete loss of function of the gene product, which is largely attributed to single-nucleotide variants (SNVs), small insertions/deletions (indels), or copy-number variants (CNVs), and rarely to copy-neutral events such as balanced chromosome translocations/inversions and uniparental disomy2,3 (UPD). Advancing molecular technologies enable genome-wide SNV/indel detection. In the past decade, expanded next-generation sequencing (NGS)–based carrier screening, with a focus on reproductive medicine, has screened apparently healthy individuals for reproductive risks of recessive disorders.4

CNVs range from hundreds to millions of base pairs (bp). CNVs can cause genetic defects by exon-level events disrupting the reading frame of one disease gene (genic AR-CNV) or deletions of a genomic interval involving two or more disease genes with the potential to cause more than one disorder (genomic AR-CNV) (Fig. 1a). Chromosomal microarray analysis (CMA), representing the first-tier diagnostic method for developmental disorders and congenital anomalies by clinical consensus,5 has been widely used for the detection of large CNVs (>400 Kb) causing microdeletion/duplication syndromes. Improved resolution of microarrays with exonic coverage has expanded the recognition of contributions of CNV to single-gene disorders.6,7 Although the detection of large CNVs at the megabase (Mb) level has become possible using NGS-based methods, detection of small heterozygous CNVs at the exonic level still largely relies on CMA.

Fig. 1: Autosomal recessive copy-number variants (AR-CNVs) contribute to diseases in multiple ways.

(a) A defective gene may have a point variant, an intragenic duplication or deletion, a whole-gene deletion, or a contiguous gene deletion of multiple genes. (b) AR-CNVs affecting multiple genes (genomic AR-CNV) may contribute to a more complex outcome. For (a) and (b), genes are presented at the top of the diagrams with the wide dark blue segments representing the exons and the narrow dark blue segments representing the introns. Below the gene diagrams are shown different types of variants: green asterisk, point variant; blue segment, duplication; red segment, deletion. AD autosomal dominant.

CNVs contribute to biallelic variations causing AR conditions in ways that may or may not involve SNV/indels. CNVs affecting a single disease-associated locus may act as the sole variant type to cause an AR disorder by affecting both copies of a gene in homozygous or compound heterozygous configurations. This has been increasingly recognized in individuals with Mendelian diseases.8 Alternatively, single-gene events combining an AR-CNV and an SNV/indel in trans have also been reported to cause recessive disorders.9 In the context of genomic disorders, large deletions may affect multiple disease-associated loci, leading to a complex phenotype: dominant traits may manifest if the deletion causes haploinsufficiency, and a recessive genic variant may be unmasked to cause a recessive disorder through compound heterozygosity10,11 (Fig. 1b). Joint CNV and SNV/indel analyses are well suited to identifying such events.

We investigated the different ways AR-CNVs contribute to AR conditions by examining a clinical cohort of cases subjected to CMA and/or exome sequencing (ES). Our study exemplified the importance of CNV detection in providing accurate molecular diagnoses. The high frequency of homozygous CNVs identified in this study highlighted the transmission of recurrent CNV alleles in healthy populations.


Samples and ethics statement

We retrospectively analyzed clinical samples submitted for ES (N = ~12,000) and CMA (N = ~70,000) at Baylor Genetics (BG). The aggregated analyses of anonymized cases were approved by the Baylor College of Medicine Institutional Review Board (protocols H-37568 and H-42680).

CNV detection by CMA

The CMA experimental procedures were performed according to the manufacturer’s protocols with minor modifications (Agilent Technologies Inc., Santa Clara, CA, USA).12 Since 2006, BG has clinically developed and implemented six different versions of customized oligonucleotide arrays (OLIGO V6–V11). OLIGO V8–V11 were designed to interrogate clinically relevant genes with exonic coverage (V8, ~1700 genes; V9–V11, >4200 genes). Single-nucleotide polymorphism (SNP) probes were also included in the V8.2, V8.3, and V9–V11 OLIGO arrays to enable detection of genomic intervals with absence of heterozygosity (AOH, also referred to as runs of homozygosity). Data analyses were performed using an in-house developed pipeline with published decision-making algorithms.7 Ambiguous CNV calls at the borderline of the cut-off criteria were confirmed by an orthogonal approach, such as polymerase chain reaction (PCR) or fluorescence in situ hybridization (FISH).

CNV detection by ES and companion SNP array

ES experimental procedures and bioinformatics pipelines were performed according to the previously described methods.13,14 Variant classification followed the American College of Medical Genetics and Genomics (ACMG) guidelines.15 Homozygous/hemizygous deletions were called using an in-house developed pipeline based on exome read-depth analysis as previously described.16
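The principle behind read-depth calling of homozygous deletions can be illustrated with a minimal sketch. This is not the in-house pipeline cited above; the function name, the input layout, and both thresholds (`max_ratio`, `min_reference_depth`) are assumptions chosen only for illustration. An exon is flagged when the proband's normalized depth is close to zero while reference samples cover the same exon well.

```python
from statistics import median


def call_homozygous_deletions(sample_depth, reference_depths,
                              max_ratio=0.05, min_reference_depth=20.0):
    """Flag exons with near-zero depth in the sample but good reference coverage.

    sample_depth: dict mapping exon ID -> normalized mean depth in the proband.
    reference_depths: dict mapping exon ID -> list of normalized mean depths
        in reference samples from the same capture design.
    max_ratio and min_reference_depth are hypothetical thresholds.
    """
    calls = []
    for exon, depth in sample_depth.items():
        reference = median(reference_depths[exon])
        if reference < min_reference_depth:
            # Poorly captured exon: low depth is uninformative, so skip it.
            continue
        if depth / reference <= max_ratio:
            calls.append(exon)
    return calls


# Hypothetical depths for three exons.
sample = {"exon_1": 0.5, "exon_2": 48.0, "exon_3": 0.0}
reference = {
    "exon_1": [50.0, 55.0, 45.0],
    "exon_2": [50.0, 52.0, 47.0],
    "exon_3": [3.0, 2.0, 4.0],
}
candidate_deletions = call_homozygous_deletions(sample, reference)
```

Heterozygous events reduce depth by only about half and are therefore much harder to separate from capture noise with a rule of this kind.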

As part of the quality control measurements, an Illumina SNP array (companion SNP [cSNP] array) of two versions (HumanExome-12v1 array, >240 K probes, from 2012 to 2016; HumanCoreExome-24v1, >500 K probes, since 2016) was run concurrently with ES using aliquots from the same DNA preparation used for ES. The cSNP array data were analyzed for CNV and AOH detection as previously described.17

Variant deposition

The variants identified in this study have been submitted to ClinVar (accession numbers SCV001334073–SCV001334129).


Genomic features of identified AR-CNVs in this study

While genome-wide analyses indicated that deletion CNVs contribute to carrier states at many genic loci, the AR-CNVs in this study include only diagnostic variants contributing to biallelic variation. Among the cases subjected to CMA and/or ES, we identified molecular diagnoses involving clinically relevant AR-CNVs in 87 cases (81 singletons and three pairs of siblings). Among the 174 chromosomes of these 87 cases, 17 contained SNV/indel alleles; the remaining 157 contained AR-CNV alleles.

The majority of AR-CNVs were below 1000 Kb, except for four CNVs (Fig. 2a). The AR-CNVs ranged from a 68-bp deletion of FAM177A1 exon 5 to a 22-Mb deletion of 5p15.33p14.3 encompassing the cri-du-chat region. Homozygous AR-CNVs were identified in 63 cases, containing 126 AR-CNV alleles with a size distribution similar to that of all AR-CNVs (Fig. 2a). None of the homozygous AR-CNV alleles exceeded 1000 Kb (Fig. 2a).

Fig. 2: Characteristics of autosomal recessive copy-number variants (AR-CNVs).

(a) AR-CNVs had different genomic sizes. The graph compares the genomic sizes of all AR-CNVs and AR-CNVs in a homozygous state. Blue, all AR-CNVs; red, AR-CNVs in a homozygous state. (b) AR-CNVs may affect different numbers of disease genes. The left panel shows the distribution of the number of genes affected by AR-CNV in the cohort. The middle panel shows the distribution of gene numbers in cases with homozygous AR-CNV. The right panel shows the distribution of exonic span of the AR-CNV alleles.

Of the 157 AR-CNV alleles, the majority (94.3%) affected one gene, followed by 2.5% affecting two genes and 3.2% affecting three genes or more (Fig. 2b, left panel). Homozygous AR-CNV alleles included fewer genes: the majority (96.8%) affected one gene, and the remaining 3.2% affected two genes (Fig. 2b, middle panel). This observation was consistent with the increased deleterious impact of larger CNVs. At the exonic level, 41.4% of all AR-CNV alleles affected a single exon, 35.0% spanned multiple exons of a gene, and the remaining 23.6% included the entire gene locus (Fig. 2b, right panel).

AR-CNVs contribute to diseases in multiple ways

AR-CNVs contribute to recessive disease with or without involvement of SNV/indels. Biallelic AR-CNVs were identified in 70 cases (Fig. 3a, b). Sixty-three cases had homozygous deletion (N = 62) or duplication (N = 1). Seven cases had biallelic CNVs in compound heterozygous states involving AR-CNVs that may or may not overlap. Among these seven cases, six had one deletion either partially overlapping with or being nested within the other deletion, resulting in homozygous loss of the overlapping region and biallelic loss of the affected gene; one had two nonoverlapping deletions, which were determined to be in trans by parental testing.
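The compound heterozygous configurations described above reduce to a comparison of two genomic intervals. The following minimal sketch classifies a pair of deletions, one per allele, and returns the region lost on both alleles. The function name, the category labels, and the use of half-open `(start, end)` coordinates on a shared chromosome are assumptions made for illustration; the sketch does not establish phase, which still requires parental testing.

```python
def classify_deletion_pair(first, second):
    """Classify two deletions given as half-open (start, end) intervals.

    Returns (category, shared_region), where shared_region is the interval
    deleted on both alleles, or None when the deletions do not overlap.
    """
    (s1, e1), (s2, e2) = sorted([first, second])
    if (s1, e1) == (s2, e2):
        return "same interval on both alleles", (s1, e1)
    lo, hi = max(s1, s2), min(e1, e2)
    if lo >= hi:
        return "nonoverlapping", None
    if (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2):
        return "nested", (lo, hi)
    return "partially overlapping", (lo, hi)


# Hypothetical coordinates for each configuration.
examples = [
    ((100, 500), (300, 800)),   # partially overlapping
    ((100, 800), (300, 500)),   # one deletion nested in the other
    ((100, 200), (300, 400)),   # nonoverlapping, phase must be tested
]
results = [classify_deletion_pair(a, b) for a, b in examples]
```

For the nested and partially overlapping categories, the shared region is the segment that appears as a homozygous loss in the assay.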

Fig. 3: Different mechanisms of autosomal recessive copy-number variants (AR-CNVs) contributing to diseases in the current cohort.

(a) AR-CNVs may contribute to recessive disorders in forms of homozygous CNV (62 cases with deletion and 1 case with duplication), compound heterozygous CNVs with overlapping boundaries (two cases with deletion), one embedded in the other (four cases with deletion), or nonoverlapping boundaries (one case with deletion). AR-CNVs may also contribute to recessive disorders in combination with an SNV/indel either inside (11 cases with overlapping deletion and SNV/indel) or outside of the CNV (five cases with nonoverlapping deletion and SNV/indel, one case with nonoverlapping duplication and SNV/indel). The genes affected by the AR-CNV events are noted behind each category. aThe 34 genes include ABCB11, ACBD5, ADCK3, ARSB, C12orf65, CFAP52, CLDN1 (2), CLN3, CNTNAP2, CRX, DDR2, DIAPH1, ECE1 (2), ETHE1, FAM177A1, FBP1, GJB6, IFT140, ITGB4, LARGE, NDE1, PLA2G6, PRDM12, SERAC1, SLC3A1, PREPL, SLCO1B3/SLCO1B1, SMN1, SPTA1, SRD5A2, TNNT1, TNNI3, TRAPPC9, and TRIM37. CLDN1 and ECE1 each had AR-CNVs in two patients, who were siblings. Two AR-CNVs involved two independent disease-associated genes (SLC3A1 and PREPL, TNNT1 and TNNI3) in one event, respectively. bTAX1BP3 had AR-CNV in two cases, who were siblings. (b) AR-CNVs contributed to the molecular diagnoses in variable ways. The left pie chart compares the percentages of cases with pure CNV contribution (CNV + CNV) versus those with combined CNV and SNV/indel contributions (CNV + SNV/indel). The right pie chart compares the percentages of homozygous CNV (hom CNV) versus compound heterozygous CNV (comp het CNV) within the cases with pure CNV contribution. Blue, percentage of cases with pure contribution from AR-CNVs; maroon, percentage of cases with contributions from both AR-CNV and SNV/indel; brown, percentage of cases with homozygous AR-CNVs; gray, percentage of cases with compound heterozygous AR-CNVs.  (c) Genes are variably affected by AR-CNVs. The genes recurrently affected by AR-CNVs are presented along the x-axis. 
The last bar (brown) represents the genes (N = 49) that are affected by AR-CNV once in our cohort. The height of each bar represents the total number of cases with contribution from AR-CNVs of the corresponding gene on the x-axis. The colored portions of each bar represent different forms of AR-CNV contribution. The right panel details the different forms of AR-CNV contribution for the uniquely affected genes. (d) Most homozygous AR-CNVs were embedded in a region exhibiting absence of heterozygosity (AOH). The graph compares the AR-CNVs in a homozygous state versus those in a compound heterozygous state. (e) Comparison of heterozygous or homozygous CNV detection by exome sequencing (ES), companion single-nucleotide polymorphism (cSNP) array and chromosomal microarray (CMA). The bar graph shows percentage (y-axis) of detected versus nondetected CNVs by each method (x-axis). Heterozygous and homozygous CNVs are shown in separate bars for each method. Light gray, percentage of CNV not detected; dark gray, percentage of CNV detected. The number of CNVs not detected versus detected by each method is listed under the bar graph. SNV single-nucleotide variant.

Seventeen cases had AR-CNVs in trans with SNV/indels (Fig. 3a, b). These include 11 cases with overlapping deletion and SNV/indels, 5 cases with nonoverlapping deletion and SNV/indels, and 1 case with nonoverlapping tandem duplication and SNV/indel alleles. Parental studies were performed to determine the phase of variants except for the WDR19 variants.

Genes were variably affected by AR-CNVs

A total of 57 AR loci were affected by AR-CNVs in the form of homozygous CNV, compound heterozygous CNVs, and compound heterozygous “CNV + SNV/indel” biallelic genotypes in our cohort. TANGO2 was the gene most frequently affected by AR-CNVs (N = 9). Homozygous deletions affecting TANGO2 were identified in six cases, along with compound heterozygous deletions in one case and compound heterozygous “deletion + SNV/indel” in two cases (Fig. 3a, c). Other recurrently affected genes include VPS13B (N = 6), TBCK (N = 5), HBA1/HBA2 locus (N = 4), NPHP1 (N = 4), WWOX (N = 4), STRC (N = 3), and NDE1 (N = 2). Similar to TANGO2, more biallelic CNVs were detected than “CNV + SNV/indel” for recurrently affected genes, except for NDE1 (Fig. 3c). Forty-nine other genes were nonrecurrently affected by AR-CNVs: 33 genes had homozygous CNVs, 4 genes had compound heterozygous CNVs, and 12 genes had “CNV + SNV/indel” genotypes (Fig. 3c).

Homozygous AR-CNVs were associated with AOH

Homozygous pathogenic variants may be embedded in AOH regions formed by haplotype blocks transmitted in a specific population, a result of identity by descent (IBD).18 AOH has also been used to guide the identification of variants or the discovery of new disease genes.8,19,20,21,22 Recurrent AR-CNV alleles tend to be less frequently detected than SNV/indel alleles due to the limited resolution of genomic assays. We identified homozygous AR-CNVs in 63 patients, 61 of whom had homozygous AR-CNVs affecting a single gene. The remaining two cases had homozygous AR-CNVs concurrently affecting two disease-associated genes, resulting in dual molecular diagnoses. Fifty-eight cases with homozygous AR-CNVs had homozygosity mapping data available. Among those, 40 cases (69.0%, 39 deletions and 1 duplication) had homozygous AR-CNVs embedded within an AOH interval of variable size (minimum <1 Mb, maximum 46 Mb) (Fig. 3d). This is significantly different from the 22 cases harboring compound heterozygous biallelic CNVs or CNV + SNV/indel, none of which were identified in an AOH region (Chi-square p < 0.0001) (Fig. 3d).
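The comparison above can be reproduced from the reported counts as a 2 × 2 table: 40 of 58 homozygous cases inside an AOH region versus 0 of 22 compound heterozygous cases. The sketch below is a minimal standard-library implementation of Pearson's chi-square test with one degree of freedom. Whether the original analysis applied a continuity correction is not stated, so the `yates` option is an assumption offered for comparison.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: homozygous AR-CNV cases, compound heterozygous cases.
# Columns: inside an AOH region, outside an AOH region.
statistic, p_value = chi_square_2x2(40, 18, 0, 22)  # statistic is about 30.3
```

With or without the continuity correction, the p value falls well below 0.0001, in agreement with the reported result.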

The sum of AOH regions identified in the 40 cases ranged from a single genomic region of 7 Mb to multiple genomic regions summing to 367 Mb in each personal genome (Supplemental Table 1). Three cases had a single stretch of AOH with sizes <10 Mb encompassing homozygous AR-CNVs of TBCK, ACBD5, and SPTA1, respectively. Although consanguinity was not indicated for these cases, the embedding of the genetic defects in an AOH region suggested IBD or a de novo UPD event. Multiple AOH regions totaling 50 Mb and above were identified in 29 cases, indicating consanguinity between second cousins or closer relatives.
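The 50-Mb threshold can be related to parental relatedness with a short calculation. The expected inbreeding coefficient for offspring of nth cousins is F = (1/4)^(n+1), and multiplying F by the autosomal genome length gives the expected total AOH. The sketch below assumes an autosomal length of roughly 2,881 Mb, a value not given in the text, and array-detected AOH only approximates true autozygosity, so the figures are rough guides rather than cut-offs.

```python
AUTOSOME_MB = 2881.0  # assumed approximate autosomal genome length in Mb


def expected_aoh_mb(cousin_degree, autosome_mb=AUTOSOME_MB):
    """Expected total AOH (Mb) in offspring of first (1), second (2), ... cousins."""
    inbreeding_coefficient = 0.25 ** (cousin_degree + 1)
    return inbreeding_coefficient * autosome_mb


def observed_inbreeding_coefficient(total_aoh_mb, autosome_mb=AUTOSOME_MB):
    """Rough inbreeding coefficient implied by the summed AOH length."""
    return total_aoh_mb / autosome_mb


second_cousins = expected_aoh_mb(2)                    # about 45 Mb
first_cousins = expected_aoh_mb(1)                     # about 180 Mb
largest_case = observed_inbreeding_coefficient(367.0)  # about 0.13
```

Under these assumptions, about 45 Mb of AOH is expected for offspring of second cousins, which is in line with the 50-Mb threshold used above.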

Homozygous recurrent CNVs were identified in unrelated cases. These recurrent homozygous CNVs included TANGO2 exons 3–9 deletion in five cases, VPS13B exons 18–23 deletion in two cases, TBCK exon 23 deletion in four cases, and WWOX exon 7 deletion in two cases. These recurrent AR-CNVs, lacking extensive homologous sequences surrounding the breakpoint region, were unlikely to be events resulting from nonallelic homologous recombination (NAHR) and may instead be ancestral variants transmitted through generations.

Homozygous AR-CNVs may simultaneously provide more than one diagnosis

Five cases had genomic AR-CNVs affecting both haploinsufficient genes and AR disease genes (cases 1–8, 8–1, 21, 25, and 50), resulting in genomic disorders as well as recessive conditions (Supplemental Tables 1 and 2). Homozygous AR-CNVs affecting three or more genes were not identified among the 63 cases with homozygous AR-CNVs, supporting the expectation that multigene CNVs are more likely to reduce reproductive fitness. We identified two homozygous deletions concurrently affecting two genes, potentially providing dual molecular diagnoses for these two cases (Table 1). The first case (case 44) had a 77 Kb homozygous deletion encompassing the entire SLC3A1 gene (OMIM* 104614), causing cystinuria (OMIM 220100) in either an AD or AR manner, and exons 5–15 of PREPL (OMIM* 609557), causing congenital myasthenic syndrome 22 (OMIM 616224) in an AR manner. The parents were heterozygous for this deletion. SLC3A1 and PREPL map on the human genome in close physical proximity with the last exons overlapping, increasing the chance for a deletion to affect both genes. In fact, deletions with variable genomic span encompassing both genes were reported in individuals with hypotonia–cystinuria syndrome.23 Close gene proximity is associated with a higher frequency of contiguous gene deletion. In the Database of Genomic Variants (DGV), 11/48,531 individuals were found to carry nonrecurrent deletions involving both genes. Consistently, in the BG databases where ~70,000 cases were tested by CMA, 6 unrelated cases were reported as carriers for such deletions.

Table 1 Deletions affecting consecutive disease-associated genes.

The second case24 (case 52) had an 11 Kb homozygous deletion encompassing exons 1–9 of TNNT1 (OMIM* 191041), causing Amish type nemaline myopathy 5 (OMIM 605355) in an AR manner, and exon 8 of TNNI3 (OMIM* 191044), causing cardiomyopathies (OMIM 115210, 611880, 613286, 613690) in either an autosomal dominant (AD) or AR manner (Table 1). Although TNNT1 and TNNI3 are separated by only ~2.5 Kb, deletions involving both genes were rare in the general population. Such deletions were not observed in either DGV or the BG internal database. Interestingly, this case had multiple AOH regions totaling 18 Mb, and this deletion was embedded in a 3 Mb AOH stretch, suggesting IBD. The parental genotypes were unknown.

Other multiple molecular diagnoses involving AR-CNV

Among the 87 cases with molecular diagnoses attributed to AR-CNV, 17 (19.5%) cases had more than one molecular diagnosis (Supplemental Table 2). These included cases with molecular diagnoses involving additional loci in the mitochondrial and/or nuclear genomes. More importantly, seven cases had AR-CNV-related multiple molecular diagnoses. In addition to the two cases described above with homozygous deletion affecting two adjacent disease-associated genes, five large deletions caused haploinsufficiency of dosage-sensitive genes and at the same time unmasked a recessive pathogenic variant allele, resulting in possible dual molecular diagnoses (Table 2). These deletions included a 5.2-Mb deletion of 3p26.3p26.1 causing 3p minus syndrome (OMIM 613792), a 22.0-Mb deletion of 5p15.33p14.3 causing cri-du-chat syndrome (OMIM 123450), a 0.77-Mb deletion of 16p13.11 causing 16p13.11 deletion syndrome,25 a 1.3-Mb deletion of 17p12 causing hereditary neuropathy with liability to pressure palsies (OMIM 162500), and a 2.5-Mb deletion of 22q11.21 causing DiGeorge/velocardiofacial syndrome (OMIM 188400). These deletions unmasked variants including three SNV/indels (SUMF1, NDE1, and COX10) and two AR-CNVs (DNAH5 and TANGO2) at the corresponding loci (Table 2).

Table 2 Contiguous gene deletions unmasked a recessive disease locus.


We focused on the CNV events including deletions and intragenic tandem duplications that are predicted to cause reduced dosage or functional defects of genes associated with AR disorders in a clinical cohort assembled from cases with CMA and/or ES analyses. We identified 87 cases with molecular diagnoses of AR conditions involving CNVs, emphasizing the important contribution of CNVs to disease etiologies of AR diseases. Our data demonstrate the clinical utility of integrated CNV and SNV/indel analyses for a more comprehensive molecular diagnostic evaluation.

This study suggested that AR-CNVs may be underrecognized for AR conditions. Nine loci (TANGO2, VPS13B, TBCK, HBA1/HBA2, NPHP1, WWOX, STRC, and NDE1) were affected by AR-CNVs in more than one case in our cohort (Fig. 3c). CNV alleles of these loci outnumbered SNV/indels, indicating a remarkable contribution of CNVs to disorders associated with these loci. However, this observation may be biased by our cohort being more likely to contain cases with heterogeneous and clinically unrecognizable phenotypes and by the diagnostic techniques. In fact, the deletion alleles of NPHP1 and HBA1/HBA2 are well-known major pathogenic alleles for the NPHP1-associated ciliopathies and α-thalassemia, respectively, and the high carrier frequencies of these deletions in the general population have demanded extensive carrier screening.6,9,26,27,28 The recurrent NPHP1 deletion is mediated by NAHR and no ethnic specificity is reported,27 while deletions involving HBA1/HBA2 are known to include multiple types highly specific to ethnicities.29 Defects of VPS13B cause Cohen syndrome (OMIM 216550) in an AR manner. Numerous SNV/indels or gene-disrupting CNVs resulting in VPS13B defects have been reported in individuals with Cohen syndrome, among which deletions or duplications of VPS13B were observed at high frequency.30 TANGO2, TBCK, WWOX, and NDE1 are recently identified AR disease genes that were associated with disorders with extensive phenotypic heterogeneity. Although a limited number of disease-causing alleles were identified for these genes, CNV alleles appeared to constitute a relatively large fraction of the mutant alleles. The TANGO2 exons 3–9 deletion was recurrently observed in individuals of Hispanic/European descent, while the exons 4–6 deletion allele has been confined to the Arab population to date.16 Another recurrently identified deletion in our cohort is the TBCK exon 23 deletion.
Among the five cases with ethnicity information, three were European Caucasian, one was Middle Eastern, and one was South Asian (Supplemental Table 1). Such diversity may represent a genetic drift event after the origin of the variant allele in the ancestral population, or de novo events in diverse populations due to locus-specific genomic instability. Our data are limited by the heterogeneous nature of the clinical cohort; thus, a comprehensive analysis of the correlation between the CNV alleles and ethnicity was not performed. The gnomAD and BG internal CMA databases contained heterozygous AR-CNVs in the above loci (Supplemental Tables 3 and 4). Further analysis is warranted, especially for genes with recurrent AR-CNVs.

The contribution of AR-CNVs may be underrepresented in our cohort due to the technical limitations of CNV detection methods. SNP array, CMA, and ES were the three major assays used for CNV detection in this study. The number of probes on a SNP array can range from hundreds of thousands to millions, which potentially provides high-resolution CNV detection with a larger number of SNP probes. However, the design of probe coverage depends on the SNP distribution, so the coverage cannot be customized. CMA at BG used the array comparative genomic hybridization (aCGH) platform, allowing customized probe design at regions of interest. This provides ultrahigh resolution at targeted regions and enables CNV detection at the exonic level, which is less achievable by SNP arrays.22 Numerous computational algorithms have been developed to improve CNV detection using capture-based ES data,31 which remains challenging for clinical usage due to suboptimal sensitivity and specificity and requires extensive confirmation by secondary methods such as aCGH or multiplex ligation-dependent probe amplification (MLPA).3 The sensitivity of ES in clinically relevant CNV detection has been demonstrated to be higher with larger (e.g., Mb sized) events.32,33 Homozygous or hemizygous deletions can be readily detected by ES.8 Interassay comparison among ES, CMA, and cSNP array used in this study shows that ES has a higher sensitivity for detecting homozygous CNVs than CMA (Fig. 3e). The missed CNV detection by CMA may be caused by lack of probe coverage due to novel disease loci or unavailability of appropriate probes in that region. cSNP array has the lowest sensitivity for homozygous CNVs, because many such CNVs in our cohort are small, exonic CNVs, which are beyond the resolution of the cSNP array. For heterozygous CNVs, CMA apparently has the highest sensitivity because of exon-by-exon coverage of the CMA design, which allows detection of ultrasmall CNVs.
Intra-assay comparison of three assays suggests that ES has a higher sensitivity for homozygous events than heterozygous ones, while cSNP array and CMA have comparable CNV detection capability for either heterozygous or homozygous events (Fig. 3e). cSNP array has overall low sensitivity for both heterozygous and homozygous events, which may be caused by the lack of SNP probe coverage for certain regions, especially small ones, of the genome.

Small heterozygous duplications/deletions involving few or partial exons remain challenging for ES. For scenarios where a deletion overlaps another deletion or SNV/indel in trans, the resulting call of a homozygous deletion or SNV/indel may trigger an alert for an overlapping deletion event. However, for nonoverlapping CNVs or SNV/indels, the CNV events may potentially be missed if they are below the resolution of detection, and therefore underrecognized due to assay limitation. The vast majority of AR-CNVs detected in our study are deletions. Only two duplications were detected in our cohort, consistent with the technical challenges of detecting small duplication events.
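The alert described above can be sketched as a simple filter. An SNV/indel that is called homozygous but does not lie within an AOH region is a candidate for hemizygosity, that is, for a deletion on the other allele. Using AOH as the discriminator is an assumption motivated by the AOH findings in this cohort rather than a documented rule, and the function name, field names, and coordinates below are hypothetical.

```python
def flag_possible_hemizygous(variants, aoh_regions):
    """Return homozygous calls that fall outside every AOH region.

    variants: list of dicts with keys "gene", "chrom", "pos", "zygosity".
    aoh_regions: list of (chrom, start, end) tuples with inclusive coordinates.
    """
    flagged = []
    for variant in variants:
        if variant["zygosity"] != "hom":
            continue
        inside_aoh = any(
            chrom == variant["chrom"] and start <= variant["pos"] <= end
            for chrom, start, end in aoh_regions
        )
        if not inside_aoh:
            # Apparent homozygosity without AOH: consider a deletion in trans.
            flagged.append(variant)
    return flagged


# Hypothetical calls: only GENE_B is homozygous outside an AOH region.
calls = [
    {"gene": "GENE_A", "chrom": "1", "pos": 1000, "zygosity": "hom"},
    {"gene": "GENE_B", "chrom": "2", "pos": 5000, "zygosity": "hom"},
    {"gene": "GENE_C", "chrom": "2", "pos": 9000, "zygosity": "het"},
]
aoh = [("1", 500, 2000)]
needs_cnv_review = flag_possible_hemizygous(calls, aoh)
```

A flagged variant would then be followed up with a CNV assay covering the gene or with parental testing, since the filter alone cannot confirm a deletion.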

Concurrent analyses of both SNV/indels and CNVs are needed for a more comprehensive evaluation of the genetic changes underlying the personal genome of a clinically affected individual. For example, in our cohort, one case had a COX10 SNV/indel in trans with the 17p12 deletion. The COX10 gene spans one end of the breakpoint of the 17p12 deletion, and therefore one copy of the gene is disrupted in individuals with such a deletion. The identified hemizygous p.M426_L427dup variant in exon 7 on the intact allele, in combination with the deletion, resulted in COX10 deficiency. Notably, more than 20 years ago, COX10 deficiency had been predicted for individuals with a more complicated clinical phenotype involving mitochondrial myopathy in addition to a neuropathy.34

CMA serves as the first-tier genome-wide assay for individuals with neurodevelopmental disorders, with a 10–20% diagnostic yield.5 ES can effectively provide potential molecular diagnoses for 25% or more of individuals affected with rare genetic disorders.13,14,35 Recent publications have reported an ~2% increase in diagnostic yield when CNV detection is implemented in ES, demonstrating the advantage of integrating both CNV and SNV/indel analyses.17,32,36 In our cohort, homozygous deletions were mostly detectable by exon-focused CMA or ES read-depth analysis. For recessive disorders involving both CNVs and SNV/indels, ES is needed to provide the SNV/indel findings that corroborate the CNV findings. However, ES, CMA, and SNP arrays are not routinely used to detect balanced structural variants, such as inversions and balanced translocations. Genome sequencing (GS) provides CNV detection capacity comparable with CMA,2 as well as potential detection of balanced structural variants that are not readily detectable by CMA or ES,37 offering a unique opportunity to interrogate both SNV/indel and CNV/SV in one assay.

Molecular diagnoses involving two or more disease loci were reported in 4.9% of cases positive by ES.38 In this study, we identified two or more molecular diagnoses in 19.5% (17/87) of cases (Supplemental Table 2). This percentage is significantly higher than the rate previously observed in 2076 cases with positive molecular findings,38 a population without a preselection for CNV contributions (Fisher’s exact test p < 0.0001). This high rate is not unexpected when compared with the estimated multilocus diagnoses under a Poisson model (14.0%) or independence model (26.4%).38 The high multilocus diagnosis rate in our cohort may be largely attributed to the CNV contributions. In the 17 cases identified with multiple molecular diagnoses, 11.8% (2/17) were attributed to homozygous AR-CNVs affecting two disease-associated loci, and 29.4% (5/17) were attributed to large genomic deletions that unmasked a recessive disease allele. Therefore, multiple molecular diagnoses may be related to a genomic disorder. The recessive conditions unmasked by a genomic deletion may contribute to a more complicated clinical presentation. Note that some genomic deletions, such as the deletions of 16p13.11 and 17p12, have reported incomplete penetrance or an age-dependent disease manifestation. These deletions may be present in asymptomatic individuals and run through generations, further complicating genetic counseling and reproductive risk assessment. If we exclude the cases with multilocus molecular diagnoses due to genomic deletions, the multilocus diagnosis rate (11.5%, 10/87) is still significantly higher than the previous observation of ~5% (Fisher’s exact test p = 0.0119). Although the majority of multiple molecular diagnoses are attributed to SNV/indels, attention needs to be paid to CNV contribution, especially for recessive disorders.
Nonetheless, when an individual with a related phenotype is found to carry a single pathogenic variant in an AR gene, examining that gene for a CNV on the other allele is always recommended.
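The two Fisher's exact tests in this paragraph can be approximated from the reported figures with a short standard-library sketch. The comparison count of 101 multilocus cases among 2076 is an assumption derived from the reported 4.9% and is not stated in the text, so the resulting p values may differ slightly from the published ones. The two-sided p value is computed as the sum of all table probabilities no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = probability(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        p for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * tolerance
    )
    return min(1.0, p_value)


ASSUMED_PRIOR_MULTILOCUS = 101  # assumption: about 4.9% of 2076 cases
prior_single = 2076 - ASSUMED_PRIOR_MULTILOCUS

# All multilocus cases in this cohort (17 of 87) versus the prior cohort.
p_all = fisher_exact_two_sided(17, 87 - 17, ASSUMED_PRIOR_MULTILOCUS, prior_single)
# Excluding the genomic-deletion cases, as stated above (10 of 87).
p_excluding = fisher_exact_two_sided(10, 87 - 10, ASSUMED_PRIOR_MULTILOCUS, prior_single)
```

Exact integer binomial coefficients keep the hypergeometric probabilities accurate even with a comparison cohort of this size.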

Currently, CMA and ES are the two most frequently used approaches for genome-wide detection of genetic variants. Combined ES and CMA would provide more informative molecular diagnosis although the combined cost is high. Alternatively, sequential testing can be used. For cases with prior CMA results, ES should be considered for those with negative results, and for those with positive CMA results yet not fully explaining clinical phenotypes based on clinical correlation. For cases with one heterozygous finding in a gene highly specific to clinical phenotypes by ES, additional CMA or targeted CNV analysis should be considered.
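The sequential testing strategy above can be summarized as a small decision rule. The sketch below only restates the three conditions given in the text; the function name, argument names, and returned strings are hypothetical, and it is an illustration of the logic rather than a clinical decision tool.

```python
def suggest_follow_up(prior_test, positive, explains_phenotype=False,
                      single_het_in_specific_ar_gene=False):
    """Suggest a follow-up assay based on the prior result, or None if none applies.

    prior_test: "CMA" or "ES".
    positive: whether the prior test reported a relevant finding.
    explains_phenotype: whether a positive finding fully explains the phenotype.
    single_het_in_specific_ar_gene: ES found one heterozygous variant in an AR
        gene that is highly specific to the clinical phenotype.
    """
    if prior_test == "CMA":
        if not positive or not explains_phenotype:
            return "ES"
        return None
    if prior_test == "ES":
        if single_het_in_specific_ar_gene:
            return "CMA or targeted CNV analysis"
        return None
    raise ValueError("prior_test must be 'CMA' or 'ES'")


# A positive CMA result that leaves part of the phenotype unexplained.
follow_up = suggest_follow_up("CMA", positive=True, explains_phenotype=False)
```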

In summary, we described different ways by which CNVs may contribute to AR disorders. Similar to SNV/indels, AR-CNVs may arise ancestrally and be transmitted through generations. Homozygous AR-CNVs may result from IBD. AR-CNVs may have higher carrier frequencies in the general population for specific AR diseases, which supports expanding carrier screening to cover both SNV/indels and CNVs. Since AR-CNVs may contribute to multiple molecular diagnoses via concurrent impact on contiguous disease genes, or by unmasking of recessive disease alleles, a comprehensive genomic evaluation, such as combined CMA and ES analyses or perhaps GS, should be considered for individuals with complex or atypical phenotypes.