INTRODUCTION

The use of genomic microarrays in the diagnostic laboratory setting has enhanced the ability to detect copy-number variation (CNV) that underlies the pathogenicity of human disease. Genomic microarrays for copy-number assessment, both array-based comparative genomic hybridization (aCGH) and single-nucleotide polymorphism (SNP)–based microarrays, are collectively described as “chromosomal microarrays” (CMA). These assays have evolved over time, from low-density targeted bacterial artificial chromosome (BAC) arrays to the current high-density copy-number plus SNP arrays that detect not only CNVs but also absence of heterozygosity.1,2 The appreciation for the diagnostic utility of CMA is reflected in numerous consensus statements establishing these assays as first-tier tests for many clinical indications.3,4 Additionally, reporting of CNVs from genome-wide sequencing is increasingly becoming a standard approach, with performance for CNV detection near or exceeding that of CMA.5 An incidental consequence of such comprehensive genomic screens is the detection of variants of uncertain clinical significance. The goal of clinical interpretation and reporting strategies for discovered variants is to maximize return of useful diagnostic findings, while minimizing the return of uncertain variants, especially those that are likely benign.

When first introduced, professional guidelines for CMA required that positive findings be orthogonally confirmed, usually by fluorescence in situ hybridization (FISH);6 however, current guidance no longer recommends this routine practice.7 Given that most FISH assays have a lower limit of detection of ~200 kb for deletions and ~500 kb for duplications, reporting thresholds based on these, or similar, sizes were adopted by the majority of clinical laboratories for CMA findings, and have persisted for more than a decade.

Interpretation of the clinical significance of CNVs is complex and thoroughly addressed by recently updated American College of Medical Genetics and Genomics (ACMG)/ClinGen guidance.8 Interpretive considerations for nonrecurrent duplications can be broadly grouped into three categories: (1) evaluation of fully encompassed genes for potential triplosensitive phenotypes; (2) evaluation of genes mapping to duplication breakpoint(s), for potential haploinsufficient phenotypes; and (3) consideration of rarer pathogenic mechanisms such as gene disruption at other loci due to insertional translocations9,10 or positional effects on gene regulation.11 Considering the first category, triplosensitivity is an uncommon cause of abnormal phenotypes.12 Therefore, it is not surprising that even moderately sized (500 kb–1 Mb) duplications are frequently observed in the general population without observed clinical consequence.13 The second category is less straightforward. Intragenic duplications with both breakpoints within the same gene are generally expected to disrupt coding sequence and result in loss-of-function variants. However, as the majority of interstitial duplications have a direct tandem orientation, partial gene duplications with only a single breakpoint in the gene of interest most often occur without associated gene disruption.14 Therefore, small duplications (<500 kb) involving only a single breakpoint in dosage-sensitive genes are almost always inherited and inferred to have an intact copy of the gene(s) at the breakpoint; data supporting this observation are presented below. Considerations involving the third category are more complex, and often outside the available scope of a routine clinical study. However, technology and the knowledgebase continue to advance, allowing for increasing diagnostic clarity in such complex cases.

After more than 15 years of experience interpreting, reporting, characterizing, and performing parental/familial follow-up testing for duplication CNVs, our laboratory has appreciated limited clinical impact for duplications less than 1–2 Mb in size initially assessed to have uncertain clinical significance. We designed a retrospective analysis of our experience to provide evidence for this assumption and to support a clinically appropriate refinement of our laboratory reporting criteria.

MATERIALS AND METHODS

A retrospective analysis of all clinical postnatal chromosomal microarray cases reported from 2006 to 2016 was performed to identify all cases with reported duplication CNVs (n = 4059). During this time period, using standard interpretive approaches,15,16 all pathogenic or likely pathogenic duplications were reported, regardless of size. Duplications with breakpoints in genes with known or strongly suspected haploinsufficiency phenotypes were also reported, regardless of size. Other duplications of uncertain clinical significance were only reported when exceeding 500 kb in size. Duplications of any size without protein-encoding genes or known functional elements were not reported, nor were events representing common benign polymorphism; therefore, this series does not include such duplications. Of the 4059 cases with reported duplications, 1624 cases were analyzed using the CytoScan HD microarray (Thermo Fisher Scientific, Waltham, MA) and 2435 cases were analyzed using a custom 44K or 180K Agilent microarray (Agilent, Santa Clara, CA). Pathogenic and likely pathogenic intragenic duplications in known haploinsufficient genes with both breakpoints within the gene (high likelihood of pathogenic loss-of-function variants) were excluded from further analysis. Duplications mapping to the sex chromosomes were excluded from further analysis given the frequency of maternal inheritance and the complexity in deciphering pathogenicity and clinical significance. We further restricted our focus to those cases with available parental inheritance information: either both parents tested, or a single parent tested with documented inheritance. These cases were binned by size (<500 kb, 500 kb–1 Mb, 1 Mb–2 Mb, >2 Mb) to roughly determine the proportion of de novo (and thus more likely pathogenic) duplications in each size range. Statistical analysis was performed using Fisher's exact test; a two-sided P value <0.05 was considered statistically significant.
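The size binning and per-bin de novo proportion described above can be sketched as follows. This is a minimal illustration, not the laboratory's actual pipeline: the function names, the `(size_bp, inheritance)` record layout, and the example cases are hypothetical, and the handling of sizes falling exactly on a bin boundary is an assumption because the text does not specify it.

```python
from collections import Counter

KB = 1_000
MB = 1_000_000


def size_bin(size_bp: int) -> str:
    """Assign a duplication to one of the four size bins named in the text.

    Assumption: each bin includes its lower bound and excludes its upper
    bound (so exactly 2 Mb falls in the largest bin); the source does not
    say how boundary sizes were handled.
    """
    if size_bp < 500 * KB:
        return "<500 kb"
    if size_bp < 1 * MB:
        return "500 kb-1 Mb"
    if size_bp < 2 * MB:
        return "1 Mb-2 Mb"
    return ">2 Mb"


def de_novo_fraction_by_bin(cases):
    """Return {bin label: fraction of cases in that bin that are de novo}.

    `cases` is an iterable of (size_bp, inheritance) pairs, where
    inheritance is "de novo" or "inherited" (hypothetical encoding).
    """
    totals = Counter()
    de_novo = Counter()
    for size_bp, inheritance in cases:
        label = size_bin(size_bp)
        totals[label] += 1
        if inheritance == "de novo":
            de_novo[label] += 1
    return {label: de_novo[label] / totals[label] for label in totals}


# Hypothetical example records, not data from the study.
example_cases = [
    (300 * KB, "inherited"),
    (750 * KB, "de novo"),
    (800 * KB, "inherited"),
    (1_500 * KB, "de novo"),
    (3 * MB, "inherited"),
]
fractions = de_novo_fraction_by_bin(example_cases)
```

Keeping the bin assignment in its own function makes the boundary convention explicit and easy to change if a laboratory uses a different one.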

RESULTS

Of the 4059 total cases reported with duplications, there were a total of 708 cases with duplication CNVs less than 500 kb (reporting in this size range was already restricted to known or suspected pathogenic findings; this count therefore underrepresents the duplications observed in this size range), 1503 cases with duplication CNVs in the size range of 500 kb to 1 Mb, 689 cases with duplication CNVs in the size range of 1 Mb to 2 Mb, and 1159 cases with duplication CNVs greater than 2 Mb (Table 1). Of these total reported cases, 1112 met our inclusion criteria defined in “Materials and Methods”: 197 cases with duplication CNVs less than 500 kb, 515 cases with duplication CNVs in the size range of 500 kb to 1 Mb, 235 cases with duplication CNVs in the size range of 1 Mb to 2 Mb, and 165 cases with duplication CNVs greater than 2 Mb (Table 1). Using inheritance pattern as a rough proxy for likelihood of pathogenicity, we demonstrated a de novo origin in 4% (8/197) of selected duplications less than 500 kb, 6% (30/515) of duplications in the size range of 500 kb to 1 Mb, 14% (32/235) of duplications in the size range of 1 Mb to 2 Mb, and 28% (46/165) of duplications greater than 2 Mb. We observed a statistically significant increase in the occurrence of de novo duplications between those binned at 500 kb–1 Mb and those binned at 1 Mb–2 Mb (P = 0.0005) (Fig. 1a). Of note, there is an estimated background rate of de novo CNVs in control populations (estimated at ~1%)17 that could be represented in the data set, as well as a rate of misattributed paternity (estimated at ~1%), which is known to result in the false assessment of a small proportion of de novo cases;18 however, this should apply to a similar proportion of cases in each size category.
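The comparison between the 500 kb–1 Mb and 1 Mb–2 Mb bins can be sketched with a two-sided Fisher's exact test applied to the counts given above (30 de novo of 515 versus 32 de novo of 235). The implementation below is an illustrative standard-library version that sums the hypergeometric probabilities of all tables no more probable than the observed one; the function name is hypothetical, and because the software used by the authors is not stated, the sketch is not guaranteed to reproduce the published P = 0.0005 to the last digit.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The P value is the total probability of every table
    that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / n_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Rows: size bins; columns: de novo, inherited (counts from the Results).
p_value = fisher_exact_two_sided(30, 515 - 30, 32, 235 - 32)
print(f"two-sided P = {p_value:.4g}")
```

Where SciPy is available, `scipy.stats.fisher_exact` offers the same test; the standard-library version is used here only to keep the sketch dependency-free.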

Table 1 Cohort of duplication cases included in the study binned by size.
Fig. 1: Size distribution of de novo duplications and RefSeq gene content assessment. (a) A statistically significant increase in the occurrence of de novo duplications at the size cutoff of 1 Mb. (b) Boxplot distribution of RefSeq gene content of de novo and inherited duplications binned by size.

As discussed previously, duplications less than 500 kb are reported at the Mayo Clinic when (1) one or more genes fully contained within the interval has documented triplosensitivity; (2) they are fully intragenic (both breakpoints within the gene), with potential/expected loss of function of a haploinsufficient gene; or (3) there is a concern for potential gene disruption of a haploinsufficient gene harboring a single duplication breakpoint. Pathogenic or likely pathogenic small duplications meeting criteria 1 or 2 were removed from this analysis, as these are always reported in our laboratory. A total of 197 cases were reported with concern for disruption of a haploinsufficient gene (due to a single duplication breakpoint within that gene) and had parental follow-up (Table 1). More than 95% of these duplications were inherited, suggesting that the vast majority of these events are not pathogenic. This is consistent with the demonstration by Newman et al. that the majority of such duplications are direct tandem events that are not disruptive to genes at the breakpoint.14

To further assess the utility and clinical impact of parental follow-up for small duplications, we retrospectively reviewed the 30 duplications 500 kb to 1 Mb in size found to be de novo after reporting (see Supplemental Table 1). Eleven of the 30 duplications (37%) would continue to meet reporting criteria in our laboratory, regardless of size, due to documented or high likelihood of pathogenicity or known recurrent regions that have documented reduced penetrance or variable expressivity. The remaining 19 duplications were classified as uncertain at the time of original reporting, and remain classified as uncertain today under the current knowledgebase and ACMG/ClinGen guidelines, despite their de novo origin.

Lastly, in an attempt to further refine the reporting thresholds, we used our data set to compare the RefSeq gene content of de novo and inherited duplications within each size bin. We identified a median number of 5 genes for inherited duplications and a median number of 11 genes for de novo duplications in the size range of 500 kb to 1 Mb, a median number of 14 genes for inherited duplications and a median number of 32 genes for de novo duplications in the size range of 1 Mb to 2 Mb, and a median number of 52 genes for inherited duplications and a median number of 64 genes for de novo duplications greater than 2 Mb. The variability around the median, however, was pronounced (Fig. 1b).

DISCUSSION

Here we report the retrospective assessment of a large series of duplication CNVs reported in a diagnostic laboratory setting in an attempt to evaluate and refine reporting criteria. Our data suggest that, apart from fully intragenic duplications expected to result in loss-of-function variants in haploinsufficient genes and those fully encompassing known triplosensitive genes, duplication CNVs less than 1 Mb are very unlikely to be clinically significant in the absence of additional evidence of pathogenicity. Approximately 95% of duplications less than 1 Mb were shown to be inherited from a carrier parent, suggesting that the majority of these reported duplications are likely benign familial variants. Our retrospective assessment of the small number of de novo duplications in this size range supports our assertion of minimal clinical utility of reporting duplications <1 Mb of uncertain clinical significance, as those that were classified as uncertain remained so even after the finding of de novo inheritance and maturation of our collective knowledgebase (see Supplemental Table 1). There are caveats to this assumption that are important to acknowledge. For example, a small proportion of the remaining 5% of cases with de novo duplications 500 kb–1 Mb in size actually represent inherited variants with misattributed paternity (estimated at ~1%),18 and a small proportion of de novo cases will simply represent spurious new variants unrelated to the proband’s phenotype of interest.17 Additionally, incomplete penetrance or variable expressivity of abnormal phenotypes complicates the assumption that CNVs inherited from unaffected parents are not clinically significant. Therefore, we are making the assumption of pathogenicity in aggregate across the large data set, with the understanding that a small proportion of inherited cases do have relevant clinical significance and a small proportion of de novo cases do not have relevant clinical significance.

Our assessment of the number of genes in inherited versus de novo duplications supports the general assertion that duplications containing larger numbers of protein-encoding genes are more likely to have clinical significance; however, the pronounced variability around the median in these data did not allow for further confident refinement of reporting thresholds beyond those established on the genomic size of the duplication. Regardless, the number of genes in a CNV interval is now included in the current pathogenicity assessment rubric per ACMG/ClinGen guidance;8 therefore, the total gene count will influence reporting decisions.

In conclusion, this retrospective analysis supports a more clinically appropriate reporting threshold of ≥1 Mb for duplications of uncertain clinical significance. This revised reporting threshold is applicable to duplication CNVs detected by any methodology, including CMA and exome/genome sequencing. Importantly, any duplication <1 Mb will continue to be reviewed and will be reported in our laboratory when there is known or highly suspected clinical significance. This threshold change is estimated to impact approximately 5% of CMA cases, which are currently reported as uncertain findings and, after family studies, presumed unlikely to be clinically significant. This reporting policy change will result in significant savings of health-care costs and eliminate the uncertainty and burden of further testing for the families involved. As variants not meeting reporting criteria are maintained in laboratory databases, they will continue to contribute to the growth of our collective knowledgebase such that when the pathogenicity of smaller CNVs is clearer, they can be reported with clinical confidence.
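The revised reporting rule summarized above can be expressed as a short decision function. This is a simplified sketch rather than the laboratory's implementation: the function name and the classification labels are hypothetical, the classification is assumed to have been made beforehand (for example, under ACMG/ClinGen guidance), and treating benign or likely benign duplications as not reported is an assumption based on the statement in the Methods that common benign polymorphisms were not reported.

```python
MB = 1_000_000

# Size threshold proposed in the text for duplications of uncertain significance.
UNCERTAIN_DUPLICATION_THRESHOLD_BP = 1 * MB


def report_duplication(size_bp: int, classification: str) -> bool:
    """Decide whether a duplication CNV would be reported under the revised rule.

    `classification` is one of the hypothetical labels "pathogenic",
    "likely pathogenic", "uncertain", "likely benign", or "benign".
    """
    if classification in ("pathogenic", "likely pathogenic"):
        # Known or highly suspected clinical significance: report at any size.
        return True
    if classification == "uncertain":
        # Uncertain significance: report only at or above 1 Mb.
        return size_bp >= UNCERTAIN_DUPLICATION_THRESHOLD_BP
    # Assumption: benign and likely benign duplications are not reported.
    return False
```

Under this sketch, an 800 kb duplication of uncertain significance would not be reported, whereas the same duplication classified as likely pathogenic would be.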