Copy number variants in clinical WGS: deployment and interpretation for rare and undiagnosed disease

Purpose Current diagnostic testing for genetic disorders involves serial use of specialized assays spanning multiple technologies. In principle, whole genome sequencing (WGS) has the potential to detect all genomic mutation types on a single platform and workflow. Here we sought to evaluate copy number variant (CNV) calling as part of a clinically accredited WGS test. Methods Using a depth-based copy number caller we performed analytical validation of CNV calling on a reference panel of 17 samples, compared the sensitivity of WGS-based variants to those from a clinical microarray, and set a bound on precision using orthogonal technologies. We developed a protocol for family-based analysis, annotation, filtering, visualization of WGS based CNV calls, and deployed this across a clinical cohort of 79 rare and undiagnosed cases. Results We found that CNV calls from WGS are at least as sensitive as those from microarrays, while only creating a modest increase in the number of variants interpreted (~10 CNVs per case). We identified clinically significant CNVs in 15% of the first 79 cases analyzed. This pipeline also enabled identification of cases of uniparental disomy (UPD) and a 50% mosaic trisomy 14. Directed analysis of some CNVs enabled break-point level resolution of genomic rearrangements and phasing of de-novo CNVs. Conclusion Robust identification of CNVs by WGS is possible within a clinical testing environment, and further developments will bring improvements in resolution of smaller and more complex CNVs.


Introduction
Variation in DNA copy-number is a well-described cause of human genetic disease (Lupski, 2015). Copy-number variants (CNV) associated with human pathologies range from chromosomal aneuploidy, to micro-duplication and -deletion syndromes, and include smaller structural variants that affect single genes and exons (Harel and Lupski, 2017;Lupski, 2015;Riggs et al., 2012;Swaminathan et al., 2012;Vulto-van Silfhout et al., 2013). Karyotype and microarray have served as gold-standards in molecular diagnostics for CNVs, but the increasing number and complexity of possible genomic changes requires testing that can simultaneously address the complete range of cytogenetic abnormalities and smaller structural variants.
Whole genome sequencing (WGS) can be used to detect almost all classes of alleles. It is sensitive and specific for SNVs and indels (Kim et al., 2017;Van der Auwera et al., 2013), enables detection of complex repeat expansions (Dolzhenko et al., 2016) and proof-of-principle studies have shown the ability to detect copy-number (Abyzov et al., 2011;Roller et al., 2016) and structural variation (Sudmant et al., 2015). Approaches have been developed to enable CNV detection using other next generation sequencing (NGS) panels and exomes, which have improved diagnostic yield (Eisenberger et al., 2013;Tian et al., 2015), but have technical limitations arising from non-uniform sequencing depth, PCR artifacts, GC bias, and a larger variance in allele fraction (Lelieveld et al., 2015;Linderman et al., 2014;Meienberg et al., 2016;Meynert et al., 2014). In contrast, WGS sequencing depth is predictable and robust throughout the genome (Lionel et al., 2017;Meienberg et al., 2016), and eliminates the hazard of capture and PCR-based artifacts. This uniformity of signal enables sample-specific depth normalization that eliminates the need for batch processing (Boeva et al., 2012;Roller et al., 2016). Furthermore, coverage of the non-coding genome (including deep intronic regions) allows for increased resolution to detect small CNVs, more accurate estimation of variant boundaries, and in many cases direct evidence for the underlying DNA rearrangement via observation of paired-sequencing read alignments (Newman et al., 2015;Sudmant et al., 2015).
Here we describe the deployment of a CNV detection pipeline as a component of a clinical WGS (cWGS) diagnostic test for patients with rare or undiagnosed genetic disease (RUGD). As a single assay, WGS has the potential to benefit RUGD patients by enabling detection of multiple variant types simultaneously, decreasing the number of molecular tests performed, increasing the range of detectable disorders, and shortening the diagnostic odyssey. Below we describe the technical feasibility assessment and validation of WGS-based CNV calls compared to a microarray-based clinical diagnostic test, and the deployment of CNV calling as part of a cWGS test for RUGD.

CNV truthset generation and sensitivity assessment
Twenty reference samples (Coriell, Camden, NJ) events were chosen for validation (Table S1). Among these, 18 samples had known pathogenic CNVs representative of a large size range and inclusive of deletions and copynumber gain, and two samples were included as negative controls. Prior to sequencing and analysis, coordinates for 'truth-set' CNVs were compiled from descriptions on the Coriell website, reference publications or previously conducted microarray-based CNV analyses (Table S1, Tang et al). We note that while all cell-lines contain pathogenic CNVs which established the baseline for our sensitivity analysis, we also examined other all other CNVs detected in these samples by either microarray or WGS.
DNA samples were procured from Coriell and libraries were prepared using the Illumina TruSeq PCR-free kit and sequenced on HiSeq X with paired-end 150bp reads in the Illumina Clinical Services Laboratory (Illumina Inc, San Diego CA). Data were mapped to the hg19 reference genome with the ISAAC aligner (Raczy et al., 2013). The resulting BAM files were analyzed with the Canvas CNV caller (Roller et al., 2016, see also "Copy number variant detection and filtering"). In parallel, samples were assessed by an external clinical microarray lab (CombiMatrix Diagnostics, Irvine, CA), which included profiling on an Illumina 850k feature SNP array followed by automated CNV calling and manual curation by trained cytogeneticists. One sample failed microarray analysis, resulting in 17 positive control samples for further analysis.
To assess sensitivity, WGS and microarray call sets were compared (requiring 50% or 75% overlap) with reference calls (Table 1, Table S2, see Supplemental Note). For false-negative calls or calls with only partial overlap with the reference call, visualization of depth and microarray data was conducted to assess the accuracy of the call boundaries or identify discrepancies of WGS based boundaries with the vendor-supplied CNV annotation (Results and Supplemental Note).

Assessment of cWGS CNV calling false positive rate
To determine an upper bound on the false positive rate for the WGS Canvas CNV calling pipeline, we systematically assessed CNV calls made on Platinum Genome NA12878 (Eberle et al., 2017) using the 1000 Genomes Project NA12878 reference assembled using a combination of Pacific Biosciences (PacBio) and BioNano DNA data (Pendleton et al., 2015). A CNV call was considered validated if there was at least 75% overlap with BioNano call boundaries. Calls with partial overlap were manually curated to assess possible false positive or partially called BioNano CNVs resulting in a low overlap (see Supplemental Note). CNVs that were not called in the PacBio+BioNano dataset were manually reviewed for the presence of discordant sequencing reads spanning the boundaries of a deletion or copy-number gain (indicative of a tandem-duplication), and the presence of hemizygous and homozygous deletions with similar breakpoints in an independent set of samples from population controls (N=3000).

Clinical cohort inclusion criteria
Variant calling and interpretation of CNVs was deployed into the Illumina Clinical Services Lab (ICSL) as part of routine practice for RUGD cases finishing sequencing and primary analysis on or after June 2, 2016. Clinically relevant losses or gains greater than 10kb were reported. Since that time, 79 patients were consented for the TruGenome Undiagnosed Disease test to be performed by ICSL. These patients had a wide spectrum of phenotypes, as well as previous testing ranging from no prior molecular investigations to panels and whole exome sequencing. The age at the time of testing ranged from 1 year to 20 years.

Copy number variant detection and filtering
CNV detection and annotation were validated and deployed using paired-end 150 nucleotide HiSeqX sequencing runs, processed through the Illumina's secondary analysis pipeline for short-read alignment and variant calling.
CNVs are called from the generated BAM files using version 1.3.9 of the Canvas caller (Roller et al., 2016) under its germline WGS setting, with modifications to the default calling parameters as follows: • The circular binary segmentation (CBS) segmentation algorithm is used as opposed to the Haar wavelet based default in Canvas v1.3.9. This is specified as a parameter on the Canvas command line invocation.
• In practice we often see fragmented large CNV events. To limit this, candidate CNV calls spaced by less than 100kb are merged into a single call. When such a merge occurs, the magnitude of the gap between segments and any implications on variant interpretation are assessed during manual curation.
• Support thresholds for candidate CNVs were dropped to 8 depth bins to increase sensitivity in 10-50 kB range (a depth bin is defined as a sequence range with an expected 100 reads mapping).
• An automated b-allele based ploidy correction step was omitted, to limit false negatives. Screening for presence of heterozygous variants in a candidate deletion was deferred to the manual curation stage.
• To include common variation, the grey list of filtered regions supplied within Canvas was omitted and replaced with a minimal list of chromosomal segments covering centromeres.
• Canvas quality scores were not used as a filter for candidate CNV events.
All CNV calls are processed through a series of automated filtering steps ( Figure S1) to reduce false positives and limit downstream CNV curation ( Figure S3) to those likely to have medical relevance.

Canvas grey-list filter
Canvas provides a set of grey-list regions that contains problematic genomic segments as well as common CNVs.
In filtering, CNVs that have greater than 50% of their range spanned by grey list regions are filtered out.

Gene annotation and filtering
CNVs are annotated with overlapping or nearby (<5kb away) genes using RefSeq gene definitions. Calls with no gene annotation are filtered.
Population frequency annotation and filtering CNV population frequency is estimated using an internal database of samples sequenced in the Illumina services lab and individually normalized through the Canvas CNV calling pipeline. Binned sequencing depth data (an intermediate output of Canvas) is mapped to a fixed 300bp uniform coordinate system to allow for efficient storage and recovery of data across many samples.
Due to uncertainty in the boundaries of many CNV calls, a heuristic calculation of CNV population frequency is implemented that includes (1) interrogation of the aggregate sequencing depth data across 3,000 genomes for the genomic interval defined by the CNV boundaries; (2) mean depth analysis for each sample compared to predefined thresholds calculated from the population given the expected ploidy of the region; and (3) the fraction of the population samples consistent with the proband GAIN or LOSS status is calculated. Note that for events on a sex chromosome, only samples with the same gender from the population are queried, and that this does not account for the magnitude of the copy number change, but rather only the direction of the change from the diploid or haploid expectation.
For interpretation, CNVs with a population frequency higher than 10% (~5% allele frequency) are filtered out, and the vast majority of CNV calls in the 1-10% range are not classified as being clinically significant after review ( Figure S2).

Interrogation of clinically relevant CNVs
When a CNV is annotated as of clinical interest by the curation process, additional bioinformatic analysis may be conducted to provide further annotation in order to aid in variant interpretation.

CNV Phasing
Where possible, the parental phasing is assessed by genotyping parental haplotypes using depth information, or using inheritance patterns of small variants when no evidence of a depth change is present in a parent (e.g. denovo CNVs or duo cases where there is no evidence of a CNV in the sequenced parent).
The de-novo CNV phasing algorithm first constructs prior state probabilities given the genotypes of the parents and the known copy-number of proband. Given the prior probabilities of each transition, we compute the model likelihood for all possible inheritance assumptions, and the most likely model is selected.
For example, at a haploid (copy number 1) site where the mother is heterozygous (0/1) and the father is homozygous reference (0/0) for a given SNV:  under the assumption that the CNV is inherited from the father, the probability of a REF or haploid SNV call in the proband are both 50%  under the assumption that the CNV is inherited from the mother, the probability of a REF or haploid SNV call in the proband are 100% and 0% respectively Probabilities for all inherence assumptions are calculated across all SNVs within the target region are calculated and the model is selected via a maximum likelihood criteria. For details of inheritance models and examples, see Supplemental Note.

Interrogation of structural variation at CNV boundaries
Sequencing reads adjacent to CNVs can provide evidence of complex chromosomal rearrangements. For CNVs indicative of large structural variants -including terminal chromosomal deletions, large tandem duplications and break-ends spanning non-homologous chromosomes -the Manta structural variant caller (Chen et al., 2016, version 0.29.3) was employed for further investigation. This enabled breakpoint linkage across multiple CNVs, and provided evidence for insertion of duplicated sequence into a chromosome. Additionally, reassembled breakpoints were visualized via realignment of sequencing reads using the SVViz program (Spies et al., 2015).

Validation of the CNV calling pipeline
Assessment of 17 reference samples with known pathogenic CNVs (Table S1) showed that cWGS had greater sensitivity to detect known CNVs compared to microarrays (86% versus 64%, Methods, Table 1, Table S2) with the largest difference in smaller (<50kb) events (Table 1). For the five 'truth set' CNVs not recovered by cWGS, manual inspection of sequencing and genotyping arrays did not support a CNV in these regions (Methods, Supplemental Note).
To assess the cWGS false positive rate, CNVs were called on the 1000 Genomes Project sample NA12878 and assessed against an orthogonal technology genome assembly (Sudmant et al., 2015). Among 93 deletions, 48 had analogs in a dataset derived from long-read sequencing technology (Pendleton et al., 2015) (BioNano calls, Methods), and nine additional calls were supported by the presence of discordant sequencing reads and/or evidence of Mendelian inheritance across a population of samples. Thirty-nine percent (36/93) of the cWGS NA12878 CNV calls do not have support from orthogonal data or independent bioinformatics analysis, which will include both false positives and suspected true calls without external support. Because of limitations in all predicate CNV calling methods, the discrepancies may arise from either WGS or the alternative platforms.
We found that the majority of putative cWGS false positives can be addressed with minimal heuristic filters. Specifically, application of a size filter restricting CNVs >10kb, removal of CNVs in regions that have variable data quality, putative mosaics, and eliminating CNVs found in >10% of the population (drawn from a cohort of more than 3000 genomes, see Methods, Figures 1-3) reduced our CNV interpretation burden to an average of 11 calls per case (range 2-26). Given these findings, these heuristics were deployed as a component of the clinical WGS pipeline. All CNV calls n=36 23 (64%) 31 (86%) * 50% overlap Note that +1 indicates calls that were not in call set, but recovered in manual review Seventy nine clinical cases were processed through the validated cWGS CNV pipeline between June 2, 2016 and April 19, 2017 and subjected to automated quality control, filtering, annotation, and visualization (Figures S1-S3, Methods). CNVs were curated and classified following the guidelines of the American College of Medical Genetics and Genomics (ACMG) ( (Kearney et al., 2011;South et al., 2013), Methods, Figure S2a-d). After filtering based on internal allele frequency, on average, we reported 3 benign, 3 VUS-likely benign, and 4 VUS CNVs per case (Figure S2e-g). In 15% (11/79) of cases, we reported variants with pathogenic or "uncertain significance -likely pathogenic" classifications across a diverse set of patient phenotypes ( Table 2). Clinical diagnosis of Down syndrome (confirmed by karyotype). Additionally, patient is reported to have phenotypic features that are not consistent with a diagnosis of Trisomy 21, including hypotonia that is more marked than expected, dermal ridge patterns with more arches than are typical, a small phallus, infantile spasms that are controlled on medication, a skeletal and long myopathic face, lumbar lordosis, scapular winging, tapered calves, absent reflexes, ptosis, and a lurching pelvis when walking. The patient is unable to walk without a walker. Family history is notable for nemaline myopathy. The patient's mother's phenotype includes weakness in childhood and a muscle biopsy revealing nemaline rods, consistent with a diagnosis of nemaline myopathy. She could not run or jump as a child, although her clinical presentation improved over time. Currently, she is reported to have few residual features although cannot run.
Note: CNV analysis revealed trisomy 21; In addition, a maternally inherited variant classified as likely pathogenic in ACTA1 was identified in both the proband and mother.

Resolution of complex CNVs
We found that the combination of depth-based CNV calling and the utilization of discordant read-pair information and reassembly of breakpoints can enable deconvolution of complex rearrangements. In one example, family based CNV analysis of case P1 identified a 55kb de-novo deletion of the first three exons of ZC4H2 on the paternal X chromosome (Figure 1), consistent with Wieacker-Wolff syndrome ( Table 2). Structural variant analysis (Methods) identified evidence for a tandem duplication in the proband's father, sharing a breakpoint with the deletion in the proband (Figure 1b). Read-depth information from the father shows a copy-number gain directly upstream of the de-novo deletion in the proband (Figure 1a). Taken together these data likely indicate a multi-stage repair mechanism contributing to the copy-number loss in the proband. Similarly, case P11 harbored multiple CNVs indicative of a large chromosomal disruption event (Fukami et al., 2017;Liu et al., 2017) including 15.5 MB and 2.5MB deletions on 6q along with a 2MB copy-number gain on 11p ( Figure S4a, Table 2). Inspection of structural variation near these events yielded discordant reads spanning the two deletions, supporting the presence of simple deletions as opposed to more complex events such as translocations or inversions (Figure S4b), however such complex structural variation cannot be definitively ruled out. Structural variation near the boundaries of the 11p gain indicated an insertion of this duplicated DNA segment into 17q21.3 ( Figure S4c).
Copy number analysis of case P7 (Table 2) identified a 7MB de-novo terminal duplication on 16q and a 3MB denovo terminal deletion on 2p (Figure 2a-b). Analysis of variant allele frequencies overlapping these two CNVs phased both alterations to the paternal chromosome (Figure 2c-d, Methods, Supplemental Note). Subsequent analysis of sequencing reads provided evidence for a balanced translocation in the father as well as the unaffected sister, while the proband had support for an unbalanced translocation (Figure 2e, Figure S5, Methods).
In case P6 we observed a similar unbalanced translocation with a non-homologous break-end linking the centromeric breakpoints of the two large terminal CNVs. Further inspection of copy-number depth as well as variant allele frequencies indicated that the CNVs were likely mosaic in the blood, with both events having similar estimated purity of 60-64% purity of the sample (Figure S6), which external testing confirmed at 63%. Taken together these events suggest the presence of a mosaic unbalanced translocation in the affected proband, a rare event which has been proposed to occur by various mechanisms of recombination (Gijsbers et al., 2011). cWGS identifies a clinically relevant mosaic non-coding CNV In case P12, standard CNV analysis identified a 23kb de-novo deletion upstream of MEG3 completely overlapping the IG-DMR region of 14q32.2 previously implicated in Kagami-Ogata syndrome (Kagami et al., 2010). Further analysis indicated that the CNV was likely mosaic, present in about 50% of cells, and that the deletion phased to the maternal allele, consistent with the paternal imprinting mechanism of Kagami-Ogata (Kagami et al., 2005). In this case, the mosaic deletion passed standard CNV analysis and was annotated as mosaic during manual curation. In contrast, in case P17, we identified a mosaic trisomy of chromosome 14 via a genome-wide visualization ( Figure S7). While this mosaic variant was identified outside of our clinically validated pipeline, the variant was sent out for external testing which confirmed and estimated its purity at 51%, compared to 47% as estimated by WGS. Figure 3. Mosaic 14q32.2 26kb microdeletion. a-c, Normalized depth across pedigree sequenced for subject P12. Shown here is the genomic region between 101.21MB and 101.34MB on chromosome 14 (hg19 coordinates) d, Annotations for the genomic region. The orange box represents the Canvas CNV call boundaries, the green box represents breakpoint assembled coordinates of the deletion from the Manta structural variant (SV) caller, the black lines represent subjects from Kagami et al (Kagami et al., 2010) with deletions in this region, the blue box represents the gene boundaries of the imprinted gene MEG3. e, Average depth across the region of the CNV call for samples across an internal reference population, depth for the proband is indicated with a horizontal dashed line. f, Variant allele frequency for the SNVs within the deleted region.

Discussion
Here we report the development, validation, and deployment of a multifaceted clinical test for individuals with rare and undiagnosed disease (RUGD). With whole genome sequencing, copy number variation can be profiled alongside small variant analysis without any additional sample preparation or experimental protocols. Due to the purely bioinformatic nature of this addition, we have been able to synchronize analysis and reporting of these multiple classes of variants.
In our CNV calling pipeline, we optimize the parameters of the caller to favor sensitivity (Methods). In our validation, this provided greater recovery of externally annotated CNVs than clinical microarrays (Table 1), but may also result in an increased false-positive rate. To address this, we have developed a stringent filtering and manual curation protocol (Methods, Figure S1-3). This curation relies heavily on our ability to annotate population frequency (Figure S2d), as well as visualization of the CNV call to assess the underlying data quality ( Figure S3). In addition, we leverage external databases of benign and pathogenic CNVs (Firth et al., 2009;MacDonald et al., 2014), internal aggregate data, and previously curated variants to assess the analytical validity of a call and provide a variant classification (Kearney et al., 2011).
These methods do not rely on bulk data processing or analysis (i.e., batching of samples), allowing for ingestion and interpretation of one family at a time. This addresses the time-lag from sample collection to interpretation present in some laboratory-based clinical CNV analysis protocols. Furthermore, these methods are suited to exploit future increases in sequencing coverage that will result in an increase in resolution to call small CNVs, allowing for test improvement with minimal modifications to the sample-preparation or bioinformatic pipelines.
In many families with previous genetic testing, cWGS was able to identify new variants and provide a diagnosis. For example, in the case of a child from a resource limited clinic in South America (P14, Table 2) who had a previous negative clinical exome, cWGS was able to identify a pathogenic 1.7MB deletion indicative of 16p13.11 Microdeletion Syndrome. In subject P16 ( Table 2) we observed trisomy 21 in the subject consistent with a preexisting Down Syndrome diagnosis, but were also able to identify a likely pathogenic SNV within the ACTA1 gene conferring the additional diagnosis of an inherited nemaline myopathy.
In a select number of cases, the cWGS CNV pipeline was able to identify variants that would have been missed by exome, single gene testing and microarray. Most notable of these was the deletion in P12, which is below the limit of most commercial microarrays (26kb), sits in a non-coding region upstream of a long non-coding RNA (MEG3), occurs in a locus that is paternally imprinted, phases to the maternal chromosome and is 50% mosaic. We are not aware of another agnostic genome-wide testing approach which would have been able to identify this variant and capture all the associated features.
Future test improvements will focus on bringing a broader diversity of variants within the umbrella of cWGS. Identification of mosaic CNVs remains a priority, especially for large copy-number events such as trisomy where cWGS has sufficient data to detect low purity alterations (Dong et al., 2016). Structural variant calling will be needed to fill the currently existing gap between small variants (SNVs and INDELs) and depth-based CNVs. Uniparental isodisomy and heterodisomy, which we have previously seen as incidental findings ( Table 2, case P18 and P19), will be incorporated via observation of inheritance patterns of small variants. Finally, validation of specialized variant callers to detect hard-to-call variation such as repeat expansions (Dolzhenko et al., 2016;Gatchel and Zoghbi, 2005) and SMA (Feng et al., 2017) from cWGS data will open up the test to new classes of disease.
In conclusion, we present our experience of the development and deployment of CNV calling on top of an existing cWGS assay. As sequencing costs continue to decrease, the use of a first line whole genome diagnostic spanning a broad spectrum of genetic variation will become the standard for rare and undiagnosed disease. Figure S1. CNV filtering pipeline. During data processing, automated filters are applied to call-sets to limit the number of calls presented to case managers for manual curation. Shown in each panel are the percentage of calls remaining after each filtering step is applied sequentially. Distributions reflect 79 samples assessed for CNVs in the ICSL cohort. For details on individual filters, see Methods.  Green dots show the locations of discordant read-pairs, while the height of the grey shaded region indicates the number of discordant reads spanning across a given region on the genome. While this is a useful confirmation for CNVs breakpoints in many cases, CNVs may have breakpoints in non-unique sequence resulting in an inability to uniquely map reads to the flanks of the CNV. Read depth: Normalized read depth across the proband and parents. This allows for evaluation of error modes in the caller and may expose patterns not yet be picked up by an automated caller such as inheritance of a mosaic CNV. 1000 Genomes Data: CNV calls the 1000 Genomes Project (WGS, 7x coverage) (Sudmant et al., 2015) used to identify common deletions. Population Depth Data: Normalized coverage for 200 samples selected from our internal population data. This allows for inspection of population trends which may expose artifacts in the read-mapping or data-normalization process leading to a false-positive call. Chromosome view and overlapping genes: This field allows the interpreter to view where the event takes place in context of the chromosome and displays the coding sequences of genes that overlap the events so it can be determined if the genes are relevant to the phenotype.  Read support for unbalanced translocation in subject P7. Shown here are modified plots from the svviz graph realignment program. In brief reads are realigned to normal (not shown) and recombinant (shown here) chromosomes across the pedigree. Purple and red colors represent the first and second reads in the read-pair, respectively, for details on the visualization and realignment methods see (Spies et al., 2015). See also Figure 2.

Deletion phasing
Here we simply compare inheritance of variants under the assumption of the deletion being on either the maternal or paternal alleles. Table 2: Model assuming deletion on paternal allele (all variants inherited from mother):

Gain phasing
For gains, there are four possible scenarios. A gain may be of maternal or paternal origin, and be either simple or complex. By simple we refer to a duplication of a single allele, while a complex gain refers to the scenario where a proband can inherit material from both parents' copies of the DNA segment (an example of this is in an unbalanced translocation).
Additionally, rather than having two copy states as in the case of deletions, gains have four possible variant copy states. Table 3: Model assuming a simple duplication of a maternal allele: Example gain: 7MB deletion on the paternal allele.

Supplemental Note
Allele fraction across genotypes clearly shows the dependence of copy number state on parental genotypes (Supplemental Note Figure 9).

Model likelihoods:
father-complex -7948 father-dup -26755 mother-complex -19933 mother-dup -24622 Prediction: The gain is resultant from inheritance of both paternal alleles, along with a single maternal allele. Figure 9: Variant allele fraction across parental genotypes (a) and copy-number state transition frequencies (b) for example duplication.