Pathogenic copy number variants that affect gene expression contribute to genomic burden in cerebral palsy

Cerebral palsy (CP) is the most frequent movement disorder of childhood affecting 1 in 500 live births in developed countries. We previously identified likely pathogenic de novo or inherited single nucleotide variants (SNV) in 14% (14/98) of trios by exome sequencing and a further 5% (9/182) from evidence of outlier gene expression using RNA sequencing. Here, we detected copy number variants (CNV) from exomes of 186 unrelated individuals with CP (including our original 98 trios) using the CoNIFER algorithm. CNV were validated with Illumina 850 K SNP arrays and compared with RNA-Seq outlier gene expression analysis from lymphoblastoid cell lines (LCL). Gene expression was highly correlated with gene dosage effect. We resolved an additional 3.7% (7/186) of this cohort with pathogenic or likely pathogenic CNV while a further 7.7% (14/186) had CNV of uncertain significance. We identified recurrent genomic rearrangements previously associated with CP due to 2p25.3 deletion, 22q11.2 deletions and duplications and Xp monosomy. We also discovered a deletion of a single gene, PDCD6IP, and performed additional zebrafish model studies to support its single allele loss in CP aetiology. Combined SNV and CNV analysis revealed pathogenic and likely pathogenic variants in 22.7% of unselected individuals with CP.


INTRODUCTION
Cerebral palsy (CP) describes a group of heterogeneous disorders affecting movement and posture that are caused by a nonprogressive lesion or abnormality in the immature pre or postnatal brain. Individuals living with CP frequently experience additional comorbidities such as mild, moderate or severe intellectual disability (ID), (48-51%), epilepsy (32-35%), speech impairments (60.7%), vision impairments (35.8-40%) and autism spectrum disorders (ASD), (8.9%). 1 The most common correlated risk factors for CP are prematurity, congenital anomalies and intrauterine growth restriction.
There is growing evidence for a significant contribution of genetics to the aetiology of CP. Familial studies have implicated GAD1, ADD3 and KANK1 in CP. [2][3][4] Genetic studies into some families originally diagnosed with CP have now defined a unique sub-syndrome associated with ID and microcephaly caused by recessive mutations in the genes that comprise the adaptor protein complex 4 (AP4M1, AP4E1, AP4S1 and AP4B1). [5][6][7] We identified pathogenic and likely pathogenic de novo and inherited single nucleotide variants (SNV) in at least 14% of individuals living with CP from an unselected cohort of 98 parent and proband trios. 8 In a similar fashion, de novo and inherited copy number variants (CNV) have been detected in 31% of cryptogenic CP cases with no identified aetiology, 9.6% in an unselected CP cohort when strict criteria for clinical relevance were used and 23.7% in a cohort of 97 individuals with hemiplegia. [9][10][11] Our previous analysis of a cohort of 50 unselected individuals revealed inherited CNV in 20%, however the clinical significance of these was uncertain. 12 These data collectively make a strong case for significant involvement of genetics in CP.
Whole exome and whole genome sequencing are rapidly becoming the gold standard to diagnose neurodevelopmental disorders. As data accumulate, it becomes possible to reanalyse rare variants with increasing power and for new genes to be implicated in these disorders. 13 Furthermore, combined analysis of genome and transcriptome data can be used to interpret the effects of DNA variants on gene expression and identify potentially pathogenic coding and non-coding variants. 14 For an unselected CP cohort of 182 individuals including the 98 trios we previously analysed, we identified an additional nine potentially pathogenic SNV (a further 5% of the cohort) with functional effects on gene expression using this strategy. 15 We hypothesised that within our wealth of sequence data from individuals living with CP, where our initial focus had been to identify rare coding sequence variants that we would also detect previously unnoticed CNV.

RESULTS
We had previously analysed these exome data for causative SNV. 8 Except for a pilot group of 50 cases studied by chromosomal microarray, 12 copy number analysis was not performed thus the results presented here are novel. Within the analysed cohort of 186 CP cases and parents where available comprising 460 individuals we detected 8136 CNV events ranging in size from 1 kb to over 10 Mb (Fig. S1A). There were ten families included in this study in whom we had previously detected 14 CNV of potential clinical significance using array based detection. 12 Copy Number Inference From Exome Reads (CoNIFER) analysis detected 11/14 of these CNV; thus sensitivity for detection was an estimated 78% [95% confidence interval (CI), 52.4% to 92.4%] similar to that reported in the original CoNIFER publication. 16 The three events that were not detected were less than 10 kb in size, each covering single exons (Fig. S1B). We did not detect any additional CNV in these 50 cases that were previously analysed by microarray. We additionally detected nine deletions and 14 duplications in 21 out of 186 probands after filtering (Table 1). Of these CNV, eight were unequivocally assigned as pathogenic because they overlapped significantly with loci that were implicated in known syndromes, one was deemed to be likely pathogenic based on supporting evidence and the remaining 14 were of uncertain pathological significance (Table 1; Tables S1-S23). There were two individuals in whom two CNV were identified. In both cases, these terminal deletion and duplication events resulted from de novo unbalanced reciprocal translocations, validated by karyotyping in the proband and ruled out in both parents (Table 1: 150432P and  181463P).
Individual validation of CNV with CytoSNP-850 arrays on 14 individuals, including 150432P and 181463P with the translocations, supported six out of seven deletions and all nine duplications with positive detection calls. The deletion, which was not detected with PennCNV, was in 98098P chr10:73822516-74987970 (all intervals are mapped to hg19 coordinates) (Table S18). Further investigation of the array data of all 14 individuals revealed 23 duplications and five deletions that included known genes (based on GENCODE v28 annotations) that were not identified by CoNIFER (Table S24). All of these CNV were of uncertain clinical significance and all except one were found in single genes.
We next sought to determine if the expression of genes in, or adjacent to all loci detected by CoNIFER were affected by these CNV. We used existing RNA-Seq data generated from RNA extracted from lymphoblastoid cell lines (LCL) of these individuals. 15 In 22 out of the 23 CNV loci identified, we determined a trend for significant differential expression of most, but not all expressed genes, as defined by Z-scores of greater than 2 or less than −2 (Tables S1-S23). The direction of differential gene expression (up or down) was highly correlated (Pearson r = 0.75) with the copy number gain or loss respectively (Fig. S2, Tables S1-S23). No significant deviation in gene expression was determined in sample 165447P (dup chr2:54893010-55200400), however only one gene in this interval had detectable expression in the RNA-Seq data (Table S10). Of note, sample 98098P that did not have their deletion validated by the SNP chip, had a consistent trend for significant down regulation of the expression of all 11 genes detected in that interval (Table S18), suggesting that it was a bona fide deletion.
Among the potentially pathogenic CNV we detected, there was one locus found in individual 157439P and his similarly though more mildly affected monozygotic twin brother in which a deletion down-regulated the expression of only one gene: PDCD6IP (Table 1 and Fig. 1a). Both brothers had bilateral cleft lip and palate and both had seizures with age of onset at 3 years. The seizures later resolved (earlier in the mildly affected twin). Movement was affected in both brothers manifesting as quadriplegia in the most significantly affected brother and mild quadriplegia in the mildly affected brother. Computed tomography (CT) scan of the more severely affected brother at age 3 years showed features consistent with periventricular leukomalacia (PVL) as predicted from early postnatal ultrasound scans. The more mildly affected brother had a MRI scan at age 8 which also revealed features consistent with PVL (also predicted by early postnatal ultrasound scan) and some thinning of the corpus callosum. The PDCD6IP gene has not been previously implicated in any neurodevelopmental disorder and presented a unique opportunity from this study to assess the likely pathogenicity of this deletion in an animal model. We knocked down expression of the corresponding zebrafish ortholog (pdcd6ip) using both translation blocking (PDCD6IP AUGMO) and splice blocking (PDCD6IP SBMO) morpholinos. We observed significant effects on zebrafish movement that included spontaneous, hyperactive and erratic swimming behaviour (Video S1 and S2). There was a significant (p < 0.05) increase in the maximum turning angle of fish treated with PDCD6IP AUGMO compared to controls (Fig. 1b). Video evidence showed that this was because the PDCD6IP AUGMO were unable to swim in a straight line, suggesting that motor control was impeded (Video S1 and S2). There were also consistent morphological defects such as microcephaly and cardiac oedema that were specific and concentration dependent to the PDCD6IP AUGMO and to a lesser extent the PDCD6IP SBMO compared to the control ( Fig. 1c and Fig. S3).
In combination, our results identified pathogenic and likely pathogenic CNV in at least a further 7 (3.7% [95% CI, 1.8-7.6%]) out of an unselected cohort of 186 individuals with CP ( Table 1). The total fraction of individuals with pathogenic and likely pathogenic genetic variants in this unselected CP cohort (including 14% detected by WES 8 and a further 5% of SNVs by RNA Seq 15 ) is 22.7%.

DISCUSSION
CNV have long been known to be significant contributors to the aetiology of neurodevelopmental disorders. 17 Similar to other neurodevelopmental disorders such as epilepsy, ID or ASD, CNV contribute to the pathogenesis of CP. [9][10][11][12] For the purpose of molecular diagnosis, chromosomal microarrays (with SNP and/or oligonucleotide probes) are usually selected for detection of CNV because of their high sensitivity and specificity. In this study, we elected to take advantage of our unique, existing exome sequencing data of 186 individuals with CP to make an efficient and cost-effective survey of CNV burden in the largest unselected CP cohort to date, rather than reanalyse the entire cohort with arrays. The proportion of this cohort with pathogenic and likely pathogenic CNV in CP was lower compared with a previous study of a similar unselected cohort which could be due in part to the lower resolution of detection of exome CNV analysis compared to arrays. 10 It is interesting to note that in our previous CNV study of 50 cases that were also reanalysed in this study (finding no additional CNV) that we did not find any de novo CNV. It is possible that the resolutions of the arrays we used previously, as well as the resolution for detection by exomes has contributed to this and smaller de novo CNV may yet be found.
RNA-Seq data identified genes within or adjacent to each CNV that were affected in expression and in one case, this confirmed a suspected deletion that was detected by the exome but not by SNP array. Based on the genotype tissue expression project (GTEx) median tissue expression data, 84.8% of transcripts expressed in cortex are also expressed in LCL (transcripts per million reads, TPM > 0) and expression levels are highly correlated, Pearson r = 0.83 therefore, this approach is useful for assessing the effects of CNV encompassing genes expressed in both tissues even in the context of neurodevelopmental disorders. 18 M.A. Corbett et al.  19 The neurological phenotypes associated with 2p25.3 deletions have been linked with haploinsufficiency of MYT1L and that gene was also deleted in the individual in this study, however it was not expressed in LCL 19 (Table S4). Full or partial monosomy of the X chromosome in females causes Turner Syndrome. While not reported frequently, impaired movement that is not associated with intelligence scores has been observed in Turner Syndrome. 20 There are two previously reported females with CP attributed to partial chromosome X monosomy. 10,11 In this study, we discovered an individual with Xp monosomy and a terminal 1q43q44 trisomy due to translocation t (X;1), (p11.22; q43) ( Table 1; Tables S2 and S3). This individual had a complex phenotype with dysmorphic features, short stature, microcephaly, global developmental delay and CP. By 11 years 4 months she was wheelchair bound. A review of 112 deletions and 47 duplications encompassing 1q43q44 noted problems with muscle tone and muscle control in 27 of the 47 individuals with pure or complex duplications. 21 We concluded that, this translocation was pathogenic and there may be contributions from both the deleted Xp and the duplicated 1q43q44 to CP in this case.
Both deletions and duplications of 22q11.2 have been previously reported in CP. 9,11 22q11.2 deletion syndrome [OMIM: 192430], also known as velocardiofacial or DiGeorge syndrome is clinically heterogeneous and the most common, recurrent, pathogenic microdeletion that occurs in humans. 22 CP and other movement phenotypes are infrequently reported in this multisystemic disorder that affects the heart, immune system, parathyroid, craniofacial and central nervous system development. 23 In one report, mild left Haploinsufficency percentile scores (HI scores) from DECIPHER are shown as a heatmap with genes with the darkest orange the most sensitive to copy number variation. GENCODE v28 gene models are shown at the bottom of the plot. b Average turning angle of zebra fish injected with the PDCD6IP-AUGMO at concentrations of 62.5 μM (red, n = 21) compared to a control injected group (blue, n = 37) over three trials stimulating movement with a tap in the first two trials and light in the final trial. PDCD6IP-AUGMO treated fish moved with significantly greater turning angles, p < 0.05 Student's two-tailed Ttest assuming equal variances. c Quantification of phenotypes with increasing concentration of the PDCD6IP-AUGMO. Larvae were classified as having mild, moderate or severe developmental abnormalities depending on whether they had one, two to three, or more than four of the following phenotypes: loss of pigment in eye, decreased eye size, grey matter hindbrain, small head, domed cranium, cardiac abnormality, decreased body size and tail curvature or kinking hemiparesis was noted in a male with 22q11.2 deletion and cerebellar atrophy on brain MRI. 24 Duplications of 22q11.2 are reported less frequently than deletions [OMIM: 608363] and penetrance is variable. A clinical review of multiple individuals affected by this duplication found that motor delay and hypotonia were frequent features. 25 Pathogenic loci not previously implicated in CP We identified two de novo CNV that are known to be pathogenic for neurodevelopmental disorders, also with incomplete penetrance and variable expressivity; a 16p11.2-p12.2 deletion [OMIM: 611913] 26 and a 1q21.1 duplication. A review of clinical observations in 136 individuals with 16p11.2 deletions noted abnormal agility in 50% 26 while duplications of 1q21.1 overlapping with the individual in this study have abnormal agility in 39%. 27 While the known pathogenic CNV loci described above are associated with distinct neurodevelopmental syndromes, our clinical and genetic data, combined with previous observations of significant effects on movement, strongly suggest them as causing CP in these individuals.
At the time of writing, there were 33 individuals in the DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) database with open access phenotype information that includes the term "cerebral palsy". We did not find any CNV or SNV in that cohort that overlapped with new CNV reported in this study.
Novel loci and genes A de novo microdeletion at 3p22.3 encompassing only the gene for programmed cell death 6 interacting protein PDCD6IP (also commonly referred to as ALIX) that segregated with CP in monozygotic twin brothers was associated with significant down regulation of that gene in LCL derived from one of the affected individuals (Fig. 1a, Table S6). PDCD6IP has critical roles in cytokinesis and also interacts with the endosomal sorting complex required for transport (ESCRT) multiprotein complexes which, in turn are required for trafficking of multi vesicular bodies in the cell. Mice null for Pdcd6ip develop hydrocephalus resulting in bilateral enlargement of the lateral ventricles, thinning of the cerebral cortex and atrophy of the hippocampus. 28 The loss of Pdcd6ip caused disorganisation of alignment of cilia and a loss of integrity of epithelial cell barriers in the brain and other tissues. 28 These mouse data combined with our observations from zebrafish strongly implicate PDCD6IP as a disease locus in neurodevelopmental disorders.
Variants of uncertain significance We identified five deletions and nine duplications that passed our filtering criteria and occurred at loci that are not yet implicated in CP or other neurodevelopmental syndromes. Based on available clinical data and functional annotation of the genes within these regions, we predict that six of these events have potential to be involved in CP (Table 1). We singled out these six CNV found on 6p22.3, 9p24.3-p24.2, 9q34.1, 10q22.1-q22.2, 19p13.3 and Xq22.1 (Table 1 and Tables S15-S18, S21 and S23) because in each case they have either multiple reported individuals in DECIPHER with similar but not identical phenotypes and/or encompass genes known to be involved in delayed motor development. Noteworthy among these CNV, was a duplication we identified located within Xq22.1 and inherited from an unaffected mother (Table 1, Table  S23). Within this X-linked duplication, the mRNA polyadenylation factor CSTF2 was found to be significantly up regulated (Table S23) while other genes with detectable levels of expression were unaffected. This male also had a male second cousin related through possible female carriers that was also affected by CP, however a DNA sample was not available for segregation testing.
Individuals with similar duplications in this region in DECIPHER have phenotypes consistent with those in the individual in this study including ID, ASD and epilepsy (Table 1).
For each of these variants of uncertain significance, additional patients and further cell molecular and animal modelling will be required to confirm or deny their pathogenicity.

CONCLUSIONS
Through CNV analysis of exome data and additional integration of RNA-Seq we have resolved at least 7/186 additional individuals accounting for 3.7% this cohort. In combination with SNV identified in our previous analyses, we have shown a pick up rate of potentially pathogenic variants in an unselected CP cohort of 22.7% (95% CI, 15.6-31.8%), (Fig. 2). 8,15 With this study, we have expanded on the known genetic determinants of CP and identified both new and recurrent CNV loci that confer significant risk for CP (Fig. 3). The genomic landscape of CNV in CP shows potential hot spots on chromosomes 2, 22 and X, and highlights considerable genetic heterogeneity that underlies the clinical heterogeneity of CP (Fig. 3). This anecdotal evidence for enriched loci implicated in CP suggests a meta-analysis combining all 450 individuals with CNV data to date with additional cases yet to be analysed through the newly formed International CP Genomics Consortium would be warranted. 29 Despite considerable genetic heterogeneity of CP, the number of individuals now identified with pathogenic variants to date suggests genetic testing with chromosomal microarray and if available, clinical exome sequencing should be considered early in the diagnosis. There are good prospects for further improvement of diagnostic pick up rates by employing whole genome sequencing, which will increase the resolution of detection of structural variants as well as coding and non-coding SNV. We have shown that integrated analysis of paired exome and RNA-Seq data combined with follow up animal model analyses aids in the identification and functional and diagnostic interpretation of novel pathogenic variants in CP.

Exome sequence data
This study was approved by the Women's and Children's Health Network Human Research Ethics Committee (HREC) and informed written consent was obtained from all participants. A flowchart giving an overview of this study is provided in Fig. S4. We started with a set of 472 exome BAM files from 191 unrelated and unselected individuals living with CP and their parents and siblings where available (1 family with multiple affected individuals, 98 parent-proband trios, 72 parent-proband duos and 20 singletons). 8 The unselected cohort had the following profile: 45:55% female to male ratio, 48% were premature (less than 37 weeks), 22% had intrauterine growth restriction (i.e. a birth weight below the 10th percentile relative to gestational age), 55% had at least one neurodevelopmental co- Fig. 2 Overview of pathogenic variants detected in three studies in an unselected Australian cohort of 191 individuals living with CP. Green slice shows percent of cases resolved by our original WES study. 8 Grey slice summarizes SNVs (premature terminations codons and splice variants) that affect gene expression. 15 Orange slice summarizes findings from this study morbidity (epilepsy, ASD, developmental delay or ID) and gross motor function classification scores (GMFCS) ranged from 1 to 5 with median of 2. The profile is similar to that of the Australian CP registry report suggesting minimal selection bias. 30 For additional information on selection criteria for CP diagnosis of this cohort, please refer to our previous study. 8 We used Picard tools CollectHsMetrics and found 12 samples (two trios and three duos) had low coverage of less than 80% of targets covered at 20× or more and less than 30× mean and median coverage of targets; these were removed from the study (Fig. S5A and S5B). We used the vcftools relatedness2 algorithm to confirm family relationships were as described in the original study and found evidence of two sample mix ups within these data that were corrected for subsequent analyses. 31 The results of the original study were not altered by these mix ups. The final cohort included 186 unrelated individuals with CP.

Exome CNV analysis
Copy number detection in exome data was carried out using CoNIFER scripts. 16 We used the SeqCap EZ HGSC VCRome V2.1 (Roche NimbleGen) target file (covering 45.2 Mbp of sequence) as the array template. A singular value decomposition (SVD) cut off score of 7 was used (Fig. S6). We removed CNV that overlapped by 80% or greater with CNV that had minor allele frequencies (MAF) greater than or equal to 0.01 from studies with more than 40 individuals in the DECIPHER common CNV database (DECIPHER v9.5). CNV that were recurrent in more than ten individuals (i.e. four or more trios, or MAF > 0.02) in the cohort were also removed because we considered these unlikely to be pathogenic due to their high frequency. De novo inheritance of deletions in incomplete trios was inferred by examining the remaining haplotype compared to the available parental sample.

SNP array validation
Validation of CNV regions identified by CoNIFER was carried out using Illumina Infinium CytoSNP 850K arrays. Library preparation, hybridisation, scanning and data acquisition were performed according to the manufacturer's protocols, as a service by the Australian Genome Research Facility. Detection of CNV calls was carried out with PennCNV using default parameters for autosomal and X-linked analyses respectively. 32 Gene expression outlier analysis RNA-Seq data corresponding to 182/186 CP cases in this study and 20 unrelated control samples were generated as part of a separate and dedicated gene expression analysis study for CP. 15 We incorporated this expression data resource into a new analysis to determine the influence of CNV identified in this study on gene expression. Briefly, RNA was extracted from Epstein Barr Virus transformed LCL using QIAGEN RNeasy kits, as per the manufacturer's protocol. Library preparation and RNA-Seq were performed as a service by the UCLA Neuroscience Genomics Core Facility. The TruSeq v2 kit (Illumina) was used to generate un-stranded libraries with 150 bp mean fragment sizes and 50 bp pair end sequencing performed using the HiSeq2500 (Illumina). Sequence data were mapped to Illumina iGenomes hg19 build of the human genome using tophat2. 33 Read counts were generated with htseq-count. 34 Z-statistics for each gene in each sample were calculated over all 202 samples using the scale function from the R statistical programming language v3.4.4.

Correlation and visualization of CNV and gene expression data
Exome ZRPKM values from CoNIFER and gene expression Z-scores from scaled RNA-Seq data were correlated in each sample by matching values in each matrix to their location in the genome. These data consisted of 11,507 genes expressed in LCL with matching exome coverage. Loci in which there was at least one pair of correlated gene expression and exome CNV Z-scores less than −2 or >2 were prioritized for further analysis. Combined exome ZRPKM, array log R ratio and outlier gene expression Z-scores were plotted in their genomic contexts using the Gviz package in Bioconductor (Tables S1-S23). 35 Zebrafish husbandry and morpholino injections Zebrafish studies were approved by the University of Adelaide Animal ethics committee. Danio rerio were bred and maintained at 28°C on a 14-h Fig. 3 CNV reported in CP to date. Circos plot generated using the ClicO FS tool shows the human karyogram on the outer edge with deletions in red and duplications in blue from this study and the previous four reports as indicated by the study key. The saturation of colour of each CNV indicates its likelihood for pathogenicity light//10-h dark cycle. Embryos were collected from natural mating of the Tübingen strain (Tu), grown in embryo medium (E3), and staged. 36 Translation blocking (PDCD6IP AUGMO 5′-ACGGGACAGAAATAAACGTCGC-CAT-3′), splice blocking (PDCD6IP SBMO 5′-TCTTCAGGCCAATGTCTTACCG-CAG-3′) and control (5′-CCTCTTACCTCAGTTACAATTTATA-3′) morpholinos were synthesized by Gene Tools LLC (Corvallis, OR, USA). Fertilized zebrafish embryos were rinsed in E3 medium and injected at the one cell stage. Embryos were always injected with individual morpholinos in solution at total concentrations of 250, 125 or 62.5 μM depending on the assay. At 6 h post fertilization, any embryos that were unfertilised were discarded.
Activity analysis using the DanioVision observation chamber Larval activity was tracked at 4 days post fertilisation (dpf) using the DanioVision Observation Chamber (Noldus) which is fitted with a Basler GenICam camera, independent light source, temperature control unit (set to 28°C) and tapping device. The settings used for tracking experiments were defined using EthoVisionXT software. Trials were run in triplicate and whole wells where tracking failed during the experiment were excluded. Experiments were performed with the researcher blind to the identity of the injected morpholino.

Imaging
Embryos were examined at 24hpf and any embryos which had failed to develop were removed. Larvae were examined at 4dpf and scored blind for morphological abnormalities and touch-evoked escape response. Larvae were categorised as phenotypically normal or having mild, moderate or severe abnormalities. Larvae were imaged after activity assays using a Nikon SMZ1000 dissecting microscope and micrographs taken using a Leica DFC450 C camera and Leica Application Suite software.

Statistics
Binomial CIs were calculated for all proportions using the Wilson method implemented by the binom package in R. Differences in movement measurements between zebrafish injected with control and PDCD6IP morpholinos were tested using Student's two-tailed T-test assuming equal variances with corrections for multiple comparisons made using the Bonferroni method.

DATA AVAILABILITY
The data generated or used during this work are available subject to compliance with our obligations under human research ethics from the corresponding author upon reasonable request.