Introduction

Genomic copy number variations (CNVs) are structural variations that involve deletions and/or duplications of segments of DNA. Their impact is not necessarily harmful, but loss, increase, or disruption of genes is often associated with, and can underlie, human disease, including neurodevelopmental disorders (NDDs).1,2,3,4,5,6 Rare CNVs have been extensively studied in autism spectrum disorder (ASD),7,8,9 attention deficit hyperactivity disorder (ADHD),10 schizophrenia (SCZ),11,12 and, less so, obsessive compulsive disorder (OCD).13,14 Various NDDs share genomic structural variations, including CNVs that perturb the same genes.1,15 For example, deletions or duplications affecting coding sequences of NRXN1 or CNTN6 have been implicated in ASD, ADHD, and SCZ.16,17

A shared genomic etiology across multiple disorders is supported by genome-wide association studies18 and by analyses of small loss-of-function (LoF) variations.19 However, CNV analysis across disorders has been limited to pairwise comparisons (e.g., ADHD and ASD,10 ASD and SCZ20), reviews of selected well-established genomic disorders such as 16p11.2 deletions and duplications,21 or a meta-analysis of CNVs in a single gene (NRXN1).16 To date, there has been no genome-wide study of rare CNVs identified using an identical technology, encompassing ASD, ADHD, OCD, and SCZ. An identical method for interrogating CNVs across multiple disorders increases the chance of finding rare CNVs with cross-disorder implications. These could be missed if multiple technologies with different detection sensitivities were applied.

This project was established to provide a public resource of CNV data from samples with NDDs, mostly from the province of Ontario, Canada, all genotyped on the same microarray platform: the Affymetrix CytoScan HD platform, which consists of 1.9 million copy number markers and 750,000 genotype-able single nucleotide polymorphisms. A CNV resource of control population samples was published earlier under the Ontario Population Genomics Platform.22

In creating this data resource, we aimed to: (i) catalogue CNVs that are clinically relevant to each of ASD, ADHD, OCD, and SCZ, and (ii) identify genes and loci with CNVs that are shared among different NDDs. Where available, we analyzed whole genome (WGS) or whole exome sequence (WES) data, in search of variants that were not detected by microarray. The relevant genotypes and CNVs are available in dbGaP (accession number phs001881.v1.p1) and dbVar (accession number nstd173), respectively.

Results

Sample description and detection of CNVs

We analyzed 4,460 samples ascertained for four NDDs; 2,691 (60.3%) were from individuals recruited because of the diagnosis of one of these disorders, and the rest from apparently unaffected family members (Table 1, Supplementary Table 1A). ASD, ADHD, OCD, and SCZ cases were ascertained using criteria explained previously (Supplementary Information).8,10,13,23,24 Childhood onset OCD is not classically considered an NDD in the International Classification of Disease (ICD-11) or Diagnostic and Statistical Manual of Mental Disorders (DSM-5), but in view of its early onset, male preponderance and association with imaging findings, we included OCD in this study as others often do. Similarly, since SCZ has both neural and genetic correlates, including some evidence of overlap or sharing genetic risk with NDDs, we considered SCZ as well. The majority of samples (68.1%) were ascertained for ASD, with the others distributed approximately equally among the other disorders. The male:female sex ratio was almost 4:1 for ASD and ADHD, 2:1 for SCZ, and ~1:1 in the pediatric OCD cases (Table 1). Different family structures were sampled in the four sub-groups: some ASD and SCZ families included multiple affected individuals; some OCD samples were in trios (i.e., affected proband and both parents; details in Table 1, Supplementary Table 1A). We did not genotype parents of cases for ADHD (Table 1, Supplementary Table 1A). We defined a high confidence set of CNVs (Supplementary Table 1B) as those identified using two different detection algorithms, as previously described.25 Rare variants were those with a frequency of less than 0.1% in a 10,851 subject population control sample genotyped in multiple microarray platforms, including the Affymetrix CytoScan HD.25 We also analyzed prioritized CNVs from our previous publications on ADHD10 and SCZ.11,24 We had information with respect to intellectual disability (ID) for some cases. 
ID including borderline intellect and non-verbal learning disability was comorbid with SCZ in 31/204 cases (16.9%),11 with ASD in 149/599 cases (24.9%), and with ADHD in 3/427 cases (0.7%). No OCD case had ID.

Table 1 Stratification of 4,460 samples in the cross-disorder CNV analysis

Clinically relevant CNVs

Clinically relevant CNVs included five categories: aneuploidies, large CNVs (>3 Mb), CNVs consistent with known recurrent genomic disorders, those impacting genes previously established to be associated with NDDs, and all de novo CNVs (i.e., not found in either parent) (details in Table 2). We found 306 clinically relevant CNVs in 284 of 2,691 NDD cases (10.5%) (Tables 2, 3, Supplementary Table 1C). Of these CNVs, 115 found in 111/2,691 cases (4.1%) were “clinically significant” or “likely clinically significant” variants, as evaluated by expert clinical geneticists according to American College of Medical Genetics and Genomics guidelines.26 We did not find evidence for uniparental disomy. The clinically relevant autosomal deletions, and chromosome X deletions in females, were all single copy.

Table 2 Summary of cases carrying a CNV deemed relevant to neurodevelopmental disorders, stratified by disorder and variant type
Table 3 Copy number variations of clinical significance among four neurodevelopmental disorders

The first category included aneuploidies: trisomy 21, 47,XXY, 47,XYY, and 45,X, found in 17/2,691 cases (0.63%), only among ADHD or ASD cases (category A in Tables 2, 3; Fig. 1). The second CNV category contained variants larger than 3 Mb, excluding those associated with recurrent genomic disorders and aneuploidies (category B in Tables 2, 3). We found these large variants in 23/2,691 NDD cases (0.85%), with none among the OCD cases. The third category comprised the variants associated with known recurrent genomic disorders, found in 115/2,691 cases (4.3%; category C in Tables 2, 3; Fig. 1). The most frequent were 16p13.11 duplications (17 cases), 15q11.2 deletions (breakpoints BP1-BP2) (16 cases), and 15q11.2 duplications (BP1-BP2) (15 cases). Three distal duplications of 16p11.2 were found in ADHD, ASD, and SCZ cases. We found 15q11-q13 duplications in six cases diagnosed with ADHD, ASD, OCD, or SCZ. The high prevalence of 15q11.2 duplications (BP1-BP2) and 16p13.11 duplications likely reflects the relatively mild expression and reduced penetrance of these genotypes.

Fig. 1

Distribution of a) aneuploidies and b) known recurrent genomic disorder CNVs found in cases diagnosed with autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), schizophrenia (SCZ), or obsessive compulsive disorder (OCD). Details of the copy number variants, sex, and variant sizes are in Table 1, Supplementary Table 1A. DEL deletion, DUP duplication, TAR Thrombocytopenia-Absent Radius syndrome locus, STS includes STS, BP breakpoint, LCR low-copy repeat, Prox proximal, Dist distal

Fourth, we identified de novo CNVs impacting genes in 35 cases: 31/448 (6.9%) for ASD and 3/167 (1.8%) for OCD (category D in Tables 2, 3). Of 13 SCZ trios with data, we identified one de novo genic CNV. Parents of ADHD cases were not analyzed, thus no data on de novo CNVs were available for this disorder.

The last class of variants included inherited clinically relevant CNVs that did not belong to any of the first three categories. These included CNVs impacting genes previously implicated for NDDs but either inherited or with unknown inheritance. We found these variants in 117/2,691 cases (4.4%) (category E in Tables 2, 3).
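The five categories described above can be sketched as a simple decision rule. This is a minimal illustration only, not the classification pipeline used in the study: the `Cnv` fields, the function name, and the order in which the categories are tested are all assumptions made for the example (the text states that category B excludes aneuploidies and recurrent genomic disorder regions, and that category E excludes the first three categories, but it does not say how a de novo CNV larger than 3 Mb is assigned).

```python
from dataclasses import dataclass
from typing import Optional

LARGE_CNV_BP = 3_000_000  # "large" CNVs are those above 3 Mb


@dataclass(frozen=True)
class Cnv:
    """Hypothetical record; field names are illustrative, not from the study."""
    length_bp: int
    is_aneuploidy: bool = False
    in_recurrent_disorder_region: bool = False
    is_de_novo: bool = False
    hits_gene: bool = False
    hits_ndd_gene: bool = False


def clinical_category(cnv: Cnv) -> Optional[str]:
    """Return a category letter A-E, or None if the CNV is not prioritized.

    The precedence (A, then C, then B, then D, then E) is an assumption.
    """
    if cnv.is_aneuploidy:
        return "A"
    if cnv.in_recurrent_disorder_region:
        return "C"
    if cnv.length_bp > LARGE_CNV_BP:
        return "B"  # large CNV outside aneuploidies and recurrent regions
    if cnv.is_de_novo and cnv.hits_gene:
        return "D"
    if cnv.hits_ndd_gene:
        return "E"  # inherited or unknown inheritance, NDD gene impacted
    return None
```

Testing the recurrent-region flag before the size threshold keeps large recurrent CNVs out of category B, which matches the exclusion stated for that category.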

Cases with more than one clinically relevant CNV

We found more than one clinically relevant CNV in 20/2,691 (0.74%) of NDD cases: 12/1,838 (0.65%) of ASD cases, 4/427 (0.95%) of ADHD cases; 3/222 (1.35%) of those with OCD, and 1/204 (0.5%) of individuals with SCZ.

In ASD, there were eleven cases with two CNVs and one case with three relevant variants. The latter, case 4-0040-003, had a maternally inherited 53 kb deletion impacting CTNNA3, a maternally inherited 54 kb duplication impacting CNTN4, and a de novo 547 kb duplication impacting YES1, ADCYAP1, and six other genes. Examples with two clinically relevant CNVs each included: (i) a male (7-0293-003) with a 611 kb deletion of 16p11.2 and a 513 kb deletion of 15q11.2, of unknown inheritance; (ii) a female (2-1525-003) with a maternally inherited 1.5 Mb deletion consistent with the 16p13.11 recurrent microdeletion (neurocognitive disorder susceptibility locus), and a de novo 326 kb duplication impacting USP7, PMM2, C16orf72, and CARHSP1; and (iii) a male (2-0305-004) with a maternally inherited 432 kb deletion in the 1q21.1 locus associated with thrombocytopenia-absent radius syndrome and a de novo 1.09 Mb duplication impacting GPD2 and NR4A2.

The four cases with ADHD with more than one CNV were all male; each had two clinically relevant CNVs, all of unknown inheritance. Examples included: (i) 213050 with Klinefelter syndrome (47,XXY) and an autosomal 181 kb duplication affecting MCPH1; (ii) 206760 with a 191 kb deletion impacting DLGAP2 and a 154 kb duplication involving DRD4 and eight more genes; and (iii) 235983S with Klinefelter syndrome (47,XXY) and an autosomal 123 kb duplication impacting DPP6.

Of the three OCD cases carrying more than one CNV, there was one male (OCD146-JS-1254_188613) with three clinically significant CNVs: a maternally inherited 143 kb deletion impacting NLGN1, a second maternally inherited 115 kb deletion impacting DPP6, and a paternally inherited 67 kb duplication impacting PTPRN2. Another male (OCD125-896993) had a de novo 165 kb deletion impacting ADRA2C and a paternally inherited 1.7 Mb duplication impacting CNTN4 and CNTN6. A female (OCD109-1648) had a maternally inherited 1.4 Mb deletion of 17p12 and a 190 kb deletion impacting PTPRT. This case also had a de novo frameshift deletion in LRCH2 (c.2190_2193del:p.C730fs; Supplementary Table 1C) identified using WES.

The only individual with SCZ having two clinically relevant CNVs (Supplementary Table 1C) was a female (222720) with a 10.2 Mb deletion of chromosome 5p congruent with the cri-du-chat syndrome region, and a 7.3 Mb duplication of 6q26-27. Karyotyping confirmed this to be the result of an unbalanced translocation.11

Cases with clinically relevant CNVs identified by microarray also had SNVs or CNVs detected by WGS or WES

No WGS data were available for the ADHD and SCZ samples. However, we have previously published WGS data for 106/209 (50.7%) ASD cases with clinically relevant CNVs.8,27,28 Of these, 15 had a clinically relevant LoF mutation and one had a 4.5 kb deletion (Supplementary Table 1C). The latter case was a male (2-1086-004) with a maternally inherited 80.1 kb duplication impacting CTNND2, identified using array data. He also had a paternally inherited 4.5 kb deletion in ANO3, a gene known to be associated with autosomal dominant dystonia (OMIM: 610110). This deletion was missed by microarray because, given its small size, the region lacked sufficient probes (Supplementary Table 1C). Examples of LoF SNVs were: (i) a female autism case (7-0133-003) with a 2.5 Mb de novo duplication in 10q11.22-11.23 and a de novo nonsense mutation in SOX5 (c.C313T:p.R105X); mutations of SOX5 cause autosomal dominant Lamb-Shaffer syndrome, characterized by global developmental delay and intellectual disability (OMIM:604975); and (ii) a male autism case (7-0123-003) with a 2.9 Mb duplication involving the 16p13.11 recurrent microduplication who also had a de novo splice-site variant impacting SHANK3 (c.2223+1G>A), a gene strongly associated with NDDs.8 Of 13 OCD cases with clinically relevant CNVs, three also had clinically relevant LoF mutations previously identified using WES (Supplementary Table 1C).13 One example is a male OCD case (OCD146-JS-1254_188613) with three clinically relevant CNVs: a maternally inherited 142 kb deletion in 3q26.31, a maternally inherited 115 kb deletion in 7q36.2, and a paternally inherited 67 kb duplication in 7q36.3. 
He also had three clinically relevant SNVs found by WES: (i) a maternally inherited frameshift deletion in AFF2 (c.2976_2988del:p.992_996del), which is an X-linked recessive variant associated with mental retardation (OMIM:300806), (ii) a maternally inherited frameshift deletion in DRD4 (c.233_245del:p.A78fs), an autosomal dominant variant associated with autonomic nervous system dysfunction and ADHD (OMIM:126452), and (iii) a maternally inherited frameshift in MBD4 (c.939_940ins:p.E314fs), a gene involved in DNA methylation (OMIM:603574).

Complex phenotypes

Because we had clinical information on NDD phenotypes beyond the primary diagnoses for some cases, we investigated the pleiotropy of CNVs shared among different NDDs (Supplementary Table 1C). We defined complex as having multiple different NDDs. Of ADHD cases with an NDD-relevant CNV, a few were noteworthy and highlight the clinical pleiotropy associated with many of these variants. A male ascertained for ADHD (176004), who also had a learning disability but no ASD, carried a duplication of 16p11.2, which is known to be associated with ASD.8 We found deletions at this locus in a female with SCZ, a male with ASD, and another male with ADHD, but none in our OCD cases. A male diagnosed with ADHD (206773), who carried a duplication of chromosome X (Klinefelter syndrome), also had ASD, learning disability, language delay, general anxiety disorder, and enuresis, all known features of Klinefelter syndrome.29 Male case 181220 with ADHD, with 15q11.2 duplication (BP1-BP2), also had ASD. Of OCD cases, a male (OCD75-SB-1213) with a paternally inherited 62 kb duplication of DLGAP2 also had separation anxiety disorder, a Tourette disorder with tic, oppositional defiant disorder, and panic disorders and agoraphobia. Another male with OCD (OCD146-JS-1254_188613), ADHD (inattentive subtype), and a Tourette disorder with tic, had three different CNVs, impacting NLGN1, DPP6, and PTPRN2. Of SCZ cases, a male (153030) with a 1.6 Mb duplication of 16p13.11 also had a learning disorder but no ID (details in Supplementary Table 1C). A female (213684) with SCZ and a 549 kb deletion of NRXN1 also had moderate intellectual disability. A male with SCZ (166808) with a 15q11-q13 duplication had mild intellectual disability.

Cross-disorder gene discovery and genes in multiple cases in a single disorder

We searched for genes, excluding those from regions of known recurrent genomic disorders, that were affected in multiple cases by CNVs. We first restricted analysis to brain-expressed genes that are at least moderately constrained for LoF variants (pLI > 0.45; Fig. 2; Supplementary Table 1D). We searched these for genes impacted by CNVs in at least two cases each, and found 20 genes impacted by deletions. Notably, NRXN1 was impacted in 10 subjects (three SCZ, six ASD, and one ADHD); deletions of 18p11.21 impacting novel candidate genes (GNAL, LDLRAD4, and SEH1L) were in four cases (three ASD, one ADHD). Genes MKRN1 and MYH9 were impacted by CNVs in ASD and SCZ cases. Eight genes – NRXN1, GNAL, LDLRAD4, SEH1L, DLGAP2, DCTN2, GRID2, and KIF5A – were impacted by CNVs in ASD and ADHD cases. Only CNVs containing ABR were shared between OCD and ASD, and no gene-containing variants were shared between OCD and ADHD or OCD and SCZ.

Fig. 2

Genes impacted by rare CNVs in more than one case. a) brain-expressed and moderately constrained genes (pLI > 0.45) impacted by deletions in multiple cases, b) brain-expressed genes with duplication of their full-length transcript in more than one case

Genes impacted by deletions in multiple ASD cases (only) were: ASTN2, NRXN3, ANKS1B, GALNT13, DLC1, LAMP1, METRNL, PTPRK, and SPOCK1 (Fig. 2; Supplementary Table 1D). Although ASTN2 deletions were previously reported in ADHD,30 in this study we made no such observations. We saw no genes impacted by deletions in multiple cases of ADHD, OCD, or SCZ, other than those in the known genomic syndromes (Fig. 2; Supplementary Table 1E).

We identified 53 brain-expressed genes impacted in multiple cases by duplications of the entire longest transcript (Fig. 2; Supplementary Table 1E). Examples included DNAJC15, GNG13, CARHSP1, PCMTD2, RPS3A, and TMEM158 (Supplementary Table 1E). Most whole-gene duplications were from ASD cases, probably due to the latter’s disproportionate representation. Examples of genes with such variants in multiple disorders were: PCMTD2 in ASD, OCD and SCZ; KNDC1 in ASD and SCZ; CARHSP1, PCMTD2, SYNM, and EXOC3 in both ASD and OCD; and GNG13, MRPS33, RPS3A, FAM69C, and CD24 in ASD and ADHD.

We found 38 genes duplicated in multiple ASD cases (only). Examples included CDH15, UBTF, DUT, HYPK, ATXN7L3, and GLOD4. Duplications of NDUFV1 and RIMKLB were each observed in two ADHD cases (Fig. 2; Supplementary Table 1E). We found no repeated full gene duplications in OCD or SCZ cases in this collection.

Increased burden of rare CNVs impacting brain-expressed protein coding genes and brain-expressed long non-coding RNA (lncRNA)

We sought rare CNVs impacting exons of lncRNAs and found these in 1,130/2,691 cases (42%). Restricting to brain-expressed lncRNAs, only 234/2,691 cases (8.7%) carried such rare CNVs. We tested for the extent to which the protein coding genes and lncRNAs were impacted by rare CNVs in cases compared with parents of cases. We found a nearly significant excess in cases over controls of deletions in protein-coding genes (p = 0.08; false discovery rate (FDR) = 0.21), but not for lncRNAs (p > 0.1). We found no global excess burden of duplications for protein-coding genes and lncRNAs (p > 0.1). However, when focused on brain-expressed elements, we observed a modest increase of rare deletions impacting both protein-coding genes (p = 0.03; FDR = 0.19) and lncRNAs (p = 0.06; FDR = 0.21). We then performed a multivariate analysis to test whether the burden signal was from protein-coding, lncRNAs, or both. This analysis showed a statistically significant signal (p = 0.02) for deletions impacting protein-coding genes, suggesting an overlap between the protein-coding and lncRNA burden signal.

Given the increasing association of lncRNAs with disease, we highlight two examples of such genes identified in multiple unrelated individuals. (i) AK127244: three subjects with ASD (1-0045-004, 7-0103-003, and 1-0629-003) harbored 2p16.3 deletions that directly disrupted the exonic sequence of AK127244 (LOC730100). This is a 1.38 Mb non-coding RNA of unknown function adjacent to NRXN1 and transcribed in the opposite direction. Rare, inherited deletions intragenic to AK127244 have been identified in five individuals with ASD. Such deletions have been proposed as candidate factors for a broad range of neuropsychiatric disorders including SCZ and affective disorder.16,31,32 We identified seven additional subjects here, five with ASD and two with SCZ, with coding deletions of NRXN1 that extend and disrupt the transcription start site and exonic sequence of AK127244. (ii) PTCHD1-AS: we found five males with ASD and deletions impacting exons of this gene (Table 3, Supplementary Table 1C). Disruption of PTCHD1-AS has been linked to ASD.33,34

Increased burden of CNVs impacting NDD genes in cases carrying CNVs that impact genomic instability genes and fragile sites

We hypothesized that CNVs affecting genes involved in genome stability might lead to a higher incidence of additional variants. These subsequent variants could then add to the phenotypic complexity, by impacting genes involved in the development and functions of the nervous system. We therefore tested if individuals carrying a CNV that impacts “genomic instability genes” (GIG-CNV) have a higher burden of rare CNVs (measured as the number and the cumulative length of rare CNVs per individual) than do individuals not carrying such CNVs.35 We compiled a set of 958 protein coding “genomic instability genes” from the AmiGO database.36
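The two burden measures named above (number of rare CNVs and cumulative length of rare CNVs per individual) can be illustrated with a short sketch. The input layout, the function name, and the choice to sum lengths without merging overlapping calls are assumptions made for the example; the study's own burden computation may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input: one (sample_id, start, end) tuple per rare CNV call,
# with 1-based inclusive coordinates.
Call = Tuple[str, int, int]


def burden_per_individual(calls: Iterable[Call]) -> Dict[str, Tuple[int, int]]:
    """Map each sample to (number of rare CNVs, cumulative length in bp).

    Lengths are summed as-is; overlapping calls are not merged.
    """
    counts: Dict[str, int] = defaultdict(int)
    lengths: Dict[str, int] = defaultdict(int)
    for sample_id, start, end in calls:
        counts[sample_id] += 1
        lengths[sample_id] += end - start + 1
    return {s: (counts[s], lengths[s]) for s in counts}


# Toy data with invented sample IDs and coordinates.
toy_calls = [
    ("S1", 1_000_001, 1_050_000),  # 50 kb
    ("S1", 5_000_001, 5_200_000),  # 200 kb
    ("S2", 2_000_001, 2_030_000),  # 30 kb
]
toy_burden = burden_per_individual(toy_calls)
```

Samples with no rare CNV do not appear in the result, so a group mean computed from it would need those individuals added back with a burden of zero.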

The “genomic instability genes” were not disproportionately impacted by rare CNVs in cases compared with controls (parents or unaffected individuals for this analysis) (p > 0.1). In individuals who had a GIG-CNV, we found an increase in the mean number of rare CNVs (3.3 vs 2.1; p = 2.11 × 10^-8) and in the cumulative length of rare CNVs (4.4 Mb vs 315 kb; p = 2.11 × 10^-8) compared to individuals without a GIG-CNV. We observed a similar trend when excluding CNVs impacting the “genomic instability genes” from the burden analysis (mean number of rare CNVs: 2.8 vs 2.1; p = 0.003; cumulative length of rare CNVs: 564 kb vs 315 kb; p = 0.024). This difference was even higher when considering only cases (i.e., not controls) (mean number of rare CNVs: 2.9 vs 2.1; p = 0.013; cumulative length of rare CNVs: 686 kb vs 358 kb; p = 0.051). We also found a 2.77-fold increase in the number of cases with rare CNVs impacting NDD-associated genes (NDD-CNV) (n = 1,160; Supplementary Information) and a GIG-CNV (Fisher’s exact test, p = 4.7 × 10^-5; odds ratio: 2.77 [CI: 1.66–4.54]), compared with cases with NDD-CNVs but without a GIG-CNV. We then excluded individuals with aneuploidy or a CNV impacting both “genomic instability genes” and NDD genes. We still observed the excess number of cases with NDD-CNVs and GIG-CNVs over those with NDD-CNVs only (odds ratio: 2.52 [CI: 1.50–4.16]; Fisher’s exact test, p = 2.7 × 10^-4).
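For readers unfamiliar with the test named above, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2 × 2 table, together with the sample odds ratio. The underlying counts are not given in the text, so the table below is an invented toy example and does not reproduce the reported values; the function names are hypothetical, and the odds ratio and confidence interval reported in the study may have been obtained with a different estimator.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def pmf(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unconditional (cross-product) odds ratio; assumes b and c are non-zero."""
    return (a * d) / (b * c)


# Toy table (invented counts):
# rows = GIG-CNV present / absent, columns = NDD-CNV present / absent.
toy_p = fisher_exact_two_sided(3, 1, 1, 3)
toy_or = sample_odds_ratio(3, 1, 1, 3)
```

In practice a statistics library would normally be used instead; the hand-written version is shown only to make the calculation explicit.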

We then tested whether there was an increased CNV burden among cases whose parents had a GIG-CNV. Such CNVs could have been generated de novo anywhere previously in the pedigree and had not necessarily arisen de novo in the affected individual. We excluded the cases carrying a de novo GIG-CNV that was not found in the parents. We found a higher average number of rare CNVs in cases whose parents had GIG-CNVs compared to cases whose parents did not (2.66 vs 2.10; p = 0.02). We observed a similar trend for this global burden when excluding the GIG-CNVs (2.49 vs 2.10; p = 0.08). However, we did not observe an increased global burden in the cumulative length of rare CNVs.

We then further investigated cases with de novo CNVs for whom WGS data were available (n = 16, all from the ASD cohort). We found no over-representation (p > 0.1) of cases with de novo CNVs from families in which at least one parent carried a LoF variant impacting a “genomic instability gene” (n = 10) compared to other families (n = 6). Again, this was a small sample size. There were notable examples of cases with de novo CNVs whose parents had LoF variant(s) in “genomic instability genes”. (i) Case 1-0627-007 had a paternally inherited frameshift deletion impacting PALB2 (c.509_510del:p.R170fs). He also had a 1.9 Mb de novo deletion in 16q23.3-q24.1 (Table 3, Supplementary Table 1C). (ii) Case 2-1525-003 had a stop-gain mutation in PALB2 (c.G2712A:p.W904X) and a 326 kb de novo duplication in 16p13.2. PALB2 plays a role in homologous recombination and checkpoint response.37 (iii) Case 1-0181-004 had a paternally inherited variant in EXO1 (c.G2482T:p.E828X) and a 5.3 Mb de novo deletion in 3p14.1-p13. EXO1 functions in DNA replication, repair, and recombination (OMIM:606063). (iv) The father of ASD case 2-1693-003 had a variant (c.168_172del:p.A56fs) in RAD1, a gene required for DNA replication and repair (OMIM: 603153). She carried a de novo 24 kb deletion at Xp11.22.

We also studied the extent of de novo CNVs overlapping genomic fragile sites. Of 33 de novo CNVs (excluding aneuploidies), 25 (75.6%) overlapped fragile site regions (Supplementary Table 1F). In addition, eleven of the de novo CNVs overlapped long genes (>300 kb), a feature associated with fragile sites and neuronal genes.38 Notably, five of these genes – MBD5, FAM19A1, FOXP2, AUTS2, and DLGAP2 – are involved in neuron formation and differentiation.
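The overlap check described above amounts to an interval-intersection test between each de novo CNV and a list of fragile-site regions. The sketch below shows one way to count overlapping CNVs; the coordinates are invented, the function names are hypothetical, and "overlap" is taken to mean sharing at least one base, which is an assumption since the text does not state the criterion used.

```python
from typing import Iterable, List, Tuple

# Hypothetical region layout: (chromosome, start, end), 1-based inclusive.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping(cnvs: Iterable[Region], sites: List[Region]) -> int:
    """Number of CNVs that overlap at least one fragile-site region."""
    return sum(1 for cnv in cnvs if any(overlaps(cnv, s) for s in sites))


# Toy data (invented coordinates, not real fragile sites).
toy_sites = [("chr7", 100_000, 500_000), ("chr16", 1_000_000, 2_000_000)]
toy_cnvs = [
    ("chr7", 450_000, 700_000),   # overlaps the chr7 site
    ("chr7", 600_000, 900_000),   # no overlap
    ("chr16", 100_000, 300_000),  # no overlap (before the chr16 site)
    ("chrX", 100_000, 500_000),   # different chromosome
]
toy_n = count_overlapping(toy_cnvs, toy_sites)
```

For genome-scale inputs an interval tree or sorted sweep would be more efficient than this pairwise scan, but with a few dozen de novo CNVs the simple form is sufficient.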

Discussion

We generated a bioresource to investigate the contribution of rare CNVs to the etiology of four NDDs – ASD, ADHD, OCD, and SCZ – among 2,691 diagnosed cases. We found that 10.5% of these cases carried CNVs with potential clinical relevance to NDDs. Of all cases, 4.1% carried CNVs that were formally classified as clinically significant or likely clinically significant, when evaluated according to ACMG guidelines.26 We also found variant genes/regions that were shared across some or all of the NDDs. Evidence included recurrent or non-recurrent CNVs impacting the same genes in cases with different NDDs, and in patients diagnosed with multiple comorbid NDDs.

Of the four NDDs, ASD had the highest proportion of cases with a clinically relevant CNV (11.4%). OCD cases had the lowest proportion with identified CNVs (5.6%). Deletions of 22q11.21 were found in 6/204 SCZ cases (2.9%) - three with mild intellectual disability - contributing to the relatively high proportion of SCZ cases deemed to have a clinically relevant CNV (10.8%). 22q11.2 deletions are expected to be identified in about one in every 100-200 individuals with SCZ and about one in 10 with dual diagnosis of SCZ and ID.11 The enrichment of the SCZ cohort for ID likely contributed to the observed prevalence of 22q11.2 deletions. Of ADHD cases, 9.4% carried clinically relevant CNVs, which is slightly higher than the 8.9% previously reported using a different microarray (Affymetrix SNP 6.0).10 We also found multiple clinically relevant CNVs in 20/2,691 (0.74%) of NDD cases.

We found 17 aneuploidies (45,X; 47,XXY and 47,XYY) in cases diagnosed with either ASD or ADHD (Table 2). The prevalence of Turner syndrome (45,X) (n = 2) was 1/1,300 among our cases, which is similar to previous reports.39 One had ADHD and the other ASD, similar to other reports.39 Cases with 47,XXY (n = 5) or 47,XYY (n = 4) had diagnoses of either ASD or ADHD, similar to previous reports.29 We found trisomy 21 (n = 6) only among ASD patients.40 Large CNVs other than aneuploidies were found in 23/2,691 (0.85%) cases, mainly in gene-rich regions of the genome. Although we did not find aneuploidies in SCZ cases, they have been reported in association with this phenotype previously.11,24

We found CNVs associated with known recurrent genomic disorders in 4.3% of cases (Table 2). This signified an increase of this type of CNVs among NDDs compared with that from a community population (1.1% (52/4,817); unpublished data), but similar to that of subjects with neurocognitive deficits in the UK Biobank (3.8%).41

Known recurrent genomic disorders were distributed differently among the four NDDs (Table 2; Fig. 1). We observed ASD in 80/115 (70%) of subjects with known recurrent genomic disorders, e.g., 7q11.23 deletion, 16p11.2 distal deletion, and 22q11.21 duplications. The 5p deletion was unique to SCZ. Deletions of 2q37.3, 16p11.2 proximal duplications, and Xp22.3 deletions were found only in ADHD, whereas 17q12 deletion was only found in OCD. Duplications of 15q11-q13, deletions of 15q11.2 (BP1-BP2), and 16p13.11 duplications were observed among cases of all four disorders (Fig. 1). Proximal duplications of 16p11.2 are found in up to 1% of individuals with SCZ.11,24,42

When parents were available to determine origin, we found 5.6% of this subset of cases to have a de novo CNV (Table 2). The highest de novo rate was for ASD (6.9%), consistent with previous reports from 4.7 to 7.1%.43 For OCD, 1.8% had de novo CNVs, which is higher than the rate found in the general population (0.9–1.4%),43 but lower than 2.3% for OCD previously reported from a larger sample size.13

We observed deletions and duplications, other than those associated with known recurrent genomic disorders, in the same genes in different NDDs, including some in multiple cases (Fig. 2; Table 2, Supplementary Table 1D). Deletions impacted NRXN1 (in 9 males, 1 female) among ASD, ADHD, and SCZ cases (Fig. 2). Similarly, we found deletions impacting GNAL, LDLRAD4, SEH1L, DLGAP2, DCTN2, GRID2, and KIF5A among both ASD and ADHD cases (Fig. 2).10

Disruptive variants in gene-sets involved in multiple intracellular signaling pathways and DNA instability have been observed previously in ASD.23 Variants in gene pathways associated with DNA/“genomic instability” are increased in both ASD and SCZ.20 Consistent with these studies, we observed a 2.77-fold higher proportion of cases with NDD-CNVs among those with GIG-CNVs, than among those without.

This study had certain limitations. (i) Most cases, with the exception of SCZ, were recruited as children or adolescents on the basis of a specific diagnosis. ASD and ADHD have early onset, and many participants would not have reached the age for adolescent or adult-onset disorders, including OCD and SCZ. It is possible that individuals with early onset conditions will develop additional later-onset comorbidities. All SCZ cases in the current study were adults. (ii) Recruitment was by clinicians who focus on a single disorder. It is possible that some cases may have had other NDDs, which were not reported. For example, we had data on intellectual disability/IQ for the SCZ cohort and for some cases with other NDDs. We searched the genotype data for possible multiple ascertainment of any case and found no examples of subjects who were recruited through multiple disorders. We examined non-primary phenotypes for specific cases with variants in NDD-relevant genes. (iii) Due to limitations of the technology, we studied CNVs only of a certain size (>20 kb) for the majority of samples where we did not have sequence data. Smaller CNVs and single nucleotide polymorphisms also contribute to the etiology of NDDs,8,35 but these would have been missed. A more sensitive technology such as genome sequencing would allow more comprehensive detection of all relevant variants.8,44 The dataset also needs to be analyzed iteratively as more data and better analysis tools become available.45

In summary, we highlighted clinically relevant CNVs found through microarray data for ASD, ADHD, OCD, and SCZ. We also demonstrated that identical CNVs or genes could potentially contribute to the etiology of multiple NDDs, consistent with previous reports,10,20,46,47 and provide a valuable resource for comparison in other studies.

Methods

Samples

This project was part of a multilateral collaborative project to investigate genetic etiology across four neurodevelopmental disorders: ADHD, ASD, OCD, and SCZ. This study was approved by the Research Ethics Board at The Hospital for Sick Children. Written informed consent was obtained from all participants or substitute decision makers. CNVs were detected on the same high-resolution microarray platform. The criteria for meeting a diagnosis of ASD, ADHD, OCD, or SCZ were detailed in our previous publications8,10,13,23,24 with a few modifications for ADHD (see Supplementary Information). Data from all OCD individuals and 139/435 (32%) of the SCZ cohort had been previously published,11,13 but we included them here for comparative purposes (Supplementary Information). ADHD and ASD samples were not previously published. Additional supportive evidence for cross-disorder associations of selected CNVs came from our previously published schizophrenia cohorts11,24 and an additional ADHD cohort10; all were genotyped on the Affymetrix SNP 6.0 microarray (Table 3).

Genotyping and detection of rare variants

We extracted genomic DNA from saliva or blood and genotyped samples on the Affymetrix CytoScan HD platform. Quality control and ancestry assessment procedures were as discussed previously.25 Using PLINK v1.90b2, we found 1,995 (74.1%) of cases to be of European ancestry (Supplementary Table 1A).

CNVs were identified as previously described.13,25 Briefly, four different algorithms were used to call high-confidence CNVs: the Affymetrix Chromosome Analysis Suite, iPattern, BioDiscovery Nexus, and Partek Genomics Suite. We defined a stringent set of variants of at least 20 kb, each identified by at least two algorithms and spanned by at least five consecutive probes (Supplementary Table 1B). We defined rare CNVs as those present at no more than 0.1% frequency among 10,851 control samples (detailed in Zarrei et al.25). We further restricted our list to those with more than 75% overlap with copy-number stable regions, according to our stringent CNV map of the human genome.2 We confirmed clinically relevant CNVs (as defined below; Tables 1, 2, Supplementary Table 1C) using SYBR® Green-based real-time quantitative PCR assays, TaqMan® copy number assays, or whole genome sequencing data (if available). The genomic coordinates used are based on Human Genome Build GRCh37/hg19.
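The filtering thresholds described above can be sketched as a simple predicate over CNV calls. This is a minimal illustration, not the pipeline used in the study: the `CnvCall` record, its field names, and the assumption that control counts and stable-region overlap have already been computed upstream are all hypothetical. How overlapping calls are matched between cases and controls is not specified in this section and is left out.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_LENGTH_BP = 20_000       # variants of at least 20 kb
MIN_ALGORITHMS = 2           # identified by at least two algorithms
MIN_PROBES = 5               # spanned by at least five consecutive probes
MAX_CONTROL_FREQ = 0.001     # no more than 0.1% frequency among controls
MIN_STABLE_OVERLAP = 0.75    # more than 75% overlap with stable regions
N_CONTROLS = 10_851          # number of control samples


@dataclass
class CnvCall:
    """Hypothetical record for one merged CNV call (field names are assumptions)."""
    chrom: str
    start: int                  # assumed 1-based, inclusive
    end: int                    # assumed 1-based, inclusive
    n_algorithms: int           # how many of the four callers reported it
    n_probes: int               # consecutive probes spanning the call
    n_controls_with_cnv: int    # controls carrying a matching CNV (precomputed)
    stable_overlap: float       # fraction of the call inside stable regions

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_high_confidence(call: CnvCall) -> bool:
    """Size, algorithm-concordance, and probe-count criteria."""
    return (call.length >= MIN_LENGTH_BP
            and call.n_algorithms >= MIN_ALGORITHMS
            and call.n_probes >= MIN_PROBES)


def is_rare(call: CnvCall, n_controls: int = N_CONTROLS) -> bool:
    """Control frequency of no more than 0.1%."""
    return call.n_controls_with_cnv / n_controls <= MAX_CONTROL_FREQ


def passes_all_filters(call: CnvCall) -> bool:
    """Stringent, rare, and mostly within copy-number stable regions."""
    return (is_high_confidence(call)
            and is_rare(call)
            and call.stable_overlap > MIN_STABLE_OVERLAP)
```

With 10,851 controls, the 0.1% cutoff corresponds to a CNV seen in at most 10 control samples under this sketch.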

Prioritizing variants relevant to NDDs and the NDD gene list

To focus on CNVs relevant to NDDs, we first selected those variants coinciding with known recurrent genomic disorders, aneuploidies, and large (>3 Mb) deletions and duplications. We also analyzed whether rare CNVs in our cases were similar to those in clinically relevant CNV databases at The Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, comprising over 20,000 cases. We classified variants for their clinical impact according to American College of Medical Genetics guidelines.26 Our prioritized variants also included those impacting the coding sequences of genes with sufficient evidence for being clinically relevant to NDDs (Supplementary Information).

Cross-disorder gene discovery and genes in more than one case in a single disorder

We searched for genes that were impacted by CNVs of 20 kb to 3 Mb in more than one case. Of these, we analyzed brain-expressed genes48 that were impacted by rare deletions and that are moderately to strongly constrained in the general population for LoF variants49 (defined by a LoF probability of > 0.45; n = 1,116). We also analyzed genes whose full transcript length was impacted by duplication.
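As a rough sketch of the gene selection just described, the snippet below counts distinct cases per gene and then applies the brain-expression and LoF-constraint filters to deletions. The record layout (`case_id`, `type`, `length`, `genes`), the function names, and the inclusive size bounds are illustrative assumptions. Counting recurrence among deletions only is one possible reading of the text, not a confirmed detail of the study.

```python
from collections import defaultdict

MIN_BP, MAX_BP = 20_000, 3_000_000   # 20 kb to 3 Mb (inclusive bounds assumed)
LOF_THRESHOLD = 0.45                 # LoF probability cutoff from the text


def genes_in_multiple_cases(cnvs):
    """Map each gene to the set of cases hitting it, keeping genes seen in >1 case.

    `cnvs` is an iterable of dicts with hypothetical keys:
    'case_id', 'length' (bp), and 'genes' (list of gene symbols).
    """
    cases_by_gene = defaultdict(set)
    for cnv in cnvs:
        if MIN_BP <= cnv["length"] <= MAX_BP:
            for gene in cnv["genes"]:
                cases_by_gene[gene].add(cnv["case_id"])
    return {g: ids for g, ids in cases_by_gene.items() if len(ids) > 1}


def constrained_deletion_candidates(cnvs, brain_expressed, lof_probability):
    """Brain-expressed, LoF-constrained genes deleted in more than one case.

    `brain_expressed` is a set of gene symbols; `lof_probability` maps gene
    symbol to its LoF probability. Genes without a score are excluded (assumption).
    """
    deletions = [c for c in cnvs if c["type"] == "deletion"]
    recurrent = genes_in_multiple_cases(deletions)
    return sorted(
        g for g in recurrent
        if g in brain_expressed and lof_probability.get(g, 0.0) > LOF_THRESHOLD
    )
```

Using sets of case identifiers, rather than raw counts of CNVs, ensures that two CNVs in the same individual do not count as two cases.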

Global burden test for protein-coding genes and lncRNAs

We performed a univariate analysis to test the global burden of variants impacting coding sequences of protein-coding genes and all exons of lncRNAs using a logistic regression model. We further tested a burden for brain-expressed protein-coding genes (n = 3,666) and lncRNAs (n = 1,070) to compare with those not expressed in the brain. Chromatin states from the Roadmap Epigenomics Consortium50 were used to identify brain-expressed genes (Supplementary Information). We defined controls as parents of cases in the regression analysis. We used sex and the first three principal components of population stratification calculated using PLINK as covariates. The model was also corrected for the total length of CNVs. Finally, we performed a multivariate analysis to investigate whether the burden signals were from the same sets of CNVs as in the univariate analysis (details in Supplementary Information). We considered p < 0.05 as statistically significant. We also reported 0.05 < p < 0.1 as nearly significant.
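To make the structure of the burden model concrete, the following is a minimal sketch of a logistic regression of case status on a burden measure plus the covariates named above (sex, three principal components, total CNV length). The software used for the regression is not stated in this section, so the hand-written Newton-Raphson fit below stands in for whatever was actually used. The function names, the sample field names, and the 0/1 coding of sex and case status are assumptions for illustration only.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(rows, labels, n_iter=25):
    """Fit logistic regression by Newton-Raphson; an intercept is prepended.

    `rows` holds one covariate list per sample; `labels` holds 1 (case) or
    0 (control). Returns [intercept, coefficient_1, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y in zip(design, labels):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def design_row(sample):
    """Hypothetical covariate layout: burden first, then the adjustment terms."""
    return [
        sample["n_genes_hit"],          # burden measure being tested
        sample["sex"],                  # coded 0/1 (assumption)
        sample["pc1"], sample["pc2"], sample["pc3"],
        sample["total_cnv_length_mb"],  # in Mb to keep the fit well scaled
    ]
```

In this layout, the coefficient at index 1 of the returned list is the log odds ratio per additional gene impacted, adjusted for the other covariates. A production analysis would normally rely on an established statistics package, which also reports standard errors and p values.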

Genomic instability and fragile sites

Replication stress can lead to CNV formation and fragile sites. A recent study using genome-wide CNVs20 demonstrated a link between DNA/genomic integrity and ASD and SCZ. However, even with a larger sample size than the current study (1,108 ASD and 2,458 SCZ), that study was unable to find pathways enriched in ASD versus SCZ or vice versa. Given our smaller cohorts, we performed our analyses in a combined set of all four NDDs to achieve acceptable statistical power. We first investigated CNVs impacting the coding sequences of genomic instability genes, looking for a difference in the proportion of cases with rare CNVs in these genes compared with that of controls. The genomic instability genes comprised 958 protein-coding genes identified from the AmiGO database36 by searching for the following terms: DNA repair, DNA replication, genome maintenance, DNA damage, and DNA integrity. We also tested for the overall number of rare CNVs and total length of rare CNVs. We then considered whether cases with perturbed genomic instability genes had a different burden of rare CNVs in NDD genes compared to that in cases with intact instability genes.
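The comparison of carrier proportions between cases and controls can be illustrated as follows. The statistical test used for this comparison is not named in this section; a two-sided Fisher's exact test is substituted here purely as an example. The function names and the representation of each individual as a set of genes hit by rare CNVs are assumptions.

```python
from math import comb


def carrier_counts(samples, gene_set):
    """Count individuals with and without a rare CNV in any gene of `gene_set`.

    `samples` is a list with one set of impacted gene symbols per individual.
    Returns (n_with_hit, n_without_hit).
    """
    with_hit = sum(1 for genes in samples if genes & gene_set)
    return with_hit, len(samples) - with_hit


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a, b: cases with / without a hit; c, d: controls with / without a hit.
    Sums the probabilities of all tables that are no more likely than the
    observed one, with fixed margins.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers among the cases.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
```

A typical use would call `carrier_counts` once for cases and once for controls with the instability gene set, then pass the four counts to `fisher_exact_two_sided`.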

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.