A large data resource of genomic copy number variation across neurodevelopmental disorders

Copy number variations (CNVs) are implicated across many neurodevelopmental disorders (NDDs) and contribute to their shared genetic etiology. Multiple studies have attempted to identify shared etiology among NDDs, but this is the first genome-wide CNV analysis across autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), schizophrenia (SCZ), and obsessive-compulsive disorder (OCD) at once. Using microarray (Affymetrix CytoScan HD), we genotyped 2,691 subjects diagnosed with an NDD (204 SCZ, 1,838 ASD, 427 ADHD and 222 OCD) and 1,769 family members, mainly parents. We identified rare CNVs, defined as those found in <0.1% of 10,851 population control samples. We found clinically relevant CNVs (broadly defined) in 284 (10.5%) of total subjects, including 22 (10.8%) among subjects with SCZ, 209 (11.4%) with ASD, 40 (9.4%) with ADHD, and 13 (5.6%) with OCD. Among all NDD subjects, we identified 17 (0.63%) with aneuploidies and 115 (4.3%) with known genomic disorder variants. We searched further for genes impacted by different CNVs in multiple disorders. Examples of NDD-associated genes linked across more than one disorder (listed in order of occurrence frequency) are NRXN1, SEH1L, LDLRAD4, GNAL, GNG13, MKRN1, DCTN2, KNDC1, PCMTD2, KIF5A, SYNM, and long non-coding RNAs: AK127244 and PTCHD1-AS. We demonstrated that CNVs impacting the same genes could potentially contribute to the etiology of multiple NDDs. The CNVs identified will serve as a useful resource for both research and diagnostic laboratories for prioritization of variants.


Supplementary Methods: Neurodevelopmental disorders (NDD) diagnostic criteria
The Parent Interview for Child Symptoms (PICS) 1 is similar to the Schedule for Affective Disorders and Schizophrenia (KSADS) 2 with an enhanced module for attention deficit hyperactivity disorder (ADHD) and other disruptive behavior disorders. A clinical psychologist conducted the PICS. Reliability of ADHD diagnoses was assessed through videotaped interviews in 48 cases and was found to be high (intraclass correlation for total symptom score = 0.93). Teacher information was collected using the Child Behavior Checklist (CBCL) Teacher form and the Strengths and Weaknesses of ADHD Symptoms and Normal Behavior (SWAN) teacher form. A clinical psychologist assessed intelligence and academic attainment.
To receive a best-estimate diagnosis of ADHD, arrived at through consensus between the assessing psychiatrist and psychologist, the participant had to present with impairing and developmentally atypical symptoms before age 7, meet Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria based on PICS and/or Teacher CBCL, exhibit evidence of symptoms and impairment both at home and at school, and not present with any of the exclusion criteria for ADHD as stated in DSM-IV. Individuals with full-scale IQ data were excluded if they had an IQ of less than 80 on both the verbal and the performance subscales of the Wechsler Intelligence Scale for Children. The mean full-scale IQ for these subjects was 100.32 (SD = 13.87; n = 174).
An autism spectrum disorder (ASD) diagnosis was considered research quality when it met criteria on one or both of the diagnostic measures, the Autism Diagnostic Interview-Revised and the Autism Diagnostic Observation Schedule; it was considered a clinical diagnosis when given by an expert clinician according to DSM-IV or DSM-5. 3,4
Ascertainment and phenotyping of the schizophrenia (SCZ) cohort are described in detail elsewhere. 5,6 In brief, we ascertained adult patients who met the DSM-IV diagnostic criteria for schizophrenia or schizoaffective disorder from community mental health clinics in Central and Eastern Canada. The study was approved by local hospital and university institutional review boards, and written informed consent was obtained for all participants. All study participants underwent direct clinical screening assessments for potential syndromic features using a standardized protocol that included review of available lifetime medical records and assessment of physical features. 7 All phenotyping was done blind to genotype. In addition to the ascertainment described above, the current study included five adults with schizophrenia referred by community psychiatrists for 22q11.2 deletion syndrome, as described elsewhere. 8 Of the 204 subjects with SCZ studied (from 200 families), 31 (16.9%) with sufficient data on cognitive functioning had intellectual disability (ID), including three with severe ID. Of the 204, 139 (68.1%) were previously published. 6 Six of the 204 had typical 22q11.2 deletions, five of whom were referred with syndromic features and previously published. 8
For obsessive-compulsive disorder (OCD), we interviewed participants and their parents with the Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version. 9 In addition, we used the Schedule for Obsessive-Compulsive and Other Behavioral Syndromes 10 at the University of Michigan and Wayne State sites. We assessed specific symptoms and current severity of OCD in the participants using the Children's Yale-Brown Obsessive Compulsive Disorder Scale. 11 The site clinical investigator, a child and adolescent psychiatrist, made lifetime and current Axis I diagnoses using all sources of information according to DSM-IV criteria.

Supplementary Methods: NDD genes
Our NDD gene list (Supplementary Table 1G) included genes expertly annotated for association with ASD or intellectual disability, and high confidence or statistically significant genes from NDD sequencing studies. The reference sources were:
1) Tier 1 and 2 high confidence developmental brain disorder genes (each with at least two de novo LOF variants) 12
2) 61 high confidence ASD genes from the MSSNG whole-genome sequencing project 3
3) 65 ASD candidate genes (false discovery rate ≤ 0.1) identified from Simons Simplex Collection and Autism Sequencing Consortium exome sequencing data 13
4) 10 novel candidate genes for intellectual disability (Benjamini-Hochberg corrected P-value < 0.05) 14
5) 94 genes with increased numbers of damaging de novo variants (p < 7 × 10^-7) from the Deciphering Developmental Disorders study 15
6) genes with increased de novo variants in epileptic encephalopathy, 15 intellectual disability/epilepsy 16 or schizophrenia cohorts 17,18
7) 346 genes with haploinsufficiency or triplosensitivity scores of 1 to 3 from the ClinGen Dosage Sensitivity Map 19
8) SFARI autism candidate genes (n=260) with associated scores ranging from 1 to 3 (https://www.sfari.org/).

Supplementary Methods: Burden analysis of CNVs impacting brain expressed protein coding and lncRNA genes
We applied a logistic regression test for the burden of protein coding and lncRNA genes impacted by rare CNVs in cases compared to controls (parents or unaffected individuals). Only rare CNVs (< 0.1% frequency) larger than 20 kb were tested. We corrected for the distribution of CNVs across sexes and sub-populations by including sex and population stratification principal components, obtained by principal component analysis (PCA), as covariates. We also corrected for differences in the number of rare CNVs among samples using the total size of rare CNVs, after excluding centromeres, telomeres and segmental duplications. We tested the burden of deletions and duplications in exons of protein coding and lncRNA genes independently. The logistic regression models are shown below; models M0 and M1 were compared using the anova function in R to derive the p-value of the test.
M0: case/control status = β1×sex + β2×PC1 + β3×PC2 + β4×PC3 + β5×clean_size + ε
M1: case/control status = β1×sex + β2×PC1 + β3×PC2 + β4×PC3 + β5×clean_size + β6×no_genes + ε
where β1 to β6 are beta coefficients, sex is the sex of the sample, PC1 to PC3 are principal components from population stratification, clean_size is the total size of rare CNVs and no_genes is the number of protein coding or lncRNA genes impacted by rare CNVs.
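The M0 versus M1 comparison can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, an intercept column is added, clean_size is treated as standardized, and the two nested models are compared with a likelihood-ratio chi-square test on one degree of freedom. The source states only that the models were compared with the anova function in R, so the exact test used there may differ. The helper names (solve, fit_logistic, lrt_pvalue_df1) are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50):
    """Newton-Raphson fit; returns (coefficients, maximized log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        # Tiny diagonal term keeps the Hessian invertible (illustrative safeguard).
        hess = [[1e-8 if i == j else 0.0 for j in range(p)] for i in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 0.5 * (1.0 + math.tanh(0.5 * eta))  # numerically stable sigmoid
            for i in range(p):
                grad[i] += (yi - mu) * xi[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * xi[i] * xi[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-9:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik

def lrt_pvalue_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Simulated samples (assumption): columns are intercept, sex, PC1-PC3,
# clean_size (standardized) and no_genes.
rng = random.Random(1)
rows, status = [], []
for _ in range(400):
    sex = rng.randint(0, 1)
    pcs = [rng.gauss(0, 1) for _ in range(3)]
    clean_size = rng.gauss(0, 1)
    no_genes = rng.randint(0, 5)
    eta = -1.0 + 0.3 * sex + 0.2 * clean_size + 0.5 * no_genes
    status.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    rows.append([1.0, sex, *pcs, clean_size, no_genes])

_, ll0 = fit_logistic([r[:-1] for r in rows], status)  # M0: covariates only
beta1, ll1 = fit_logistic(rows, status)                # M1: M0 + no_genes
lrt_stat = 2.0 * (ll1 - ll0)
p_value = lrt_pvalue_df1(lrt_stat)
print(f"LRT statistic = {lrt_stat:.2f}, p = {p_value:.3g}")
```

Because M1 adds a single term to M0, the test has one degree of freedom, and a small p-value indicates that gene count explains case status beyond sex, ancestry and total CNV size.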
Besides burden testing of all genes, we tested brain-expressed genes independently. Because expression data for lncRNAs are lacking, we used the chromatin state data from Roadmap Epigenomics 20 to define the brain-expressed regions of the genome. The chromatin state data covered 48 non-brain tissues and 10 brain tissues. We analysed only regions annotated with a transcribed state (3_TxFlnk, 4_Tx or 5_TxWk). To be more specific to brain, brain-expressed regions were those with no transcribed state in any of the non-brain tissues, or with an odds ratio of brain tissues over non-brain tissues >5. We defined a brain-expressed gene as one that had exonic overlap with brain-expressed regions. We corrected for multiple testing using a permutation-based FDR, as some tests are correlated with each other. We performed 1,000 label permutations using the BiasedUrn R package. 21 The permutation-based FDR was calculated as the proportion of permuted tests that passed a given p-value threshold divided by the proportion of actual tests that passed the same threshold. Finally, we performed a multivariate analysis to test whether the signals from brain-expressed protein coding genes and lncRNAs were shared. The logistic regression models for the multivariate analysis are:
M0: case/control status = β1×sex + β2×PC1 + β3×PC2 + β4×PC3 + β5×clean_size + ε
M1: case/control status = β1×sex + β2×PC1 + β3×PC2 + β4×PC3 + β5×clean_size + β6×no_lncRNA_genes + β7×no_protein_coding_genes + ε
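The permutation-based FDR described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the p-values are made-up toy numbers, the function names are hypothetical, the FDR is capped at 1, and labels are shuffled uniformly at random, which stands in for the BiasedUrn-based permutation named in the text.

```python
import random

def shuffle_labels(labels, seed):
    """Return a uniformly shuffled copy of case/control labels.
    Assumption: plain shuffling replaces the BiasedUrn sampling in the text."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

def permutation_fdr(actual_p, permuted_p, threshold):
    """Proportion of permuted tests passing the threshold divided by the
    proportion of actual tests passing it.
    permuted_p holds one list of p-values per permutation."""
    n_actual_pass = sum(p <= threshold for p in actual_p)
    if n_actual_pass == 0:
        return None  # undefined when no actual test passes
    actual_rate = n_actual_pass / len(actual_p)
    flat = [p for perm in permuted_p for p in perm]
    permuted_rate = sum(p <= threshold for p in flat) / len(flat)
    return min(1.0, permuted_rate / actual_rate)  # cap at 1 (assumption)

# Toy p-values: four actual tests and two permutations of the same four tests.
actual = [0.001, 0.02, 0.3, 0.8]
permuted = [[0.04, 0.5, 0.6, 0.9], [0.2, 0.3, 0.7, 0.95]]
fdr = permutation_fdr(actual, permuted, threshold=0.05)
print(fdr)
```

In the toy example, 2 of 4 actual tests and 1 of 8 permuted tests pass the 0.05 threshold, so the estimate is 0.125 / 0.5 = 0.25. In practice each inner list would come from refitting every burden test on one set of permuted labels.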

Supplementary Methods: Burden analysis of CNVs impacting NDD genes in cases with rare CNVs impacting genomic instability genes
We compiled 958 protein coding genes involved in genomic instability from the AmiGO database. 22 We tested for the burden of genomic instability genes impacted by either deletions or duplications between cases and controls using brain-expressed genes. We applied the same correction for sex, population stratification and total size of rare CNVs as in the burden analysis. We compared the number of CNVs and the total size of CNVs between samples with and without rare CNVs impacting genomic instability genes, using a Welch two-sample t-test. Additionally, we compiled a list of 1,160 genes associated with NDD phenotypes for the last set of tests, which asked whether cases with CNVs impacting genomic instability genes also tend to have CNVs impacting NDD genes. This was tested using Fisher's exact test.
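The two tests named in this section can be sketched in plain Python. This is a minimal illustration with made-up numbers, not the study's data or code. The function names are hypothetical, the Fisher test is the two-sided version for a 2×2 table (the source does not state the sidedness), and the Welch helper returns only the t statistic and the Welch-Satterthwaite degrees of freedom, leaving the p-value to a statistics library.

```python
from math import comb, sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Toy CNV counts per sample (made up): with vs. without a rare CNV
# impacting a genomic instability gene.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Toy 2x2 table (made up): rows are cases with or without such a CNV,
# columns are CNV impacting an NDD gene (yes, no).
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)
print(t_stat, dof, p_fisher)
```

Fisher's exact test suits this comparison because the number of cases carrying CNVs in both gene sets can be small, where a chi-square approximation would be unreliable.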