Autism spectrum disorders (ASDs), including autism, are neurodevelopmental disorders characterized by impairments in social and communication skills, as well as stereotyped and repetitive behaviours and/or a restricted range of interests. Current prevalence estimates in the United States are 0.1–0.2% for autism and 0.6% for ASDs1,2.

Linkage and candidate gene association studies have implicated several chromosomal regions in autism3,4. However, positive findings in one study often fail to replicate in others, and a consistent picture of susceptibility loci in autism is still lacking. Some telling clues about ASD genetics arose from recent studies on copy number variations (CNVs)5, including the association of de novo CNVs with ASDs6. Although de novo CNVs that disrupt specific genes may contribute to the pathogenesis of ASDs, heritable CNVs are much more common but have been less studied as risk factors for ASDs. A family-based genome-wide linkage and CNV analysis by the Autism Genome Project Consortium using Affymetrix 10K single nucleotide polymorphism (SNP) arrays implicated chromosome 11p12-13 and neurexin 1 (NRXN1) as candidate loci7. A study using the Affymetrix 500K SNP array in a Canadian population reported 277 rare CNVs that were observed only in ASD patients and not in 1,652 healthy controls or in the Database of Genomic Variants8. Furthermore, 16p11.2 deletions and duplications have been reported in independent cohorts of autism patients9. Together, these studies point to a role for CNVs in the genetic susceptibility to ASDs.

To search systematically for CNVs that confer risk of ASDs, we used a genome-wide approach with the Illumina HumanHap550 BeadChip. We assembled an Autism Case-Control (ACC) cohort by collecting 859 ASD cases (from a total of 1,246 ACC cases, parents, and siblings) of European ancestry, and 1,409 healthy controls. Among these case subjects, all met the diagnostic criteria for autism on the basis of Autism Diagnostic Interview (ADI), and 124 met the criteria for other ASDs on the basis of Autism Diagnostic Observation Schedule (ADOS)13. Fifty-four per cent were from simplex families, and the rest were from multiplex families. We also analysed 1,336 ASD cases (from a total of 3,398 cases, parents, and siblings) in the Autism Genetic Resource Exchange (AGRE)14 collection, together with 1,110 control subjects, as a replication cohort. Among the AGRE cases, 5% were from simplex families and 95% were from multiplex families; 1,202 met the criteria for autism and 134 for other ASDs13 (Supplementary Tables 1 and 2).

We generated 78,490 CNV calls (22,581 in the ACC series and 55,909 in the AGRE series) from all the ASD subjects and their family members that met strictly established data quality thresholds (Methods). On average, 15.5 CNV calls were made for each individual using the PennCNV software15, with similar frequency observed in cases and controls (Supplementary Fig. 2).

We first examined eight genomic regions that have been previously implicated in ASDs. Among those, CNVs involving the 15q11-13, 22q11.21, and NRXN1 regions have well-established associations with autism10. CNVs affecting CNTN4 in ASD cases have also been reported in independent studies11,12. We adjusted statistically for relatedness of cases by permutation, and our results show that duplications of 15q11-q13 and the 22q11.21 region, deletions of NRXN1, and deletions and duplications of CNTN4 replicate in our cohorts (Table 1). Conversely, we did not obtain statistical support for several other genomic regions previously shown to associate with ASD, including AUTS2 (ref. 16), NLGN3 (ref. 17), SHANK3 (ref. 18) and 16p11.2 (ref. 9) (Table 1). We observed a similar frequency of deletions and duplications of the 16p11.2 locus in the ASD cases (0.3%) as previously reported9; however, the CNV frequency in the control subjects at this locus was comparable to that of the cases (Supplementary Fig. 3). It is noteworthy that CNVs at the 16p11.2 locus do not segregate to all cases in three of the affected families, and that they are also transmitted to unaffected siblings (Supplementary Fig. 4). These results indicate that CNVs at the 16p11.2 locus may not be sufficient to be causal variants in ASD.

Table 1 CNVs in gene regions previously implicated in ASDs

To identify other new genomic loci contributing to ASDs, we applied a segment-based scoring approach that scans the genome for consecutive SNPs with more frequent copy number changes in cases compared to controls. This approach defines copy number variation regions, or CNVRs (Supplementary Fig. 5, upper panel). In the ACC cohort, we identified four CNVRs that were observed in cases but not in controls, as well as five CNVRs that had significantly higher frequency in cases versus controls (Table 2).
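The segment-based scan described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy carrier counts, the one-sided form of Fisher's exact test and the 0.05 threshold are all assumptions made for illustration. The sketch tests each SNP for an excess of CNV carriers among cases and merges runs of consecutive nominally significant SNPs into candidate CNVRs.

```python
from math import comb


def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact P value: probability of seeing at least
    `case_hits` carriers among cases, given the table margins."""
    total_hits = case_hits + control_hits
    denom = comb(n_cases + n_controls, total_hits)
    upper = min(total_hits, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_controls, total_hits - k)
        for k in range(case_hits, upper + 1)
    )
    return numer / denom


def find_cnvrs(snp_counts, n_cases, n_controls, alpha=0.05):
    """Merge consecutive SNPs with P < alpha into candidate regions.

    snp_counts: list of (position, case_carriers, control_carriers),
    sorted by position (hypothetical input format).
    Returns a list of (start, end, minimum_p) tuples.
    """
    regions = []
    current = None
    for pos, case_hits, control_hits in snp_counts:
        p = fisher_one_sided(case_hits, n_cases, control_hits, n_controls)
        if p < alpha:
            if current is None:
                current = [pos, pos, p]          # open a new run
            else:
                current[1] = pos                 # extend the run
                current[2] = min(current[2], p)  # keep the local minimum
        elif current is not None:
            regions.append(tuple(current))       # close the run
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions


# Toy data (invented): position, carriers among 100 cases, carriers among 100 controls.
toy_snps = [(1000, 0, 0), (2000, 8, 0), (3000, 9, 0), (4000, 1, 1), (5000, 7, 0)]
toy_regions = find_cnvrs(toy_snps, n_cases=100, n_controls=100)
```

In this toy input, the SNPs at positions 2000 and 3000 form one region and the SNP at 5000 forms another, because the non-significant SNP at 4000 breaks the run.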

Table 2 New common CNVRs over-represented in ASD patients

To replicate the CNVRs exclusively observed in ACC cases, we examined the AGRE case-control data set; of the four case-specific CNVRs, two were also exclusive to AGRE cases (PARK2 and RFWD2), whereas the other two (AK057321 and FBXO40) were not observed in either the cases or controls (combined P values ranging from 3.57 × 10⁻⁶ to 0.1, unadjusted for multiple testing) (Table 2). Interestingly, four genes (UBE3A, PARK2, RFWD2 and FBXO40) that were significantly enriched for CNVs and observed in the ASD cases only belong to the ubiquitin gene family (UniProt category ‘Ubl conjugation pathway’, P = 3.3 × 10⁻³). The other five CNVRs, as well as being enriched in the ACC cases compared with controls, were over-represented in the AGRE cases compared with the independent controls (Table 2). Figure 1 shows the most statistically significant locus, a duplication 55 kb upstream of AK123120, displayed in the UCSC Genome Browser19 with Build 36 (March 2006) of the human genome. To ensure the reliability of our CNV detection method, we experimentally validated all the significant CNVRs using other methods, including quantitative PCR (qPCR) and multiplex ligation-dependent probe amplification (MLPA) (Fig. 2). Affymetrix 5.0 array data were also available for a subset of the AGRE subjects for validation.

Figure 1: AK123120: example of over-represented CNVs.

AK123120 region on chromosome 2 (chr2): 12,986,750–13,291,000, divided into subsections with headers for ACC CNVs, AGRE CNVs, AGRE Affymetrix validation CNVs, and control CNVs. The AGRE Affymetrix Replication track is based on genome-wide 5.0 SNP genotyping data from the Broad Institute (see Supplementary Methods and Acknowledgements) and was generated using the PennCNV-Affy algorithm (see Supplementary Methods), to serve as a further means of validating the Illumina-based CNV calls. SNP and copy number (CN) probe coverage is shown as blue lines across the top. Produced with custom tracks listing CNV calls uploaded to the UCSC Genome Browser. Figures for all loci are included in Supplementary Information.

Figure 2: Independent validation using qPCR and MLPA.

Fluorescent probe-based qPCR assays using Roche Universal probe library and/or MLPA were designed to validate every candidate CNV with a completely independent test (representative series shown for each locus). Error bars denote the s.d. of quadruplicate runs. bp, base pairs; Del, deletion; Dup, duplication.

An alternative to the segment-based scoring approach for CNV association is the gene-based scoring approach, which examines CNV calls affecting any region of a gene (Supplementary Fig. 5, lower panel). Using this approach, we identified a further seven genes with an increased frequency of CNVs in ASD cases versus controls (Supplementary Table 3). For each gene, most CNVs target different parts of the gene and would have been missed by the segment-based approach. Of note, four of the genes identified by the segment- and gene-based approaches are involved in neuron development (NRXN1, CNTN4, ASTN2 and NLGN1) (Gene Ontology term ‘neuron development’, P = 9.5 × 10⁻³). Therefore, by combining evidence from two complementary CNV association approaches in a large sample, we were able to implicate two specific gene networks or biological pathways in ASDs: the ubiquitination system and neuronal cell-adhesion molecules.
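The gene-based approach can be sketched in the same hedged way. The input format, the gene name `GENE_A` and all coordinates below are hypothetical; the sketch only shows the counting step, in which an individual is scored as a carrier if any of their CNV calls overlaps any part of the gene.

```python
def overlaps(cnv_start, cnv_end, gene_start, gene_end):
    """True if two closed intervals share at least one base."""
    return cnv_start <= gene_end and gene_start <= cnv_end


def gene_carrier_counts(cnv_calls, genes):
    """Count distinct carriers per gene among cases and controls.

    cnv_calls: iterable of (sample_id, is_case, chrom, start, end).
    genes: dict mapping gene name to (chrom, start, end).
    Returns a dict mapping gene name to (case_carriers, control_carriers).
    Both input layouts are assumptions made for this illustration.
    """
    carriers = {name: (set(), set()) for name in genes}
    for sample_id, is_case, chrom, start, end in cnv_calls:
        for name, (g_chrom, g_start, g_end) in genes.items():
            if chrom == g_chrom and overlaps(start, end, g_start, g_end):
                # Sets ensure each individual is counted once per gene.
                carriers[name][0 if is_case else 1].add(sample_id)
    return {name: (len(c), len(k)) for name, (c, k) in carriers.items()}


# Toy example (invented coordinates): two cases carry CNVs that hit
# different ends of the same gene and do not overlap each other.
toy_genes = {"GENE_A": ("chr1", 1000, 5000)}
toy_calls = [
    ("case1", True, "chr1", 900, 1200),
    ("case2", True, "chr1", 4500, 6000),
    ("case2", True, "chr1", 2000, 2100),
    ("ctrl1", False, "chr1", 6000, 7000),
    ("ctrl2", False, "chr2", 1000, 5000),
]
toy_counts = gene_carrier_counts(toy_calls, toy_genes)
```

Because the two case CNVs share no SNPs, a per-SNP scan would see only one carrier at any position, whereas the gene-level count registers both cases. The resulting counts can then be compared between cases and controls with the same test used for single SNPs.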

The genes from the ubiquitin pathway (UBE3A, PARK2, RFWD2 and FBXO40) represent a new CNV finding in ASD susceptibility. Ubiquitination is a post-translational modification that can rapidly alter protein function and target proteins for proteasome-mediated degradation. The ubiquitin–proteasome system operates in pre- and post-synaptic compartments, regulating synaptic attributes, including neurotransmitter release, synaptic vesicle recycling in presynaptic terminals, and dynamic changes in dendritic spines and the post-synaptic density (PSD)20. Of the four ubiquitin-related genes highlighted in our study, UBE3A, a ubiquitin protein ligase, has been the most extensively studied in the context of autism. PARK2 is a ubiquitin-protein ligase, mutations of which cause autosomal recessive juvenile Parkinson’s disease21, and RFWD2 and FBXO40 are also ubiquitin-protein ligases, but neither has been previously associated with disease-causing mutations. The role of ubiquitin in the turnover of synaptic components, such as the neuronal cell-adhesion molecules, in a process involving regulation of activity-dependent synaptic plasticity presents a mechanism that links these two major gene networks. In addition to the genes described above, several ubiquitin-related genes are involved in human neurological diseases. These include NHLRC1, UBR, CUL4B, BRWD3 and HUWE1, genes that encode ubiquitin protein E3 ligases. Mutations in the latter three and in UBE2A, an E2 ubiquitin-conjugating enzyme, cause syndromes that include intellectual disability22.

Genes in the second group implicated in our study, neuronal cell-adhesion molecules, are critical in the development of the nervous system, contributing to axonal guidance, synaptic formation and plasticity, and neuronal–glial interactions. Recent genetic evidence has suggested associations between autism susceptibility and neuronal cell-adhesion molecules, including NRXN1 (ref. 10), CNTNAP2 (ref. 23), NLGN3 (ref. 17), NLGN4X (ref. 17), and specific cadherins. Our results provide support for some previously reported genes (NRXN1 and CNTN4), and also implicate additional genes with cell-adhesion functions, including NLGN1 and ASTN2. Mutations in neuroligin superfamily members have previously been found in individuals with autism, and have subsequently been shown to be functionally relevant24. ASTN1, a well-studied homologue of ASTN2, is a neuronal protein receptor integral to the process of glial-guided granule cell migration during development25, and ASTN2 deletions have recently been associated with schizophrenia26.

Using a genome-wide approach for high-resolution CNV detection, we have identified candidate genomic loci with enrichment of CNVs in ASD cases as compared to controls, and have replicated many of them using an independent set of cases. Most of these genes fall within two pathways/networks involving neuronal cell-adhesion and ubiquitin degradation. The enrichment of genes within these molecular systems suggests new susceptibility mechanisms for ASDs. Our results call for functional and expression assays to assess the biological effects of CNVs in these candidate genes.

Methods Summary

All genome-wide SNP genotyping was performed using the InfiniumII HumanHap550 BeadChip at the Center for Applied Genomics at The Children’s Hospital of Philadelphia (CHOP). We called CNVs with the PennCNV algorithm15, which combines multiple values, including Log R Ratio, B Allele Frequency, SNP spacing and population frequency of the B allele, into a hidden Markov model. The term ‘CNV’ represents individual CNV calls, whereas ‘CNVR’ refers to population-level variation. Quality control thresholds included a high success rate of attempted SNPs, low standard deviation of normalized intensity, genetically inferred European ancestry, low genomic wave artefacts, count of CNV calls per subject, and genotypic duplicate removal (Supplementary Table 4). CNV frequency between cases and controls was evaluated at each SNP using Fisher’s exact test. We report statistical local minima in reference to a region of nominal significance including SNPs residing within 1 Mb of each other (Supplementary Fig. 6). Resulting significant CNVRs were excluded if they (1) resided on telomere- or centromere-proximal cytobands; (2) arose from a ‘peninsula’ of common CNV (Supplementary Fig. 7); (3) lay in genomic regions with extremes in GC content27; or (4) arose from samples contributing to multiple CNVRs. To adjust for siblings in the AGRE data, we calculated a permutation-based P value (1,000 permutations), in which disease labels for siblings were permuted together. We used DAVID (Database for Annotation, Visualization, and Integrated Discovery)28 to assess the significance of functional annotation clustering. We considered loci significant between cases and controls (P < 0.05) where ACC discovery cases had overlapping variation, the loci replicated in AGRE or were not observed in control subjects, and the calls were validated with another method (qPCR Roche Universal Probe Library using qBase29, MRC-Holland MLPA, and Affymetrix 5.0 from Broad).
Statistical correction for five deletion and nine duplication CNVRs, selected on the basis of discovery cohort (ACC) significance and signal review, is appropriate for our study (‘CNV Filtering Steps’ in Supplementary Materials).
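The sibling-aware permutation described in this summary can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the test statistic (difference in carrier frequency between cases and controls), the data layout, the fixed random seed and the add-one P value estimator are all choices made here for clarity. The key point is that labels are shuffled across families rather than across individuals, so siblings always share a label.

```python
import random


def carrier_freq_diff(labels, families):
    """Carrier frequency among cases minus carrier frequency among controls.

    labels: one boolean per family (True = case family).
    families: one list per family of 0/1 carrier flags, one per member.
    """
    case_n = case_hits = ctrl_n = ctrl_hits = 0
    for is_case, members in zip(labels, families):
        if is_case:
            case_n += len(members)
            case_hits += sum(members)
        else:
            ctrl_n += len(members)
            ctrl_hits += sum(members)
    if case_n == 0 or ctrl_n == 0:
        return 0.0
    return case_hits / case_n - ctrl_hits / ctrl_n


def family_permutation_p(labels, families, n_permutations=1000, seed=0):
    """Permutation P value with disease labels shuffled at the family level."""
    rng = random.Random(seed)
    observed = carrier_freq_diff(labels, families)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # siblings move together: one label per family
        if carrier_freq_diff(shuffled, families) >= observed:
            extreme += 1
    # Add-one estimator avoids reporting a P value of exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Toy data (invented): 5 case sib-pairs that all carry the CNV,
# and 20 unrelated controls that do not.
toy_labels = [True] * 5 + [False] * 20
toy_families = [[1, 1]] * 5 + [[0]] * 20
toy_p = family_permutation_p(toy_labels, toy_families)
```

Shuffling individual labels instead would treat siblings as independent observations and could overstate significance when a CNV is inherited within a family.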