Introduction

Autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in social interaction and communication and by restricted, repetitive patterns of interest or behaviour1. The disorder typically presents clinically within the first 3 years of life. Its precise prevalence has proven difficult to estimate because practices of diagnosis and ascertainment have changed over time. Reported rates have increased from no more than 5 per 10,000 individuals throughout the 1980s to approximately 1 in 68 children today, according to a report from the US Centers for Disease Control and Prevention (CDC). Genetic epidemiology studies have shown that ASDs are highly heritable2,3,4, indicating a strong genetic basis.

Evidence from multiple studies indicates that de novo and rare chromosomal abnormalities, copy number variations (CNVs), single nucleotide variations (SNVs) and small insertions and deletions (InDels), and many common genetic variants all contribute to ASD risk5. Large genome-wide studies involving more than 5000 ASD families of primarily European ancestry have identified dozens of de novo and rare inherited CNVs6,7,8,9,10,11,12,13. The most significant are 16p11.2 deletions and duplications, 7q11.23 duplications, 15q11–13 duplications, 1q21.1 duplications, 3q29 deletions, 22q11.2 deletions and duplications, and deletions at NRXN1 and CDH13 (ref. 6). These structural variants, many of which have large effects but are individually rare, may together account for approximately 4% of ASD cases6. More recently, whole-exome sequencing and candidate-gene resequencing studies of large cohorts have identified multiple genes carrying recurrent de novo, likely gene-disruptive (LGD) mutations in ASD patients, such as CHD8, SYNGAP1, DYRK1A, ARID1B, SCN2A, DSCAM, ANK2, ADNP, POGZ, GRIN2B and CHD2 (refs 14–20). Genome-wide association studies (GWAS) have also reported common variants at the 5p14.1 (CDH10-CDH9)21, 5p15.2 (SEMA5A)22, MACROD2 (ref. 23), CNTNAP2 (ref. 24) and 1p13.2 (CSDE1, TRIM33)25 loci. Together, these results indicate that ASD has a complex inheritance pattern, and its genetic architecture and underlying mechanisms remain largely unknown.

Although several large-cohort genome-wide CNV studies have been performed, few have investigated ASD-related CNVs in Chinese populations26,27. We previously performed a GWAS of autism in a Han Chinese population to identify common genetic variants associated with ASD25. Here, with additional samples, we present a genome-wide study aimed at identifying CNVs that may contribute to the aetiology of ASD in the Han Chinese population. We identified 32 rare, large CNVs (>1 Mb) carried by 31 patients (5.68% of all patients), including recurrent CNVs at five loci.

Results

Burden of rare, large CNVs in ASD patients from a Han Chinese population

After strict quality control and CNV calling, 546 ASD subjects (including 343 trios) and 988 normal controls were analysed (Fig. 1). On average, 18 CNVs were called per individual for samples genotyped on the HumanCNV370 BeadChip, 24.7 for samples genotyped on the Human610 BeadChip and 357.3 for samples genotyped on the Human660W BeadChip, which includes many common CNV loci (Figure S1). Because genotyping relied on relatively low-density BeadChips across multiple platforms, calls for small CNVs would be unreliable and difficult to integrate across platforms. We therefore considered only large CNVs (>1 Mb) in the analysis.
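The size filter, together with the rarity criterion (frequency <1%) applied in the burden analysis, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `CNVCall` record and the sample coordinates are hypothetical, and "rare" is assumed here to mean that the fraction of genotyped samples carrying a same-type call (deletion or duplication) with at least 50% reciprocal overlap is below 1%. The text does not specify how overlapping calls were matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """One CNV call (hypothetical record; fields mirror typical caller output)."""
    sample: str
    chrom: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    copy_number: int  # 0/1 = deletion, 3/4 = duplication

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def is_deletion(self) -> bool:
        return self.copy_number < 2


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Overlap expressed as the smaller of the two per-call fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / a.length, overlap / b.length)


def rare_large_cnvs(calls, n_samples, min_length=1_000_000,
                    max_freq=0.01, min_overlap=0.5):
    """Keep calls longer than min_length whose carrier frequency is below max_freq.

    Assumed carrier frequency: distinct samples with a same-type call overlapping
    this one by >= min_overlap (the call's own sample included), over n_samples.
    """
    kept = []
    for call in calls:
        if call.length <= min_length:
            continue  # size filter: only CNVs > 1 Mb
        carriers = {
            other.sample
            for other in calls
            if other.is_deletion == call.is_deletion
            and reciprocal_overlap(call, other) >= min_overlap
        }
        if len(carriers) / n_samples < max_freq:
            kept.append(call)  # rarity filter: frequency < 1%
    return kept


# Hypothetical calls in a cohort of 150 samples.
calls = [
    CNVCall("s1", "chr15", 20_000_000, 21_500_000, 1),  # 1.5 Mb deletion, one carrier
    CNVCall("s2", "chr2", 70_000_000, 71_200_000, 3),   # 1.2 Mb duplication ...
    CNVCall("s3", "chr2", 70_100_000, 71_250_000, 3),   # ... shared with s2
    CNVCall("s4", "chr1", 5_000_000, 5_500_000, 1),     # 0.5 Mb, too small
]
print([c.sample for c in rare_large_cnvs(calls, n_samples=150)])  # → ['s1']
```

Here the duplication shared by two of 150 samples (1.33%) fails the rarity filter, while the 0.5 Mb deletion is dropped on size alone.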

Figure 1: Pipeline of CNV discovery and analysis.

ASD patients and parents were genotyped using an Illumina 370 K or 660 K BeadChip, and control subjects using an Illumina 610 K BeadChip. CNV calling and quality control were performed using the PennCNV program. Rare, large CNVs (>1 Mb) were used for validation and analysis. The global burden was then determined, and de novo and inherited CNVs were characterized.

We observed a higher CNV burden in patients with ASD. PennCNV identified 33 rare (frequency <1%) CNVs larger than 1 Mb in ASD patients, but one of them was not consistently validated (Figure S2). In total, 32 rare CNVs were identified in 31 ASD probands (5.68% of patients; Table 1), compared with 19 CNVs in 19 control subjects (1.92% of controls; Table S1). ASD patients carried rare, large CNVs significantly more often than controls (odds ratio: 3.05, 95% CI: 1.66–5.74, p = 1.55E-04, Fisher’s exact test; Fig. 2, Table S2). The enrichment was stronger for CNVs larger than 2 Mb (odds ratio: 28.9, 95% CI: 4.47–1208.24, p = 8.7E-07, Fisher’s exact test). Both deletions and duplications contributed to the CNV burden. None of the 32 rare CNVs detected in patients was present in the control subjects. Among these 32 case-private CNVs, 16 were de novo, 11 were inherited and five were of unknown inheritance because parental samples were unavailable. Recurrent CNVs at five loci were identified in 13 ASD patients (Table 1).
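The burden comparison reduces to a 2 × 2 table of carriers and non-carriers. The sketch below runs a two-sided Fisher’s exact test in pure Python on the >1 Mb counts reported above (31 of 546 patients and 19 of 988 controls). The helper names are hypothetical, and the two-sided p-value follows the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def hypergeom_pmf(k: int, row1: int, col1: int, total: int) -> float:
    """P(top-left cell == k) for a 2x2 table with fixed margins."""
    return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    k_min = max(0, row1 - (total - col1))  # smallest feasible top-left count
    k_max = min(row1, col1)                # largest feasible top-left count
    p_obs = hypergeom_pmf(a, row1, col1, total)
    p = sum(
        pk
        for pk in (hypergeom_pmf(k, row1, col1, total) for k in range(k_min, k_max + 1))
        if pk <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(p, 1.0)


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product (unconditional sample) odds ratio a*d / (b*c)."""
    return (a * d) / (b * c)


# Rows: patients, controls; columns: carriers, non-carriers of rare CNVs > 1 Mb.
table = (31, 546 - 31, 19, 988 - 19)
odds_ratio = sample_odds_ratio(*table)
p_value = fisher_exact_two_sided(*table)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.2e}")
```

The simple cross-product odds ratio from these counts is about 3.07. The 3.05 reported above is presumably the conditional maximum-likelihood estimate returned by tools such as R’s `fisher.test`, which would also account for its asymmetric exact confidence interval.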

Table 1: Rare, large CNVs (>1 Mb) identified in this study.
Figure 2: Burden analysis of rare, large CNVs in patients and controls.

Deletions, duplications and all CNVs combined are shown for CNVs >1 Mb and >2 Mb. For each event type, the significance of the difference between patients and controls is given at the bottom.

Recurrent de novo or case-private CNVs

The 15q11–13 duplication was one of the most common CNVs in the ASD patients. Five of the 32 CNVs were de novo duplications at 15q11–13 (Table 1), accounting for approximately 1% (5/546) of the ASD patients in our study. We compared the frequency of this CNV with that in the Simons Simplex Collection (SSC) and Autism Genome Project (AGP) samples of primarily European ancestry6,7 (Table S4). The frequency of 15q11–13 duplications was significantly higher in our Chinese cohort (5-fold, p = 0.021, Fisher’s exact test). Of the five individuals with 15q11–13 duplications, two carried three copies of 15q11–13 and three carried four copies. Karyotyping of the patients’ blood samples confirmed that the three cases with four copies resulted from partial tetrasomy of 15q due to an isodicentric chromosome 15 (idic(15)) (Figure S3). A methylation assay confirmed that four of the duplications originated on maternally derived chromosomes (Table S3). The parental origin of the fifth could not be determined because the quantity of DNA was insufficient.

Recurrent CNVs were also observed at four other chromosomal regions. First, two deletions (3.5 Mb and 3.7 Mb) encompassing the NLGN4X gene were found at Xp22.3 in two ASD patients. One proband (M8590) is a female with a de novo deletion; the other (M15199) is a hemizygous male whose mother carries the heterozygous deletion. Second, two deletions (1.39 Mb and 1.35 Mb) encompassing the APBA2 gene were found at 15q13.1–13.2 (Table 1, Figure S4). One of these deletions was de novo, and the other was maternally inherited. To our knowledge, this is the first report of deletions involving APBA2 in ASD patients. Third, two duplications (1.9 Mb and 2.4 Mb) were detected at 3p26, as reported in our previous paper28. Both duplications disrupt CNTN4, which encodes contactin 4 and has been implicated in developmental delay and ASD. Finally, two duplications (1.1 Mb and 1.5 Mb) were detected at 2p12 in two ASD patients (Table 1); one was paternally inherited, and the inheritance of the other is unknown. No known gene lies in this region.

Integration of CNV and SNV data identified novel ASD candidate risk genes

In addition to the recurrent CNVs, 19 case-private CNVs were each identified in a single ASD patient (Table 1). These included CNVs disrupting four known ASD risk genes: ARID1B, SHANK3, CDH10 and CSMD1. A large de novo deletion (21.9 Mb) at 6q24.3–6q27 contains ARID1B, and a de novo deletion (2.6 Mb) at 22q13.3 includes SHANK3 (Table 1). Two duplications disrupted genes encoding CUB and sushi domain-containing proteins (CSMDs): one (1.1 Mb) disrupted CSMD1, which has been implicated in ASD risk by a whole-exome sequencing study29, and, notably, the other (1.8 Mb) at 8q23.3 disrupted CSMD3 (Table 1). A maternally inherited deletion (2 Mb) at 5p14.2–14.1 disrupted a single gene, CDH10 (Table 1), in which common variants have been implicated in ASD risk21.

To identify novel candidate ASD risk genes within the case-private CNVs (Table 1), we integrated de novo CNV data (Table S4) from large-cohort whole-genome CNV studies6,7,8,9,10,11,13 and SNV/InDel data (Table S5) from the SSC and Autism Sequencing Consortium (ASC) whole-exome sequencing studies14,15. For the case-private CNVs shorter than 5 Mb (n = 11), we used these data to refine the CNV regions and to search for de novo or private LGD mutations in the genes they contain. Three published de novo CNVs overlapped a de novo deletion at 15q23 identified in this study (Fig. 3), and the overlapping region contains only seven RefSeq genes (Fig. 3). When we analysed these genes in the SSC and ASC whole-exome sequencing data, two de novo variants were identified in GRAMD2 (Fig. 3): a frameshift mutation (p.K59EfsX11) and a variant in the 3’ UTR (c.*37 T > C). No de novo mutation was found in any other gene in this region, and no de novo mutation of GRAMD2 was found in unaffected SSC siblings. In addition, a de novo frameshift mutation of CDH10 (p.F526SfsX3) was identified in an SSC simplex quad family, and an LGD mutation of CDH10 was identified in an ASC patient (Fig. 4); no de novo CDH10 mutation was found in unaffected SSC siblings or ASC controls. Similarly, a de novo frameshift mutation of STAM (p.S261LfsX4) was identified in an SSC simplex quad family, and a private LGD mutation of STAM was identified in an ASC patient (Fig. 4). A de novo missense mutation of STAM was found in one unaffected sibling, but no LGD mutation was found in unaffected siblings or ASC controls.
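The step of finding the region shared by overlapping CNVs and listing the genes inside it can be sketched as an interval intersection followed by a gene lookup. This sketch assumes that the "overlapping region" is the intersection of all overlapping CNVs on one chromosome and that a gene counts if it overlaps that region by at least one base. The coordinates and gene names below are hypothetical placeholders, not the real 15q23 coordinates or RefSeq annotations.

```python
from typing import Dict, List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def shared_region(cnvs: List[Interval]) -> Optional[Interval]:
    """Intersection of all CNVs, or None if they do not all overlap."""
    chroms = {cnv.chrom for cnv in cnvs}
    if len(chroms) != 1:
        return None
    start = max(cnv.start for cnv in cnvs)
    end = min(cnv.end for cnv in cnvs)
    return Interval(chroms.pop(), start, end) if start <= end else None


def genes_in_region(region: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Names of genes overlapping the region by at least one base."""
    return sorted(
        name
        for name, gene in genes.items()
        if gene.chrom == region.chrom
        and gene.start <= region.end
        and gene.end >= region.start
    )


# Hypothetical coordinates: one CNV from this study plus three published ones.
deletions = [
    Interval("chr15", 1_000, 5_000),
    Interval("chr15", 2_000, 6_000),
    Interval("chr15", 1_500, 4_500),
    Interval("chr15", 1_800, 7_000),
]
genes = {
    "GENE_A": Interval("chr15", 2_100, 3_000),  # inside the shared region
    "GENE_B": Interval("chr15", 4_400, 5_200),  # partly inside
    "GENE_C": Interval("chr15", 6_100, 6_900),  # outside
}
region = shared_region(deletions)
print(region, genes_in_region(region, genes))
```

Counting partially overlapping genes (such as `GENE_B`) is a design choice; a stricter rule requiring the whole gene to lie inside the shared region would give a shorter list.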

Figure 3: Convergence of de novo CNVs at 15q23 and de novo mutations of GRAMD2 within the overlapped region.

Red bars indicate the chromosome locations of the four deletions identified in this study (M16229) and other studies (11233.p1, AU008, SK0243-003) (9, 13, 43). Two de novo mutations of GRAMD2 (a frameshift and an SNV in the 3’ UTR) were identified in two SSC simplex quad families (16). SSC family IDs and pedigree plots are presented at the bottom.

Figure 4: Convergence of rare private CNVs in this study and LGD mutations in SSC and ASC families or patients for STAM and CDH10.

(a) Displayed for the STAM gene are a RefSeq gene model (larger ticks are exons), a loss-of-function deletion identified in this study, a de novo frameshift mutation (orange arrow) identified in an SSC simplex quad family and a nonsense mutation (orange arrow) identified in an ASC patient. (b) Displayed for the CDH10 gene are a RefSeq gene model (larger ticks are exons), a gene-disrupting deletion identified in this study and a de novo frameshift mutation (orange arrow) identified in an SSC simplex quad family. SSC family IDs and pedigree plots are presented.

Discussion

We have performed a large-cohort genome-wide CNV study of ASD patients in a Chinese population to search for de novo and rare private CNVs that cause or contribute to ASD risk. Rare, large CNVs (>1 Mb) were present in 5.68% of patients versus 1.92% of controls, suggesting that such CNVs contribute to approximately 4% of ASD cases in this population. The study confirmed multiple known ASD CNVs and genes and implicated novel candidate CNVs and genes. Notably, de novo 15q11–13 duplications were more frequent in our Chinese samples than in samples of primarily European ancestry. There are three plausible explanations for this finding: 1) the power of this study is low, and the finding may not replicate in larger populations; 2) the patients analysed here are more severely affected than those in comparable cohorts (SSC/AGP); and 3) the rate of mutation or penetrance at this locus is higher in Han populations. Unfortunately, we do not have IQ data or other neurological examination data for the patients in our study and therefore cannot address the second possibility. This is a limitation of our study, and a larger cohort with detailed phenotyping is needed to explore this difference further. A second limitation is the use of low-density genotyping arrays across multiple platforms, which meant that small CNVs (<1 Mb) could not be investigated. A third limitation is the modest sample size given the high genetic heterogeneity of ASD. Further genome-wide CNV studies using high-resolution approaches should be conducted in larger Chinese ASD cohorts.

In this study, we identified known ASD risk CNVs and genes and implicated novel ones. The known ASD risk CNVs and genes include those at 15q11–13 (UBE3A, GABRB3), Xp22.3 (NLGN4X), CNTN4, 22q13.31–13.33 (SHANK3) and CDH10. The region shared by the 15 duplications at 15q11–13 identified in this study and in other large-cohort studies includes UBE3A, GABRB3 and 11 other genes (Figure S5). The role of UBE3A has been clearly demonstrated by both functional and mouse model studies, and GABRB3 was recently implicated as another risk gene in this region6. NLGN4X has long been recognized as an ASD risk gene: Jamain et al. first identified it as an ASD causative gene in 200330, and de novo and inherited NLGN4X mutations have since been reported occasionally31,32. In total, 15 NLGN4X mutations were identified in the ASD patients in this study, including three missense mutations previously reported by our group (Figure S4). CNTN4 has been implicated in developmental delay and ASD; to date, three studies describing six patients or families with CNTN4-disrupting deletions or duplications have been published28,33,34. The function of APBA2 also supports its potential involvement in the pathogenesis of ASD. APBA2 encodes a neuronal adaptor protein essential for synaptic transmission, which can form a complex that couples synaptic vesicle exocytosis to neuronal cell adhesion. In addition, APBA2 deficiency is associated with impaired social interaction in mice35. By integrating large-scale CNV and SNV data, our study also implicated new candidate ASD risk genes, including GRAMD2 and STAM (which encodes a signal transducing adaptor molecule). The roles of GRAMD2 and STAM in ASD remain unclear, and further genetic and functional studies are needed to confirm that these genes contribute to ASD risk.

Several genes in CNV regions not specifically described in the Results section also merit mention. The first is NYAP2, which is disrupted by a paternally inherited duplication (1.07 Mb) at 2q36.3 (Table 1). Three genes, NYAP1, NYAP2 and NYAP3, encode the Neuronal tYrosine-phosphorylated Adaptor for the PI3-kinase (NYAP) family of phosphoproteins, which are expressed predominantly in developing neurons. Interestingly, the NYAP1 (7q22.1) and NYAP3 (13q33.3) loci show suggestive linkage and association signals with ASD21,36. Upon stimulation by Contactins (such as CNTN4, CNTNAP2 and CNTN5), NYAPs activate downstream pathways, including the WAVE1 complex and the PI3K-mTOR pathway37,38. Phosphorylated NYAPs can activate the WAVE1 complex, which functions through interactions with several proteins, including NCKAP1, ABI2, CYFIP2 and HSPC300. Recurrent de novo LGD mutations of NCKAP1 were recently identified in ASD patients15. In our further analysis of the SSC and ASC whole-exome sequencing data, two de novo missense mutations of ABI2 were identified in two ASD probands from simplex quad families, but none in their unaffected siblings (Fig. 5). CYFIP1, a paralogue of CYFIP2, has also been implicated in ASD by both genetic and functional studies39,40. Together, these findings suggest that disruption of the Contactin-NYAP-WAVE1 pathway (Fig. 5) may be an important mechanism in ASD aetiology. Two other genes are also worth noting: GRM6 at 5q35.3 and ST6GAL2 at 2q12.2. GRM6 encodes metabotropic glutamate receptor 6 (mGluR6), and structural defects in mGluR gene-family interaction networks are significantly enriched in ASD patients41. Common variants of ST6GAL2 are associated with risperidone response in patients with schizophrenia42. Both genes are good candidates for further study.

Figure 5: Genetic identification of genes involved in Contactins-NYAPs-WAVE1 pathway.

Disruption of the Contactin gene CNTN4 was recurrently identified in this study and in other studies, and a case-private rare CNV disrupting NYAP2 was also detected in this study. Red bars indicate deletions. Pedigree plots of SSC families with recurrent de novo mutations of NCKAP1 and ABI2 are shown; orange arrows indicate the mutation locations.

Conclusion

To our knowledge, we performed the largest genome-wide copy number variation analysis in a Han Chinese population, comprising 546 ASD patients (including 343 trios) and 988 controls. ASD patients carried a higher global burden of rare, large CNVs. Recurrent de novo or case-private CNVs were found at five loci. The frequency of de novo 15q11–13 duplications was significantly higher than in cohorts of European ancestry. In addition, several genes were implicated as novel ASD risk genes. These findings identify ASD-related CNVs in a Chinese population, which could facilitate early testing and diagnosis and, most importantly, early intervention. The novel candidate risk genes also provide motivation for further functional and translational studies.

Methods

Study subjects

All participants provided informed consent before the original sample collection. ASD subjects were diagnosed independently by two experienced psychiatrists according to DSM-IV-TR criteria (American Psychiatric Association, 2000). The diagnostic procedure also included neurological and mental status examinations. A total of 406 ASD case-parent triad families (one affected offspring and two healthy parents) and 225 children with sporadic ASD were recruited for this study. The mean age of the patients was 5.065 years, and the mean age of onset was 2.5 years. An additional 1000 unrelated control subjects were recruited. These control subjects had no history of ASD or any other psychiatric disease and no family history of psychiatric, neurological or autoimmune disease. The mean age of the control subjects was 34.3 years. In total, the study subjects comprised 406 ASD trios, 225 sporadic ASD patients and 1000 healthy controls of Chinese ancestry. This study was approved by the Institutional Review Board (IRB) of the State Key Laboratory of Medical Genetics, School of Life Sciences at Central South University, Changsha, Hunan, China and adhered to the tenets of the Declaration of Helsinki.

Whole-genome genotyping

Genomic DNA was extracted from whole blood for genome-wide genotyping using a standard proteinase K digestion and phenol-chloroform method. All samples were genotyped on Illumina BeadChips. Samples from the initial GWAS of autism (290 autism trios and 174 autism cases) were genotyped on the Illumina HumanCNV370-Quad BeadChip (approximately 370 K SNPs), as described in our GWAS study25. The additional 126 trios and 51 sporadic cases were genotyped on the Illumina Human660W-Quad BeadChip (approximately 660 K genetic variants). The cohort of 1000 healthy controls was genotyped on the Illumina Human610-Quad BeadChip (approximately 610 K genetic variants). All SNPs on the HumanCNV370 BeadChip are also present on the Human610 BeadChip. All experiments were carried out in accordance with relevant guidelines.

Quality control and CNV calling

Quality control was performed on the whole-genome genotyping data before CNV calling. To avoid false-positive CNVs, we excluded samples likely to have poor DNA quality. Individuals with a SNP missing rate >2% were excluded. To reduce noise in the genome-wide intensity signal, we included only samples whose standard deviation (SD) of normalized intensity was less than 0.3. Genomic wave artefacts, which roughly correlate with GC content and result from hybridization bias when the quantity of full-length DNA is low, are known to interfere with accurate inference of copy number. Therefore, only samples for which the correlation X between the Log R Ratio (LRR) and the wave model satisfied −0.2 < X < 0.4 were used for CNV calling. CNVs were called using the PennCNV algorithm43, which combines multiple sources of information, including the LRR and B Allele Frequency (BAF) at each SNP marker, SNP spacing and the population frequency of the B allele, to generate CNV calls.
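The three sample-level filters described above can be expressed as a single predicate. This is a minimal sketch: the `SampleQC` record and its field names are hypothetical, and in practice the per-sample statistics would be read from the genotyping and CNV-calling QC summaries rather than typed in by hand. Only the thresholds (missing rate ≤2%, intensity SD <0.3, −0.2 < X < 0.4) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleQC:
    """Per-sample QC summary (hypothetical field names)."""
    sample_id: str
    missing_rate: float  # fraction of SNPs without a genotype call
    lrr_sd: float        # SD of the normalized intensity (LRR) across markers
    wave_corr: float     # correlation X between the LRR and the GC wave model


def passes_qc(s: SampleQC, max_missing: float = 0.02, max_lrr_sd: float = 0.3,
              wave_low: float = -0.2, wave_high: float = 0.4) -> bool:
    """Apply the three sample-level filters described in the text."""
    return (
        s.missing_rate <= max_missing           # exclude missing rate > 2%
        and s.lrr_sd < max_lrr_sd               # keep intensity SD < 0.3
        and wave_low < s.wave_corr < wave_high  # keep -0.2 < X < 0.4
    )


def filter_samples(samples: List[SampleQC]) -> List[str]:
    """IDs of samples that pass all filters and go forward to CNV calling."""
    return [s.sample_id for s in samples if passes_qc(s)]


samples = [
    SampleQC("good", 0.005, 0.20, 0.05),
    SampleQC("low_call_rate", 0.030, 0.20, 0.05),
    SampleQC("noisy", 0.005, 0.35, 0.05),
    SampleQC("wavy", 0.005, 0.20, 0.45),
]
print(filter_samples(samples))  # → ['good']
```

Storing the missing rate directly, rather than deriving it as `1 - call_rate`, avoids floating-point surprises for samples sitting exactly on the 2% boundary.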

CNV verification

The CNVs were validated by quantitative PCR (qPCR) using the ABI Prism™ 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA). Three primer pairs were designed for each CNV, located near its start, middle and end, respectively. Each sample was analysed in triplicate in a 10-μl reaction mixture containing 200 nM of each primer, Maxima® SYBR Green/ROX qPCR Master Mix (2X; Fermentas) and 5 or 10 ng of genomic DNA. The values were evaluated using SDS 2.3 software for the 7900 system (Applied Biosystems, CA), and further data analysis was performed using the qBase method. Reference genes were selected from COBL, GUSB and SNCA on the basis of minimal coefficient of variation, and the data were normalized by setting a normal control to a value of 1.
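The normalization described above can be illustrated with a simplified, efficiency-based relative quantification in the general spirit of qBase (relative quantities normalized to the geometric mean of reference genes). This sketch is not the qBase implementation. It assumes 100% amplification efficiency (E = 2), a single diploid normal control as calibrator, and reference-gene stability judged by the coefficient of variation of raw Cq values. All Cq values and the `TARGET` amplicon name are hypothetical.

```python
from math import prod
from statistics import mean, stdev
from typing import Dict, List, Sequence


def pick_reference_genes(cq_by_gene: Dict[str, Sequence[float]],
                         n_keep: int = 2) -> List[str]:
    """Rank candidate reference genes by the CV of their Cq values (lowest first)."""
    cv = {gene: stdev(values) / mean(values) for gene, values in cq_by_gene.items()}
    return sorted(cv, key=cv.get)[:n_keep]


def estimated_copy_number(sample_cq: Dict[str, float], control_cq: Dict[str, float],
                          target: str, refs: Sequence[str],
                          efficiency: float = 2.0) -> float:
    """Copy number of `target`, with the normal control's ratio set to 1.

    Relative quantity per gene = E ** (Cq_control - Cq_sample); the target is
    divided by the geometric mean of the reference-gene relative quantities.
    """
    def relative_quantity(gene: str) -> float:
        return efficiency ** (control_cq[gene] - sample_cq[gene])

    norm_factor = prod(relative_quantity(g) for g in refs) ** (1 / len(refs))
    ratio = relative_quantity(target) / norm_factor  # normal control -> 1.0
    return 2 * ratio  # assumes the control is diploid at the target


# Hypothetical Cq values for the three candidate reference genes across samples.
candidate_cq = {
    "COBL": [24.0, 24.1, 23.9],
    "GUSB": [26.0, 26.5, 25.5],
    "SNCA": [27.0, 27.05, 26.95],
}
refs = pick_reference_genes(candidate_cq)

control = {"TARGET": 25.0, "COBL": 24.0, "SNCA": 27.0}
deletion = {"TARGET": 26.0, "COBL": 24.0, "SNCA": 27.0}   # one cycle later
tetrasomy = {"TARGET": 24.0, "COBL": 24.0, "SNCA": 27.0}  # one cycle earlier
print(refs, estimated_copy_number(deletion, control, "TARGET", refs),
      estimated_copy_number(tetrasomy, control, "TARGET", refs))
```

Under these assumptions, a target amplifying one cycle later than in the control indicates one copy, and one cycle earlier indicates four copies, as expected for the idic(15) carriers; in practice the three amplicons per CNV would be evaluated separately and their estimates compared.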

Published CNV and exome sequencing data analysis

De novo CNV data were collected from six whole-genome CNV studies6,7,8,9,10,11,13 (Table S3). De novo SNV/InDel data were collected from the SSC and ASC whole-exome sequencing studies14,15 (Table S4). All CNVs and genes within our CNV regions of interest were queried against these de novo CNV and SNV/InDel datasets in both cases and controls or unaffected siblings. We also used ExAC data to assess variant deleteriousness and gene constraint for the implicated genes.
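The gene-level screen used in the Results (genes in our CNV regions with de novo LGD mutations in probands but none in unaffected siblings or controls) can be sketched as a simple tally. This is a simplified illustration, not the authors' procedure: the effect labels treated as LGD, the `DeNovoVariant` record and all gene names in the example are hypothetical, and the ExAC constraint step is not modelled.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set

# Assumed effect labels counted as likely gene-disruptive (LGD).
LGD_EFFECTS = {"frameshift", "nonsense", "splice_site"}


class DeNovoVariant(NamedTuple):
    gene: str
    effect: str   # e.g. "frameshift", "missense", "utr3"
    carrier: str  # "proband", "sibling" or "control"


def prioritize_genes(cnv_genes: Set[str], variants: Iterable[DeNovoVariant]) -> List[str]:
    """Genes in our CNV regions with >= 1 LGD variant in probands and none in
    siblings or controls, ordered by proband LGD count, then by name."""
    proband_lgd: Counter = Counter()
    unaffected_lgd: Counter = Counter()
    for v in variants:
        if v.gene not in cnv_genes or v.effect not in LGD_EFFECTS:
            continue  # ignore genes outside our CNVs and non-LGD variants
        if v.carrier == "proband":
            proband_lgd[v.gene] += 1
        else:
            unaffected_lgd[v.gene] += 1
    hits = [g for g in proband_lgd if unaffected_lgd[g] == 0]
    return sorted(hits, key=lambda g: (-proband_lgd[g], g))


variants = [
    DeNovoVariant("GENE_A", "frameshift", "proband"),
    DeNovoVariant("GENE_A", "missense", "sibling"),    # missense in sibling is tolerated
    DeNovoVariant("GENE_B", "nonsense", "proband"),
    DeNovoVariant("GENE_B", "frameshift", "sibling"),  # LGD in sibling excludes GENE_B
    DeNovoVariant("GENE_C", "missense", "proband"),    # no proband LGD
    DeNovoVariant("GENE_D", "frameshift", "proband"),  # not in our CNV regions
    DeNovoVariant("GENE_E", "frameshift", "proband"),
    DeNovoVariant("GENE_E", "splice_site", "proband"),
]
cnv_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
print(prioritize_genes(cnv_genes, variants))  # → ['GENE_E', 'GENE_A']
```

Because only LGD variants are counted on both sides, a de novo missense variant in an unaffected sibling (as observed for STAM) does not exclude a gene under this rule.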

Additional Information

How to cite this article: Guo, H. et al. Genome-wide copy number variation analysis in a Chinese autism spectrum disorder cohort. Sci. Rep. 7, 44155; doi: 10.1038/srep44155 (2017).
