Quantitative genome-wide association study of six phenotypic subdomains identifies novel genome-wide significant variants in autism spectrum disorder


Autism spectrum disorders (ASD) are highly heritable and are characterized by deficits in social communication and restricted and repetitive behaviors. Twin studies on phenotypic subdomains suggest a differing underlying genetic etiology. Studying genetic variation explaining phenotypic variance will help to identify specific underlying pathomechanisms. We investigated the effect of common variation on ASD subdomains in two cohorts including >2500 individuals. Based on the Autism Diagnostic Interview-Revised (ADI-R), we identified and confirmed six subdomains with a SNP-based genetic heritability h2SNP = 0.2–0.4. The subdomains nonverbal communication (NVC), social interaction (SI), and peer interaction (PI) shared genetic risk factors, while the subdomains of repetitive sensory-motor behavior (RB) and restricted interests (RI) were genetically independent of each other. The polygenic risk score (PRS) for ASD as categorical diagnosis explained 2.3–3.3% of the variance of SI, joint attention (JA), and PI, 4.5% for RI, 1.2% of RB, but only 0.7% of NVC. We report eight genome-wide significant hits—partially replicating previous findings—and 292 known and novel candidate genes. The underlying biological mechanisms were related to neuronal transmission and development. At the SNP and gene level, all subdomains showed overlap, with the exception of RB. However, no overlap was observed at the functional level. In summary, the ADI-R algorithm-derived subdomains related to social communication show a shared genetic etiology in contrast to restricted and repetitive behaviors. The ASD-specific PRS overlapped only partially, suggesting an additional role of specific common variation in shaping the phenotypic expression of ASD subdomains.


Introduction

Autism spectrum disorder (ASD) is a phenotypically heterogeneous neurodevelopmental disorder. The diagnostic criteria according to DSM-5 (Diagnostic and Statistical Manual-5)1 include two symptom domains: (A) social communication and interaction and (B) restricted or repetitive patterns of behavior and interests. The genetic architecture of ASD is highly complex, comprising common, rare inherited, and de novo genetic variants. Common variants show small effects, but collectively have a substantial impact on ASD susceptibility, explaining ~50% of ASD liability2. Phenotypic subdomains with high heritability3 and rather low cross-trait genetic correlation estimates have been reported previously4. Despite the high heritability estimates in ASD, the genetic and biological contributions to individual ASD domains remain largely unknown. This can be attributed to the heterogeneous and complex genetic and phenotypic architecture of ASD. One approach to address this difficulty and unravel the complexity of ASD is to focus on ASD phenotypic domains and subdomains, which have been proposed to reduce genetic heterogeneity and thus increase statistical power5.

The phenotypic independence of the two DSM-5 dimensions has been shown previously6. Categorizing these domains further into independent phenotypic subdomains has shown evidence for an underlying strong genetic susceptibility as published by Liu et al.3. Based on the diagnostic algorithm items of the Autism Diagnostic Interview-Revised (ADI-R), they identified 6 subdomains in the Autism Genome Project (AGP) cohort, namely joint attention (JA), social interaction and communication (SI), nonverbal communication (NVC), and peer interaction (PI) related to domain A, and repetitive sensory-motor behavior (RB), compulsion/restricted interests, or insistence on sameness (RI) related to domain B. Linkage-based common-variant heritability of these ASD subdomains ranged between 29% (PI) and 65% (RI), which is comparable to the additive SNP-based heritability of 40–60%2 or the twin-based additive genetic heritability of 62–81%3,4 of the categorical ASD diagnosis.

ASD domains and subdomains are likely to show distinct underlying genetic risk. Twin studies reported the genetic correlation (rg) between domain A and domain B to be ranging between ~10 and 50%, and varying between males and females4. Another study investigating 189 twins with at least one affected individual reported that the overall co-twin co-trait correlations were small between five phenotypic subdomains derived from the Development and Wellbeing Assessment instrument7 (i.e., social, communication, restricted repetitive behavior and interests, language development, and insistence on sameness (IS))8.

The high genetic heritability and the low genetic correlation between domains and subdomains suggest that the previously reported statistically independent ADI-R subdomains3,9 are also genetically independent. Evidence for a genetic etiology further comes from a genome-wide SNP-based linkage study on these subdomains3, which identified two genetic loci, i.e., for JA (11q23) and RB (19q13.3). In addition, numerous quantitative genome-wide association studies (qGWAS) have focused on different ASD-related dimensional traits derived from the ADI-R diagnostic algorithm (e.g., repetitive sensory-motor behavior or IS)10, the SRS total score11, or single items of the ADI-R12, the ADOS and the SRS12. None of the reported genome-wide significant findings were observed in another ASD subdomain or trait, or independently replicated in any study. This may be attributed to small genetic effects or limited sample size13; still, it may also indicate a differing underlying genetic etiology and implicated neurobiological mechanisms of different ASD subdomains.

In addition, an overlap with common genetic risk for ASD as categorical diagnosis has not been assessed in previous studies. To capture the additional value of studying phenotypic subdomains, their genetic correlation with ASD as a categorical diagnosis also needs to be explored. Regarding the small genetic effect of single SNPs, a powerful approach to capture the role of common genetic variation is to study the combined effect of SNPs by a polygenic risk score (PRS). This approach has been taken in ASD research to identify cross-disorder genetic risk, to study the role of common variation in different ASD subtypes, such as low and high IQ14, and genetic overlap with different traits observed in the population15. Given the polygenic etiology of ASD as categorical diagnosis and the assumed differential polygenic etiology of variance in phenotypic subdomains, it is of prime interest to study the overlap of general polygenic risk on ASD with specific genetic risk for the subdomains.

Regarding the implicated neurobiology, similarly to the assumed differential genetic risk, specific underlying neurobiological mechanisms are expected for the different ADI-R algorithm-based subdomains. Studies on neurobiological mechanisms in ASD as categorical diagnosis converge with regard to abnormal neuronal function and early-age brain growth abnormality16. ASD-associated genes are implicated in synaptic scaffolding, neuronal transmission, chromatin remodeling, protein synthesis or degradation, or actin cytoskeleton dynamics14. Previous research has also shown that ASD-associated genes are involved in numerous biological processes, such as the mammalian target of rapamycin (mTOR)17, Wnt18, and calcium (Ca2 + ) signaling pathways19. Although these pathways are well known for their role in ASD, there is still a great need to understand how dysregulation of these pathways is involved in modulating the subdomains of ASD. From a biological perspective, we hypothesize that the phenotypic domains of social interaction and stereotyped behavior show differential underlying pathomechanisms. This assumption is based on the observation that genetic animal models for ASD show inconsistent phenotypes. For example, Nlgn3 (Neuroligin) adult knockout (KO) mouse model showed normal direct social interaction, but was engaged in repetitive behavior20, whereas Nrxn2α (Neurexin 2α) KO mice showed social deficits, but did not exhibit stereotyped repetitive behavior21. Moreover, magnetic resonance imaging (MRI) studies in humans have shown that inferior frontal gyrus, amygdala, prefrontal, and temporal cortices are related to defects in social language processing and social attention22,23, whereas the orbitofrontal cortex and basal ganglia have been associated with repetitive and stereotyped behavior of ASD24. 
Given the concept of ASD as an early developmental disorder, another biologically plausible argument for a differential genetic regulation of ASD-related subdomains stems from the finding of distinct transcriptomic signatures during development of these brain regions.

We hypothesize that distinct common genetic variants will modulate ADI-R-derived ASD subdomains, which are related to specific underlying biological processes, and gene-regulatory signatures. Thus, we performed a qGWAS on ADI-R-derived ASD subdomains dissecting their genetic etiology, and investigated their relation to the polygenic risk for ASD.

Materials and methods

Study cohort

We included a German (DE) cohort (n = 625 trios, n = 53 duos, and n = 27 singletons) and the AGP cohort (n = 2730 trios and n = 5 duos). Diagnosis was based on thorough clinical assessment using Social Communication Questionnaire (SCQ), ADI-R, and/or ADOS. Exclusion criteria and QC were based on the AGP cohort25. For final analysis, only the index patients (AGP n = 1,895, DE n = 614) with ADI-R and genotype information available were included (Supplementary Material).

Genotype data

DE-cohort samples were genotyped on Illumina Human Omni Express 12v1-H chips. AGP samples were categorized into stage 1 and 2 samples, genotyped on 550 K Illumina, 510 K Illumina, 1 M Single, and 1 M Duo Illumina chips. However, all the stage 1 and 2 samples included in this study were genotyped on 1 M Illumina chips. We performed quality checks of both datasets separately. Genotype imputation was based on minimac326. For detailed procedure and power analysis see Supplementary Material and Yousaf et al.27.

Statistical analysis

All statistical analyses were performed in R-3.4.4 if not otherwise specified. For an overview of the analyses and the cohorts used for those analyses, refer to Supplementary Fig. 1.

Imputation of phenotype data

From Liu et al.3, we selected the 28 “ever/most abnormal” items from the ADI-R questionnaire available for verbal and nonverbal individuals. Individuals with >10% missing items were excluded. Missing scores were imputed using multivariate imputation by chained equations (MICE) applying predictive mean matching (pmm) in R package mice28 (Supplementary Material).
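The exclusion rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data layout (a dictionary of 28 item scores per individual, with `None` for a missing answer) and the function names are assumptions made for this example, and the imputation step itself (MICE with predictive mean matching) is not reproduced here.

```python
# Hypothetical layout: each individual maps to a list of 28 item scores,
# with None marking a missing answer.
MAX_MISSING_FRACTION = 0.10  # exclusion threshold stated in the text


def missing_fraction(scores):
    """Return the share of items without a score."""
    return sum(s is None for s in scores) / len(scores)


def filter_individuals(item_scores):
    """Keep individuals whose missing fraction does not exceed the threshold."""
    return {
        person: scores
        for person, scores in item_scores.items()
        if missing_fraction(scores) <= MAX_MISSING_FRACTION
    }


# Toy data (invented for illustration only).
cohort = {
    "id_a": [1] * 28,               # complete record
    "id_b": [1] * 26 + [None] * 2,  # 2 of 28 items missing, kept
    "id_c": [1] * 25 + [None] * 3,  # 3 of 28 items missing, excluded
}
kept = filter_individuals(cohort)
print(sorted(kept))
```

With 28 items, the 10% rule means that an individual with three or more missing items is dropped before imputation.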

Define ADI-R subdomains

Subdomains were identified based on ADI-R data of the AGP cohort using principal component analysis with “varimax” rotation in R package psych29 as published10. Components were selected based on the Kaiser criterion. Confirmatory factor analysis (CFA) was performed in the DE cohort implementing R package lavaan30. For identified components, the sum of items with loading above 0.4 was calculated (Supplementary Material).
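The scoring step in the last sentence can be illustrated with a short sketch. It assumes that "loading above 0.4" refers to the raw (signed) loading of an item on a component; the item names, loadings, and scores below are invented for the example and do not come from the study.

```python
LOADING_CUTOFF = 0.4  # cutoff stated in the text


def items_for_component(loadings, cutoff=LOADING_CUTOFF):
    """Return the items whose loading on one component exceeds the cutoff."""
    return [item for item, value in loadings.items() if value > cutoff]


def subdomain_score(item_scores, loadings, cutoff=LOADING_CUTOFF):
    """Sum one individual's scores over the items selected for a component."""
    return sum(item_scores[item] for item in items_for_component(loadings, cutoff))


# Hypothetical loadings of four items on a single rotated component.
component_loadings = {"item_1": 0.72, "item_2": 0.55, "item_3": 0.12, "item_4": 0.41}
# Hypothetical item scores of one individual.
person = {"item_1": 2, "item_2": 1, "item_3": 3, "item_4": 2}

score = subdomain_score(person, component_loadings)
print(score)  # item_3 is ignored because its loading is below the cutoff
```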

Single-nucleotide polymorphisms (SNP)-based analysis

The implemented algorithms require large sample sizes. To increase power, SNP-based analyses were performed in the combined cohort (AGP and DE). For quantitative GWAS, the sample size had a power of 1-beta > 80% to explain 6% of the variance (R² = 0.06) in the DE cohort, 1.5% in the AGP cohort, and 1.2% in the combined cohort with a genome-wide significance threshold of alpha = 5e−8. Power analysis was performed using Quanto. For the power analysis of the genetic heritability estimation, see the Supplementary Material.

Polygenic risk scores (PRS)

To identify the shared etiology between an ASD diagnosis and the phenotypic subdomains, we performed a polygenic risk score analysis based on the Psychiatric Genomics Consortium (PGC) summary statistics for ASD. The analysis was performed using the PRSice tool31 (Supplementary Material) in the merged cohort. P values for shared etiology were corrected using the false discovery rate (FDR).
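The general idea behind a thresholded polygenic score can be sketched as below. This is a simplified illustration under stated assumptions, not PRSice's implementation: the score is taken as the plain sum of effect-size-weighted allele dosages over SNPs whose discovery P value is at or below a threshold, the fit is measured by the squared Pearson correlation without covariates, and all SNP names, weights, and phenotype values are invented.

```python
def polygenic_score(dosages, weights, p_values, p_threshold):
    """Sum effect-size-weighted allele dosages over SNPs with P <= threshold."""
    return sum(
        weights[snp] * dosage
        for snp, dosage in dosages.items()
        if p_values[snp] <= p_threshold
    )


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def best_threshold(genotypes, phenotype, weights, p_values, thresholds):
    """Return (threshold, R^2) for the threshold whose score fits best."""
    best = None
    for pt in thresholds:
        scores = [polygenic_score(g, weights, p_values, pt) for g in genotypes]
        if len(set(scores)) < 2:  # score has no variance at this threshold
            continue
        r2 = r_squared(scores, phenotype)
        if best is None or r2 > best[1]:
            best = (pt, r2)
    return best


# Toy discovery summary statistics (hypothetical values).
weights = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.20}
p_values = {"snp1": 0.001, "snp2": 0.04, "snp3": 0.30}
# Toy target sample: allele dosages (0, 1, 2) and a quantitative trait.
genotypes = [
    {"snp1": 2, "snp2": 1, "snp3": 0},
    {"snp1": 1, "snp2": 0, "snp3": 2},
    {"snp1": 0, "snp2": 2, "snp3": 1},
    {"snp1": 1, "snp2": 1, "snp3": 1},
]
phenotype = [12, 15, 7, 10]

pt, r2 = best_threshold(genotypes, phenotype, weights, p_values, [0.01, 0.05, 0.5])
print(pt, round(r2, 3))
```

In a real analysis the variance explained would be estimated in a regression that includes covariates, and the resulting P values would then be adjusted for the number of tests, as described in the text.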

Genetic heritability and its correlation

SNP-based heritability (h2SNP) was calculated using the GCTA software32 based on the genetic relationship matrix (GRM) between pairs of individuals. For genetic correlation (rg) analysis, bivariate genomic GREML analysis was performed in GCTA (Supplementary Material) using the merged cohort.
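To make the notion of a genetic relationship matrix concrete, the sketch below computes pairwise relatedness from allele counts using the standardized-genotype formula A_jk = (1/M) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)). It is an illustration only: allele frequencies are estimated from the toy sample itself, monomorphic SNPs are skipped, and the same expression is used for the diagonal entries for simplicity, which may differ from what GCTA does. The variance-component estimation (GREML) is not shown.

```python
def allele_frequencies(genotypes):
    """Estimate the counted-allele frequency of each SNP from 0/1/2 genotypes."""
    n_people = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(person[i] for person in genotypes) / (2 * n_people)
        for i in range(n_snps)
    ]


def genetic_relationship_matrix(genotypes):
    """Average product of standardized genotypes for every pair of individuals."""
    freqs = allele_frequencies(genotypes)
    usable = [i for i, p in enumerate(freqs) if 0 < p < 1]  # skip monomorphic SNPs
    n_people = len(genotypes)
    matrix = [[0.0] * n_people for _ in range(n_people)]
    for j in range(n_people):
        for k in range(j, n_people):
            total = sum(
                (genotypes[j][i] - 2 * freqs[i])
                * (genotypes[k][i] - 2 * freqs[i])
                / (2 * freqs[i] * (1 - freqs[i]))
                for i in usable
            )
            matrix[j][k] = matrix[k][j] = total / len(usable)
    return matrix


# Toy genotypes (rows: individuals, columns: SNPs); values are invented.
toy_genotypes = [
    [0, 1, 2],
    [0, 1, 2],  # identical to the first individual
    [2, 1, 0],
]
grm = genetic_relationship_matrix(toy_genotypes)
for row in grm:
    print([round(value, 3) for value in row])
```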

Quantitative GWAS

SNP-based association analysis was performed in combined as well as individual cohorts. However, for further downstream analyses, we only used findings replicated in the GWAS of individual cohorts. Linear mixed-effect regression models with the subdomains as dependent variables were applied with fixed effects for gender, age, first four dimensions of the multidimensional scaling results (population stratification) from plinkv1.933 (Supplementary Fig. 2), and with recruitment site of individuals as a random effect. The analysis was implemented using R package lme434. Due to the high amount of missing IQ values, we did not correct for IQ. Correlations between IQ and the subdomains were minimal in both samples (cor = −0.26–0.12).
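Written out, the model described above has roughly the following form for individual i from recruitment site j. The additive coding of the SNP genotype g (allele dosage) and the normal distributions of the random site intercept u_j and the residual are assumptions of this sketch (they are the usual defaults for a linear mixed model), and the symbol names are chosen here for illustration.

```latex
y_{ij} = \beta_0
       + \beta_{\mathrm{SNP}}\, g_{ij}
       + \beta_{\mathrm{sex}}\, \mathrm{sex}_{ij}
       + \beta_{\mathrm{age}}\, \mathrm{age}_{ij}
       + \sum_{k=1}^{4} \gamma_k\, \mathrm{MDS}_{k,ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y is the subdomain score, the MDS terms are the first four multidimensional scaling dimensions, and the test of interest for each SNP is whether β_SNP differs from zero.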

Gene-based analysis

Gene-based association

This analysis was performed separately on the individual cohorts based on their respective GWAS output. The simultaneous joint effect of multiple SNPs was determined using Multimarker Analysis of GenoMic Annotation (MAGMA) software package v1.0635. qGWAS results of the individual cohorts were used. To reduce false-positive findings, we included only genes with Ppermuted ≤ 0.05 replicated in both datasets for further analysis.
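The replication filter in the last sentence amounts to a set intersection, which can be sketched as follows. The gene names and P values are placeholders, and the dictionaries stand in for per-cohort MAGMA output parsed elsewhere; none of this is taken from the study's data.

```python
P_CUTOFF = 0.05  # permuted P-value cutoff stated in the text


def significant_genes(gene_p_values, cutoff=P_CUTOFF):
    """Return the set of genes with a permuted P value at or below the cutoff."""
    return {gene for gene, p in gene_p_values.items() if p <= cutoff}


def replicated_genes(cohort_a, cohort_b, cutoff=P_CUTOFF):
    """Genes that pass the cutoff in both cohorts."""
    return significant_genes(cohort_a, cutoff) & significant_genes(cohort_b, cutoff)


# Hypothetical gene-level permuted P values for one subdomain.
agp_results = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.04}
de_results = {"GENE_A": 0.03, "GENE_B": 0.01, "GENE_C": 0.06}

replicated = replicated_genes(agp_results, de_results)
print(sorted(replicated))
```

Only genes significant in both cohorts survive, so a gene that is strongly associated in one cohort but misses the cutoff in the other is discarded.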

Pathway and brain network analysis

The significant (Ppermuted < 0.05) genes from the MAGMA analysis that overlapped between both cohorts were subjected to these analyses. Gene ontology (GO) and pathway analysis was performed using GO-Elite36. Brain network analysis was based on published gene lists of the 29 transcriptome modules (kindly provided by Dr. Kang) co-regulated during the development of the human brain37. Replicated genes from the MAGMA analysis for each subdomain were tested for enrichment using Fisher's exact test.
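An enrichment test of this kind can be sketched with the hypergeometric distribution, which underlies the one-sided Fisher's exact test. The one-sided alternative (more overlap than expected by chance), the choice of background gene set, and all counts below are assumptions of this example rather than details given in the text.

```python
from math import comb


def enrichment_p_value(overlap, list_size, module_size, background_size):
    """One-sided Fisher's exact test: probability of seeing at least `overlap`
    module genes in a gene list of `list_size` drawn from the background."""
    max_overlap = min(list_size, module_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(module_size, k) * comb(background_size - module_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 background genes, 3 of them in the module,
# and a list of 2 replicated genes that both fall in the module.
p_toy = enrichment_p_value(overlap=2, list_size=2, module_size=3, background_size=10)
print(round(p_toy, 4))
```

Because one test is run per module, the resulting P values still need correction for multiple testing across the modules tested.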


Results

In our study, domain A refers to the ASD domain “social interaction and social communication”, whereas domain B refers to the domain “restricted repetitive behaviors, interests, and activities”. The quantitative traits in our study can be classified into either domain A or domain B, i.e., SI, JA, PI, and NVC belong to domain A, whereas RB and RI belong to domain B.

Descriptive data

Complete phenotypes and genotypes (N = 6,900,500 SNPs) were available for 1895 AGP and 614 DE cases with no difference in gender distribution across cohorts (P = 0.373). The DE cohort was older at diagnosis and showed a higher IQ compared with AGP sample (Table 1).

Table 1 Descriptive statistics of samples with complete phenotype and genotype data.

ADI-R algorithm-based subdomains

The AGP cohort satisfied the sample adequacy criteria (Supplementary Table 1). Six subdomains (Supplementary Table 2, Supplementary Fig. 3) were identified and labeled as SI (five items), JA (eight items), PI (four items), NVC (three items), RB (five items), and RI (three items). The item “Conventional/Instrumental gestures” loaded on SI and NVC, respectively, and—in accordance with the previously published study3—was included into NVC. CFA in the DE cohort confirmed the structure (Supplementary Table 3). No differences with respect to SI, PI, and RI were observed between cohorts (Pall > 0.1). JA and RB were lower in the DE compared with the AGP cohort, while NVC was higher (Pall < 1 × 10−03, Table 1).

Single-nucleotide polymorphism (SNP)-based analysis

Polygenic risk scores (PRS)

The ASD–PRS explained a significant (all P < 2 × 10−05) proportion of genetic variance of all subdomains. The best PRS model explained 3.3% of variance (R²) in SI and 2.3% in JA and in PI. In contrast, R² was lower for NVC (0.7%) and RB (1.2%), whereas for RI, the best model explained 4.5% of variance. P-value thresholds used for SNP selection of the subdomain GWAS in the best models ranged from 0.031 to 0.411 (Fig. 1).

Fig. 1: Polygenic risk score analysis showing the shared genetic etiology between ASD diagnosis and individual subdomains.

Each bar represents the respective P-value thresholds (PT), whereas the numbers above bars denote the P value for the underlying model. SI social interaction, JA joint attention, PI peer interaction, NVC nonverbal communication, RB repetitive sensory-motor behavior, RI restricted interest.

Genetic heritability (h2SNP)

Significant h2SNP (P < 0.05) was identified for all subdomains with the highest h2SNP observed for SI (h2SNP = 0.53, Padjusted = 3.33 × 10−16), and the lowest for RB (h2SNP = 0.21, Padjusted = 6.72 × 10−08) (Supplementary Table 2).

Cross-trait correlations

The strongest rg was observed between SI and NVC (rg = 0.97, P = 1.19 × 10−11). Moderate correlations were observed between SI and PI (rg = 0.79, P = 2.19 × 10−6), SI and JA (rg = 0.67, P = 7.47 × 10−11), and SI and RI (rg = 0.64, P = 4.2 × 10−7), while the weakest correlation was observed between SI and RB (rg = 0.10, P = 0.280). However, JA and PI were highly correlated (rg = 1, P = 2.62 × 10−10). Moderate correlations were observed between JA and NVC (rg = 0.66, P = 1.89 × 10−5) and between JA and RI (rg = 0.55, P = 1.35 × 10−4). The lowest rg with respect to JA was observed with RB (rg = 0.11, P = 0.285). For PI, middle-range correlations were observed with SI (rg = 0.79, P = 2.19 × 10−6) and NVC (rg = 0.74, P = 4.78 × 10−5), whereas lower rg values were seen for RB (rg = 0.29, P = 0.127) and RI (rg = 0.25, P = 0.077). The lowest rg of NVC was observed with RB (rg = 0.32, P = 0.040), whereas a moderate rg was observed with RI (rg = 0.68, P = 1.0 × 10−4). RB showed very low genetic correlation with RI (rg = 0.15, P = 0.213). Overall, RB showed no significant rg (i.e., P < 0.05) with any other subdomain (Fig. 2).

Fig. 2: Genetic correlations (rg) among six subdomains based on the merged cohort.

Asterisks mark significances with *** being corrected for P < 0.001. SI social interaction, JA joint attention, PI peer interaction, NVC nonverbal communication, RB repetitive sensory-motor behavior, RI restricted interest.

Quantitative GWAS

GWAS (combined cohort) identified eight genome-wide significant SNPs (Fig. 3, Supplementary Fig. 4, Supplementary Tables 4, 5), which are reported along with their chromosomal position and closest gene as follows: four were found for SI, i.e., rs2095092, P = 4.3 × 10−08 at 1p31.3 (PATJ), rs377634870, P = 4.8 × 10−08 at 1p22.3 (no gene within 10 kb), rs34459814, P = 2.5 × 10−08 at 7q11.23 (CLIP2), rs34083004, P = 3.7 × 10−08 at 7q11.23 (CLIP2), one for PI, i.e., rs10115292, P = 1.8 × 10−08 at 9p21.1 (no gene within 10 kb), and three for RB, i.e., rs13274146, P = 2.1 × 10−08 at 8p21.3 (no gene within 10 kb), rs7837513, P = 4.2 × 10−09 at 8p21.3 (no gene within 10 kb), and rs7824610, P = 2.0 × 10−09 at 8q21.11 (no gene within 10 kb). No significant hit was identified for RI. For locus plots, see Supplementary Fig. 5.

Fig. 3: Manhattan plots of the six subdomains of the merged cohort.

The x axis shows chromosome 1–22; the y axis shows the –log10 P value, where each individual dot represents a SNP. The red line here shows the genome-wide significance threshold, i.e., P = 5 × 10−8 and the respective SNPs are mentioned.

Gene-based analysis

MAGMA identified 292 replicated (DE and AGP cohort Ppermuted < 0.05) genes associated with any of the subdomains (Fig. 4; Supplementary Tables 6–8). The 52 genes associated with SI were enriched for GO terms, including “sensory perception”, and at brain level, for the childhood-activated co-regulated brain gene-network module 637 (beta = 3.213, P = 0.042, Padj = 1). For JA, 35 genes were associated and enriched for GO terms, e.g., “carbohydrate and energy metabolism” and “chromatin modification”. For PI, 59 genes were identified, which are implicated in “hormone processing” and “plasma membrane” processes. For NVC, 47 genes were enriched for GO terms related to protein catabolism, and at brain level, the brain-expressed module 27 (beta = 3.297, P = 0.039, Padj = 1) was enriched. The P values of the brain-module enrichment were corrected for multiple testing within each subdomain but not across subdomains. The 49 RB-associated genes were enriched for “skeletal muscle tissue development”, “DNA binding”, and “transmembrane receptor activity”. For RI, 59 genes were identified and implicated in “postsynaptic” and “intracellular mediated” signaling along with regulation of the MAPKKK (mitogen-activated protein kinase kinase kinase) cascade.

Fig. 4: Venn diagram of overlapping genes with a significant Ppermuted < 0.05 as identified using MAGMA from the individual cohorts, i.e., AGP and DE.

The underlined genes represent SFARI genes. SI social interaction, JA joint attention, PI peer interaction, NVC nonverbal communication, RB repetitive sensory-motor behavior, RI restricted interest.

No genome-wide significant hit overlapped between subdomains. However, 149 nominal (P < 0.01) SNPs were shared between SI, JA, and PI, and 27 SNPs between SI, PI, and NVC. No nominal overlaps were identified between RI and any other phenotype. At the gene level (Ppermuted < 0.05), we observed three overlapping genes between SI and JA (GYS1, TTC17, and PPM1N), two genes between SI and PI (MNS1 and IL20), and one gene each between NVC and PI (TM4SF4), SI and RB (RGS10), JA and PI (LHB), and JA and NVC (COBLL1) (Fig. 4).


Discussion

In this study, we investigated the role of common genetic variants in shaping the phenotypic variability of ASD. We focused on ADI-R-derived phenotypic subdomains to determine their underlying genetic etiology and possible genetic and functional overlap. A large amount of variance was not explained by the PRS, implicating additional common and/or rare variation in the phenotypic expression of the subdomains. We also studied the rg of individual subdomains and estimated the polygenic risk for ASD to explore whether variability in subdomains may be explained by general common genetic risk for ASD. Measures explaining phenotypic heterogeneity have often been studied as predictors of outcome in clinical trials38 or of long-term outcomes39, but genetic studies aiming at describing the genetic underpinnings of this phenotypic heterogeneity are scarce.

We identified and confirmed the six-factor structure of the ADI-R algorithm items first reported by Liu et al.3 in two independent ASD datasets. A similar six-factor solution has been published for 98 ADI-R algorithms40. Previously, another study conducted a factor analysis on 11 items related to restricted and repetitive behavior (RRB) and identified two factors, i.e., RSM and IS similar to our identified subdomains of RB and RI, respectively10. Thus, the identified subdomains in our study have been well replicated in independent ASD datasets before, and are plausible targets for quantitative genetic analyses.

h2SNP has been studied in large ASD samples to quantify additive heritability explained by genome-wide SNPs41. Our study is the first estimating SNP-based heritability of specific phenotypic subdomains in ASD. We assumed that the heterogeneous phenotype of ASD may prevent a clear picture of the role of SNP in each subdomain. Overall, we observed low SNP-based heritability for the individual subdomains; however, in our study, h2SNP for all subdomains was higher than previously reported estimates of the categorical phenotype ASD (~17%)42. Although, without replication, we cannot generalize our findings for the specific subdomains, we still describe higher heritability estimates for domain A-related subdomains. However, we observe a difference between the subdomains of domain B, which showed the lowest estimate for RB but higher estimates for RI. From the results of our study, we suggest a differential role of common and rare variants in domain A and also within the subdomains of domain B. This also may explain the lower SNP-based heritability of the categorical ASD phenotype, because it is defined by symptoms in domains A and B.

Phenotypic subdomains with high SNP-based heritability but no genome-wide significant hits, such as RI, might be driven by many variants with low effect sizes. In contrast, subdomains with little variance explained by SNP heritability but with genome-wide significant hits might be driven by a few variants with moderate-to-high effect sizes. Thus, we conclude that the genetic architecture underlying the phenotypic variance in ASD individuals is likely to differ across the domains.

The highest genetic correlation was identified between SI and NVC (0.97), mirroring the correlation at the phenotypic level43. A complete genetic correlation of 1 was found for JA and PI, suggesting strongly overlapping common genetic variation underlying SI and NVC, or JA and PI. Moreover, we observed that the subdomains of domain A are also more highly phenotypically correlated than the subdomains of domain B (Supplementary Fig. 6). In contrast to SNP-based heritability, genetic correlation analysis of the two subdomains related to domain B showed only weak correlation, and thus these subdomains may be genetically independent with respect to common variation. Another linkage study on ADI-R algorithm-derived “repetitive sensory-motor behavior” (RSMB) and “insistence on sameness” (IS) scores44 similarly reported predominantly specific, but also a few overlapping linkage findings for these subdomains. A recent qGWAS reported suggestive evidence for distinct common variants when RSMB and IS were analyzed independently; however, when both phenotypes were considered together, three genome-wide hits were identified10. This indicates higher variability of the combined phenotypic measure, resulting in a higher power to detect a specific genetic risk. RB did not genetically correlate with other subdomains and also had the lowest h2SNP (0.21). Similarly, a population-based twin study did not find genetic covariation between SI and RB scores45; however, a twin study of ASD individuals reported a strong genetic overlap of the extreme values of impaired social communication and restricted behaviors derived from SCQ46. The contrasting findings may be explained by a differential role of common and rare variation in social communication-related subdomains and RB, especially in ASD individuals, with rare variation playing a stronger role in RB47.

With respect to specific genetic variation underlying the different subdomains, several genome-wide significant hits and novel candidate genes were identified in the present study. For SI, we observed an association with PATJ (aka INADL) at the SNP as well as at the gene level. PATJ codes for the scaffolding protein CIPP and regulates surface expression of the acid-sensing ion channel 3 in sensory neurons48. The UniProt Protein Database predicts, based on sequence similarities, an interaction of PATJ with glutamatergic NMDA receptors and the ASD candidate genes NLGN2 and HTR2A. Rare loss-of-function variants in PATJ have also previously been found in ASD49, thus strengthening our findings. The second genome-wide significant hit for SI mapped to the CLIP2 gene, located at 7q11.23. Duplication carriers of this region show a high rate of ASD50,51. Furthermore, SI-associated genes were enriched in a co-expressed brain gene set (module 6). This module is mainly active in cortical structures during early childhood. In the hippocampus, module 6 is activated before birth, silenced prior to puberty, and then reactivated. This supports previous findings of early cortical maturation impairments in ASD52, and of the important role of the hippocampus in social behavior53.

No genome-wide hit was identified for JA. At the gene level, JA was associated with the DAGLA gene, implicated in seizures and neurodevelopmental disorders, including autism54, and the COBLL1 gene, involved in epilepsy55 and language impairment56.

The only genome-wide significant SNP in PI is rs10115292 mapped to an intergenic region at chr. 9p21.1, known for ASD-associated CNVs57. Among the significant genes (Ppermuted < 0.05) enriched for PI, we identified a sodium voltage-gated ion channel gene SCN5A that was found to be a hub protein in an ASD-associated protein-interaction module58. Other ASD-associated significant PI genes include CECR2, a 7.2-kb exonic loss, which was found in an ASD female54.

For NVC, no genome-wide significant hit was identified. Most suggestively associated SNPs map to chr. 6q26, a region linked to ASD59. SLC26A5 at 11p15.4 was among the top hits from the gene-based analysis; mutations in this gene are potential candidates for causing neurosensory deafness60. This region is linked with delayed development of speech61. The NVC-associated regulatory gene set (module 27) is expressed in the hippocampus, striatum, and mediodorsal nucleus of the thalamus until puberty (Supplementary Fig. 7). These regions are well known for their role in language and communication62,63, which puts our findings in line with the current literature.

RB was associated with genome-wide significant SNPs at 8p21.3, a region previously associated with restricted and repetitive behaviors in ASD10. Duplications of this region have been associated with ASD64. The suggestive effect at 19q13.33 is also in line with previous findings regarding RB3. Gene-based analysis pointed to the RGS10 gene, which is implicated in neurodegenerative diseases65 and overlapped between SI and RB.

Top significant SNP hits for RI were also observed in migraine, sensorineural deafness, cognition, Williams–Beuren syndrome, and ASD such as NLPR366, GNG267, and NSUN568. No genome-wide associated SNP was identified for RI. The top peak at 15q25.3, however, is spanning the NTRK3 gene, associated with autism and Asperger syndrome69, as well as obsessive–compulsive disorder70.

Among the overlapping genes in the subdomains, we identified GYS1 in JA and SI. KO of Gys1 has been reported to induce depression-like behavior in rats, indicating that brain glycogen has an important role in animal emotion71. Another study generated a brain-specific GYS1 KO mouse and found that these animals had a significant deficiency in motor and cognitive abilities and synaptic strength72. Another overlapping gene, found between JA and NVC, is COBLL1. A study reported an individual with ASD and Tourette syndrome with a heterozygous microdeletion of approximately 719 kb at 2q24.3, which led to the deletion of COBLL1 along with four other genes. As mentioned above, this gene was also found to be deleted in a patient with severe epilepsy55 and in individuals with autistic features, developmental delay, repetitive hand movements, and language impairments56.

One of the major limitations of our study is the limited sample size of the individual AGP and DE cohorts. Although quantitative statistical tests generally have a higher power in comparison with qualitative approaches, small effects are likely to have gone undetected in our study (see power analysis in the methods section). It is possible that a variant may carry a large genetic risk to increase expression of one phenotypic subdomain but a smaller risk on another. Thus, identifying the overlapping SNP-based genetic risk with high confidence requires a larger sample size to attain adequate statistical power. However, we followed a conservative approach to minimize false positives by performing the gene-based analysis in two independent ASD datasets and classifying genes as replicated only if they had an empirical P < 0.05 in both cohorts. Although this cannot rule out the possibility of false positives, and especially not of false-negative findings, it lowers the risk of false findings.

Another limitation of our study is the mixed ethnicity of the two cohorts and the higher ASD severity scores in the AGP sample. However, we accounted for mixed ethnicity in our GWAS analysis, and to guard against false-positive associations we followed a conservative approach, performing the gene-based analysis in two independent ASD datasets and interpreting only overlapping hits. In addition, several genes mapped from GWAS hits of the combined cohort were also found at the gene level. For the PRS analysis, we used the combined cohort, which also contained samples with PGC IDs; because our research question focused on dimensional rather than categorical phenotypes, we did not exclude those samples. In the heritability and genetic correlation analyses, we did not account for covariates, which might have led to overestimation. Still, a recent study has shown that including covariates can result in inflated and biased genetic correlation and heritability estimates [73]. Thus, we again chose the more conservative approach. Nevertheless, we suggest replicating the analysis in a genetically more homogeneous sample.

In summary, our results suggest that the genetic architecture of subdomains is distinct between A- and B-related subdomains and differs within the two B-related subdomains RB and RI. We replicated several genes previously implicated in ASD and also describe new candidate genes for specific subdomains. The biological pathways and gene expression patterns involved support previous observations that ASD phenotypic variability is influenced by pathways regulating neuronal development of different brain areas, including the hippocampus, amygdala, and cortical areas.

The results of our study need to be replicated in larger samples with different ethnic backgrounds. In addition, a combined analysis of common and rare variants may clarify the specific role of common variants in shaping the ASD phenotype in relation to the reported subdomains.
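Why larger samples matter can be illustrated with a rough power calculation for a single variant tested against a quantitative trait. The sketch below is not the power analysis from the methods section: it uses the standard normal approximation for a 1-df association test, and the sample sizes, the assumed variance explained (0.5%), and the function name `gwas_power` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n, var_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df test for a variant that explains
    `var_explained` of the trait variance in `n` unrelated individuals."""
    std_normal = NormalDist()
    # Non-centrality parameter of the 1-df chi-square association statistic.
    ncp = n * var_explained / (1.0 - var_explained)
    shift = sqrt(ncp)  # mean of the equivalent Z statistic
    z_crit = -std_normal.inv_cdf(alpha / 2.0)  # two-sided critical value
    upper = 1.0 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


# Hypothetical sample sizes; 0.5% variance explained is an assumed effect size.
for n in (2500, 10000, 40000):
    print(n, round(gwas_power(n, 0.005), 3))
```

Under these assumptions, power at the genome-wide threshold is low for a sample of a few thousand individuals and rises steeply as the sample grows, which is consistent with the call for replication in larger samples.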


  1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fifth Edn (DSM-5™). Arlington, VA, American Psychiatric Association (2013).

  2. Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).

  3. Liu, X.-Q. et al. Identification of genetic loci underlying the phenotypic constructs of autism spectrum disorders. J. Am. Acad. Child Adolesc. Psychiatry 50, 687–696.e13 (2011).

  4. Ronald, A. & Hoekstra, R. A. Autism spectrum disorders and autistic traits: a decade of new twin studies. Am. J. Med. Genet. B Neuropsychiatr. Genet. 156, 255–274 (2011).

  5. Stranger, B. E., Stahl, E. A. & Raj, T. Progress and promise of genome-wide association studies for human complex trait genetics. Genetics 187, 367–383 (2011).

  6. Frazier, T. W. et al. Validation of proposed DSM-5 criteria for autism spectrum disorder. J. Am. Acad. Child Adolesc. Psychiatry 51, 28–40.e3 (2012).

  7. Goodman, R., Ford, T., Richards, H., Gatward, R. & Meltzer, H. The development and well-being assessment: description and initial validation of an integrated assessment of child and adolescent psychopathology. J. Child Psychol. Psychiatry 41, 645–655 (2000).

  8. Dworzynski, K., Happé, F., Bolton, P. & Ronald, A. Relationship between symptom domains in autism spectrum disorders: a population based twin study. J. Autism Dev. Disord. 39, 1197–1210 (2009).

  9. Shuster, J., Perry, A., Bebko, J. & Toplak, M. E. Review of factor analytic studies examining symptoms of autism spectrum disorders. J. Autism Dev. Disord. 44, 90–110 (2014).

  10. Tao, Y. et al. Evidence for contribution of common genetic variants within chromosome 8p21.2-8p21.1 to restricted and repetitive behaviors in autism spectrum disorders. BMC Genomics 17, 163 (2016).

  11. Lowe, J. K., Werling, D. M., Constantino, J. N., Cantor, R. M. & Geschwind, D. H. Social responsiveness, an autism endophenotype: genomewide significant linkage to two regions on chromosome 8. Am. J. Psychiatry 172, 266–275 (2015).

  12. Connolly, J. J., Glessner, J. T. & Hakonarson, H. A genome-wide association study of autism incorporating autism diagnostic interview-revised, autism diagnostic observation schedule, and social responsiveness scale. Child Dev. 84, 17–33 (2013).

  13. Torrico, B. et al. Lack of replication of previous autism spectrum disorder GWAS hits in European populations. Autism Res. 945, 202–211 (2016).

  14. Bourgeron, T. From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat. Rev. Neurosci. 16, 551–563 (2015).

  15. Bralten, J. et al. Autism spectrum disorders and autistic traits share genetics and biology. Mol. Psychiatry. (2017).

  16. Donovan, A. P. A. & Basson, M. A. The neuroanatomy of autism—a developmental perspective. J. Anat. 230, 4–15 (2017).

  17. Sato, A. mTOR, a potential target to treat autism spectrum disorder. CNS Neurol. Disord. Drug Targets 15, 533–543 (2016).

  18. Bae, S. M. & Hong, J. Y. The Wnt signaling pathway and related therapeutic drugs in autism spectrum disorder. Clin. Psychopharmacol. Neurosci. 16, 129–135 (2018).

  19. Schmunk, G. et al. High-throughput screen detects calcium signaling dysfunction in typical sporadic autism spectrum disorder. Sci. Rep. 7, 40740 (2017).

  20. Hamilton, S. M. et al. Fmr1 and Nlgn3 knockout rats: novel tools for investigating autism spectrum disorders. Behav. Neurosci. 128, 103–109 (2014).

  21. Dachtler, J. et al. Deletion of α-neurexin II results in autism-related behaviors in mice. Transl. Psychiatry 4, e484 (2014).

  22. Ha, S., Sohn, I.-J., Kim, N., Sim, H. J. & Cheon, K.-A. Characteristics of brains in autism spectrum disorder: structure, function and connectivity across the lifespan. Exp. Neurobiol. 24, 273–284 (2015).

  23. Bicks, L. K., Koike, H., Akbarian, S. & Morishita, H. Prefrontal cortex and social cognition in mouse and man. Front. Psychol. 6, 1805 (2015).

  24. Calderoni, S., Bellani, M., Hardan, A. Y., Muratori, F. & Brambilla, P. Basal ganglia and restricted and repetitive behaviours in Autism Spectrum Disorders: current status and future perspectives. Epidemiol. Psychiatr. Sci. 23, 235–238 (2014).

  25. Anney, R. et al. A genome-wide scan for common alleles affecting risk for autism. Hum. Mol. Genet. 19, 4072–4082 (2010).

  26. Browning, B. L. & Browning, S. R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 98, 116–126 (2016).

  27. Yousaf, A. et al. Mapping the genetics of neuropsychological traits to the molecular network of the human brain using a data integrative approach. bioRxiv. (2018).

  28. van Buuren, S. & Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–67 (2011).

  29. Revelle, W. Psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA. (2017).

  30. Rosseel, Y. lavaan: an R package for structural equation modeling. J. Stat. Softw. 48, 1–36 (2012).

  31. Euesden, J., Lewis, C. M. & O’Reilly, P. F. PRSice: polygenic risk score software. Bioinformatics 31, 1466–1468 (2015).

  32. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  33. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  34. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 3–4 (2015).

  35. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

  36. Zambon, A. C. et al. GO-Elite: a flexible solution for pathway and ontology over-representation. Bioinformatics 28, 2209–2210 (2012).

  37. Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).

  38. Freitag, C. M. et al. Group-based cognitive behavioural psychotherapy for children and adolescents with ASD: the randomized, multicentre, controlled SOSTA-net trial. J. Child Psychol. Psychiatry 57, 596–605 (2016).

  39. Magiati, I., Tay, X. W. & Howlin, P. Cognitive, language, social and behavioural outcomes in adults with autism spectrum disorders: a systematic review of longitudinal follow-up studies in adulthood. Clin. Psychol. Rev. 34, 73–86 (2014).

  40. Tadevosyan-Leyfer, O. et al. A principal components analysis of the autism diagnostic interview-revised. J. Am. Acad. Child Adolesc. Psychiatry 42, 864–872 (2003).

  41. The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium. Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism 8, 21 (2017).

  42. Lee, S. H. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).

  43. Robinson, E. B. et al. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat. Genet. 48, 552–555 (2016).

  44. Cannon, D. S. et al. Genome-wide linkage analyses of two repetitive behavior phenotypes in Utah pedigrees with autism spectrum disorders. Mol. Autism 1, 3 (2010).

  45. Ronald, A. et al. Genetic heterogeneity between the three components of the autism spectrum: a twin study. J. Am. Acad. Child Adolesc. Psychiatry 45, 691–699 (2006).

  46. Frazier, T. W. et al. A twin study of heritable and shared environmental contributions to autism. J. Autism Dev. Disord. 44, 2013–2025 (2014).

  47. Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722.e12 (2017).

  48. Anzai, N. et al. The multivalent PDZ domain-containing protein CIPP is a partner of acid-sensing ion channel 3 in sensory neurons. J. Biol. Chem. 277, 16655–16661 (2002).

  49. Kenny, E. M. et al. Excess of rare novel loss-of-function variants in synaptic genes in schizophrenia and autism spectrum disorders. Mol. Psychiatry 19, 872–879 (2014).

  50. Klein-Tasman, B. P. & Mervis, C. B. Autism spectrum symptomatology among children with duplication 7q11.23 syndrome. J. Autism Dev. Disord. 48, 1982–1994 (2018).

  51. Robertson, C. E. & Baron-Cohen, S. Sensory perception in autism. Nat. Rev. Neurosci. 18, 671–684 (2017).

  52. Yang, D. Y.-J., Beam, D., Pelphrey, K. A., Abdullahi, S. & Jou, R. J. Cortical morphological markers in children with autism: a structural magnetic resonance imaging study of thickness, area, volume, and gyrification. Mol. Autism 7, 11 (2016).

  53. Rubin, R. D., Watson, P. D., Duff, M. C. & Cohen, N. J. The role of the hippocampus in flexible cognition and social behavior. Front. Hum. Neurosci. 8, 742 (2014).

  54. Prasad, A. et al. A discovery resource of rare copy number variations in individuals with autism spectrum disorder. G3 (Bethesda) 2, 1665–1685 (2012).

  55. Davidsson, J., Collin, A., Olsson, M. E., Lundgren, J. & Soller, M. Deletion of the SCN gene cluster on 2q24.4 is associated with severe epilepsy: an array-based genotype-phenotype correlation and a comprehensive review of previously published cases. Epilepsy Res. 81, 69–79 (2008).

  56. Chen, C.-P. et al. Array-CGH detection of a de novo 2.8 Mb deletion in 2q24.2-q24.3 in a girl with autistic features and developmental delay. Eur. J. Med. Genet. 53, 217–220 (2010).

  57. Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008).

  58. Li, J. et al. Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Mol. Syst. Biol. 10, 774 (2014).

  59. Szatmari, P. et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat. Genet. 39, 319–328 (2007).

  60. Liu, X. Z. et al. Prestin, a cochlear motor protein, is defective in non-syndromic hearing loss. Hum. Mol. Genet. 12, 1155–1162 (2003).

  61. Liu, X.-Q. et al. Genome-wide linkage analyses of quantitative and categorical autism subphenotypes. Biol. Psychiatry 64, 561–570 (2008).

  62. Duff, M. C. & Brown-Schmidt, S. The hippocampus and the flexible use and processing of language. Front. Hum. Neurosci. 6, 69 (2012).

  63. Chan, S.-H., Ryan, L. & Bever, T. G. Role of the striatum in language: syntactic and conceptual sequencing. Brain Lang. 125, 283–294 (2013).

  64. Fisch, G. S., Davis, R., Youngblom, J. & Gregg, J. Genotype-phenotype association studies of chromosome 8p inverted duplication deletion syndrome. Behav. Genet. 41, 373–380 (2011).

  65. Lee, J.-K. et al. Regulator of G-protein signaling 10 promotes dopaminergic neuron survival via regulation of the microglial inflammatory response. J. Neurosci. 28, 8517–8528 (2008).

  66. Cassel, S. L., Joly, S. & Sutterwala, F. S. The NLRP3 inflammasome: a sensor of immune danger signals. Semin Immunol. 21, 194–198 (2009).

  67. Melliti, K., Grabner, M. & Seabrook, G. R. The familial hemiplegic migraine mutation R192Q reduces G-protein-mediated inhibition of P/Q-type (Ca(V)2.1) calcium channels expressed in human embryonic kidney cells. J. Physiol. 546, 337–347 (2003).

  68. Merla, G., Brunetti-Pierri, N., Micale, L. & Fusco, C. Copy number variants at Williams-Beuren syndrome 7q11.23 region. Hum. Genet. 128, 3–26 (2010).

  69. Chakrabarti, B. et al. Genes related to sex steroids, neural growth, and social-emotional behavior are associated with autistic traits, empathy, and Asperger syndrome. Autism Res. 2, 157–177 (2009).

  70. Muiños-Gimeno, M. et al. Allele variants in functional MicroRNA target sites of the neurotrophin-3 receptor gene (NTRK3) as susceptibility factors for anxiety disorders. Hum. Mutat. 30, 1062–1071 (2009).

  71. Zhao, Y. et al. Decreased glycogen content might contribute to chronic stress-induced atrophy of hippocampal astrocyte volume and depression-like behavior in rats. Sci. Rep. 7, 43192 (2017).

  72. Duran, J., Saez, I., Gruart, A., Guinovart, J. J. & Delgado-García, J. M. Impairment in long-term memory formation and learning-dependent synaptic plasticity in mice lacking glycogen synthase in the brain. J. Cereb. Blood Flow. Metab. 33, 550–556 (2013).

  73. Weissbrod, O., Flint, J. & Rosset, S. Estimating SNP-based heritability and genetic correlation in case-control studies directly and with summary statistics. Am. J. Hum. Genet. 103, 89–99 (2018).


We thank all the families and patients for their cooperation and the clinical staff for their support in collecting the data. We thank Heiko Zerlaut, Norbert Dichter, and Silvia Lindlar for excellent technical assistance. This work has been supported by Saarland University T6 03 10 00-45, the European Commission and the German Bundesministerium für Bildung und Forschung BMBF (ERA-NET NEURON project: EUHFAUTISM-01EW1105), Landes-Offensive zur Entwicklung wissenschaftlich-ökonomischer Exzellenz (LOEWE) Neuronal Coordination Research Focus Frankfurt (NeFF), and AIMS-2-TRIALS funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777394-2 AIMS.

Author information

Corresponding author

Correspondence to Afsheen Yousaf.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Yousaf, A., Waltes, R., Haslinger, D. et al. Quantitative genome-wide association study of six phenotypic subdomains identifies novel genome-wide significant variants in autism spectrum disorder. Transl Psychiatry 10, 215 (2020).
