Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and social interaction and by restricted and repetitive patterns of activities and behaviour, with an onset in early development.1 However, ASD is a psychiatric diagnosis based on clinical criteria, and the severity of these characteristics can be measured as quantitative traits that represent a continuum that extends into the general population, with ASD at the extreme end of the distribution.2 Other associated, but not core, features are intellectual disability, attention-deficit disorder and medical comorbidities.3 The prevalence of ASD is estimated to be 62/10 000 (ref. 4), with a boy-to-girl ratio of ~4:1 (ref. 5). The importance of a genetic aetiology is established with heritability estimates ranging from 37 to 90%.6, 7, 8, 9 Despite genetic heterogeneity, considerable progress in understanding the genetic architecture of ASD has been made by identifying monogenic causes through genetic syndromes,10 rare chromosomal abnormalities,11, 12 rare copy-number variants13, 14, 15, 16 and rare penetrant gene mutations.3 Several genomic regions, including 2q, 3q25–27, 3p25, 6q14–21, 7q31–36 and 17q11–21 (ref. 17), have been linked to ASD. The role of rare genetic variants in the aetiology of ASD has been established by high-throughput technologies.18, 19 More recently, the theory of an excess of de novo loss-of-function variants in ASD patients has gained popularity after some initial successes.18, 19, 20 Around 1000 genes have been identified to be enriched with de novo loss-of-function mutations in ASD patients.21 However, de novo genetic variants do not contribute to the estimated heritability as these are not inherited. On the other hand, most genetic variance in ASD is attributed to common genetic variants.9, 22 Their role has been demonstrated by several genome-wide association studies (GWAS)23, 24, 25, 26, 27, 28, 29 (Supplementary Table S1). Even though not many common susceptibility variants have been identified, significant association has been reported at 5p14.1 (ref. 23), at 5p15.31 between the SEMA5A and TAS2R1 genes (ref. 28), within MACROD2 at 20p12.1 (ref. 26) and at 1p13.2 (ref. 29). However, there is a significant overlap of the discovery samples used and little replication of specific loci between studies.30

Although the individual effect of common variants is modest, their joint effect may be substantial.25 In this study, besides assessing the effect of single variants on ASD, we evaluated the joint effect of multiple single variants in a gene in a genome-wide gene-based association analysis in patients with ASD from a Belgian Flemish cohort who were genotyped on a dense genotyping array.

Materials and methods

A general overview of the study design and workflow is illustrated in Supplementary Figure S1.

Discovery sample

The discovery sample consisted of 160 nuclear Belgian Flemish families (657 individuals; Supplementary Table S2). The families were recruited to participate in the prospective study through the Expert Center for Autism (ECA) Leuven. All probands had been seen multiple times as part of their clinical care program in the ECA before recruitment. The families were asked to participate if there was at least one child with the diagnosis of non-syndromic ASD of unknown origin after a clinical genetics workup. Out of the 160 families, 55 were multiplex (two or more siblings with ASD) and 105 simplex. In six families the father had also been diagnosed with ASD; with these fathers included, 77.7% of affected individuals were male and 22.3% were female (male-to-female ratio of 3.5:1). Among them, 88.4% had normal or high intelligence, whereas 11.6% had mild, moderate or severe intellectual disability.

Diagnoses of ASD were made by a multidisciplinary team in the ECA Leuven according to DSM-IV-TR (American Psychiatric Association, 2000) criteria. Additionally, participants were assessed for quantitative autistic trait using the Dutch version of the Social Responsiveness Scale (SRS) and the Social Responsiveness Scale for Adults (SRS-A), which are designed to measure social impairment associated with ASD across a wide range of severity.31, 32 Completed questionnaires were obtained for 490 probands, parents and siblings. Among the affected patients who had an SRS score available, the majority (86%) had normal or high intelligence. Written informed consent was obtained from all participants. This study was approved by the Medical Ethical Committee of the University Hospitals Leuven.

Genotyping

Genotyping of 657 individuals from the discovery cohort was performed at the Center for Human Genetics at the KU Leuven, Belgium, using the HumanOmni2.5-8 BeadChip, which contains more than 2.3 million common and less-frequent single-nucleotide polymorphisms (SNPs) from the 1000 Genomes Project (minor allele frequency >2.5%). SNP calling was performed in Genome Studio 2011.1 using the genotyping module v1.9. Markers with a call rate <95%, markers that were monomorphic and markers that failed an exact test of Hardy–Weinberg equilibrium (HWE) (P-value<1 × 10⁻⁷) were removed from the analysis. Samples with a low call rate (<95%) or high identity-by-state (≥95%) were also removed from the analysis. Ethnic outliers were determined using multidimensional scaling analysis with the 1000 Genomes data set (Supplementary Figure S2). All samples clustered tightly with the Europeans and no ethnic outlier was identified. In total, 1 646 898 markers and 654 genotyped individuals were retained for further statistical analysis. Lastly, the Mega2 tool v4.4 (ref. 33) was used to identify Mendelian inconsistencies, which were subsequently set to missing.
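The marker-level filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names and the genotype-count input format are hypothetical, and a 1-df chi-square test is swapped in for the exact HWE test to keep the example short. Only the thresholds (95% call rate, HWE P-value of 1 × 10⁻⁷) come from the text.

```python
import math

CALL_RATE_MIN = 0.95   # markers below this call rate are removed (from the text)
HWE_P_MIN = 1e-7       # markers with an HWE P-value below this are removed (from the text)


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) HWE P-value; an approximation, NOT the exact test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_marker(n_aa, n_ab, n_bb, n_missing):
    """Return True if a marker passes the call-rate, monomorphic and HWE filters."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_total == 0 or n_called / n_total < CALL_RATE_MIN:
        return False
    # Monomorphic: only one allele observed among the called genotypes.
    if n_called == n_aa or n_called == n_bb:
        return False
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= HWE_P_MIN
```

For example, `keep_marker(160, 320, 160, 14)` keeps a marker whose genotype counts match HWE proportions exactly, whereas a marker with no heterozygotes among 640 called genotypes split evenly between the two homozygotes is rejected by the HWE filter.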

Statistical analysis

Baseline descriptive analysis was performed with SPSS v21 (IBM Corporation, Armonk, NY, USA) and PEDSTATS v0.6.12 (ref. 34). Genome-wide association analyses of the binary ASD phenotype were performed through joint modelling of linkage and association, using the LAMP software v0.0.9 (School of Public Health, Ann Arbor, MI, USA). LAMP uses a maximum likelihood model to extract information on genetic linkage and association from samples of unrelated individuals, sib pairs, trios and larger pedigrees in settings where population stratification is not a concern (Supplementary Figure S2).35 Odds ratios and 95% confidence intervals were estimated using PLINK v1.07 (ref. 36). The association tests for markers on sex chromosomes were performed by transmission disequilibrium test for chromosome X and by logistic regression for chromosome Y. Single-variant and gene-based genome-wide association analyses of the quantitative autistic trait, adjusted for age, gender and familial relationships, were performed using the RVtests software tool version 20150630 (http://zhanxw.github.io/rvtests/). The gene-based analysis included the Combined Multivariate and Collapsing method, which is robust and powerful in the presence of a wide spectrum of variant allele frequencies.37 The genes were defined according to human reference genome hg19. All association analyses were performed for the entire discovery sample, and for simplex and multiplex families separately. The standard genome-wide significance threshold of 5 × 10⁻⁸ was used to declare significance in the single-variant analyses, while the genome-wide significance threshold for the gene-based analysis was set at 2.5 × 10⁻⁶ based on 19 650 genes tested. PLINK/SEQ v0.10 (https://atgu.mgh.harvard.edu/plinkseq/) was used to convert PLINK files into variant call format files. All genome maps were updated to human genome build 19 (hg19).
Gene pathway enrichment analysis of all nominally significant genes (P-value<0.01) in the gene-based analysis was performed using the web-based gene network pathway enrichment tool (http://129.125.135.180:8080/GeneNetwork/pathway.html).
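The gene-based significance threshold given in the statistical analysis is consistent with a Bonferroni correction across the 19 650 genes tested. The short check below makes the arithmetic explicit. The 0.05 family-wise error rate is an assumption, since the text states only the resulting threshold and the number of genes; the function name is likewise illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_GENES = 19650   # number of genes tested (from the text)
ALPHA = 0.05      # assumed family-wise error rate (not stated in the text)

gene_threshold = bonferroni_threshold(ALPHA, N_GENES)
# Prints 2.54e-06, which rounds to the 2.5 x 10^-6 quoted in the text.
print(f"{gene_threshold:.2e}")
```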

The data were deposited in the GWAS Central database (http://www.gwascentral.org/study/HGVST1847).

Bioinformatic analysis

To annotate SNPs with regulatory information, we used the RegulomeDB v1.1 database (http://www.regulomedb.org/index), which combines information from ENCODE and other sources, as well as computational predictions and manual annotations, into a tool that classifies SNPs into six categories, where Category 1 variants are 'likely to affect binding and linked to expression of a gene target', whereas Category 6 variants have 'minimal binding evidence'.38 Furthermore, regulatory information on SNPs in haplotype blocks was explored using the HaploReg v4.1 tool.39 For these analyses, r² was set to 1 and the population of European descent was chosen.

Replication samples

Psychiatric Genomics Consortium (PGC)

A lookup of top findings from the single-variant analyses of the binary ASD phenotype and quantitative autistic trait was performed in the latest PGC GWAS. This data set consists of a total of 6495 parent-child trios who met diagnostic criteria for ASD and had genome-wide SNP data available (https://www.med.unc.edu/pgc).

Erasmus Rucphen Family (ERF) study

Replication of the gene-based analysis of quantitative autistic trait was performed in the ERF study, as 1250 participants from this cohort have been assessed for quantitative autistic trait using Baron-Cohen’s Autism-Spectrum Quotient (AQ) test40 and the exomes of about half of these participants (n=615) have been sequenced, thus providing a greater resolution at the gene level. Individuals whose exomes were sequenced were selected based on having good-quality phenotype information on a wide range of topics, and were therefore random with regard to AQ scores (Supplementary Table S2). ERF is a family-based cohort originating from 22 couples and spread over 23 generations.41, 42 The ERF study was approved by the Medical Ethics Committee of the Erasmus MC, which is constituted according to the WMO (Wet Medisch-wetenschappelijk Onderzoek met mensen; the Dutch Medical Research Involving Human Subjects Act). Written informed consent was obtained from all study participants.

Sequencing was done at a mean depth of 74× using the Nimblegen SeqCap EZ V2 capture kit on an Illumina Hiseq2000 sequencer (Illumina, San Diego, CA, USA) using the TruSeq Version 3 protocol at the Human Genotyping facility of the Internal Medicine department at the Erasmus MC, The Netherlands.43, 44 The sequence reads were aligned to the human genome build 19 (hg19) using the Burrows-Wheeler Aligner and the NARWHAL pipeline.45, 46 After processing, genetic variants were called using the Unified Genotyper tool from the GATK.47 Variants with low quality (QUAL<150), variants that were out of HWE (P-value<10⁻⁶) or had a low call rate (<90%), samples with a low call rate (<90%) and duplicates were removed.44 Functional annotations were also performed using the SeattleSeq annotation 138 database (http://snp.gs.washington.edu/SeattleSeqAnnotation138/).

Association analyses of the quantitative autistic trait adjusted for age, gender and familial relationship were performed using the RVtests software. Meta-analysis of the gene-based results of the discovery sample and the ERF study was performed using Fisher’s combined probability test.
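Fisher's combined probability test, named above for the meta-analysis, can be sketched as follows. For k independent P-values the statistic is X² = −2 Σ ln(pᵢ), which is compared against a chi-square distribution with 2k degrees of freedom. The function name is an illustrative assumption, the code is not the software used in the study, and the closed-form tail probability used here holds only because the degrees of freedom are even.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Uses the closed form of the chi-square survival function for 2k df:
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # equals X^2 / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j      # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half_stat) * total
```

With a single P-value the function returns that value unchanged. With two P-values p₁ and p₂ it reduces to p₁p₂(1 − ln(p₁p₂)), so the combined P-value is always larger than the simple product of the two inputs.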

Results

Results of the genome-wide association analysis for the binary ASD phenotype and quantitative autistic trait are illustrated in Supplementary Figures S3 and S4. No single variant surpassed the genome-wide significance threshold. Top findings from the association analyses are shown in Tables 1 and 2 for the discovery sample, and in Supplementary Tables S3 and S4 for simplex and multiplex families separately. Suggestive association with the binary ASD phenotype was observed for two common variants (rs6452310, P-value=7.80 × 10⁻⁸; and rs7700465, P-value=8.70 × 10⁻⁶; Table 1) at chromosome 5p14.1, a region previously known to be associated with ASD (Figure 1). The two variants were in strong linkage disequilibrium (LD) (r²=0.85) with each other but not in LD with any of the previously identified variants in this region associated with ASD (refs 23, 24) (r² ranged from 0.002 to 0.009, D′ ranged from 0.02 to 0.33). None of the top variants from this analysis showed evidence of association in the replication sample (Table 1).
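The two LD measures reported above, r² and D′, can diverge because they are normalized differently. The sketch below computes both from haplotype and allele frequencies using the standard definitions. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of the haplotype carrying both A and B.
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

With hypothetical frequencies `ld_stats(0.5, 0.1, 0.1)`, the rarer allele always occurs on the same background, so |D′| is approximately 1 while r² is only about 0.11. This shows how a moderate D′ can coexist with a very low r² when allele frequencies differ.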

Table 1 Top findings (P-value<1 × 10⁻⁵) from the binary ASD phenotype association analysis
Table 2 Top findings (P-value<1 × 10⁻⁵) from the quantitative autistic trait association analysis
Figure 1 Regional association and recombination rate plot of the 5p14.1 region and the binary ASD phenotype in the discovery cohort. The left y axis represents –log₁₀ P-values for association with binary ASD phenotype in the discovery cohort. The right y axis represents the recombination rate, and the x axis represents chromosomal position (genomic position is according to the hg19 assembly). The most significantly associated single SNP in this region (rs6452310) is denoted with a purple diamond. Surrounding SNPs are shaded according to their pairwise correlation (r²) with rs6452310. The gene annotations are shown below the figure.

Results of the gene-based genome-wide association analysis are illustrated in Supplementary Figure S5 for the discovery sample, and in Supplementary Table S5 for simplex and multiplex families separately. The gene-based association analysis revealed a significant association between quantitative autistic trait and the TTC25 gene (P-value=3.4 × 10⁻⁷) on chromosome 17 (Table 3). This association was not driven by any single variant but by nine variants, four of which showed nominally significant association with quantitative autistic trait (P-value<0.05) (Supplementary Table S6). The combined effect of these variants on the SRS score was large (effect size=10.2). Functional annotation of the nine variants revealed that five are likely to have regulatory functions (Category 1 RegulomeDB score; Supplementary Table S6). Furthermore, the surrounding variants in strong LD (r²=1) with the nine variants lie in enhancer histone marks and protein-binding regions, and alter regulatory motifs depending on the variant allele (Supplementary Table S7). The gene was also nominally associated with autistic trait in the ERF cohort (P-value=0.045). The combined effect of 15 variants in the TTC25 gene (Supplementary Table S8) in the replication sample was much smaller (effect size=1.75). Meta-analysis of the discovery and replication samples resulted in an improved association signal (P-value=1.5 × 10⁻⁸).

Table 3 Top results (P-value<1 × 10⁻³) from the gene-based association analysis of quantitative autistic trait in the discovery cohort

Pathway analysis

Pathway enrichment analysis based on nominally significant genes in the gene-based association analysis (Supplementary Table S9) showed significant enrichment of the SMAD protein signal transduction pathway (P-value=2 × 10⁻⁵) and of a series of digestive system development categories (Supplementary Table S10).

Discussion

In this study, we have identified a novel gene, TTC25, associated with quantitative autistic trait. The association of TTC25 in a cohort-based study suggests that the gene may be relevant for the broader ASD phenotype in the general population. Further, we identified the SMAD protein signal transduction pathway and a series of digestive system development categories as being significantly enriched with genes nominally associated with quantitative autistic trait. Moreover, our study provides additional evidence for the previously identified association of the intergenic region at 5p14.1 with the binary ASD phenotype.

The TTC25 gene is located on chromosome 17q21.2 and encodes Tetratricopeptide Repeat Domain 25. The 17q21.2 locus has previously been linked to ASD in a genome-wide linkage scan,48, 49 although no gene was implicated. TTC25 is overexpressed in testis, frontal cortex and rectum (http://www.genecards.org/cgi-bin/carddisp.pl?gene=TTC25). TTC25 is involved in cilium movement, organization and morphogenesis.50 Cilia are specialized organelles protruding from the cell surface of almost all mammalian cells.51 Mutations in ciliary proteins cause ciliopathies, which can affect many organs at different levels of severity and are characterized by a wide spectrum of phenotypes.51 In the vertebrate nervous system, the primary cilium is increasingly viewed as a hub for certain neural developmental signalling pathways, and growing data suggest this is also true for several types of adult neuronal signalling.52 The capacity of the brain to interpret sensory input is often affected in ciliopathies, resulting in neurological disorders; cognitive impairment, anosmia, intellectual disability, ASD and obesity are apparent to various degrees in many of the ciliopathies.51, 53 Furthermore, Joubert syndrome (JS) is a well-known ciliopathy of the central nervous system.52, 54 Features of ASD, such as problems in social behaviour, communication problems and repetitive behaviours, have been described in up to 40% of JS patients55, 56, 57 and about 25% of JS patients meet criteria for the DSM-IV diagnosis of ASD.55, 58 Multiple variants mapped to this gene in our sample appear to have a regulatory function.

Our identification of the SMAD protein signal transduction pathway as being significantly enriched with genes nominally associated with quantitative autistic trait reinforces the role of transforming growth factor-β (TGFβ) in ASD. The Smad pathways are the major mediators of transcriptional responses induced by the TGFβ family, which control cell-fate determination, cell cycle arrest, apoptosis and actin rearrangements.59 While decreased levels of TGFβ have been reported in blood samples from individuals with ASD (ref. 60) and associated with more severe behavioural scores in ASD children,61 higher levels of TGFβ have been reported in postmortem brain and cerebrospinal fluid samples of ASD patients.62 In addition, a series of digestive system development categories were enriched with genes nominally associated with ASD. Gastrointestinal (GI) disturbances are 4-fold more common in ASD (ref. 63), and available scientific evidence supports a combination of changes in the areas of immune function, gut microbiome and gut and brain signalling pathways.64 Recent studies in animal models suggest that GI difficulties may originate from the same genetic changes that lead to the behavioural characteristics of ASD.65

In addition, we found an association of ASD with the known region on 5p14.1. This is one of the few replicated GWA regions that implicates a long noncoding RNA gene, MSNP1AS (moesin pseudogene 1, antisense), in ASD risk.66 This region has also been associated with social communication spectrum phenotypes in the general population, supporting the role of 5p14.1 as a quantitative trait locus for ASD.49 MSNP1AS shows very high sequence homology to the chromosome X transcript of MSN, which is involved in brain development.66 MSNP1AS is highly overexpressed (12.7-fold) in the postmortem cerebral cortex of individuals with ASD.66 Interestingly, our top hit did not replicate in the PGC sample and vice versa. The multiple different variants discovered in the 5p14.1 region23, 24 may suggest that multiple alleles in the same region are implicated in ASD.

Although our study sample was small, a strength of our study is that we used a data set with high-quality phenotypes, in which participants were assessed for both the binary ASD phenotype and a quantitative autistic trait. The use of a quantitative endophenotype provides additional power to find genetic signals by focusing on less complex aspects of complex phenotypes such as ASD.67 The identification of new loci associated with quantitative autistic trait in our study validates this approach. Another strength is phenotypic homogeneity, in the sense that the majority of patients in the current study had normal intelligence, unlike most ASD cohorts, in which typical rates of intellectual disability range from 30 to 50%.68 Further, our study participants were genotyped on a very dense SNP array that contains not only common but also less-frequent SNPs. This gave us the opportunity to obtain a comprehensive overview of how common and less-frequent variants, both individually and taken together, affect ASD. Our study shows the advantage of a gene-based test as a more powerful approach than single-variant analysis and demonstrates the use of gene-based pathway and enrichment analysis in understanding the molecular mechanisms of the disorder. One possible limitation of our study is that the assessment of quantitative autistic trait in the discovery and replication cohorts used two different questionnaires, the SRS and the AQ. As both questionnaires were designed to measure the severity of social responsiveness problems across clinical cases and the general population, and as their ratings are significantly correlated,69 we were able to compare results from the two cohorts.

To conclude, our study has identified a novel gene, TTC25, associated with autistic trait in an ASD population in which the majority of patients have normal intelligence. The replication of the TTC25 association in a cohort-based study suggests that this gene may also be relevant for the broader ASD phenotype in the general population. However, whether these findings also hold true for ASD patients with intellectual disability remains to be evaluated. TTC25 is overexpressed in frontal cortex and testis and is known to be involved in cilium movement, and is thus an interesting candidate gene for autistic trait. Furthermore, we discovered significant enrichment of the SMAD protein signal transduction pathway and of a series of digestive system development categories in the pathway analysis of quantitative autistic trait. Our findings provide new insights into the genetic background of quantitative autistic trait.