Hundreds of genes are implicated in autism spectrum disorder (ASD), but the mechanisms through which they contribute to ASD pathophysiology remain elusive. Here we analyzed leukocyte transcriptomics from 1- to 4-year-old male toddlers with ASD or typical development from the general population. We discovered a perturbed gene network that includes highly expressed genes during fetal brain development. This network is dysregulated in human induced pluripotent stem cell-derived neuron models of ASD. High-confidence ASD risk genes emerge as upstream regulators of the network, and many risk genes may impact the network by modulating RAS–ERK, PI3K–AKT and WNT–β-catenin signaling pathways. We found that the degree of dysregulation in this network correlated with the severity of ASD symptoms in the toddlers. These results demonstrate how the heterogeneous genetics of ASD may dysregulate a core network to influence brain development at prenatal and very early postnatal ages and, thereby, the severity of later ASD symptoms.
Leukocyte transcriptome data can be accessed from the NCBI Gene Expression Omnibus (GEO) database under the accession codes GSE42133 and GSE111175. Microarray transcriptome data on the differentiation of primary human neural progenitor cells to neural cells were downloaded from the NCBI GEO accession GSE57595. Transcriptome data on hiPSC-derived neuron models of ASD and TD were downloaded from EMBL-EBI ArrayExpress with the accession code E-MTAB-6018. Human brain developmental transcriptome data were downloaded from BrainSpan.org.
The R code for reproducing the analyses reported in this article is available as a Supplementary Software file as well as at https://gitlab.com/LewisLabUCSD/ASD_Transcriptional_Organization.
Stoner, R. et al. Patches of disorganization in the neocortex of children with autism. N. Engl. J. Med. 370, 1209–1219 (2014).
Courchesne, E. et al. Neuron number and size in prefrontal cortex of children with autism. JAMA 306, 2001–2010 (2011).
Courchesne, E. et al. The ASD living biology: from cell proliferation to clinical phenotype. Mol. Psychiatry 24, 88–107 (2019).
Sandin, S. et al. The heritability of autism spectrum disorder. JAMA 318, 1182–1184 (2017).
Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454–1462 (2016).
Chang, J., Gilman, S. R., Chiang, A. H., Sanders, S. J. & Vitkup, D. Genotype to phenotype relationships in autism spectrum disorders. Nat. Neurosci. 18, 191–198 (2015).
de la Torre-Ubieta, L., Won, H., Stein, J. L. & Geschwind, D. H. Advancing the understanding of autism disease mechanisms through genetics. Nat. Med. 22, 345–361 (2016).
Willsey, A. J. et al. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell 155, 997–1007 (2013).
Sahin, M. & Sur, M. Genes, circuits, and precision therapies for autism and related neurodevelopmental disorders. Science 350, aab3897 (2015).
Parikshak, N. N. et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013).
Krumm, N. et al. Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588 (2015).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
Sanders, S. J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
Pierce, K. et al. Evaluation of the diagnostic stability of the early autism spectrum disorder phenotype in the general population starting at 12 months. JAMA Pediatr. 173, 578–587 (2019).
Wright, F. A. et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46, 430–437 (2014).
Pramparo, T. et al. Cell cycle networks link gene expression dysregulation, mutation, and brain maldevelopment in autistic toddlers. Mol. Syst. Biol. 11, 841 (2015).
Pramparo, T. et al. Prediction of autism by translation and immune/inflammation coexpressed genes in toddlers from pediatric community practices. JAMA Psychiatry 72, 386–394 (2015).
Lombardo, M. V. et al. Large-scale associations between the leukocyte transcriptome and BOLD responses to speech differ in autism early language outcome subtypes. Nat. Neurosci. 21, 1680–1688 (2018).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Nishimura, Y. et al. Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum. Mol. Genet. 16, 1682–1698 (2007).
Achuta, V. S. et al. Functional changes of AMPA responses in human induced pluripotent stem cell-derived neural progenitors in fragile X syndrome. Sci. Signal. 11, eaan8784 (2018).
Hu, V. W. et al. Gene expression profiling of lymphoblasts from autistic and nonaffected sib pairs: altered pathways in neuronal development and steroid biosynthesis. PLoS ONE 4, e5775 (2009).
Hu, V. W., Frank, B. C., Heine, S., Lee, N. H. & Quackenbush, J. Gene expression profiling of lymphoblastoid cell lines from monozygotic twins discordant in severity of autism reveals differential regulation of neurologically relevant genes. BMC Genom. 7, 118 (2006).
Kong, S. W. et al. Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders. PLoS ONE 7, e49475 (2012).
Diaz-Beltran, L. et al. Cross-disorder comparative analysis of comorbid conditions reveals novel autism candidate genes. BMC Genom. 18, 315 (2017).
Marchetto, M. C. et al. Altered proliferation and networks in neural cells derived from idiopathic autistic individuals. Mol. Psychiatry 22, 820–835 (2016).
Mariani, J. et al. FOXG1-dependent dysregulation of GABA/glutamate neuron differentiation in autism spectrum disorders. Cell 162, 375–390 (2015).
Califano, A. & Alvarez, M. J. The recurrent architecture of tumour initiation, progression and drug sensitivity. Nat. Rev. Cancer 17, 116–130 (2017).
Ideker, T. & Krogan, N. J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).
Yang, B. et al. Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma. Nat. Commun. 9, 678 (2018).
Chen, L., Liu, R., Liu, Z. P., Li, M. & Aihara, K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci. Rep. 2, 342 (2012).
BrainSpan. BrainSpan: Atlas of the Developing Human Brain http://www.brainspan.org (2016).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Sugathan, A. et al. CHD8 regulates neurodevelopmental pathways associated with autism spectrum disorder in neural progenitors. Proc. Natl Acad. Sci. USA 111, E4468–E4477 (2014).
Cotney, J. et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nat. Commun. 6, 6404 (2015).
Gompers, A. L. et al. Germline Chd8 haploinsufficiency alters brain development in mouse. Nat. Neurosci. 20, 1062–1073 (2017).
Darnell, J. C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Lachmann, A. et al. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444 (2010).
Abrahams, B. S. et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. Autism 4, 36 (2013).
Stein, J. L. et al. A quantitative framework to evaluate modeling of cortical development by neural stem cells. Neuron 83, 69–86 (2014).
Clipperton-Allen, A. E. & Page, D. T. Pten haploinsufficient mice show broad brain overgrowth but selective impairments in autism-relevant behavioral tests. Hum. Mol. Genet. 23, 3490–3505 (2014).
Cupolillo, D. et al. Autistic-like traits and cerebellar dysfunction in purkinje cell PTEN knock-out mice. Neuropsychopharmacology 41, 1457–1466 (2016).
Mellios, N. et al. MeCP2-regulated miRNAs control early human neurogenesis through differential effects on ERK and AKT signaling. Mol. Psychiatry 23, 1051–1065 (2017).
Brockmann, M. et al. Genetic wiring maps of single-cell protein states reveal an off-switch for GPCR signalling. Nature 546, 307–311 (2017).
Schafer, S. T. et al. Pathological priming causes developmental gene network heterochronicity in autistic subject-derived neurons. Nat. Neurosci. 22, 243–255 (2019).
Robinson, E. B. et al. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat. Genet. 48, 552–555 (2016).
Wang, Y. et al. Heritable aspects of biological motion perception and its covariation with autistic traits. Proc. Natl Acad. Sci. USA 115, 1937–1942 (2018).
Pierce, K. et al. Detecting, studying, and treating autism early: the one-year well-baby check-up approach. J. Pediatr. 159, 458–465.e6 (2011).
Wetherby, A. M., Allen, L., Cleary, J., Kublin, K. & Goldstein, H. Validity and reliability of the communication and symbolic behavior scales developmental profile with very young children. J. Speech Lang. Hear. Res. 45, 1202–1218 (2002).
Du, P., Kibbe, W. A. & Lin, S. M. lumi: a pipeline for processing Illumina microarray. Bioinformatics 24, 1547–1548 (2008).
GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S. & Smyth, G. K. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann. Appl. Stat. 10, 946–963 (2016).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Faith, J. J. et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).
Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2011).
Fabregat, A. et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 44, D481–D487 (2016).
Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 45, D369–D379 (2017).
Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010).
He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Liberzon, A. et al. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Schroder, M. S., Gusenleitner, D., Quackenbush, J., Culhane, A. C. & Haibe-Kains, B. RamiGO: an R/Bioconductor package providing an AmiGO visualize interface. Bioinformatics 29, 666–668 (2013).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
We thank L. Iakoucheva for her comments on this manuscript. This work was supported by NIMH grant no. R01-MH110558 (E.C., N.E.L.), NIMH grant no. R01-MH080134 (K.P.), NIMH grant no. R01-MH104446 (K.P.), an NFAR grant (K.P.), NIMH grant no. P50-MH081755 (E.C.), the Brain & Behavior Research Foundation NARSAD (T.P.) and generous funding from the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (grant no. NNF10CC1016517 to N.E.L.).
V.H.G., T.P., E.C. and N.E.L. report serving as investigators on a pending patent, assigned to the University of California, San Diego, on using the framework developed in this work to discover biomarkers for the diagnosis and prognosis of complex diseases and disorders.
Peer review information Nature Neuroscience thanks Dan Arking and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
a) MA-plot of the primary dataset (n=119 ASD and 107 TD subjects). The purple line indicates the regression line between mean expression and fold change of all genes expressed in the dataset. As shown, mean expression and fold change are not correlated overall. However, compared with all expressed genes, differentially expressed genes exhibit an up-regulation pattern (p-value <2.6 × 10−63; two-sided Wilcoxon-Mann-Whitney test). b) To assess the effect of the data processing approach on fold change patterns, we changed the covariates of the linear regression model in the limma package. Similar fold change patterns were observed with or without age as a covariate (Pearson’s correlation coefficient of 0.94; n=119 ASD and 107 TD subjects). c) Our primary transcriptome dataset comprised 226 subjects analyzed in two batches: one batch of 128 samples (n=84 ASD and 44 TD subjects) previously reported by Pramparo et al. and a second batch of 98 samples (n=35 ASD and 63 TD subjects) that included new samples (no samples overlapped between the batches; technical replicates were randomly removed). To assess whether the batch effect was effectively handled, we compared the fold change patterns of DE genes between the two batches. For a more conservative analysis, we took the fold changes of DE genes from the previously published batch of the Pramparo et al.1 study, which were calculated using a different analysis pipeline from that of the current study. We then compared those fold changes with our limma-based analysis of the 98 new non-overlapping samples presented in this study. Similar fold change patterns were observed (Pearson’s correlation coefficient of 0.74). Further analysis confirmed that our pipeline effectively removed batch effects (Supplementary Fig. 14).
d) To ensure that the observed fold changes are not driven by outliers, we performed jack-knife resampling. Repeating the procedure 100 times, we sampled between 20% and 90% of the quantile-normalized transcriptome data from the n=226 samples while preserving the proportion of ASD to TD samples. Each sampled dataset was then processed independently and its fold change pattern examined. The jack-knife fold change patterns correlated well with those of all 226 samples, as measured by Pearson’s correlation coefficient, demonstrating that the fold change patterns are shared across a large fraction of samples. We reached similar conclusions from resampling of the network activity, as presented in Fig. 6 and Supplementary Fig. 11. e) To estimate the signal-to-noise ratio of the fold changes observed in the n=226 samples, we repeated the jack-knife resampling procedure but this time counted the number of DE genes that were also identified as differentially expressed in the sampled datasets (limma analysis; FDR <0.05). We found that a large sample size is required to identify our DE list as significant. f-i) To assess the reproducibility of the fold change patterns of differentially expressed genes, we performed another transcriptome experiment on a different microarray platform. The primary dataset was analyzed on the Illumina BeadChip HumanHT-12, whereas this dataset was analyzed on the Illumina BeadChip HumanWG-6. This second dataset comprised 56 male toddlers (n=35 ASD and 21 TD) who were also included in the primary dataset (technical replicate dataset). A third dataset included an additional n=48 samples: 24 independent ASD male toddlers and 24 TD male toddlers, 21 of whom overlapped with the primary dataset. These two groups of samples were processed separately to assess the reproducibility of the results at both the technical and biological levels.
The latter may hint at the penetrance of the dysregulation signal in the ASD population. We observed good reproducibility at the technical (panels f and g) and biological (panels h and i) levels, with Pearson’s correlation coefficients of 0.83 and 0.73, respectively. To assess whether the overlapping TD samples in the partially overlapping dataset drive the signal, we excluded the 21 overlapping TD subjects from the primary dataset (panels h and i). We further performed transcriptomics analysis of an entirely independent cohort of ASD and TD male toddlers on an RNA-Seq platform and reached a similar conclusion (Supplementary Fig. 4). Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers) and outliers (dots).
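The concordance checks in this legend (panels b, c and f-i) and the jack-knife resampling in panel d rest on two simple operations: correlating the fold changes of genes shared by two analyses, and drawing subsamples that keep the ASD-to-TD proportion fixed. The following is a minimal Python sketch of both under stated assumptions, not the authors' R implementation. Fold changes are passed as hypothetical gene-to-log2-fold-change dictionaries, the per-group rounding rule is assumed, and re-running the differential expression model on each subsample is left out.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fold_change_concordance(fc_a, fc_b):
    """Correlate log2 fold changes of the genes present in both analyses.

    fc_a, fc_b: hypothetical dicts mapping gene symbol -> log2 fold change.
    """
    shared = sorted(set(fc_a) & set(fc_b))
    return pearson([fc_a[g] for g in shared], [fc_b[g] for g in shared])


def stratified_subsample(labels, fraction, rng):
    """Draw sample indices without replacement so that each diagnosis group
    contributes round(fraction * group size) samples (assumed rounding rule),
    keeping the ASD-to-TD proportion approximately fixed."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    chosen = []
    for members in groups.values():
        chosen.extend(rng.sample(members, round(fraction * len(members))))
    return sorted(chosen)


def jackknife_subsamples(labels, fractions, n_iter=100, seed=0):
    """Yield (fraction, indices) for n_iter subsamples at each fraction.
    Each subsample would then be re-analyzed independently and its fold
    changes correlated with those from the full dataset."""
    rng = random.Random(seed)
    for fraction in fractions:
        for _ in range(n_iter):
            yield fraction, stratified_subsample(labels, fraction, rng)
```

With labels = ["ASD"] * 119 + ["TD"] * 107, a subsample at fraction 0.2 contains 24 ASD and 21 TD indices. Sampling within each group, rather than from the pooled samples, is what keeps the diagnosis ratio stable across draws.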
a) Cell type composition did not differ significantly between n=119 ASD and 107 TD toddlers. Cell type composition was estimated in each sample using the CIBERSORT algorithm, and the relative frequency of each cell type was compared between ASD and TD samples using a t-test. P-values were adjusted using the Benjamini-Hochberg procedure. b) To assess the potential confounding effect of cell type composition on gene expression patterns, we included the cell types with nominal p-value <0.1 (four cell types) in the regression model. The fold change patterns remained robust to changes in cell type composition. c) To assess the potential effect of cell type composition on network activity, we regressed out from the gene expression data the effect of the cell types that changed nominally between n=119 ASD and 107 TD (p-value <0.1). The DE network remained transcriptionally over-active. For an unbiased analysis, as in all comparisons of network activity between leukocyte ASD and TD samples, we considered a merged network composed of the union of interactions in the DE-ASD and DE-TD networks. The signal becomes stronger if the analysis is restricted to the DE-ASD network. A paired one-sided Wilcoxon-Mann-Whitney test was used to compare Pearson’s correlation coefficients. d) To examine the potential effect of hidden confounders on gene expression patterns, we performed SVA. SVA predicted only one significant surrogate variable (SV) in the normalized but unadjusted microarray dataset (num.sv() function), and this SV was highly correlated with the experimental batch. Because batch was already included as a covariate in our analyses and regressed out before calculating correlation and mutual information scores, SVA found no significant SV in the batch-adjusted expression data.
To further explore the potential effect of other hidden confounding covariates on our results, we performed SVA on the batch-adjusted dataset and included the first SV in the regression models for the DE analysis (two covariates in total: batch and one SV). As illustrated in Supplementary Fig. 2c, the fold change patterns remained consistent. e) To assess the effect of hidden surrogate variables on network activity, we treated the first SV from above as an additional covariate and regressed out its effect from the gene expression data of n=119 ASD and 107 TD samples. The DE network remained transcriptionally over-active (paired one-sided Wilcoxon-Mann-Whitney test).
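The Benjamini-Hochberg adjustment applied to the cell type comparisons in panel a can be written in a few lines. The sketch below is a minimal Python version, assumed to behave like R's p.adjust(method = "BH") for inputs without missing values; it is illustrative and not the authors' code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    For the p-value of ascending rank r among m tests, the adjusted value is
    the minimum over ranks k >= r of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.2]) returns approximately [0.04, 0.0533, 0.0533, 0.2]: the raw value 0.03 would scale to 0.06, but monotonicity pulls it down to the adjusted value of the next-larger p-value.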
Supplementary Figure 3 Robustness analysis related to co-expression activity of DE network in ASD samples.
a) The over-activity of the DE-ASD network is independent of the backbone static network. The co-expression strength of the context-specific DE-ASD and DE-TD networks was compared using three backbone networks (high confidence (HC), functional and full co-expression) across n=119 ASD and 107 TD samples (see Methods for details). The three networks differed in their numbers of genes and interactions. For each backbone, only interactions that were significantly co-expressed (FDR <0.05) in at least one diagnosis group were included in the analysis. Red and blue represent regions with high and low density of interactions, respectively. Interaction strengths were compared by a paired two-sided Wilcoxon-Mann-Whitney test. b) The over-activity of the DE-ASD network was uncovered using a mutual information-based method. To assess whether the elevated co-expression strength of the networks is supported by other metrics, we calculated Pearson’s correlation coefficient for each interaction in the static networks, or for all possible pairs of DE genes in the case of the full co-expression network (n=119 ASD and 107 TD subjects). We then retained only interactions with an absolute correlation above 0.5 in either the ASD or TD group. Interactions tend to have higher correlations in ASD than in TD, indicating that the observed over-activity is robust across similarity metrics. c) The over-activity of the DE network was examined in another dataset that included n=24 independent male ASD toddlers. This dataset also contained n=24 TD male toddlers, 21 of whom overlapped with the primary dataset. We observed replicable over-activity of the DE networks in the ASD group, as measured by co-expression strength. The correlation strengths of interactions in ASD and TD samples were compared using a paired two-sided Wilcoxon-Mann-Whitney test.
We further performed transcriptomics analysis of an entirely independent cohort of ASD and TD male toddlers on an RNA-Seq platform and reached a similar conclusion (Supplementary Fig. 4). d) To assess whether network over-activity in ASD samples is a general characteristic of our dataset or specific to the DE networks, we constructed a global ASD network using the same approach as for the DE networks but without restricting it to DE genes (n=119 ASD and 107 TD subjects); the backbone static network therefore included all functional interactions in the GeneMANIA database. In contrast to the DE networks, the global network showed slightly but significantly higher co-expression levels in TD samples (p-value 1.0; paired one-sided Wilcoxon-Mann-Whitney test). The table indicates the number of interactions in the global network deemed significant in either the ASD or TD group (FDR <0.05).
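The filtering step in panel b (keeping interactions with an absolute correlation above 0.5 in either group) and the paired comparison of interaction strengths between ASD and TD can be sketched as follows. This is a minimal Python illustration under assumptions: the interaction dictionaries are hypothetical inputs, interaction strength is taken as the absolute Pearson correlation, and the paired test is implemented as a one-sided Wilcoxon signed-rank test (the usual paired form of the Wilcoxon test) with a normal approximation and no tie correction, which may differ from the authors' implementation.

```python
import math
from statistics import NormalDist


def filter_interactions(r_asd, r_td, threshold=0.5):
    """Interactions present in both groups whose |r| exceeds the threshold in
    either group. r_asd, r_td: hypothetical dicts (gene_a, gene_b) -> r."""
    return [k for k in r_asd
            if k in r_td and (abs(r_asd[k]) > threshold or abs(r_td[k]) > threshold)]


def signed_rank_greater(x, y):
    """One-sided Wilcoxon signed-rank test of x > y for paired values.

    Normal approximation; zero differences are dropped and tied |differences|
    receive average ranks, with no tie correction to the variance.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 1 - NormalDist().cdf((w_plus - mean) / sd)
```

In use, one would first call kept = filter_interactions(r_asd, r_td) and then pass [abs(r_asd[k]) for k in kept] and [abs(r_td[k]) for k in kept] to signed_rank_greater. Pairing by interaction compares each gene pair with itself across diagnosis groups, which is what makes a paired test appropriate here.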
Supplementary Figure 4 Reproducibility of the signature in an independent cohort as measured by RNA-Seq.
To assess the generalizability and reproducibility of the observed signature, we performed transcriptomics analysis of an independent cohort of male toddlers with n=12 TD and 23 ASD (56 samples including technical replicates; Supplementary Table 1). Because our primary dataset was analyzed on a microarray platform, we analyzed this dataset on an RNA-Seq platform to ensure that the results do not depend on the transcriptomics platform. The dataset also included 7 and 14 technical replicates of TD and ASD samples, respectively, which were modeled as random effects in the differential expression analysis. None of the subjects overlapped with those in the primary dataset. a) Comparison of the fold changes of the 1236 DE genes between the microarray and RNA-Seq datasets showed moderate conservation of the fold change patterns (Pearson’s correlation coefficient: 0.46), and 73% of DE genes preserved their direction of change in both datasets (that is, up or down in both). The moderate conservation could be due to inherent heterogeneity among ASD samples (the two datasets included non-overlapping subjects), the relatively small size of the RNA-Seq dataset, false positives among the 1236 DE genes, or technical differences between the two transcriptomics platforms. b) Genes involved in the DE-ASD network exhibit highly preserved fold change patterns between the two datasets. The panel compares the fold change patterns of genes in the HC DE-ASD network. The Pearson’s correlation coefficient of the fold change patterns between the RNA-Seq (n=56 samples) and microarray (n=226 samples) datasets increased, suggesting that the network construction procedure removed some of the false positives among the 1236 DE genes. In addition, 82% of genes in the HC DE-ASD network preserved their direction of change between the two datasets.
c) DE networks are transcriptionally over-active in the independent RNA-Seq dataset. To ensure robustness, we repeated the following 100 times: we randomly selected n=12 samples (unique subjects) from each diagnosis group and compared the activity of the DE network between ASD and TD samples using a two-sided Wilcoxon-Mann-Whitney test. The y-axis shows the z-transformed p-values. When a z-score could not be estimated (for example, p-value =0), we used the z-score of the lowest non-zero p-value. Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers) and outliers (dots).
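Two summaries in this legend are easy to make concrete: the fraction of genes whose fold change keeps its direction across datasets (panels a and b) and the conversion of p-values to z-scores with a fallback for p-values of 0 (panel c). The Python sketch below assumes the one-tailed convention z = Φ⁻¹(1 − p), computed as −Φ⁻¹(p) for accuracy at very small p; the legend does not state which convention was used, the gene dictionaries are hypothetical, and p-values of exactly 1 are not handled.

```python
from statistics import NormalDist


def direction_concordance(fc_a, fc_b):
    """Fraction of shared genes whose log2 fold change has the same sign in
    both datasets (a zero fold change counts as discordant).
    fc_a, fc_b: hypothetical dicts gene -> log2 fold change."""
    shared = set(fc_a) & set(fc_b)
    if not shared:
        raise ValueError("no shared genes")
    return sum(fc_a[g] * fc_b[g] > 0 for g in shared) / len(shared)


def p_to_z(pvalues):
    """Convert p-values in [0, 1) to z-scores, z = Phi^-1(1 - p), computed
    as -Phi^-1(p) (assumed convention). A p-value of 0 has no finite
    z-score, so it is replaced by the lowest non-zero p-value in the batch
    before conversion."""
    nonzero = [p for p in pvalues if p > 0]
    if not nonzero:
        raise ValueError("no non-zero p-values")
    floor = min(nonzero)
    std_normal = NormalDist()
    return [-std_normal.inv_cdf(p if p > 0 else floor) for p in pvalues]
```

For instance, p_to_z([0.05, 0.0, 0.001]) gives roughly [1.645, 3.090, 3.090]: the zero is mapped to the same z-score as the smallest non-zero p-value in the batch.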
Supplementary Figure 5 DE genes are involved in networks that are preserved between blood and brain tissues.
a) Genes involved in the DE-ASD networks are highly expressed during normal brain development at prenatal (≥8 post-conception weeks) and early postnatal (<1 year old) ages (p-value <5.2 × 10−29). BrainSpan RNA-Seq transcriptome data were used for this analysis (n=187 samples). Genes with RPKM >5 were considered expressed in each sample. Groups were compared using a two-sided Wilcoxon-Mann-Whitney test. b) Network activity patterns of the functional and full DE-ASD networks based on BrainSpan transcriptome data at prenatal and early postnatal ages. In each time window, activity was measured from the distribution of co-expression strength of interacting gene pairs in the DE-ASD network, using Pearson’s correlation coefficient (n=121 frontal, 73 temporal, 42 parietal and 27 occipital cortex samples, and 72 striatum, hippocampus and amygdala samples across time points). The y-axis indicates the z-transformed p-value of co-expression strength as measured by a two-sided Wilcoxon-Mann-Whitney test. c) Conservation of interactions between blood and brain for a previously reported brain co-expression network around high-confidence rASD genes at 10–19 post-conception weeks from Willsey et al. The interactions were partitioned according to their correlation value in the n=119 blood samples from subjects with ASD (window size of 0.1). The bars in each bin represent significant enrichment for positive (blue, positive enrichment values) or negative (red, negative enrichment values) interactions based on the corresponding brain transcriptome data (log10-transformed p-values; hypergeometric test). Only statistically significant (p <0.05) comparisons are shown. d) As in panel c, but for a co-expression network of high-confidence ASD risk genes at 13–24 post-conception weeks extracted from Willsey et al.
e) The brain-derived co-expression network of rASD genes (the same network as in panel d) was compared with the co-expression pattern of the same interactions in the leukocyte transcriptome of ASD and TD toddlers. Boxplots represent the observed similarity over 100 random subsamples of n=75 ASD and 75 TD samples (~70% of samples in each diagnosis group). The x-axis represents the top percentile of positive and negative interactions based on the brain transcriptome interaction weights. For example, the 20% value shows the results when only the top 20% of positively and the top 20% of negatively interacting gene pairs in the brain co-expression network (ranked by interaction weight) were considered. ASD samples have significantly higher similarity in co-expression patterns with the developing brain than TD samples. f) Positive and negative interactions in DE-ASD networks were preserved between blood (n=119) and brain at prenatal and early postnatal ages (n=187). Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers) and outliers (dots).
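The hypergeometric enrichment test behind panels c and d and the top-percentile selection of brain interactions in panel e can be sketched as below. This is a minimal Python illustration: the function names and dictionary inputs are hypothetical, the upper-tail test assumes that enrichment means more brain-positive (or brain-negative) interactions in a correlation bin than expected by chance, and rounding the percentile cut-off up with ceil is an assumed convention.

```python
import math


def hypergeom_upper_p(k, n, K, N):
    """P(X >= k) for a hypergeometric X: drawing n items (interactions in one
    correlation bin) from N items of which K are 'successes' (for example,
    positively co-expressed in brain)."""
    upper = sum(math.comb(K, i) * math.comb(N - K, n - i)
                for i in range(k, min(n, K) + 1))
    return upper / math.comb(N, n)


def top_percentile_interactions(weights, percentile):
    """Top `percentile` % of positive and, separately, of negative
    interactions ranked by brain interaction weight (ceil rounding assumed).
    weights: hypothetical dict (gene_a, gene_b) -> signed weight."""
    pos = sorted((k for k, w in weights.items() if w > 0),
                 key=weights.get, reverse=True)
    neg = sorted((k for k, w in weights.items() if w < 0), key=weights.get)
    n_pos = math.ceil(len(pos) * percentile / 100)
    n_neg = math.ceil(len(neg) * percentile / 100)
    return pos[:n_pos] + neg[:n_neg]
```

The selected interactions would then have their blood correlations (from each ASD or TD subsample) compared with their brain weights. Selecting positive and negative interactions separately keeps both signs represented at every percentile.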
a) The DE-ASD network is significantly better preserved than the DE-TD network in prenatal and early postnatal brain. Correlations of the DE-ASD and DE-TD interactions were also estimated from brain gene expression using BrainSpan RNA-Seq data. Briefly, to examine the preservation of interactions in each network, we repeated the following 100 times: we calculated the correlations of the interactions in the DE-ASD and DE-TD networks from randomly selected subsets of n=70 ASD and 70 TD samples, respectively. The similarity of the estimated interaction correlations between brain and blood samples was then calculated using Pearson’s correlation coefficient. The DE-ASD network is significantly better preserved in prenatal and early postnatal brain transcriptome data. b) Transcriptional over-activity of DE-ASD networks during prenatal brain development. The transcriptional activity of genes in the DE-ASD network was estimated in each sample using GSVA. Unlike our network transcriptional activity measure, which is based on the co-expression magnitude of interactions, GSVA uses a sample-based metric, built on the concept of GSEA, that examines the overall expression pattern of the genes in each sample while disregarding the network structure. Consistent with the co-expression-based analysis of network activity, GSVA supports up-regulation of DE-ASD networks in the prenatal brain transcriptome, suggesting that the results are robust to methodological variation. Reported p-values are based on comparing the DE-ASD network expression pattern between prenatal (n=157 samples) and early postnatal (n=90 samples; 4 months to 8 years old) periods using a two-sided Wilcoxon-Mann-Whitney test. Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers) and outliers (dots).
Supplementary Figure 7 Robustness analysis of observed association of rASD genes with DE-ASD networks.
a) High-confidence rASD genes are enriched in the XP-ASD networks. Genes with likely gene-damaging (LGD) and synonymous (Syn) mutations in siblings of ASD subjects were extracted from Iossifov et al. b-c) The regulatory targets of well-known rASD genes (CHD8 and FMR1) are enriched in the HC DE-ASD network (panel b) as well as in the Functional and Full co-expression DE-ASD networks (panel c). The regulatory targets of CHD8 were extracted from Sugathan et al. (CHD8-1), Gompers et al. (CHD8-2), and Cotney et al. (CHD8-3). The regulatory targets of FMR1 were retrieved from Darnell et al. P-values were calculated empirically by permutation tests. d) High-confidence rASD genes are more strongly associated with the XP-ASD network than lower-confidence ones, as judged by the number of interactions. The node degree distributions of high- and lower-confidence rASD genes were compared using a two-sided Wilcoxon-Mann-Whitney test. e) rASD genes with potential gene-regulatory roles are enriched in the XP-ASD networks. The node degrees of DNA-binding rASD genes were compared with those of the remaining rASD genes using a two-sided Wilcoxon-Mann-Whitney test. f) Cross interactions between DE and rASD genes are significantly enriched for interactions with a negative Pearson's correlation coefficient (related to Fig. 4). We compared the ratio of positive to negative interactions between DE and rASD genes with the same ratio within DE genes. The x-axis shows the estimated odds ratio. All p-values < 3.1 × 10⁻⁴, two-sided Fisher's exact test. g) Each plot shows the distribution of Pearson's correlation coefficients between gene expression and time point during the in vitro differentiation of primary human neural progenitor cells (related to Fig. 4; n=77 samples from 3 fetal brain donors). DE genes are down-regulated during differentiation (negative correlations), whereas rASD genes show an up-regulation pattern (positive correlations).
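The empirical p-values in panels b-c and the odds ratios in panel f can be sketched in Python as follows. The sketch assumes a gene-label permutation null: random gene sets of the network's size are drawn from an expressed-gene background, with an add-one correction. The paper's exact permutation scheme may differ. Note that `scipy.stats.fisher_exact` reports the sample odds ratio (ad/bc), not the conditional MLE that R's `fisher.test` reports. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def permutation_enrichment(network_genes, target_genes, background, n_perm=10000, seed=0):
    """Empirical one-sided p-value that `network_genes` (assumed to be a subset
    of `background`) contains more `target_genes` than random gene sets of
    the same size drawn from `background`."""
    rng = np.random.default_rng(seed)
    background = np.asarray(sorted(background))
    targets = set(target_genes)
    observed = len(set(network_genes) & targets)
    is_target = np.isin(background, list(targets))
    k = len(set(network_genes))
    null = np.array([is_target[rng.choice(background.size, size=k, replace=False)].sum()
                     for _ in range(n_perm)])
    # add-one correction keeps the empirical p-value strictly positive
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p


def sign_fisher(cross_pos, cross_neg, within_pos, within_neg):
    """Two-sided Fisher's exact test of positive vs negative edge counts for
    DE-rASD cross interactions versus DE-DE interactions."""
    odds, p = fisher_exact([[cross_pos, cross_neg], [within_pos, within_neg]])
    return odds, p
```

An odds ratio below 1 from `sign_fisher` corresponds to the enrichment of negative cross interactions described in panel f.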
a) Genes up-regulated following CHD8 knock-down are significantly enriched in the DE-ASD networks (permutation tests). Up- and down-regulated genes were extracted from Sugathan et al. (CHD8_1), Gompers et al. (CHD8_2), and Cotney et al. (CHD8_3). b) Biological processes enriched in the DE-ASD networks (Benjamini-Hochberg-corrected FDR < 0.1; hypergeometric test). The terms shown are also significantly changed between n=119 ASD and n=107 TD samples, as judged by GSEA. See Methods for details. c) Integrated hub analysis of the DE-ASD and XP-ASD networks. For each network, the hub analysis integrated the context-specific high-confidence (HC) and functional ASD networks. P-values were calculated empirically from the degree distribution of genes in the DE-ASD and XP-ASD networks (see Methods).
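The term-enrichment statistics named in panel b can be sketched in Python as a one-sided hypergeometric p-value per term followed by Benjamini-Hochberg adjustment. The caller must supply the background size and the gene-set counts. The functions are illustrative and are not the authors' code. SciPy parameterizes `hypergeom` as (M = population size, n = successes in population, N = draws).

```python
import numpy as np
from scipy.stats import hypergeom


def hypergeom_pvalue(n_background, n_term, n_network, n_overlap):
    """P(X >= n_overlap) when n_network genes are drawn from n_background
    genes, of which n_term belong to the term."""
    return hypergeom.sf(n_overlap - 1, n_background, n_term, n_network)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity by taking the running minimum from the largest p
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q
```

Terms with `benjamini_hochberg(...) < 0.1` would then pass the FDR threshold stated in the legend.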
Network of hub genes in the HC XP-ASD network. Green indicates genes that are hubs in both the DE-ASD and XP-ASD networks; purple indicates genes that are hubs only in the HC XP-ASD network.
Supplementary Figure 10 Elevated co-expression of the DE-ASD networks in hiPSC-derived neuron models of subjects with ASD.
a) The DE-ASD networks are highly over-active in neural progenitors and neurons derived from individuals with ASD compared with TD individuals. To ensure the robustness of the estimated network co-expression activity levels, we iterated 100 times. In each iteration, we randomly selected 4 individuals per diagnosis group and measured the co-expression strength across the neural progenitor-to-neuron stages at days 0, 2, 4, 7, and 14 (n=20 samples). The y-axis indicates the z-transformed p-value of co-expression strength, as measured by a two-sided Wilcoxon-Mann-Whitney test. b) The DE-ASD networks are highly expressed in RNA-Seq data from hiPSC-to-neuron differentiation of individuals with ASD and TD in an additional dataset (GSE67528). In this dataset, an independent group differentiated the same hiPSC lines as in Fig. 6 into neurons, covering three time points in total: hiPSC, NPC, and neurons (n=83 samples from 8 individuals with ASD and 6 individuals with TD). RNA-Seq data were TMM-normalized and log2(x+1)-transformed. Genes in the DE-ASD networks are highly expressed at the NPC and neuron stages. c) Replication of the DE-ASD network dysregulation in this orthogonal dataset (GSE67528). To measure network activity, we iterated 100 times, each time randomly selecting n=5 NPC and 5 neuron samples (10 samples in total) from each diagnosis group. Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers), and outliers (dots).
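A minimal Python sketch of a z-transformed co-expression strength score follows. It compares the unsigned Pearson correlations of network edges with those of random gene pairs in the same samples, which mirrors the description in Supplementary Figure 11. Two choices here are assumptions rather than the authors' documented procedure. First, the null uses 1,000 random gene pairs. Second, the two-sided p-value is converted to z as Φ⁻¹(1 − p/2), signed by the direction of the median difference. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm


def coexpression_activity_z(expr, net_edges, n_random=1000, seed=0):
    """expr: genes x samples. Compare |r| of network edges with |r| of random
    gene pairs from the same samples and return a signed z-score."""
    rng = np.random.default_rng(seed)
    corr = np.abs(np.corrcoef(expr))  # unsigned co-expression strength
    observed = np.array([corr[i, j] for i, j in net_edges])
    n_genes = expr.shape[0]
    random_r = []
    while len(random_r) < n_random:
        i, j = rng.integers(0, n_genes, size=2)
        if i != j:  # skip self-pairs
            random_r.append(corr[i, j])
    random_r = np.array(random_r)
    p = mannwhitneyu(observed, random_r, alternative="two-sided").pvalue
    # assumed signing convention: positive when network edges are stronger
    sign = 1.0 if np.median(observed) >= np.median(random_r) else -1.0
    return sign * norm.isf(p / 2)
```

In the panel-a design, this score would be computed on the 20 samples (4 individuals × 5 days) drawn in each of the 100 iterations for each diagnosis group.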
Supplementary Figure 11 The DE-ASD network transcriptional activity is correlated with ADOS-SA deficit scores.
a) Male toddlers with ASD were categorized by their ADOS social affect (ADOS-SA) deficit scores into three groups: mild (ADOS-SA 5-11; n=29), medium severity (ADOS-SA 12-15; n=51), and high severity (ADOS-SA 16-21; n=39). Individuals at different severity levels show similar dysregulation patterns of DE genes. b) Male toddlers with ASD were sorted and grouped by ADOS-SA severity score. The activity level of the DE-ASD networks in each group was measured from the observed co-expression strength of interactions in the DE-ASD networks in n=20 randomly selected samples from each diagnosis group. The distribution of co-expression strengths (i.e., unsigned Pearson's correlation coefficients) was then compared with that expected from a randomly selected set of genes in the same samples. The inset boxplots (top left) show the distributions of observed and chance-expected Pearson's correlation coefficients between ADOS-SA scores and network activity levels. The expected random distribution was generated by 10,000 random permutations of the ADOS-SA scores of the individuals with ASD in the dataset. Note that the defined ASD severity levels are not independent and overlap with each other. c) The DE-ASD co-expression activity was measured by comparing the activity of the DE-ASD networks in ASD versus randomly sampled TD cases (activity measured from n=20 selected samples from each diagnosis group). The inset boxplots show the distributions of observed and random Pearson's correlations between ADOS-SA severity levels and DE-ASD network activity levels. In panels b and c, we estimated the p-value of the observed correlations relative to the random ones empirically.
Iterating 10⁶ times, we sampled 100 data points from each of the observed and random groups. In each iteration, we assessed whether the mean correlation in the random sample was equal to or higher than that of the observed sample in absolute value. This analysis yielded a two-sided empirical p-value < 10⁻⁶ for the observed correlations. Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers), and outliers (dots).
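The permutation null for the correlation between ADOS-SA scores and network activity can be sketched in Python as a standard permutation test of Pearson's r. This sketch swaps in a simpler, conventional empirical p-value: the fraction of permuted |r| values at least as large as the observed |r|, with an add-one correction. It does not reproduce the 10⁶-round resampling of 100 data points described in the legend. The function name is hypothetical.

```python
import numpy as np


def permutation_corr_pvalue(scores, activity, n_perm=10000, seed=0):
    """Two-sided empirical p-value for Pearson's r between ADOS-SA scores and
    network activity, with a null distribution built by permuting the scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    activity = np.asarray(activity, dtype=float)
    observed = np.corrcoef(scores, activity)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(scores), activity)[0, 1]
                     for _ in range(n_perm)])
    # two-sided: compare absolute correlations; add-one keeps p > 0
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p
```

With n_perm permutations, the smallest reportable p-value is 1/(n_perm + 1). This is why very small p-values require very many iterations.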
Supplementary Figure 12 Isolating the effect of ADOS-SA scores on the co-transcriptional activity of DE-ASD networks.
(a) In our ASD cohort, ADOS social affect (ADOS-SA) scores are correlated with Mullen ELC scores (n=119 subjects with ASD; Pearson's correlation coefficient: -0.41). (b) To isolate the effect of ADOS-SA scores, we selected subjects with ASD whose Mullen ELC scores were above 55 and below 80 (n=47 subjects). In this subset, ADOS-SA scores were no longer correlated with Mullen ELC scores (Pearson's correlation coefficient: -0.0029). We then split the 47 ASD samples into two groups at the median ADOS-SA score. Iterating 100 times, we sampled n=15 subjects from each ASD subgroup and compared the activity of the network. (c) The z-transformed p-values of the comparisons of DE-ASD network over-activity between the high and low ADOS-SA groups, as measured by a one-sided Wilcoxon-Mann-Whitney test. Where a z-score could not be estimated (e.g., p-value = 0), we used the z-score of the lowest non-zero p-value. The DE-ASD network exhibits over-activity in subjects with high ADOS-SA scores in this selected subset. Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers), and outliers (dots).
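The subset selection, median split, and p-to-z conversion can be sketched in Python as follows. Two choices are assumptions. Subjects exactly at the median are assigned to the low group, since the legend does not say how ties are handled. The one-sided p-values are converted to z via the inverse normal survival function, Φ⁻¹(1 − p). The p = 0 rule follows the legend. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def median_split(ados_sa, elc, lo=55, hi=80):
    """Keep subjects with lo < ELC < hi, then split them at the median ADOS-SA
    score. Ties at the median go to the low group (an assumption)."""
    ados_sa, elc = np.asarray(ados_sa, dtype=float), np.asarray(elc, dtype=float)
    idx = np.flatnonzero((elc > lo) & (elc < hi))
    med = np.median(ados_sa[idx])
    high = idx[ados_sa[idx] > med]
    low = idx[ados_sa[idx] <= med]
    return high, low


def pvalues_to_z(pvals):
    """One-sided p-values -> z-scores. A p-value of 0 has no finite z-score, so
    it is replaced by the smallest non-zero p-value before transforming."""
    p = np.asarray(pvals, dtype=float).copy()
    nonzero = p[p > 0]
    if nonzero.size and np.any(p == 0):
        p[p == 0] = nonzero.min()
    return norm.isf(p)
```

Each of the 100 iterations would then draw 15 subjects from `high` and 15 from `low`, run the one-sided test, and pass the resulting p-values to `pvalues_to_z`.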
To assess the robustness of the results, we tested their reproducibility under a more stringent criterion for expressed genes. The main results are based on 14,854 protein-coding genes selected as expressed in the dataset. By comparison with GTEx whole-blood transcriptome data, we showed that the number of expressed protein-coding genes is in the same range in both studies (14,555 expressed protein-coding genes in GTEx; see Methods for details). Here, we applied a more stringent detection p-value cut-off of 0.01 instead of 0.05. This resulted in 13,032 protein-coding genes being selected as expressed in our leukocyte transcriptome dataset (n=119 ASD and 107 TD). We re-analyzed the newly filtered dataset from the DE analysis onward, including construction of the HC DE and XP networks. (a) The MA-plot shows a similar pattern, with mean expression overall uncorrelated with fold change. (b) The DE network shows transcriptional over-activity in ASD subjects (paired Wilcoxon-Mann-Whitney test). (c) The DE-ASD network significantly overlaps with the same modules and networks of rASD genes as in the main results (permutation test). (d) The DE-ASD network is preferentially expressed in the prenatal brain (n=187 BrainSpan neocortex samples). (e) Transcriptional activity of the DE-ASD network peaks at 10-19 pcw in the prenatal brain (z-transformed p-values of a two-sided Wilcoxon-Mann-Whitney test). (f-g) The DE-ASD network is significantly enriched for targets of high-confidence rASD genes (permutation test). CHD8-1: Sugathan et al., CHD8-2: Cotney et al., CHD8-3: Gompers et al., FMR1: Darnell et al. (h-i) The XP-ASD network is preferentially associated with high-confidence rASD genes (hypergeometric test). (j) The XP-ASD network is enriched for rASD genes with regulatory roles (two-sided Wilcoxon-Mann-Whitney test).
(k) High-confidence rASD genes, identified in large-scale genetic studies by protein-truncating mutations and a pLI score > 0.9, are enriched in the XP-ASD network (hypergeometric test). (l) DE and rASD genes show anti-correlated expression patterns in in vitro neural differentiation data (n=77 samples from 3 fetal brain donors). (m) The XP-ASD network preferentially incorporates rASD genes with regulatory roles on RAS/ERK, PI3K/AKT, and WNT/β-catenin proteins. Due to space limitations, only the results for the RAS/ERK pathway are shown; similar patterns were observed for the PI3K/AKT and WNT/β-catenin pathways. (n) The DE-ASD network is preferentially expressed in hiPSC-derived ASD neurons and neural progenitor cells (two-sided Wilcoxon-Mann-Whitney test; n=83 samples from 14 donors). (o) The DE-ASD network is significantly over-active at the transcriptional level in ASD neural progenitor and neuron models (two-sided Wilcoxon-Mann-Whitney test; n=5 progenitor and 5 neuron samples from each of the ASD and TD groups). (p) Co-transcriptional activity of the DE-ASD network correlates with ADOS social affect scores of ASD subjects (permutation test; n=20 subjects at each ASD symptom severity level). Boxplots represent the median (horizontal line), lower and upper quartile values (box), range of values (whiskers), and outliers (dots).
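The effect of tightening the detection p-value cut-off from 0.05 to 0.01 can be illustrated with a minimal Python sketch. The rule that a probe counts as expressed when it is detected in at least a given fraction of samples is a hypothetical stand-in. So is the `min_fraction` parameter. Neither is the criterion documented in the paper's Methods.

```python
import numpy as np


def expressed_mask(detection_p, alpha=0.05, min_fraction=0.5):
    """detection_p: probes x samples matrix of detection p-values.
    A probe is called expressed if its detection p-value is below `alpha`
    in at least `min_fraction` of samples (hypothetical rule)."""
    detected = np.asarray(detection_p, dtype=float) < alpha
    return detected.mean(axis=1) >= min_fraction
```

Lowering `alpha` can only remove probes from the expressed set, never add them. This is consistent with the drop from 14,854 to 13,032 protein-coding genes reported above.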
The top plots (a-b) show hierarchical clustering of n=226 subjects based on the 1,000 most variable genes in the primary microarray dataset. Samples are color-coded by batch number. To estimate batch effects, we partitioned the samples into 8 clusters and measured the uncertainty (heterogeneity) of the batch covariate within each cluster using an entropy metric. The overall entropy score is the sum of the cluster entropy scores weighted by cluster size. An entropy of zero implies that samples from the same batch cluster together, whereas increasing entropy indicates a more random distribution of samples with respect to batch. The bottom plots (c-d) show principal coordinate plots of the samples that have technical replicates (57 samples). The distance between any two samples on the plot approximates their Euclidean distance based on the 1,000 most variable genes in the dataset. Technical replicates share the same color. B5Z4P is a sample with technical replicates in two different batches, as is evident in panel (c). The principal coordinate plots were generated with the limma package in R.
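The batch-entropy score can be sketched in Python as follows. It is the size-weighted sum of within-cluster Shannon entropies of batch labels after cutting a hierarchical clustering into a fixed number of clusters. Three choices are assumptions not stated in the legend: average linkage, Euclidean distance on samples, and base-2 logarithms. The function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def batch_entropy(expr, batches, n_top=1000, n_clusters=8, method="average"):
    """expr: genes x samples. Cluster samples on the n_top most variable genes,
    cut the tree into n_clusters clusters, and return the size-weighted sum
    of the Shannon entropies (bits) of batch labels within each cluster."""
    expr = np.asarray(expr, dtype=float)
    top = np.argsort(expr.var(axis=1))[::-1][:n_top]  # most variable genes
    tree = linkage(expr[top].T, method=method, metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    batches = np.asarray(batches)
    n = batches.size
    total = 0.0
    for c in np.unique(labels):
        members = batches[labels == c]
        _, counts = np.unique(members, return_counts=True)
        probs = counts / counts.sum()
        entropy = -np.sum(probs * np.log2(probs))
        total += members.size / n * entropy  # weight by cluster size
    return total
```

A score of 0 means every cluster contains a single batch, which would indicate a strong batch effect. Higher scores indicate that batches are mixed within clusters.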