Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer’s and Parkinson’s diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson’s disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.
All data generated in this work are available through GEO accession no. GSE147672. To facilitate wide access to our data, we created a WashU Epigenome browser session (session ID: drS3o1n4kJ) for our scATAC-seq data in the following track formats: (1) broad cell types (Corces_scATAC_BroadCellTypes); (2) broad clusters (Corces_scATAC_BroadClusters); (3) neuron subclusters (Corces_scATAC_NeuronSubClusters); and (4) neuron subclustered cell types/LDSC groups (Corces_scATAC_NeuronSubCellTypes). These tracks are accessible via the following link: http://epigenomegateway.wustl.edu/legacy/?genome=hg38&session=drS3o1n4kJ.
All custom code used in this work is available at the following GitHub repository: https://github.com/kundajelab/alzheimers_parkinsons.
Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
Beecham, G. W. et al. Genome-wide association meta-analysis of neuropathologic features of Alzheimer’s disease and related dementias. PLoS Genet. 10, e1004606 (2014).
Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).
Chang, D. et al. A meta-analysis of genome-wide association studies identifies 17 new Parkinson’s disease risk loci. Nat. Genet. 49, 1511–1516 (2017).
Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
Nott, A. et al. Brain cell type-specific enhancer–promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).
Li, M. et al. Integrative functional genomic analysis of human brain development and neuropsychiatric risks. Science 362, eaat7615 (2018).
Amiri, A. et al. Transcriptome and epigenome landscape of human cortical development modeled in organoids. Science 362, eaat6720 (2018).
Trevino, A. E. et al. Chromatin accessibility dynamics in a model of human forebrain development. Science 367, eaay1645 (2020).
Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318–1323 (2017).
Song, M. et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252–1262 (2019).
Rajarajan, P. et al. Neuron-specific signatures in the chromosomal connectome associated with schizophrenia risk. Science 362, eaat4311 (2018).
Fullard, J. F. et al. An atlas of chromatin accessibility in the adult human brain. Genome Res. 28, 1243–1252 (2018).
Fullard, J. F. et al. Open chromatin profiling of human postmortem brain infers functional roles for non-coding schizophrenia loci. Hum. Mol. Genet. 26, 1942–1951 (2017).
Bryois, J. et al. Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia. Nat. Commun. 9, 3121 (2018).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Sey, N. Y. A. et al. A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles. Nat. Neurosci. 23, 583–593 (2020).
Lee, D. et al. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955–961 (2015).
Shrikumar, A., Prakash, E. & Kundaje, A. GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs. Bioinformatics 35, i173–i182 (2019).
Kumasaka, N., Knights, A. J. & Gaffney, D. J. High-resolution genetic mapping of putative causal interactions between regions of open chromatin. Nat. Genet. 51, 128–137 (2019).
Amlie-Wolf, A. et al. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Res. 46, 8740–8753 (2018).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Mumbach, M. R. et al. Enhancer connectome in primary human cells reveals target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).
Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871.e8 (2018).
Corces, M. R. et al. Lineage-specific and single cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
McKeown, M. R. et al. Superenhancer analysis defines novel epigenomic subtypes of non-APL AML, including an RARα dependency targetable by SY-1425, a potent and selective RARα agonist. Cancer Discov. 7, 1136–1153 (2017).
Stolt, C. C. et al. The Sox9 transcription factor determines glial fate choice in the developing spinal cord. Genes Dev. 17, 1677–1689 (2003).
Kuhlbrodt, K., Herbarth, B., Sock, E., Hermans-Borgmeyer, I. & Wegner, M. Sox10, a novel transcriptional modulator in glial cells. J. Neurosci. 18, 237–250 (1998).
Kondo, T. & Raff, M. Basic helix-loop-helix proteins and the timing of oligodendrocyte differentiation. Development 127, 2989–2998 (2000).
Nakatani, H. et al. Ascl1/Mash1 promotes brain oligodendrogenesis during myelination and remyelination. J. Neurosci. 33, 9752–9768 (2013).
Smith, A. M. et al. The transcription factor PU.1 is critical for viability and function of human brain microglia. Glia 61, 929–942 (2013).
Schlingensiepen, K. H. et al. The role of Jun transcription factor expression and phosphorylation in neuronal differentiation, neuronal cell death, and plastic adaptations in vivo. Cell. Mol. Neurobiol. 14, 487–505 (1994).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Hemonnot, A. L., Hua, J., Ulmann, L. & Hirbec, H. Microglia in Alzheimer disease: well-known targets and new opportunities. Front. Aging Neurosci. 11, 233 (2019).
Efthymiou, A. G. & Goate, A. M. Late onset Alzheimer’s disease genetics implicates microglial pathways in disease risk. Mol. Neurodegener. 12, 43 (2017).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Ghandi, M. et al. gkmSVM: an R package for gapped-kmer SVM. Bioinformatics 32, 2205–2207 (2016).
Bromberg, Y. & Rost, B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics 24, i207–i212 (2008).
Xu, W., Tan, L. & Yu, J.-T. The role of PICALM in Alzheimer’s disease. Mol. Neurobiol. 52, 399–413 (2015).
Stage, E. et al. The effect of the top 20 Alzheimer disease risk genes on gray-matter density and FDG PET brain metabolism. Alzheimers Dement. (Amst) 5, 53–66 (2016).
Andrew, R. J. et al. Reduction of the expression of the late-onset Alzheimer’s disease (AD) risk-factor BIN1 does not affect amyloid pathology in an AD mouse model. J. Biol. Chem. 294, 4477–4487 (2019).
Ma, J., Yu, J.-T. & Tan, L. MS4A cluster in Alzheimer’s disease. Mol. Neurobiol. 51, 1240–1248 (2015).
Rouka, E. et al. Differential recognition preferences of the three Src homology 3 (SH3) domains from the adaptor CD2-associated protein (CD2AP) and direct association with Ras and Rab interactor 3 (RIN3). J. Biol. Chem. 290, 25275–25292 (2015).
Larsson, M. et al. GWAS findings for human iris patterns: associations with variants in genes that influence normal neuronal pattern development. Am. J. Hum. Genet. 89, 334–343 (2011).
Kajiho, H. et al. RIN3: a novel Rab5 GEF interacting with amphiphysin II involved in the early endocytic pathway. J. Cell Sci. 116, 4159–4168 (2003).
Lecours, C. et al. Microglial implication in Parkinson’s disease: loss of beneficial physiological roles or gain of inflammatory functions? Front. Cell. Neurosci. 12, 282 (2018).
Kaushik, D. K., Gupta, M., Das, S. & Basu, A. Krüppel-like factor 4, a novel transcription factor regulates microglial activation and subsequent neuroinflammation. J. Neuroinflammation 7, 68 (2010).
Schellenberg, G. D. & Montine, T. J. The genetics and neuropathology of Alzheimer’s disease. Acta Neuropathol. 124, 305–323 (2012).
Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
Zody, M. C. et al. Evolutionary toggling of the MAPT 17q21.31 inversion region. Nat. Genet. 40, 1076–1083 (2008).
Valenca, G. T. et al. The role of MAPT haplotype H2 and isoform 1N/4R in Parkinsonism of older adults. PLoS ONE 11, e0157452 (2016).
Allen, M. et al. Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels. Alzheimers Res. Ther. 6, 39 (2014).
Pascale, E. et al. Genetic architecture of MAPT gene region in Parkinson disease subtypes. Front. Cell. Neurosci. 10, 96 (2016).
Beevers, J. E. et al. MAPT genetic variation and neuronal maturity alter isoform expression affecting axonal transport in iPSC-derived dopamine neurons. Stem Cell Reports 9, 587–599 (2017).
Lai, M. C. et al. Haplotype-specific MAPT exon 3 expression regulated by common intronic polymorphisms associated with Parkinsonian disorders. Mol. Neurodegener. 12, 79 (2017).
Huin, V. et al. Alternative promoter usage generates novel shorter MAPT mRNA transcripts in Alzheimer’s disease and progressive supranuclear palsy brains. Sci. Rep. 7, 12589 (2017).
Pankratz, N. et al. Genomewide association study for susceptibility genes contributing to familial Parkinson disease. Hum. Genet. 124, 593–605 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Li, Z. et al. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nat. Genet. 49, 1576–1583 (2017).
Duncan, L. et al. Significant locus and metabolic genetic correlations revealed in genome-wide association study of anorexia nervosa. Am. J. Psychiatry 174, 850–858 (2017).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Otowa, T. et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol. Psychiatry 21, 1391–1399 (2016).
Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
Anney, R. J. L. et al. Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies. Lancet Neurol. 13, 893–903 (2014).
Zillikens, M. C. et al. Large meta-analysis of genome-wide association studies identifies five loci for lean body mass. Nat. Commun. 8, 80 (2017).
Kemp, J. P. et al. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nat. Genet. 49, 1468–1475 (2017).
Howson, J. M. M. et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat. Genet. 49, 1113–1119 (2017).
Harvey, C. T. et al. QuASAR: quantitative allele-specific analysis of reads. Bioinformatics 31, 1235–1242 (2015).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Granja, J. M. et al. ArchR: an integrative and scalable software package for single-cell chromatin accessibility analysis. Preprint at bioRxiv https://doi.org/10.1101/2020.04.28.066498 (2020).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Cusanovich, D. A. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729.e27 (2018).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Bhattacharyya, S., Chandra, V., Vijayanand, P. & Ay, F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat. Commun. 10, 4221 (2019).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Krueger, F. & Andrews, S. R. SNPsplit: allele-specific splitting of alignments between genomes with known SNP genotypes. F1000Res. 5, 1479 (2016).
This work was supported by National Institutes of Health (NIH) grant nos. NS062684, AG057707, AG053959, AG047366 (to T.J.M.), HG007735 (to H.Y.C.), HG009431 (to S.B.M./A.K.), AG066490 (to S.B.M.) and AG059918 (to M.R.C.). Additional support for patient sample collection was provided by NIH grant nos. AG005136 and AG019610. Sequencing data for this project were generated on an Illumina HiSeq 4000 System supported in part by NIH grant no. S10OD018220. Additional resources at the Stanford Center for Genomics and Personalized Medicine Sequencing Center were supported by NIH grant no. S10OD025212. M.R.C. is supported by the American Society of Hematology Scholar Award. A.S. is supported by the Stanford BioX Bowes fellowship. M.J.G. and T.E. are supported by a National Library of Medicine training grant no. 5T15LM007033-36. M.J.G. is additionally supported by a Stanford Graduate Fellowship. H.Y.C. is an Investigator of the Howard Hughes Medical Institute.
H.Y.C. is a cofounder of Accent Therapeutics and Boundless Bio, and an advisor to 10x Genomics, Arsenal Bio and Spring Discovery. S.B.M. is on the scientific advisory board of MyOme. A.K. is a consultant for Biogen. A.S. is a consultant for MyoKardia. W.J.G. is a consultant for Guardant Health, 10x Genomics and Protillion Biosciences.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Region-centric scATAC-seq identifies cellular and regional heterogeneity in chromatin accessibility in adult brain.
a, b, UMAP dimensionality reduction (a) prior to and (b) after batch correction with Harmony of scATAC-seq data from 10 different samples. Each dot represents a single cell (N = 70,631). Dots are colored by the sample of origin. Color labels are shown in Extended Data Fig. 1b. c, The same UMAP dimensionality reduction shown in Extended Data Fig. 1b but each cell is colored by its gene activity score for the annotated lineage-defining gene. Gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score for the given gene. The minimum and maximum scores are shown in the bottom left of each panel. The gene of interest and the cell type that it identified are shown in the upper left of each panel. MSNs – medium spiny neurons. d, Heatmap of cell type-specific markers used to define the cell type corresponding to each cluster. Color represents the row-wise Z-score of chromatin accessibility in the vicinity of each gene for each cluster. e, Cluster residence heatmap showing the percent of each cluster that is composed of cells from each sample. Cell numbers were normalized across samples prior to calculating cluster residence percentages to account for differences in total pass filter cells per sample. f–h, UMAP dimensionality reduction as shown in Extended Data Fig. 1b but colored by (f) the gross brain region from which each cell was obtained, (g) the biological sex of the donor for each cell, or (h) the predicted cell class for each cell. i–k, Bar plot showing the number of cells identified in our scATAC-seq data from (i) each of the annotated cell classes, (j) each of the annotated donors/samples, or (k) each of the gross brain regions subdivided based on cell class. Color represents the predicted cell class as shown in the legend of Extended Data Fig. 1h. 
l, m, Bar plot showing the percentage of cells in our scATAC-seq data from (l) each of the gross brain regions subdivided based on cell class or (m) each of the annotated cell classes subdivided based on donor/sample of origin. Color represents (l) the predicted cell class as shown in Extended Data Fig. 1h or (m) the biological sample from which the cells were obtained.
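Panel e states only that cell numbers were normalized across samples before cluster residence percentages were computed. The following minimal Python sketch shows one such normalization, assuming that each cell is weighted by the inverse of its sample's total pass-filter cell count so that every sample contributes equal total weight. The function name and the (sample, cluster) input format are hypothetical.

```python
from collections import Counter


def cluster_residence(cells):
    """Percent of each cluster contributed by each sample after equalizing sample sizes.

    cells: iterable of (sample_id, cluster_id) pairs, one per pass-filter cell.
    Returns {cluster: {sample: percent}}; each cluster's percents sum to 100.
    Assumption: normalization weights each cell by 1 / (its sample's total cells).
    """
    cells = list(cells)
    per_sample = Counter(sample for sample, _ in cells)
    per_pair = Counter(cells)

    # Weighted contribution of each sample to each cluster.
    weighted = {}
    for (sample, cluster), n in per_pair.items():
        weighted.setdefault(cluster, {})[sample] = n / per_sample[sample]

    # Convert weights within each cluster to percentages.
    residence = {}
    for cluster, by_sample in weighted.items():
        total = sum(by_sample.values())
        residence[cluster] = {s: 100.0 * w / total for s, w in by_sample.items()}
    return residence
```

For example, if sample A contributes 100 cells (50 in cluster C1, 50 in C2) and sample B contributes 10 cells (all in C1), B would make up 10/60, or about 17%, of C1 without normalization, but about 67% under this weighting, because every cell from B lies in C1.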
Extended Data Fig. 2 Cellular heterogeneity in brain tissue necessitates single-cell approaches to capture biological complexity.
a, b, Bar plot of the log2(Fold Change) in the percent of peaks mapping to various genomic annotations comparing peaks from (a) the scATAC-seq peak set that are not overlapped by a peak from the bulk ATAC-seq peak set to peaks that are overlapped by a peak from the bulk ATAC-seq peak set or (b) the scATAC-seq peak set that were identified as cell type-unique through feature binarization to all peaks from the scATAC-seq peak set. c, Sequencing tracks of lineage-defining factors shown across all 24 scATAC-seq clusters (except Cluster 18 – putative doublets). From left to right, NEFL (neurons; chr8:24933431-24966791), AIF1 (aka IBA1, microglia; chr6:31607841-31617906), MOG (oligodendrocytes; chr6:29652183-29699713), GJB6 (astrocytes; chr13:20200243-20239571), and PDGFRA (OPCs; chr4:54209541-54303643). d, Box-and-whisker plots showing the distribution of the number of single cells from our scATAC-seq data showing accessibility within (left) each peak from the set of peaks from the scATAC-seq peak set that overlap a peak from the bulk ATAC-seq peak set (N = 120,941 peaks) and (right) each peak from the set of peaks from the scATAC-seq peak set that do not overlap a peak from the bulk ATAC-seq peak set (N = 238,081 peaks). The lower and upper ends of the box represent the 25th and 75th percentiles and the internal line represents the median. The whiskers represent 1.5 times the interquartile range. P-value determined by Kolmogorov–Smirnov test. e, Dot plot showing the inter-region Pearson correlation of pseudo-bulk replicates composed of all cells from either SMTG, PARL, or MDFG within each of the clusters shown. The clusters shown were selected based on biological relevance (that is, clusters annotated as “substantia nigra astrocytes” should not be compared across isocortical regions) and on cluster size (that is, clusters with small numbers of isocortical cells would not provide robust comparisons).
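Panels a and b report log2 fold changes in the percentage of peaks assigned to each genomic annotation. A minimal sketch follows, assuming that each peak carries exactly one annotation label (the labels in the example are hypothetical) and that categories absent from either peak set are skipped to avoid dividing by zero.

```python
import math
from collections import Counter


def annotation_log2fc(query_labels, reference_labels):
    """log2(% of query peaks in a category / % of reference peaks in that category).

    query_labels, reference_labels: one annotation label per peak.
    Categories missing from either set are skipped (log2 would be undefined).
    """
    q = Counter(query_labels)
    r = Counter(reference_labels)
    n_q, n_r = len(query_labels), len(reference_labels)
    result = {}
    for category in sorted(set(q) & set(r)):
        pct_q = 100.0 * q[category] / n_q
        pct_r = 100.0 * r[category] / n_r
        result[category] = math.log2(pct_q / pct_r)
    return result
```

For instance, if 25% of query peaks are promoter-annotated versus 50% of reference peaks, the promoter category receives a log2 fold change of -1.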
Extended Data Fig. 3 Neuronal sub-clustering identifies diverse biologically relevant populations of neurons.
a–d, UMAP dimensionality reduction of neuronal cells (identified as Clusters 1-7, 11, and 12 from Fig. 1e) (a) prior to or (b–d) after batch correction with Harmony of scATAC-seq data from 10 different samples. Each dot represents a single cell (N = 21,116). Dots are colored by (a, b) the sample of origin, (c) the neuronal sub-cluster (repeated from Fig. 2a), or (d) its gene activity score for the annotated lineage-defining gene. In (d), gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score for the given gene. The minimum and maximum scores are shown in the bottom left of each panel. The gene of interest is shown in the upper right of each panel. e, Heatmap of gene activity scores for all neuronal markers used in identifying relevant cell types for neuronal sub-clusters. Color represents the column-wise z-scores for each gene across all neuronal sub-clusters with values thresholded at -2 and +2. Neuronal cluster “major annotation” is shown by color along with a cluster description to the right of the plot. f–h, The same UMAP dimensionality reduction shown in Extended Data Fig. 3c but cells are colored by (f) the major cell class annotation, (g) a more granular neuronal sub-annotation, or (h) the neuronal cell class annotation. Assignment was made based on gene activity scores of lineage-defining genes. The cell class annotation shown in (h) was used to perform LD score regression analysis.
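Panel e colors each gene by its z-score across neuronal sub-clusters, with values thresholded at -2 and +2. The sketch below assumes that rows are sub-clusters and columns are genes, and it uses the population standard deviation. The legend does not state which standard deviation was used, so that choice is an assumption.

```python
import statistics


def zscore_columns_clipped(matrix, limit=2.0):
    """Column-wise z-scores clipped to [-limit, +limit].

    matrix: list of rows (neuronal sub-clusters), each a list of gene activity
    scores (one column per gene). Constant columns are set to 0.
    """
    n_cols = len(matrix[0])
    out = [[0.0] * n_cols for _ in matrix]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)  # population SD (assumed)
        for i, value in enumerate(column):
            z = 0.0 if sd == 0 else (value - mean) / sd
            out[i][j] = max(-limit, min(limit, z))  # threshold at +/- limit
    return out
```

Clipping keeps a single highly specific sub-cluster from compressing the color scale for every other row in the heatmap.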
Extended Data Fig. 4 Sub-clustering of cells from the substantia nigra identifies TH-positive dopaminergic neurons.
a–d, UMAP dimensionality reduction after iterative LSI of scATAC-seq data from substantia nigra cells from 2 different samples. Each dot represents a single cell (N = 11,199). Dots are colored by (a) their corresponding substantia nigra sub-cluster, (b) the sample of origin, or (c, d) its gene activity score for (c) the tyrosine hydroxylase (TH) gene, a specific marker of dopaminergic neurons or (d) other lineage-defining genes. In (c, d), gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score. The minimum and maximum scores are shown in the bottom left of each panel. In (a–c), the predicted cluster cell type identities are overlaid on the UMAPs.
Extended Data Fig. 5 HiChIP and co-accessibility predict enhancer-promoter interactions in primary adult human brain.
a, Heatmap representation of HiChIP interaction signal at 100-kb, 25-kb, and 5-kb resolution at the OLIG2 locus. Sample shown represents the substantia nigra from donor 03-41. Signal is normalized to the square root of the coverage. The maximum value of the color range and the coordinates along chromosome 21 are shown below each panel. b, Bar plots showing the total number of paired-end reads sequenced for each HiChIP library generated in this study. Color represents the brain region from which the data were generated. c, Bar plots showing the number of valid interaction pairs identified in HiChIP data from all samples profiled in this study. Color represents the type of interaction identified. d, Bar plot showing the overlap of FitHiChIP loop calls from the 4 gross brain regions profiled. Color indicates whether the loop was identified in a single region (unique) or more than one region (shared). e, Bar plot showing the classification of FitHiChIP loop calls based on whether the loop call contained an ATAC-seq peak (from either the bulk ATAC-seq peak set or the scATAC-seq peak set) or TSS in one, both, or no anchor. f, Bar plots showing the number of Cicero-predicted co-accessibility-based peak links that are observed in HiChIP (left) or the number of HiChIP-based FitHiChIP loop calls that are predicted as peak links by Cicero (right). g, Bar plot showing the number of cell type-specific peaks (defined as peaks identified through feature binarization; N = 221,062) or non-cell type-specific peaks (defined as scATAC-seq peaks that were not identified through feature binarization; N = 137,960) that overlap or do not overlap a Cicero-predicted co-accessibility linkage. Significance determined by Kolmogorov–Smirnov test.
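Panel e classifies each loop by whether an ATAC-seq peak or TSS falls in one, both, or neither of its anchors. A minimal sketch follows, assuming half-open genomic intervals and a single feature table that pools peaks with TSSs (each TSS encoded as a 1-bp interval). The function names and loop tuple layout are hypothetical. At genome scale, interval tools such as BEDTools would normally replace this linear scan.

```python
def anchor_hits(chrom, start, end, features):
    """True if the half-open anchor [start, end) overlaps any feature on chrom.

    features: dict mapping chromosome -> list of (start, end) intervals
    (ATAC-seq peaks and/or TSSs).
    """
    return any(s < end and start < e for s, e in features.get(chrom, []))


def classify_loop(loop, features):
    """Return 'both', 'one' or 'none' for a loop (chrom1, start1, end1, chrom2, start2, end2)."""
    c1, s1, e1, c2, s2, e2 = loop
    n = int(anchor_hits(c1, s1, e1, features)) + int(anchor_hits(c2, s2, e2, features))
    return {2: "both", 1: "one", 0: "none"}[n]
```

Tallying `classify_loop` over all FitHiChIP loop calls gives the three bar categories.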
Extended Data Fig. 6 A multi-omic tiered approach leverages machine learning to predict functional noncoding SNPs in AD and PD.
a, Flow chart of the analytical framework used to prioritize noncoding SNPs and predict functionality. The highest confidence SNPs (Tier 1) are supported by either machine learning predictions, allelic imbalance, or both. Moderate confidence SNPs (Tier 2) are supported by the presence of the SNP within a peak and a HiChIP loop or co-accessibility peak link that connects the SNP to a gene. Lower confidence SNPs (Tier 3) are only supported by the presence of the SNP in a peak. b, c, Box plot showing the area under (b) the precision-recall curve or (c) the receiver operating characteristic curve for the gkm-SVM machine learning classifier. Performance for each of the 24 broad clusters is shown with dots representing outliers. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers represent 1.5 times the interquartile range. The center line represents the median. d, GkmExplain importance scores shown across all 10 folds for each base across a 100-bp window surrounding rs636317 for the effect (left) and non-effect (right) bases. e, Dot plots showing comparison of the GkmExplain score, ISM score, and deltaSVM score. Each dot represents an individual SNP test in a given fold. Dot color represents the GWAS locus number. The only off-diagonal dots (circled) correspond to repetitive regions within the MAPT locus where the deltaSVM score appears to be particularly sensitive. f, Dot plot showing allelic imbalance assessed by RASQUAL across all bulk ATAC-seq data used in this study from a region-specific analysis. Significance is assessed by RASQUAL (see Methods). Dot color indicates the brain region found to have significant allelic imbalance. Grey dots do not pass significance testing based on an empirical distribution of permuted null q-values and a 10% false discovery rate. A RASQUAL effect size greater than 0.5 indicates that the alternate allele is enriched while less than 0.5 indicates that the reference allele is enriched.
The plot is divided to show SNPs within the MAPT and DNAH17 loci (bottom) and SNPs in all other loci (top). SNPs mentioned in downstream analyses are highlighted by red text.
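The tier criteria in panel a and the RASQUAL effect-size convention in panel f can be expressed as simple decision rules. The sketch below encodes only what the legend states. It does not reproduce any additional filters that the full flow chart may apply; in particular, it does not require a Tier 1 SNP to lie in a peak or be linked to a gene. The `SnpEvidence` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpEvidence:
    in_peak: bool            # SNP overlaps an ATAC-seq peak
    ml_predicted: bool       # supported by the machine learning (gkm-SVM) predictions
    allelic_imbalance: bool  # significant allelic imbalance (e.g. by RASQUAL)
    linked_to_gene: bool     # HiChIP loop or co-accessibility link to a gene


def assign_tier(ev):
    """Return 1, 2 or 3 following the legend's criteria, or None if unprioritized."""
    if ev.ml_predicted or ev.allelic_imbalance:
        return 1
    if ev.in_peak and ev.linked_to_gene:
        return 2
    if ev.in_peak:
        return 3
    return None


def enriched_allele(rasqual_effect_size):
    """Per the legend: > 0.5 means the alternate allele is enriched, < 0.5 the reference."""
    if rasqual_effect_size > 0.5:
        return "alternate"
    if rasqual_effect_size < 0.5:
        return "reference"
    return "balanced"
```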
Extended Data Fig. 7 Multi-omic characterization of well-studied AD-related GWAS loci pinpoints putative functional noncoding SNPs.
a, c, Normalized scATAC-seq-derived pseudo-bulk tracks, H3K27ac HiChIP loop calls, co-accessibility correlations, and publicly available H3K4me3 PLAC-seq loop calls (Nott et al. 2019) in (a) the BIN1 gene locus (chr2:127045000-127182000) and (c) the MS4A gene locus (chr11:60023000-60554000). scATAC-seq tracks represent the aggregate signal of all cells from the given cell type and have been normalized to the total number of reads in TSS regions, enabling direct comparison of tracks across cell types. For HiChIP, each line represents a FitHiChIP loop call connecting the points on each end. Red lines contain one anchor overlapping the SNP of interest while grey lines do not. For co-accessibility, only interactions involving the accessible chromatin region of interest are shown. For PLAC-seq, MAPS loop calls from microglia (blue), neurons (orange), and oligodendrocytes (purple) are shown. b, d, GkmExplain importance scores for each base in the 50-bp region surrounding (b) rs13025717 or (d) rs636317 for the effect and non-effect alleles from the gkm-SVM model for microglia (Cluster 24). The predicted motif affected by the SNP is shown at the bottom and the SNP of interest is highlighted in blue. e, Dot plot showing allelic imbalance at rs636317. Significance of allelic imbalance was determined by RASQUAL. The bulk ATAC-seq counts determined by WASP and ASEReadCounter for the reference/non-effect (A) allele and variant/effect (T) allele are plotted. Each dot represents an individual bulk ATAC-seq sample (N = 140) colored by the brain region from which the sample was collected. Samples where fewer than 3 reads were present to support both the reference and variant allele (that is, presumed homozygotes or samples with insufficient sequencing depth) are shown in grey. The blue line represents a linear regression of the non-grey points and the grey box represents the 95% confidence interval of that regression.
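The read-count filter and regression line in panel e can be sketched as follows. Two assumptions apply: a sample is treated as informative only if both alleles have at least 3 reads, and the fit regresses variant-allele counts on reference-allele counts by ordinary least squares. The legend states neither the axis orientation nor the exact filter semantics. The 95% confidence band is not computed here, and the function name is hypothetical.

```python
def allelic_regression(samples, min_reads=3):
    """Filter per-sample allele counts and fit alt = slope * ref + intercept.

    samples: list of (ref_count, alt_count) pairs, one per bulk ATAC-seq sample.
    Assumption: samples with fewer than min_reads for either allele are dropped
    (the grey points in the figure).
    Returns (slope, intercept, kept_samples).
    """
    kept = [(r, a) for r, a in samples if r >= min_reads and a >= min_reads]
    if len(kept) < 2:
        raise ValueError("need at least two informative samples")
    n = len(kept)
    mean_ref = sum(r for r, _ in kept) / n
    mean_alt = sum(a for _, a in kept) / n
    sxx = sum((r - mean_ref) ** 2 for r, _ in kept)
    sxy = sum((r - mean_ref) * (a - mean_alt) for r, a in kept)
    if sxx == 0:
        raise ValueError("reference counts are constant; slope is undefined")
    slope = sxy / sxx
    intercept = mean_alt - slope * mean_ref
    return slope, intercept, kept
```

Under these assumptions, a slope above 1 would indicate that heterozygous samples consistently yield more reads from the variant allele than from the reference allele.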
Extended Data Fig. 8 Multi-omic characterization of noncoding SNPs identifies novel genes implicated in PD.
a, c, Normalized scATAC-seq-derived pseudo-bulk tracks, H3K27ac HiChIP loop calls, co-accessibility correlations, and publicly available H3K4me3 PLAC-seq loop calls (Nott et al. 2019) in (a) the IP6K2 gene locus (chr3:48671000-49205000) or (c) the TMEM163 gene locus (chr2:134429000-134905000). scATAC-seq tracks represent the aggregate signal of all cells from the given cell type and have been normalized to the total number of reads in TSS regions, enabling direct comparison of tracks across cell types. For HiChIP, each line represents a FitHiChIP loop call connecting the points on each end. Red lines contain one anchor overlapping the SNP of interest while grey lines do not. For co-accessibility, only interactions involving the accessible chromatin region of interest are shown. For PLAC-seq, MAPS loop calls from microglia (blue), neurons (orange), and oligodendrocytes (purple) are shown. b, d, GkmExplain importance scores for each base in the 50-bp region surrounding (b) rs6781790 or (d) rs7599054 for the effect and non-effect alleles from the gkm-SVM model for (b) astrocytes (Cluster 15) or (d) microglia (Cluster 24). The predicted motif affected by the SNP is shown at the bottom and the SNP of interest is highlighted in blue. e, Dot plot comparing the –log10(p-value) from 23andMe PD GWAS data with the –log10(p-value) from GTEx Caudate eQTL data of SNPs in the TMEM163 locus. Each dot represents an individual SNP. Dot color represents the r² value of LD with the lead SNP (rs7599054 – purple diamond) within a European reference population. f, g, Dot plots showing the genomic coordinates of each SNP and the –log10(p-value) from (f) 23andMe PD GWAS data or (g) GTEx Caudate eQTL data. Dots are colored as in Extended Data Fig. 8e. In (e–g), p-values are based on genome-wide chi-squared statistics from the relevant GWAS and eQTL studies.
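Panels e–g color each SNP by its LD r² with the lead SNP. The legend specifies only a European reference population, not how r² was computed. The sketch below applies the standard two-locus formula r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), with D = p_AB - p_A p_B, to phased haplotypes coded 0/1. Haplotype-level input is an assumption, and the function name is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried on the same haplotype i.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom
```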
a, Flowchart illustrating the analytical scheme used to identify bins with significant allelic imbalance across the H1 and H2 MAPT haplotypes. b, Heatmaps showing chromatin accessibility in 500-bp bins identified as having significantly different accessibility across MAPT haplotypes. Regions are shown for homozygous samples without allelic read splitting (left) and for heterozygous samples after allelic read splitting (right). Bin start coordinates are shown to the right. c, Box-and-whisker plots for multiple regions that show differential chromatin accessibility across the H1 and H2 MAPT haplotypes. Each dot represents a single homozygous H1 (N = 91) or homozygous H2 (N = 12) sample. Heterozygotes are not shown. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers extend to 1.5 times the interquartile range. The center line represents the median. d, e, Expression of (d) KANSL1-AS1 or (e) MAPK8IP1P2 shown as box plots of GTEx cortex brain samples stratified by MAPT haplotype. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers extend to 1.5 times the interquartile range. The center line represents the median. ***P < 10⁻⁵ based on a Wilcoxon rank-sum test. N = 117 H1/H1, 78 H1/H2, and 10 H2/H2. f, Sequencing tracks from pseudo-bulk data derived from predicted cell types in scATAC-seq data. This region is a zoomed-in view of the predicted distal regulatory region (chr17:45216500-45324000) that interacts with the MAPT promoter in the H1 haplotype. Putative neuron-specific regulatory elements are highlighted in blue. g, Box plots showing differential HiChIP interaction signal between regions within the MAPT inversion and regions outside the inversion (“left” or “right”). The schematic at the top explains the analysis performed. The box plots show normalized HiChIP interaction counts for the H1 (N = 6) and H2 (N = 6) haplotypes for upstream/“left” interactions and downstream/“right” interactions. P value determined by a paired two-sided t-test.
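The paired two-sided t-test in panel g can be sketched as follows. This assumes that each H1 value is paired with the H2 value at the same position in the input lists (for example, both alleles from the same heterozygous donor); the source does not state the pairing key, so treat it as an assumption. The function name `paired_t` is hypothetical. With 6 pairs, the test has 5 degrees of freedom.

```python
"""Sketch of a paired t statistic for H1 vs H2 interaction counts
(pairing by list position is an assumption)."""
import math
from statistics import mean, stdev


def paired_t(h1: list[float], h2: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-pair differences
    h1[i] - h2[i] and sd uses the n - 1 denominator.
    """
    if len(h1) != len(h2):
        raise ValueError("a paired test needs equal-length inputs")
    n = len(h1)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(h1, h2)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

The two-sided P value is 2 × (1 − F(|t|)), where F is the CDF of Student's t distribution with n − 1 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(h1, h2)` returns both the statistic and the two-sided P value directly.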
Supplementary Figs. 1–6, Notes 1–7 and Methods
Donor information, sequencing statistics, GWAS SNP metadata and multi-omic characterizations.
Final merged peak set derived from all bulk ATAC–seq data.
Quality control information for all individual cells profiled by scATAC–seq and the cluster residence information for all clusters and samples.
Final merged peak set derived from all scATAC–seq data.
Results of feature binarization from scATAC–seq data showing cell-type-specific peaks.
CIBERSORT signature matrices for the cell group-specific and cluster-specific classifiers.
Results of differential accessibility comparisons between the substantia nigra and isocortex for astrocytes, OPCs, oligodendrocytes, and microglia.
Neuronal cluster definitions and cell class-specific peaks, TF motifs, and genes.
Results of all LD score regression analyses across all conditions and cell types.
All FitHiChIP loop calls overlapping a SNP on at least one anchor.
Results of RASQUAL allelic imbalance analysis and allele counts from WASP/ASEReadCounter for all tested SNPs.
About this article
Cite this article
Corces, M.R., Shcherbina, A., Kundu, S. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat Genet 52, 1158–1168 (2020). https://doi.org/10.1038/s41588-020-00721-x