Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases

Abstract

Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer’s and Parkinson’s diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson’s disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.

Fig. 1: scATAC–seq identifies cell-type-specific chromatin accessibility in the adult brain.
Fig. 2: Subclustering identifies diverse biologically relevant neuronal cell types in the adult brain.
Fig. 3: Machine learning predicts functional polymorphisms in AD and PD.
Fig. 4: Vertical integration of multi-omic data and machine-learning nominate gene targets in AD and PD.
Fig. 5: Epigenetic deconvolution of the MAPT locus explains haplotype-associated transcriptional changes.

Data availability

All data generated in this work are available through GEO accession no. GSE147672. To facilitate wide access to our data, we created a WashU Epigenome browser session (session ID: drS3o1n4kJ) for our scATAC–seq data in the following track formats: (1) broad cell types (Corces_scATAC_BroadCellTypes); (2) broad clusters (Corces_scATAC_BroadClusters); (3) neuron subclusters (Corces_scATAC_NeuronSubClusters); and (4) neuron subclustered cell types/LDSC groups (Corces_scATAC_NeuronSubCellTypes). These tracks are accessible via the following link: http://epigenomegateway.wustl.edu/legacy/?genome=hg38&session=drS3o1n4kJ.

Code availability

All custom code used in this work is available at the following GitHub repository: https://github.com/kundajelab/alzheimers_parkinsons.

References

1. Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).
2. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
3. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
4. Beecham, G. W. et al. Genome-wide association meta-analysis of neuropathologic features of Alzheimer’s disease and related dementias. PLoS Genet. 10, e1004606 (2014).
5. Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).
6. Chang, D. et al. A meta-analysis of genome-wide association studies identifies 17 new Parkinson’s disease risk loci. Nat. Genet. 49, 1511–1516 (2017).
7. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
8. Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
9. Nott, A. et al. Brain cell type-specific enhancer–promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).
10. Li, M. et al. Integrative functional genomic analysis of human brain development and neuropsychiatric risks. Science 362, eaat7615 (2018).
11. Amiri, A. et al. Transcriptome and epigenome landscape of human cortical development modeled in organoids. Science 362, eaat6720 (2018).
12. Trevino, A. E. et al. Chromatin accessibility dynamics in a model of human forebrain development. Science 367, eaay1645 (2020).
13. Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318–1323 (2017).
14. Song, M. et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252–1262 (2019).
15. Rajarajan, P. et al. Neuron-specific signatures in the chromosomal connectome associated with schizophrenia risk. Science 362, eaat4311 (2018).
16. Fullard, J. F. et al. An atlas of chromatin accessibility in the adult human brain. Genome Res. 28, 1243–1252 (2018).
17. Fullard, J. F. et al. Open chromatin profiling of human postmortem brain infers functional roles for non-coding schizophrenia loci. Hum. Mol. Genet. 26, 1942–1951 (2017).
18. Bryois, J. et al. Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia. Nat. Commun. 9, 3121 (2018).
19. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
20. Sey, N. Y. A. et al. A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles. Nat. Neurosci. 23, 583–593 (2020).
21. Lee, D. et al. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47, 955–961 (2015).
22. Shrikumar, A., Prakash, E. & Kundaje, A. GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs. Bioinformatics 35, i173–i182 (2019).
23. Kumasaka, N., Knights, A. J. & Gaffney, D. J. High-resolution genetic mapping of putative causal interactions between regions of open chromatin. Nat. Genet. 51, 128–137 (2019).
24. Amlie-Wolf, A. et al. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Res. 46, 8740–8753 (2018).
25. Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
26. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
27. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
28. Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
29. Mumbach, M. R. et al. Enhancer connectome in primary human cells reveals target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).
30. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
31. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871.e8 (2018).
32. Corces, M. R. et al. Lineage-specific and single cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
33. Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
34. McKeown, M. R. et al. Superenhancer analysis defines novel epigenomic subtypes of non-APL AML, including an RARα dependency targetable by SY-1425, a potent and selective RARα agonist. Cancer Discov. 7, 1136–1153 (2017).
35. Stolt, C. C. et al. The Sox9 transcription factor determines glial fate choice in the developing spinal cord. Genes Dev. 17, 1677–1689 (2003).
36. Kuhlbrodt, K., Herbarth, B., Sock, E., Hermans-Borgmeyer, I. & Wegner, M. Sox10, a novel transcriptional modulator in glial cells. J. Neurosci. 18, 237–250 (1998).
37. Kondo, T. & Raff, M. Basic helix-loop-helix proteins and the timing of oligodendrocyte differentiation. Development 127, 2989–2998 (2000).
38. Nakatani, H. et al. Ascl1/Mash1 promotes brain oligodendrogenesis during myelination and remyelination. J. Neurosci. 33, 9752–9768 (2013).
39. Smith, A. M. et al. The transcription factor PU.1 is critical for viability and function of human brain microglia. Glia 61, 929–942 (2013).
40. Schlingensiepen, K. H. et al. The role of Jun transcription factor expression and phosphorylation in neuronal differentiation, neuronal cell death, and plastic adaptations in vivo. Cell. Mol. Neurobiol. 14, 487–505 (1994).
41. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
42. Hemonnot, A. L., Hua, J., Ulmann, L. & Hirbec, H. Microglia in Alzheimer disease: well-known targets and new opportunities. Front. Aging Neurosci. 11, 233 (2019).
43. Efthymiou, A. G. & Goate, A. M. Late onset Alzheimer’s disease genetics implicates microglial pathways in disease risk. Mol. Neurodegener. 12, 43 (2017).
44. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
45. Ghandi, M. et al. gkmSVM: an R package for gapped-kmer SVM. Bioinformatics 32, 2205–2207 (2016).
46. Bromberg, Y. & Rost, B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics 24, i207–i212 (2008).
47. Xu, W., Tan, L. & Yu, J.-T. The role of PICALM in Alzheimer’s disease. Mol. Neurobiol. 52, 399–413 (2015).
48. Stage, E. et al. The effect of the top 20 Alzheimer disease risk genes on gray-matter density and FDG PET brain metabolism. Alzheimers Dement. (Amst) 5, 53–66 (2016).
49. Andrew, R. J. et al. Reduction of the expression of the late-onset Alzheimer’s disease (AD) risk-factor BIN1 does not affect amyloid pathology in an AD mouse model. J. Biol. Chem. 294, 4477–4487 (2019).
50. Ma, J., Yu, J.-T. & Tan, L. MS4A cluster in Alzheimer’s disease. Mol. Neurobiol. 51, 1240–1248 (2015).
51. Rouka, E. et al. Differential recognition preferences of the three Src homology 3 (SH3) domains from the adaptor CD2-associated protein (CD2AP) and direct association with Ras and Rab interactor 3 (RIN3). J. Biol. Chem. 290, 25275–25292 (2015).
52. Larsson, M. et al. GWAS findings for human iris patterns: associations with variants in genes that influence normal neuronal pattern development. Am. J. Hum. Genet. 89, 334–343 (2011).
53. Kajiho, H. et al. RIN3: a novel Rab5 GEF interacting with amphiphysin II involved in the early endocytic pathway. J. Cell Sci. 116, 4159–4168 (2003).
54. Lecours, C. et al. Microglial implication in Parkinson’s disease: loss of beneficial physiological roles or gain of inflammatory functions? Front. Cell. Neurosci. 12, 282 (2018).
55. Kaushik, D. K., Gupta, M., Das, S. & Basu, A. Krüppel-like factor 4, a novel transcription factor regulates microglial activation and subsequent neuroinflammation. J. Neuroinflammation 7, 68 (2010).
56. Schellenberg, G. D. & Montine, T. J. The genetics and neuropathology of Alzheimer’s disease. Acta Neuropathol. 124, 305–323 (2012).
57. Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
58. Zody, M. C. et al. Evolutionary toggling of the MAPT 17q21.31 inversion region. Nat. Genet. 40, 1076–1083 (2008).
59. Valenca, G. T. et al. The role of MAPT haplotype H2 and isoform 1N/4R in Parkinsonism of older adults. PLoS ONE 11, e0157452 (2016).
60. Allen, M. et al. Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels. Alzheimers Res. Ther. 6, 39 (2014).
61. Pascale, E. et al. Genetic architecture of MAPT gene region in Parkinson disease subtypes. Front. Cell. Neurosci. 10, 96 (2016).
62. Beevers, J. E. et al. MAPT genetic variation and neuronal maturity alter isoform expression affecting axonal transport in iPSC-derived dopamine neurons. Stem Cell Reports 9, 587–599 (2017).
63. Lai, M. C. et al. Haplotype-specific MAPT exon 3 expression regulated by common intronic polymorphisms associated with Parkinsonian disorders. Mol. Neurodegener. 12, 79 (2017).
64. Huin, V. et al. Alternative promoter usage generates novel shorter MAPT mRNA transcripts in Alzheimer’s disease and progressive supranuclear palsy brains. Sci. Rep. 7, 12589 (2017).
65. Pankratz, N. et al. Genomewide association study for susceptibility genes contributing to familial Parkinson disease. Hum. Genet. 124, 593–605 (2009).
66. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
67. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
68. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
69. Li, Z. et al. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nat. Genet. 49, 1576–1583 (2017).
70. Duncan, L. et al. Significant locus and metabolic genetic correlations revealed in genome-wide association study of anorexia nervosa. Am. J. Psychiatry 174, 850–858 (2017).
71. Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
72. Otowa, T. et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol. Psychiatry 21, 1391–1399 (2016).
73. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
74. Anney, R. J. L. et al. Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies. Lancet Neurol. 13, 893–903 (2014).
75. Zillikens, M. C. et al. Large meta-analysis of genome-wide association studies identifies five loci for lean body mass. Nat. Commun. 8, 80 (2017).
76. Kemp, J. P. et al. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nat. Genet. 49, 1468–1475 (2017).
77. Howson, J. M. M. et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat. Genet. 49, 1113–1119 (2017).
78. Harvey, C. T. et al. QuASAR: quantitative allele-specific analysis of reads. Bioinformatics 31, 1235–1242 (2015).
79. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
80. Granja, J. M. et al. ArchR: an integrative and scalable software package for single-cell chromatin accessibility analysis. Preprint at bioRxiv https://doi.org/10.1101/2020.04.28.066498 (2020).
81. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
82. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
83. Cusanovich, D. A. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018).
84. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
85. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729.e27 (2018).
86. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
87. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
88. Bhattacharyya, S., Chandra, V., Vijayanand, P. & Ay, F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat. Commun. 10, 4221 (2019).
89. Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
90. Krueger, F. & Andrews, S. R. SNPsplit: allele-specific splitting of alignments between genomes with known SNP genotypes. F1000Res. 5, 1479 (2016).


Acknowledgements

This work was supported by National Institutes of Health (NIH) grant nos. NS062684, AG057707, AG053959, AG047366 (to T.J.M.), HG007735 (to H.Y.C.), HG009431 (to S.B.M./A.K.), AG066490 (to S.B.M.) and AG059918 (to M.R.C.). Additional support for patient sample collection was provided by NIH grant nos. AG005136 and AG019610. Sequencing data for this project were generated on an Illumina HiSeq 4000 System supported in part by NIH grant no. S10OD018220. Additional resources at the Stanford Center for Genomics and Personalized Medicine Sequencing Center were supported by NIH grant no. S10OD025212. M.R.C. is supported by the American Society of Hematology Scholar Award. A.S. is supported by the Stanford BioX Bowes fellowship. M.J.G. and T.E. are supported by a National Library of Medicine training grant no. 5T15LM007033-36. M.J.G. is additionally supported by a Stanford Graduate Fellowship. H.Y.C. is an Investigator of the Howard Hughes Medical Institute.

Author information

Contributions

M.R.C., H.Y.C. and T.J.M. conceived of and designed the project. M.R.C. and T.J.M. compiled the figures and wrote the manuscript with help and input from all authors. A.S. and M.R.C. performed the bulk ATAC–seq data processing and analysis. M.R.C. performed all HiChIP data analysis with help from M.R.M. and J.M.G. J.M.G., M.R.C. and A.S. performed all scATAC–seq data processing and analysis with supervision from W.J.G., A.K., S.B.M. and H.Y.C. M.J.G. performed the GWAS locus curation, colocalization analysis and GTEx analysis. M.J.G., L.F. and B.L. performed all LD score regression analysis with supervision from S.B.M. S.K. and A.S. performed the machine-learning analysis with supervision from A.K. S.K. and T.E. performed the allelic imbalance analyses with supervision from A.K. and S.B.M. B.H.L., S.S. and M.R.C. performed all ATAC–seq, scATAC–seq and HiChIP data generation with help from S.T.B. and M.R.M. K.S.M. curated the frozen tissue specimens used in this work.

Corresponding authors

Correspondence to Howard Y. Chang or Thomas J. Montine.

Ethics declarations

Competing interests

H.Y.C. is a cofounder of Accent Therapeutics, Boundless Bio, and an advisor to 10x Genomics, Arsenal Bio and Spring Discovery. S.B.M. is on the scientific advisory board of MyOme. A.K. is a consultant for Biogen. A.S. is a consultant for MyoKardia. W.J.G. is a consultant for Guardant Health, 10x Genomics and Protillion Biosciences.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Region-centric scATAC-seq identifies cellular and regional heterogeneity in chromatin accessibility in adult brain.

a, b, UMAP dimensionality reduction (a) prior to and (b) after batch correction with Harmony of scATAC-seq data from 10 different samples. Each dot represents a single cell (N = 70,631). Dots are colored by the sample of origin. Color labels are shown in Extended Data Fig. 1b. c, The same UMAP dimensionality reduction shown in Extended Data Fig. 1b but each cell is colored by its gene activity score for the annotated lineage-defining gene. Gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score for the given gene. The minimum and maximum scores are shown in the bottom left of each panel. The gene of interest and the cell type that it identified are shown in the upper left of each panel. MSNs – medium spiny neurons. d, Heatmap of cell type-specific markers used to define the cell type corresponding to each cluster. Color represents the row-wise Z-score of chromatin accessibility in the vicinity of each gene for each cluster. e, Cluster residence heatmap showing the percent of each cluster that is composed of cells from each sample. Cell numbers were normalized across samples prior to calculating cluster residence percentages to account for differences in total pass filter cells per sample. f–h, UMAP dimensionality reduction as shown in Extended Data Fig. 1b but colored by (f) the gross brain region from which each cell was obtained, (g) the biological sex of the donor for each cell, or (h) the predicted cell class for each cell. i–k, Bar plot showing the number of cells identified in our scATAC-seq data from (i) each of the annotated cell classes, (j) each of the annotated donors/samples, or (k) each of the gross brain regions subdivided based on cell class. Color represents the predicted cell class as shown in the legend of Extended Data Fig. 1h. l, m, Bar plot showing the percentage of cells in our scATAC-seq data from (l) each of the gross brain regions subdivided based on cell class or (m) each of the annotated cell classes subdivided based on donor/sample of origin. Color represents (l) the predicted cell class as shown in Extended Data Fig. 1h or (m) the biological sample from which the cells were obtained.
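
The cluster residence normalization described for panel e can be sketched as follows. This is an illustration only, not the authors' code: the function name `cluster_residence` and the toy counts are hypothetical, and the sketch assumes that "normalized across samples" means scaling each sample's counts by that sample's total pass-filter cells before computing per-cluster percentages.

```python
from collections import defaultdict

def cluster_residence(counts):
    """counts maps (cluster, sample) -> number of cells.
    Returns {cluster: {sample: percent of cluster}} after per-sample scaling."""
    # Total pass-filter cells per sample.
    sample_totals = defaultdict(int)
    for (_cluster, sample), n in counts.items():
        sample_totals[sample] += n

    # Scale each count by its sample total so every sample carries equal weight
    # (assumed form of the normalization; the caption does not give the formula).
    scaled = defaultdict(dict)
    for (cluster, sample), n in counts.items():
        scaled[cluster][sample] = n / sample_totals[sample]

    # Express each sample's scaled contribution as a percent of the cluster.
    residence = {}
    for cluster, per_sample in scaled.items():
        total = sum(per_sample.values())
        residence[cluster] = {s: 100.0 * v / total for s, v in per_sample.items()}
    return residence

# Hypothetical toy data: sample A yielded ten times more cells than sample B.
toy = {("C1", "A"): 100, ("C2", "A"): 100, ("C1", "B"): 10, ("C2", "B"): 10}
result = cluster_residence(toy)
```

Without the scaling step, sample A would appear to dominate both toy clusters purely because it yielded more cells; with it, the two samples contribute equally.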

Extended Data Fig. 2 Cellular heterogeneity in brain tissue necessitates single-cell approaches to capture biological complexity.

a, b, Bar plot of the log2(Fold Change) in the percent of peaks mapping to various genomic annotations comparing peaks from (a) the scATAC-seq peak set that are not overlapped by a peak from the bulk ATAC-seq peak set to peaks that are overlapped by a peak from the bulk ATAC-seq peak set or (b) the scATAC-seq peak set that were identified as cell type-unique through feature binarization to all peaks from the scATAC-seq peak set. c, Sequencing tracks of lineage-defining factors shown across all 24 scATAC-seq clusters (except Cluster 18 – putative doublets). From left to right, NEFL (neurons; chr8:24933431-24966791), AIF1 (aka IBA1, microglia; chr6:31607841-31617906), MOG (oligodendrocytes; chr6:29652183-29699713), GJB6 (astrocytes; chr13:20200243-20239571), and PDGFRA (OPCs; chr4:54209541-54303643). d, Box-and-whisker plots showing the distribution of the number of single cells from our scATAC-seq data showing accessibility within (left) each peak from the set of peaks from the scATAC-seq peak set that overlap a peak from the bulk ATAC-seq peak set (N = 120,941 peaks) and (right) each peak from the set of peaks from the scATAC-seq peak set that do not overlap a peak from the bulk ATAC-seq peak set (N = 238,081 peaks). The lower and upper ends of the box represent the 25th and 75th percentiles and the internal line represents the median. The whiskers represent 1.5 multiplied by the inter-quartile range. P-value determined by Kolmogorov–Smirnov test. e, Dot plot showing the inter-region Pearson correlation of pseudo-bulk replicates composed of all cells from either SMTG, PARL, or MDFG within each of the clusters shown. The clusters shown were selected based on biological relevance (that is, clusters annotated as “substantia nigra astrocytes” should not be compared across isocortical regions) and on cluster size (that is, clusters with small numbers of isocortical cells would not provide robust comparisons).
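
The box-and-whisker convention stated for panel d (box at the 25th and 75th percentiles, whiskers at 1.5 times the inter-quartile range) can be illustrated with a short sketch. The function name `box_stats` and the toy values are hypothetical. The quantile interpolation method is an assumption, since plotting libraries differ on this detail and the caption does not specify one.

```python
import statistics

def box_stats(values):
    """Return quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    data = sorted(values)
    # 'inclusive' treats the data as the full population; this choice is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical counts of accessible cells per peak.
stats = box_stats([1, 2, 3, 4, 5, 20])
```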

Extended Data Fig. 3 Neuronal sub-clustering identifies diverse biologically relevant populations of neurons.

a–d, UMAP dimensionality reduction of neuronal cells (identified as Clusters 1-7, 11, and 12 from Fig. 1e) (a) prior to or (b–d) after batch correction with Harmony of scATAC-seq data from 10 different samples. Each dot represents a single cell (N = 21,116). Dots are colored by (a, b) the sample of origin, (c) the neuronal sub-cluster (repeated from Fig. 2a), or (d) its gene activity score for the annotated lineage-defining gene. In (d), gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score for the given gene. The minimum and maximum scores are shown in the bottom left of each panel. The gene of interest is shown in the upper right of each panel. e, Heatmap of gene activity scores for all neuronal markers used in identifying relevant cell types for neuronal sub-clusters. Color represents the column-wise z-scores for each gene across all neuronal sub-clusters with values thresholded at -2 and +2. Neuronal cluster “major annotation” is shown by color along with a cluster description to the right of the plot. f–h, The same UMAP dimensionality reduction shown in Extended Data Fig. 3c but cells are colored by (f) the major cell class annotation, (g) a more granular neuronal sub-annotation, or (h) the neuronal cell class annotation. Assignment was made based on gene activity scores of lineage-defining genes. The cell class annotation shown in (h) was used to perform LD score regression analysis.
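
The heatmap scaling described for panel e (column-wise z-scores thresholded at -2 and +2) can be sketched as below. The function name `columnwise_zscores` and the toy matrix are hypothetical. The sketch assumes rows are neuronal sub-clusters and columns are genes, and it uses the sample standard deviation; the caption does not specify either choice.

```python
import statistics

def columnwise_zscores(matrix, lo=-2.0, hi=2.0):
    """Z-score each column of a row-major matrix, then clip values to [lo, hi]."""
    n_cols = len(matrix[0])
    scaled = [list(row) for row in matrix]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation (assumption)
        for i, value in enumerate(column):
            # A constant column carries no information; report 0 for every entry.
            z = 0.0 if sd == 0 else (value - mean) / sd
            scaled[i][j] = max(lo, min(hi, z))
    return scaled

# Hypothetical gene activity scores: 10 sub-clusters (rows) by 2 genes (columns).
toy = [[0.0, 1.0]] * 9 + [[100.0, 1.0]]
z = columnwise_zscores(toy)
```

In the toy matrix the single extreme value in the first column has an unclipped z-score of about 2.85, so it is capped at 2. Capping of this kind keeps one extreme sub-cluster from compressing the color scale for all the others.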

Extended Data Fig. 4 Sub-clustering of cells from the substantia nigra identifies TH-positive dopaminergic neurons.

a–d, UMAP dimensionality reduction after iterative LSI of scATAC-seq data from substantia nigra cells from 2 different samples. Each dot represents a single cell (N = 11,199). Dots are colored by (a) their corresponding substantia nigra sub-cluster, (b) the sample of origin, or (c, d) its gene activity score for (c) the tyrosine hydroxylase (TH) gene, a specific marker of dopaminergic neurons, or (d) other lineage-defining genes. In (c, d), gene activity scores were imputed using MAGIC. Grey represents the minimum gene activity score while purple represents the maximum gene activity score. The minimum and maximum scores are shown in the bottom left of each panel. In (a–c), the predicted cluster cell type identities are overlaid on the UMAPs.

Extended Data Fig. 5 HiChIP and co-accessibility predict enhancer-promoter interactions in primary adult human brain.

a, Heatmap representation of HiChIP interaction signal at 100-kb, 25-kb, and 5-kb resolution at the OLIG2 locus. Sample shown represents the substantia nigra from donor 03-41. Signal is normalized to the square root of the coverage. The maximum value of the color range and the coordinates along chromosome 21 are shown below each panel. b, Bar plots showing the total number of paired-end reads sequenced for each HiChIP library generated in this study. Color represents the brain region from which the data were generated. c, Bar plots showing the number of valid interaction pairs identified in HiChIP data from all samples profiled in this study. Color represents the type of interaction identified. d, Bar plot showing the overlap of FitHiChIP loop calls from the 4 gross brain regions profiled. Color indicates whether the loop was identified in a single region (unique) or more than one region (shared). e, Bar plot showing the classification of FitHiChIP loop calls based on whether the loop call contained an ATAC-seq peak (from either the bulk ATAC-seq peak set or the scATAC-seq peak set) or TSS in one, both, or no anchor. f, Bar plots showing the number of Cicero-predicted co-accessibility-based peak links that are observed in HiChIP (left) or the number of HiChIP-based FitHiChIP loop calls that are predicted as peak links by Cicero (right). g, Bar plot showing the number of cell type-specific peaks (defined as peaks identified through feature binarization; N = 221,062) or non-cell type-specific peaks (defined as scATAC-seq peaks that were not identified through feature binarization; N = 137,960) that overlap or do not overlap a Cicero-predicted co-accessibility linkage. Significance determined by Kolmogorov-Smirnov test.

Extended Data Fig. 6 A multi-omic tiered approach leverages machine learning to predict functional noncoding SNPs in AD and PD.

a, Flow chart of the analytical framework used to prioritize noncoding SNPs and predict functionality. The highest confidence SNPs (Tier 1) are supported by either machine learning predictions, allelic imbalance, or both. Moderate confidence SNPs (Tier 2) are supported by the presence of the SNP within a peak and a HiChIP loop or co-accessibility peak link that connects the SNP to a gene. Lower confidence SNPs (Tier 3) are only supported by the presence of the SNP in a peak. b, c, Box plots showing the area under (b) the precision-recall curve or (c) the receiver operating characteristic curve for the gkm-SVM machine learning classifier. Performance for each of the 24 broad clusters is shown, with dots representing outliers. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers represent 1.5 multiplied by the inter-quartile range. The center line represents the median. d, GkmExplain importance scores shown across all 10 folds for each base across a 100-bp window surrounding rs636317 for the effect (left) and non-effect (right) bases. e, Dot plots showing comparison of the GkmExplain score, ISM score, and deltaSVM score. Each dot represents an individual SNP test in a given fold. Dot color represents the GWAS locus number. The only off-diagonal dots (circled) correspond to repetitive regions within the MAPT locus where the deltaSVM score appears to be particularly sensitive. f, Dot plot showing allelic imbalance assessed by RASQUAL across all bulk ATAC-seq data used in this study from a region-specific analysis. Significance is assessed by RASQUAL (see Methods). Dot color indicates the brain region found to have significant allelic imbalance. Grey dots do not pass significance testing based on an empirical distribution of permuted null q-values and a 10% false discovery rate. A RASQUAL effect size greater than 0.5 indicates that the alternate allele is enriched, whereas an effect size less than 0.5 indicates that the reference allele is enriched. The plot is divided to show SNPs within the MAPT and DNAH17 loci (bottom) and SNPs in all other loci (top). SNPs mentioned in downstream analyses are highlighted by red text.
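
The tiering rule described for panel a can be written as a short decision function. The following is a minimal Python sketch that reads the legend literally and is not the authors' implementation: the `SnpEvidence` class, its field names, and the `assign_tier` function are hypothetical, and the sketch assumes that tiers are checked from highest to lowest confidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnpEvidence:
    """Hypothetical per-SNP evidence flags (names are illustrative only)."""
    in_peak: bool            # SNP lies within an accessible chromatin peak
    linked_to_gene: bool     # HiChIP loop or co-accessibility link to a gene
    ml_support: bool         # supported by a machine-learning prediction
    allelic_imbalance: bool  # supported by allelic imbalance


def assign_tier(evidence: SnpEvidence) -> Optional[int]:
    """Return 1, 2 or 3 following the legend, or None if no tier applies.

    Assumption: tiers are evaluated from highest to lowest confidence,
    so a SNP with Tier 1 support is never reported as Tier 2 or 3.
    """
    if evidence.ml_support or evidence.allelic_imbalance:
        return 1
    if evidence.in_peak and evidence.linked_to_gene:
        return 2
    if evidence.in_peak:
        return 3
    return None
```

For example, a SNP that sits in a peak and is linked to a gene, but has neither machine-learning nor allelic-imbalance support, would be placed in Tier 2 under this reading.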

Extended Data Fig. 7 Multi-omic characterization of well-studied AD-related GWAS loci pinpoints putative functional noncoding SNPs.

a, c, Normalized scATAC-seq-derived pseudo-bulk tracks, H3K27ac HiChIP loop calls, co-accessibility correlations, and publicly available H3K4me3 PLAC-seq loop calls (Nott et al. 2019) in (a) the BIN1 gene locus (chr2:127045000-127182000) and (c) the MS4A gene locus (chr11:60023000-60554000). scATAC-seq tracks represent the aggregate signal of all cells from the given cell type and have been normalized to the total number of reads in TSS regions, enabling direct comparison of tracks across cell types. For HiChIP, each line represents a FitHiChIP loop call connecting the points on each end. Red lines contain one anchor overlapping the SNP of interest while grey lines do not. For co-accessibility, only interactions involving the accessible chromatin region of interest are shown. For PLAC-seq, MAPS loop calls from microglia (blue), neurons (orange), and oligodendrocytes (purple) are shown. b, d, GkmExplain importance scores for each base in the 50-bp region surrounding (b) rs13025717 or (d) rs636317 for the effect and non-effect alleles from the gkm-SVM model for microglia (Cluster 24). The predicted motif affected by the SNP is shown at the bottom and the SNP of interest is highlighted in blue. e, Dot plot showing allelic imbalance at rs636317. Significance of allelic imbalance was determined by RASQUAL. The bulk ATAC-seq counts determined by WASP and ASEReadCounter for the reference/non-effect (A) allele and variant/effect (T) allele are plotted. Each dot represents an individual bulk ATAC-seq sample (N = 140) colored by the brain region from which the sample was collected. Samples where fewer than 3 reads were present to support both the reference and variant allele (that is, presumed homozygotes or samples with insufficient sequencing depth) are shown in grey. The blue line represents a linear regression of the non-grey points and the grey box represents the 95% confidence interval of that regression.
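
The grey versus colored distinction in panel e amounts to a per-sample read-count filter. The sketch below is a minimal Python illustration under one stated assumption: a sample is treated as informative only when each allele is supported by at least 3 reads. The function names and the example counts are hypothetical and are not taken from the study data, and the allele fraction computed here is a plain count ratio, not the RASQUAL effect size.

```python
from typing import List, Tuple


def is_informative(ref_reads: int, alt_reads: int, min_reads: int = 3) -> bool:
    """True when both alleles have at least `min_reads` supporting reads.

    Assumption: samples failing this check correspond to the grey dots
    (presumed homozygotes or insufficient sequencing depth).
    """
    return ref_reads >= min_reads and alt_reads >= min_reads


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads carrying the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads overlap the SNP")
    return alt_reads / total


# Made-up (reference, alternate) read counts for four samples.
example_counts: List[Tuple[int, int]] = [(12, 20), (15, 0), (1, 2), (8, 9)]

# Keep only the samples that would be drawn in color and used for the fit.
informative = [pair for pair in example_counts if is_informative(*pair)]
```

Only the informative samples would then enter the linear regression shown as the blue line.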

Extended Data Fig. 8 Multi-omic characterization of noncoding SNPs identifies novel genes implicated in PD.

a, c, Normalized scATAC-seq-derived pseudo-bulk tracks, H3K27ac HiChIP loop calls, co-accessibility correlations, and publicly available H3K4me3 PLAC-seq loop calls (Nott et al. 2019) in (a) the IP6K2 gene locus (chr3:48671000-49205000) or (c) the TMEM163 gene locus (chr2:134429000-134905000). scATAC-seq tracks represent the aggregate signal of all cells from the given cell type and have been normalized to the total number of reads in TSS regions, enabling direct comparison of tracks across cell types. For HiChIP, each line represents a FitHiChIP loop call connecting the points on each end. Red lines contain one anchor overlapping the SNP of interest while grey lines do not. For co-accessibility, only interactions involving the accessible chromatin region of interest are shown. For PLAC-seq, MAPS loop calls from microglia (blue), neurons (orange), and oligodendrocytes (purple) are shown. b, d, GkmExplain importance scores for each base in the 50-bp region surrounding (b) rs6781790 or (d) rs7599054 for the effect and non-effect alleles from the gkm-SVM model for (b) astrocytes (Cluster 15) or (d) microglia (Cluster 24). The predicted motif affected by the SNP is shown at the bottom and the SNP of interest is highlighted in blue. e, Dot plot comparing the –log10(p-value) from 23andMe PD GWAS data with the –log10(p-value) from GTEx Caudate eQTL data of SNPs in the TMEM163 locus. Each dot represents an individual SNP. Dot color represents the r² value of LD with the lead SNP (rs7599054 – purple diamond) within a European reference population. f, g, Dot plots showing the genomic coordinates of each SNP and the –log10(p-value) from (f) 23andMe PD GWAS data or (g) GTEx Caudate eQTL data. Dots are colored as in Extended Data Fig. 8e. In (e–g), p-values are based on genome-wide chi-squared statistics from the relevant GWAS and eQTL studies.

Extended Data Fig. 9 Epigenomic dissection of the MAPT locus.

a, Flowchart illustrating the analytical scheme used to identify bins with significant allelic imbalance across the H1 and H2 MAPT haplotypes. b, Heatmaps showing chromatin accessibility in 500-bp bins identified as having significantly different accessibility across MAPT haplotypes. Regions are shown for homozygous samples without allelic read splitting (left) and for heterozygous samples after allelic read splitting (right). Bin start coordinates are shown to the right. c, Box-and-whisker plots for multiple regions that show differential chromatin accessibility across the H1 and H2 MAPT haplotypes. Each dot represents a single homozygous H1 (N = 91) or homozygous H2 (N = 12) sample. Heterozygotes are not shown. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers represent 1.5 multiplied by the inter-quartile range. The center line represents the median. d, e, Gene expression of (d) the KANSL1-AS1 gene or (e) the MAPK8IP1P2 gene shown as a box plot from GTEx cortex brain samples subdivided based on MAPT haplotype. The lower and upper ends of the box represent the 25th and 75th percentiles. The whiskers represent 1.5 multiplied by the inter-quartile range. The center line represents the median. ***p < 10⁻⁵ based on Wilcoxon rank sum test. N = 117 H1/H1, 78 H1/H2, and 10 H2/H2. f, Sequencing tracks from pseudo-bulk data derived from predicted cell types in scATAC-seq data. This region represents a zoomed-in view of the predicted distal regulatory region (chr17:45216500-45324000) that interacts with the MAPT promoter in the H1 haplotype. Putative neuron-specific regulatory elements are highlighted in blue. g, Box plots showing differential HiChIP interaction signal occurring between regions within the MAPT inversion and regions outside the inversion (“left” or “right”). The schematic at the top explains the analysis performed. The box plots show normalized HiChIP interaction counts for the H1 (N = 6) and H2 (N = 6) haplotypes for upstream/“left” interactions and downstream/“right” interactions. P-value determined by paired two-sided t-test.
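
The box plot conventions stated in these legends (box at the 25th and 75th percentiles, center line at the median, whiskers based on 1.5 multiplied by the inter-quartile range) can be reproduced with a few lines of Python. This is a minimal sketch rather than the plotting code used for the figures. It rests on two assumptions: quartiles use the "inclusive" interpolation of the standard library, and whiskers extend to the most extreme data point within 1.5 × IQR of the box, which is the usual Tukey convention. The function name `box_plot_summary` is hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Union

Number = Union[int, float]


def box_plot_summary(values: Sequence[Number]) -> Dict[str, object]:
    """Summarize values with the box plot conventions described above."""
    data: List[Number] = sorted(values)
    # Quartiles; the interpolation method is an assumption, tools differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```

Points outside the whiskers would be the ones drawn as individual outlier dots in plots that follow this convention.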

Supplementary information

Supplementary Information

Supplementary Figs. 1–6, Notes 1–7 and Methods

Reporting Summary

Supplementary Tables 1 and 2

Donor information, sequencing statistics, GWAS SNP metadata and multi-omic characterizations

Supplementary Data 1

Final merged peak set derived from all bulk ATAC–seq data.

Supplementary Data 2

Quality control information for all individual cells profiled by scATAC–seq and the cluster residence information for all clusters and samples.

Supplementary Data 3

Final merged peak set derived from all scATAC–seq data.

Supplementary Data 4

Results of feature binarization from scATAC–seq data showing cell-type-specific peaks.

Supplementary Data 5

CIBERSORT signature matrices for the cell group-specific and cluster-specific classifiers.

Supplementary Data 6

Results of differential accessibility comparisons between the substantia nigra and isocortex for astrocytes, OPCs, oligodendrocytes, and microglia.

Supplementary Data 7

Neuronal cluster definitions, and cell class-specific peaks, TF motifs, and genes.

Supplementary Data 8

Results of all LD score regression analyses across all conditions and cell types.

Supplementary Data 9

All FitHiChIP loop calls overlapping a SNP on at least one anchor.

Supplementary Data 10

Results of RASQUAL allelic imbalance analysis and allele counts from WASP/ASEReadCounter for all tested SNPs.

Cite this article

Corces, M.R., Shcherbina, A., Kundu, S. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat Genet 52, 1158–1168 (2020). https://doi.org/10.1038/s41588-020-00721-x
