Abstract

Enhancers function as DNA logic gates and may control specialized functions of billions of neurons. Here we show a tailored program of noncoding genome elements active in situ in physiologically distinct dopamine neurons of the human brain. We found 71,022 transcribed noncoding elements, many of which were consistent with active enhancers and with regulatory mechanisms in zebrafish and mouse brains. Genetic variants associated with schizophrenia, addiction, and Parkinson’s disease were enriched in these elements. Expression quantitative trait locus analysis revealed that Parkinson’s disease-associated variants on chromosome 17q21 cis-regulate the expression of an enhancer RNA in dopamine neurons. This study shows that enhancers in dopamine neurons link genetic variation to neuropsychiatric traits.

Data availability

RNA-seq and genotyping raw data have been deposited in dbGaP under accession number phs001556.v1.p1. The supporting data and eQTL results for the BRAINcode project can be queried at http://www.humanbraincode.org through a user-friendly interface. Other data supporting the findings of this study are available upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Cookson, W., Liang, L., Abecasis, G., Moffatt, M. & Lathrop, M. Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 10, 184–194 (2009).
  2. ENCODE Project Consortium, Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).
  3. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
  4. Kowal, S. L., Dall, T. M., Chakrabarti, R., Storm, M. V. & Jain, A. The current and projected economic burden of Parkinson’s disease in the United States. Mov. Disord. 28, 311–318 (2013).
  5. Cloutier, M. et al. The economic burden of schizophrenia in the United States in 2013. J. Clin. Psychiatry 77, 764–771 (2016).
  6. National Institute of Drug Abuse. Treatment Statistics. DrugAbuse.gov https://www.drugabuse.gov/sites/default/files/drugfacts_treatmentstats.pdf (2011).
  7. Hassan, A. & Benarroch, E. E. Heterogeneity of the midbrain dopamine system. Neurology 85, 1795–1805 (2015).
  8. Zheng, B. et al. PGC-1α, a potential therapeutic target for early intervention in Parkinson’s disease. Sci. Transl. Med. 2, 52ra73 (2010).
  9. Liang, W. S. et al. Neuronal gene expression in non-demented individuals with intermediate Alzheimer’s disease neuropathology. Neurobiol. Aging 31, 549–566 (2010).
  10. Elstner, M. et al. Neuromelanin, neurotransmitter status and brainstem location determine the differential vulnerability of catecholaminergic neurons to mitochondrial DNA deletions. Mol. Brain 4, 43 (2011).
  11. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
  12. Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).
  13. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
  14. Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
  15. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
  16. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
  17. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
  18. Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012).
  19. Engström, P. G., Fredman, D. & Lenhard, B. Ancora: a web resource for exploring highly conserved noncoding elements and their association with developmental regulatory genes. Genome Biol. 9, R34 (2008).
  20. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
  21. Akbarian, S. et al. The PsychENCODE project. Nat. Neurosci. 18, 1707–1712 (2015).
  22. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: A method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 1–9 (2015).
  23. Mittal, S. et al. β2-Adrenoreceptor is a regulator of the α-synuclein gene driving risk of Parkinson’s disease. Science 357, 891–898 (2017).
  24. Scherzer, C. R. et al. GATA transcription factors directly regulate the Parkinson's disease-linked gene alpha-synuclein. Proc. Natl. Acad. Sci. USA 105, 10907–10912 (2008).
  25. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  26. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
  27. Ellingsen, S. et al. Large-scale enhancer detection in the zebrafish genome. Development 132, 3799–3811 (2005).
  28. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
  29. Welter, D. et al. The NHGRI GWAS catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
  30. Jiang, Y. et al. A genetic screen to assess dopamine receptor (DopR1) dependent sleep regulation in Drosophila. G3 (Bethesda) 6, 4217–4226 (2016).
  31. González, S. et al. Circadian-related heteromerization of adrenergic and dopamine D4 receptors modulates melatonin synthesis and release in the pineal gland. PLoS Biol. 10, e1001347 (2012).
  32. Breen, D. P. et al. Sleep and circadian rhythm regulation in early Parkinson disease. JAMA Neurol. 71, 589–595 (2014).
  33. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
  34. Nica, A. C. et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 6, e1000895 (2010).
  35. Nalls, M. A. et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nat. Genet. 46, 989–993 (2014).
  36. Wu, H. et al. Tissue-specific RNA expression marks distant-acting developmental enhancers. PLoS Genet. 10, e1004610 (2014).
  37. Mercer, E. M. et al. Multilineage priming of enhancer repertoires precedes commitment to the B and myeloid cell lineages in hematopoietic progenitors. Immunity 35, 413–425 (2011).
  38. Ostuni, R. et al. Latent enhancers activated by stimulation in differentiated cells. Cell 152, 157–171 (2013).
  39. Koolen, D. A. et al. Mutations in the chromatin modifier gene KANSL1 cause the 17q21.31 microdeletion syndrome. Nat. Genet. 44, 639–641 (2012).
  40. Li, R. et al. Six novel susceptibility loci for early-onset androgenetic alopecia and their unexpected association with common diseases. PLoS Genet. 8, e1002746 (2012).
  41. Adams, H. H. H. et al. Novel genetic loci underlying human intracranial volume identified through genome-wide association. Nat. Neurosci. 19, 1569–1582 (2016).
  42. Torsney, K. M. et al. Bone health in Parkinson’s disease: a systematic review and meta-analysis. J. Neurol. Neurosurg. Psychiatry 85, 1159–1166 (2014).
  43. Ding, H. et al. Unrecognized vitamin D3 deficiency is common in Parkinson disease: Harvard Biomarker Study. Neurology 81, 1531–1537 (2013).
  44. Saliba, A. E., Westermann, A. J., Gorski, S. A. & Vogel, J. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Res. 42, 8845–8860 (2014).
  45. Lake, B. B. et al. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science 352, 1586–1590 (2016).
  46. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).
  47. Hughes, A. J., Daniel, S. E., Kilford, L. & Lees, A. J. Accuracy of clinical diagnosis of idiopathic Parkinson’s disease: a clinico-pathological study of 100 cases. J. Neurol. Neurosurg. Psychiatry 55, 181–184 (1992).
  48. The National Institute on Aging and Reagan Institute Working Group on Diagnostic Criteria for the Neuropathological Assessment of Alzheimer’s Disease. Consensus recommendations for the postmortem diagnosis of Alzheimer’s disease. Neurobiol. Aging 18 Suppl, S1–S2 (1997).
  49. Bonanni, L., Thomas, A., Onofrj, M. & McKeith, I. G. Diagnosis and management of dementia with Lewy bodies: third report of the DLB Consortium. Neurology 66, 1455; author reply 1455 (2006).
  50. Schroeder, A. et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol. Biol. 7, 3 (2006).
  51. Unni, V. K., Ebrahimi-Fakhari, D., Vanderburg, C. R., McLean, P. J. & Hyman, B. T. Studying protein degradation pathways in vivo using a cranial window-based approach. Methods 53, 194–200 (2011).
  52. Ingelsson, M. et al. No alteration in tau exon 10 alternative splicing in tangle-bearing neurons of the Alzheimer’s disease brain. Acta Neuropathol. 112, 439–449 (2006).
  53. Liu, G. et al. Metal exposure and Alzheimer’s pathogenesis. J. Struct. Biol. 155, 45–51 (2006).
  54. Kurn, N. et al. Novel isothermal, linear nucleic acid amplification systems for highly multiplexed applications. Clin. Chem. 51, 1973–1981 (2005).
  55. Faherty, S. L., Campbell, C. R., Larsen, P. A. & Yoder, A. D. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-seq. BMC Biotechnol. 15, 65 (2015).
  56. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
  57. Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
  58. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).
  59. Anvar, S. Y. et al. Determining the quality and complexity of next-generation sequencing data without a reference genome. Genome Biol. 15, 555 (2014).
  60. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
  61. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
  62. ’t Hoen, P. A. C. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
  63. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
  64. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
  65. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
  66. Zhao, Y. et al. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res. 44 D1, D203–D208 (2016).
  67. Micallef, L. & Rodgers, P. eulerAPE: drawing area-proportional 3-Venn diagrams using ellipses. PLoS One 9, e101717 (2014).
  68. Wang, J. et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 41, D171–D176 (2013).
  69. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
  70. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45 D1, D896–D901 (2017).
  71. Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008).
  72. Untergasser, A. et al. Primer3–new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
  73. Meng, A., Tang, H., Ong, B. A., Farrell, M. J. & Lin, S. Promoter analysis in living zebrafish embryos identifies a cis-acting motif required for neuronal expression of GATA-2. Proc. Natl. Acad. Sci. USA 94, 6267–6272 (1997).
  74. Wen, L. et al. Visualization of monoaminergic neurons and neurotoxicity of MPTP in live transgenic zebrafish. Dev. Biol. 314, 84–92 (2008).
  75. Piñero, J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 45 D1, D833–D839 (2017).
  76. Mathelier, A. et al. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44 D1, D110–D115 (2016).
  77. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
  78. Takahashi, H., Lassmann, T., Murata, M. & Carninci, P. 5′ end-centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat. Protoc. 7, 542–561 (2012).
  79. Lassmann, T., Hayashizaki, Y. & Daub, C. O. TagDust–a program to eliminate artifacts from next generation sequencing data. Bioinformatics 25, 2839–2840 (2009).
  80. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

Acknowledgements

We thank H. Suzuki and T. Suzuki of RIKEN for providing the modified pGL4.10_mod3_EF1α vector and consultation. We are grateful to C. Vanderburg of the Advanced Tissue Resource Center, Massachusetts General Hospital, for his expertise and support. We thank Z. Weng at the University of Massachusetts Medical School for sharing additional data from the ENCODE consortium. We thank A. Sandelin and R. Andersson, both from Copenhagen University; A. Regev, Broad Institute; and M. Feany, Brigham & Women's Hospital, for insightful comments and guidance. We thank C. Liu, A. Shieh, and T. Goodman for assisting in extracting the RNA-seq and ATAC-seq data in the BrainGVEX dataset. We gratefully acknowledge the Banner Sun Health Institute, Massachusetts Alzheimer’s Disease Research Center at Massachusetts General Hospital, Harvard Brain Tissue Resource Center at McLean Hospital, University of Kentucky ADC Tissue Bank, University of Maryland Brain and Tissue Bank, Pacific Northwest Dementia and Aging Neuropathology Group at University of Washington Medicine Center, and Neurological Foundation of New Zealand for providing human brain tissue. This study was funded in part by NIH grant U01 NS082157 and the US Department of Defense (to C.R.S.); NIH R01AG057331 (to C.R.S.) funded RNA-seq of pyramidal neurons; with additional support from the Michael J. Fox Foundation (MJFF) (to C.R.S. and C.H.A., respectively); the Australia NHMRC GNT1067350 (to A.A.C. and J.S.M.); NIA P30 AG028383 (to P.T.N.); UK Wellcome Trust Investigator award (to F.M.); NINDS U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders (to T.G.B. and C.H.A.); NIA P50 AG005134 (to M.P.F.). The MSBB data were generated as part of the AMP-AD Consortium from postmortem brain tissue collected through the Mount Sinai VA Medical Center Brain Bank and were provided by E. Schadt from Mount Sinai School of Medicine. 
PsychENCODE data were generated as part of the PsychENCODE Consortium, supported by grants U01MH103339, U01MH103365, U01MH103392, U01MH103340, U01MH103346, R01MH105472, R01MH094714, R01MH105898, R21MH102791, R21MH105881, R21MH103877, and P50MH106934 awarded to S. Akbarian (Icahn School of Medicine at Mount Sinai), G. Crawford (Duke), S. Dracheva (Icahn School of Medicine at Mount Sinai), P. Farnham (USC), M. Gerstein (Yale), D. Geschwind (UCLA), T.M. Hyde (LIBD), A. Jaffe (LIBD), J.A. Knowles (USC), C. Liu (UIC), D. Pinto (Icahn School of Medicine at Mount Sinai), N. Sestan (Yale), P. Sklar (Icahn School of Medicine at Mount Sinai), M. State (UCSF), P. Sullivan (UNC), F. Vaccarino (Yale), S. Weissman (Yale), K. White (U Chicago), and P. Zandi (JHU).

Author information

Affiliations

  1. Precision Neurology Program, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA

    • Xianjun Dong
    • Zhixiang Liao
    • David Gritsch
    • Yunfei Bai
    • Joseph J. Locascio
    • Ganqiang Liu
    • Tao Wang
    • Clemens R. Scherzer
  2. Center for Advanced Parkinson’s Disease Research of Harvard Medical School and Brigham & Women’s Hospital, Boston, MA, USA

    • Xianjun Dong
    • Zhixiang Liao
    • David Gritsch
    • Joseph J. Locascio
    • Ganqiang Liu
    • Tao Wang
    • Clemens R. Scherzer
  3. Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK

    • Yavor Hadzhiev
    • Ferenc Müller
  4. State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

    • Yunfei Bai
  5. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA

    • Joseph J. Locascio
    • Clemens R. Scherzer
  6. Sydney Medical School, Brain and Mind Centre, The University of Sydney, Sydney, New South Wales, Australia

    • Boris Guennewig
  7. Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia

    • Boris Guennewig
    • Antony A. Cooper
    • John S. Mattick
  8. St Vincent’s Clinical School, UNSW Sydney, Sydney, New South Wales, Australia

    • Boris Guennewig
    • Antony A. Cooper
    • John S. Mattick
  9. German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany

    • Cornelis Blauwendraat
    • Patrizia Rizzu
    • Peter Heutink
  10. Department of Neurology, Mayo Clinic, Scottsdale, AZ, USA

    • Charles H. Adler
  11. Harvard Brain Tissue Resource Center, McLean Hospital, Harvard Medical School, Boston, MA, USA

    • John C. Hedreen
  12. Centre for Brain Research, University of Auckland, Auckland, New Zealand

    • Richard L. M. Faull
  13. C.S. Kubik Laboratory for Neuropathology, Massachusetts General Hospital, Boston, MA, USA

    • Matthew P. Frosch
  14. Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA

    • Peter T. Nelson
  15. Banner Sun Health Research Institute, Sun City, AZ, USA

    • Thomas G. Beach
  16. Ann Romney Center for Neurologic Diseases, Brigham and Women’s Hospital, Boston, MA, USA

    • Clemens R. Scherzer
  17. Program in Neuroscience, Harvard Medical School, Boston, MA, USA

    • Clemens R. Scherzer

Contributions

X.D. performed data analysis with important contributions from D.G., B.G., G.L., C.B., and T.W. T.G.B., C.H.A., M.P.F., P.T.N., J.C.H., R.L.M.F., and C.R.S. obtained and clinically and neuropathologically characterized patient samples. Z.L. and D.G. were responsible for laser-capture and RNA-seq data production. Z.L. and Y.B. performed validation experiments. F.M. and Y.H. designed and performed zebrafish experiments. C.B., P.R., and P.H. performed CAGE experiments. C.R.S. and X.D. wrote the paper with input from all other authors. C.R.S., J.S.M., F.M., A.A.C., and J.J.L. oversaw data analysis and interpretation. C.R.S. conceived, designed, analyzed, and interpreted the study.

Competing interests

C.R.S. has collaborated with Pfizer and Sanofi; has consulted for Sanofi; has served as Advisor to the Michael J. Fox Foundation, NIH, and Department of Defense; is on the Scientific Advisory Board of the American Parkinson Disease Association; received funding from the NIH, the US Department of Defense, the Michael J. Fox Foundation, and the American Parkinson Disease Association; and is named as co-inventor on two US patent applications on biomarkers for PD held in part by Brigham & Women’s Hospital. B.G. is the founder of Pacific Analytics PTY LTD, Australia and is a founding member of the International Cerebral Palsy Genetics Consortium, a member of the Australian Genomics Health Alliance, and is on the Scientific Advisory Board of Iggy Get Out!, Australia. T.G.B. provides consultancies to Prothena and GSK; is on the Advisory Board of Vivid Genomics; and has contracted research with Avid Radiopharmaceuticals, Navidea Biopharmaceuticals, and Aprinoia Therapeutics. The other authors declare no competing financial interests.

Corresponding author

Correspondence to Clemens R. Scherzer.

Integrated supplementary information

  1. Supplementary Figure 1 RNA-seq sample filters and quality control, including outlier detection, sex concordance, and assay performance measures.

    a, Schematic of RNA-seq sample filtering. IDs of excluded samples are listed under each filtering step in gray text. Four QC tests were performed to identify outlier samples based on systematic abnormalities in overall expression (b,c,d). Moreover, we tested for sex concordance to identify potential sample mix-ups (e). b, Dendrogram visualizing pairwise Spearman correlations between gene expression levels of individual neuronal and non-neuronal samples. c, Histogram of median pairwise k-mer distances for each of the 115 samples with all other samples. d, Histogram of median pairwise Spearman correlations (D-statistics) between gene expression levels of individual samples. e, Concordance between clinical sex and sex-specific gene expression in neuronal and non-neuronal samples: normalized expression levels of the female-specific XIST transcript (x axis) and normalized expression levels of the Y-chromosome-specific RPS4Y1 transcript (y axis) are shown. f, Scatterplot of two technical replicates based on lcRNAseq. N = 57,814; all annotated genes in GENCODE v19. g, Differential expression changes in linearly amplified RNA samples versus non-amplified RNA samples are preserved, as shown by qPCR, consistent with previous reports validating the isothermal linear amplification method[9,10]. ΔCT values indicate the relative abundance of a target gene in one substantia nigra sample compared to one sample of human universal RNA. On the x-axis, the relative abundance of the target gene in linearly amplified cDNA from one substantia nigra sample compared to linearly amplified cDNA from one sample of universal RNA is shown. On the y-axis, the relative abundance of the target gene in non-amplified cDNA from the same substantia nigra sample compared to non-amplified cDNA from the same sample of universal RNA is shown. The source RNA used for both the amplified and non-amplified experiment was identical. 25 target genes were analyzed by qPCR. h, lcRNAseq of melanized neurons from the SNpc highly enriches for dopaminergic marker genes, consistent with previous reports (Poulin, J.F. et al. Cell Rep. 2014; Cahoy, J.D. et al. J. Neurosci. 2008). Relative expression abundance of dopaminergic genes (tyrosine hydroxylase, TH; dopamine transporter, SLC6A3; vesicular monoamine transporter, SLC18A1; dopamine receptor D2, DRD2) was highly enriched in laser-captured dopamine neuron samples compared to substantia nigra homogenates. By contrast, expression of microglia marker genes (purinergic receptor, P2RY12; protein tyrosine phosphatase receptor, PTPRC), astrocyte markers (glial fibrillary acidic protein, GFAP; connexin 30, GJB6), oligodendroglia markers (oligodendroglial transcription factors OLIG1 and OLIG2), and myelin markers (myelin oligodendrocyte glycoprotein, MOG; peripheral myelin protein, PMP22) was low in the laser-captured dopamine neuron samples compared to nigral homogenates.
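
The D-statistic in panel d (the median of a sample's pairwise Spearman correlations with all other samples) can be illustrated with a minimal sketch. This is not the authors' pipeline: the sample names, the toy expression values, and the 0.5 cutoff are hypothetical, and the helper functions are written only for this illustration.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def d_statistics(expression):
    """Map each sample to the median of its Spearman correlations
    with every other sample (genes must be in the same order)."""
    names = list(expression)
    return {
        s: median(spearman(expression[s], expression[t]) for t in names if t != s)
        for s in names
    }


# Hypothetical toy data: three concordant samples and one scrambled sample.
toy = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [1.1, 2.2, 2.9, 4.5, 5.1],
    "S3": [0.9, 1.8, 3.3, 3.9, 6.0],
    "S4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
d = d_statistics(toy)
# The 0.5 cutoff is an arbitrary illustrative threshold, not taken from the study.
outliers = [s for s, v in d.items() if v < 0.5]
```

A sample whose expression profile disagrees with most others ends up with a low D-statistic and falls into the left tail of a histogram like the one in panel d.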

  2. Supplementary Figure 2 RNA-seq density for known cell type-specific marker genes in human dopamine neurons (SNDA), motor cortex pyramidal neurons (MCPY), temporal cortex pyramidal neurons (TCPY), peripheral blood mononuclear white blood cells (PBMCs), and fibroblasts (FB).

    a, The pile graph shows expression of the general neuron marker gene neurofilament medium chain (NEFM) in all three types of human brain neurons, but not in the non-neuronal cell types. b, Pile graphs show expression of four cell-type specific genes in the five human cell types. TH was mainly expressed in dopamine neurons; the empty spiracles homeobox 1 gene (EMX1), a known marker of pyramidal cells (Chan, C.H. et al. Cereb. Cortex, 2001), was mainly expressed in motor and temporal cortex pyramidal neurons; the cluster of differentiation 69 gene (CD69), a known marker of immediate early activation of human mononuclear white blood cells (Green, S. et al. J. Infect. Dis., 1999), was mainly expressed in PBMCs; and fibroblast activation protein FAP (Tillmanns, J. et al. J. Mol. Cell. Cardio., 2015) was chiefly expressed in fibroblasts. Y-axis, normalized RNA-seq read density (log scale).

  3. Supplementary Figure 3 Characteristics of TNEs.

    a, Length distribution of TNEs. The distribution peaks at 426 bp as indicated by the red dashed line. b, Normalized GC content for TNE, promoter and random background regions. See Methods for details. c, Distribution of the distance from the middle position of TNEs to the TSS of the host gene (for intronic TNEs) or to the nearest TSS (for intergenic TNEs). The distance is marked as positive for intronic TNEs (red) and negative for intergenic TNEs (blue). d, Relative position of intronic TNEs (red) and exonic lcRNAseq reads (blue) within the gene body. GENCODE v19 genes were normalized to the same length; the x-axis represents the position of reads relative to each gene's 5' TSS. TSS, transcription start site.
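
The length normalization in panel d can be sketched as a strand-aware rescaling of a genomic position onto the interval from the 5' TSS (0.0) to the 3' end of the gene (1.0). The function name, its parameters, and the coordinates below are hypothetical and chosen only for illustration; the authors' exact procedure is described in their Methods.

```python
def relative_position(pos, gene_start, gene_end, strand):
    """Return the position as a fraction of the gene body,
    measured from the 5' TSS (0.0) to the 3' end (1.0).

    gene_start < gene_end are genomic coordinates; strand is "+" or "-".
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    if strand == "+":
        # On the plus strand the TSS is at gene_start.
        return (pos - gene_start) / length
    if strand == "-":
        # On the minus strand the TSS is at gene_end.
        return (gene_end - pos) / length
    raise ValueError("strand must be '+' or '-'")


# Hypothetical coordinates for illustration only.
plus_example = relative_position(150, 100, 200, "+")
minus_example = relative_position(125, 100, 200, "-")
```

Rescaling every gene to the same unit length is what allows positions from genes of very different sizes to be pooled on one x-axis.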

  4. Supplementary Figure 4 Dopamine neuron-specific mRNA and noncoding RNA subprograms.

    a, Distribution and b, counts of protein-coding mRNAs and non-coding RNAs (ncRNAs) expressed in dopamine neurons annotated in GENCODE v19. c, Dopamine neuron transcription factors and the molecular machinery required to produce (dopamine decarboxylase, DDC), store (vesicular monoamine transporter 2, SLC18A1), and reuptake (dopamine transporter, SLC6A3) dopamine from the synaptic cleft were highly expressed in dopamine neurons (bold gene symbols in d). d, Expression of the twenty most abundant mRNAs and g, ncRNAs specific to dopamine neurons (magenta bars) compared to pyramidal neurons (cyan bars) (twenty left-most genes) and vice versa (twenty right-most genes) (note that for each RNA the corresponding bars indicating abundance in the second brain cell type, respectively, are also shown and are close to 0; insert in d); median ± m.a.d. (median absolute deviation) are shown. N = 86 biologically independent dopamine neuron samples and 13 biologically independent pyramidal neuron samples, respectively. See the full list of neuron-specific genes in Supplementary Table 4. e, The Venn diagram shows the number of mRNAs and h, ncRNAs detected exclusively in dopamine neurons (DA), pyramidal neurons (PY), or non-neuronal cells (NN), respectively. f,i, The twenty abundant, cell type-exclusive mRNAs (f) and ncRNAs (i) were sufficient to accurately cluster the 106 individual samples into dopamine, pyramidal, and non-neuronal clusters. DA, substantia nigra dopamine neurons; MCPY, motor cortex pyramidal neurons; TCPY, temporal cortex pyramidal neurons; PBMC, primary human peripheral blood mononuclear cells; FB, primary human fibroblasts.
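
The "median ± m.a.d." summary used for the bars in panels d and g can be sketched as follows. The expression values are hypothetical, and the sketch assumes the unscaled median absolute deviation (no consistency constant), which the caption does not specify.

```python
from statistics import median


def median_mad(values):
    """Return (median, median absolute deviation) of a list of numbers.

    The MAD here is unscaled: the median of |x - median(x)|.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical expression values for one transcript across five samples.
expression = [10.0, 12.0, 11.0, 50.0, 9.0]
med, mad = median_mad(expression)
```

Unlike mean ± s.d., this summary is barely moved by the single extreme value (50.0) in the toy data, which is one reason it suits skewed expression measurements.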

  5. Supplementary Figure 5 TNEs as enhancers supported by CAGE signal and TF binding.

    a, Representative dopamine neuron TNEs for which bi-directional CAGE signals were detected in human substantia nigra homogenates. 5,642 dopamine neuron-TNEs had at least 10 detected CAGE reads in both directions in human substantia nigra homogenates in our study; N = 4 biologically independent samples were analyzed. In each panel, the following tracks are displayed: Refseq gene; TNE calls and normalized RNA-seq density in dopamine neurons (DA); normalized RNA-seq density in pyramidal neurons (PY) and non-neuronal cells (NN); total CAGE counts in human substantia nigra homogenates (BRAINcode); combined DNase signal from Roadmap Epigenomics[20]; Integrated Regulation from ENCODE tracks for H3K27ac, H3K4me1, H3K4me3; and transcription factor ChIP-seq peaks from ENCODE[26]. b, Transcription factors with ENCODE ChIP-seq peaks significantly enriched in dopamine neuron TNEs by one-sided Fisher’s exact test. ENCODE ChIP-seq peaks were based on the wgEncodeRegTfbsClusteredV3 file downloaded from UCSC Genome Browser, which contains 4,380,444 TF ChIP-seq peaks from 161 TFs in total. The red dashed line indicates the significance threshold of Bonferroni-corrected P = 0.01 based on 161 TFs in total. c-d, JASPAR binding motifs significantly overrepresented in TNEs. The top 50 canonical transcription factor binding motifs most significantly overrepresented (compared to other transcription factor binding motifs) in all dopamine neuron-TNEs (c) and in TNEs exclusively expressed in dopamine neurons (d) are shown (by one-sided Fisher’s exact test). Note that bold fonts in (c) indicate motifs whose enrichment was confirmed by a secondary method using randomly selected, length- and GC-matched genomic background sequences for comparison. The y-axes indicate the –log10(P value) from one-sided Fisher’s exact tests. Red dashed lines indicate the Bonferroni-corrected significance threshold of P = 0.01. 579 non-redundant vertebrate JASPAR CORE motifs[76] were scanned.
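
The enrichment test in panel b can be illustrated with a small sketch of a one-sided Fisher's exact test combined with a Bonferroni threshold for 161 transcription factors. The 2x2 counts are hypothetical, the function is written from the hypergeometric definition for this illustration only, and nothing here reproduces the authors' actual code or counts.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table
    [[a, b], [c, d]]: probability of observing a or more in the
    top-left cell under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Bonferroni-corrected threshold: P = 0.01 divided by the 161 TFs tested.
N_TFS = 161
threshold = 0.01 / N_TFS

# Hypothetical counts: rows are TNEs vs. background regions,
# columns are regions overlapping vs. not overlapping a TF's ChIP-seq peaks.
p_weak = fisher_one_sided(8, 2, 1, 9)
p_strong = fisher_one_sided(10, 0, 0, 10)
significant = [p < threshold for p in (p_weak, p_strong)]
```

With these toy counts the first table gives P of about 0.0027, which passes an uncorrected 0.01 cutoff but not the corrected one, showing why the dashed line in the figure sits well above the nominal level.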

  6. Supplementary Figure 6 VISTA-confirmed and brain cell-type-specific TNEs.

    a, Genomic context of a TNE located in intron 4 of the Parkinson’s gene SNCA. This TNE co-localizes with five GWAS-derived variants associated with PD risk. It is supported as an active enhancer region by marks from ENCODE and FANTOM, including high H3K27ac and H3K4me1, open chromatin indicated by DNase I hypersensitivity, and a hotspot for binding of more than 20 transcription factors, including EP300 (a hallmark of active enhancers) and RAD21 (a subunit of the cohesin complex). GATA transcription factors are known to occupy conserved intronic binding motifs in SNCA (Scherzer et al., PNAS, 2008). This region is bi-directionally transcribed as supported by CAGE signal. b, UCSC genome browser screenshots of two dopamine neuron-specific TNEs, H303 and H305, and two pyramidal neuron-specific TNEs, H210 and H211. The exon4-exon5 junction of the SLC6A3 gene (encoding the dopamine transporter) was evaluated as positive control. For TNEs localized to an intron, the host gene symbol is shown. The displayed tracks are RNA-seq density in dopamine neurons (DA), pyramidal neurons (PY), and non-neuronal cells (NN). The gray box indicates the target transcript region probed by qPCR. Horizontal bars under the RNA-seq density track indicate TNE calls. The number of spliced reads for the SLC6A3 e4-e5 junction site is highlighted in the sashimi plot. c, Enrichment of VISTA-validated enhancers in TNEs overall and in the sub-classes of TNEs. The number of enhancer elements is displayed under the bars. Significance in the hypergeometric test is shown above. n.s.: not significant. d, Genomic context of a VISTA-confirmed TNE (overlapping with VISTA element hs658) located in intron 3 of the human autism susceptibility candidate 2 gene (AUTS2). This evolutionarily conserved TNE is supported by the classical epigenetic features of a putative active enhancer including open chromatin (DNase)[20], high levels of H3K4me1 and H3K27ac[26], low levels of H3K4me3[26], TF ChIP-seq peak clusters[26], and sequence conservation.

  7. Supplementary Figure 7 Comparative genomics analysis of the VMP1–MIR21 locus.

    a, UCSC genome browser screenshot for the VMP1-MIR21 locus in humans (left) and its two main orthologous loci in zebrafish (right). The human locus has two orthologous copies in zebrafish due to a whole genome duplication event in teleost fish. These are shown as the two long chained alignments in the dotted frames (chr15+16277k and chr10-28843k, respectively). The TNE element we tested in zebrafish (VMP1-TNE) is marked in magenta. It is conserved on the zebrafish chr10 ortholog. Zoomed-in details for the two zebrafish orthologs are shown on the right. Note that the orthologous gene vmp1 is absent on the chr10 copy (the dotted frame in the bottom right panel). b, Schematic of evolutionary changes in the VMP1-MIR21 locus following the whole genome duplication event in teleost fish. The MIR-21 gene resides adjacent to VMP1 and is in conserved synteny with the VMP1-TNE ortholog in zebrafish, suggesting that this candidate regulatory element may be shared between VMP1 and miR-21 or may specifically target the miR-21 promoter. The VMP1-TNE element is highlighted in magenta and miR21 is highlighted in blue.

  8. Supplementary Figure 8 GWAS disease and traits are enriched in TNEs.

    GWAS diseases and traits significantly enriched in dopamine TNEs and TNE subclasses; one-sided Fisher’s exact test.

  9. Supplementary Figure 9 Dopamine neuron TNEs harbor a higher density of GWAS variants and diseases of the dopamine system than enhancer annotations that lack cell specificity.

    a, Localization of GWAS SNPs for all diseases/traits and GWAS SNPs for 11 dopamine system traits to distinct genomic regions. GWAS trait density analysis showed a higher density of GWAS variants for dopamine system traits in TNEs active in midbrain dopamine neurons compared to FANTOM5-predicted putative enhancers and ChromHMM-predicted putative enhancers. This density was also higher than in coding exons, promoters, introns, and intergenic regions. ChromHMM enhancers were characterized by a combinatorial pattern of histone modifications in nine human cell types by ENCODE[23]. FANTOM5-predicted enhancers were identified by CAGE[13]. The number of disease-associated variants in each region is shown within each bar. b, The resulting dopamine trait density was higher (18%) in dopamine neuron TNEs than in enhancer predictions that are not cell type-specific, such as FANTOM5- and ChromHMM-predicted enhancers. c, Relative position of GWAS SNPs in dopamine neuron TNEs and their length-matched random regions.

  10. Supplementary Figure 10 Dopamine neuron TNE–eQTL, ncRNA–eQTL, and mRNA–eQTL associations suggest gene-regulatory mechanisms for multiple annotated and novel transcripts expressed from loci causally linked to familial neurologic diseases.

    a, Manhattan plot of eQTL SNPs in association with TNEs (top panel), ncRNAs (middle panel), and mRNAs (bottom panel). SNPs with the best eQTL P-values in the local genomic regions are shown as diamonds. The color of the diamond indicates if the associated gene is related to nervous system disease (red), other Mendelian disease (blue), or non-Mendelian disease (grey), respectively. Thirteen of the 151 cis-regulated TNEs were expressed from host genes mutated in Mendelian diseases (TNE-eQTLs, top Manhattan plot), 10 (69%) of which are nervous system diseases (red diamonds). These TNE host genes included NRXN1 (MIM 600565), linked to rare autosomal recessive mental retardation and susceptibility to schizophrenia, and CACNB4 (MIM 601949), mutated in autosomal dominant types of episodic ataxia and juvenile myoclonic epilepsy. 3,381 ncRNA-eQTLs were significant, comprising combinations of 3,320 unique eSNPs and 52 unique expressed ncRNA genes (ncRNA-eQTLs, middle Manhattan plot). 1,150 mRNA-eQTLs reached statistical significance, comprising combinations of 676 unique eSNPs and 46 unique associated expressed protein-coding genes (mRNA-eQTLs, bottom Manhattan plot). ATP5A1, CRYGC, and ADAMTS18, all of which are mutated in Mendelian diseases, exhibited significant mRNA-eQTL associations in dopamine neurons. P values from the linear regression model implemented in Matrix-eQTL are shown; FDR 0.05, red dashed line. b, TNE-eQTL boxplots visualizing TNE expression values by genotype for eight TNEs physically localized to introns of familial neurologic disease genes. P-values are from the Matrix-eQTL linear regression model; N = 84 biologically independent samples. Box plots as in Fig. 3b.

  11. Supplementary Figure 11 Comparison of eQTL P values with or without adjusting for rs17649553 SNP.

    Each dot represents a SNP-transcript pair. Only pairs of transcripts and SNPs within the MAPT locus (chr17:43000000-45300000 in hg19) are displayed. The y-axis shows the -log10(p-value) of the original eQTL analysis (without conditional analysis), and the x-axis shows the -log10(p-value) of the conditional eQTL analysis after adjusting for the rs17649553 SNP. The horizontal and vertical dashed lines indicate the p-value cutoff at an FDR of 0.05. P-values are from the linear regression model implemented in Matrix-eQTL; N = 84 biologically independent samples. The majority of significant eSNPs were no longer significant after conditioning on rs17649553, except 31 SNPs in the KANSL1 gene (green dots in the top-right corner) and rs17698176 in NSF (red dot in the top-right corner). The 31 SNPs are in the same LD block as rs17649553. Coordinates of LD blocks: LD2, chr17:43657257-44369320; LD3, chr17:44762252-44862613.

  12. Supplementary Figure 12 Confirmation and independent replication of the inverse eQTL relation between the top PD GWAS SNP rs17649553 and KANSL1-TNE1 and LRRC37A4P.

    a, The eQTL relation between rs17649553 and KANSL1-TNE1 and LRRC37A4P was confirmed by a second method, qPCR, in laser-captured dopamine neurons. b, Moreover, the association was replicated in a second, independent population representing pyramidal neurons laser-captured from temporal cortex of 31 high-quality control brains. The geometric mean of the two standard housekeeping genes, EIF4A2 and RPL13, was used to control for input RNA in (a) and (b); P values by two-tailed Student’s t-tests. Mean ± SEM are shown. c,d, Furthermore, the rs17649553-LRRC37A4P eQTL association was confirmed in silico in a third independent population comprising 56 substantia nigra (c) as well as 96 frontal cortex (d) samples from the GTEx data set, which used a polyA+ selection protocol that would not detect KANSL1-TNE1. P values by two-tailed Student’s t-tests. Box plots as in Fig. 3b.

  13. Supplementary Figure 13 Genotyping and eQTL pipelines.

    a, Genotyping pipeline. 93 subjects were initially genotyped on Illumina HumanOmni2.5 Exome BeadChips. After a series of sample and variant quality control steps, 6,124,720 SNPs for each of 91 subjects were retained for downstream eQTL analysis. b, Pipeline for eQTL analysis of TNEs, ncRNAs, and mRNAs across 84 dopamine neuron samples.

  14. Supplementary Figure 14 TNE-calling pipeline.

    a, Dopamine neuron samples (DA) (N = 86) were merged using the trimmed mean of RNA-seq density at each nucleotide position. Pyramidal neuron samples (from motor or temporal cortex; N = 13) were similarly merged. Non-neuronal cell samples (from white blood cells and fibroblasts; N = 7) were similarly merged. Genome-wide scanning using the six-step method identified TNE sets for each group. b, Histogram of RNA-seq density (RPM) across 1,000,000 randomly selected single nucleotides. The histogram was fitted to a normal distribution that was used to determine the local summit cutoff with P = 0.05 (informing step #2 of the TNE identification method).
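
The merging and cutoff steps described for Supplementary Figure 14 can be illustrated with a minimal Python sketch. It is not the study's pipeline: the 10% trim fraction, the one-sided upper tail, fitting the normal distribution directly to untransformed density values, and the function names `trimmed_mean`, `merge_samples`, and `summit_cutoff` are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the values from each tail.

    The 10% default is an assumption; the legend does not state the fraction.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def merge_samples(density_by_sample, trim_fraction=0.1):
    """Merge per-sample RNA-seq density tracks nucleotide by nucleotide.

    density_by_sample holds one equal-length list of densities per sample.
    """
    return [trimmed_mean(column, trim_fraction)
            for column in zip(*density_by_sample)]


def summit_cutoff(background_density, alpha=0.05):
    """Upper-tail cutoff from a normal fit to background density values.

    Assumes a one-sided test on untransformed values (hypothetical choice).
    """
    fit = NormalDist(mean(background_density), stdev(background_density))
    return fit.inv_cdf(1 - alpha)


# Hypothetical toy data: five samples, three nucleotide positions.
samples = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [100, 2, 3], [1, 2, 3]]
merged = merge_samples(samples, trim_fraction=0.2)
cutoff = summit_cutoff([0, 1, 2, 3, 4])
```

With a trim fraction of 0.2 and five samples, the lowest and highest value at each position are dropped, so the single outlier of 100 does not affect the merged track.
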

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–14

  2. Reporting Summary

  3. Supplementary Note and Supplementary Tables 1, 5, 7, 9, 11, 12

  4. Supplementary Table 2

    RNA-seq meta-data.

  5. Supplementary Table 3

    Locations of TNEs called in substantia nigra dopamine neurons (SNDA), pyramidal neurons (PY), and non-neuronal cells (NN), respectively, in BED format. Coordinates refer to the hg19 assembly.

  6. Supplementary Table 4

    Genes expressed specifically in dopamine neurons (SNDA) compared to pyramidal neurons (PY).

  7. Supplementary Table 6

    Primer sequences for human brain, cell culture, and zebrafish experiments.

  8. Supplementary Table 8

    Enrichment analysis showed that TNE host genes are enriched in Gene Ontology terms related to synapse function. The 151 cis-regulated TNEs physically localized to introns of 102 host genes. Gene set enrichment analysis was performed on the C5 gene sets (GO terms) of the MSigDB database using the hypergeometric test. Each gene set contains genes annotated to the same GO term. For each gene set, the hypergeometric test was performed for k-1, K, N - K, n; where k is the number of TNE host genes that are part of a GO term gene set; K is the total number of genes annotated to the same GO term gene set; N is the total number of all known human genes; and n is the number of genes in the query set. The top 50 GO terms enriched in these TNE host genes are shown (all with an FDR q value < 0.05).

  9. Supplementary Table 10

    Summary of the top TNE-eQTLs, ncRNA-eQTLs, and mRNA-eQTLs linked to disease-associated variants. P-values are from Matrix-eQTL linear regression model (n=84 biologically independent samples).
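
The hypergeometric test described for Supplementary Table 8 can be sketched as follows. The argument order k-1, K, N - K, n resembles the convention of R's `phyper` with the upper tail selected, which yields P(X ≥ k); that reading is an assumption, as is the function name `hypergeom_upper_tail`. The gene counts in the usage lines are hypothetical, apart from the 102 host genes stated in the legend.

```python
from math import comb


def hypergeom_upper_tail(k, K, N, n):
    """P(X >= k) for a hypergeometric draw.

    k: query genes that fall in the GO term gene set
    K: genes annotated to the GO term gene set
    N: all known human genes (background size)
    n: genes in the query set
    """
    total = comb(N, n)
    upper = min(K, n)
    # comb(N - K, n - i) is 0 when n - i exceeds N - K, so impossible
    # outcomes contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
p_small = hypergeom_upper_tail(k=2, K=4, N=10, n=3)

# Hypothetical usage: 102 host genes, 12 of them in a 400-gene GO term,
# against an assumed background of 20,000 genes.
p_go = hypergeom_upper_tail(k=12, K=400, N=20000, n=102)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for very small tail probabilities; correction for multiple gene sets (the FDR q values in the table) would be applied afterwards and is not shown.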

About this article

DOI

https://doi.org/10.1038/s41593-018-0223-0