Decoding gene regulation in the fly brain


The Drosophila brain is a frequently used model in neuroscience. Single-cell transcriptome analysis1,2,3,4,5,6, three-dimensional morphological classification7 and electron microscopy mapping of the connectome8,9 have revealed an immense diversity of neuronal and glial cell types that underlie an array of functional and behavioural traits in the fly. The identities of these cell types are controlled by gene regulatory networks (GRNs), involving combinations of transcription factors that bind to genomic enhancers to regulate their target genes. Here, to characterize GRNs at the cell-type level in the fly brain, we profiled the chromatin accessibility of 240,919 single cells spanning 9 developmental timepoints and integrated these data with single-cell transcriptomes. We identify more than 95,000 regulatory regions that are used in different neuronal cell types, of which 70,000 are linked to developmental trajectories involving neurogenesis, reprogramming and maturation. For 40 cell types, uniquely accessible regions were associated with their expressed transcription factors and downstream target genes through a combination of motif discovery, network inference and deep learning, creating enhancer GRNs. The enhancer architectures revealed by DeepFlyBrain lead to a better understanding of neuronal regulatory diversity and can be used to design genetic driver lines for cell types at specific timepoints, facilitating their characterization and manipulation.


Fig. 1: Chromatin landscape of adult brain cell types.
Fig. 2: Chromatin changes through neuronal development.
Fig. 3: Identification of regulators through multi-omic data integration.
Fig. 4: DL analysis unravels enhancer make-up.
Fig. 5: eGRNs identify cell-type-specific activators and repressors.

Data availability

The data generated for this study have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession numbers GSE163697 and GSE181494 (DGRP lines). We also provide a dedicated website to browse the results of the analyses and the processed data, which provides link-outs to the SCope session, the UCSC hub, the eGRNs in NDEx, the DeepExplainer plots of enhancers and other information. The following online databases were used: FlyBase, FlyMine, i-cisTarget, FlyLight, CIS-BP and ENCODE (ENCFF704WGH). The following publicly accessible datasets were also used: GSE107451 (scRNA-seq adult brain), GSE157202 (scRNA-seq larval brain) and GSE101581 (scATAC-seq embryo). The neural network is from Özel et al.5.

Code availability

The updated version of cisTopic for scATAC-seq clustering and topic identification, including WarpLDA, is available at GitHub with set-up instructions and a tutorial. The Nextflow pipeline for scRNA-seq analysis is available at GitHub together with example config files and instructions. DeepFlyBrain is deposited in Kipoi, and the Jupyter notebooks that can be used to train the model are provided in Supplementary Data 35. Enhancer-gene links can be calculated using ScoMAP and GENIE3. Trajectory analysis was performed using Monocle3 according to the package tutorials. Differential expression and accessibility analyses and the integration of RNA-seq and ATAC-seq data were performed using Seurat v.3 (vignettes and installation instructions are available from the package website). TaDa analysis was performed using Perl scripts available at GitHub. Code for the website and the analysis notebooks are also available at GitHub.


References

  1. Li, H. et al. Classifying Drosophila olfactory projection neuron subtypes by single-cell RNA sequencing. Cell 171, 1206–1220 (2017).


  2. Davie, K. et al. A single-cell transcriptome atlas of the aging Drosophila brain. Cell 174, 982–998 (2018).


  3. Konstantinides, N. et al. Phenotypic convergence: distinct transcription factors regulate common terminal features. Cell 174, 622–635 (2018).


  4. Croset, V., Treiber, C. D. & Waddell, S. Cellular diversity in the Drosophila midbrain revealed by single-cell transcriptomics. eLife 7, e34550 (2018).


  5. Özel, M. N. et al. Neuronal diversity and convergence in a visual system developmental atlas. Nature 589, 88–95 (2020).


  6. Kurmangaliyev, Y. Z., Yoo, J., Valdes-Aleman, J., Sanfilippo, P. & Zipursky, S. L. Transcriptional programs of circuit assembly in the Drosophila visual system. Neuron 108, 1045–1057 (2020).


  7. Costa, M., Manton, J. D., Ostrovsky, A. D., Prohaska, S. & Jefferis, G. S. X. E. NBLAST: rapid, sensitive comparison of neuronal structure and construction of neuron family databases. Neuron 91, 293–311 (2016).


  8. Zheng, Z. et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell 174, 730–743 (2018).


  9. Scheffer, L. K. et al. A connectome and analysis of the adult Drosophila central brain. eLife 9, e57443 (2020).


  10. Jenett, A. et al. A GAL4-driver line resource for Drosophila neurobiology. Cell Rep. 2, 991–1001 (2012).


  11. Robie, A. A. et al. Mapping the neural substrates of behavior. Cell 170, 393–406 (2017).


  12. Ravenscroft, T. A. et al. Drosophila voltage-gated sodium channels are only expressed in active neurons and are localized to distal axonal initial segment-like domains. J. Neurosci. 40, 7999–8024 (2020).


  13. Konstantinides, N. et al. A comprehensive series of temporal transcription factors in the fly visual system. Preprint at (2021).

  14. Allen, A. M. et al. A single-cell transcriptomic atlas of the adult Drosophila ventral nerve cord. eLife 9, e54074 (2020).


  15. Doe, C. Q. Temporal patterning in the Drosophila CNS. Annu. Rev. Cell Dev. Biol. 33, 219–240 (2017).


  16. Estacio-Gómez, A., Hassan, A., Walmsley, E., Le, L. W. & Southall, T. D. Dynamic neurotransmitter specific transcription factor expression profiles during Drosophila development. Biol. Open 9, bio052928 (2020).


  17. Komiyama, T., Johnson, W. A., Luo, L. & Jefferis, G. S. X. E. From lineage to wiring specificity. POU domain transcription factors control precise connections of Drosophila olfactory projection neurons. Cell 112, 157–167 (2003).


  18. Kurmangaliyev, Y. Z., Yoo, J., LoCascio, S. A. & Zipursky, S. L. Modular transcriptional programs separately define axon and dendrite connectivity. eLife 8, e50822 (2019).


  19. Schilling, T., Ali, A. H., Leonhardt, A., Borst, A. & Pujol-Martí, J. Transcriptional control of morphological properties of direction-selective T4/T5 neurons in Drosophila. Development 146, dev169763 (2019).


  20. Masserdotti, G., Gascón, S. & Götz, M. Direct neuronal reprogramming: learning from and for development. Development 143, 2494–2510 (2016).


  21. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).


  22. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).


  23. Bravo González-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397–400 (2019).


  24. Kirilly, D. et al. A genetic pathway composed of Sox14 and Mical governs severing of dendrites during pruning. Nat. Neurosci. 12, 1497–1505 (2009).


  25. Atak, Z. K. et al. Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning. Genome Res. 31, 1082–1096 (2021).


  26. Minnoye, L. et al. Cross-species analysis of enhancer logic using deep learning. Genome Res. 30, 1815–1834 (2020).


  27. Avet-Rochex, A., Maierbrugger, K. T. & Bateman, J. M. Glial enriched gene expression profiling identifies novel factors regulating the proliferation of specific glial subtypes in the Drosophila brain. Gene Expr. Patterns 16, 61–68 (2014).


  28. Crittenden, J. R., Skoulakis, E. M. C., Goldstein, E. S. & Davis, R. L. Drosophila mef2 is essential for normal mushroom body and wing development. Biol. Open 7, bio035618 (2018).


  29. Minocha, S., Boll, W. & Noll, M. Crucial roles of Pox neuro in the developing ellipsoid body and antennal lobes of the Drosophila brain. PLoS ONE 12, e0176002 (2017).


  30. Davis, F. P. et al. A genetic, genomic, and computational resource for exploring neural circuit function. eLife 9, e50901 (2020).


  31. Naidu, V. G. et al. Temporal progression of Drosophila medulla neuroblasts generates the transcription factor combination to control T1 neuron morphogenesis. Dev. Biol. 464, 35–44 (2020).


  32. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30, 4765–4774 (2017).


  33. Shrikumar, A. et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version Preprint at (2020).

  34. Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).


  35. Southall, T. D. et al. Cell-type-specific profiling of gene expression and chromatin binding without cell isolation: assaying RNA Pol II occupancy in neural stem cells. Dev. Cell 26, 101–112 (2013).


  36. Mackay, T. F. C. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173–178 (2012).


  37. Jacobs, J. et al. The transcription factor Grainy head primes epithelial enhancers for spatiotemporal activation by displacing nucleosomes. Nat. Genet. 50, 1011–1020 (2018).


  38. Southall, T. D., Davidson, C. M., Miller, C., Carr, A. & Brand, A. H. Dedifferentiation of neurons precedes tumor formation in lola mutants. Dev. Cell 28, 685–696 (2014).


  39. Yang, J., Ramos, E. & Corces, V. G. The BEAF-32 insulator coordinates genome organization and function during the evolution of Drosophila species. Genome Res. 22, 2199–2207 (2012).


  40. Trevino, A. E. et al. Chromatin accessibility dynamics in a model of human forebrain development. Science 367, eaay1645 (2020).


  41. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).


  42. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).


  43. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).


  44. Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).


  45. Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 432–439 (2018).


  46. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).


  47. Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol. 26, 1063–1070 (2019).


  48. Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).


  49. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).


  50. Gramates, L. S. et al. FlyBase at 25: looking to the future. Nucleic Acids Res. 45, D663–D671 (2017).


  51. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).


  52. Herrmann, C., Van de Sande, B., Potier, D. & Aerts, S. i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res. 40, e114 (2012).


  53. Chen, J., Li, K., Zhu, J. & Chen, W. WarpLDA: a cache efficient O(1) algorithm for latent dirichlet allocation. Proc. VLDB Endow. 9, 744–755 (2016).


  54. Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).


  55. De Waegeneer, M., Flerin, C. C., Davie, K. & Hulselmans, G. vib-singlecell-nf/vsn-pipelines: v0.26.1. Zenodo (2021).

  56. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).


  57. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).


  58. Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).


  59. Stanescu, D. E., Yu, R., Won, K.-J. & Stoffers, D. A. Single cell transcriptomic profiling of mouse pancreatic progenitors. Physiol. Genom. 49, 105–114 (2017).


  60. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).


  61. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).


  62. Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).


  63. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).


  64. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).


  65. Shih, M.-F. M., Davis, F. P., Henry, G. L. & Dubnau, J. Nuclear transcriptomes of the seven neuronal cell types that constitute the Drosophila mushroom bodies. G3 9, 81–94 (2019).


  66. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).


  67. Aronesty et al. ea-utils: ‘Command-line tools for processing biological sequencing data’. (2011).

  68. Imrichová, H., Hulselmans, G., Kalender Atak, Z., Potier, D. & Aerts, S. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res. 43, W57–W64 (2015).


  69. Aughey, G. N., Delandre, C., McMullen, J. P. D., Southall, T. D. & Marshall, O. J. FlyORF-TaDa allows rapid generation of new lines for in vivo cell-type-specific profiling of protein-DNA interactions in Drosophila melanogaster. G3 11, jkaa005 (2021).


  70. Marshall, O. J., Southall, T. D., Cheetham, S. W. & Brand, A. H. Cell-type-specific profiling of protein-DNA interactions without cell isolation using targeted DamID with next-generation sequencing. Nat. Protoc. 11, 1586–1598 (2016).


  71. Marshall, O. J. & Brand, A. H. damidseq_pipeline: an automated pipeline for processing DamID sequencing datasets. Bioinformatics 31, 3371–3373 (2015).


  72. Aerts, S. et al. Robust target gene discovery through transcriptome perturbations and genome-wide enhancer predictions in Drosophila uncovers a regulatory basis for sensory specification. PLoS Biol. 8, e1000435 (2010).


  73. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).


  74. Kudron, M. M. et al. The ModERN resource: genome-wide binding profiles for hundreds of Drosophila and Caenorhabditis elegans transcription factors. Genetics 208, 937–949 (2018).


  75. Davis, C. A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).


  76. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).


  77. Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).


  78. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Preprint at arXiv (2019).

  79. Bravo González-Blas, C. et al. Identification of genomic enhancers through spatial integration of single-cell transcriptomics and epigenomics. Mol. Syst. Biol. 16, e9438 (2020).


  80. Frith, M. C., Li, M. C. & Weng, Z. Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 31, 3666–3668 (2003).


  81. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).


  82. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).


  83. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).


  84. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).


  85. Cusanovich, D. A. et al. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555, 538–542 (2018).


  86. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python Science Conf. 92–96 (2010).

  87. De Rop, F. V. et al. HyDrop: droplet-based scATAC-seq and scRNA-seq using dissolvable hydrogel beads. Preprint at (2021).



Acknowledgements

We thank the staff at the Janelia FlyLight Project for publicly providing images and reporter lines to assess enhancer activity in the CNS in Drosophila; F. Pinto-Teixeira for providing the ato-Gal4 and acj6/TfAP-2 RNAi lines; and the members of the Aerts laboratory for discussions and for reading the manuscript. This work is funded by the following grants to S. Aerts: ERC Consolidator Grant (724226_cis-CONTROL), the Special Research Fund (BOF) KU Leuven (grant C14/18/092) and the FWO (grants G0C0417N, G094121N). J.J., C.B.G.-B., F.V.D.R. and D.P. are supported by a PhD fellowship of The Research Foundation, Flanders (FWO, 1199518N; 11F1519N; 1S80920N; 1S75219N). 10x Chromium was partially made available through VIB Tech Watch Funding. Imaging, FACS and single-cell analyses were supported by the light microscopy, FACS and single-cell expertise units at the VIB-KU Leuven Center for Brain and Disease Research. Computing was performed at the Vlaams Supercomputer Center (VSC). Stocks obtained from the Bloomington Drosophila Stock Center were used in this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations



Contributions

S. Aerts, J.J., S. Aibar and I.I.T. conceived the study. J.J., S. Aibar, I.I.T. and D.P. performed computational analyses with assistance from K.I.S., C.B.G.-B., G.H. and M.D.W. S.M. and V.C. performed scATAC-seq experiments. J.N.I., J.J. and S.M. performed FACS and omniATAC-seq. J.N.I., X.J.Q., J.J. and S.M. performed antibody staining and visualization. X.J.Q., V.C. and J.N.I. performed the cloning of selected enhancers. S.M. performed omniATAC on DGRP lines. J.N.I., J.J. and F.V.D.R. performed HyDrop-ATAC experiments. V.C. and J.N.I. performed CUT&Tag experiments. A.E.G., G.A. and T.S. performed TaDa experiments. M.D. and K.G. generated the Mef2-Dam line. S. Aibar created the website with assistance from G.H., D.P. and K.S. S. Aerts, J.J., S. Aibar and I.I.T. wrote the manuscript.

Corresponding author

Correspondence to Stein Aerts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature thanks Andrew Adey and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Global analysis and adult clustering approaches.

a. UMAP of global cisTopic analysis (150k cells shown), coloured by region accessibilities near elav (red, neurons), repo (green, glia) and dpn (blue, neuroblasts). b. Overview of regions shown in a for representative cell types (Kenyon cells for neurons, astrocytes for glia, optic lobe neuroblasts for neuroblasts). c. Distribution of cells per timepoint in the global UMAP. Timepoints jointly analysed in the upcoming sections (green: early timepoints, blue: late timepoints) are grouped by borders. d. Spearman correlation of the top 1000 variable regions across timepoints, separating early timepoints (ET), middle timepoints (MT) and late timepoints (LT). e. UMAP after timepoint correction with Harmony (coloured by timepoint: ET in green, LT in blue). f. t-SNE of the 60k cells in the adult cell types analysis (LT). Central-brain-only runs allow clusters to be annotated according to their location (central brain (CB) and optic lobe (OL)). Subclustering was performed by splitting cells into CB, OL and glia; note that Kenyon cells, plasmatocytes and photoreceptors were not included. g. Subclustering of OL neurons leads to 58 subclusters, including a further split of T4/T5 neurons. h. Subclustering of CB neurons reveals 51 subtypes. Note that the S-shaped separation of pros+ cells and Imp+ cells is retained. i. Subclustering of glia reveals 16 subtypes. j. Clustering 88k cells from 48 h APF to adult provides three extra clusters enriched for younger cells (young only: circle, young enriched: arrows), but does not increase the resolution of the adult cell types. k. Clustering using the ArchR pipeline on 56k cells from 48 h APF to adult, leading to 90 clusters. UMAPs are shown after Harmony batch effect correction. l. Heatmap showing the correspondence between clusters from cisTopic and ArchR. Note that cisTopic clusters are merged across age, in contrast to ArchR clusters, leading to multiple ArchR clusters (different ages) mapping to one cisTopic cluster.
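The comparison in panel d (Spearman correlation over the most variable regions) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of population variance to rank regions, and the dictionary layout of the input are assumptions made for illustration only.

```python
from statistics import pvariance


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top_variable_regions(matrix, k):
    """matrix maps region name -> accessibility per timepoint (hypothetical layout)."""
    return sorted(matrix, key=lambda r: pvariance(matrix[r]), reverse=True)[:k]


def timepoint_correlations(matrix, k):
    """Pairwise Spearman correlation between timepoints over the top-k variable regions."""
    regions = top_variable_regions(matrix, k)
    n_timepoints = len(next(iter(matrix.values())))
    profiles = [[matrix[r][t] for r in regions] for t in range(n_timepoints)]
    return [[spearman(profiles[a], profiles[b]) for b in range(n_timepoints)]
            for a in range(n_timepoints)]
```

With real data, `k` would be 1000 as in the panel, and the resulting matrix is what a heatmap such as panel d would display.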

Extended Data Fig. 2 Integration of scRNA-seq and snATAC-seq.

a. Calculation of gene-accessibility scores using a weighted sum of regions in the gene body and up to 5 kb upstream of the TSS. Weights decrease exponentially with distance from the TSS (constant in the gene body), and increase with higher Gini (variability) coefficients. b. Gene expression and gene accessibility display a similar pattern for many genes (6 examples shown), which can be used to transfer cell type annotations across modalities (black lines). c. Overview of the annotation methods used. Main cell types are consistently detected with each method, while low-confidence matches are method specific. d. Annotated t-SNE of the transcriptomes of 118k adult cells. e. Integrated t-SNE of scRNA-seq and snATAC-seq using Seurat’s co-clustering. f. Gene set enrichment of marker genes using AUCell on the gene-accessibility matrix, revealing matches per cell type and per major cell type group (glia, optic lobe neurons, central brain neurons, Kenyon cells, photoreceptors and plasmatocytes). g. Scatterplot showing the number of marker genes against the number of cells in the scRNA-seq dataset. Matched cell types between RNA and ATAC are coloured, unmatched are shown in grey. h. Scatterplot showing the number of marker genes in scRNA-seq against the number of DARs for the matched cell types, with glia having the highest number for both. i. Heatmap of DARs per cluster. j. Bar plot showing the number of DARs per cluster.
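The weighting scheme described in panel a can be sketched as follows. The caption gives only the qualitative shape of the weights, so the decay length (`decay_bp`), the `1 + Gini` multiplier and all function names below are hypothetical choices for illustration, not the formula used in the study.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = uniform, towards 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def region_weight(distance_upstream_bp, gini_coeff, decay_bp=1000.0):
    """Weight of one region.

    distance_upstream_bp is 0 for regions inside the gene body (constant weight);
    upstream regions decay exponentially with distance from the TSS.
    decay_bp and the (1 + gini) factor are assumptions.
    """
    if distance_upstream_bp > 0:
        distance_weight = math.exp(-distance_upstream_bp / decay_bp)
    else:
        distance_weight = 1.0
    return distance_weight * (1.0 + gini_coeff)


def gene_accessibility_score(regions, max_upstream_bp=5000):
    """Weighted sum over regions given as (accessibility, distance_upstream_bp, gini) tuples.

    Regions further upstream than max_upstream_bp (5 kb in the caption) are ignored.
    """
    return sum(
        accessibility * region_weight(distance, g)
        for accessibility, distance, g in regions
        if distance <= max_upstream_bp
    )
```

In this sketch the Gini coefficient would be computed per region across cells or clusters, so that regions with cell-type-restricted accessibility contribute more to the gene score than ubiquitously open ones.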

Extended Data Fig. 3 FAC-sorted cell types match single-cell aggregates.

a. Overview of bulk ATAC-seq on three sorted Kenyon cell (KC) populations: (top) Confocal images of KC subtypes targeted with split-GAL4 lines (p: posterior, m: medial, d: dorsal); (middle) Average accessibility of the top 100 differential peaks from sorted cell types projected on the single-cell ATAC t-SNE; (bottom) Locus of three marker genes, showing similarity between bulk ATAC-seq (black) and the aggregated scATAC-seq profiles. b–e. Heatmap showing the Spearman’s correlation of the FAC-sorted samples and single-cell aggregates over different sets of regions. Matching samples and aggregates are shown in bold. f. Subclustering of T4 and T5 neurons identifies the a/b and c/d subtypes, with differential regions near marker genes TfAP-2 and bi. g. Locus of marker genes showing differential peaks between T4 and T5 neurons (top, TfAP-2, range: 0.8-252) and between a/b and c/d subtypes (bottom, bi, range: 0.2-120). h. Experimental overview: GH146 was used to drive GFP expression in OPNs, followed by FACS and scATAC-seq. i. UMAP showing 309 cells kept after filtering, forming 6 clusters. j. Visualization of gene accessibility near OPN markers, showing heterogeneity between clusters. k. 17k peaks were identified in the OPNs, of which 4.5k are unique, and of which 876 are near OPN marker genes (184 near highly expressed positive markers). These peaks were not found in the consensus peaks (CP) as shown in the tracks. l. Co-clustering UMAP of sorted OPNs and 10x scATAC data, with sorted OPNs shown in yellow (top), corresponding to cluster 37 which contains 10 cells of the 10x data, scattered from multiple clusters (bottom).

Extended Data Fig. 4 Chromatin landscapes of progenitors and developing neurons.

a. An SVM classifier was used to propagate the adult cell type labels to earlier stages in development. The classifier also included the progenitor cell types from the developmental analysis (purple and dark green colours). b. Proportion of cell types at each timepoint. c. Chromatin landscape for T1 neurons, showing highly dynamic opening and closing of peaks during development. A core set remains accessible at all times, of which a subset is specific to T1 neurons. d. Examples of regions with different developmental dynamics for T1 neurons. e. Bar plot showing the number of core regions identified per cell type. Dark colours show specific core regions (core-DARs). f. Number of DARs calculated per cell type (downsampled to 75 cells) for every timepoint, showing a decline over time. The arrow marks a small increase at 48 h APF during synaptogenesis. The box plot marks the median (red line), upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually, n=74,77,78,77,78,77,78,75,76 cell types. g. Progenitor cell types show specific marker accessibility, while neurons show accessibility in adult specific regions (Awhey). h. Number of DARs per cell type in the early development dataset, revealing a lower number for progenitors (purple shades). i–j. Trajectory from optic lobe neuroepithelium (ONE) to lamina progenitor cells (LPC) and optic lobe neuroblasts (OL NB) using (i) scATAC-seq and (j) scRNA-seq. Heatmap shows dynamic chromatin accessibility modules with enriched motifs (NES score shown) and line plot shows expression profiles for predicted master regulators. k. Specific comparison of different progenitor cell types detects thousands of differential regions, with motif enrichment of key TFs. l. In vivo reporter assay of a cloned ONE enhancer driving GFP. m. Optic lobe and central brain branches in 3D-UMAP. n. Central brain and VNC duality between Imp and pros traced through development. Standardized mean accessibility of Imp regions (n=128) and pros regions (n=166) is plotted for different developmental stages. Dati (AAAAAA) motifs and Pros ChIP-seq peaks (embryo, ModERN) are enriched in Imp regions where they are not expressed (grey) and vice versa, suggesting a chromatin closing role. o. AUCell enrichment scores of branch-specific regions for adult OL clusters (box plot marks the median (red line), upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually; number of cells between brackets). Candidate TFs expressed per branch are shown (TFs with matching motifs in bold).
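The box plot convention used in panels f and o (median, quartiles, whiskers at 1.5× the interquartile range, outliers drawn individually) can be made concrete with a small sketch. It assumes the common Tukey convention, in which whiskers end at the most extreme data points still within 1.5× IQR of the quartiles; the caption does not state which quartile method was used, so the `inclusive` method below is an assumption, and the function name is hypothetical.

```python
from statistics import median, quantiles


def box_plot_summary(values, whisker=1.5):
    """Summary statistics for a Tukey-style box plot (needs at least two values)."""
    data = sorted(values)
    # Quartile method is an assumption; plotting libraries differ slightly here.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up counts for illustration only (not data from the study).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here the value 100 lies beyond the upper fence and would be drawn as an individual outlier, while the upper whisker stops at 9.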

Extended Data Fig. 5 Cistromes overview.

a. TF expression vs motif enrichment for a selection of TFs. b. Heatmap of number of regions per cell type which present motif enrichment for a TF, coloured according to the TF expression-motif enrichment correlation (red: positive correlation, blue: negative correlation). Note that in contrast to c this heatmap does not require the TF to be expressed in the given cell type. c. Dot-heatmap including all available "chromatin-opening" cistromes (full version of Fig. 3c).

Extended Data Fig. 6 Deep learning predicts de novo key transcriptional activators and repressors.

a. t-SNE from the cisTopic analysis on the subset of 15 cell types used for the deep learning (DL) analysis. b. Accessibility of topic regions near marker genes. Calculated as the average region probability for topic-regions linked to each set of marker genes (markers from the transcriptome atlas). c. Comparison of topic coherence and DL classification performance (area under ROC-curve (auROC) for the classification of the left-out test regions). The topic coherence represents how likely the regions of the topic will co-occur (higher values are better). d. Box plot of TF motif enrichment in the topics (average enrichment score) split by the topic annotation (i.e., to one cell type, to multiple cell types, or marked as low contribution). The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually. Number of topics per category shown above plot. e. Topic heatmap showing cell-type specific topics. Bar plots show the number of regions per topic (cutoff p=0.995) and area under the precision recall curve (auPR) of the DL model. f. Contributions of the patterns identified by DeepFlyBrain to classify glial regions reveal activators and repressors (negative nucleotide importance). These motifs can be matched to known factors with concordant expression. g–h. Conservation of the regions centred by the motif (blue) or ATAC peak (orange) for (g) KC and (h) T neuron motif instances. The location of the motifs is shown with dashed lines. i. Heatmap showing Jaccard index between TF binding site predictions from DL and regions from conventional motif discovery. j. Box plots showing higher conservation for overlapping regions compared to deep-learning only regions. The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually. Number of enhancers per category shown above plot. k. 
Box plots showing higher enhancer-gene link scores for overlapping regions compared to deep-learning only regions. The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually. Number of enhancers per category shown above plot. l. Bulk ATAC-seq was performed on brains of 44 different genotypes leading to the identification of caQTLs. caQTLs affecting each of 28k motifs (adjusted p-value of Fisher test versus difference of number of motifs increasing/decreasing accessibility, see Methods). Dots in the same colour affect similar motifs (black: not-significant). m. The fraction of caQTLs predicted to affect chromatin accessibility at different false positive rates (random SNPs). The 5% false positive rate is shown as a dashed grey line. n–p. Effect of SNPs in Mamo (n), Lola-PF (o) and Lola-N (p) motifs on chromatin accessibility. Top-left: DeepExplainer plot for the reference (G) and alternative allele (T), showing a loss of a repressor site for Mamo and Lola-N, and a gain of a repressor site for Lola-PF. Top-right: Candle plots showing predicted accessibility change caused by the SNP for different cell types (increase shown in blue, decrease in red). Bottom-left: Box plot showing bulk accessibility of 44 DGRP lines, split by genotype at this SNP, highlighting an increase in accessibility for the alternative allele. The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); all data points are shown. Number of genotypes associated with either reference (Ref) or alternative (Alt) alleles shown. Bottom-right: Single-cell aggregates over the SNP. q. Overexpression of lola isoform N (lola-N) in glia (repo driver) versus neurons (elav driver, control) leads to the closing of 250 regions with the GATC motif. r. Example of a region in perineurial glia (PNG) and subperineurial glia (SUB), that closes upon overexpression of lola-N in glia. 
The region is also part of the PNG eGRN (see Fig. 5). s. caQTLs affecting DL motifs. Column nUP/nDw: Number of SNPs overlapping with the motif which produce an increase/decrease of accessibility. The FDR is checked on 1000 random caQTLs with the same number of SNPs (i.e., ey: take 6 random caQTLs, 1000 times, and see how many of the 1k repetitions have at least 5 SNPs increasing accessibility).
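The resampling check described in panel s can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `empirical_fdr`, the representation of each caQTL as a signed effect (+1 for increased, -1 for decreased accessibility) and the balanced toy background are all assumptions made for the example.

```python
import random


def empirical_fdr(background_effects, n_snps, n_up_observed,
                  n_repeats=1000, seed=0):
    """Fraction of random draws that reach the observed number of
    accessibility-increasing SNPs.

    background_effects: signed effects of all caQTLs (hypothetical encoding:
                        > 0 means the SNP increases accessibility).
    n_snps:             number of SNPs overlapping the motif of interest.
    n_up_observed:      observed number of those SNPs increasing accessibility.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        # Draw as many random caQTLs as there are SNPs in the motif.
        sample = rng.sample(background_effects, n_snps)
        n_up = sum(1 for effect in sample if effect > 0)
        if n_up >= n_up_observed:
            hits += 1
    return hits / n_repeats


# Toy background (assumption): 500 caQTLs up, 500 down.
background = [1] * 500 + [-1] * 500

# Mirrors the legend's ey example: 6 SNPs, at least 5 increasing accessibility.
fdr_ey = empirical_fdr(background, n_snps=6, n_up_observed=5)
print(f"empirical FDR: {fdr_ey:.3f}")
```

The same function applies to the nDw column by flipping the sign of the effects before calling it.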

Extended Data Fig. 7 Overview of TF binding and perturbation experiments.

a. Table showing a summary of all TF binding/perturbation experiments indicating number of affected regions, top motif enrichment and overlap with deep learning binding sites. b. CUT&Tag signal of Repo is enriched over predicted Repo binding sites. c. CUT&Tag signal of Ey is enriched over predicted Ey binding sites. d. Optimization of CUT&Tag for Ey, finding a combination of Pitstop and higher Tn5 concentration to increase the number of regions detected and improve motif enrichment. e. The genomic region near trio contains peaks for both glia and Kenyon cells, with predicted binding sites for Ey and Repo. Ey and Repo CUT&Tag data are normalized against each other, showing biases to the Ey side over Ey binding sites and to the Repo side over Repo binding sites. f. TaDa coverage of Mef2 in optic lobe (Tm1) and Kenyon cells (left); and of Mef2 in γ-KC for predicted Mef2 regions in the optic lobe and the γ-KC. g. Venn diagrams showing overlap of TaDa experiments for Mef2 (left) and Acj6 (right). Motif enrichment is shown with Mef2 motifs enriched in all overlaps of Mef2, but no Acj6 enrichment in unique regions for acj6-TaDa. Strongest enrichments are found for common regions. h. Summary of RNAi results of Fig. 3f (knockdown in γ-KC and T4/T5 neurons). Bar plots show affected regions in both directions upon knock-down, with expected direction accentuated. Enriched motifs for the unexpected direction are shown. i–k. Results of hypergeometric tests of the overlap of TF knockdown affected regions for cistromes (i), cell-type specific cistromes (j) and deep learning binding sites (k). l. Mamo RNAi ATAC peaks from γ-KC are enriched for α/β Kenyon cell regions compared to WT or other knockdowns. m. Examples of three loci where Mamo RNAi has led to increased accessibility of α/β Kenyon cell regions in γ-KC.
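As an illustration of the overlap tests in panels i–k, a one-sided hypergeometric p-value can be computed with the standard library alone. The sketch below is an assumption about how such a test is typically set up; the function name, argument names and the example numbers are hypothetical, and the choice of background universe is not specified in the legend.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) for the overlap of two region sets.

    universe: total number of regions considered (background).
    set_a:    number of regions in the first set (e.g. a cistrome).
    set_b:    number of regions in the second set (e.g. knockdown-affected).
    overlap:  observed number of regions shared by both sets.
    """
    max_k = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical numbers: 1,000 background regions, 100 in the cistrome,
# 50 affected by the knockdown, 20 shared.
p = hypergeom_overlap_pvalue(universe=1000, set_a=100, set_b=50, overlap=20)
print(f"p = {p:.3g}")
```

Exact integer arithmetic keeps the result stable for small p-values; for genome-scale universes a library routine such as a survival function from a statistics package would be faster.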

Extended Data Fig. 8 Enhancers selected by accessible regions generate novel driver lines.

a. Peak in the overlap of two existing Kenyon Cell driver lines recapitulates KC expression. b. DeepExplainer view of the selected element from (a) showing Ey and Mef2 binding sites. c. Existing non-specific driver lines can be broken up into separate, more specific drivers for KC and glia using cell-type specific ATAC-peak signals. d. In-silico overlap of ATAC-peak signals resembles that of in-vitro split-GAL4 lines for T4/T5 neurons. Images courtesy of the Janelia FlyLight Project. e. 63 adult enhancers were selected and cloned into a construct, flanked by gypsy insulators (GI) driving either direct GFP or GAL4 expression from the Hsp70 promoter. Selected peaks have a median size of 580 bp (direct) or 626 bp (GAL4). f. Overview of GFP expression in different cloned enhancers (GAL4 enhancers were crossed with UAS-nlsGFP). Red numbers point to enhancer-IDs (Supplementary Table 8). Green numbers are scores of enhancer activity in the predicted cell type (0: no activity, 1: low, 2: high). g. Bar plots showing validation rate for GFP expression within Kenyon cells (KC), optic lobe (OL), glia (G). Mixed (M) and Negative (N) bar plots are shown as controls. Dark colours mean high expression (2), light colours mean faint expression (1). h. ROC curve showing the performance of different metrics to predict OL activity of the 64 cloned enhancers (including developmental enhancer). i. ROC curve showing the performance of different metrics to predict glial activity of the 64 cloned enhancers (including developmental enhancer).
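The area under a ROC curve such as those in panels h and i can be computed directly from scores and binary activity labels. The following is a minimal sketch using the rank-based (Mann-Whitney) formulation; the function name and the toy scores are hypothetical and do not come from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted values for each enhancer (any metric).
    labels: truthy if the enhancer was active in the cell type, else falsy.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive enhancer")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example (assumption): four enhancers, two of them active.
example_auc = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
print(example_auc)
```

The pairwise form is quadratic in the number of enhancers, which is negligible for a set of 64 cloned enhancers.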

Extended Data Fig. 9 DeepFlyBrain accurately predicts effects of mutations.

a–e. Analysis of cloned enhancers near (a) gish, (b) Appl, (c) Bx, (d) Pkc53E, and (e) CG15117. Accessibility profiles of the loci, DL prediction scores for the WT and mutated (mut) sequences, nucleotide importance scores and in-silico saturation mutagenesis assays, and in vivo enhancer activity of the cloned sequences are shown as in Fig. 4a, b. f, g. In vivo enhancer activity and nuclei count of the WT region and the region with mutated repressors near (f) sNPF and (g) Appl. The expected nuclei count after destroying the repressors is shown as a dashed grey line (20% increase). The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); all data points are shown (one-sided Mann-Whitney U test). Number of measured brains shown.

Extended Data Fig. 10 Gene expression and region accessibility correlation can be exploited to build eGRNs, and classify TF roles according to their network activity.

a. Proportion of BEAF-32 ChIP-seq peaks that have a high-scoring BEAF-32 motif, and are accessible in the fly brain. Despite being performed on whole embryo (0-14 h, mixed sex), most of the motif containing peaks are ubiquitously accessible across the brain. b. Distance to the closest BEAF-32 peak with motif upstream and downstream of each gene. Most of the genes (86%) are within 50kb of a BEAF-32 peak (46% are between two peaks within 50kb, and 88% within 200kb). For genes further than 50kb, expanding the search space from 50kb to 200kb adds a median of 2 extra links. c. View of genes, topological associated domains (TADs), genomic regulatory blocks (GRBs), BEAF-32 ChIP-seq peaks and BEAF-32 defined search spaces near the locus of Imp (green), ras (red), Ant2 (yellow) and feo (pink). The lowest track shows the search space for each gene (i.e., the region between the first two BEAF-32 peaks within 200kb of the transcript, skipping 500bp around the TSS; if there are no peaks within 200kb, 50kb is kept as search space). d. Box plots showing higher correlation for enhancer-gene links contained within BEAF-32 domains compared to links outside. Links that cross boundaries have a higher anticorrelation. The box plot marks the median, upper and lower quartiles and 1.5× interquartile range (whiskers); outliers are shown individually. Number of links per category is shown with one-sided Mann-Whitney U test. P-value below numerical precision equals 0. e. Overview of selected tracks in the Pkc53E locus; only the enhancer-gene links between two BEAF-32 peaks (green bar) are kept. The grey/blue regions on top are the pre-defined regulatory regions used for the analysis, dark blue indicates a region with link to the gene, grey regions are not differentially accessible. f. 
Regulatory region selection for Pkc53E; the inset shows accessibility of the top region versus gene expression (input to the random-forest), regions with a weight above the threshold are linked to the gene. On the right: t-SNEs showing the gene expression and accessibility, and the resulting network. g. Scatterplot showing the correlation (Pearson’s r=0.4, two-sided test, p=0.001, n=64) between in vivo enhancer reporter activity (GFP score) and the strength of the enhancer-gene link (correlation). Linear regression and 95% confidence interval are shown. h–i. Correlation of gene expression (h.) and TF gene expression (i.) with aggregated accessibility profiles at TSS, averaged gene-accessibility, and averaged accessibility of regions with positive links. Red line shows linear fit, with orange boundaries as the 95% confidence interval. Note the regions near the TSS that have high accessibility but do not lead to gene expression (highlighted in blue) and the increase in performance for the gene-accessibility score in the TF expression, while overall the highest correlation is reached with links. j. Overview of eGRN expression across different cell types. For each TF, the first row is TF expression, with below a heatmap value of normalized enrichment score of the eGRN (NES, gene-set enrichment analysis of eGRN target genes on genes ranked by FC in each cell type). Note that chromatin-repressing eGRNs are less validated, and represent lower confidence. k. Heatmap of number of genes in the TF-eGRNs (in all cell types) split by cistrome type and gene-link correlation (i.e., indicating potential activator/repressor roles). Canonical activators have most of their targets in opening-cistrome with positive-links. Most of the potential repressors are repressors of chromatin (e.g., closing enhancers), rather than opening repressive regions (i.e., regions with negative links). 
Only 4 TFs have a higher number of targets with the negative links: Fkh, Acj6, Oli and Ftz-f1 (cell-type dependent). TFs with an asterisk use 0.20 TF expression-motif correlation threshold. Bold highlights TFs with confirmed roles (in this study or previously known).
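The BEAF-32 search-space rule from panel c can be sketched in a few lines. This is one possible reading of the legend, not the authors' implementation: it assumes the nearest peak on each side of the gene within 200 kb bounds the search space, that a side without a peak falls back to 50 kb, and that "skipping 500bp around the TSS" means excluding regions within 500 bp of the TSS. Function names and coordinates are hypothetical, and strand is ignored for brevity.

```python
def search_space(gene_start, gene_end, peak_positions,
                 max_dist=200_000, fallback=50_000):
    """Return (left, right) bounds of the enhancer search space for a gene.

    peak_positions: coordinates of BEAF-32 peaks with motif on the same
                    chromosome (hypothetical input format).
    """
    upstream = [p for p in peak_positions
                if gene_start - max_dist <= p < gene_start]
    downstream = [p for p in peak_positions
                  if gene_end < p <= gene_end + max_dist]
    # Nearest boundary on each side; otherwise the 50 kb fallback (assumption).
    left = max(upstream) if upstream else gene_start - fallback
    right = min(downstream) if downstream else gene_end + fallback
    return max(left, 0), right


def in_search_space(region_mid, bounds, tss, tss_skip=500):
    """True if a region midpoint lies in the search space and not near the TSS."""
    left, right = bounds
    return left <= region_mid <= right and abs(region_mid - tss) > tss_skip


bounds = search_space(100_000, 110_000, [20_000, 90_000, 130_000, 400_000])
print(bounds)
print(in_search_space(95_000, bounds, tss=100_000))
```

Candidate regions passing `in_search_space` would then be the ones scored against gene expression when building enhancer-gene links.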

Extended Data Fig. 11 A resource of cell-type specific eGRNs.

a. γ-KC eGRN (motif-based) with key TFs in the middle. Genes marked as squares are also present in the DL-filtered eGRN (Fig. 5a). b. Heatmap of Jaccard index between TF target regions in the γ-KC eGRN. c. eGRN T1 neurons (regions are coloured in blue shades, genes in red; regulatory TFs are in the center). d. Heatmap of Jaccard index between TF target regions in T1 eGRN. e. Heatmap of eGRN overlap (region based, Jaccard index) of all cell types and all TFs. Examples of eGRNs (scro, TfAP-2, ey and Mef2) are highlighted, showing co-clustering based on TF, and on cell type (genomic context). f. Regulatory network for KCs and perineurial glia, with colour showing the status of the network (average expression and accessibility). Presence of Mamo leads to repression of α/β-KC marker regions and genes, while Lola-N leads to repression of glial marker regions and genes. g. eGRNs for different subtypes of Kenyon cells, T-neuron subclasses and glia are available for exploration on NDEx. Link outs from each gene to FlyBase and UCSC allow exploration of gene function and chromatin profiles with all nearby predicted enhancers coloured, while link outs from regions allow inspection of the region with DeepFlyBrain to visualize nucleotide importances, and also link to UCSC to view the genomic context with the selected region highlighted.
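The Jaccard index used in panels b, d and e is the size of the intersection of two region sets divided by the size of their union. A minimal sketch, with hypothetical region identifiers and TF names chosen only for illustration:

```python
def jaccard(regions_a, regions_b):
    """Jaccard index between two collections of region identifiers."""
    a, b = set(regions_a), set(regions_b)
    union = a | b
    # Two empty sets are treated as having no overlap (convention chosen here).
    return len(a & b) / len(union) if union else 0.0


# Hypothetical target regions for two TFs in one eGRN.
targets = {
    "ey": {"chr2L:100-600", "chr2L:900-1400", "chr3R:50-550"},
    "Mef2": {"chr2L:900-1400", "chr3R:50-550", "chrX:10-510"},
}

# Pairwise matrix, as would be drawn in a heatmap.
tfs = sorted(targets)
matrix = [[jaccard(targets[x], targets[y]) for y in tfs] for x in tfs]
print(tfs)
print(matrix)
```

Here regions count as shared only when their identifiers match exactly; comparing sets by coordinate overlap instead would require an interval-intersection step first.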

Extended Data Fig. 12 Enhancer switching is a prominent feature of development.

a. Heatmap showing 458 enhancers that undergo a switch from one cell type to another. Enhancers are grouped based on whether the switch is from (non)neuronal to (non)neuronal. Heatmap shows standardized average accessibility (RPGC): ET: early timepoints (larva-12h APF), MT: middle timepoints (24-48h APF), LT: late timepoints (72h APF-adult). b. Examples of enhancers that switch between cell types for different categories. Given that one region can contain multiple enhancers, it is hard to separate enhancer switches from shifters: the glia enhancer (right) shows a shift, where one peak goes down and an adjacent one becomes accessible. c. t-SNE from the cisTopic analysis on the subset of 18 cell types used for the deep learning (DL) analysis. d. Performance of the DL model for the different topics. e. Examples of topics linked to progenitor cell types (left) and to Kenyon cell subtypes (right). f. TF-MoDISco results for topics linked to progenitors (left) and to Kenyon cells (right), highlighting motifs of TFs expressed in those cell types. The motif for Ase shows negative nucleotide importances, suggesting a chromatin repressing role. g. CG15117 enhancer switches from ensheathing glia (ENS) to T1 neurons. h. Bar plots showing predicted scores of the region for the developmental and adult DL model. i–j. DeepExplainer plot and in-silico mutagenesis plots of the CG15117 enhancer calculated with (i) adult DL model and (j) development DL model. According to the models, the enhancer is repressed in adult ensheathing glia and developing T1 neurons by the same binding site (highlighted with orange box).

Supplementary information

Supplementary Information

Reporting Summary

Supplementary Data 1

FACS gating strategy: gating strategies used in the FACS runs performed on split-Gal4 lines (MB371B, MB418B and MB419B), and for the normal Gal4 lines (knockdown experiments: R16A06 and ato; TaDa: R74G01; sorted OPNs: GH146, sorted cell types: R16A06, R74G01 and ato) together with detailed results.

Supplementary Data 2–5

Supplementary Data 2: VSN config file to run the VSN Nextflow pipeline on the adult scRNA-seq data from Davie et al.2. Supplementary Data 3: DeepFlyBrain training data containing a Jupyter notebook to train a DL model. Supplementary Data 4: DeepFlyBrain performance data containing a Jupyter notebook to determine the performance of the DL model. Supplementary Data 5: DeepFlyBrain scoring data and DeepExplainer plots containing a Jupyter notebook to score new regions and view nucleotide importance in the region.

Supplementary Tables

Supplementary Tables 1–11 and a guide of the Supplementary Tables.

Peer Review File


About this article

Cite this article

Janssens, J., Aibar, S., Taskiran, I.I. et al. Decoding gene regulation in the fly brain. Nature 601, 630–636 (2022).
