MIRA: joint regulatory modeling of multimodal expression and chromatin accessibility in single cells

Abstract

Rigorously comparing gene expression and chromatin accessibility in the same single cells could illuminate the logic of how coupling or decoupling of these mechanisms regulates fate commitment. Here we present MIRA, probabilistic multimodal models for integrated regulatory analysis, a comprehensive methodology that systematically contrasts transcription and accessibility to infer the regulatory circuitry driving cells along cell state trajectories. MIRA leverages topic modeling of cell states and regulatory potential modeling of individual gene loci. MIRA thereby represents cell states in an efficient and interpretable latent space, infers high-fidelity cell state trees, determines key regulators of fate decisions at branch points and exposes the variable influence of local accessibility on transcription at distinct loci. Applied to epidermal differentiation and embryonic brain development from two different multimodal platforms, MIRA revealed that early developmental genes were tightly regulated by local chromatin landscape whereas terminal fate genes were titrated without requiring extensive chromatin remodeling.

Fig. 1: Schematic of MIRA’s cell-level topic and gene-level RP models for integrated analysis of single-cell multimodal transcription and accessibility data.
Fig. 2: MIRA topic modeling determined regulatory factors driving key fate decisions in hair follicle differentiation.
Fig. 3: MIRA RP modeling identified genes for which changes in expression were insufficiently explained by local chromatin accessibility.
Fig. 4: Gene-level and cell-level analysis of NITE gene regulation in the hair follicle explained regulatory mechanisms of fate commitment.
Fig. 5: MIRA joint representation reconstructed complex multi-axis differentiation in the IFE.
Fig. 6: MIRA explained regulatory factors driving fate decisions in key developmental trajectories in the developing brain.

Data availability

The authors of the SHARE-seq skin study (ref. 3) provide the RNA-seq count matrix at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4156608 and the ATAC-seq peak count matrix at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4156597. 10X Genomics provides the RNA-seq count matrix and ATAC-seq peak count matrix of the brain dataset (ref. 14) at https://www.10xgenomics.com/resources/datasets/fresh-embryonic-e-18-mouse-brain-5-k-1-standard-2-0-0. The RNA-seq and ATAC-seq count matrices used for the benchmarking study are available at https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-2-0-0.

Code availability

MIRA is available as a Python package at https://github.com/cistrome/MIRA. Frankencell, a Python program we developed to generate synthetic differentiation trajectories for benchmarking, is available at https://github.com/AllenWLynch/frankencell-dynverse.

References

  1. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

  2. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  3. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).

  4. Zhu, C. et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol. 26, 1063–1070 (2019).

  5. Duren, Z., Chen, X., Xin, J., Wang, Y. & Wong, W. H. Time course regulatory analysis based on paired expression and chromatin accessibility data. Genome Res. 30, 622–634 (2020).

  6. Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).

  7. Gong, B., Zhou, Y. & Purdom, E. Cobolt: joint analysis of multimodal single-cell sequencing data. Genome Biol. 22, 351 (2021).

  8. Minoura, K., Abe, K., Nam, H., Nishikawa, H. & Shimamura, T. A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data. Cell Rep. Methods 1, 100071 (2021).

  9. Chen, H., Ryu, J., Vinyard, M., Lerer, A. & Pinello, L. SIMBA: single-cell embedding along with features. Preprint at bioRxiv https://doi.org/10.1101/2021.10.17.464750 (2021).

  10. Lin, Y. et al. scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning. Nat. Biotechnol. 40, 703–710 (2022).

  11. Duren, Z. et al. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations. Proc. Natl Acad. Sci. USA 115, 7723–7728 (2018).

  12. Lara-Astiaso, D. et al. Chromatin state dynamics during blood formation. Science 345, 943–949 (2014).

  13. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).

  14. 10X Genomics Datasets (10X Genomics, 2022); https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets

  15. Blei, D. M. Probabilistic topic models. Commun. ACM 55, 77–84 (2012).

  16. Zhao, Y., Cai, H., Zhang, Z., Tang, J. & Li, Y. Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data. Nat. Commun. 12, 5261 (2021).

  17. Bravo González-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397–400 (2019).

  18. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).

  19. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).

  20. Wang, S. et al. Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nat. Protoc. 8, 2502–2515 (2013).

  21. Qin, Q. et al. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biol. 21, 32 (2020).

  22. Schneider, M. R., Schmidt-Ullrich, R. & Paus, R. The hair follicle as a dynamic miniorgan. Curr. Biol. 19, R132–R142 (2009).

  23. Blanpain, C. & Fuchs, E. Epidermal homeostasis: a balancing act of stem cells in the skin. Nat. Rev. Mol. Cell Biol. 10, 207–217 (2009).

  24. Byron, L. & Wattenberg, M. Stacked graphs – geometry & aesthetics. IEEE Trans. Vis. Comput. Graph. 14, 1245–1252 (2008).

  25. Soma, T., Ogo, M., Suzuki, J., Takahashi, T. & Hibino, T. Analysis of apoptotic cell death in human hair follicles in vivo and in vitro. J. Invest. Dermatol. 111, 948–954 (1998).

  26. Cui, C.-Y. et al. Ectodysplasin regulates the lymphotoxin-beta pathway for hair differentiation. Proc. Natl Acad. Sci. USA 103, 9142–9147 (2006).

  27. Pan, Y. et al. gamma-secretase functions through Notch signaling to maintain skin appendages but is not required for their patterning or initial morphogenesis. Dev. Cell 7, 731–743 (2004).

  28. Genander, M. et al. BMP signaling and its pSMAD1/5 target genes differentially regulate hair follicle stem cell lineages. Cell Stem Cell 15, 619–633 (2014).

  29. Joost, S. et al. Single-cell transcriptomics reveals that differentiation and spatial signatures shape epidermal and hair follicle heterogeneity. Cell Syst. 3, 221–237 (2016).

  30. Grose, R., Harris, B. S., Cooper, L., Topilko, P. & Martin, P. Immediate early genes krox-24 and krox-20 are rapidly up-regulated after wounding in the embryonic and adult mouse. Dev. Dyn. 223, 371–378 (2002).

  31. Hildesheim, J. et al. The hSkn-1a POU transcription factor enhances epidermal stratification by promoting keratinocyte proliferation. J. Cell Sci. 114, 1913–1923 (2001).

  32. Zeitvogel, J. et al. GATA3 regulates FLG and FLG2 expression in human primary keratinocytes. Sci. Rep. 7, 11847 (2017).

  33. Hernández-Miranda, L. R., Parnavelas, J. G. & Chiara, F. Molecules and mechanisms involved in the generation and migration of cortical interneurons. ASN Neuro 2, e00031 (2010).

  34. La Manno, G. et al. Molecular architecture of the developing mouse brain. Nature 596, 92–96 (2021).

  35. Di Bella, D. J. et al. Molecular logic of cellular diversification in the mouse cerebral cortex. Nature 595, 554–559 (2021).

  36. Esther, L.-B. et al. in GABA And Glutamate: New Developments In Neurotransmission Research 25 (InTech, 2018).

  37. Yang, N. et al. Generation of pure GABAergic neurons by transcription factor programming. Nat. Methods 14, 621–628 (2017).

  38. Raposo, A. A. S. F. et al. Ascl1 coordinately regulates gene expression and the chromatin landscape during neurogenesis. Cell Rep. 10, 1544–1556 (2015).

  39. de Martin, X., Sodaei, R. & Santpere, G. Mechanisms of binding specificity among bHLH transcription factors. Int. J. Mol. Sci. 22, 9150 (2021).

  40. Porcher, C., Medina, I. & Gaiarsa, J.-L. Mechanism of BDNF modulation in GABAergic synaptic transmission in healthy and disease brains. Front. Cell. Neurosci. 12, 273 (2018).

  41. Mo, J. et al. Early growth response 1 (Egr-1) directly regulates GABAA receptor α2, α4, and θ subunits in the hippocampus. J. Neurochem. 133, 489–500 (2015).

  42. Sheng, Z.-H. & Cai, Q. Mitochondrial transport in neurons: impact on synaptic homeostasis and neurodegeneration. Nat. Rev. Neurosci. 13, 77–93 (2012).

  43. Harrington, A. J. et al. MEF2C regulates cortical inhibitory and excitatory synapses and behaviors relevant to neurodevelopmental disorders. eLife 5, e20059 (2016).

  44. Park, N. I. et al. ASCL1 reorganizes chromatin to direct neuronal fate and suppress tumorigenicity of glioblastoma stem cells. Cell Stem Cell 21, 411 (2017).

  45. Chen, C.-H. et al. Determinants of transcription factor regulatory range. Nat. Commun. 11, 2472 (2020).

  46. Tritschler, S. et al. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 146, dev170506 (2019).

  47. Wagner, D. E. & Klein, A. M. Lineage tracing meets single-cell omics: opportunities and challenges. Nat. Rev. Genet. 21, 410–427 (2020).

  48. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).

  49. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  50. Choi, K., Chen, Y., Skelly, D. A. & Churchill, G. A. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol. 21, 183 (2020).

  51. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).

  52. Fisher, R. A. On the interpretation of χ² from contingency tables, and the calculation of P. J. R. Stat. Soc. 85, 87 (1922).

  53. Srivastava, A. & Sutton, C. Autoencoding variational inference for topic models. In Proc. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proc. (Cornell Univ., 2017).

  54. Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G. & Barceló-Vidal, C. Isometric logratio transformations for compositional data analysis. Math. Geol. 35, 279–300 (2003).

  55. Silverman, J. D., Washburne, A. D., Mukherjee, S. & David, L. A. A phylogenetic transform enhances analysis of compositional microbiota data. eLife 6, e21887 (2017).

  56. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).

  57. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.03426 (2018).

  58. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

  59. Chen, C. H. et al. Determinants of transcription factor regulatory range. Nat. Commun. 11, 2472 (2020).

  60. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  61. Yadav, A., Goldstein, T. & Jacobs, D. Making L-BFGS work with industrial-strength nets. In Proc. 31st British Machine Vision Conference (BMVC), 7–10 September 2020 (BMVA, 2020).

  62. Neyman, J. & Pearson, E. S. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 20A, 175–240 (1928).

  63. 10X Genomics Datasets (10X Genomics) (accessed February 2022); https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-2-0-0

  64. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

Acknowledgements

We thank the X.S. Liu laboratory members, M. Oser and K. Wucherpfennig for helpful scientific discussions. This work was supported by the National Institutes of Health (NIH) grant no. U24 CA237617 to C.A.M. C.V.T. was supported by the Helen Hay Whitney Foundation Postdoctoral Fellowship and NIH grant no. T32GM007748.

Author information

Contributions

A.W.L. developed MIRA, designed analyses and analyzed the SHARE-seq dataset. C.V.T. codeveloped MIRA, designed analyses and analyzed the 10X Genomics dataset. H.W.L. and M.B. contributed to analysis design. X.S.L. and C.A.M. designed analyses and supervised the work. A.W.L., C.V.T., X.S.L. and C.A.M. wrote the manuscript. A.W.L. and C.A.M. originated the work. All authors edited and approved the manuscript.

Corresponding authors

Correspondence to X. Shirley Liu or Clifford A. Meyer.

Ethics declarations

Competing interests

M.B. is a consultant to and receives sponsored research support from Novartis. M.B. serves on the SAB of H3 Biomedicine, Kronos Bio and GV20 Oncotherapy. X.S.L. conducted the work while being on the faculty at the Dana Farber Cancer Institute and is currently a board member and CEO of GV20 Therapeutics. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Eran Mukamel, Fangming Xie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of MIRA topic model architecture.

a, The MIRA topic model uses a variational autoencoder (VAE) approach to learn stochastic mappings between observations in X-space (the gene counts or peak counts in a cell, which are high-dimensional and noisy) and a simpler latent Z-space, or topic space, which lies on the simplex with a Dirichlet prior. (bottom right) The generative model relates the observations X to the estimated composition 𝞺 over features (genes or peaks), sampling a negative binomial distribution for RNA counts and a multinomial distribution for ATAC peaks. (top right) The composition over features is given by the topic matrix 𝜷, which encodes topic-feature associations, and the latent topics Z of a cell, which are sampled from the distribution q_φ(Z|X), the variational approximation of p_𝜗(Z|X). (top left) The distribution of Z is parameterized by 𝞵 and 𝞼², outputs of the encoder neural network given the X-space observations as inputs. (bottom left) The encoder neural network for RNA data performs deviance residual featurization of counts, which are then passed through feed-forward layers. The ATAC data encoder passes binarized peak accessibility features through a deep averaging network. (Illustration adapted from Kingma and Welling, Foundations and Trends in Machine Learning, 2019.) b, Ratio of the probability of medulla fate commitment to that of cortex commitment for each cell in the hair follicle, arranged by pseudotime. MIRA defines branch points between cell states where the probability of differentiating into one terminal state diverges from that of another. c, MIRA joint representation UMAP colored by the ratio of probability of medulla fate commitment within the ORS, matrix, medulla and cortex populations. Differentiation in the hair follicle proceeds from ORS to progenitor matrix cells, which then specify into the medulla or cortex fate. (IRS cells, indicated in black, are not included in this trajectory.)
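The generative half of panel a can be illustrated with a small sketch for the accessibility (multinomial) case. It is only an illustration under stated assumptions: the softmax link between the topic matrix and the feature composition, the toy topic matrix and all function names are hypothetical and are not taken from the MIRA package; the negative binomial likelihood used for RNA counts and the encoder network are omitted.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one topic composition from a Dirichlet prior (normalized gamma draws)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def softmax(values):
    """Numerically stable softmax, so the result sums to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def feature_composition(topics, beta):
    """Combine a cell's topic weights with topic-feature scores beta (topics x features).

    Assumption: a softmax link turns the weighted scores into a composition.
    """
    n_features = len(beta[0])
    scores = [
        sum(topics[k] * beta[k][j] for k in range(len(topics)))
        for j in range(n_features)
    ]
    return softmax(scores)

def sample_multinomial(n_reads, composition, rng):
    """Draw peak counts for one cell from a multinomial over features."""
    counts = [0] * len(composition)
    for j in rng.choices(range(len(composition)), weights=composition, k=n_reads):
        counts[j] += 1
    return counts

rng = random.Random(0)
beta = [[2.0, 0.0, -1.0, 0.0],   # hypothetical topic 1 scores over 4 peaks
        [-1.0, 0.0, 2.0, 1.0]]   # hypothetical topic 2 scores over 4 peaks
z = sample_dirichlet([0.5, 0.5], rng)      # latent topics Z on the simplex
rho = feature_composition(z, beta)         # composition over features
peak_counts = sample_multinomial(200, rho, rng)
```

In the VAE setting described in the caption, the encoder would supply the parameters of the distribution over Z instead of drawing Z from the prior as done here.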

Extended Data Fig. 2 MIRA outperforms standard methodology for resolving cell state trajectories using expression data alone.

Benchmarking results comparing MIRA to standard methodology of Seurat PCA + Slingshot in the indicated metrics of cell state trajectory inference using expression data alone. Top row shows ground truth scaffolds, which are computationally synthesized by mixing reads from distinct populations of single cells from a 10X Genomics dataset (ref. 63) of peripheral blood mononuclear cells (PBMCs). Scaffold difficulty increases from left to right, where more difficult scaffolds contain cell states where mixture components are more similar (increased entropy), making them more difficult to distinguish by the tested lineage inference methodologies. Line plots indicate MIRA (red) versus Seurat PCA + Slingshot (blue) performance in each of the four scaffold difficulties with trials for three different mean read depths (lower read depth further increases the difficulty of solving the topology). For each trial, five replicates were tested for each modeling approach. Edge accuracy measures the accuracy of the inferred edges compared to ground truth (dynverse’s edge flip score; ref. 64). Branch F1 score (ref. 64) measures the precision and recall of the inferred branches compared to ground truth. Pseudotime correlation (ref. 64) measures the correlation between inferred versus ground truth pseudotime for each cell. The bottom rows show example UMAPs for MIRA or Seurat PCA + Slingshot for each scaffold difficulty with black edges showing cell state parsing from each algorithm. Cells are colored by ground truth branch assignment, where blue cells are the origin state. In the line plots above, black outlines indicate the points for the models shown in the example UMAPs.
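The link between scaffold difficulty and mixture entropy can be made concrete with a short sketch. This is not the Frankencell implementation, whose details are not given here; the function names, toy profiles and mixing weights are hypothetical, and the sketch only shows how a synthetic cell could be built by drawing reads from source populations and why a more even mix has higher entropy.

```python
import math
import random

def mixture_entropy(weights):
    """Shannon entropy (in nats) of mixing proportions; higher means a more even mix."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def synthesize_cell(source_profiles, weights, n_reads, rng):
    """Build one synthetic cell by drawing reads from source populations.

    source_profiles: per-population feature probabilities (each sums to 1).
    weights: mixing proportions over populations (sum to 1).
    """
    n_features = len(source_profiles[0])
    counts = [0] * n_features
    # First pick which population each read comes from, then which feature it hits.
    populations = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    for p in populations:
        feature = rng.choices(range(n_features), weights=source_profiles[p], k=1)[0]
        counts[feature] += 1
    return counts

rng = random.Random(1)
profiles = [[0.7, 0.2, 0.1],   # hypothetical population A
            [0.1, 0.2, 0.7]]   # hypothetical population B
easy_state = [0.95, 0.05]      # nearly pure state: low entropy, easy to tell apart
hard_state = [0.55, 0.45]      # nearly even mix: high entropy, hard to tell apart
cell = synthesize_cell(profiles, hard_state, 500, rng)
```

Lowering `n_reads` in this sketch mimics the lower mean read depths that the caption describes as further increasing difficulty.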

Extended Data Fig. 3 MIRA outperforms standard methodology for resolving cell state trajectories using accessibility data alone.

Benchmarking results comparing MIRA to standard methodology of Seurat LSI + Slingshot in the indicated metrics of cell state trajectory inference using accessibility data alone. Top row shows ground truth scaffolds with scaffold difficulty increasing from left to right. No models solved the topology of the most difficult scaffold using accessibility alone, so metric comparisons are shown for the other three scaffolds. See Extended Data Fig. 2 for description of metrics.

Extended Data Fig. 4 MIRA outperforms standard methodology for resolving cell state trajectories using both expression and accessibility data jointly.

Benchmarking results comparing MIRA joint representation to standard methodology of joint representation combining Seurat PCA of expression data and Seurat LSI of accessibility data followed by Slingshot. See Extended Data Fig. 2 for description of metrics. For expression data, mean read depth n = 4000; for accessibility data, mean read depth n = 14000.
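Of the benchmarking metrics, pseudotime correlation is the simplest to sketch. The version below assumes a plain Pearson correlation between inferred and ground truth pseudotime across cells; the exact definition used by the dynverse metric may differ, and the function name and toy values are hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences of pseudotime values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

true_pseudotime = [0.0, 1.0, 2.0, 3.0, 4.0]
inferred_pseudotime = [0.1, 0.9, 2.2, 2.8, 4.1]   # hypothetical inference output
score = pearson_correlation(true_pseudotime, inferred_pseudotime)
```

A score near 1 means the inferred ordering tracks the ground truth; a reversed ordering gives a score near -1.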

Extended Data Fig. 5 MIRA topics describing hair follicle cells were sparse and nonredundant.

a, UMAP based on standard methodology versus MIRA topic modeling for expression or accessibility. Standard PCA-based representation of expression shows matrix population as shifted away from its predecessor ORS and descendant IRS, medulla, and cortex cells. However, MIRA topic modeling of expression appropriately represents matrix cells as an intermediate population between the aforementioned lineages. Standard LSI-based representation of accessibility shows ORS cells interjected between matrix and its descendant IRS and shows medulla situated between two separate cortex populations. Conversely, MIRA topic modeling of accessibility appropriately represents matrix cells as continuous with its descendant IRS and better separates medulla and cortex into two distinct branches. b, MIRA joint topic representation of expression and accessibility. In (a-b), colors demonstrate expression of marker genes of indicated lineages. c, MIRA expression topics e1-6 and d, MIRA accessibility topics a1-7 on joint representation UMAP. In (c-d), colored boxes correspond to topic colors as on stream graphs in Fig. 2c and Extended Data Fig. 6a.

Extended Data Fig. 6 MIRA topics described gene modules activated in each lineage.

a, Stream graph of window-averaged cell-topic compositions starting from ORS cell state, progressing rightward through pseudotime (to facilitate visualization of all lineages concurrently, pseudotime scale is not log-transformed, unlike other presented stream graphs). b, MIRA joint topic representation colored by expression of genes highly activated in each of the indicated topics, which described the activated gene modules in each lineage. c, MIRA joint topic representation colored by indicated motif scores.

Extended Data Fig. 7 Terminal medulla and cortex cells showed significantly higher NITE regulation compared to cells earlier in hair follicle differentiation.

a, MIRA joint topic representation colored by expression of Hoxc genes, indicating that Hoxc motifs activated in both the medulla and cortex accessibility topics (a5 and a6, respectively) were most attributable to Hoxc13 based on its expression in these lineages. b, Correlation matrix between expression and accessibility topics. While some topics had a clear one-to-one correlation between modalities (for example expression topic e1 with accessibility topic a1), others did not strongly correlate with a single topic from the opposing modality (for example branch accessibility topic a4). c, Comparison of motif enrichment in top peaks of preceding matrix versus subsequent branch accessibility topics (a2 and a4, respectively). While most motifs were shared between these topics, accessibility of Wnt signaling-related motifs uniquely arose at the branch. d, Distribution of NITE scores among genes expressed in the hair follicle. Scores of example LITE gene Braf and NITE gene Krt23 are indicated by arrows. e, LITE gene Braf as shown in Fig. 3c but extended to include further downstream region. As described in Fig. 3c, plot shows chromatin accessibility fragments across pseudotime (moving downwards) in trajectories from ORS to matrix to cortex or medulla. Colored bars on the right indicate the identity of cells (colored by clusters in Fig. 2a) within each bin reflected by each row of accessibility fragments. Line plots across pseudotime depict the indicated gene’s observed expression (red) and LITE model prediction of expression (black), which is informed by the local accessibility reflected in the fragment plot. f, Medulla and cortex cells showed significantly more NITE regulation than other cells in the hair follicle (data are presented as mean values +/− standard deviation; rest n = 4565, cortex/medulla n = 1607; *p < 0.05 (1.4e-13), two-sided Wilcoxon rank-sum). g, Genes ultimately expressed in medulla or cortex that were primed at the branch were defined as those with a NITE regulation score above the indicated thresholds that had positive chromatin differential at the branch, indicating that expression was overestimated based on local chromatin accessibility. Branch-primed genes must also be upregulated in the downstream lineage relative to matrix cells. h, Driver transcription factor analysis of non-primed medulla versus cortex genes.
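The group comparison in panel f relies on a two-sided Wilcoxon rank-sum test. A minimal sketch of that test is given below, assuming the large-sample normal approximation with a tie correction and no continuity correction; the function name and toy inputs are hypothetical, and this is not the code used to produce the reported P values. In practice a library routine such as `scipy.stats.ranksums` would normally be used.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p), where z < 0 means sample `a` tends to have lower values.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values pooled[i:j] and give each the average rank.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of the 1-based ranks i+1 .. j
        block = j - i
        tie_term += block ** 3 - block
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail of the standard normal
    return z, p

early_scores = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical per-cell scores
terminal_scores = [6.0, 7.0, 8.0, 9.0, 10.0]  # hypothetical per-cell scores
z_stat, p_value = rank_sum_test(early_scores, terminal_scores)
```

With group sizes in the thousands, as in panel f, the normal approximation assumed here is a reasonable choice; for very small samples an exact test is preferable.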

Extended Data Fig. 8 MIRA expression topics describing IFE cells captured shared and lineage-specific states.

a, Expression of marker genes of indicated lineages on MIRA expression, accessibility, and joint topic UMAPs. b, MIRA expression topics e1-13 on joint representation UMAP.

Extended Data Fig. 9 MIRA accessibility topics describing IFE cells captured shared and lineage-specific states.

a, MIRA accessibility topics a1-15 on joint representation UMAP. Colored boxes correspond to topics indicated in Fig. 5h, which are shared or lineage-specific within the basal-spinous-granular or intermediate basal-spinous-granular differentiation trajectories as annotated in Fig. 5a,b. b, Thbs1 and c, Egr2 expression distinguished basal cells distant from the hair follicle from those within the intermediate basal-spinous-granular trajectory near the hair follicle (*p < 0.05, two-sided Wilcoxon rank-sum, Benjamini-Hochberg corrected).
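The Benjamini-Hochberg correction mentioned for panels b and c can be sketched as the standard step-up adjustment of a list of P values. The function name and example values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]    # hypothetical raw P values
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would be called significant.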

Extended Data Fig. 10 Terminal granular cells were enriched for NITE regulation.

a, Stream graph of expression topic compositions of basal-spinous-granular (top) and intermediate basal-spinous-granular (bottom) lineages. b, Terminal IFE granular cells showed significantly more NITE regulation than cells earlier in the differentiation trajectory (basal and spinous cells) (data are presented as mean values +/− standard deviation; basal and spinous n = 10850, granular n = 1596; *p < 0.05 (1.5e-15), two-sided Wilcoxon rank-sum). c, Genes upregulated in granular cells that were differentially-expressed between granular populations had significantly higher NITE scores than other genes (data are presented as mean values +/− standard deviation; rest n = 4641, terminal and differentially-expressed granular genes n = 241; *p < 0.05 (0.041), two-sided Wilcoxon rank-sum). d, Examples of terminally upregulated, differentially-expressed granular genes’ local chromatin accessibility (LITE model prediction) and expression. Despite accessibility increasing in both lineages, expression only increased in one lineage. e, Mef2c was more highly expressed in excitatory neurons, indicating that Mef2 motifs enriched in the terminal excitatory neuron topic were likely attributable to Mef2c. f, Stream graphs of expression topics across the cell state trajectory colored by NITE versus LITE regulation of the top genes in each topic. Topics describing earlier states tended towards LITE regulation with the notable exception of topic e3, which is composed of cell cycle genes that have been previously described to be regulated with minimal influence of local chromatin accessibility state (ref. 3). Topics describing terminal states tended more towards NITE regulation, including the major terminal excitatory and inhibitory neuron topics that are composed of neurotransmitter genes. Overall, expression topics describing the excitatory and inhibitory progenitor states (labeled mixed progenitor) were significantly enriched for LITE regulation, whereas after commitment to either the excitatory or inhibitory fate, topics were significantly enriched for NITE regulation (*p < 0.05, two-sided Wilcoxon rank-sum, Benjamini-Hochberg corrected). g, Genes predicted by MIRA pISD modeling to be regulated by pioneer transcription factor Ascl1 showed significantly more LITE regulation compared to genes predicted to be regulated by non-pioneer-like Egr1 (data are presented as mean values +/− standard deviation; n = 200; *p < 0.05 (0.0464), two-sided Wilcoxon rank-sum).

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Information.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Table 1 (T1) Gene set enrichments of each MIRA expression topic in the hair follicle dataset. Indicated P values by one-sided Fisher’s exact test; adjusted P values are Enrichr z-scores corrected for multiple comparisons.

Supplementary Table 2 (T2) Motif enrichments of each MIRA accessibility topic in the hair follicle dataset.

Supplementary Table 3 (T3) Gene set enrichments of each MIRA expression topic in the IFE dataset. Indicated P values by one-sided Fisher’s exact test; adjusted P values are Enrichr z-scores corrected for multiple comparisons.

Supplementary Table 4 (T4) Motif enrichments of each MIRA accessibility topic in the IFE dataset.

Supplementary Table 5 (T5) Gene set enrichments of each MIRA expression topic in the embryonic brain dataset. Indicated P values by one-sided Fisher’s exact test; adjusted P values are Enrichr z-scores corrected for multiple comparisons.

Supplementary Table 6 (T6) Motif enrichments of each MIRA accessibility topic in the embryonic brain dataset.
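The one-sided Fisher’s exact test used for the gene set enrichments amounts to an upper tail of the hypergeometric distribution. The sketch below shows that calculation under the assumption of a simple 2 x 2 setup (top topic genes versus an annotated gene set within a fixed gene universe); the function name and the example numbers are hypothetical and do not come from the supplementary tables, and Enrichr’s own scoring is not reproduced.

```python
from math import comb

def fisher_enrichment_p(overlap, topic_size, set_size, universe):
    """One-sided P value: chance of seeing at least `overlap` shared genes.

    overlap:    genes in both the topic's top genes and the gene set
    topic_size: number of top genes taken from the topic
    set_size:   number of genes in the annotated gene set
    universe:   total number of genes considered
    """
    upper = min(topic_size, set_size)
    total = comb(universe, topic_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, topic_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 12 of a topic's 200 top genes fall in a 150-gene set
# drawn from a universe of 20,000 genes.
p_enrichment = fisher_enrichment_p(12, 200, 150, 20000)
```

Because many gene sets are tested per topic, such raw P values are then corrected for multiple comparisons, as the table descriptions note.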

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lynch, A.W., Theodoris, C.V., Long, H.W. et al. MIRA: joint regulatory modeling of multimodal expression and chromatin accessibility in single cells. Nat Methods 19, 1097–1108 (2022). https://doi.org/10.1038/s41592-022-01595-z

  • DOI: https://doi.org/10.1038/s41592-022-01595-z
