Joint variational autoencoders for multimodal imputation and embedding

This article has been updated

A preprint version of the article is available at bioRxiv.


Single-cell multimodal datasets have measured various characteristics of individual cells, enabling a deep understanding of cellular and molecular mechanisms. However, multimodal data generation remains costly and challenging, and modalities are frequently missing. Recently, machine learning approaches have been developed for data imputation, but they typically require fully matched multimodal data and learn common latent embeddings that potentially lack modality specificity. To address these issues, we developed an open-source machine learning model, Joint Variational Autoencoders for multimodal Imputation and Embedding (JAMIE). JAMIE takes as input single-cell multimodal data in which samples may be only partially matched across modalities. Variational autoencoders learn the latent embeddings of each modality. Then, embeddings from matched samples across modalities are aggregated to identify joint cross-modal latent embeddings before reconstruction. To perform cross-modal imputation, the latent embeddings of one modality can be used with the decoder of the other modality. For interpretability, Shapley values are used to prioritize input features for cross-modal imputation and known sample labels. We applied JAMIE to both simulation data and emerging single-cell multimodal data including gene expression, chromatin accessibility, and electrophysiology in human and mouse brains. JAMIE significantly outperforms existing state-of-the-art methods in general and prioritizes multimodal features for imputation, providing potentially novel mechanistic insights at cellular resolution.
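The data flow described in the abstract (per-modality encoders, aggregation of latent embeddings on matched samples, reconstruction, and cross-modal imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy rather than PyTorch to stay dependency-light, the encoders and decoders are single untrained linear layers, the aggregation rule is assumed to be a plain average, and row i of both modalities is assumed to be the same cell whenever `matched[i]` is true. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(d_in, d_out):
    """Small random weights and zero bias for one linear layer (untrained)."""
    return rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)

def encode(x, enc):
    """Map inputs to the mean and log-variance of a diagonal Gaussian."""
    (w_mu, b_mu), (w_lv, b_lv) = enc
    return x @ w_mu + b_mu, x @ w_lv + b_lv

def sample(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z, dec):
    """Map latent embeddings back to feature space."""
    w, b = dec
    return z @ w + b

def aggregate(z1, z2, matched):
    """Average both embeddings on matched rows (assumed rule); keep the rest."""
    joint = 0.5 * (z1[matched] + z2[matched])
    z1_agg, z2_agg = z1.copy(), z2.copy()
    z1_agg[matched] = joint
    z2_agg[matched] = joint
    return z1_agg, z2_agg

# Toy data: 6 cells, 5 features in modality 1, 4 in modality 2, 2 latent dims.
n, d1, d2, k = 6, 5, 4, 2
x1 = rng.normal(size=(n, d1))
x2 = rng.normal(size=(n, d2))
matched = np.array([True, True, True, False, False, False])

enc1 = (init_linear(d1, k), init_linear(d1, k))
enc2 = (init_linear(d2, k), init_linear(d2, k))
dec1 = init_linear(k, d1)
dec2 = init_linear(k, d2)

# Encode each modality separately, then aggregate on matched samples only.
z1 = sample(*encode(x1, enc1))
z2 = sample(*encode(x2, enc2))
z1_agg, z2_agg = aggregate(z1, z2, matched)

# Reconstruct each modality from its (aggregated) latent embedding.
x1_recon = decode(z1_agg, dec1)
x2_recon = decode(z2_agg, dec2)

# Cross-modal imputation: modality-1 latent mean through the modality-2 decoder.
x2_imputed = decode(encode(x1, enc1)[0], dec2)
```

Because the weights are random, the numerical outputs carry no meaning; the sketch only shows how unmatched samples pass through their own autoencoder unchanged while matched samples share a joint embedding. A trained model would additionally need a loss, for example reconstruction and KL terms, which is omitted here.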

Fig. 1: Challenges for multimodal data integration and imputation.
Fig. 2: JAMIE uses VAEs with a novel latent space aggregation technique to generate similar latent spaces for each modality.
Fig. 3: Simulated multimodal data (ref. 8).
Fig. 4: Gene expression and electrophysiological features in the mouse visual cortex (ref. 2).
Fig. 5: Gene expression and chromatin accessibility of single cells in the developing brain at 21 postconceptional weeks (ref. 3).
Fig. 6: Feature prioritization for cross-modal imputation and embedding.
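
The feature prioritization named in Fig. 6 rests on Shapley values, which the abstract describes as the means of ranking input features. As a rough illustration of the underlying definition, the sketch below computes exact Shapley values by enumerating every coalition of features. This is only feasible for a handful of features, and it is not the procedure used in the article; practical tools rely on approximations. The toy value function and the feature names `gene_a`, `gene_b` and `gene_c` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

def toy_model(coalition):
    """Hypothetical model output when only the features in `coalition` are present."""
    out = 0.0
    if "gene_a" in coalition:
        out += 2.0
    if "gene_b" in coalition:
        out += 1.0
    if "gene_a" in coalition and "gene_c" in coalition:
        out += 1.0  # interaction term, split between gene_a and gene_c
    return out

phi = shapley_values(toy_model, ["gene_a", "gene_b", "gene_c"])
ranked = sorted(phi, key=phi.get, reverse=True)  # most influential first
```

In this toy case the interaction term is divided equally between the two features that produce it, and the values sum to the difference between the model output with all features and with none.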

Data availability

The MMD-MA simulation dataset and our simulation data are available for download. Processed Patch-seq gene expression and electrophysiological features for the mouse visual and motor cortices are also available, and the raw Patch-seq datasets are available at refs. 2,31. Single-cell RNA-seq and ATAC-seq data on the developing human brain can be downloaded under the heading ‘Multiome’. Single-cell RNA-seq and ATAC-seq data of colon adenocarcinoma and processed datasets for SNARE-seq adult mouse cortex data are likewise available for download.

Code availability

All code was implemented in Python using PyTorch, and the source code is publicly available. Since Code Ocean provides an interactive platform for computational reproducibility (ref. 34), we have also provided an interactive version of our code for reproducing results and figures (ref. 35).

Change history

  • 16 June 2023

    In the version of this article initially published, Daifeng Wang was not listed as the sole corresponding author; the contact information has now been amended in the HTML and PDF versions of the article.


References

  1. Cadwell, C. R. et al. Electrophysiological, transcriptomic and morphologic profiling of single neurons using Patch-seq. Nat. Biotechnol. 34, 199–203 (2016).

  2. Gouwens, N. W. et al. Integrated morphoelectric and transcriptomic classification of cortical GABAergic cells. Cell 183, 935–953 (2020).

  3. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).

  4. Nguyen, N. D., Huang, J. & Wang, D. A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data. Nat. Comput. Sci. 2, 38–46 (2022).

  5. Wu, K. E., Yost, K. E., Chang, H. Y. & Zou, J. BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc. Natl Acad. Sci. USA 118, e2023070118 (2021).

  6. Zhang, R., Meng-Papaxanthos, L., Vert, J.-P. & Noble, W. S. In Research in Computational Molecular Biology (ed. Pe’er, I.) 20–35 (Springer International, 2022).

  7. Cao, K., Bai, X., Hong, Y. & Wan, L. Unsupervised topological alignment for single-cell multi-omics integration. Bioinformatics 36, i48–i56 (2020).

  8. Liu, J., Huang, Y., Singh, R., Vert, J.-P. & Noble, W. S. Jointly embedding multiple single-cell omics measurements. WABI. 143, 10:1–10:13 (2019).

  9. Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).

  10. Zhang, Z., Yang, C. & Zhang, X. scDART: integrating unmatched scRNA-seq and scATAC-seq data and learning cross-modality relationship simultaneously. Genome Biol. 23, 139 (2022).

  11. Khan, S. A. et al. scAEGAN: Unification of single-cell genomics data by adversarial learning of latent space correspondences. PLoS ONE 18, e0281315 (2023).

  12. Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV (2017).

  13. Gala, R. et al. Consistent cross-modal identification of cortical neurons with coupled autoencoders. Nat. Comput. Sci. 1, 120–127 (2021).

  14. Tu, X., Cao, Z.-J., Xia, C.-R., Mostafavi, S. & Gao, G. Cross-linked unified embedding for cross-modality representation learning. Adv. Neural Inf. Process. Syst. 35, 15942–15955 (2022).

  15. Nguyen, N. D., Blaby, I. K. & Wang, D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 20, 1003 (2019).

  16. Hotelling, H. Relations between two sets of variates. Biometrika 28, 321–377 (1936).

  17. Gala, R. et al. A coupled autoencoder approach for multi-modal analysis of cell types. NeurIPS 32, 9263–9272 (2019).

  18. Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. NeurIPS 31, 4768–4777 (2017).

  19. Johansen, N. & Quon, G. scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data. Genome Biol. 20, 1–21 (2019).

  20. Li, H., Zhang, Z., Squires, M., Chen, X. & Zhang, X. scMultiSim: simulation of multi-modality single cell data guided by cell–cell interactions and gene regulatory networks. Preprint (2022).

  21. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

  22. Quinn, L. A., Moore, G. E., Morgan, R. T. & Woods, L. K. Cell lines from human colon carcinoma with unusual cell products, double minutes, and homogeneously staining regions. Cancer Res. 39, 4914–4924 (1979).

  23. Shi, J., Cheng, C., Ma, J., Liew, C.-C. & Geng, X. Gene expression signature for detection of gastric cancer in peripheral blood. Oncol. Lett. 15, 9802–9810 (2018).

  24. Bergdolt, L. & Dunaevsky, A. Brain changes in a maternal immune activation model of neurodevelopmental brain disorders. Prog. Neurobiol. 175, 1–19 (2019).

  25. Harder, J. M. & Libby, R. T. BBC3 (PUMA) regulates developmental apoptosis but not axonal injury induced death in the retina. Mol. Neurodegener. 6, 1–10 (2011).

  26. Song, Y.-H. et al. Somatostatin enhances visual processing and perception by suppressing excitatory inputs to parvalbumin-positive interneurons in V1. Sci. Adv. 6, eaaz0517 (2020).

  27. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. ICLR (2014).

  28. Doersch, C. Tutorial on variational autoencoders. arXiv Tech Report (2016).

  29. Bowman, S. R. et al. Generating sentences from a continuous space. Assoc. Comput. Linguist. 57, 6008–6019 (2015).

  30. Cui, Z., Chang, H., Shan, S. & Chen, X. Generalized unsupervised manifold alignment. Adv. Neural Inf. Process. Syst. 3, 2429–2437 (2014).

  31. Scala, F. et al. Phenotypic variation of transcriptomic cell types in mouse motor cortex. Nature 598, 144–150 (2021).

  32. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. J. Open Source Softw. 3, 861 (2018).

  33. Cohen Kalafut, N., Huang, X. & Wang, D. Joint variational autoencoders for multimodal imputation and embedding. Zenodo (2023).

  34. Clyburne-Sherin, A., Fei, X. & Green, S. A. Computational reproducibility via containers in social psychology. Meta-Psychology 3, 892 (2019).

  35. Cohen Kalafut, N., Huang, X. & Wang, D. Joint variational autoencoders for multimodal imputation and embedding. Code Ocean (2023).

Acknowledgements

This work was supported by National Institutes of Health grants R21NS128761, R21NS127432 and R01AG067025 to D.W., P50HD105353 to the Waisman Center, National Science Foundation Career Award 2144475 to D.W. and the start-up funding for D.W. from the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The funders had no role in study design, data collection and analysis, decision to publish or manuscript preparation.

Author information

Contributions

D.W. conceived and supervised the study. N.C.K. developed and implemented the methodology. X.H. and D.W. verified the methods. N.C.K. performed visualization and analysis. N.C.K., X.H. and D.W. wrote and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Daifeng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Xiuwei Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Additional comparison table, supporting method hyperparameters, detailed dataset information and Figs. 1–16.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cohen Kalafut, N., Huang, X. & Wang, D. Joint variational autoencoders for multimodal imputation and embedding. Nat Mach Intell 5, 631–642 (2023).
