Abstract
The phenotypes of complex biological systems are driven by mechanisms that operate across multiple scales. Multi-modal data, such as single-cell multi-omics data, enable a deeper understanding of these cross-scale mechanisms. We have developed an interpretable regularized learning model, deepManReg, to predict phenotypes from multi-modal data. First, deepManReg employs deep neural networks to learn cross-modal manifolds and to align multi-modal features onto a common latent space. Second, deepManReg uses the cross-modal manifolds as a feature graph to regularize the classifiers, both to improve phenotype prediction and to prioritize the multi-modal features and cross-modal interactions associated with the phenotypes. We apply deepManReg to (1) an image dataset of handwritten digits described by multiple feature sets and (2) single-cell multi-modal data (Patch-seq data), comprising transcriptomics and electrophysiology, for neuronal cells in the mouse brain. We show that deepManReg improves phenotype prediction in both datasets and prioritizes genes and electrophysiological features for the phenotypes of neuronal cells.
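The second step described above, using a feature graph to regularize a classifier, can be illustrated with a small sketch. This is not the deepManReg implementation: it assumes a Gaussian-kernel similarity between features in the shared latent space and a standard graph-Laplacian penalty on a single weight vector, and the names similarity_graph, laplacian_penalty, sigma and the toy coordinates are all hypothetical. The published code linked under Code availability is the authoritative reference.

```python
import math

def similarity_graph(embeddings, sigma=1.0):
    """Gaussian-kernel similarity between features in a shared latent space.

    Assumption: features that are close after alignment should be
    connected by a heavier edge in the feature graph.
    """
    n = len(embeddings)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2
                         for a, b in zip(embeddings[i], embeddings[j]))
                S[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return S

def laplacian_penalty(weights, S):
    """Return w^T L w with L = D - S, i.e. 0.5 * sum_ij S_ij (w_i - w_j)^2."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += S[i][j] * (weights[i] - weights[j]) ** 2
    return 0.5 * total

# Hypothetical latent coordinates for four features: two genes
# (indices 0, 1) and two electrophysiological features (indices 2, 3).
# After alignment, feature 0 lies near feature 2 and feature 1 near 3.
latent = [(0.0, 0.0), (3.0, 0.0), (0.1, 0.0), (3.1, 0.0)]
S = similarity_graph(latent)

# Weights that agree across linked features incur a small penalty;
# weights that disagree across linked features incur a large one.
smooth = laplacian_penalty([1.0, -1.0, 1.0, -1.0], S)
rough = laplacian_penalty([1.0, 1.0, -1.0, -1.0], S)
print(smooth < rough)
```

In a training loop, a penalty of this kind would be added to the classification loss with a tunable coefficient, so that features linked in the graph are pushed toward similar weights.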
Data availability
The multiple-features (mfeat) dataset is available from ref. 11. The Patch-seq transcriptomics data and electrophysiological data are available from ref. 12. The simulated multi-omics data and gene regulatory network (that is, the example model data of dyngen for five genes) are available from ref. 16. Source data are provided with this paper.
Code availability
Code for the deepManReg implementation and data analysis is available at https://github.com/daifengwanglab/deepManReg. An interactive version of the code base is provided in ref. 38.
References
1. Larrañaga, P. et al. Machine learning in bioinformatics. Brief. Bioinform. 7, 86–112 (2006).
2. Subramanian, I., Verma, S., Kumar, S., Jere, A. & Anamika, K. Multi-omics data integration, interpretation and its application. Bioinform. Biol. Insights 14, 1177932219899051 (2020).
3. Sima, C. et al. Impact of error estimation on feature selection. Pattern Recogn. 38, 2472–2482 (2005).
4. Wang, C. & Mahadevan, S. A general framework for manifold alignment. In AAAI Fall Symposium: Manifold Learning and Its Applications 79–86 (AAAI, 2009).
5. Nguyen, N. D., Blaby, I. K. & Wang, D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 20, 1003 (2019).
6. Nguyen, N. D. & Wang, D. Multiview learning for understanding functional multiomics. PLoS Comput. Biol. 16, e1007677 (2020).
7. Brorson, I. S. et al. No differential gene expression for CD4+ T cells of MS patients and healthy controls. Mult. Scler. J. Exp. Transl. Clin. 5, 2055217319856903 (2019).
8. Ng, A. Y. Feature selection, L1 vs. L2 regularization and rotational invariance. In Proc. 21st International Conference on Machine Learning (eds Greiner, R. & Schuurmans, D.) 78 (ACM Press, 2004).
9. Li, C. & Li, H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24, 1175–1182 (2008).
10. Sandler, T., Blitzer, J., Talukdar, P. & Ungar, L. Regularized learning with networks of features. Adv. Neural Inf. Process. Syst. 21, 1401–1408 (2008).
11. van Breukelen, M., Duin, R. P. W., Tax, D. M. J. & Den Hartog, J. E. Handwritten digit recognition by combined classifiers. Kybernetika 34, 381–386 (1998).
12. Gouwens, N. W. et al. Integrated morphoelectric and transcriptomic classification of cortical GABAergic cells. Cell 183, 935–953 (2020).
13. Wang, C. & Mahadevan, S. Manifold alignment without correspondence. In Proc. 21st International Joint Conference on Artificial Intelligence (ed. Boutilier, C.) 1273–1278 (ACM, 2009).
14. Hotelling, H. in Breakthroughs in Statistics 162–190 (Springer, 1992).
15. Welch, J. D., Hartemink, A. J. & Prins, J. F. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol. 18, 138 (2017).
16. Cannoodt, R., Saelens, W., Deconinck, L. & Saeys, Y. Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells. Nat. Commun. 12, 3942 (2021).
17. Cadwell, C. R. et al. Multimodal profiling of single-cell morphology, electrophysiology and gene expression using Patch-seq. Nat. Protoc. 12, 2531–2553 (2017).
18. Intrinsic Physiology Feature Extractor (IPFX) Python package (Allen Institute, 2021); https://ipfx.readthedocs.io/
19. Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H. & Santos, J. Cross-validation for imbalanced datasets: avoiding overoptimistic and overfitting approaches [research frontier]. IEEE Comput. Intell. Mag. 13, 59–76 (2018).
20. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).
21. Nguyen, N. D., Jin, T. & Wang, D. Varmole: a biologically drop-connect deep neural network model for prioritizing disease risk variants and genes. Bioinformatics 37, 1772–1775 (2021).
22. Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. CoRR abs/2009.07896 (2020).
23. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).
24. Cunningham, J. P. & Ghahramani, Z. Linear dimensionality reduction: survey, insights and generalizations. J. Mach. Learn. Res. 16, 2859–2900 (2015).
25. Boumal, N., Mishra, B., Absil, P.-A. & Sepulchre, R. Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15, 1455–1459 (2014).
26. Sato, H. & Aihara, K. Cholesky QR-based retraction on the generalized Stiefel manifold. Comput. Optim. Appl. 72, 293–308 (2019).
27. Fowlkes, C., Belongie, S., Chung, F. & Malik, J. Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26, 214–225 (2004).
28. Belkin, M., Niyogi, P. & Sindhwani, V. On manifold regularization. In Proc. Tenth International Workshop on Artificial Intelligence and Statistics (eds Cowell, R. G. & Ghahramani, Z.) R5, 17–24 (PMLR, 2005).
29. Ando, R. K. & Zhang, T. Learning on graph with Laplacian regularization. Adv. Neural Inf. Process. Syst. 19, 25–32 (2007).
30. Singh Tomar, V. & Rose, R. C. Manifold regularized deep neural networks. In Proc. 15th Annual Conference of the International Speech Communication Association (eds Li, H. et al.) 348–352 (ISCA, 2014).
31. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR, 2017).
32. Liu, J., Huang, Y., Singh, R., Vert, J.-P. & Noble, W. S. Jointly embedding multiple single-cell omics measurements. In 19th International Workshop on Algorithms in Bioinformatics (eds Huber, K. T. & Gusfield, D.) 10:1–10:13 (WABI, 2019).
33. Vu, H., Carey, C. & Mahadevan, S. Manifold warping: manifold alignment over time. In Proc. AAAI Conference on Artificial Intelligence Vol. 26 (eds Hoffmann, J. & Selman, B.) 1155–1161 (AAAI, 2012).
34. Wang, C., Krafft, P., Mahadevan, S., Ma, Y. & Fu, Y. Manifold alignment. In Manifold Learning: Theory and Applications 95–120 (CRC, 2011).
35. Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).
36. Stiefel, E. Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten. Commentarii Math. Helvetici 8, 305–353 (1935).
37. Paszke, A. et al. Automatic differentiation in PyTorch. In 31st Conference on Neural Information Processing Systems (NIPS) (Workshop on Autodiff, 2017).
38. Nguyen, N. D., Huang, J. & Wang, D. deepManReg: a deep manifold-regularized learning model for improving phenotype prediction from multi-modal data [source code] (CodeOcean, 2021); https://doi.org/10.24433/co.1706111.v1
Acknowledgements
We thank K. Huynh (Stony Brook University) for useful discussions. This work was supported by National Institutes of Health grants nos. R01AG067025, R21CA237955 and U01MH116492 to D.W. and U54HD090256 to the Waisman Center. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
D.W. and N.D.N. conceptualized the study. D.W. and N.D.N. designed the algorithm and methodology. N.D.N. and J.H. implemented software. D.W., N.D.N. and J.H. performed analysis. D.W., N.D.N. and J.H. wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks James J. Cai, Bamdev Mishra and Daniel Osorio for their contribution to the peer review of this work. Handling editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplemental Materials
Supplementary Figs. 1–6, Algorithm and Table 1.
Supplementary Data 1
Prioritized top 20 genes and top 5 electrophysiological features of cell layers in the mouse visual cortex by feature importance scores of deepManReg.
Source data
Source Data Fig. 2
Numerical data for scatter plots in Fig. 2
Source Data Fig. 3
Numerical data for the plots in Fig. 3; source data for panels a, b and c are in separate folders.
Source Data Fig. 5
Numerical data for the plots in Fig. 5; source data for panels a, b and c are in separate folders.
About this article
Cite this article
Nguyen, N.D., Huang, J. & Wang, D. A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data. Nat Comput Sci 2, 38–46 (2022). https://doi.org/10.1038/s43588-021-00185-x
DOI: https://doi.org/10.1038/s43588-021-00185-x