A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data

Nguyen, Nam D.; Huang, Jiawei; Wang, Daifeng

doi:10.1038/s43588-021-00185-x

Article
Published: 31 January 2022

Nature Computational Science volume 2, pages 38–46 (2022)

A preprint version of the article is available at bioRxiv.

Abstract

The phenotypes of complex biological systems are fundamentally driven by various multi-scale mechanisms. Multi-modal data, such as single-cell multi-omics data, enable a deeper understanding of underlying complex mechanisms across scales for phenotypes. We have developed an interpretable regularized learning model, deepManReg, to predict phenotypes from multi-modal data. First, deepManReg employs deep neural networks to learn cross-modal manifolds and then to align multi-modal features onto a common latent space. Second, deepManReg uses cross-modal manifolds as a feature graph to regularize the classifiers for improving phenotype predictions and also for prioritizing the multi-modal features and cross-modal interactions for the phenotypes. We apply deepManReg to (1) an image dataset of handwritten digits with multi-features and (2) single-cell multi-modal data (Patch-seq data) including transcriptomics and electrophysiology for neuronal cells in the mouse brain. We show that deepManReg improves phenotype prediction in both datasets, and also prioritizes genes and electrophysiological features for the phenotypes of neuronal cells.
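As a rough illustration of the second step described above (using a feature graph to regularize a classifier), the following sketch builds a graph Laplacian over features that have already been embedded in a common latent space and computes a smoothness penalty on a classifier's input-layer weights. It is not the authors' implementation, which is linked under Code availability. The k-nearest-neighbour graph construction, the unnormalized Laplacian, and the function names `feature_graph_laplacian` and `laplacian_penalty` are all assumptions made for illustration.

```python
import numpy as np


def feature_graph_laplacian(embeddings, k=3):
    """Return the unnormalized Laplacian L = D - A of a symmetric kNN graph.

    `embeddings` has one row per feature (e.g. a gene or an
    electrophysiological feature) in a shared latent space. The kNN rule
    is an illustrative assumption, not taken from the paper.
    """
    n = embeddings.shape[0]
    # Pairwise squared Euclidean distances between feature embeddings.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a feature is not its own neighbour

    # Connect each feature to its k nearest neighbours.
    nearest = np.argsort(dist2, axis=1)[:, :k]
    adjacency = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, nearest.ravel()] = 1.0
    adjacency = np.maximum(adjacency, adjacency.T)  # make the graph undirected

    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def laplacian_penalty(weights, laplacian):
    """Compute tr(W^T L W) for weights W with one row per feature.

    For an unweighted undirected graph this equals the sum over edges
    (i, j) of ||w_i - w_j||^2, so features that are neighbours in the
    graph are pushed towards similar weights.
    """
    return float(np.trace(weights.T @ laplacian @ weights))
```

In training, a penalty of this kind would typically be scaled by a hyperparameter and added to the classification loss; the scaling and the choice of which layer to regularize are further assumptions that this preview does not specify.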

**Fig. 1: DeepManReg: a deep manifold-regularized learning model for improving phenotype prediction from multi-modal data.**

**Fig. 2: Multi-modal feature alignment of handwritten digits.**

**Fig. 3: Regularized classification results for the mfeat digits dataset.**

**Fig. 4: The network showing the relationships across two modalities (genes and electrophysiology).**

**Fig. 5: Regularized classification results for single-cell multi-modal data in the mouse visual cortex.**

Data availability

The multiple-features (mfeat) dataset is available from ref. ¹¹. The Patch-seq transcriptomics data and electrophysiological data are available from ref. ¹². The simulated multi-omics data and gene regulatory network (that is, the example model data of dyngen for five genes) are available from ref. ¹⁶. Source data are provided with this paper.

Code availability

Code for the deepManReg implementation and data analysis is available at https://github.com/daifengwanglab/deepManReg. An interactive version of the code base is provided in ref. ³⁸.

References

1. Larranaga, P. et al. Machine learning in bioinformatics. Brief Bioinformatics 7, 86–112 (2006).
2. Subramanian, I., Verma, S., Kumar, S., Jere, A. & Anamika, K. Multi-omics data integration, interpretation and its application. Bioinform. Biol. Insights 14, 1177932219899051 (2020).
3. Sima, C. et al. Impact of error estimation on feature selection. Pattern Recogn. 38, 2472–2482 (2005).
4. Wang, C. & Mahadevan, S. A general framework for manifold alignment. In AAAI Fall Symposium: Manifold Learning and Its Applications 79–86 (AAAI, 2009).
5. Nguyen, N. D., Blaby, I. K. & Wang, D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 20, 1003 (2019).
6. Nguyen, N. D. & Wang, D. Multiview learning for understanding functional multiomics. PLoS Comput. Biol. 16, e1007677 (2020).
7. Brorson, I. S. et al. No differential gene expression for CD4⁺ T cells of MS patients and healthy controls. Mult. Scler. J. Exp. Transl. Clin. 5, 2055217319856903 (2019).
8. Ng, A. Y. Feature selection, L₁ vs. L₂ regularization and rotational invariance. In Proc. 21st International Conference on Machine Learning (eds Greiner, R. & Schuurmans, D.) 78 (ACM Press, 2004).
9. Li, C. & Li, H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24, 1175–1182 (2008).
10. Sandler, T., Blitzer, J., Talukdar, P. & Ungar, L. Regularized learning with networks of features. Adv. Neural Inf. Process. Syst. 21, 1401–1408 (2008).
11. van Breukelen, M., Duin, R. P. W., Tax, D. M. J. & Den Hartog, J. E. Handwritten digit recognition by combined classifiers. Kybernetika 34, 381–386 (1998).
12. Gouwens, N. W. et al. Integrated morphoelectric and transcriptomic classification of cortical GABAergic cells. Cell 183, 935–953 (2020).
13. Wang, C. & Mahadevan, S. Manifold alignment without correspondence. In Proc. 21st International Joint Conference on Artificial Intelligence (ed. Boutilier, C.) 1273–1278 (ACM, 2009).
14. Hotelling, H. in Breakthroughs in Statistics 162–190 (Springer, 1992).
15. Welch, J. D., Hartemink, A. J. & Prins, J. F. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol. 18, 138 (2017).
16. Cannoodt, R., Saelens, W., Deconinck, L. & Saeys, Y. Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells. Nat. Commun. 12, 3942 (2021).
17. Cadwell, C. R. et al. Multimodal profiling of single-cell morphology, electrophysiology and gene expression using Patch-seq. Nat. Protoc. 12, 2531–2553 (2017).
18. Intrinsic Physiology Feature Extractor (IPFX) Python package (Allen Institute, 2021); https://ipfx.readthedocs.io/
19. Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H. & Santos, J. Cross-validation for imbalanced datasets: avoiding overoptimistic and overfitting approaches [research frontier]. IEEE Comput. Intell. Mag. 13, 59–76 (2018).
20. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).
21. Nguyen, N. D., Jin, T. & Wang, D. Varmole: a biologically drop-connect deep neural network model for prioritizing disease risk variants and genes. Bioinformatics 37, 1772–1775 (2021).
22. Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. CoRR abs/2009.07896 (2020).
23. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).
24. Cunningham, J. P. & Ghahramani, Z. Linear dimensionality reduction: survey, insights and generalizations. J. Mach. Learn. Res. 16, 2859–2900 (2015).
25. Boumal, N., Mishra, B., Absil, P.-A. & Sepulchre, R. Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15, 1455–1459 (2014).
26. Sato, H. & Aihara, K. Cholesky QR-based retraction on the generalized Stiefel manifold. Comput. Opt. Appl. 72, 293–308 (2019).
27. Fowlkes, C., Belongie, S., Chung, F. & Malik, J. Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26, 214–225 (2004).
28. Belkin, M., Niyogi, P. & Sindhwani, V. On manifold regularization. In Proc. Tenth International Workshop on Artificial Intelligence and Statistics (eds Cowell, R. G. & Ghahramani, Z.) R5, 17–24 (PMLR, 2005).
29. Ando, R. K. & Zhang, T. Learning on graph with Laplacian regularization. Adv. Neural Inf. Process. Syst. 19, 25–32 (2007).
30. Singh Tomar, V. & Rose, R. C. Manifold regularized deep neural networks. In Proc. 15th Annual Conference of the International Speech Communication Association (eds Li, H. et al.) 348–352 (ISCA, 2014).
31. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR, 2017).
32. Liu, J., Huang, Y., Singh, R., Vert, J.-P. & Noble, W. S. Jointly embedding multiple single-cell omics measurements. In 19th International Workshop on Algorithms in Bioinformatics (eds Huber, K. T. & Gusfield, D.) 10:1–10:13 (WABI, 2019).
33. Vu, H., Carey, C. & Mahadevan, S. Manifold warping: manifold alignment over time. In Proc. AAAI Conference on Artificial Intelligence Vol. 26 (eds Hoffmann, J. & Selman, B.) 1155–1161 (AAAI, 2012).
34. Wang, C., Krafft, P., Mahadevan, S., Ma, Y. & Fu, Y. Manifold alignment. In Manifold Learning: Theory and Applications 95–120 (CRC, 2011).
35. Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).
36. Stiefel, E. Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten. Commentarii Math. Helvetici 8, 305–353 (1935).
37. Paszke, A. et al. Automatic differentiation in PyTorch. In 31st Conference on Neural Information Processing Systems (NIPS) (Workshop on Autodiff, 2017).
38. Nguyen, N. D., Huang, J. & Wang, D. deepManReg: a deep manifold-regularized learning model for improving phenotype prediction from multi-modal data [source code] (CodeOcean, 2021); https://doi.org/10.24433/co.1706111.v1

Acknowledgements

We thank K. Huynh (Stony Brook University) for useful discussions. This work was supported by National Institutes of Health grants nos. R01AG067025, R21CA237955 and U01MH116492 to D.W. and U54HD090256 to Waisman Center. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Nam D. Nguyen
Present address: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Jiawei Huang
Present address: College of Business, University of Cincinnati, Cincinnati, OH, USA

Authors and Affiliations

Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Nam D. Nguyen
Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
Nam D. Nguyen & Daifeng Wang
Department of Statistics, University of Wisconsin–Madison, Madison, WI, USA
Jiawei Huang
Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI, USA
Daifeng Wang
Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI, USA
Daifeng Wang

Contributions

D.W. and N.D.N. conceptualized the study. D.W. and N.D.N. designed the algorithm and methodology. N.D.N. and J.H. implemented software. D.W., N.D.N. and J.H. performed analysis. D.W., N.D.N. and J.H. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Daifeng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks James J. Cai, Bamdev Mishra and Daniel Osorio for their contribution to the peer review of this work. Handling editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Materials

Supplementary Figs. 1–6, Algorithm and Table 1.

Peer review information

Supplementary Data 1

Prioritized top 20 genes and top 5 electrophysiological features of cell layers in the mouse visual cortex by feature importance scores of deepManReg.

Source data

Source Data Fig. 2

Numerical data for scatter plots in Fig. 2

Source Data Fig. 3

Numerical data for the plots in Fig. 3; source data for panels a, b and c are in separate folders.

Source Data Fig. 5

Numerical data for the plots in Fig. 5; source data for panels a, b and c are in separate folders.

About this article

Cite this article

Nguyen, N.D., Huang, J. & Wang, D. A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data. Nat Comput Sci 2, 38–46 (2022). https://doi.org/10.1038/s43588-021-00185-x

Received: 10 May 2021
Accepted: 13 December 2021
Published: 31 January 2022
Issue Date: January 2022
DOI: https://doi.org/10.1038/s43588-021-00185-x

This article is cited by

DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype–phenotype prediction
- Pramod Bharadwaj Chandrashekar
- Sayali Alatkar
- Daifeng Wang
Genome Medicine (2023)
Joint variational autoencoders for multimodal imputation and embedding
- Noah Cohen Kalafut
- Xiang Huang
- Daifeng Wang
Nature Machine Intelligence (2023)
Predictive scale-bridging simulations through active learning
- Satish Karra
- Mohamed Mehana
- Hari S. Viswanathan
Scientific Reports (2023)
Deep learning for video game genre classification
- Yuhang Jiang
- Lukun Zheng
Multimedia Tools and Applications (2023)
Interpretable multi-modal data integration
- Daniel Osorio
Nature Computational Science (2022)
