Consistent cross-modal identification of cortical neurons with coupled autoencoders

Gala, Rohan; Budzillo, Agata; Baftizadeh, Fahimeh; Miller, Jeremy; Gouwens, Nathan; Arkhipov, Anton; Murphy, Gabe; Tasic, Bosiljka; Zeng, Hongkui; Hawrylycz, Michael; Sümbül, Uygar

doi:10.1038/s43588-021-00030-1

Brief Communication
Published: 22 February 2021

Nature Computational Science volume 1, pages 120–127 (2021)

A preprint version of the article is available at bioRxiv.

Abstract

Consistent identification of neurons in different experimental modalities is a key problem in neuroscience. Although methods to perform multimodal measurements in the same set of single neurons have become available, parsing complex relationships across different modalities to uncover neuronal identity is a growing challenge. Here we present an optimization framework to learn coordinated representations of multimodal data and apply it to a large multimodal dataset profiling mouse cortical interneurons. Our approach reveals strong alignment between transcriptomic and electrophysiological characterizations, enables accurate cross-modal data prediction, and identifies cell types that are consistent across modalities.

**Fig. 1: Coordinated representations of transcriptomic and electrophysiological profiles with coupled autoencoders.**

**Fig. 2: Cross-modal reconstructions capture cell-type-specific gene expression patterns and electrophysiological features.**

**Fig. 3: Deriving a consensus cell-type clustering.**

Data availability

The Patch-seq transcriptomic data are available at http://data.nemoarchive.org/other/grant/AIBS_patchseq/transcriptome/scell/SMARTseq/processed/analysis/20200611/, whereas the electrophysiological data are available at https://dandiarchive.org/dandiset/000020. For the Scala et al. 2019 dataset, the sequencing data are available under accession no. GSE134378, whereas the electrophysiological data are available at https://doi.org/10.5281/zenodo.3336165. The Scala et al. 2020 dataset was obtained from the public repository related to this work at https://github.com/berenslab/mini-atlas. Source Data are available with this paper.

Code availability

Code for the coupled autoencoder implementation and analysis is available at https://github.com/AllenInstitute/coupledAE-patchseq. An interactive version of the code base is provided in ref. ³⁷.

References

Tremblay, R., Lee, S. & Rudy, B. GABAergic interneurons in the neocortex: from cellular properties to circuits. Neuron 91, 260–292 (2016).
Zeng, H. & Sanes, J. R. Neuronal cell-type classification: challenges, opportunities and the path forward. Nat. Rev. Neurosci. 18, 530 (2017).
Paul, A. et al. Transcriptional architecture of synaptic communication delineates GABAergic neuron identity. Cell 171, 522–539 (2017).
Huang, Z. J. & Paul, A. The diversity of GABAergic neurons and neural communication elements. Nat. Rev. Neurosci. 20, 563–572 (2019).
Ascoli, G. A. et al. Petilla terminology: nomenclature of features of GABAergic interneurons of the cerebral cortex. Nat. Rev. Neurosci. 9, 557 (2008).
Berens, P. & Euler, T. Neuronal diversity in the retina. e-Neuroforum 23, 93–101 (2017).
Adkins, R. S. et al. A multimodal cell census and atlas of the mammalian primary motor cortex. Preprint at https://www.biorxiv.org/content/10.1101/2020.10.19.343129v1 (2020).
Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).
Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Chen, K. H. et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Cadwell, C. R. et al. Multimodal profiling of single-cell morphology, electrophysiology, and gene expression using Patch-seq. Nat. Protoc. 12, 2531–2553 (2017).
Somogyi, P., Tamas, G., Lujan, R. & Buhl, E. H. Salient features of synaptic organisation in the cerebral cortex. Brain Res. Rev. 26, 113–135 (1998).
DeFelipe, J. et al. New insights into the classification and nomenclature of cortical GABAergic interneurons. Nat. Rev. Neurosci. 14, 202–216 (2013).
Gouwens, N. W. et al. Integrated morphoelectric and transcriptomic classification of cortical GABAergic cells. Cell 183, 935–953 (2020).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009).
Kobak, D. et al. Sparse reduced-rank regression for exploratory visualization of multimodal data sets. Preprint at https://www.biorxiv.org/content/10.1101/302208v2 (2019).
Gouwens, N. W. et al. Classification of electrophysiological and morphological neuron types in the mouse visual cortex. Nat. Neurosci. 22, 1182–1195 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Smith, S. J. et al. Single-cell transcriptomic evidence for dense intracortical neuropeptide networks. eLife 8, e47889 (2019).
Smith, S. J., Hawrylycz, M., Rossier, J. & Sümbül, U. New light on cortical neuropeptides and synaptic network plasticity. Curr. Opin. Neurobiol. 63, 176–188 (2020).
Földy, C. et al. Single-cell RNAseq reveals cell adhesion molecule profiles in electrophysiologically defined neurons. Proc. Natl Acad. Sci. USA 113, E5222–E5231 (2016).
Scala, F. et al. Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174 (2019).
Scala, F. et al. Phenotypic variation of transcriptomic cell types in mouse motor cortex. Nature https://doi.org/10.1038/s41586-020-2907-3 (2020).
Harris, K. D. et al. Classes and continua of hippocampal CA1 inhibitory neurons revealed by single-cell transcriptomics. PLoS Biol. 16, e2006387 (2018).
Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2018).
Li, Y., Yang, M. & Zhang, Z. A survey of multi-view representation learning. IEEE Trans. Knowl. Data Eng. 31, 1863–1883 (2018).
Wang, K., Yin, Q., Wang, W., Wu, S. & Wang, L. A comprehensive survey on cross-modal retrieval. Preprint at https://arxiv.org/abs/1607.06215 (2016).
Andrew, G., Arora, R., Bilmes, J. & Livescu, K. Deep canonical correlation analysis. In International Conference on Machine Learning 1247–1255 (JMLR, 2013).
Wang, W., Arora, R., Livescu, K. & Bilmes, J. On deep multi-view representation learning. In International Conference on Machine Learning 1083–1092 (JMLR, 2015).
Feng, F., Wang, X. & Li, R. Cross-modal retrieval with correspondence autoencoder. In Proc. 22nd ACM International Conference on Multimedia 7–16 (ACM, 2014).
Gala, R. et al. A coupled autoencoder approach for multi-modal analysis of cell types. In Advances in Neural Information Processing Systems 9263–9272 (Curran Associates, 2019).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning Vol. 37 448–456 (JMLR, 2015).
Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740 (2014).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Freedman, D. & Diaconis, P. On the histogram as a density estimator: L2 theory. Zeitschrift Wahrsch. Verwandte Gebiete 57, 453–476 (1981).
Bakken, T. E. et al. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS One 13, e0209648 (2018).
Gala, R. et al. Consistent Cross-modal Identification of Cortical Neurons with Coupled Autoencoders (CodeOcean, 2020); https://doi.org/10.24433/CO.4098627.v1

Acknowledgements

We wish to thank the Allen Institute for Brain Science founder, P. G. Allen, for his vision, encouragement and support. This work was supported by the NIH grant 1RF1MH123220-01.

Author information

Authors and Affiliations

Allen Institute, Seattle, WA, USA
Rohan Gala, Agata Budzillo, Fahimeh Baftizadeh, Jeremy Miller, Nathan Gouwens, Anton Arkhipov, Gabe Murphy, Bosiljka Tasic, Hongkui Zeng, Michael Hawrylycz & Uygar Sümbül

Contributions

R.G. and U.S. designed the methodology and wrote the manuscript. R.G. wrote software and performed formal analysis. A.B., F.B., J.M., N.G., A.A. and G.M. performed data curation and pre-processing. R.G., B.T., M.H. and U.S. wrote the manuscript revisions. R.G., A.B., F.B., J.M., N.G., G.M., B.T., H.Z., M.H. and U.S. conceptualized the study.

Corresponding authors

Correspondence to Rohan Gala or Uygar Sümbül.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Yann Sweeney was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Reference taxonomy for well-represented GABAergic neurons.

Cells were mapped to the complete hierarchical classification tree for cortical cells with a marker-gene-based procedure. Here we show a subset of the full hierarchical tree, consisting of only those leaf nodes that are well represented (n ≥ 10) in the Patch-seq dataset. At the highest resolution, this tree consists of 53 cell type labels. The lowest-resolution view consists of a single label (n59), which comprises all GABAergic cortical neurons.

Extended Data Fig. 2 Cell type distribution.

The distribution of samples according to the reference hierarchy cell type label assignment. Types with fewer than 10 samples are not shown.

Extended Data Fig. 3 Hyper-parameter search.

(Left and center) Reconstruction errors relative to the values for uncoupled networks, and (right) coupling error, for different values of α_e and λ_te, averaged over validation sets. The value of α_t was set to 1.0 and the representation dimensionality was set to 3 for these experiments. As coupling is increased, the reconstruction error increases, illustrating the trade-off between coupling and reconstruction accuracy.
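The trade-off described in this caption can be illustrated with a minimal sketch of a weighted objective. The caption names the weights α_t, α_e and λ_te but does not give the loss itself, so the form below (mean squared reconstruction errors plus a mean squared distance between the two representations) is an assumption for illustration; the published method may define the coupling term differently. The function names are hypothetical.

```python
def mean_squared_error(x, x_hat):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def coupled_loss(x_t, x_t_hat, x_e, x_e_hat, z_t, z_e,
                 alpha_t=1.0, alpha_e=1.0, lambda_te=1.0):
    """Weighted sum of two reconstruction errors and one coupling error.

    ASSUMPTION: the coupling term is a plain mean squared distance between
    the representations z_t and z_e; this is an illustrative choice only.
    """
    recon_t = mean_squared_error(x_t, x_t_hat)   # transcriptomic reconstruction
    recon_e = mean_squared_error(x_e, x_e_hat)   # electrophysiology reconstruction
    coupling = mean_squared_error(z_t, z_e)      # disagreement between representations
    total = alpha_t * recon_t + alpha_e * recon_e + lambda_te * coupling
    return total, recon_t, recon_e, coupling


# Toy values: a larger lambda_te penalizes mismatched representations more,
# leaving less freedom to minimize the reconstruction terms.
total, recon_t, recon_e, coupling = coupled_loss(
    [1.0, 2.0], [1.0, 4.0], [0.0], [1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
    alpha_t=1.0, alpha_e=0.5, lambda_te=2.0,
)
```

With λ_te = 0 the two autoencoders are effectively uncoupled, which corresponds to the baseline against which the caption reports relative reconstruction errors.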

Extended Data Fig. 4 Decoder augmentation improves cross-modal prediction accuracy.

We use cross-modal representations to augment the input for decoder subnetworks during training. Reconstruction performance, measured by the coefficient of determination (R²), is shown for linear baselines (PC-CCA) and for coupled autoencoders with and without augmentation. Error bars show standard deviation over 20 cross-validation folds.
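For reference, the coefficient of determination used in this caption can be computed as follows. This is a generic sketch of the standard definition, R² = 1 − SS_res / SS_tot, for a single feature; how the scores are aggregated across features and folds in the paper is not stated here, and the function name is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination for one feature: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # exact reconstruction
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # predicting the mean
```

A reconstruction that only predicts the mean of the target scores 0, and a reconstruction worse than that scores below 0, so R² can be negative.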

Extended Data Fig. 5 Effect of latent space dimensionality on reconstruction performance.

Reconstruction errors for the coupled autoencoder and the linear baseline at different latent space dimensionalities, dim ∈ {3, 5, 10}. Coupled autoencoders reconstruct the data more accurately than linear baselines (p < 10⁻⁴, two-sided Wilcoxon signed-rank test). The only exception is X_t → X̃_e with dimensionality set to 10, where the null hypothesis cannot be rejected. We would like the dimensionality to be as low as possible for downstream tasks such as clustering and classification with limited data, yet high enough for good performance at tasks such as data imputation or cross-modal data prediction.
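The two-sided Wilcoxon signed-rank test named in this caption compares paired measurements. The sketch below is an exact, brute-force version for small samples with no ties and no zero differences; what is paired (for example, per-fold errors of the two models) is an assumption, the function name is hypothetical, and a real analysis at the sample sizes needed for p < 10⁻⁴ would use a statistics library instead.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W_plus, p_value). Restricted to n <= 20 pairs with no zero
    differences and no tied absolute differences, to keep the sketch simple.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n == 0 or n > 20:
        raise ValueError("this sketch supports between 1 and 20 pairs")
    abs_d = [abs(d) for d in diffs]
    if 0 in abs_d or len(set(abs_d)) != n:
        raise ValueError("zero or tied differences are not handled here")

    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank

    # Test statistic: sum of the ranks that belong to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Null distribution: every sign pattern is equally likely.
    null = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product((False, True), repeat=n)]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Toy paired values in which the first sample is always larger.
w_plus, p_value = wilcoxon_signed_rank_exact([2, 4, 6, 8, 10], [1, 2, 3, 4, 5])
```

With only five pairs the smallest attainable two-sided p-value is 2/32, which shows why many paired observations are needed before a threshold such as 10⁻⁴ can be reached.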

Extended Data Fig. 6 Reconstruction of gene expression using coordinated representations.

Within-modality reconstructions for individual genes are decoded from the coordinated (λ_te = 1.0) representation z_t obtained for the transcriptomic data. Cross-modal reconstructions are obtained from the corresponding z_e, which is the representation for the electrophysiological data. The cross-modal reconstructions are comparable to within-modality reconstructions, and a majority of the neuropeptide precursor genes are reconstructed well, as suggested by the high coefficient of determination (R²) values.

Extended Data Fig. 7 Reconstruction of electrophysiological features using coordinated representations.

The within-modality reconstructions for electrophysiological features are decoded from the coordinated (λ_te = 1.0) representation z_e obtained for the electrophysiological data. Cross-modal reconstructions are obtained from the corresponding z_t, which is the representation for the transcriptomic data. Features that are reconstructed well in the within-modality case are analyzed in the context of transcriptomic cell types in the main text.

Extended Data Fig. 8 Predicting cell types based on gene expression.

Gene expression profiles of the 524 inhibitory neurons in the Scala et al. 2020 dataset were used to obtain 3-d representations, without additional training, from the coupled autoencoder trained on the Gouwens et al. dataset. QDA classifiers trained to predict cell types for the Gouwens et al. dataset were then used to predict labels for the cells in the Scala et al. 2020 dataset. The contingency matrix comparing the predicted cell types with the cell types assigned by Scala et al. is shown. Overall accuracy of label prediction is 66%, with many inaccuracies accounted for by closely related types.
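A contingency matrix and an overall accuracy of the kind reported in this caption can be tallied from two label lists. The sketch below is generic: the labels "A", "B" and "C" are placeholders rather than cell types from the study, the function name is hypothetical, and the QDA classifier itself is not reproduced.

```python
from collections import Counter


def contingency_and_accuracy(assigned_labels, predicted_labels):
    """Count (assigned, predicted) label pairs and the fraction that agree."""
    if len(assigned_labels) != len(predicted_labels) or not assigned_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    counts = Counter(zip(assigned_labels, predicted_labels))
    # Diagonal entries of the contingency matrix are the correct predictions.
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return counts, correct / len(assigned_labels)


# Placeholder labels for illustration only.
assigned = ["A", "A", "B", "C"]
predicted = ["A", "B", "B", "C"]
counts, accuracy = contingency_and_accuracy(assigned, predicted)
```

Off-diagonal entries such as `counts[("A", "B")]` show which pairs of types are confused, which is how errors between closely related types become visible.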

Extended Data Fig. 9 Predicting electrophysiological properties from gene expression.

Gene expression profiles for 524 inhibitory neurons in the Scala et al. 2020 dataset were used as input for the coupled autoencoder that was trained only with the Gouwens et al. dataset. The electrophysiological features were not measured the same way in the two datasets; the cross-modal setting only allows prediction of the electrophysiological features of the Gouwens et al. dataset for cells in the Scala et al. dataset. There is a strong correlation (Pearson's r is shown on each plot) for many related measurements across the datasets. Cells are colored according to the cell type assignments of Scala et al. 2020, who mapped them to the same reference taxonomy that is used throughout this study.
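Pearson's r, the statistic shown on each plot, measures linear association between a predicted feature and a related measured feature. The sketch below is the standard definition in plain Python; the function name and the toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy example: predicted values that are a rescaled copy of the measured ones.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because r is invariant to shifting and rescaling either variable, it suits a comparison in which the two datasets measure related features on different scales.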

Extended Data Fig. 10 Reference taxonomy labels do not partition the data well.

Average silhouette scores for test samples over successive mergings of the reference taxonomy, computed with uncoupled representations, do not indicate any particularly favorable number of clusters. Error bars show mean ± SD over the 5 best initializations (based on reconstruction accuracy) of single-modality (uncoupled) autoencoders operating on X_t and X_e. Here the uncoupled representations z_t and z_e serve as low-dimensional representations of the standalone data. The per-label silhouette scores for the 33-merged reference taxonomy labels with uncoupled representations are worse than those for consensus cluster labels on both z_t (b) and z_e (c).
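The silhouette score in this caption compares, for each sample, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b − a) / max(a, b). The sketch below is a generic plain-Python version with Euclidean distance; the distance metric, the singleton convention, and the function name are assumptions, not details taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_silhouette(points, labels):
    """Average silhouette score over all points (Euclidean distance)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy groups, scored with matching and mismatched labels.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])
bad = mean_silhouette(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate labels that match compact, well-separated groups, while scores near or below 0 indicate labels that cut across the structure of the representation.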

Supplementary information

Supplementary Information

Supplementary Figs. 1–6.

Source data

Source Data Fig. 1

Data for Fig. 1d–f.

Source Data Fig. 2

Data for heatmaps in Fig. 2a–d.

Source Data Fig. 3

Data for Fig. 3a–e.

Source Data Extended Data Fig. 4

All distributions shown in the figure.

Source Data Extended Data Fig. 5

All distributions shown in the figure.

Source Data Extended Data Fig. 6

Coefficient of determination values for within- and cross-modality reconstructions of neuropeptide gene expression.

Source Data Extended Data Fig. 7

Coefficient of determination values for within- and cross-modality reconstructions of electrophysiological features.

Source Data Extended Data Fig. 10

All silhouette scores displayed in the figure.

About this article

Cite this article

Gala, R., Budzillo, A., Baftizadeh, F. et al. Consistent cross-modal identification of cortical neurons with coupled autoencoders. Nat Comput Sci 1, 120–127 (2021). https://doi.org/10.1038/s43588-021-00030-1

Received: 30 June 2020
Accepted: 19 January 2021
Published: 22 February 2021
Issue Date: February 2021
DOI: https://doi.org/10.1038/s43588-021-00030-1
