  • Brief Communication

Consistent cross-modal identification of cortical neurons with coupled autoencoders

A preprint version of the article is available at bioRxiv.

Abstract

Consistent identification of neurons in different experimental modalities is a key problem in neuroscience. Although methods to perform multimodal measurements in the same set of single neurons have become available, parsing complex relationships across different modalities to uncover neuronal identity is a growing challenge. Here we present an optimization framework to learn coordinated representations of multimodal data and apply it to a large multimodal dataset profiling mouse cortical interneurons. Our approach reveals strong alignment between transcriptomic and electrophysiological characterizations, enables accurate cross-modal data prediction, and identifies cell types that are consistent across modalities.

Fig. 1: Coordinated representations of transcriptomic and electrophysiological profiles with coupled autoencoders.
Fig. 2: Cross-modal reconstructions capture cell-type-specific gene expression patterns and electrophysiological features.
Fig. 3: Deriving a consensus cell-type clustering.

Data availability

The Patch-seq transcriptomic data are available at http://data.nemoarchive.org/other/grant/AIBS_patchseq/transcriptome/scell/SMARTseq/processed/analysis/20200611/, whereas the electrophysiological data are available at https://dandiarchive.org/dandiset/000020. For the Scala et al. 2019 dataset, the sequencing data are available under accession no. GSE134378, whereas the electrophysiological data are available at https://doi.org/10.5281/zenodo.3336165. The Scala et al. 2020 dataset was obtained from the public repository related to this work at https://github.com/berenslab/mini-atlas. Source Data are available with this paper.

Code availability

Code for the coupled autoencoder implementation and analysis is available at https://github.com/AllenInstitute/coupledAE-patchseq. An interactive version of the code base is provided in ref. 37.

References

  1. Tremblay, R., Lee, S. & Rudy, B. GABAergic interneurons in the neocortex: from cellular properties to circuits. Neuron 91, 260–292 (2016).

  2. Zeng, H. & Sanes, J. R. Neuronal cell-type classification: challenges, opportunities and the path forward. Nat. Rev. Neurosci. 18, 530 (2017).

  3. Paul, A. et al. Transcriptional architecture of synaptic communication delineates GABAergic neuron identity. Cell 171, 522–539 (2017).

  4. Huang, Z. J. & Paul, A. The diversity of GABAergic neurons and neural communication elements. Nat. Rev. Neurosci. 20, 563–572 (2019).

  5. Ascoli, G. A. et al. Petilla terminology: nomenclature of features of GABAergic interneurons of the cerebral cortex. Nat. Rev. Neurosci. 9, 557 (2008).

  6. Berens, P. & Euler, T. Neuronal diversity in the retina. e-Neuroforum 23, 93–101 (2017).

  7. Adkins, R. S. et al. A multimodal cell census and atlas of the mammalian primary motor cortex. Preprint at https://www.biorxiv.org/content/10.1101/2020.10.19.343129v1 (2020).

  8. Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).

  9. Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).

  10. Chen, K. H. et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

  11. Cadwell, C. R. et al. Multimodal profiling of single-cell morphology, electrophysiology, and gene expression using Patch-seq. Nat. Protoc. 12, 2531–2553 (2017).

  12. Somogyi, P., Tamas, G., Lujan, R. & Buhl, E. H. Salient features of synaptic organisation in the cerebral cortex. Brain Res. Rev. 26, 113–135 (1998).

  13. DeFelipe, J. et al. New insights into the classification and nomenclature of cortical GABAergic interneurons. Nat. Rev. Neurosci. 14, 202–216 (2013).

  14. Gouwens, N. W. et al. Integrated morphoelectric and transcriptomic classification of cortical GABAergic cells. Cell 183, 935–953 (2020).

  15. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009).

  16. Kobak, D. et al. Sparse reduced-rank regression for exploratory visualization of multimodal data sets. Preprint at https://www.biorxiv.org/content/10.1101/302208v2 (2019).

  17. Gouwens, N. W. et al. Classification of electrophysiological and morphological neuron types in the mouse visual cortex. Nat. Neurosci. 22, 1182–1195 (2019).

  18. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  19. Smith, S. J. et al. Single-cell transcriptomic evidence for dense intracortical neuropeptide networks. eLife 8, e47889 (2019).

  20. Smith, S. J., Hawrylycz, M., Rossier, J. & Sümbül, U. New light on cortical neuropeptides and synaptic network plasticity. Curr. Opin. Neurobiol. 63, 176–188 (2020).

  21. Földy, C. et al. Single-cell RNAseq reveals cell adhesion molecule profiles in electrophysiologically defined neurons. Proc. Natl Acad. Sci. USA 113, E5222–E5231 (2016).

  22. Scala, F. et al. Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174 (2019).

  23. Scala, F. et al. Phenotypic variation of transcriptomic cell types in mouse motor cortex. Nature https://doi.org/10.1038/s41586-020-2907-3 (2020).

  24. Harris, K. D. et al. Classes and continua of hippocampal CA1 inhibitory neurons revealed by single-cell transcriptomics. PLoS Biol. 16, e2006387 (2018).

  25. Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2018).

  26. Li, Y., Yang, M. & Zhang, Z. A survey of multi-view representation learning. IEEE Trans. Knowl. Data Eng. 31, 1863–1883 (2018).

  27. Wang, K., Yin, Q., Wang, W., Wu, S. & Wang, L. A comprehensive survey on cross-modal retrieval. Preprint at https://arxiv.org/abs/1607.06215 (2016).

  28. Andrew, G., Arora, R., Bilmes, J. & Livescu, K. Deep canonical correlation analysis. In International Conference on Machine Learning 1247–1255 (JMLR, 2013).

  29. Wang, W., Arora, R., Livescu, K. & Bilmes, J. On deep multi-view representation learning. In International Conference on Machine Learning 1083–1092 (JMLR, 2015).

  30. Feng, F., Wang, X. & Li, R. Cross-modal retrieval with correspondence autoencoder. In Proc. 22nd ACM International Conference on Multimedia 7–16 (ACM, 2014).

  31. Gala, R. et al. A coupled autoencoder approach for multi-modal analysis of cell types. In Advances in Neural Information Processing Systems 9263–9272 (Curran Associates, 2019).

  32. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning Vol. 37 448–456 (JMLR, 2015).

  33. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740 (2014).

  34. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  35. Freedman, D. & Diaconis, P. On the histogram as a density estimator: L2 theory. Zeitschrift Wahrsch. Verwandte Gebiete 57, 453–476 (1981).

  36. Bakken, T. E. et al. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS One 13, e0209648 (2018).

  37. Gala, R. et al. Consistent Cross-modal Identification of Cortical Neurons with Coupled Autoencoders (CodeOcean, 2020); https://doi.org/10.24433/CO.4098627.v1

Acknowledgements

We wish to thank the Allen Institute for Brain Science founder, P. G. Allen, for his vision, encouragement and support. This work was supported by the NIH grant 1RF1MH123220-01.

Author information

Contributions

R.G. and U.S. designed the methodology and wrote the manuscript. R.G. wrote software and performed formal analysis. A.B., F.B., J.M., N.G., A.A., and G.M. performed data curation and pre-processing. R.G., B.T., M.H., and U.S. wrote the manuscript revisions. R.G., A.B., F.B., J.M., N.G., G.M., B.T., H.Z., M.H., and U.S. conceptualized the study.

Corresponding authors

Correspondence to Rohan Gala or Uygar Sümbül.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Yann Sweeney was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Reference taxonomy for well-represented GABAergic neurons.

Cells were mapped to the complete hierarchical classification tree for cortical cells with a marker-gene-based procedure. Here we show a subset of the full hierarchical tree, consisting of only those leaf nodes that are well represented (n ≥ 10) in the Patch-seq dataset. At the highest resolution, this tree consists of 53 cell type labels. The lowest-resolution view consists of a single label (n59), which comprises all GABAergic cortical neurons.

Extended Data Fig. 2 Cell type distribution.

The distribution of samples according to the reference hierarchy cell type label assignment. Types with fewer than 10 samples are not shown.

Extended Data Fig. 3 Hyper-parameter search.

(Left and center) Reconstruction errors relative to the values for uncoupled networks and (right) coupling error, for different values of αe and λte, averaged over validation sets. The value of αt was set to 1.0 and the representation dimensionality was set to 3 for these experiments. As coupling is increased, the reconstruction error increases, illustrating the trade-off between coupling and reconstruction accuracy.
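The trade-off in this caption can be illustrated with a minimal sketch of a weighted objective, assuming the total loss is a sum of two reconstruction errors weighted by αt and αe plus a coupling error weighted by λte. The function name `coupled_loss` and the plain squared-distance coupling term are assumptions made for illustration; the exact form of the coupling term used by the authors is not given in this text. A plain squared distance can be reduced simply by shrinking both representations, so a practical implementation would need some form of scale normalization.

```python
import numpy as np


def coupled_loss(x_t, x_e, xr_t, xr_e, z_t, z_e,
                 alpha_t=1.0, alpha_e=1.0, lambda_te=1.0):
    """Weighted sum of two reconstruction errors and one coupling error.

    The coupling term here (mean squared distance between z_t and z_e)
    is an illustrative assumption, not the article's definition.
    """
    recon_t = np.mean(np.sum((x_t - xr_t) ** 2, axis=1))
    recon_e = np.mean(np.sum((x_e - xr_e) ** 2, axis=1))
    coupling = np.mean(np.sum((z_t - z_e) ** 2, axis=1))
    total = alpha_t * recon_t + alpha_e * recon_e + lambda_te * coupling
    return total, recon_t, recon_e, coupling


# Toy inputs: 2 cells, 3 transcriptomic and 2 electrophysiological features.
x_t, xr_t = np.zeros((2, 3)), np.ones((2, 3))
x_e, xr_e = np.zeros((2, 2)), np.zeros((2, 2))
z_t, z_e = np.zeros((2, 3)), 2.0 * np.ones((2, 3))

total, recon_t, recon_e, coupling = coupled_loss(
    x_t, x_e, xr_t, xr_e, z_t, z_e, lambda_te=0.5)
```

Raising `lambda_te` makes the optimizer favor agreement between the two representations at the expense of the reconstruction terms, which is the behavior the figure reports.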

Extended Data Fig. 4 Decoder augmentation improves cross-modal prediction accuracy.

We use cross-modal representations to augment the input for decoder subnetworks while training. Reconstruction performance is measured by the coefficient of determination (R²) for linear baselines (PC-CCA) and for coupled autoencoders with and without augmentation. Error bars show standard deviation over 20 cross-validation folds.
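For reference, the coefficient of determination can be computed per feature as sketched below. The helper name `r2_per_feature` is hypothetical, and the sketch uses the standard definition R² = 1 − SS_res / SS_tot; whether the authors aggregate over features before or after computing R² is not stated here.

```python
import numpy as np


def r2_per_feature(x_true, x_pred):
    """R^2 = 1 - SS_res / SS_tot, computed separately for each column.

    Columns of x_true with zero variance give a division by zero and
    should be excluded beforehand.
    """
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    ss_res = np.sum((x_true - x_pred) ** 2, axis=0)
    ss_tot = np.sum((x_true - x_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


x_true = np.array([[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]])
r2_perfect = r2_per_feature(x_true, x_true)
# Predicting the column mean for every sample gives R^2 = 0.
r2_mean = r2_per_feature(x_true, np.tile(x_true.mean(axis=0), (3, 1)))
```

Error bars such as those in the figure would then follow from taking the standard deviation of these values across cross-validation folds.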

Source data

Extended Data Fig. 5 Effect of latent space dimensionality on reconstruction performance.

Reconstruction errors for the coupled autoencoder and the linear baseline for different latent space dimensionalities, dim ∈ {3, 5, 10}. Coupled autoencoders reconstruct the data more accurately than linear baselines (p < 10⁻⁴, two-sided Wilcoxon signed-rank test). The only exception is \(X_{\text{t}} \to \widetilde{X}_{\text{e}}\) with dimensionality set to 10, where the null hypothesis cannot be rejected. We would like the dimensionality to be as low as possible for downstream tasks such as clustering and classification with limited data, yet high enough for good performance at tasks such as data imputation or cross-modal data prediction.

Source data

Extended Data Fig. 6 Reconstruction of gene expression using coordinated representations.

Within-modality reconstructions for individual genes are decoded from the coordinated representation zt (λte = 1.0) obtained for the transcriptomic data. Cross-modal reconstructions are obtained from the corresponding ze, which is the representation for the electrophysiological data. The cross-modal reconstructions are comparable to within-modality reconstructions, and a majority of the neuropeptide precursor genes are reconstructed well, as suggested by the high coefficient of determination (R²) values.

Source data

Extended Data Fig. 7 Reconstruction of electrophysiological features using coordinated representations.

The within-modality reconstructions for electrophysiological features are decoded from the coordinated representation ze (λte = 1.0) obtained for the electrophysiological data. Cross-modal reconstructions are obtained from the corresponding zt, which is the representation for the transcriptomic data. Features that are reconstructed well in the within-modality case are analyzed in the context of transcriptomic cell types in the main text.

Source data

Extended Data Fig. 8 Predicting cell types based on gene expression.

Gene expression profiles of the 524 inhibitory neurons in the Scala et al. 2020 dataset were passed through the coupled autoencoder trained on the Gouwens et al. dataset, without additional training, to obtain 3-d representations. QDA classifiers trained to predict cell types for the Gouwens et al. dataset were then used to predict labels for the cells in the Scala et al. 2020 dataset. The contingency matrix comparing the predicted cell types and the cell types assigned by Scala et al. is shown. Overall accuracy of label prediction is 66%, with many inaccuracies being accounted for by closely related types.
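The classification step can be sketched as follows, assuming QDA denotes standard quadratic discriminant analysis (one Gaussian with its own covariance per class). The class `SimpleQDA`, the helper `contingency`, the small ridge added to each covariance, and the synthetic 3-d data are all illustrative assumptions; the authors' classifier settings are not given in this text.

```python
import numpy as np


class SimpleQDA:
    """Quadratic discriminant analysis: one Gaussian per class."""

    def fit(self, z, labels):
        z = np.asarray(z, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.params_ = []
        for c in self.classes_:
            zc = z[labels == c]  # needs at least 2 samples per class
            mean = zc.mean(axis=0)
            # Small ridge keeps the covariance invertible (an assumption).
            cov = np.cov(zc, rowvar=False) + 1e-6 * np.eye(z.shape[1])
            log_prior = np.log(zc.shape[0] / z.shape[0])
            self.params_.append((mean, np.linalg.inv(cov),
                                 np.linalg.slogdet(cov)[1], log_prior))
        return self

    def predict(self, z):
        z = np.asarray(z, dtype=float)
        scores = []
        for mean, cov_inv, logdet, log_prior in self.params_:
            d = z - mean
            maha = np.einsum("ij,jk,ik->i", d, cov_inv, d)
            scores.append(-0.5 * (maha + logdet) + log_prior)
        return self.classes_[np.argmax(np.array(scores), axis=0)]


def contingency(true_labels, pred_labels, classes):
    """Rows index true labels, columns index predicted labels."""
    index = {c: i for i, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        m[index[t], index[p]] += 1
    return m


# Synthetic stand-in for 3-d representations of two cell types.
rng = np.random.default_rng(0)
z_train = np.vstack([rng.normal(0.0, 1.0, (50, 3)),
                     rng.normal(10.0, 1.0, (50, 3))])
labels = np.array(["A"] * 50 + ["B"] * 50)

qda = SimpleQDA().fit(z_train, labels)
pred = qda.predict(z_train)
matrix = contingency(labels, pred, qda.classes_)
accuracy = np.trace(matrix) / matrix.sum()
new_pred = qda.predict(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]))
```

In the setting described by the caption, the classifier would be fitted on representations of one dataset and `predict` would be called on representations of the other; off-diagonal entries of the contingency matrix then show which types are confused.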

Extended Data Fig. 9 Predicting electrophysiological properties from gene expression.

Gene expression profiles for 524 inhibitory neurons in the Scala et al. 2020 dataset were used as input for the coupled autoencoder that was trained only with the Gouwens et al. dataset. Electrophysiological features were not measured the same way in the two datasets; the cross-modal setting only allows prediction of the Gouwens et al. electrophysiological features for cells in the Scala et al. dataset. There is a strong correlation (Pearson's r is shown on each plot) for many related measurements across the datasets. Cells are colored according to the cell type assignments of Scala et al. 2020, who mapped them to the same reference taxonomy that is used throughout this study.
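Pearson's r is a natural summary here because it is unchanged when either variable is shifted or rescaled by a positive factor, so related measurements recorded under different protocols or units can still be compared. A minimal sketch, with the helper name `pearson_r` and the toy numbers chosen only for illustration:

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations of the two variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


predicted = np.array([1.0, 2.0, 3.0, 4.0])
measured = np.array([2.1, 3.9, 6.2, 7.8])

r = pearson_r(predicted, measured)
# A positive linear change of units leaves r unchanged.
r_rescaled = pearson_r(predicted, 3.0 * measured + 5.0)
```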

Extended Data Fig. 10 Reference taxonomy labels do not partition the data well.

Average silhouette scores for test samples, computed for successive mergings of the reference taxonomy with uncoupled representations, do not indicate any particularly favorable number of clusters. Error bars show mean ± SD over the 5 best initializations (based on reconstruction accuracy) of single-modality (uncoupled) autoencoders operating on Xt and Xe. Here the uncoupled representations zt and ze serve as low-dimensional representations of the standalone data. The per-label silhouette score for the 33-merged reference taxonomy labels with uncoupled representations performs worse than consensus cluster labels on both zt (b) and ze (c).
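The silhouette score used in this caption can be sketched with the standard definition s = (b − a) / max(a, b), where a is a sample's mean distance to the other members of its own cluster and b is its smallest mean distance to any other cluster. The function name `silhouette_samples_np`, the Euclidean metric, and the toy data are assumptions for illustration.

```python
import numpy as np


def silhouette_samples_np(z, labels):
    """Per-sample silhouette scores with Euclidean distances.

    Requires at least two distinct labels. Samples in singleton
    clusters are given a score of 0 by convention.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    classes = np.unique(labels)
    scores = np.zeros(len(z))
    for i in range(len(z)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        # Distance to self is 0, so divide by the other members only.
        a = dist[i, same].sum() / (n_same - 1)
        b = min(dist[i, labels == c].mean()
                for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


z = np.array([[0.0], [1.0], [10.0], [11.0]])
good = silhouette_samples_np(z, [0, 0, 1, 1])  # labels match the structure
bad = silhouette_samples_np(z, [0, 1, 0, 1])   # labels cut across it

# Per-label score: average the per-sample scores within each label.
per_label_good = {c: good[np.array([0, 0, 1, 1]) == c].mean() for c in (0, 1)}
```

Labels that follow the structure of the representation give scores close to 1, while labels that cut across it give low or negative scores, which is the sense in which a labeling can fail to partition the data well.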

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–6.

Source data

Source Data Fig. 1

Data for Fig. 1d–f.

Source Data Fig. 2

Data for heatmaps in Fig. 2a–d.

Source Data Fig. 3

Data for Fig. 3a–e.

Source Data Extended Data Fig. 4

All distributions shown in the figure.

Source Data Extended Data Fig. 5

All distributions shown in the figure.

Source Data Extended Data Fig. 6

Coefficient of determination values for within- and cross-modality reconstructions of neuropeptide gene expression.

Source Data Extended Data Fig. 7

Coefficient of determination values for within- and cross-modality reconstructions of electrophysiological features.

Source Data Extended Data Fig. 10

All silhouette scores displayed in the figure.

About this article

Cite this article

Gala, R., Budzillo, A., Baftizadeh, F. et al. Consistent cross-modal identification of cortical neurons with coupled autoencoders. Nat Comput Sci 1, 120–127 (2021). https://doi.org/10.1038/s43588-021-00030-1

  • DOI: https://doi.org/10.1038/s43588-021-00030-1
