Machine learning-guided engineering of genetically encoded fluorescent calcium indicators

A preprint version of the article is available at bioRxiv.

Abstract

Here we used machine learning to engineer genetically encoded fluorescent indicators, protein-based sensors critical for real-time monitoring of biological activity. We trained models on established mutagenesis libraries that link sensor sequences to functions and used them to predict the outcomes of new sensor mutations. Using the GCaMP calcium indicator as a scaffold, we developed an ensemble of three regression models trained on experimentally derived GCaMP mutation libraries. The trained ensemble performed an in silico functional screen of 1,423 novel, uncharacterized GCaMP variants. From this screen we identified the ensemble-derived GCaMP (eGCaMP) variants eGCaMP and eGCaMP+, which achieve both faster kinetics and larger ∆F/F0 responses upon stimulation than previously published fast variants. Furthermore, we identified a combinatorial mutant with an exceptionally large dynamic range, eGCaMP2+, which outperforms the tested sixth-, seventh- and eighth-generation GCaMPs. These findings demonstrate the value of machine learning as a tool for efficiently engineering proteins with desired biophysical characteristics.
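The ensemble-then-screen workflow described above can be illustrated with a minimal sketch. It assumes each variant has already been encoded as a numeric feature vector and, purely for illustration, uses three k-nearest-neighbour regressors with different k as stand-ins for the three regression models; the study's actual model types, features and hyperparameters are not given here and should be taken from the authors' released code. Averaging member predictions and ranking uncharacterized candidates is one common way to combine an ensemble, not necessarily the exact rule used in the study. The names `knn_regressor`, `ensemble_predict` and `in_silico_screen` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def knn_regressor(train_x: List[Vector], train_y: List[float], k: int) -> Model:
    """Hypothetical stand-in model: average the targets of the k closest training variants."""

    def predict(x: Vector) -> float:
        # Rank training variants by squared Euclidean distance to x.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, tx)), ty)
            for tx, ty in zip(train_x, train_y)
        )
        return mean(ty for _, ty in ranked[:k])

    return predict


def ensemble_predict(models: List[Model], x: Vector) -> float:
    """Combine ensemble members by averaging their predictions."""
    return mean(model(x) for model in models)


def in_silico_screen(models: List[Model], candidates: Dict[str, Vector], top_n: int) -> List[str]:
    """Score every uncharacterized candidate and return the top_n names, best first."""
    scores = {name: ensemble_predict(models, x) for name, x in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy 1-D features and targets; these are not data from the study.
    train_x = [[0.0], [1.0], [2.0], [3.0]]
    train_y = [0.0, 1.0, 2.0, 3.0]
    models = [knn_regressor(train_x, train_y, k) for k in (1, 2, 3)]
    candidates = {"variant_A": [2.9], "variant_B": [0.1]}
    print(ensemble_predict(models, candidates["variant_A"]))  # 2.5
    print(in_silico_screen(models, candidates, top_n=1))  # ['variant_A']
```

Averaging members with different k trades the low bias of k = 1 against the smoother estimates of larger k; a real ensemble would more typically mix model families.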

Fig. 1: Description of variant library, computational approach and ensemble cross-validation.
Fig. 2: In vitro verification of ensemble predictions.
Fig. 3: Gq/IP3 assay in HEK293 cells to validate ensemble predictions.
Fig. 4: Identification of eGCaMP+ and eGCaMP2+ in HEK293 cells.
Fig. 5: ∆F/F0 and kinetic characteristics of eGCaMP, eGCaMP+ and eGCaMP2+ in primary neurons.
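The figures compare variants by ∆F/F0 and response kinetics. As an illustration only, the sketch below computes peak ∆F/F0 and a half-decay time from a sampled fluorescence trace. It assumes F0 is the mean of a pre-stimulus baseline and that the half-decay time is measured from the peak until ∆F/F0 first falls to half its peak value, a common convention for GCaMP characterization; the study's exact analysis may differ, and the function names, baseline length and toy trace are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def delta_f_over_f0(trace: Sequence[float], baseline_samples: int) -> List[float]:
    """Convert raw fluorescence to ∆F/F0, using the mean of the first samples as F0."""
    f0 = mean(trace[:baseline_samples])
    return [(f - f0) / f0 for f in trace]


def peak_and_half_decay(dff: Sequence[float], dt: float) -> Tuple[float, Optional[float]]:
    """Return (peak ∆F/F0, time from peak until ∆F/F0 first drops to half the peak).

    The half-decay time is None if the trace never falls to half its peak.
    """
    peak_idx = max(range(len(dff)), key=lambda i: dff[i])
    peak = dff[peak_idx]
    for i in range(peak_idx, len(dff)):
        if dff[i] <= peak / 2:
            return peak, (i - peak_idx) * dt
    return peak, None


if __name__ == "__main__":
    # Toy trace sampled every 50 ms (hypothetical values).
    trace = [100, 100, 100, 300, 250, 200, 150, 120]
    dff = delta_f_over_f0(trace, baseline_samples=3)
    print(peak_and_half_decay(dff, dt=0.05))  # (2.0, 0.1)
```

Reading the first threshold crossing keeps the sketch simple; noisy recordings are usually smoothed or fitted with an exponential before extracting a decay constant.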

Data availability

All of the datasets generated in this study are available on figshare at https://doi.org/10.6084/M9.FIGSHARE.23750682.V1 (ref. 45). The Supplementary Data include the Chen (ref. 4) and Dana (ref. 5) datasets used to run our model and an amino acid property matrix derived from AAindex (ref. 21). The GCaMP crystal structure used in this paper, GCaMP3–D380Y (RCSB: 3SG3), is accessible online (https://www.rcsb.org/structure/3sg3) and is also included in the Supplementary Data. Source data are provided with this paper.
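The amino acid property matrix is used to encode variants for the models. One plausible encoding, sketched here as an assumption rather than the authors' documented scheme, represents a point mutation such as the hypothetical label `A10L` by its sequence position plus the per-property difference between the mutant and wild-type residues. The three-property table below is a made-up toy stand-in, not AAindex data, and `parse_mutation` and `encode_mutation` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy property table (three made-up properties per residue);
# a real pipeline would load the full amino acid property matrix instead.
TOY_PROPERTIES: Dict[str, List[float]] = {
    "A": [0.5, 1.0, 0.0],
    "L": [1.0, 2.0, 0.0],
    "T": [0.0, 1.5, 1.0],
}

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label like 'A10L' into (wild-type residue, position, mutant residue)."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def encode_mutation(label: str, properties: Dict[str, List[float]]) -> List[float]:
    """Feature vector: [position] + [mutant minus wild-type value for each property]."""
    wild_type, position, mutant = parse_mutation(label)
    deltas = [m - w for m, w in zip(properties[mutant], properties[wild_type])]
    return [float(position)] + deltas


if __name__ == "__main__":
    print(encode_mutation("A10L", TOY_PROPERTIES))  # [10.0, 0.5, 1.0, 0.0]
```

Encoding substitutions as property differences lets a model generalize across residues that were never observed at a given position, at the cost of assuming property effects are roughly additive.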

Code availability

The source code is available from GitHub via its Zenodo archive at https://doi.org/10.5281/ZENODO.8179256 (ref. 46) and from CodeOcean at https://doi.org/10.24433/CO.0624159.v1 (refs. 47,48). Custom Python scripts are available from figshare (ref. 45).

References

  1. Baird, G. S., Zacharias, D. A. & Tsien, R. Y. Circular permutation and receptor insertion within green fluorescent proteins. Proc. Natl Acad. Sci. USA 96, 11241–11246 (1999).

  2. Tian, L. et al. Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators. Nat. Methods 6, 875–881 (2009).

  3. Akerboom, J. et al. Optimization of a GCaMP calcium indicator for neural activity imaging. J. Neurosci. 32, 13819–13840 (2012).

  4. Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).

  5. Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16, 649–657 (2019).

  6. Zhang, Y. et al. Fast and sensitive GCaMP calcium indicators for imaging neural populations. Nature 615, 884–891 (2023).

  7. Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).

  8. Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat. Methods 17, 1156–1166 (2020).

  9. Feng, J. et al. A genetically encoded fluorescent sensor for rapid and specific in vivo detection of norepinephrine. Neuron 102, 745–761.e8 (2019).

  10. Dong, A. et al. A fluorescent sensor for spatiotemporally resolved imaging of endocannabinoid dynamics in vivo. Nat. Biotechnol. 40, 787–798 (2022).

  11. Rappleye, M. et al. Optogenetic microwell array screening system: a high-throughput engineering platform for genetically encoded fluorescent indicators. ACS Sens. 8, 4233–4244 (2023).

  12. Saito, Y. et al. Machine-learning-guided library design cycle for directed evolution of enzymes: the effects of training data composition on sequence space exploration. ACS Catal. 11, 14615–14624 (2021).

  13. Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009).

  14. Wu, Z., Kan, S. B. J., Lewis, R. D., Wittmann, B. J. & Arnold, F. H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl Acad. Sci. USA 116, 8852–8858 (2019).

  15. Saito, Y. et al. Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins. ACS Synth. Biol. 7, 2014–2022 (2018).

  16. Bedbrook, C. N. et al. Machine learning-guided channelrhodopsin engineering enables minimally invasive optogenetics. Nat. Methods 16, 1176–1184 (2019).

  17. Unger, E. K. et al. Directed evolution of a selective and sensitive serotonin sensor via machine learning. Cell 183, 1986–2002.e26 (2020).

  18. Tian, L., Akerboom, J., Schreiter, E. R. & Looger, L. L. in Progress in Brain Research Vol. 196 (eds Knöpfel, T. & Boyden, E. S.) 79–94 (Elsevier, 2012).

  19. Nakai, J., Ohkura, M. & Imoto, K. A high signal-to-noise Ca2+ probe composed of a single green fluorescent protein. Nat. Biotechnol. 19, 137–141 (2001).

  20. Wardill, T. J. et al. A neuron-based screening platform for optimizing genetically-encoded calcium indicators. PLoS ONE 8, e77728 (2013).

  21. Kawashima, S. et al. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 36, D202–D205 (2008).

  22. Dong, X., Yu, Z., Cao, W., Shi, Y. & Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 14, 241–258 (2020).

  23. Zhou, Z.-H. in Machine Learning (ed. Zhou, Z.-H.) 181–210 (Springer, 2021).

  24. Yang, Y. et al. Improved calcium sensor GCaMP-X overcomes the calcium channel perturbations induced by the calmodulin in GCaMP. Nat. Commun. 9, 1504 (2018).

  25. Song, Z., Wang, Y., Zhang, F., Yao, F. & Sun, C. Calcium signaling pathways: key pathways in the regulation of obesity. Int. J. Mol. Sci. 20, 2768 (2019).

  26. Nausch, B., Heppner, T. J. & Nelson, M. T. Nerve-released acetylcholine contracts urinary bladder smooth muscle by inducing action potentials independently of IP3-mediated calcium release. Am. J. Physiol. Regul. Integr. Comp. Physiol. 299, R878–R888 (2010).

  27. Ding, J., Luo, A. F., Hu, L., Wang, D. & Shao, F. Structural basis of the ultrasensitive calcium indicator GCaMP6. Sci. China Life Sci. 57, 269–274 (2014).

  28. Souslova, E. A. et al. Single fluorescent protein-based Ca2+ sensors with increased dynamic range. BMC Biotechnol. 7, 37 (2007).

  29. Akerboom, J. et al. Crystal structures of the GCaMP calcium sensor reveal the mechanism of fluorescence signal change and aid rational design. J. Biol. Chem. 284, 6455–6464 (2009).

  30. Barnett, L. M., Hughes, T. E. & Drobizhev, M. Deciphering the molecular mechanism responsible for GCaMP6m’s Ca2+-dependent change in fluorescence. PLoS ONE 12, e0170934 (2017).

  31. Nasu, Y., Shen, Y., Kramer, L. & Campbell, R. E. Structure- and mechanism-guided design of single fluorescent protein-based biosensors. Nat. Chem. Biol. 17, 509–518 (2021).

  32. Fenno, L. E. et al. Comprehensive dual- and triple-feature intersectional single-vector delivery of diverse functional payloads to cells of behaving mammals. Neuron 107, 836–853.e11 (2020).

  33. Kim, C. K. et al. Molecular and circuit-dynamical identification of top-down neural mechanisms for restraint of reward seeking. Cell 170, 1013–1027.e14 (2017).

  34. Wolpert, D. H. & Macready, W. G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1, 67–82 (1997).

  35. Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

  36. Yao, Z. & Ruzzo, W. L. A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinf. 7, S11 (2006).

  37. Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).

  38. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).

  39. Pachitariu, M. & Stringer, C. Cellpose 2.0: how to train your own model. Nat. Methods 19, 1634–1641 (2022).

  40. Klima, J. C. et al. Incorporation of sensing modalities into de novo designed fluorescence-activating proteins. Nat. Commun. 12, 856 (2021).

  41. Klima, J. C. et al. Bacterial expression and protein purification of mini-fluorescence-activating proteins. Preprint at Protocol Exchange https://doi.org/10.21203/rs.3.pex-1077/v1 (2021).

  42. Catapano, L. A., Arnold, M. W., Perez, F. A. & Macklis, J. D. Specific neurotrophic factors support the survival of cortical projection neurons at distinct stages of development. J. Neurosci. 21, 8863–8872 (2001).

  43. Martin, D. L. Synthesis and release of neuroactive substances by glial cells. Glia 5, 81–94 (1992).

  44. Kim, C. K. et al. Simultaneous fast measurement of circuit dynamics at multiple sites across the mammalian brain. Nat. Methods 13, 325–328 (2016).

  45. Wait, S. J. et al. Machine learning ensemble directed engineering of genetically encoded fluorescent calcium indicators. figshare https://doi.org/10.6084/M9.FIGSHARE.23750682.V1 (2023).

  46. sarahwaity/ProteiML: v0.1.1. Zenodo https://doi.org/10.5281/ZENODO.8179256 (2023).

  47. Wait, S. J. A. B. ProteiML. CodeOcean https://doi.org/10.24433/CO.0624159.v1 (2024).

  48. BerndtLab—overview. GitHub https://github.com/BerndtLab (2024).

  49. Dragicevic, P. in Modern Statistical Methods for HCI (eds Robertson, J. & Kaptein, M.) 291–330 (Springer, 2016).

Acknowledgements

S.J.W. was supported by the National Science Foundation (DGE-2140004) and the Herbold Foundation. M.E. was supported by ‘La Caixa’ foundation and the Rafael del Pino Foundation. J.D.L. was supported by grant 1F31DA056121-01A1. S.L. and C.K.K. were supported by the Burroughs Wellcome Fund (CASI 1019469) and the Searle Scholars Program (SSP-2022-107). A.B. was supported by the Brain Research Foundation, University of Washington (UW) Royalty Research Fund, UW Innovation Pilot Award, National Institute of General Medical Sciences (R01 GM139850-01), National Institute of Mental Health (RF1MH130391), National Institute of Neurological Disorders and Stroke (U01NS128537), National Institute on Drug Abuse (R21DA051193, P30 DA048736 01 Pilot) and the McKnight Foundation’s Technological Innovations in Neuroscience Award. The research received additional support from the UW Center of Excellence in Neurobiology of Addiction, Pain, and Emotion Center and Institute for Stem Cell and Regenerative Medicine Shared Equipment. We thank H. R. Daniels (UC Davis) for assistance with histology for in vivo data. We thank M. B. Colby for extensive input and guidance throughout this process.

Author information

Contributions

S.J.W. conceived and designed the experiments, performed the experiments, analyzed the data, contributed materials/analysis tools and wrote the paper. S.L. conceived and designed the experiments, performed the experiments, analyzed the data and wrote the paper. M.E., M.Ra. and J.D.L. analyzed the data and contributed materials/analysis tools. S.A.C. analyzed the data. A.S. contributed materials/analysis tools. A.A. performed the experiments, analyzed the data and contributed materials/analysis tools. L.T. performed the experiments and analyzed the data. M.Re., F.M.-H. and D.B. contributed materials/analysis tools. A.B. and C.K.K. conceived and designed the experiments, analyzed the data, contributed materials/analysis tools and wrote the paper.

Corresponding author

Correspondence to Andre Berndt.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Excitation and Emission Spectra of eGCaMP Sensors.

A–D. Purified GCaMP6s (A), eGCaMP (B), eGCaMP+ (C) and eGCaMP2+ (D) protein diluted into buffer containing either 10 mM EGTA or 10 mM CaEGTA. Emission spectra were recorded with a fixed excitation wavelength of 450 nm, and excitation spectra were recorded with a fixed emission wavelength of 520 nm.
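Spectra recorded in EGTA (Ca2+-free) and CaEGTA (Ca2+-saturated) buffer are commonly summarized by the fractional fluorescence change at the emission peak, ∆F/F0 = (F_Ca - F_EGTA) / F_EGTA. The sketch below computes that value at 520 nm from two emission spectra sampled on the same wavelength grid, interpolating linearly when 520 nm falls between samples. The data layout, function names and toy intensities are assumptions for illustration, not values from Extended Data Fig. 1.

```python
from bisect import bisect_left
from typing import Sequence


def intensity_at(wavelengths: Sequence[float], intensities: Sequence[float], target_nm: float) -> float:
    """Linearly interpolate a spectrum (wavelengths sorted ascending) at target_nm."""
    if not wavelengths[0] <= target_nm <= wavelengths[-1]:
        raise ValueError("target wavelength outside the measured range")
    i = bisect_left(wavelengths, target_nm)
    if wavelengths[i] == target_nm:
        return intensities[i]
    x0, x1 = wavelengths[i - 1], wavelengths[i]
    y0, y1 = intensities[i - 1], intensities[i]
    return y0 + (y1 - y0) * (target_nm - x0) / (x1 - x0)


def calcium_delta_f_over_f0(
    wavelengths: Sequence[float],
    f_egta: Sequence[float],
    f_caegta: Sequence[float],
    target_nm: float = 520.0,
) -> float:
    """(F_Ca - F_EGTA) / F_EGTA at a single emission wavelength."""
    f_free = intensity_at(wavelengths, f_egta, target_nm)
    f_ca = intensity_at(wavelengths, f_caegta, target_nm)
    return (f_ca - f_free) / f_free


if __name__ == "__main__":
    # Hypothetical emission spectra (arbitrary units) sampled around 520 nm.
    wl = [510.0, 515.0, 525.0, 530.0]
    egta = [10.0, 12.0, 14.0, 12.0]
    caegta = [200.0, 240.0, 280.0, 250.0]
    print(calcium_delta_f_over_f0(wl, egta, caegta))  # 19.0
```

Linear interpolation is adequate when the spectrum is sampled every few nanometres; coarser grids would call for reading the nearest measured point or fitting the peak.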

Extended Data Fig. 2 In Vivo Performance of eGCaMP+ and eGCaMP2+ expressed in mouse mPFC.

A. Experimental timeline. Mice were injected with a Cre-dependent AAV-GCaMP variant in the mPFC and with retroAAV-Syn-Cre in the NAc. An optic fiber was implanted above the mPFC for light delivery and fluorescence recording. B. Representative fluorescence images of GCaMP expression in the mPFC and NAc (stained with anti-GFP-Alexa Fluor 488). Scale bar, 130 µm. C. Mean z-scored fluorescence changes in response to a foot shock (n = 4 total shock trials, collected from 2 mice for each GCaMP variant; line depicts mean, shading depicts SEM). D. Comparison of the mean shock response between the three GCaMP variants. Top: schematic of how the shock response was calculated (see Methods). Bottom: mean change in z-scored fluorescence in response to shock (n = 4 total shock trials, collected from 2 mice for each GCaMP variant). E. Comparison of the mean decay after shock between the three GCaMP variants. Top: schematic of how the decay after shock was calculated (see Methods). Bottom: mean change in z-scored fluorescence decay after shock (n = 4 total shock trials, collected from 2 mice for each GCaMP variant). In D and E, P values were calculated using a one-way ANOVA followed by Tukey’s multiple comparisons test: *P < 0.05. All data show mean ± SEM.
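The shock responses above are reported as changes in z-scored fluorescence, with the exact windows defined in the Methods. As a hedged sketch only, the code below z-scores a photometry trace against its own pre-shock baseline and defines the shock response as the mean z-score in a post-shock window minus the mean in an equal-length pre-shock window. The window lengths, the baseline definition and the function names `zscore_to_baseline` and `shock_response` are hypothetical choices, not the study's parameters.

```python
from statistics import mean, stdev
from typing import List, Sequence


def zscore_to_baseline(trace: Sequence[float], baseline_end: int) -> List[float]:
    """Z-score every sample using the mean and SD of trace[:baseline_end] (needs >= 2 samples)."""
    baseline = trace[:baseline_end]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sigma for x in trace]


def shock_response(trace: Sequence[float], shock_idx: int, window: int) -> float:
    """Mean z-score in `window` samples after the shock minus the mean in `window` samples before it."""
    if window > shock_idx or shock_idx + window > len(trace):
        raise ValueError("window extends beyond the recorded trace")
    z = zscore_to_baseline(trace, shock_idx)
    pre = z[shock_idx - window:shock_idx]
    post = z[shock_idx:shock_idx + window]
    return mean(post) - mean(pre)


if __name__ == "__main__":
    # Hypothetical trace: four baseline samples, shock at index 4, then a sustained rise.
    trace = [1.0, 2.0, 1.0, 2.0, 5.0, 5.0, 5.0, 5.0]
    print(round(shock_response(trace, shock_idx=4, window=4), 3))  # 6.062
```

Z-scoring against a per-trial baseline puts animals with different expression levels on a common scale, which is what makes responses comparable across variants.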

Supplementary information

Supplementary Information

Supplementary Figs. 1–8 and Tables 1–26.

Reporting Summary

Peer Review File

Supplementary Data 1

Crystal structure of GCaMP3.

Supplementary Data 2

Matrix to encode 554 amino acid properties used for model implementations.

Supplementary Data 3

Mutational library for engineering GCaMP6, linking GCaMP characteristics to single mutations.

Supplementary Data 4

Mutational library for engineering GCaMP7, linking GCaMP characteristics to single mutations.

Supplementary Data 5

PCR primers used for mutating GCaMP variants to generate eGCaMP variants.

Source data

Source Data Fig. 1

Data used for plots in Fig. 1a,b,e,f.

Source Data Fig. 2

Statistical data for mutation prediction.

Source Data Fig. 3

Fluorescence intensity and decay kinetics, raw data.

Source Data Fig. 4

Fluorescence intensity and decay kinetics, raw data.

Source Data Fig. 5

Fluorescence intensity and decay kinetics, raw data.

Source Data Extended Data Fig. 1

Emission and excitation spectra raw data plots.

Source Data Extended Data Fig. 2

Fiber photometry raw data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wait, S.J., Expòsit, M., Lin, S. et al. Machine learning-guided engineering of genetically encoded fluorescent calcium indicators. Nat Comput Sci 4, 224–236 (2024). https://doi.org/10.1038/s43588-024-00611-w

  • DOI: https://doi.org/10.1038/s43588-024-00611-w
