Exploration of natural red-shifted rhodopsins using a machine learning-based Bayesian experimental design

Inoue, Keiichi; Karasuyama, Masayuki; Nakamura, Ryoko; Konno, Masae; Yamada, Daichi; Mannen, Kentaro; Nagata, Takashi; Inatsu, Yu; Yawo, Hiromu; Yura, Kei; Béjà, Oded; Kandori, Hideki; Takeuchi, Ichiro

doi:10.1038/s42003-021-01878-9

Article
Open access
Published: 19 March 2021

Communications Biology volume 4, Article number: 362 (2021)


An Author Correction to this article was published on 30 April 2021

This article has been updated

Abstract

Microbial rhodopsins are photoreceptive membrane proteins, which are used as molecular tools in optogenetics. Here, a machine learning (ML)-based experimental design method is introduced for screening rhodopsins that are likely to be red-shifted from representative rhodopsins in the same subfamily. Among 3,022 ion-pumping rhodopsins that were suggested by a protein BLAST search in several protein databases, the ML-based method selected 65 candidate rhodopsins. The wavelengths of 39 of them could be determined experimentally by expressing the proteins in the Escherichia coli system, and 32 (82%, p = 7.025 × 10⁻⁵) actually showed red-shift gains. In addition, four showed red-shift gains >20 nm, and two were found to have desirable ion-transporting properties, indicating that they would be potentially useful in optogenetics. These findings suggest that data-driven ML-based approaches play effective roles in the experimental design of rhodopsin and other photobiological studies.

Introduction

Microbial rhodopsins are photoreceptive membrane proteins widely distributed in bacteria, archaea, unicellular eukaryotes, and giant viruses^1,2. They consist of seven transmembrane (TM) α helices, with a retinal chromophore bound to a conserved lysine residue in the seventh helix (Fig. 1a). The first microbial rhodopsin, bacteriorhodopsin (BR), was discovered in the plasma membrane of the halophilic archaeon Halobacterium salinarum (formerly called H. halobium)³. BR forms a purple-colored patch in the plasma membrane called purple membrane, which outwardly transports H⁺ using sunlight energy⁴. After the discovery of BR, various types of microbial rhodopsins were reported from diverse microorganisms, and recent progress in genome sequencing techniques has uncovered several thousand microbial rhodopsin genes^1,5,6,7. These microbial rhodopsins show various types of biological functions upon light absorption, leading to all-trans-to-13-cis retinal isomerization. Among them, ion transporters, including light-driven ion pumps and light-gated ion channels, are the most ubiquitous (Fig. 1b). Ion-transporting rhodopsins can transport several types of cations and anions, including H⁺, Na⁺, K⁺, halides (Cl⁻, Br⁻, I⁻), NO₃⁻, and SO₄²⁻^8,9,10. The molecular mechanisms of ion-transporting rhodopsins have been detailed in numerous biophysical, structural, and theoretical studies^1,2.

**Fig. 1: Structure and phylogenetic tree of microbial rhodopsins.**

In recent years, many ion-transporting rhodopsins have been used as molecular tools in optogenetics to control the activity of animal neurons optically in vivo by heterologous expression¹¹, and optogenetics has revealed various new insights regarding the neural network relevant to memory, movement, and emotional behavior^12,13,14,15. However, strong light scattering by biological tissues and the cellular toxicity of shorter-wavelength light make precise optical control difficult. To circumvent this difficulty, new molecular optogenetics tools based on red-shifted rhodopsins, which can be controlled by weakly scattered, low-toxicity longer-wavelength light, are urgently needed. Therefore, many approaches to obtain red-shifted rhodopsins have been reported, including gene screening, amino acid mutation based on biophysical and structural insights, and the introduction of retinal analogs^16,17,18. The insights obtained in these experimental studies, together with further theoretical and computational studies^19,20,21,22, revealed the basic physical principles regulating the absorption maximum wavelengths (λ_max) of rhodopsins (also called the spectral or color-tuning rule), in which the distortion of the retinal polyene chain induced by steric interactions with surrounding residues, the electrostatic interaction between the protonated retinal Schiff base and its counterion(s), and the polarizability of the retinal binding pocket play essential roles²³. Based on these physicochemical insights, the λ_max of several rhodopsins could be red-shifted by 20–40 nm without impairing the ion-transport function^17,24,25. These are successful examples of the knowledge-driven experimental approach. Recently, a new method using a chimeric rhodopsin vector and functional assay was reported to screen the λ_max and proton transport activities of several microbial rhodopsins that are present in specific environments²⁶. This method identified partial sequences of red-shifted yellow (560–570 nm)-absorbing proteorhodopsin (PR), the most abundant outward H⁺-pumping bacterial rhodopsin subfamily, from the marine environment. These works identified several red-shifted rhodopsins^15,16,18,27. In particular, the most successful optogenetic tools are red-shifted channel rhodopsins such as Chrimson^27,28 and RubyACR²⁹, which can induce and inhibit neural firing by absorbing 590- and 610-nm light, respectively. Rational amino acid mutation based on structural insight further red-shifted the λ_max of Chrimson to 608 nm²⁷. The development of next-generation sequencing technology is expected to identify a large number of new rhodopsin genes ever more rapidly, including proteins with even longer wavelength-shifted absorption. However, screening all of them by either experimental or theoretical methods would be very costly. Therefore, a less expensive and more efficient approach to screen red-shifted rhodopsins is needed, and the data-driven approach is expected to serve as a third class of approach for investigating the color-tuning rule of rhodopsins at low cost.

To estimate the λ_max of rhodopsins, we recently introduced a data-driven approach³⁰. In this previous study, we investigated the statistical relationship between the amino acid types at each position of the seven TM helices and the absorption wavelength of rhodopsins. We constructed a database containing 796 wild-type (WT) rhodopsins and their variants, the λ_max of which had been reported in earlier studies. Then, we evaluated the strength of the relationship with a data-splitting approach, i.e., the data set was divided into a training set and a test set; the former was used to construct the predictive model, and the latter was used to estimate the predictive ability. The results of this “proof-of-concept” study suggested that the λ_max of an unknown family of rhodopsins could be predicted with an average error of ±7.8 nm, which is comparable to the mean absolute error of λ_max estimated by the hybrid quantum mechanics/molecular mechanics (QM/MM)²¹ method. Considering the computational cost of both approaches, the data-driven approach was found to be much more efficient than the QM/MM approach, while the latter provides insights on the physical origin controlling λ_max.

Encouraged by this result, in this study, we introduced a machine-learning (ML)-based experimental design method that, with data-driven assistance, enables us to screen candidate rhodopsins likely to have red-shift gains more efficiently than random or knowledge-driven screening. To this end, and to explore new red-shifted rhodopsins, we constructed a new dataset of 3022 wild-type putative ion-pump rhodopsins, which were collected from public gene databases (NCBI non-redundant protein sequences and metagenomic proteins³¹, and the Tara Oceans microbiome and virome database³²) and for which λ_max has not yet been experimentally investigated. The goal of the present study was to identify rhodopsins with λ_max longer than the wavelengths of the representative rhodopsins in each subfamily of microbial rhodopsins for which the λ_max has already been reported (base wavelengths). Here, we call the degree of red-shift of the wavelength from the base wavelength the “red-shift gain”. We focus on rhodopsins with large red-shift gains because this would lead to the identification of amino acid types and residue positions that play important roles in red-shifting absorption wavelengths. Also, it is practically important in optogenetics applications to have a wide variety of ion-pumping rhodopsins from each subfamily to construct a new basis for rhodopsin toolboxes with red-shifted absorption and various types of ion species that can be transported. We constructed the ML-based experimental design method so that it could properly predict the expected red-shift gains, and applied this new method to 3022 putative ion-pumping rhodopsins derived from archaeal and bacterial origins that can be easily expressed in Escherichia coli (Fig. 1b).

We conducted experiments by introducing the synthesized rhodopsin genes into E. coli to measure the absorption wavelengths of 65 candidates for which the ML-based experimental design method predicted that the expected gains were >10 nm. Of these 65 selected candidates, 39 showed substantial coloring in E. coli cells; among these, 32 showed actual red-shift gains, 6 showed blue-shifts, and 1 showed no change, i.e., 82% (= 32/39, p = 7.025 × 10⁻⁵) of the expressed candidates showed actual red-shift gains. We then investigated the ion-transportation properties of the rhodopsins for which the red-shift gains were >20 nm, and found that some actually had desirable ion-transporting properties, suggesting that they (and their variants) could potentially be used as new optogenetics tools. Furthermore, the differences in the amino acid sequences of the newly examined rhodopsins and the representative ones in the same subfamily could be used for further investigation of the red-shifting mechanisms. This result suggests that it should be possible to find rhodopsins that have desired properties without conducting exhaustive biological experiments, and suggests that data-driven ML-based approaches should play effective roles in the experimental design of rhodopsin and other photobiological studies.

Results

Construction of an ML-based experimental design method for predicting expected red-shift gain

To screen rhodopsins that would have large red-shift gains, it is necessary to consider the uncertainty of prediction in the form of “predictive distributions”³³. By using predictive distributions, it is possible to consider appropriately the “exploration–exploitation trade-off” in screening processes^34,35, where exploration indicates an approach that prefers candidates with larger predictive variances, and exploitation indicates an approach that prefers candidates with longer predictive mean wavelengths (Fig. 2). Here, the term “exploration–exploitation” is a technical term used in the fields of active learning and experimental design, and “exploration” in the title of this paper is used in a broader sense and is not directly related to the former technical terminology. We employed a Bayesian modeling framework to compute the predictive distributions of candidate rhodopsin red-shift gains. We then consider an exploration–exploitation trade-off by selecting candidate rhodopsins based on a criterion called “expected red-shift gains”.

**Fig. 2: Illustrations of exploration–exploitation for screening rhodopsins with red-shift gain.**

To compute the expected red-shift gains of a wide variety of rhodopsins, we developed an ML-based experimental design method based on the statistical analysis in our previous study³⁰. Figure 3 shows a schematic illustration of the ML-based experimental design method. First, we added 88 WT microbial rhodopsins and their variants for which the λ_max had recently been reported in the literature or determined by our experiments, to a previously reported data set³⁰. In other words, the new training data set consisted of the amino acid sequences and λ_max of 884 WT microbial rhodopsins and their variants (Supplementary Data 1). Second, the new ML model used only N = 24 residues located around the retinal chromophore (Supplementary Fig. 1) because our previous study³⁰ indicated that amino acid residues at these 24 positions play significant roles in predicting absorption wavelengths (Fig. 3a). Third, M = 18 amino acid physicochemical features (Supplementary Data 2) were used as inputs in the ML model, as opposed to the amino acid types used in the previous statistical analysis. This enabled us to predict the absorption wavelengths of a wide range of target rhodopsins that contain, at certain positions, amino acid types unexplored in the training data. Therefore, an amino acid sequence is transformed into an M × N = 432-dimensional feature vector ${\boldsymbol{x}} \in {\Bbb R}^{MN}$ by concatenating x_i,j, the j-th feature of the i-th residue (Fig. 3b). We consider a linear prediction model $f\left( {\boldsymbol{x}} \right) = \mu + \mathop {\sum}\nolimits_{i = 1}^N {\mathop {\sum}\nolimits_{j = 1}^M {\beta _{i,j}x_{i,j}} }$, where β_i,j is the parameter for the j-th feature of the i-th residue, and μ is the intercept term.
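
The feature construction and linear model described above can be illustrated with a minimal sketch. The feature table below is hypothetical: it uses only two illustrative features for four amino acids rather than the M = 18 features of Supplementary Data 2, and the names `TOY_FEATURES`, `encode`, and `predict` are introduced here for illustration only.

```python
from typing import Dict, List, Sequence

# Hypothetical feature table (illustrative values only): each amino acid maps
# to M feature values. The study uses M = 18; two toy features are used here.
TOY_FEATURES: Dict[str, List[float]] = {
    "A": [1.8, 88.6],
    "L": [3.8, 166.7],
    "M": [1.9, 162.9],
    "Q": [-3.5, 143.8],
}


def encode(residues: Sequence[str], table: Dict[str, List[float]]) -> List[float]:
    """Concatenate the M features of each of the N residues into one vector x."""
    x: List[float] = []
    for aa in residues:        # i-th residue around the retinal
        x.extend(table[aa])    # its features x_{i,1}, ..., x_{i,M}
    return x


def predict(x: Sequence[float], beta: Sequence[float], mu: float) -> float:
    """Linear model f(x) = mu + sum_i sum_j beta_{i,j} * x_{i,j}."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length (M * N)")
    return mu + sum(b * v for b, v in zip(beta, x))


# Two residues with two features each give a 4-dimensional vector;
# 24 residues with 18 features each would give 432 dimensions.
x_example = encode(["L", "Q"], TOY_FEATURES)
```

Because the inputs are physicochemical features rather than one-hot amino acid types, a residue that never appears at a given position in the training data still maps to a usable feature vector.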

**Fig. 3: Overview of the ML-based exploration of natural red-shifted rhodopsins.**

Finally, to consider the exploration–exploitation trade-off appropriately in the screening process, we introduce a Bayesian modeling framework, which allows us to compute the predictive distributions of red-shift gains. Specifically, we employed Bayesian sparse modeling called BLASSO³⁶ (see the Methods section for details). This enables us to provide not only the mean, but also the variance of the predicted wavelengths. Unlike classical regression analysis, BLASSO regards the model parameters β_i,j and μ as random variables generated from underlying distributions, as illustrated in Fig. 3c. Therefore, the wavelength prediction f(x) is also represented as a distribution. The red-shift gain is defined as gain = max(f(x) − λ_base, 0), where λ_base is the wavelength of the representative rhodopsin in the same subfamily whose λ_max has been experimentally determined and reported in the literature (Supplementary Data 3). Note that the red-shift gain is positive if f(x) is greater than λ_base; otherwise, it takes the value of zero. Since f(x) is regarded as a random variable in BLASSO, the red-shift gain is also regarded as a random variable. Therefore, we employ the expected value of the red-shift gain, denoted by ${\Bbb E}[{\mathrm{gain}}]$, as the screening criterion, where ${\Bbb E}$ represents the expectation of a random variable. Illustrative examples of ${\Bbb E}[{\mathrm{gain}}]$ are shown in Fig. 3d. Unlike the simple expectation of the wavelength prediction ${\Bbb E}[f({\boldsymbol{x}})]$, ${\Bbb E}[{\mathrm{gain}}]$ depends on the variance of the predictive distribution (for example, ${\Bbb E}[{\mathrm{gain}}]$ of target #4 is larger than that of #1 in Fig. 2f, though ${\Bbb E}\left[ {f\left( {\boldsymbol{x}} \right)} \right] - \lambda _{{\mathrm{base}}}$ of #4 is smaller than that of #1 in Fig. 2e). 
This encourages the exploration of rhodopsin candidates having large uncertainty (for exploration), as opposed to only those having longer wavelengths with high confidence (for exploitation).
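
The dependence of the expected gain on the predictive variance can be made concrete with a small sketch. It assumes, purely for illustration, that the predictive distribution of f(x) is Gaussian, in which case E[max(f(x) − λ_base, 0)] has a standard closed form; the study itself works with posterior samples rather than this closed form, and the wavelengths below are made-up numbers.

```python
import math


def expected_gain_gaussian(mean: float, sd: float, base: float) -> float:
    """E[max(f - base, 0)] for f ~ Normal(mean, sd^2) (illustrative assumption)."""
    delta = mean - base
    if sd <= 0.0:
        # No uncertainty: the gain is simply the clipped difference.
        return max(delta, 0.0)
    z = delta / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return delta * cdf + sd * pdf


# Hypothetical candidates sharing a base wavelength of 540 nm.
confident = expected_gain_gaussian(mean=545.0, sd=1.0, base=540.0)   # exploitation-like
uncertain = expected_gain_gaussian(mean=543.0, sd=15.0, base=540.0)  # exploration-like
```

In this toy setting the second candidate has the smaller predicted mean but the larger expected gain, because a wide predictive distribution places more probability mass on large red-shifts while the clipping at zero removes the penalty for blue-shifts.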

Screening potential red-shifted microbial rhodopsins based on expected red-shift gains

The target data set to explore red-shifted microbial rhodopsins was constructed with putative microbial rhodopsin genes collected by a protein BLAST (blastp) search³⁷ of the NCBI non-redundant protein and metagenome databases³¹, as well as the Tara Oceans microbiome and virome databases³². As a result, we obtained a non-redundant data set of 5558 microbial rhodopsin genes (Fig. 1b). The sequences were aligned by ClustalW and categorized to subfamilies of microbial rhodopsins based on the phylogenic distances, as reported previously³⁸. Among these, 3022 rhodopsin genes, which did not have identical sequences in the training data and from bacterial and archaeal origins, were extracted because their λ_max can be easily measured by expressing in E. coli cells. We calculated the ${\Bbb E}[{\mathrm{gain}}]$ of these 3022 genes (Supplementary Data 4), and then selected 65 genes of putative light-driven ion-pump rhodopsins showing an ${\Bbb E}[{\mathrm{gain}}]$ >10 nm for further experimental evaluation, as ion-pump rhodopsins can be used as new optogenetics tools.
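
The screening step can be sketched as follows, assuming that posterior draws of (μ, β) are already available from a Bayesian sampler. The expected gain of each candidate is approximated by averaging the clipped gain over the draws, and candidates above a threshold (10 nm in the text) are retained. The data structures and the names `expected_gain` and `select_candidates` are hypothetical, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

Draw = Tuple[float, List[float]]  # one posterior draw: (mu, beta)


def expected_gain(x: Sequence[float], draws: Sequence[Draw], base: float) -> float:
    """Monte Carlo estimate of E[max(mu + beta.x - base, 0)] over posterior draws."""
    total = 0.0
    for mu, beta in draws:
        f = mu + sum(b * v for b, v in zip(beta, x))
        total += max(f - base, 0.0)
    return total / len(draws)


def select_candidates(
    candidates: Dict[str, Tuple[List[float], float]],  # name -> (features, base wavelength)
    draws: Sequence[Draw],
    threshold: float = 10.0,
) -> List[str]:
    """Return candidate names with expected gain above threshold, best first."""
    scores = {
        name: expected_gain(x, draws, base)
        for name, (x, base) in candidates.items()
    }
    kept = [name for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda name: -scores[name])


# Toy example with two posterior draws and one-dimensional features.
toy_draws: List[Draw] = [(500.0, [10.0]), (500.0, [30.0])]
toy_candidates = {"a": ([1.0], 505.0), "b": ([0.0], 505.0)}
selected = select_candidates(toy_candidates, toy_draws)
```

Each candidate is compared against the base wavelength of its own subfamily, which is why the base value is stored per candidate rather than passed as a single global constant.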

Experimental measurement of the absorption wavelengths of microbial rhodopsins showing high red-shift gains

We synthesized the selected 65 genes that showed an ${\Bbb E}[{\mathrm{gain}}]$ > 10 nm. These were then introduced into E. coli cells, and the proteins expressed in the presence of 10 μM all-trans retinal. As a result, 39 E. coli cells showed substantial coloring, indicating high expression of folded protein, and their λ_max were determined by observing ultraviolet (UV)-visible absorption changes upon bleaching of the expressed rhodopsins through a hydrolysis reaction of their retinal with hydroxylamine, as previously reported³⁰ (Fig. 4). The observed gains were compared with the ${\Bbb E}[{\mathrm{gain}}]$ shown in Table 1. A full list of unexpressed genes is shown in Supplementary Data 5. In total, 32 out of 39 genes showed a longer wavelength than their base wavelength (that is, positive red-shift gain; Fig. 5), suggesting that our ML-based model can significantly improve the efficiency of screening to explore new red-shifted microbial rhodopsins compared with random sampling (p = 7.025 × 10⁻⁵ by a binomial test assuming that the probability of red-shift gain for random choice is 50%).
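
The binomial test quoted above can be reproduced with a short calculation. The sketch assumes an exact two-sided test with a null red-shift probability of 50%, as stated in the text; the function name is introduced here for illustration.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value under a null success probability of 0.5.

    With p0 = 0.5 the null distribution is symmetric, so the two-sided
    p-value is twice the smaller tail probability (capped at 1).
    """
    upper = sum(comb(trials, k) for k in range(successes, trials + 1))
    lower = sum(comb(trials, k) for k in range(0, successes + 1))
    tail = min(upper, lower) / 2 ** trials
    return min(1.0, 2.0 * tail)


# 32 of the 39 expressed candidates showed a positive red-shift gain.
p_value = binomial_two_sided_p(32, 39)
```

Under these assumptions the result is approximately 7.025 × 10⁻⁵, in agreement with the reported value.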

**Fig. 4: λ_max of 39 microbial rhodopsins in solubilized *E. coli* membrane observed upon hydroxylamine bleach reaction.**

Table 1 Predicted and observed gains of 39 microbial rhodopsins expressed in E. coli.

**Fig. 5: Observed wavelengths and expected red-shift gains.**

Ion-transport function of red-shifted microbial rhodopsins

Overall, 4 of the 39 rhodopsins showed absorption red-shifted by ≥20 nm compared with the base wavelengths (Table 1): three were halorhodopsins (HRs) from bacterial species^10,39,40 (to distinguish them from classical HRs from archaeal species, these are hereafter referred to as bacterial-halorhodopsins [BacHRs]), and one was a PR⁴¹. Their ion-transport activities were then investigated by expressing them in E. coli cells and observing the pH change in an external solvent whose pH was initially set to 7 (Fig. 6a). Upon light illumination, BacHRs from Rubrivirga marina and Myxosarcina sp. GI1 showed alkalization of the external solvent, which was enhanced by addition of the protonophore CCCP, which increases the H⁺ permeability of the cell membrane. The light-dependent alkalization disappeared when the anion was exchanged from Cl⁻ to NO₃⁻, indicating that these were light-driven Cl⁻ pumps, similar to other rhodopsins in the same BacHR subfamily^10,39. By contrast, the BacHR from Cyanothece sp. PCC 7425 did not show any substantial transport. Although the absence of transport may be attributable to heterologous expression in E. coli, this protein may have molecular properties considerably different from those of other BacHRs. PR from a metagenome sequence (ECV93033.1) showed acidification of the external solvent that was abolished by the addition of CCCP and was independent of the ionic species in the solvent. Hence, this was a new outward H⁺ pump that is red-shifted compared with typical PRs, whose λ_max are present at ca. 520 nm⁴¹. Furthermore, these rhodopsins need to be functional in mammalian cells for optogenetic applications. To verify this, we carried out electrophysiological experiments to measure the photocurrent of the BacHR from Rubrivirga marina and the PR from a metagenome sequence (ECV93033.1) in mammalian cells (ND7/23; Fig. 6b). Both of them showed substantial photocurrent even in the mammalian cells. 
These light-driven ion-pumping rhodopsins with red-shifted λ_max have the potential to be applied as new optogenetics tools, and thus, warrant further study in the near future.

**Fig. 6: Light-driven ion-transport activities of microbial rhodopsins showing longer λ_max.**

Discussion

Microbial rhodopsins show a wide variety of λ_max by changing steric and electrostatic interactions between all-trans retinal chromophores and surrounding amino acid residues. An understanding of the color-tuning rule enables more efficient screening and the design of new red-shifted rhodopsins that have value as optogenetics tools, and our ML-based data-driven approach therefore provides a new basis to identify color-regulating factors without assumptions.

We previously demonstrated that an ML-based model based on ∼800 experimental results could predict the λ_max of microbial rhodopsins with an average error of ±7.8 nm. Encouraged by this result, in the present study, we constructed a new ML-based model to compute expected red-shift gains for a wide range of unknown families of microbial rhodopsins. As a result, 32 out of 39 microbial rhodopsins were found to have red-shifted absorption compared with the base wavelengths of each subfamily of microbial rhodopsins (Table 1), suggesting that our data-driven ML approach can screen red-shifted microbial rhodopsin genes more efficiently than random choice (p = 7.025 × 10⁻⁵).

By considering the exploration–exploitation trade-off, that is, to consider not only the expected value of the prediction, but also the uncertainty, it was possible to construct a red-shift protein screening process, as shown in Fig. 7. Figure 7a shows the relationships between the prediction uncertainty (as measured by the standard deviation) and the observed red-shift gains. It can be seen that rhodopsins with red-shift gain are found in areas of not only low (small standard deviation), but also high prediction uncertainty (large standard deviation). Figure 7b shows the two-dimensional projection of the d = 432 dimensional feature space by principal component analysis. It can be seen that red-shift gains (red) are found for target proteins not only close to training proteins (green), but also far from training proteins. Figure 8 shows that the observed wavelengths and red-shift gains tend to be smaller than the predicted ones. We conjecture that these differences between the observed and predicted wavelengths and red-shift gains are due to modeling errors, possibly caused by a lack of sufficient information (e.g., three-dimensional structures) and modeling flexibility (e.g., nonlinear effects); in other words, rhodopsins having high prediction values partly by modeling errors have a high chance of being selected. Therefore, it would be valuable to develop a statistical methodology to eliminate selection bias due to modeling errors.
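
The selection bias conjectured above can be illustrated with a purely synthetic simulation. The sketch assumes that each prediction equals the true gain plus independent Gaussian modeling error and that the candidates with the highest predictions are selected. All numbers are invented (only the pool size and selection size echo the text), and the function name is hypothetical.

```python
import random
from statistics import mean
from typing import Tuple


def simulate_selection_bias(
    n: int = 3022, k: int = 65, noise_sd: float = 10.0, seed: int = 0
) -> Tuple[float, float]:
    """Return (mean predicted gain, mean true gain) among the top-k predictions."""
    rng = random.Random(seed)
    # Synthetic "true" gains and noisy predictions of them.
    true_gain = [rng.gauss(0.0, 10.0) for _ in range(n)]
    predicted = [t + rng.gauss(0.0, noise_sd) for t in true_gain]
    # Select the k candidates with the highest predicted gain.
    top = sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k]
    return mean(predicted[i] for i in top), mean(true_gain[i] for i in top)


mean_predicted, mean_true = simulate_selection_bias()
```

In such a simulation the selected candidates have, on average, a lower true gain than predicted, even though the predictions are unbiased over the whole pool: candidates whose prediction is inflated by error are more likely to pass the selection.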

**Fig. 7: Diversity of the selected proteins.**

**Fig. 8: Comparisons of experimental observations and ML predictions.**

Four rhodopsins showed absorption red-shifted by ≥20 nm relative to the base wavelength, three of which showed light-driven ion-transport function. Interestingly, while one BacHR from Rubrivirga marina (accession No.: WP_095512583.1) showed a 40-nm longer λ_max (577 nm) than the base wavelength, another BacHR red-shifted by 11 nm (WP_095509924.1) was also identified from the same bacterium (Table 1). These BacHRs are highly similar to each other (55.2% identity and 70.6% similarity), and only four of the 24 amino acid residues around the retinal chromophore differ. Hence, R. marina evolved two BacHRs whose λ_max differ by 29 nm through a small number of amino acid replacements; the amino acid residue(s) responsible for this color-tuning should be investigated in the future.

The differences in amino acids in three of 24 retinal-surrounding residues are known to play a color-tuning role in natural rhodopsins without affecting their biological function. These correspond to positions 93, 186, and 215 in BR (BR Leu93, Pro186, and Ala215, respectively)¹⁷. Position 93 is known to be diversified in the PR family (the well-known position 105 in PRs). Green-light-absorbing PRs (GPRs) have leucine, as in BR, whereas glutamine is conserved in blue-light-absorbing PRs^5,26. This color-tuning effect by the difference between leucine and glutamine is known as the “L/Q-switch”⁴². Interestingly, while 29.8% of 3022 candidate genes have glutamine at this position, all 39 genes whose large red-shift gains were suggested by our ML-based model have amino acids other than glutamine, which suggests that our ML-based model avoided the genes having glutamine at position 93. In particular, 12 (37.5%) of 32 genes that actually showed red-shifted absorption compared with the base wavelengths had methionine at this position (Supplementary Data 6), which is substantially higher than the proportion of methionine-conserving genes in the 3022 candidates (16.1%). The red-shifting effect of the L-to-M mutation of this residue in GPRs reported previously⁴², together with the current result, implies that many rhodopsins have evolved methionine to absorb light with longer wavelengths. Position 215 in BR is also known to have a color-tuning role. The mutation from alanine to threonine or serine (A/TS switch) has a blue-shifting effect of 9–20 nm^17,43,44,45. Five of six genes that showed blue-shifted λ_max compared with the base wavelengths have threonine or serine at this position, suggesting that these types of genes should be avoided to explore red-shifted rhodopsins. By contrast, asparagine was conserved in more than half (58.4%) of the 3022 candidate genes, especially in those belonging to the PR subfamily. 
A substantial portion (37.5%) of the genes with red-shifted absorption compared with the base wavelengths also had asparagine at this position (Supplementary Data 6). The A-to-N mutation at this position had a smaller effect (4–7 nm)^30,44 than that of the A-to-S/T mutation; thus, the difference between alanine and asparagine is not so critical to explore red-shifted rhodopsins. Position 186 in BR is proline in most microbial rhodopsins (in 98.7% of the 3022 candidate genes), and the mutation to non-proline amino acids induces red-shift of absorption¹⁷. We identified sodium pump rhodopsin (NaR) from Parvularcula oceani, which also has a threonine at this position, and showed 10-nm longer absorption than the base wavelength. Although genes having non-proline amino acids are rare in nature, it would be beneficial to identify new red-shifted rhodopsins. These results indicate that ML-based modeling can provide insights for identifying new functional tuning rules for proteins based on specific amino acid residues.

The number of reported microbial rhodopsin genes is rapidly increasing due to the development of next-generation sequencing techniques and microbe culturing methods. New microbial rhodopsins with molecular characteristics suitable for optogenetics applications are expected to be included in upcoming genomic data. Data-driven approaches would be able to efficiently suggest promising rhodopsins which should be investigated preferentially. Although the absorption of the most red-shifted rhodopsin found in this study (BacHR from Rubrivirga marina, λ_max = 577 nm) is shorter than the peak activation wavelength of eNpHR3.0 (590 nm) which is extensively used in optogenetic studies⁴⁶, our ML-based model could be expected to reduce the costs associated with identifying red-shifted rhodopsins from upcoming genomic data. Especially, we expect that our ML-based model could be applied to ion channel and enzymatic rhodopsins, which were not a focus of this study because of their eukaryotic origins; however, their use in optogenetics research could help identify more useful optogenetics tools with red-shifted absorption in the future.

Methods

Experimental design

The objective of this study was to introduce and demonstrate the effectiveness of a data-driven experimental design method to screen candidates for rhodopsin proteins with desired properties from more than several thousand candidates identified in various microbial species. To this end, we constructed a training dataset for developing an ML model and a target dataset for screening targets (Construction of training and target data sets). A machine learning model was constructed using the training dataset (ML modeling), which was used to select the 65 candidates from 3022 in the target dataset. The protein expressions of selected candidates were induced (Protein expression), and the absorption spectra and λ_max of the selected rhodopsins were measured (Measurement of the absorption spectra and λ_max of rhodopsins by bleaching with hydroxylamine). Furthermore, we investigated the ion-transportation properties of the rhodopsins that showed large red-shift gains (Ion-transport assay of rhodopsins in E. coli cells). Statistical significance of the effectiveness of the data-driven experimental design method was assessed by a binomial test.

Construction of training and target data sets

In this study, we constructed a new training data set (Supplementary Data 1) by adding 88 genes for which the λ_max had recently been reported in the literature or determined by our experiments, to a previously reported data set³⁰. The sequences were aligned using ClustalW⁴⁷ and the results were manually checked to avoid improper gaps and/or shifts in the TM parts. The aligned sequences were then used for ML-based modeling.

To collect microbial rhodopsin genes for the target data set, BR⁴⁸ and heliorhodopsin 48C12⁴⁹ sequences were used as queries for searching homologous amino acid sequences in NCBI non-redundant protein sequences and metagenomic proteins³¹ and the Tara Oceans microbiome and virome database³². Protein BLAST (blastp)³⁷ was used for the homology search, with the threshold E-value set at <10 by default, and sequences with >180 amino acid residues were collected. All sequences were aligned using ClustalW⁴⁷. The highly diversified C-terminal 15-residue region behind the retinal binding Lys (BR Lys216) and the long loop of HeR between helices A and B were removed from the sequences to avoid unnecessary gaps in the alignment. The successful alignment of the TM helical regions, especially the 3rd and 7th helices, was checked manually. The phylogenetic tree was drawn using the neighbor-joining method⁵⁰, and the microbial rhodopsin subfamilies were categorized based on the phylogenetic distances, as reported previously³⁸. Based on the phylogenetic tree, 3022 putative ion-pumping rhodopsin genes from bacterial and archaeal origins were extracted, and their aligned sequences were used as the target data set for the prediction of λ_max. The original training and test sets are provided in Supplementary Data 1 and Table 1, respectively, and the entire transformed datasets with physicochemical features (see Supplementary Data 2) are provided in Supplementary Data 7.

ML modeling

Suppose that we have K pairs of an amino acid sequence and an absorption wavelength $\left\{ {\left( {{\boldsymbol{x}}^{\left( k \right)},\lambda _{{\mathrm{max}}}^{(k)}} \right)} \right\}_{k = 1}^K$, where ${\boldsymbol{x}}^{\left( k \right)} \in {\Bbb R}^{MN}$ is the feature vector of the k-th amino acid sequence and $\lambda _{{\mathrm{max}}}^{(k)} \in {\Bbb R}$ is the absorption wavelength of the k-th rhodopsin protein. The least-absolute shrinkage and selection operator (LASSO) is a standard regression model in which important regression coefficients can be automatically selected by the penalty on the absolute value of the coefficient, as follows:

$$\mathop {{\min }}\limits_{{\upmu },\,{\mathbf{\beta }}} \mathop {\sum }\limits_{k = 1}^K \left( {\lambda _{{\mathrm{max}}}^{\left( k \right)} - \mu - \mathop {\sum }\limits_{i = 1}^M \mathop {\sum }\limits_{j = 1}^N \beta _{i,j}x_{i,j}^{\left( k \right)}} \right)^2 +\,\gamma \mathop {\sum }\limits_{i = 1}^M \mathop {\sum }\limits_{j = 1}^N |\beta _{i,j}|,$$

where ${\boldsymbol{\beta}} \in {\Bbb R}^{MN}$ is the vector of the coefficients $\beta _{i,j}$ and γ > 0 is the regularization parameter. BLASSO is a Bayesian extension of LASSO for which the model is defined through the following random variables:

$$\lambda _{{\mathrm{max}}}^{\left( k \right)} \sim N\left( {\mu + {\boldsymbol{\beta}} ^{\it{ \top }}{\boldsymbol{x}}^{\left( k \right)},\sigma ^2} \right),{\boldsymbol{\beta}} \sim \pi \left( {{\boldsymbol{\beta}} |\sigma ^2} \right),$$

where N(m, s²) denotes a Gaussian distribution with mean m and variance s², and $\pi \left( {{\boldsymbol{\beta}} \,|\,\sigma ^2} \right) = \prod_{i = 1}^M \prod_{j = 1}^N \frac{\gamma }{2\sqrt{\sigma ^2}}e^{ - \gamma |\beta _{i,j}|/\sqrt{\sigma ^2}}$ is the conditional Laplace prior. In this model, the maximum of the conditional distribution of the parameter ${\boldsymbol{\beta}} \mid \left\{ {\left( {{\boldsymbol{x}}^{\left( k \right)},\lambda _{{\mathrm{max}}}^{\left( k \right)}} \right)} \right\}_{k = 1}^K,\gamma ,\sigma$ is equivalent to the LASSO⁵¹ estimator. For γ, a hyper-prior is set through the gamma distribution prior on γ², and the inverse gamma prior is assumed for σ². For the computational details, see the original paper³⁶. We used the “monomvn” package of R in our implementation. The prediction f(x) was sampled through the Gibbs sampler of β and μ, with the number of samples set to T = 10,000. For each candidate x, we approximately obtain ${\Bbb E}[{\mathrm{gain}}]$ by

$${\Bbb E}\left[ {{\mathrm{gain}}} \right] \approx \frac{1}{T}\mathop {\sum }\limits_{t = 1}^T \max \left( {\mu ^{\left( t \right)} + {\boldsymbol{\beta}} ^{\left( t \right){\it{ \top }}}{\boldsymbol{x}} - \lambda _{{\mathrm{base}}},0} \right),$$

where μ^(t) and β^(t) are the t-th sampled parameters. The parameters of the trained model are provided in Supplementary Data 8.
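The Monte Carlo approximation of the expected gain above can be illustrated with a minimal Python sketch. It assumes that posterior draws of μ and β are already available (for example, exported from a Gibbs sampler); the function name `expected_gain`, the toy dimensions, and the synthetic draws are hypothetical and are not taken from the authors' code.

```python
import random

def expected_gain(mu_samples, beta_samples, x, lambda_base):
    """Average of max(mu_t + beta_t . x - lambda_base, 0) over posterior draws."""
    total = 0.0
    for mu_t, beta_t in zip(mu_samples, beta_samples):
        # Predicted wavelength for this posterior draw
        prediction = mu_t + sum(b * xi for b, xi in zip(beta_t, x))
        # Only red-shifts relative to the base wavelength count as gain
        total += max(prediction - lambda_base, 0.0)
    return total / len(mu_samples)

# Toy illustration with synthetic draws (NOT real posterior samples)
rng = random.Random(0)
T, D = 10_000, 5                      # T draws, D features (hypothetical sizes)
mu_samples = [rng.gauss(540.0, 2.0) for _ in range(T)]
beta_samples = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
x = [1.0, 0.0, 0.5, 0.0, 2.0]         # hypothetical feature vector
gain = expected_gain(mu_samples, beta_samples, x, lambda_base=545.0)
print(f"Estimated E[gain]: {gain:.2f} nm")
```

Because the maximum is taken inside the average, a candidate whose mean prediction lies below the base wavelength can still receive a positive expected gain when its posterior is wide.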

Protein expression

The synthesized genes of microbial rhodopsins codon-optimized for E. coli (Genscript, NJ) were incorporated into the multi-cloning site in the pET21a(+) vector (Novagen, Merck KGaA, Germany). The plasmids carrying the microbial rhodopsin genes were transformed into the E. coli C43(DE3) strain (Lucigen, WI). Protein expression was induced by 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) in the presence of 10 μM all-trans retinal for 4 h.

Measurement of the absorption spectra and λ_max of rhodopsins by bleaching with hydroxylamine

E. coli cells expressing rhodopsins were washed three times with a solution containing 100 mM NaCl and 50 mM Na₂HPO₄ (pH 7). The washed cells were treated with 1 mM lysozyme for 1 h and then disrupted by sonication for 5 min (VP-300N; TAITEC, Japan). To solubilize the rhodopsins, 3% n-dodecyl-β-d-maltoside (DDM, Anatrace, OH) was added, and the samples were stirred overnight at 4 °C. The rhodopsins were bleached with 500 mM hydroxylamine and subjected to yellow light illumination (λ > 500 nm) from the output of a 1-kW tungsten−halogen projector lamp (Master HILUX-HR; Rikagaku) through colored glass (Y-52; AGC Techno Glass, Japan) and heat-absorbing filters (HAF-50S-15H; SIGMA KOKI, Japan). The absorption change upon bleaching was measured by a UV-visible spectrometer (V-730; JASCO, Japan).

Ion-transport assay of rhodopsins in E. coli cells

To assay the ion-transport activity in E. coli cells, the cells carrying expressed rhodopsin were washed three times and resuspended in unbuffered 100 mM NaCl. A cell suspension of 7.5 mL at OD₆₆₀ = 2 was placed in the dark in a glass cell at 20 °C and illuminated at λ > 500 nm from the output of a 1-kW tungsten–halogen projector lamp (Rikagaku, Japan) through a long-pass filter (Y-52; AGC Techno Glass, Japan) and a heat-absorbing filter (HAF-50S-50H; SIGMA KOKI, Japan). The light-induced pH changes were measured using a pH electrode (9618S-10D; HORIBA, Japan). All measurements were repeated under the same conditions after the addition of 10 μM CCCP.

Imaging and electrophysiological assays

For heterologous expression in mammalian cultured cells, the synthesized rhodopsin genes were inserted into the cloning site between the CMV promoter and eYFP in phKR2-3.0-EYFP⁵² using EcoRI and BamHI. All experiments were carried out using ND7/23 cells, a hybrid cell line derived from neonatal rat dorsal root ganglion neurons fused with mouse neuroblastoma, which were transfected with plasmids as previously described⁵³. EYFP fluorescence (543 nm) in the ND7/23 cells expressing the rhodopsins was imaged under a confocal laser scanning microscope (LSM510, Carl Zeiss, Oberkochen, Germany) at 512 × 512 pixels using a water-immersion objective (×63/0.95, Achroplan, Carl Zeiss) and an Ar laser (514 nm). Currents were recorded using an EPC-8 amplifier (HEKA Electronic, Lambrecht, Germany) under a whole-cell patch clamp configuration while 200 ms pulses of illumination at 549 ± 15 nm (>90% of the maximum) and 28 mW‧mm⁻² were given at 0.1 Hz using a SpectraX light engine (Lumencor Inc., Beaverton, OR). The internal pipette solution contained (in mM): 121.2 KOH, 90.9 glutamate, 5 Na₂EGTA, 49.2 HEPES, 2.53 MgCl₂, 2.5 MgATP, 0.0025 ATR (pH 7.4 adjusted with HCl). The extracellular Tyrode’s solution contained (in mM): 138 NaCl, 3 KCl, 2.5 CaCl₂, 1 MgCl₂, 10 HEPES, 4 NaOH, and 11 glucose (pH 7.4 adjusted with HCl).

Statistical analysis

We assessed the effectiveness of the data-driven experimental design method by comparing it with random selection in terms of the proportion of selected rhodopsins that showed red-shift gains. The statistical significance of the effectiveness was quantified by comparing the observed red-shift gain proportion of 0.82 (=32/39, p = 7.025 × 10⁻⁵) with the probability of observing red-shift gains from randomly selected rhodopsins, i.e., 0.50, based on a binomial test. Since we set the base wavelength of each subfamily to the λ_max of a rhodopsin that was studied in detail in previous work and is equal to or longer than the empirical median of the λ_max in each subfamily (Supplementary Fig. 2), it is reasonable to assume that the probability of observing red-shift gains from randomly selected rhodopsins is smaller than or equal to 0.50. For statistical analysis of the ML model building and the evaluation of its performance, see the ML modeling section above.
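The binomial test described above can be checked with a short Python sketch using only the standard library. The text does not state whether the test was one- or two-sided, so the two-sided form is an assumption here, and the function name `binomial_two_sided_p` is hypothetical. Under that assumption, 32 successes in 39 trials against a null probability of 0.50 give a p-value consistent with the reported 7.025 × 10⁻⁵.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p0=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely
    under the null hypothesis than the observed outcome.
    """
    pmf = [comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)

# 32 of 39 measured candidates showed red-shift gains; null probability 0.50
p_value = binomial_two_sided_p(32, 39, p0=0.5)
print(f"two-sided p = {p_value:.3e}")
```

Because the true probability under random selection is argued to be at most 0.50, testing against exactly 0.50 is the conservative choice: any smaller null probability would make the observed proportion even less likely.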

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data shown in main figures were deposited in Supplementary Data 9. Data supporting the findings are available from the corresponding authors upon reasonable request.

Code availability

The computational code of this manuscript is available at http://www-als.ics.nitech.ac.jp/~karasuyama/BLASSO-for-Rhodopsins/.

Change history

30 April 2021
A Correction to this paper has been published: https://doi.org/10.1038/s42003-021-02090-5

References

Ernst, O. P. et al. Microbial and animal rhodopsins: Structures, functions, and molecular mechanisms. Chem. Rev. 114, 126–163 (2014).
Govorunova, E. G., Sineshchekov, O. A., Li, H. & Spudich, J. L. Microbial rhodopsins: diversity, mechanisms, and optogenetic applications. Annu. Rev. Biochem. 86, 845–872 (2017).
Oesterhelt, D. & Stoeckenius, W. Rhodopsin-like protein from the purple membrane of Halobacterium halobium. Nat. New Biol. 233, 149–152 (1971).
Oesterhelt, D. & Stoeckenius, W. Functions of a new photoreceptor membrane. Proc. Natl Acad. Sci. USA 70, 2853–2857 (1973).
Man, D. et al. Diversification and spectral tuning in marine proteorhodopsins. EMBO J. 22, 1725–1731 (2003).
Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
Inoue, K., Kato, Y. & Kandori, H. Light-driven ion-translocating rhodopsins in marine bacteria. Trends Microbiol. 23, 91–98 (2014).
Inoue, K. et al. A light-driven sodium ion pump in marine bacteria. Nat. Commun. 4, 1678 (2013).
Nagel, G. et al. Channelrhodopsin-1: a light-gated proton channel in green algae. Science 296, 2395–2398 (2002).
Niho, A. et al. Demonstration of a light-driven SO₄²⁻ transporter and its spectroscopic characteristics. J. Am. Chem. Soc. 139, 4376–4389 (2017).
Deisseroth, K. Optogenetics: 10 years of microbial opsins in neuroscience. Nat. Neurosci. 18, 1213–1225 (2015).
Liu, X. et al. Optogenetic stimulation of a hippocampal engram activates fear memory recall. Nature 484, 381–385 (2012).
Ramirez, S. et al. Creating a false memory in the hippocampus. Science 341, 387–391 (2013).
Yizhar, O. et al. Neocortical excitation/inhibition balance in information processing and social dysfunction. Nature 477, 171–178 (2011).
Marshel, J. H. et al. Cortical layer-specific critical dynamics triggering perception. Science 365, eaaw5202 (2019).
Schneider, F., Grimm, C. & Hegemann, P. Biophysics of channelrhodopsin. Annu. Rev. Biophys. 44, 167–186 (2015).
Inoue, K. et al. Red-shifting mutation of light-driven sodium-pump rhodopsin. Nat. Commun. 10, 1993 (2019).
Ganapathy, S. et al. Retinal-based proton pumping in the near infrared. J. Am. Chem. Soc. 139, 2338–2344 (2017).
Hayashi, S. et al. Structural determinants of spectral tuning in retinal proteins-bacteriorhodopsin vs sensory rhodopsin II. J. Phys. Chem. B 105, 10124–10131 (2001).
Fujimoto, K., Hayashi, S., Hasegawa, J. Y. & Nakatsuji, H. Theoretical studies on the color-tuning mechanism in retinal proteins. J. Chem. Theory Comput. 3, 605–618 (2007).
Pedraza-González, L., De Vico, L., Marı, N. M., Fanelli, F. & Olivucci, M. a-ARM: automatic rhodopsin modeling with chromophore cavity generation, ionization state selection, and external counterion placement. J. Chem. Theory Comput. 15, 3134–3152 (2019).
Tsujimura, M. et al. Mechanism of absorption wavelength shifts in anion channelrhodopsin-1 mutants. Biochim. Biophys. Acta Bioenerg. 1862, 148349 (2021).
Katayama, K. & Sekharan, S. S. Y. Optogenetics (eds Yawo, H., Kandori, H. & Koizumi, A.) Ch. 7, 89–107 (Springer, 2015).
Engqvist, M. K. et al. Directed evolution of Gloeobacter violaceus rhodopsin spectral properties. J. Mol. Biol. 427, 205–220 (2015).
Kojima, K. et al. Green-sensitive, long-lived, step-functional anion channelrhodopsin-2 variant as a high-potential neural silencing tool. J. Phys. Chem. Lett. 11, 6214–6218 (2020).
Pushkarev, A. et al. The use of a chimeric rhodopsin vector for the detection of new proteorhodopsins based on color. Front. Microbiol. 9, 439 (2018).
Oda, K. et al. Crystal structure of the red light-activated channelrhodopsin Chrimson. Nat. Commun. 9, 3949 (2018).
Klapoetke, N. C. et al. Independent optical excitation of distinct neural populations. Nat. Methods 11, 338–346 (2014).
Govorunova, E. G. et al. RubyACRs, nonalgal anion channelrhodopsins with highly red-shifted absorption. Proc. Natl Acad. Sci. USA 117, 22833–22840 (2020).
Karasuyama, M., Inoue, K., Nakamura, R., Kandori, H. & Takeuchi, I. Understanding colour tuning rules and predicting absorption wavelengths of microbial rhodopsins by data-driven machine-learning approach. Sci. Rep. 8, 15580 (2018).
Brown, G. R. et al. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 43, D36–D42 (2015).
Sunagawa, S. et al. Ocean plankton. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
Bishop, C. M. Pattern Recognition And Machine Learning (Springer, 2006).
Snoek, J., Larochelle, H. & Adams, R. P. Advances in Neural Information Processing Systems 25 (NIPS 2012). (eds. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 2951–2959 (Curran Associates, Inc., 2012).
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & Freitas, N. D. in Proceedings of the IEEE. 148–175 (IEEE, 2016).
Park, T. & Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 103, 681–686 (2008).
Johnson, M. et al. NCBI BLAST: a better web interface. Nucleic Acids Res. 36, W5–W9 (2008).
Yamauchi, Y. et al. Engineered functional recovery of microbial rhodopsin without retinal-binding lysine. Photochem. Photobiol. 95, 1116–1121 (2019).
Hasemi, T., Kikukawa, T., Kamo, N. & Demura, M. Characterization of a cyanobacterial chloride-pumping rhodopsin and its conversion into a proton pump. J. Biol. Chem. 291, 355–362 (2016).
Harris, A. et al. Molecular details of the unique mechanism of chloride transport by a cyanobacterial rhodopsin. Phys. Chem. Chem. Phys. 20, 3184–3199 (2018).
Béjà, O. et al. Bacterial rhodopsin: Evidence for a new type of phototrophy in the sea. Science 289, 1902–1906 (2000).
Ozaki, Y., Kawashima, T., Abe-Yoshizumi, R. & Kandori, H. A color-determining amino acid residue of proteorhodopsin. Biochemistry 53, 6032–6040 (2014).
Shimono, K., Ikeura, Y., Sudo, Y., Iwamoto, M. & Kamo, N. Environment around the chromophore in pharaonis phoborhodopsin: Mutation analysis of the retinal binding site. Biochim. Biophys. Acta 1515, 92–100 (2001).
Sudo, Y. et al. A blue-shifted light-driven proton pump for neural silencing. J. Biol. Chem. 288, 20624–20632 (2013).
Inoue, K. et al. Converting a light-driven proton pump into a light-gated proton channel. J. Am. Chem. Soc. 137, 3291–3299 (2015).
Fenno, L., Yizhar, O. & Deisseroth, K. The development and application of optogenetics. Annu. Rev. Neurosci. 34, 389–412 (2011).
Thompson, J. D., Higgins, D. G. & Gibson, T. J. Clustal-W - improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
Khorana, H. G. et al. Amino acid sequence of bacteriorhodopsin. Proc. Natl Acad. Sci. USA 76, 5046–5050 (1979).
Pushkarev, A. et al. A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature 558, 595–599 (2018).
Saitou, N. & Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
Kato, H. E. et al. Structural basis for Na⁺ transport mechanism by a light-driven Na⁺ pump. Nature 521, 48–53 (2015).
Nagasaka, Y. et al. Gate-keeper of ion transport-a highly conserved helix-3 tryptophan in a channelrhodopsin chimera, C1C2/ChRWR. Biophys. Physicobiol. 17, 59–70 (2020).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by Grants-in-Aid from the Japan Society for the Promotion of Science (JSPS) for Scientific Research (KAKENHI grant Nos. 17H03007 to K.I., 17H04694 and 16H06538 to M.Karasuyama, 19H04959 to H.K., and 16H06538, 17H00758, and 20H00601 to I.T.), the Japan Science and Technology Agency (JST), PRESTO, Japan (grant Nos. JPMJPR15P2 to K.I. and JPMJPR15N2 to M.Karasuyama), and CREST, Japan (grant No. JPMJCR1502) to I.T.; K.I., H.K., and I.T. received support from RIKEN AIP; O.B. received support from the Louis and Lyra Richmond Memorial Chair in Life Sciences.

Author information

These authors contributed equally: Keiichi Inoue, Masayuki Karasuyama.

Authors and Affiliations

The Institute for Solid State Physics, The University of Tokyo, Kashiwa, Japan
Keiichi Inoue, Kentaro Mannen, Takashi Nagata & Hiromu Yawo
RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Keiichi Inoue, Hideki Kandori & Ichiro Takeuchi
Department of Life Science and Applied Chemistry, Nagoya Institute of Technology, Nagoya, Japan
Keiichi Inoue, Ryoko Nakamura, Masae Konno, Daichi Yamada & Hideki Kandori
OptoBioTechnology Research Center, Nagoya Institute of Technology, Nagoya, Japan
Keiichi Inoue, Hideki Kandori & Ichiro Takeuchi
PRESTO, Japan Science and Technology Agency, Kawaguchi, Japan
Keiichi Inoue, Masayuki Karasuyama & Takashi Nagata
Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
Masayuki Karasuyama, Yu Inatsu & Ichiro Takeuchi
Graduate School of Humanities and Sciences, Ochanomizu University, Tokyo, Japan
Kei Yura
Center for Interdisciplinary AI and Data Science, Ochanomizu University, Tokyo, Japan
Kei Yura
School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
Kei Yura
Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
Oded Béjà

Contributions

K.I., M.Karasuyama, H.K., and I.T. contributed to the study design. K.I., D.Y., K.Y., and O.B. conducted the phylogenetic analysis of rhodopsins and the construction of training data. M.Karasuyama, Y.I., and I.T. constructed the ML model and calculated ${\Bbb E}[{\mathrm{gain}}]$. K.I., R.N., K.M., and T.N. constructed the DNA plasmids of rhodopsin genes and introduced them into E. coli and mammalian cells. R.N. and K.M. measured λ_max of rhodopsins by bleaching proteins with hydroxylamine. M.Konno conducted the pump activity assay of rhodopsins in E. coli cells. H.Y. conducted the electrophysiological measurement of rhodopsins in mammalian cells. K.I., M.Karasuyama, H.K., and I.T. wrote the paper. All authors discussed and commented on the manuscript.

Corresponding authors

Correspondence to Keiichi Inoue or Ichiro Takeuchi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Inoue, K., Karasuyama, M., Nakamura, R. et al. Exploration of natural red-shifted rhodopsins using a machine learning-based Bayesian experimental design. Commun Biol 4, 362 (2021). https://doi.org/10.1038/s42003-021-01878-9

Received: 20 July 2020
Accepted: 19 February 2021
Published: 19 March 2021
DOI: https://doi.org/10.1038/s42003-021-01878-9
