Abstract
SAR11 bacteria are the most abundant microorganisms in the surface ocean1 and have global biogeochemical importance2,3,4. To thrive in their competitive oligotrophic environment, these bacteria rely heavily on solute-binding proteins that facilitate uptake of specific substrates via membrane transporters5,6. The functions and properties of these transport proteins are key factors in the assimilation of dissolved organic matter and biogeochemical cycling of nutrients in the ocean, but they have remained largely inaccessible to experimental investigation. Here we performed genome-wide experimental characterization of all solute-binding proteins in a prototypical SAR11 bacterium, revealing specific functions and general trends in their properties that contribute to the success of SAR11 bacteria in oligotrophic environments. We found that the solute-binding proteins of SAR11 bacteria have extremely high binding affinity (dissociation constants as low as 20 pM) and high binding specificity, revealing molecular mechanisms of oligotrophic adaptation. Our functional data have uncovered new carbon sources for SAR11 bacteria and enable accurate biogeographical analysis of SAR11 substrate uptake capabilities throughout the ocean. This study provides a comprehensive view of the substrate uptake capabilities of ubiquitous marine bacteria, providing a necessary foundation for understanding their contribution to assimilation of dissolved organic matter in marine ecosystems.
Main
The sunlit surface ocean is dominated by heterotrophic bacterioplankton, particularly those belonging to the SAR11 clade of Alphaproteobacteria (Pelagibacterales)2. SAR11 bacteria are globally distributed and abundant, constituting 20–45% of prokaryotic cells and around 18% of the biomass in the surface ocean and having an estimated global population size of 2.4 × 10²⁸ cells1,7. Like other bacteria adapted to oligotrophic (nutrient-poor) environments, they exhibit a small size (cell volume of 0.02–0.06 µm³)8, extremely streamlined genome (1.2–1.4 Mb), and limited metabolic versatility2,9,10. SAR11 bacteria rely largely on the uptake of dissolved organic matter (DOM) to meet their requirements for carbon, nitrogen, sulfur and phosphorus, and are highly active consumers of low molecular mass DOM, accounting for 30–60% of assimilation of amino acids, taurine, glucose and dimethylsulfoniopropionate (DMSP) in the surface ocean3,11,12,13,14,15. Owing to their high abundance, SAR11 bacteria have major biogeochemical importance; for example, they produce climate-active gases such as methane3 and dimethylsulfide4, and divert carbon from the biological carbon pump through respiration of dissolved organic carbon16 (DOC). Thus, understanding the physiology and metabolic capabilities of SAR11 bacteria is critical to our understanding of marine ecosystems.
To compete for nutrients in the oligotrophic ocean environment, SAR11 bacteria rely heavily on solute-binding proteins (SBPs) to facilitate uptake of specific growth substrates. SBPs are associated with three families of membrane transport systems: ATP-binding cassette (ABC) transporters, which represent the most abundant high-affinity substrate uptake systems in bacteria, as well as tripartite ATP-independent periplasmic (TRAP) transporters and tripartite tricarboxylate transporters (TTTs)17,18. In Gram-negative bacteria, SBPs bind their substrates with high affinity in the periplasm (with dissociation constant (Kd) values typically in the nanomolar to micromolar range) and deliver them to a transmembrane protein complex that facilitates translocation of the substrate across the inner membrane against a concentration gradient, which is driven by coupling to ATP hydrolysis (ABC transporters) or an electrochemical gradient (TRAP and TTT transporters). Consistent with the physiological importance of SBP-dependent transporters for high-affinity substrate uptake, SAR11 bacteria devote a large proportion of their streamlined proteome to SBPs5,6; for example, SBPs represented around 67% of SAR11-derived spectra in metaproteomic analysis of environmental samples from the Sargasso Sea5.
The high abundance1,7 and substrate uptake activity3,11,12,13,14,15 of SAR11 bacteria in the surface ocean, combined with the abundance of SBP-dependent transporters in these bacteria5,6, suggests that a small number of transport proteins in SAR11 bacteria make a substantial contribution to global assimilation of key components of low molecular mass DOM in the surface ocean. However, the properties of these transporters and their specific functions (that is, the transported metabolites) are mostly unknown, which limits our knowledge of the full range of DOM that can be assimilated by SAR11 bacteria, nutrient exchange within the marine microbial community, and the molecular mechanisms for high-affinity substrate uptake. Although homology-based predictions are available, these predictions of protein function have limited accuracy, especially for functionally diverse protein superfamilies such as the ABC and TRAP transporters19,20,21. Transport proteins can also be characterized experimentally through radioassays of substrate uptake in cultured cells, and this approach has been used to characterize the broad-specificity osmolyte transporter of the SAR11 bacterium ‘Candidatus (Ca.) Pelagibacter ubique’22. However, the difficulty of cultivating slow-growing and fastidious SAR11 bacteria makes this a challenging approach for high-throughput characterization of SBP-dependent transporters. Furthermore, owing to the genetic intractability of SAR11 bacteria, the observed transport activity cannot be linked to specific transporter genes, limiting integration of the resulting physiological data with existing multi-omics datasets to uncover the broader geochemical and ecological significance of transport activity. 
Given these limitations of in vivo approaches, we hypothesized that an in vitro approach, based on heterologous expression, purification and biochemical characterization of SBPs, would be an effective alternative strategy to elucidate the functions and properties of SBP-dependent transporters in SAR11 bacteria. This approach is supported by the fact that the specificity and affinity of substrate uptake by SBP-dependent transporters is mainly determined by the binding specificity and affinity of the corresponding SBPs23, and has been proven to be a valuable method for discovery of new metabolic pathways24,25.
Here we used this approach to systematically interrogate the function of all SBP-dependent transporters in the genome of the prototypical SAR11 bacterium Ca. P. ubique strain HTCC1062. Using high-throughput screening together with rigorous structural and biophysical characterization, we identified the function of the majority of these transporters. Revision of homology-based functional predictions enabled us to accurately interpret patterns of SAR11 transporter abundance in global ocean metagenome and metatranscriptome datasets and identify new transport capabilities and potential carbon sources for SAR11 bacteria. In particular, we identified a high-affinity, broad-specificity transporter for C4 and C5 dicarboxylates that is widely found among SAR11 ecotypes and abundantly distributed in metagenomic and metatranscriptomic datasets, implicating these dicarboxylates as major physiologically relevant carbon sources. Finally, we show how the identification of systematic trends in SBP properties, including their extremely high binding affinity, moderately high binding specificity and limited functional redundancy, provides insight into the evolutionary success of SAR11 bacteria in the oligotrophic ocean environment.
Functional profiling of SAR11 SBPs
We identified 18 SBPs through genomic analysis of Ca. P. ubique strain HTCC1062 (Methods, Supplementary Table 1); these SBPs are found widely across SAR11 bacteria, and conversely, most of the SBPs that are abundant across SAR11 bacteria are represented in this strain (Extended Data Fig. 1). Expression of most of these SBPs in cultured and/or environmental SAR11 cells has been previously demonstrated by proteomic analysis5,6 (Supplementary Table 2). Fourteen of the SBPs yielded soluble protein upon heterologous expression in Escherichia coli strains BL21(DE3) or SHuffle T7 and were successfully purified. Close homologues of two of the remaining SBPs (denoted SAR11_0271* and SAR11_1346*) from a different SAR11 strain (‘Ca. Pelagibacter’ sp. HIMB1321) could also be expressed and purified, whereas the remaining two SBPs, SAR11_0266 and SAR11_1290, could not be expressed in soluble form under any tested condition, nor refolded in vitro from insoluble material (Methods). Two of the SBPs, SAR11_1179 and SAR11_1238, were predicted to represent proteins that are found widely in bacteria and bind inorganic solutes with high specificity: phosphate and iron(III), respectively. Thus, the functional predictions for these two proteins were directly tested and confirmed by differential scanning fluorimetry (DSF) and isothermal titration calorimetry (ITC) (SAR11_1179; Extended Data Fig. 2a–c) or UV–vis spectroscopy (SAR11_1238; Extended Data Fig. 2d–i) rather than high-throughput screening.
In the remaining cases, the tentative function of each SBP was first identified by high-throughput screening of metabolite libraries by DSF. First, the target protein was screened by DSF against a commercially available metabolite library, representing a set of around 330 unique metabolites, including many common carbon, nitrogen, phosphorus and sulfur sources (full list in Supplementary Table 3). This library was supplemented with a manually curated set of around 40 metabolites that are known to be important for Ca. P. ubique and other marine bacteria26,27,28,29 (for example, osmolytes, sulfonates, and vitamin derivatives) or that were considered to be potential ligands on the basis of the computational annotations of SBP function (for example, opines). Metabolites that resulted in an increase in melting temperature (ΔTM) of the protein of at least 2 °C by DSF were considered to be potential ligands (Supplementary Table 4). Second, a representative subset of the resulting hits was selected, and binding of this subset of ligands to the target protein was confirmed and rank-ordered by repeating DSF with each ligand at a fixed concentration (10 mM) (Fig. 2). Finally, to provide further evidence that the observed increases in TM were a result of specific, high-affinity protein–ligand interactions rather than non-specific protein stabilization, the DSF experiments were repeated with a range of ligand concentrations (Supplementary Fig. 1). Using this workflow, tentative functions were identified for 15 SBPs—that is, all proteins that could be expressed and purified, except SAR11_1068. We showed previously that SAR11_1068 does not have the annotated function (cyclohexadienyl dehydratase activity) and reported extensive but ultimately unsuccessful efforts to identify its function30. The protein was subjected to further high-throughput screening in this work, but no potential ligands were identified.
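The hit-selection rule in this workflow (ΔTM of at least 2 °C, followed by rank-ordering) can be illustrated with a minimal Python sketch. The ligand names and melting temperatures below are invented placeholders, and the `DsfResult` class and `select_hits` function are hypothetical helpers, not part of the published analysis; only the 2 °C threshold comes from the text.

```python
from dataclasses import dataclass

HIT_THRESHOLD_C = 2.0  # ΔTm cutoff stated in the text (at least 2 °C)


@dataclass
class DsfResult:
    ligand: str
    tm_apo: float     # melting temperature without ligand (°C)
    tm_ligand: float  # melting temperature with ligand (°C)

    @property
    def delta_tm(self) -> float:
        # Thermal shift caused by the ligand
        return self.tm_ligand - self.tm_apo


def select_hits(results, threshold=HIT_THRESHOLD_C):
    """Keep results with ΔTm >= threshold, ranked from largest to smallest ΔTm."""
    hits = [r for r in results if r.delta_tm >= threshold]
    return sorted(hits, key=lambda r: r.delta_tm, reverse=True)


# Invented example values, for illustration only
example = [
    DsfResult("ligand_A", 50.0, 58.5),
    DsfResult("ligand_B", 50.0, 51.0),
    DsfResult("ligand_C", 50.0, 52.0),
]
ranked = select_hits(example)
for r in ranked:
    print(f"{r.ligand}: dTm = {r.delta_tm:.1f} C")
```

A thermal shift alone does not establish specific binding, which is why the workflow follows this filter with concentration-dependent DSF and ITC.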
Next, the function of each SBP was confirmed by ITC, enabling accurate quantification of binding affinity, which is important to establish the physiological relevance of the observed protein–ligand interactions. We typically performed titrations for 2 to 5 ligands for each protein, for a total of 32 protein–ligand interactions, aiming to select ligands with a range of ΔTM values to enable estimation of binding affinity for a broader range of ligands, on the basis of the assumption that protein–ligand interactions that yield similar ΔTM values in DSF experiments for a given protein are similar in binding affinity. Using ITC, most of the protein–ligand interactions identified by DSF could be verified (Fig. 2, Supplementary Fig. 2 and Supplementary Table 5), and at least one high-affinity ligand (with Kd < 500 nM) could be identified for each SBP, except SAR11_0271* and SAR11_0797 (Supplementary Note 1). The interaction between SAR11_1302 and TMAO was confirmed by ITC in another report while this work was in progress31. Thus, in total, based on the DSF and ITC data, 13 out of the 18 SAR11 SBPs of Ca. P. ubique HTCC1062 could be confidently assigned a binding function (Fig. 1 and Supplementary Table 6). Further evidence in support of the functional assignment for six SBPs could be obtained on the basis of a metabolome screening approach24 using X-ray crystallography and/or gas chromatography–mass spectrometry (GC–MS) (Extended Data Fig. 3 and Supplementary Note 2). In addition, the physiological relevance of the proposed binding functions is highlighted by the identification of SBPs that bind known substrates of Ca. P. ubique HTCC1062 such as amino acids, d-glucose, DMSP and taurine with high affinity3,11,12,13,14 and the fact that the measured concentrations of these substrates in the surface ocean frequently exceed the measured Kd values of the corresponding SBPs (Supplementary Note 3 and Supplementary Table 7).
The SBPs of Ca. P. ubique HTCC1062 showed remarkably high binding affinities, with multiple SBPs having Kd values as low as 20–30 pM (Fig. 3a,b). Seven out of the thirteen SBPs had Kd values of <5 nM, which are below the quantification limit of direct ITC experiments; thus, to obtain accurate Kd values for these interactions, we also performed competitive ITC binding experiments (Supplementary Table 5). In the case of SAR11_1210, titration with l-arginine in the presence of d-octopine as a competing ligand indicated a Kd value between 10 pM and 100 pM for l-arginine, but variable results were obtained with different concentrations of d-octopine (Supplementary Data 1). Thus, to confirm the high affinity of this interaction, we also performed a protein–protein competition experiment in which SAR11_1210 was mixed with a previously characterized arginine-binding protein, ArgT from Salmonella enterica (Kd = 15 nM), and then titrated with l-arginine. Fitting the resulting data to a two-sets-of-sites binding model yielded a Kd of 32 pM for the interaction between SAR11_1210 and l-arginine (Fig. 3c). We also solved the crystal structure of SAR11_1210 complexed with l-arginine, which showed an unusual binding mode involving a direct interaction between the ligand and the flexible hinge region linking the two α/β domains of the SBP, suggesting a possible structural basis for the high binding affinity (Fig. 3e, Extended Data Fig. 4 and Supplementary Note 4). Finally, in the case of SAR11_0769, titration with d-glucose reproducibly yielded a biphasic binding isotherm (Fig. 3d), which most probably reflects differential binding of the α and β anomers of d-glucose, as supported by a crystal structure of SAR11_0769 complexed with β-d-glucose (Fig. 3f, Extended Data Fig. 5 and Supplementary Note 5). Fitting the ITC data to a competitive binding model enabled estimation of the upper limit of Kd (lower limit of affinity) for the high-affinity anomer as approximately 27 pM. 
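The logic of a competitive (displacement) ITC experiment can be sketched with the standard relation for competitive binding, Kd,app = Kd × (1 + [C]/Kd,C), where [C] is the free competitor concentration and Kd,C is the competitor's dissociation constant. The sketch below is a simplified illustration and not the fitting model used in this work: it assumes the competitor is in large excess so that free and total competitor concentrations are approximately equal. The function name and all numerical inputs are hypothetical.

```python
import math


def true_kd_from_displacement(kd_apparent, competitor_conc, kd_competitor):
    """Recover the Kd of a tight-binding ligand from its apparent Kd measured
    in the presence of a weaker competing ligand.

    Uses Kd_app = Kd * (1 + [C] / Kd_C), which assumes simple competitive
    binding and free competitor concentration ~ total competitor concentration.
    All concentrations are in mol/L.
    """
    return kd_apparent / (1.0 + competitor_conc / kd_competitor)


# Invented inputs: apparent Kd of 5 nM, measured with 100 µM of a competitor
# whose own Kd is 1 µM. The competitor weakens apparent binding 101-fold,
# bringing a picomolar interaction into the range that ITC can quantify.
kd = true_kd_from_displacement(kd_apparent=5e-9,
                               competitor_conc=1e-4,
                               kd_competitor=1e-6)
print(f"Estimated Kd: {kd * 1e12:.1f} pM")
```

Because the recovered Kd scales with the assumed competitor Kd and concentration, errors in either propagate directly, which may help explain why an independent protein–protein competition experiment is a useful cross-check.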
A systematic survey of literature data (n = 206 SBPs) revealed that the typical range of SBP Kd values for organic solutes is 10–1,000 nM, with a lower limit of 200–400 pM (log10 Kd values −6.76 ± 1.15 (mean ± s.d.), Fig. 3a). Together, these results provide robust evidence that some SBPs in SAR11 bacteria exceed the previously established limits of SBP affinity for organic solutes.
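To put the literature statistics in concentration units, the following Python sketch converts the reported log10 Kd mean and standard deviation back to Kd values and asks how far a 20 pM dissociation constant lies from the literature mean. It assumes the log10 values refer to Kd in mol/L, which is consistent with the stated typical range but is not given explicitly in the text; the variable names are illustrative.

```python
import math

LOG10_KD_MEAN = -6.76  # mean log10 Kd from the literature survey (text)
LOG10_KD_SD = 1.15     # standard deviation of log10 Kd (text)


def log10_to_nm(log10_kd_molar):
    """Convert log10(Kd in mol/L) to Kd in nmol/L."""
    return 10.0 ** log10_kd_molar * 1e9


geometric_mean_nm = log10_to_nm(LOG10_KD_MEAN)
lower_1sd_nm = log10_to_nm(LOG10_KD_MEAN - LOG10_KD_SD)
upper_1sd_nm = log10_to_nm(LOG10_KD_MEAN + LOG10_KD_SD)

# Position of a 20 pM Kd relative to the literature distribution
z_20pm = (math.log10(20e-12) - LOG10_KD_MEAN) / LOG10_KD_SD

print(f"Geometric mean Kd: {geometric_mean_nm:.0f} nM")
print(f"Mean +/- 1 s.d.: {lower_1sd_nm:.0f} to {upper_1sd_nm:.0f} nM")
print(f"20 pM lies {abs(z_20pm):.1f} s.d. below the mean log10 Kd")
```

Under these assumptions, a 20 pM Kd sits more than three standard deviations below the literature mean on the log scale.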
Reinterpreting transport gene function
Comparison of the experimentally determined functions of each SBP with the homology-based predictions indicated that the accuracy of the predictions was low (Supplementary Table 6). The binding specificities of four proteins, SAR11_0807, SAR11_1179, SAR11_1203 and SAR11_1238, were correctly predicted as taurine, phosphate, citrate and iron(III), respectively. The predictions of SAR11_0769, SAR11_0953 and SAR11_1346 as a sugar-binding protein, general amino acid-binding protein and branched-chain amino acid-binding protein were broadly correct, although experimental characterization enabled identification of specific ligands. By contrast, 7 out of the 15 testable functional annotations were incorrect; in 5 of these cases, the binding specificity could be determined experimentally. For example, SAR11_1336 (encoded by potD), which was annotated as a putative spermidine or putrescine-binding protein, showed broad specificity for glycine betaine, DMSP and other osmolytes. The binding specificity of this protein matches the transport activity of a broad-specificity osmolyte transporter that was previously characterized in vivo, which was putatively attributed to SAR11_0797 (proX)22. Overall, these results show that the SBP-dependent transporters of SAR11 bacteria transport a narrower range of nitrogen sources and broader range of carbon sources and exhibit less functional redundancy than predicted32.
The assignment of transport capabilities to specific genes enables integration of the functional data with existing genomic, transcriptomic, and proteomic data. For example, functional assignment of the Ca. P. ubique HTCC1062 SBPs enabled analysis of the geographical distribution of various transport capabilities across SAR11 and other marine bacteria using ocean metagenome and metatranscriptome data. First, using the Ocean Gene Atlas tool33, we analysed the abundance of homologues of the characterized SBPs in metagenome and metatranscriptome datasets from the Tara Oceans project (Extended Data Figs. 6 and 7), which enabled us to identify abundantly transcribed transport genes that might contribute to assimilation of various components of DOM and determine whether transport gene expression correlates with known patterns of nutrient limitation and uptake. We also performed a separate metagenome analysis limited to SAR11 bacteria, estimating the percentage of SAR11 bacteria that contain a given SBP gene at each site based on the relative abundance of different SAR11 genomospecies34, which yielded similar patterns of SBP abundance (Supplementary Fig. 3).
Consistent with the global abundance of SAR11 bacteria and the broad distribution of the characterized SBPs among SAR11 bacteria (Extended Data Fig. 1), and consistent with another recent analysis35, homologues of most of the Ca. P. ubique HTCC1062 SBP genes were present at high abundance across stations in the metagenome and metatranscriptome datasets, including surface, deep chlorophyll maximum (DCM) and mesopelagic samples (Extended Data Figs. 6 and 7). In the metatranscriptome dataset, the mean abundance (fraction of mapped reads) of these SBP genes across surface stations varied from 3 × 10⁻⁵ (SAR11_0655) to 3 × 10⁻³ (SAR11_0953); for comparison, the mean total abundance of SBP transcripts across surface stations was estimated to be 2.7 × 10⁻², indicating that the SBPs of Ca. P. ubique HTCC1062 and their putatively isofunctional homologues account for a substantial proportion (around 40%) of SBP transcripts in the surface ocean. Although there is not necessarily a quantitative correlation between transcript abundance and the rate of substrate uptake, these results show qualitatively that the functional assignments of the Ca. P. ubique HTCC1062 SBPs are potentially significant in the broader context of global DOM assimilation. For example, the DMSP/glycine betaine transport gene SAR11_1336 showed a mean transcript abundance of 1.6 × 10⁻³. The high abundance of SAR11_1336 suggests that, together with a recently described and unrelated DMSP-specific transport protein36, SAR11_1336 and its homologues in other bacteria may make a significant contribution to global microbial uptake of DMSP, a metabolite that has an important role in the marine sulfur cycle and climate regulation via its microbial conversion to the climate-active gas dimethylsulfide37.
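The relationship between the abundance figures quoted above can be checked with simple arithmetic. The Python sketch below uses only the values stated in the text; the per-gene abundances of the other SBPs are not given here, so the summed abundance is back-calculated from the stated proportion of around 40% rather than computed from data. Variable names are illustrative.

```python
# Values quoted in the text (fractions of mapped reads, surface stations)
TOTAL_SBP_ABUNDANCE = 2.7e-2   # mean total abundance of SBP transcripts
MIN_GENE_ABUNDANCE = 3e-5      # SAR11_0655
MAX_GENE_ABUNDANCE = 3e-3      # SAR11_0953
SAR11_1336_ABUNDANCE = 1.6e-3  # DMSP/glycine betaine transport gene
STATED_PROPORTION = 0.40       # "around 40%" of SBP transcripts

# Summed abundance of the characterized SBPs implied by the stated proportion
implied_sum = STATED_PROPORTION * TOTAL_SBP_ABUNDANCE

# Share of all SBP transcripts attributable to SAR11_1336 homologues alone
share_1336 = SAR11_1336_ABUNDANCE / TOTAL_SBP_ABUNDANCE

# Dynamic range between the least and most abundant characterized SBP genes
fold_range = MAX_GENE_ABUNDANCE / MIN_GENE_ABUNDANCE

print(f"Implied summed abundance: {implied_sum:.2e}")
print(f"SAR11_1336 share of SBP transcripts: {share_1336:.1%}")
print(f"Range across characterized SBP genes: {fold_range:.0f}-fold")
```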
In both the metagenome and metatranscriptome analyses, transporters for sulfonates, amino acids, TMAO, glycine betaine, DMSP and dicarboxylates showed a near-universal distribution and particularly high abundance, whereas transporters for l-pyroglutamate, phosphate, iron(III) and d-glucose showed a geographically limited distribution (Extended Data Figs. 6 and 7). These results are consistent with the known contribution of SAR11 bacteria to uptake of taurine, amino acids and DMSP across different environments, compared with ecotype-specific and geographically variable uptake of d-glucose2. Similar patterns of SBP gene abundance were typically observed in the metagenome and metatranscriptome datasets, consistent with high constitutive expression and limited transcriptional regulation of most SBP genes in SAR11 bacteria, with the exception of SBPs for phosphate and iron, which showed higher expression in regions of known phosphate and iron limitation38,39. Notably, these interpretations of the metagenomic and metatranscriptomic data are contingent on accurate functional annotation; for example, misidentification of the SAR11 osmolyte transporter as SAR11_0797 would suggest a much more limited role for DMSP and glycine betaine uptake and broader role for polyamine uptake across SAR11 and other marine bacteria35,40.
Novel functions of SAR11 SBPs
In addition to identifying transporters for known substrates, functional characterization of SBPs also enabled identification of new transport capabilities. SAR11_0655 (l-pyroglutamate) and SAR11_1361 (C4 and C5 dicarboxylates) represent new classes of ABC transporters and previously unknown transport capabilities of SAR11 bacteria. l-Pyroglutamate, which binds to SAR11_0655 with Kd < 5 nM, was an unexpected ligand, as it is a non-proteinogenic amino acid that is not known to be a significant component of DOC16. Although the occurrence of SAR11_0655 is limited among SAR11 ecotypes and is restricted mainly to high latitudes (Extended Data Figs. 1, 6 and 8), other SAR11 bacteria appear to achieve l-pyroglutamate uptake using an alternative transporter (Supplementary Note 6), suggesting that l-pyroglutamate is widely utilized. Analysis of genome context suggested a putative pathway for utilization of exogenous l-pyroglutamate as a source of l-glutamate (Extended Data Fig. 9). The fact that SAR11 bacteria, despite their extremely streamlined genome, retain specific and high-affinity transporters for l-pyroglutamate indicates that this amino acid must be a widely available and useful source of carbon and/or nitrogen in the ocean. More generally, given the significant challenges of identifying environmentally important metabolites in heterogeneous, dilute and variable DOC16, this result suggests that identification of new transport capabilities from characterization of SBPs from oligotrophic marine bacteria might be a useful approach to identify new environmentally significant ocean metabolites from the DOC pool.
SAR11_1361 showed binding of a broad range of dicarboxylates that participate in the tricarboxylic acid (TCA) cycle (Figs. 1 and 2). This gene is known to be associated with carbon starvation in SAR11 bacteria; transcription and/or expression is upregulated upon carbon limitation in the dark41 (that is, energy-starved conditions) and downregulated upon nitrogen and sulfur limitation42,43. Analysis of genome context also suggested a putative pathway (via SAR11_1354) for utilization of exogenous glutarate, which was subsequently confirmed to be a substrate of SAR11_1354 by 1H-NMR and a ligand of SAR11_1361 by DSF (Extended Data Fig. 9). These results, together with the identification of a specific and high-affinity citrate-binding protein (SAR11_1203), suggest a broad capacity of Ca. P. ubique HTCC1062 to assimilate dicarboxylates and TCA cycle intermediates. Genomic and biogeographical analysis indicated that the capability for dicarboxylate uptake is also widely distributed among SAR11 bacteria: the dicarboxylate transport protein SAR11_1361 shows a broader distribution among SAR11 genomes than the glucose transport protein SAR11_0769 (Fig. 4a,b and Extended Data Fig. 1), and shows a broader geographical distribution in the Tara Oceans metagenome and metatranscriptome datasets, including both coastal and open ocean samples (Fig. 4c–f), despite the fact that SAR11_1361 has a much more limited phylogenetic distribution among bacteria (Extended Data Fig. 10). In the context of uncertainty surrounding the carbon sources that are universal to SAR11 bacteria44 (Supplementary Note 7), the identification of an SBP with high affinity (Kd < 10 nM) and broad specificity for C4 and C5 dicarboxylates that is conserved among SAR11 ecotypes despite stringent genome streamlining, and widely distributed and highly transcribed throughout the ocean, provides strong evidence that these dicarboxylates are physiologically important carbon sources in SAR11 bacteria.
Specificity and affinity of SAR11 SBPs
Systematic characterization of SBPs provided a global view of transporter specificity and affinity in Ca. P. ubique HTCC1062, providing broad insight into the physiology of oligotrophic bacteria. It has long been hypothesized that oligotrophic bacteria with streamlined genomes, including SAR11 bacteria, rely on broad-specificity transporters to enable transport of a broad range of substrates with a limited number of transporters9. Indeed, a broad-specificity osmolyte transporter had previously been identified22, and three more broad-specificity SBPs for amino acids and dicarboxylates were characterized in this work. However, the majority of SBPs (at least 8 out of 13) showed high binding specificity, suggesting a more nuanced view of uptake specificity in oligotrophic bacteria (Supplementary Note 8). Genome streamlining results in reduction of metabolic genes in addition to transporter genes; thus, broad-specificity transporters are associated with a risk of futile uptake of metabolites that cannot be utilized, especially given the high compositional complexity of ocean DOC. Our results show that Ca. P. ubique HTCC1062 is highly selective in its substrate uptake, using a small number of broad-specificity transporters mainly for metabolites that can be utilized without dedicated catabolic pathways, including amino acids and TCA cycle intermediates. The remaining transporters show high specificity and mainly cover specific gaps in the broad-specificity transporters; indeed, there is little redundancy in binding specificity between the SBPs, except for some overlap between the two broad-specificity amino acid-binding proteins. 
The high specificity of these transporters does not result from a negative tradeoff between specificity and affinity; for example, SAR11_0953 is estimated to have nanomolar affinity for around 15 proteinogenic amino acids (on the basis of measured ΔTM and Kd values), with the highest affinity (Kd = 550 pM) observed for l-glutamate, demonstrating that broad specificity is compatible with high affinity (Supplementary Note 9). Furthermore, three out of the four broad-specificity transporters (SAR11_0953, SAR11_1336 and SAR11_1346) appear to be widely distributed among Proteobacteria, indicating that use of broad-specificity transporters is not unique to oligotrophic bacteria (Extended Data Fig. 8 and Supplementary Fig. 4). Overall, these considerations suggest that oligotrophic bacteria probably show greater selectivity in substrate uptake than previously assumed.
Our results revealed that a systematic increase in SBP binding affinity is a major adaptation of Ca. P. ubique HTCC1062 to low substrate concentrations in the oligotrophic environment. The binding affinity of the Ca. P. ubique HTCC1062 SBPs was remarkably high on average, and substantially exceeded the known range of SBP binding affinity in some cases (Fig. 3a). Kd values in the picomolar to low nanomolar range were observed in most cases, in concordance with the picomolar to low nanomolar concentrations of amino acids and other substrates typically observed in the surface oligotrophic ocean45,46 (Supplementary Note 3 and Supplementary Table 7) and picomolar to low nanomolar uptake affinities (specifically, Ks + [S], the sum of the half-saturation constant (Ks) and the in situ substrate concentration) for various metabolites in environmental samples from the surface ocean45,47, for which the corresponding transport proteins have generally not been identified22. Although the SBPs may have slightly different properties in vitro compared with their native cellular environment, a strong correlation is generally observed between in vitro properties of SBPs and the in vivo properties of the corresponding transporters23. In addition, the physiological relevance of the observed binding affinity in SBPs is indicated by several considerations: (1) the observed Kd of 2.0 nM for the interaction between SAR11_1336 and glycine betaine is in excellent agreement with the previously measured Ks value for the corresponding transporter22 (0.89 nM); (2) mathematical models of ABC transporter activity indicate that uptake affinity should be greater than SBP binding affinity when SBP concentration is high48 (as in SAR11 bacteria); and (3) the binding affinity of an SBP has physiological significance itself, because it determines the concentration at which substrates can be accumulated in the periplasm49. 
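The link between Kd and substrate concentration can be made concrete with the single-site binding relation, fraction bound = [S] / (Kd + [S]), which holds when the free substrate concentration is not appreciably depleted by binding. The Python sketch below is a simplified illustration under that assumption. The 2.0 nM Kd is the value reported above for SAR11_1336 and glycine betaine; the 1 nM substrate concentration and the 1 µM comparison Kd are hypothetical round numbers chosen for illustration.

```python
def fraction_bound(substrate_conc, kd):
    """Equilibrium fraction of SBP occupied by substrate for single-site binding.

    Assumes free substrate ~ total substrate. Both arguments in mol/L.
    """
    return substrate_conc / (kd + substrate_conc)


SUBSTRATE = 1e-9         # hypothetical 1 nM ambient substrate concentration
KD_HIGH_AFFINITY = 2e-9  # Kd reported in the text for SAR11_1336 / glycine betaine
KD_MICROMOLAR = 1e-6     # hypothetical SBP with micromolar Kd, for comparison

occ_high = fraction_bound(SUBSTRATE, KD_HIGH_AFFINITY)
occ_low = fraction_bound(SUBSTRATE, KD_MICROMOLAR)

print(f"Occupancy at 1 nM substrate, Kd = 2 nM: {occ_high:.1%}")
print(f"Occupancy at 1 nM substrate, Kd = 1 uM: {occ_low:.2%}")
```

In this simplified picture, an SBP with a Kd near the ambient substrate concentration is substantially occupied, whereas one with a micromolar Kd is almost entirely unliganded, so Kd values matched to nanomolar or lower substrate concentrations are what the argument above would predict.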
Whereas previous work has shown how extreme selective pressure driven by large population size under low-nutrient conditions has driven systematic adaptation of SAR11 bacteria at the genome and cellular levels (for example, reduction of GC content9 and increase in periplasmic volume8), this work shows that systematic adaptation of the biophysical properties of SBPs is another important factor in the evolutionary success of SAR11 bacteria. We speculate that the evolutionary tradeoffs underlying ultra-high binding affinity in SBPs may also be an important factor shaping the physiology of SAR11 and other oligotrophic bacteria (Supplementary Note 10).
The identification of SBPs with unprecedented binding affinity in the genome of Ca. P. ubique HTCC1062 resolves uncertainty about the discrepancy between the observed affinity of substrate uptake by microbial communities in the ocean and the binding affinity of previously characterized transporters for substrate uptake. To explain this apparent discrepancy, various alternative mechanisms for high-affinity substrate uptake in oligotrophic bacteria have been proposed. For example, a recent modelling study showed that the uptake affinity of an ABC transporter depends on both SBP concentration and binding affinity, and suggested that oligotrophic bacteria might use high SBP expression to achieve high uptake affinity without increasing SBP binding affinity48; by contrast, our results show (without invalidating this model) that high uptake affinity can be explained without accounting for periplasmic SBP concentration. As another example, the observation that the Kd of known phosphate-binding proteins (around 1 µM) is much higher than the concentrations of inorganic phosphate in phosphate-depleted regions (less than 5 nM) led to the proposal of an alternative mechanism for accumulation of inorganic phosphate in the periplasm of oligotrophic bacteria49,50. Of note, the phosphate-binding protein of Ca. P. ubique HTCC1062 does indeed have relatively low binding affinity (Kd = 133 nM), which may reflect the challenge of discriminating phosphate from sulfate, which is present at a concentration of around 28 mM in the ocean; although phosphate-binding proteins from sulfate-rich environments can achieve a discrimination factor of greater than 10⁵ (ref. 51), there is presumably a biophysical limit on discrimination of these two anions due to their physicochemical similarity.
Consistent with the hypothesis that the binding affinity of phosphate-binding proteins may be constrained by the requirement for discrimination of phosphate and sulfate, SAR11_1179 showed a small but significant decrease in apparent binding affinity in the presence of 28 mM sulfate (6.7-fold decrease to 890 nM, P < 0.0001, two-tailed t-test on log10 Kd values (Extended Data Fig. 2c)).
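The reported fold change can be verified from the two Kd values given for SAR11_1179 (133 nM without sulfate, 890 nM with 28 mM sulfate). As a further back-of-the-envelope step, the Python sketch below also estimates what sulfate dissociation constant would produce this shift if sulfate acted as a simple competitive ligand, using Kd,app = Kd × (1 + [C]/Ki). That competitive-binding model is an assumption made here for illustration and is not stated in the text, so the resulting Ki should be read as a rough order-of-magnitude estimate only.

```python
KD_PHOSPHATE = 133e-9         # Kd for phosphate without sulfate (text), mol/L
KD_PHOSPHATE_SULFATE = 890e-9 # apparent Kd with 28 mM sulfate (text), mol/L
SULFATE_CONC = 28e-3          # sulfate concentration (text), mol/L

# Fold decrease in apparent affinity caused by sulfate
fold_change = KD_PHOSPHATE_SULFATE / KD_PHOSPHATE

# ASSUMPTION: sulfate is a simple competitive ligand, so
# Kd_app = Kd * (1 + [sulfate] / Ki)  =>  Ki = [sulfate] / (fold_change - 1)
implied_ki_sulfate = SULFATE_CONC / (fold_change - 1.0)

print(f"Fold decrease in apparent affinity: {fold_change:.1f}")
print(f"Implied sulfate Ki (competitive model): {implied_ki_sulfate * 1e3:.1f} mM")
```

The computed fold change agrees with the 6.7-fold decrease reported above.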
Discussion
Systems-level approaches based on metatranscriptomics and related methods are highly valuable for profiling putative biological functions in complex microbial communities across different environments, providing insight into their ecological and biogeochemical functions52,53. However, a limitation of these methods is that they depend on homology-based predictions of protein function, which vary markedly in accuracy between protein families and are usually not validated19. Here we have shown how targeted functional characterization of environmentally abundant proteins can be integrated with existing multi-omics and physiological data to provide insight over multiple biological scales, ranging from mechanisms of functional adaptation at the molecular level to global patterns of substrate uptake capabilities in SAR11 bacteria. We anticipate that improved computational annotation and continued experimental annotation of protein function will be essential to extract maximum value from increasingly high-resolution ocean microbiome datasets and fulfil the broader goal in microbial ecology of bridging microbial gene function and ocean ecosystems biology on a planetary scale54,55.
Methods
Identification of SBP genes
Nineteen candidate SBP genes in the genome of Ca. P. ubique strain HTCC1062 were identified through a search of the TransportDB 2.0 database59 (http://membranetransport.org; accessed 22 January 2020). One of these genes, SAR11_0371, was annotated as a ‘possible transmembrane receptor’ in UniProt and showed a non-canonical predicted domain structure consisting of a short SBP-like domain (170 amino acids) followed by a coiled coil domain and unidentified C-terminal domain. Additionally, genome context analysis showed that, unlike the other ABC SBP genes in Ca. P. ubique HTCC1062, SAR11_0371 was not colocalized with genes encoding the membrane permease or ATP-binding cassette components of an ABC transport system. Thus, SAR11_0371 was considered not to represent the SBP component of an SBP-dependent transport system and was excluded from the analysis. We also attempted to identify additional SBP genes through a search of the UniProt database for proteins in Ca. P. ubique belonging to Pfam clans CL0177 (PBP; periplasmic binding protein) and CL0144 (Periplas_BP; periplasmic binding protein like); however, this search did not return any additional candidate genes.
Cloning
The protein sequence of each SBP from Ca. P. ubique HTCC1062 was obtained from the UniProt database. Signal sequences were predicted using the SignalP 5.0 server60 and removed. The protein sequences were then back-translated and codon-optimized for expression in E. coli, and the resulting genes were obtained as synthetic DNA from Twist Bioscience or Integrated DNA Technologies. The synthetic genes were cloned into the NdeI/XhoI site of the pET-28a(+) expression vector by In-Fusion cloning using the In-Fusion HD Cloning Kit (Takara Bio), yielding expression constructs with an N-terminal hexahistidine tag and thrombin tag. Correct assembly of each expression vector was confirmed by Sanger sequencing (FASMAC). The putative csiD gene, SAR11_1354, and several homologues of the Ca. P. ubique HTCC1062 SBPs (Supplementary Table 8) were cloned similarly into the pET-28a(+) vector, except that the thrombin tag was removed from the constructs of SAR11_1354, SAR11_0266 (Fub), or SAR11_1290 (SAR324). The sequences of oligonucleotides and synthetic genes used in this study are listed in Supplementary Table 9.
Optimization of protein expression
Protein expression was initially tested in E. coli BL21(DE3) cells grown in Luria-Bertani (LB) and Terrific Broth (TB) media at 30 °C and 17 °C. SAR11_0655 showed optimal soluble expression in LB medium at 17 °C, SAR11_1203 showed optimal soluble expression in TB medium at 30 °C, and eight proteins (SAR11_0797, SAR11_0807, SAR11_0864, SAR11_1068, SAR11_1179, SAR11_1210, SAR11_1238, and SAR11_1361) showed optimal soluble expression in TB medium at 17 °C. Next, the remaining proteins were tested for expression in E. coli SHuffle T7 cells (New England Biolabs) in TB medium at 17 °C; this strain expresses the disulfide bond isomerase DsbC, which can increase soluble recombinant expression of cytoplasmic proteins by promoting correct formation of disulfide bonds. Soluble expression of SAR11_0769, SAR11_0953, SAR11_1302, and SAR11_1336 was achieved under these conditions. Due to the lack of soluble expression for the remaining four proteins (SAR11_0266, SAR11_0271, SAR11_1290 and SAR11_1346), we also tested expression of one or two close homologues of each protein (Supplementary Table 8). The SAR11_0271 homologue from ‘Ca. Pelagibacter’ sp. HIMB1321 (denoted SAR11_0271*) could be expressed in soluble form in SHuffle T7 cells in TB medium at 17 °C, while the SAR11_1346 homologue from the same species (denoted SAR11_1346*) could be expressed in soluble form in BL21(DE3) cells in TB medium at 17 °C. SAR11_0271* and SAR11_1346* share 91.4% and 88.9% sequence identity, respectively, with the corresponding proteins from Ca. P. ubique HTCC1062, and the binding site residues are completely conserved (Supplementary Fig. 5), indicating that the functions and properties of the homologous SBPs are likely to be identical. Neither homologue of SAR11_0266 or SAR11_1290 could be expressed in soluble form in BL21(DE3) or SHuffle T7 cells. Expression of SAR11_0266 and SAR11_1290 without His6 or thrombin tags also yielded insoluble protein.
Protein expression was typically evaluated by SDS–PAGE analysis as follows. Cells transformed with the relevant expression vector by electroporation were spread from a frozen glycerol stock onto an LB agar plate containing 0.2% (w/v) glucose and 25 µg ml−1 kanamycin and incubated at 30 °C overnight. The cells were then scraped into a small volume of LB medium and used to inoculate 3 ml of the relevant growth medium containing 25 µg ml−1 kanamycin in a 10 ml round bottom tube at a starting OD600 of 0.05. The culture was incubated at 37 °C with shaking at 220 rpm until the OD600 reached 0.5. One-millilitre aliquots were transferred to clean round bottom tubes and isopropyl β-d-1-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM. The induced cultures were incubated with shaking at 220 rpm at 17 °C overnight or 30 °C for 3 h. A 500-µl aliquot of each culture was resuspended in lysis buffer (20 mM Tris, 0.5 M NaCl, 1% (v/v) Triton X-100, pH 8.0) and incubated at room temperature for 10 min. The cell lysate was centrifuged at 21,000g for 5 min (4 °C). The soluble fraction of the cell lysate was transferred to a tube containing 30 µl cOmplete His-Tag purification Ni-NTA resin (Roche) suspended in 500 µl buffer A (8 M urea, 20 mM Tris, 0.5 M NaCl, pH 8.0), while the insoluble fraction of the cell lysate was dissolved in 500 µl buffer A, centrifuged at 21,000g for 5 min, and then transferred to a tube containing 30 µl Ni-NTA resin suspended in 500 µl buffer A. In both cases, the resin was incubated at room temperature for 10 min, washed twice with 500 µl buffer A, and then eluted by incubation with 50 µl buffer B (8 M urea, 20 mM Tris, 0.5 M NaCl, 0.5 M imidazole, pH 8.0) at room temperature for 5 min. Fifteen microlitres of supernatant was mixed with 5 µl of 4× SDS–PAGE sample loading buffer and heated at 90 °C for 10 min, then loaded onto a 4–15% pre-cast SDS–PAGE gel (Bio-Rad). The gel was run at 200 V for 30 min and visualized with Coomassie Blue.
Large-scale protein expression and purification
For expression and purification of the Ca. P. ubique SBPs, E. coli BL21(DE3) or SHuffle T7 cells transformed with the relevant expression vector were spread from a frozen glycerol stock onto an LB agar plate containing 0.2% (w/v) glucose and 25 µg ml−1 kanamycin, and incubated at 30 °C overnight. The cells were then scraped into 3 ml LB medium, and 500 µl of the resulting cell suspension was used to inoculate 500 ml LB or TB medium supplemented with 25 µg ml−1 kanamycin in a 2 l or 3 l flask, preheated at 37 °C. The culture was incubated at 37 °C with shaking at 220 rpm until the OD600 reached 0.5, then cooled briefly in an ice-water bath until the temperature reached ~25 °C. IPTG was added to a concentration of 0.5 mM, and the culture was incubated at 17 °C with shaking at 220 rpm for a further 16 h. Cells were pelleted by centrifugation (3,300g, 15 min, 4 °C) and frozen at −20 °C until use. For protein purification, cells were thawed on ice, resuspended in 100 ml Ni binding buffer (20 mM Tris, 500 mM NaCl, 20 mM imidazole, pH 8.0), and lysed by sonication. After addition of 500 U Benzonase Nuclease (Sigma-Aldrich) to digest DNA, the cell lysate was centrifuged at 10,000g for 1 h (4 °C). The supernatant was filtered through a 0.45-µm syringe filter and then loaded onto a 1 ml HisTrap HP column (Cytiva) equilibrated with Ni binding buffer using an ÄKTA Pure FPLC system (Cytiva). For purification under native conditions, the column was washed with 10 ml Ni binding buffer followed by 10 ml Ni wash buffer (20 mM Tris, 500 mM NaCl, 44 mM imidazole, pH 8.0), and then the target protein was eluted in 10 ml Ni elution buffer (20 mM Tris, 500 mM NaCl, 500 mM imidazole, pH 8.0).
For purification under denaturing conditions, the column was washed with denaturing Ni binding buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 20 mM imidazole, pH 8.0) at 1 ml min−1 for 30 min after loading of the clarified cell lysate, and the target protein was eluted with 10 ml denaturing Ni elution buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 250 mM imidazole, pH 8.0). Proteins purified under native conditions were concentrated to 400 µl using a 10 kDa molecular weight cut-off (MWCO) Amicon Ultra-4 centrifugal spin concentrator (Merck-Millipore) and purified by size-exclusion chromatography using a Superdex 200 Increase 10/300 column (Cytiva), eluting in DSF buffer (20 mM HEPES, 0.3 M NaCl, pH 7.5). For storage, proteins were concentrated to a volume of 0.5–2 ml and glycerol was added to a concentration of 10% (v/v). The protein was then flash-frozen in 100–200-µl aliquots in liquid nitrogen and stored at −80 °C until use. ArgT from S. enterica was expressed from a pETMCSIII plasmid and purified as described previously61.
Protein refolding
In most cases, protein purified under denaturing conditions was diluted to a concentration of 0.5 mg ml−1 and volume of 10–30 ml in denaturing Ni binding buffer (8 M urea, 20 mM Tris, 250 mM NaCl, 20 mM imidazole, pH 8.0) and transferred to 10 kDa MWCO SnakeSkin dialysis tubing (Thermo Scientific). The protein was then dialysed against 2 l dialysis buffer (20 mM Tris, 150 mM NaCl, pH 8.0) at 4 °C with three buffer changes over a period of 24 h. The protein was collected and exchanged into DSF buffer using a 10 kDa MWCO Amicon Ultra-15 centrifugal concentrator, then concentrated to 400 µl and purified by size-exclusion chromatography as described above. For SAR11_1346*, an improved yield of monomeric protein was obtained using rapid dilution for refolding: 2 ml of denatured protein (5 mg ml−1 in denaturing Ni binding buffer) was added dropwise with stirring to 40 ml pre-chilled refolding buffer (20 mM Tris, 150 mM NaCl, 10% (v/v) glycerol, pH 8.0) and incubated at 4 °C with stirring for 20 h. The protein was then concentrated and purified by size-exclusion chromatography as above.
Differential scanning fluorimetry
DSF experiments were performed using a StepOnePlus Real-Time PCR System and StepOne software (Applied Biosystems) based on literature protocols62,63. Reaction mixtures were prepared in twin.tec Real-Time PCR Plates (Eppendorf) and contained 5× SYPRO Orange (Sigma-Aldrich), 2.5 µM protein, and 2 µl 10× ligand in a total volume of 20 µl DSF buffer. The plate was sealed with optically clear sealing film and centrifuged at 2,000g for 1 min before loading into the real-time PCR instrument. The temperature was ramped at a rate of 1% (approximately 1.33 °C min−1), typically over a 60 °C window centred on the melting temperature (TM) of the target protein. Fluorescence was monitored using the ROX channel. TM values were determined by taking the derivative of fluorescence intensity with respect to temperature and fitting the resulting data to a quadratic equation in a 6 °C window in the vicinity of the TM in R software.
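The TM determination described above (derivative of fluorescence with respect to temperature, followed by a quadratic fit in a 6 °C window) can be illustrated with a minimal sketch. The analysis in this study was done in R; the Python below is an independent illustration, and the function names, the use of central differences for the derivative, and the synthetic melting curve are all assumptions made for the example.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quadratic_vertex(xs, ys):
    """Least-squares fit of y = a*u^2 + b*u + c (u = x - mean(x)); returns the vertex x."""
    n = len(xs)
    x0 = sum(xs) / n
    us = [x - x0 for x in xs]  # centring keeps the normal equations well conditioned
    s1, s2 = sum(us), sum(u ** 2 for u in us)
    s3, s4 = sum(u ** 3 for u in us), sum(u ** 4 for u in us)
    t0 = sum(ys)
    t1 = sum(u * y for u, y in zip(us, ys))
    t2 = sum(u * u * y for u, y in zip(us, ys))
    # Normal equations solved by Cramer's rule
    d = det3([[s4, s3, s2], [s3, s2, s1], [s2, s1, n]])
    a = det3([[t2, s3, s2], [t1, s2, s1], [t0, s1, n]]) / d
    b = det3([[s4, t2, s2], [s3, t1, s1], [s2, t0, n]]) / d
    return x0 - b / (2.0 * a)

def melting_temperature(temps, fluor, window=6.0):
    """Estimate TM as the vertex of a parabola fitted to dF/dT near its maximum."""
    # Central-difference derivative at interior points (an assumed choice)
    t_mid = temps[1:-1]
    deriv = [(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
             for i in range(1, len(temps) - 1)]
    t_peak = t_mid[deriv.index(max(deriv))]
    # Keep only points inside the window centred on the derivative maximum
    sel = [(t, d) for t, d in zip(t_mid, deriv) if abs(t - t_peak) <= window / 2.0]
    return quadratic_vertex([t for t, _ in sel], [d for _, d in sel])

# Synthetic two-state melting curve with a midpoint of 52.5 °C (illustrative data only)
temps = [25.0 + 0.5 * i for i in range(121)]
fluor = [1.0 / (1.0 + math.exp(-(t - 52.5) / 2.0)) for t in temps]
tm_estimate = melting_temperature(temps, fluor)
```

Fitting a parabola to the derivative, rather than simply taking its maximum, gives a TM estimate that is not restricted to the sampled temperature grid.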
Proteins were initially screened for binding to metabolites in four Phenotype MicroArray plates, PM1 to PM4 (Biolog). The contents of each well were dissolved in 50 µl (PM1 to PM3) or 20 µl (PM4) sterile filtered water, giving a concentration of approximately 10–20 mM in each well63. The plates were then sealed with aluminium sealing films and stored at −80 °C. Prior to use, the plates were thawed at room temperature and then shaken at 30 °C until the compounds had redissolved. Two microlitres of each compound was added to 18 µl reaction mixture prepared as described above. A 2 °C increase in TM compared with the median value across the plate was taken as indicative of binding63,64.
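The hit criterion for the Biolog screen (a TM at least 2 °C above the plate median) can be expressed as a short sketch. The well labels and TM values are hypothetical, and treating a shift of exactly 2 °C as a hit is an assumption.

```python
from statistics import median

def call_hits(tm_by_well, threshold=2.0):
    """Return {well: TM shift} for wells whose TM exceeds the plate median by >= threshold (°C)."""
    plate_median = median(tm_by_well.values())
    return {well: tm - plate_median
            for well, tm in tm_by_well.items()
            if tm - plate_median >= threshold}

# Hypothetical TM values (°C) for five wells of one plate
example_plate = {"A01": 50.0, "A02": 50.2, "A03": 49.9, "A04": 56.5, "A05": 50.1}
hits = call_hits(example_plate)
```

Using the plate median as the reference, rather than a single ligand-free well, makes the baseline robust to a few stabilizing or destabilizing compounds on the same plate.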
For screening of individual compounds and confirmatory assays, compounds were dissolved at a concentration of 100 mM in ligand buffer (0.1 M HEPES pH 7.5), and the pH was adjusted with 1 M NaOH or 1 M HCl if necessary (specifically, if the pH of a 10 mM solution of the compound diluted in DSF buffer fell outside the range 6.5–8.0). These stock solutions were stored at −20 °C. Two microlitres of each compound was directly added to 18 µl reaction mixture, giving a final concentration of 10 mM, or first diluted 10-fold or 100-fold in DSF buffer to give final concentrations of 1 mM or 0.1 mM in the assay. A list of chemicals used for screening, including the supplier and catalogue number, is provided in Supplementary Table 3. Sodium (R)- and (S)-2,3-dihydroxypropane-1-sulfonate were synthesized from (R)- and (S)-3-chloro-1,2-propanediol following a literature protocol65 and verified by 1H and 13C NMR.
In the case of the TRAP and TTT SBPs, SAR11_0864 and SAR11_1203, we hypothesized that a metal ion might be required for high-affinity binding, due to the biphasic melting curve observed in the presence of isethionate in Biolog screening experiments, suggesting the presence of a mixture of active and inactive protein (SAR11_0864) or due to the discord between the highly charged ligand and the largely uncharged binding site of the SBP (SAR11_1203). Therefore, we tested the effect of the addition of metal ions (Mg2+, Ca2+, K+, Zn2+, Mn2+, Co2+, Ni2+, Fe2+ and Fe3+) on binding of isethionate to SAR11_0864 and citrate to SAR11_1203 by DSF (Supplementary Fig. 6). DSF experiments were performed using refolded protein as described above, with the addition of 1 mM metal ion and 1 mM ligand. Based on these results, and considering the concentration of each metal ion in seawater66, 10 mM CaCl2 (SAR11_0864) or 53 mM MgSO4 (SAR11_1203) were included in subsequent DSF and ITC binding experiments for these SBPs.
Isothermal titration calorimetry
ITC experiments were performed using a MicroCal PEAQ-ITC system (Malvern Panalytical). Protein samples were refolded and freshly purified (not frozen), and protein and ligand samples were prepared in the same batch of DSF buffer used for size-exclusion chromatography to minimize the heat of dilution. For SAR11_0864 and SAR11_1203, calcium chloride (final concentration 10.3 mM) or magnesium sulfate (final concentration 53 mM), respectively, was added to the protein and ligand samples. Experiments were performed at 25 °C with stirring at 700 rpm and 10 µcal s−1 reference power. Titration parameters were varied depending on the protein yield, the fraction of active protein, and the affinity and enthalpy of the interaction. In a typical titration, 35 µM protein was titrated with 1× 0.4-µl and 19× 1.6-µl injections of ligand, with the ligand concentration chosen to give >1.5-fold molar excess of ligand to active protein at the end of the titration. ITC experiments were generally performed at least in duplicate.
For simple 1:1 binding interactions, the association constant (Ka), enthalpy (ΔH), and stoichiometry (n) of the interaction were determined by fitting the data to the one-set-of-sites model in MicroCal PEAQ-ITC analysis software. In the case of the SAR11_0769 + d-glucose interaction, thermodynamic parameters were estimated through Bayesian fitting to a modified competitive binding model, which incorporated an additional parameter to account for the fraction of the ligand in each anomeric form, and a two-sets-of-sites model implemented in pytc software67; the latter model is equivalent to the two-sets-of-sites model in the MicroCal software, except without the minor correction for heat associated with the displaced volume for each injection (for consistency with the other models in pytc). Thermodynamic parameters for the SAR11_0953 + l-glutamate, SAR11_1203 + citrate, SAR11_1210 + l-arginine, SAR11_1336 + glycine betaine, and SAR11_1346* + l-leucine interactions were determined through competitive displacement experiments68, in which l-phenylalanine, cis-aconitate, d-octopine, glycine, or l-serine (respectively) were included at a fixed concentration in the cell to reduce the apparent binding affinity for the ligand of interest. The data for these competitive binding experiments were analysed by Bayesian fitting to the competitive binding sites model in pytc software. To confirm the high affinity of the SAR11_1210 + l-arginine interaction, a competitive binding experiment was performed where SAR11_1210 and ArgT from S. enterica (which has a Kd of 15 nM for l-arginine) were included in the cell together at the same concentration (28 µM) and titrated with l-arginine. Similarly, for the SAR11_1210(E108A) + l-arginine interaction, a mixture of SAR11_1210(E108A) and SAR11_1210 (35 µM each) was titrated with l-arginine. 
For these titrations, the data were fitted to a two-sets-of-sites binding model as described above to obtain thermodynamic parameters for both protein–ligand interactions. For all analyses, the heat of dilution was assumed to be a small constant value and included as a fitted parameter in the model. The validity of this assumption was confirmed for each ligand by performing a control titration where the ligand was injected into DSF buffer.
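The rationale for the competitive displacement experiments can be illustrated with the standard relation for a 1:1 competitive system, in which a weak competitor at concentration [C] with dissociation constant Kd,C raises the apparent dissociation constant of the tight ligand to Kd,app = Kd (1 + [C]/Kd,C). The sketch below is only this closed-form relation, which assumes the free competitor concentration is close to its total concentration; it is not the Bayesian fit performed in pytc, and the numerical values are hypothetical.

```python
import math

def apparent_kd(kd, competitor_conc, kd_competitor):
    """Apparent Kd of a tight ligand measured in the presence of a weak competitor."""
    return kd * (1.0 + competitor_conc / kd_competitor)

def true_kd(kd_apparent, competitor_conc, kd_competitor):
    """Invert the relation to recover the Kd of the tight ligand."""
    return kd_apparent / (1.0 + competitor_conc / kd_competitor)

# Hypothetical example: a 50 pM interaction measured with 1 mM of a 10 µM competitor
kd_app_example = apparent_kd(50e-12, 1e-3, 10e-6)   # roughly 5 nM
kd_recovered = true_kd(kd_app_example, 1e-3, 10e-6)
```

Shifting a picomolar interaction into the nanomolar range in this way brings it into the window where the shape of the ITC isotherm still constrains the fitted affinity.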
Spectrophotometric analysis of iron(iii) binding
Binding of iron(iii) to SAR11_1238 was analysed using a spectrophotometric assay based on literature protocols69,70. UV–vis spectra were recorded at room temperature (25 °C) in a 96-well plate from 300 nm to 630 nm with 1 nm bandwidth using a Multiskan GO spectrophotometer (Thermo Scientific). An initial protein concentration of 100 µM and an initial volume of 200 µl were used for all spectrophotometric assays. First, purified SAR11_1238 was thawed and exchanged into 50 mM Tris, 200 mM NaCl buffer (pH 8.0) using a centrifugal concentrator, and the spectrum of the resulting protein sample was recorded. To prepare unliganded protein for iron-binding assays, the protein was exchanged into 50 mM Tris, 200 mM NaCl, 20 mM sodium citrate buffer (pH 8.0) by three rounds of 30-fold dilution and concentration, allowing chelation and removal of the metal ligand. Citrate was then removed by four rounds of 30-fold dilution and concentration with 50 mM Tris, 200 mM NaCl buffer (pH 8.0). Binding assays were performed by titrating the unliganded protein (200 µl of 100 µM solution) with 8× or 10× 5-µl injections of 800 µM iron(iii) solution, which was prepared from iron(iii) chloride and a 2.5-fold molar excess of trisodium citrate (which ensures that the iron(iii) remains soluble) in ultrapure water. To confirm that SAR11_1238 binds iron(iii) rather than the iron(iii)–citrate complex, the protein was also titrated under the same conditions with 800 µM ammonium iron(ii) sulfate; under the aerobic conditions of the assay, iron(ii) is rapidly oxidized to iron(iii)69. UV–vis spectra were recorded 1 min (iron(ii)) or 15 min (iron(iii)) after each injection. Finally, a competitive binding assay with citrate was used to estimate the affinity of SAR11_1238 for iron(iii).
The protein was saturated with a twofold molar excess of iron(iii) solution, diluted to a volume of 1 ml, and then dialysed against 500 ml of 50 mM Tris, 200 mM NaCl buffer (pH 8.0) at 4 °C overnight to remove excess iron(iii) and citrate. The protein was then concentrated to 100 µM and titrated with 5-µl injections of 8 twofold serial dilutions of 500 mM sodium citrate (adjusted to pH 8.0 in 50 mM Tris, 200 mM NaCl buffer). The absorbance at 440 nm was recorded 5 min after each addition. The data were fitted to a hyperbolic curve, yielding an apparent Kd of 9.0 mM for citrate. Given that citrate has a Kd of ~10−17 M for iron(iii), this implies that SAR11_1238 has a Kd for iron(iii) on the order of 10−19 M, similar to previously characterized iron(iii)-binding proteins70,71.
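The order of magnitude of this estimate can be checked with a simple exchange-equilibrium sketch. This is an illustrative consistency check, not necessarily the calculation used here: it assumes a 1:1 iron(iii)–citrate complex, treats the apparent Kd for citrate as the total citrate concentration at which half of the protein-bound iron is displaced, and ignores dilution during the titration. The function name is hypothetical.

```python
import math

def kd_from_exchange(kd_competitor, protein_total, competitor_at_half):
    """Estimate the protein-metal Kd from a competitor displacement midpoint.

    For the exchange PFe + Cit <-> P + FeCit, K_ex = Kd(protein) / Kd(citrate).
    At the midpoint [P] = [PFe], so K_ex = [FeCit] / [Cit]_free.
    """
    metal_on_competitor = protein_total / 2.0            # iron transferred to citrate
    free_competitor = competitor_at_half - metal_on_competitor
    k_exchange = metal_on_competitor / free_competitor
    return k_exchange * kd_competitor

# Values from the text: citrate Kd ~1e-17 M, 100 µM protein, 9.0 mM apparent Kd
kd_estimate = kd_from_exchange(1e-17, 100e-6, 9.0e-3)
log10_kd = math.log10(kd_estimate)
```

Under these assumptions the estimate falls between 10−20 M and 10−19 M, consistent with the order of magnitude stated above.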
X-ray crystallography
For the SAR11_0769/d-glucose and SAR11_1210/l-arginine structures, the proteins were first expressed and purified by nickel affinity chromatography under native conditions as described above. After addition of a 20-fold molar excess of d-glucose (SAR11_0769) or l-arginine (SAR11_1210), the protein was purified further by size-exclusion chromatography on a HiLoad 26/600 Superdex 75 pg column (Cytiva), eluting in 3× crystallization buffer (60 mM HEPES, 150 mM NaCl, pH 7.5). Fractions containing the target protein were collected, and d-glucose (SAR11_0769) or l-arginine (SAR11_1210) was added to a concentration of 30 µM. The protein was concentrated to a volume of ~500 µl, diluted threefold in water to reduce the NaCl concentration to 50 mM, and then concentrated further to 12 mg ml−1. For the SAR11_0769/d-galactose and SAR11_0655/l-pyroglutamate structures, the proteins were expressed and purified in the same way, except that no ligands were added. Protein crystals were obtained using the vapour diffusion method in hanging drops at 20 °C, then cryoprotected and flash-frozen in liquid nitrogen. Crystallization and cryoprotection conditions for each protein are given in Supplementary Methods. X-ray diffraction data were collected on beamline BL32XU at the SPring-8 synchrotron (Harima, Japan), using the ZOO suite for automated data collection72. The data were automatically indexed, integrated, scaled and merged in XDS73 using KAMO74. The structures were solved by molecular replacement in Phaser75 or MOLREP76. For SAR11_1210, the structure of an opine-binding protein from Agrobacterium fabrum (PDB ID 5OT8) was used as a search model; in the remaining cases, an AlphaFold2 model was used77. The structures were then refined by iterative real-space and reciprocal-space refinement in REFMAC78, Phenix79, and COOT80. Data collection and refinement statistics are given in Supplementary Table 10 and Supplementary Table 11. Structures were visualized in PyMOL.
Gas chromatography–mass spectrometry
SBPs purified under native conditions were exchanged into 200 mM ammonium acetate using a PD-10 desalting column (Cytiva) and concentrated to ~1 mM. A 10-nmol aliquot of protein was mixed with 10 µl of 300 µM α-methylglucopyranoside (as an internal control) and 200 µl methanol. The mixture was agitated at 1,500 rpm at 24 °C for 10 min and then centrifuged at 21,000g for 20 min at 4 °C. The supernatant was evaporated to dryness using a vacuum evaporator, redissolved in 20 µl anhydrous pyridine, and derivatized by addition of 30 µl N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) containing 1% trimethylchlorosilane (Supelco) followed by incubation at 70 °C for 1 h. In the case of SAR11_1361, the dried sample was instead dissolved in 20 µl of 20 mg ml−1 methoxyamine hydrochloride in anhydrous pyridine and incubated at 37 °C for 90 min with agitation at 750 rpm before addition of the MSTFA mixture. The derivatized samples were injected immediately onto an Agilent 7890A GC System (Agilent Technologies) equipped with a PAL COMBI-XT autosampler (CTC Analytics) and connected to a PEGASUS 4D GC×GC TOF-MS instrument (LECO) operating in one-dimensional mode. The GC was fitted with a DB-1MS column (Agilent Technologies) with 30 m length, 0.25 mm internal diameter, and 0.25 µm film thickness. The instrument was operated in pulsed split mode with a split ratio of 2 and injection volume of 1 µl. The inlet temperature was 250 °C. Helium was used as the carrier gas with a flow rate of 1 ml min−1. The GC oven temperature was held at 70 °C for 5 min, then raised at 12 °C min−1 to 300 °C, and finally held at 300 °C for 10 min. Mass spectrometry data were collected from 50 to 500 m/z after a 6.5-min solvent delay. The ion source and transfer line temperatures were 250 °C and the ionization energy was 70 eV. Data analysis and spectral database searches against the NIST database were performed using ChromaTOF software (LECO).
Protein-derived samples were analysed before control samples to prevent carryover.
Biogeographical analysis
Biogeographical analysis was performed using the Ocean Gene Atlas v2.0 server33. Abundance data for each SBP gene from Ca. P. ubique HTCC1062 in the Tara Oceans OM-RGC_v2_metaG and OM-RGC_v2_metaT datasets were obtained through a BLAST search with a stringent e-value threshold of 10−30. To avoid inclusion of homologous SBPs with different transport functions, hits with a sequence identity of less than 40% (for ABC SBPs) or 55% (for TRAP and TTT SBPs) compared with the corresponding HTCC1062 SBP were excluded from the analysis.
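The filtering step described above can be sketched as follows. The thresholds are those stated in the text; the record layout (dictionaries with `evalue` and `identity` keys), the function name and the example hits are hypothetical and do not correspond to the Ocean Gene Atlas output format.

```python
# Minimum percentage identity to the HTCC1062 query, by transporter class (from the text)
IDENTITY_CUTOFF = {"ABC": 40.0, "TRAP": 55.0, "TTT": 55.0}
EVALUE_CUTOFF = 1e-30

def filter_hits(hits, sbp_class):
    """Keep hits that pass both the e-value threshold and the class-specific identity cutoff."""
    min_identity = IDENTITY_CUTOFF[sbp_class]
    return [h for h in hits
            if h["evalue"] <= EVALUE_CUTOFF and h["identity"] >= min_identity]

# Hypothetical hits for one query
example_hits = [
    {"gene": "OM-RGC.v2.0001", "evalue": 1e-80, "identity": 72.0},
    {"gene": "OM-RGC.v2.0002", "evalue": 1e-35, "identity": 45.0},
    {"gene": "OM-RGC.v2.0003", "evalue": 1e-12, "identity": 60.0},
]
kept_abc = [h["gene"] for h in filter_hits(example_hits, "ABC")]
kept_trap = [h["gene"] for h in filter_hits(example_hits, "TRAP")]
```

With these example values, the second hit passes the ABC cutoff but not the stricter TRAP/TTT cutoff, and the third is rejected by the e-value threshold in both cases.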
To estimate the total abundance of SBP transcripts, abundance data for each of the 38 PFAM families in CL0177 (PBP; periplasmic binding protein) and CL0144 (Periplas_BP; periplasmic binding protein like), excluding the transferrin family (PF00405) and any families that contain solely enzymes or transcription factors (PF00800, PF01379, PF01634, PF02621, PF03466, PF09084), were obtained using a hmmer search of the OM-RGC_v2_metaT dataset with an e-value threshold of 10−10. Hits were obtained for 26 out of 31 PFAM families. For each PFAM family, the corresponding hidden Markov model (HMM) was obtained from the InterPro database81. The protein sequences from the hmmer search were then aligned to this HMM using hmmalign and used to construct a new HMM using hmmbuild in HMMER 3.4 (http://hmmer.org). A second hmmer search of the OM-RGC_v2_metaT dataset, with a less stringent e-value threshold of 10−5, was then conducted using the resulting HMM. The hits from all 52 searches were combined and redundant hits were removed, resulting in a total of 211,222 unique SBP genes. The two-step search recovered 94% of the 23,879 genes identified as homologues of the Ca. P. ubique HTCC1062 SBPs in the BLAST analysis before application of a sequence identity threshold; the remaining 1,267 genes were also added to the list of SBP genes. Finally, the total abundance of SBP genes at each site was calculated.
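The two-step search can be outlined as a sequence of HMMER commands. The sketch below only assembles the argument lists and does not run anything. It assumes that the searches were run with `hmmsearch` against a local FASTA copy of the dataset (the text does not specify the program or the interface used), that hit sequences from the first round were extracted into a FASTA file by a separate step not shown, and that the file names are placeholders.

```python
def two_step_search_commands(family, seed_hmm, database, workdir="."):
    """Build HMMER command lines for one Pfam family (file names are hypothetical)."""
    round1_tbl = f"{workdir}/{family}.round1.tbl"
    round1_fasta = f"{workdir}/{family}.round1.fasta"   # hit sequences, extracted separately
    round1_aln = f"{workdir}/{family}.round1.sto"
    rebuilt_hmm = f"{workdir}/{family}.rebuilt.hmm"
    round2_tbl = f"{workdir}/{family}.round2.tbl"
    return [
        # Round 1: search with the InterPro/Pfam HMM, e-value threshold 1e-10
        ["hmmsearch", "-E", "1e-10", "--tblout", round1_tbl, seed_hmm, database],
        # Align the round-1 hit sequences to the same HMM
        ["hmmalign", "-o", round1_aln, seed_hmm, round1_fasta],
        # Build a new, dataset-specific HMM from that alignment
        ["hmmbuild", rebuilt_hmm, round1_aln],
        # Round 2: search again with the rebuilt HMM, e-value threshold 1e-5
        ["hmmsearch", "-E", "1e-5", "--tblout", round2_tbl, rebuilt_hmm, database],
    ]

def merge_hits(hit_lists):
    """Combine hit identifiers from several searches and drop redundant entries."""
    merged = set()
    for hits in hit_lists:
        merged.update(hits)
    return merged

# 26 families with hits, two searches each, gives the 52 searches mentioned in the text
families = [f"family_{i:02d}" for i in range(26)]   # placeholder family names
all_commands = [cmd for fam in families
                for cmd in two_step_search_commands(fam, f"{fam}.hmm", "OM-RGC_v2_metaT.fasta")]
n_searches = sum(1 for cmd in all_commands if cmd[0] == "hmmsearch")
```

Rebuilding the HMM from first-round hits tailors the model to sequences in the ocean dataset, which is why a less stringent threshold can be used in the second round.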
To estimate the percentage of SAR11 bacteria at a site containing a given SBP from Ca. P. ubique HTCC1062, we used the recruitment values of 159 SAR11 genomes in the Tara Ocean metagenome dataset calculated by Haro-Moreno et al.34. The presence of a homologue of each SBP in each of the corresponding genomes was determined by BLAST using a 50% sequence identity and 50% coverage threshold. The relative abundance of SAR11 bacteria containing a given SBP homologue was then calculated for each station. Plots were generated using R and GraphPad Prism.
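One way to compute this relative abundance at a single station is to weight each genome by its recruitment value, as sketched below. The weighting scheme, function name and example values are assumptions made for illustration; the exact calculation is not specified beyond the description above.

```python
def percent_sar11_with_sbp(recruitment, has_homologue):
    """Percentage of SAR11 recruitment at one station attributable to genomes carrying the SBP.

    recruitment: {genome_id: recruitment value at this station}
    has_homologue: {genome_id: True if a homologue passed the BLAST thresholds}
    """
    total = sum(recruitment.values())
    if total == 0:
        return 0.0
    carrying = sum(value for genome, value in recruitment.items()
                   if has_homologue.get(genome, False))
    return 100.0 * carrying / total

# Hypothetical station with three SAR11 genomes
station_recruitment = {"genome_A": 10.0, "genome_B": 30.0, "genome_C": 60.0}
sbp_presence = {"genome_A": True, "genome_B": False, "genome_C": True}
percent_example = percent_sar11_with_sbp(station_recruitment, sbp_presence)
```

In this example, genomes A and C together account for 70 of the 100 recruitment units, so 70% of the SAR11 population at the station is estimated to carry the SBP.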
Phylogenetic analysis
Protein sequences homologous to the SBP of interest were identified via a BLAST search of the UniProtKB Reference Proteomes and Swiss-Prot databases82. The resulting sequences were filtered to remove a small number of unusually long sequences (>20% greater than mean length) and aligned in MUSCLE v3.8.3183. The alignment was trimmed in trimAl v1.2 using the automated1 option84 and then used to generate a maximum-likelihood phylogeny in FastTree v2.1.11, using LG + Γ20 as the substitution model85. For each protein sequence in the tree, the fraction of conserved binding site residues, compared with the corresponding protein from Ca. P. ubique HTCC1062, was estimated. The binding site residues were obtained from the crystal structure (SAR11_0769) or estimated from an AlphaFold2 model86,87. For this analysis, the following substitutions were treated as conservative: S/T, I/M, V/L, I/V, L/M, D/E, Q/N, A/V, F/Y, Y/W, F/W. Phylogenetic tree figures were generated using the ggtree package in R88. Figures showing taxonomic distribution (Extended Data Fig. 8b) were generated using Krona89.
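The binding-site conservation score can be sketched as follows, using the list of conservative substitutions given above. The sketch assumes that conservative substitutions are counted as conserved, that the binding-site residues of the two proteins have already been extracted from the alignment as equal-length strings, and that gaps count as non-conserved; the function name and example residues are hypothetical.

```python
# Substitution pairs treated as conservative (from the text), stored as unordered pairs
CONSERVATIVE_PAIRS = {frozenset(pair) for pair in
                      ["ST", "IM", "VL", "IV", "LM", "DE", "QN", "AV", "FY", "YW", "FW"]}

def conserved_fraction(reference_site, query_site):
    """Fraction of binding-site positions that are identical or conservatively substituted."""
    if len(reference_site) != len(query_site):
        raise ValueError("binding-site strings must be the same length")
    conserved = sum(1 for ref, qry in zip(reference_site, query_site)
                    if ref == qry or frozenset((ref, qry)) in CONSERVATIVE_PAIRS)
    return conserved / len(reference_site)

# Hypothetical six-residue binding site: D/E, S/T and F/Y are conservative, R/K is not
fraction_example = conserved_fraction("DSWFRE", "ETWYKE")
```

Here five of the six positions are identical or conservatively substituted, giving a conserved fraction of 5/6.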
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Coordinates and structure factors for each crystal structure have been deposited in the Protein Data Bank under the following accession codes: SAR11_0655–l-pyroglutamate (8WCH), SAR11_0769–d-glucose (8HQQ), SAR11_0769–d-galactose (8KD0), SAR11_1210–l-arginine (8HQR). Experimental protein structures referenced in the text are available from the Protein Data Bank under accession codes 1EU8, 1GLG, 2B3B, 2B3F, 2FVY, 2PFY, 2PFZ, 2Q2A, 3OO6, 3ZKK, 4R2B, 4UAB, 4UA8, 5DVI, 5DVJ, 6WGM. Structural models are available from the AlphaFold Protein Structure Database using the following UniProt accessions: SAR11_0266, Q4FP02; SAR11_0271, Q4FNZ7; SAR11_0655, Q4FMW4; SAR11_0769, Q4FMK2; SAR11_0953, Q4FM26; SAR11_1290, Q4FL44; SAR11_1336, Q4FKZ8; SAR11_1346, Q4FKY9; SAR11_1361, Q4FKX4; SAR11_0271*, A0A1X7H7N5; SAR11_1346*, A0A1X7H1Y9; A2cp1_3084, B8JG16. Raw ITC data, DSF data for Biolog assays, GC–MS data, phylogenetic and biogeographical data, and source data for figures are available via the Open Science Framework (https://doi.org/10.17605/OSF.IO/47TR5).
Code availability
Scripts used for data analysis are available via the Open Science Framework (https://doi.org/10.17605/OSF.IO/47TR5).
References
Morris, R. M. et al. SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420, 806–810 (2002).
Giovannoni, S. J. SAR11 bacteria: the most abundant plankton in the oceans. Ann. Rev. Mar. Sci. 9, 231–255 (2017).
Carini, P., White, A. E., Campbell, E. O. & Giovannoni, S. J. Methane production by phosphate-starved SAR11 chemoheterotrophic marine bacteria. Nat. Commun. 5, 4346 (2014).
Sun, J. et al. The abundant marine bacterium Pelagibacter simultaneously catabolizes dimethylsulfoniopropionate to the gases dimethyl sulfide and methanethiol. Nat. Microbiol. 1, 16065 (2016).
Sowell, S. M. et al. Transport functions dominate the SAR11 metaproteome at low-nutrient extremes in the Sargasso Sea. ISME J. 3, 93–105 (2009).
Sowell, S. M. et al. Proteomic analysis of stationary phase in the marine bacterium “Candidatus Pelagibacter ubique”. Appl. Environ. Microbiol. 74, 4091–4100 (2008).
Schattenhofer, M. et al. Latitudinal distribution of prokaryotic picoplankton populations in the Atlantic Ocean. Environ. Microbiol. 11, 2078–2093 (2009).
Zhao, X. et al. Three-dimensional structure of the ultraoligotrophic marine bacterium “Candidatus Pelagibacter ubique”. Appl. Environ. Microbiol. 83, e02807-16 (2017).
Giovannoni, S. J. et al. Genome streamlining in a cosmopolitan oceanic bacterium. Science 309, 1242–1245 (2005).
Tripp, H. J. et al. Unique glycine-activated riboswitch linked to glycine-serine auxotrophy in SAR11. Environ. Microbiol. 11, 230–238 (2009).
Tripp, H. J. et al. SAR11 marine bacteria require exogenous reduced sulphur for growth. Nature 452, 741–744 (2008).
Malmstrom, R. R., Kiene, R. P., Cottrell, M. T. & Kirchman, D. L. Contribution of SAR11 bacteria to dissolved dimethylsulfoniopropionate and amino acid uptake in the North Atlantic Ocean. Appl. Environ. Microbiol. 70, 4129–4135 (2004).
Malmstrom, R. R., Cottrell, M. T., Elifantz, H. & Kirchman, D. L. Biomass production and assimilation of dissolved organic matter by SAR11 bacteria in the Northwest Atlantic Ocean. Appl. Environ. Microbiol. 71, 2979–2986 (2005).
Clifford, E. L. et al. Taurine is a major carbon and energy source for marine prokaryotes in the North Atlantic Ocean off the Iberian Peninsula. Microb. Ecol. 78, 299–312 (2019).
Alonso, C. & Pernthaler, J. Roseobacter and SAR11 dominate microbial glucose uptake in coastal North Sea waters. Environ. Microbiol. 8, 2022–2030 (2006).
Moran, M. A. et al. Microbial metabolites in the marine carbon cycle. Nat. Microbiol. 7, 508–523 (2022).
Davies, J. S. et al. Selective nutrient transport in bacteria: multicomponent transporter systems reign supreme. Front. Mol. Biosci. 8, 699222 (2021).
Mulligan, C., Fischer, M. & Thomas, G. H. Tripartite ATP-independent periplasmic (TRAP) transporters in bacteria and archaea. FEMS Microbiol. Rev. 35, 68–86 (2011).
Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol. 5, e1000605 (2009).
Schroer, W. F. et al. Functional annotation and importance of marine bacterial transporters of plankton exometabolites. ISME Commun. 3, 37 (2023).
Ford, B. A. et al. Functional characterisation of substrate-binding proteins to address nutrient uptake in marine picocyanobacteria. Biochem. Soc. Trans. 49, 2465–2481 (2021).
Noell, S. E. & Giovannoni, S. J. SAR11 bacteria have a high affinity and multifunctional glycine betaine transporter. Environ. Microbiol. 21, 2559–2575 (2019).
Davidson, A. L., Dassa, E., Orelle, C. & Chen, J. Structure, function, and evolution of bacterial ATP-binding cassette systems. Microbiol. Mol. Biol. Rev. 72, 317–364 (2008).
Vetting, M. W. et al. Experimental strategies for functional annotation and metabolism discovery: targeted screening of solute binding proteins and unbiased panning of metabolomes. Biochemistry 54, 909–931 (2015).
Carter, M. S. et al. Functional assignment of multiple catabolic pathways for D-apiose. Nat. Chem. Biol. 14, 696–705 (2018).
Carini, P. et al. Discovery of a SAR11 growth requirement for thiamin’s pyrimidine precursor and its distribution in the Sargasso Sea. ISME J. 8, 1727–1738 (2014).
Durham, B. P. et al. Sulfonate-based networks between eukaryotic phytoplankton and heterotrophic bacteria in the surface ocean. Nat. Microbiol. 4, 1706–1715 (2019).
Carini, P., Steindler, L., Beszteri, S. & Giovannoni, S. J. Nutrient requirements for growth of the extreme oligotroph “Candidatus Pelagibacter ubique” HTCC1062 on a defined medium. ISME J. 7, 592–602 (2013).
Sun, J. et al. One carbon metabolism in SAR11 pelagic marine bacteria. PLoS ONE 6, e23973 (2011).
Clifton, B. E. et al. Evolution of cyclohexadienyl dehydratase from an ancestral solute-binding protein. Nat. Chem. Biol. 14, 542–547 (2018).
Gao, C. et al. Characterization of the trimethylamine N-oxide transporter from Pelagibacter strain HTCC1062 reveals its oligotrophic niche adaption. Front. Microbiol. 13, 838608 (2022).
Jiao, N. & Zheng, Q. The microbial carbon pump: from genes to ecosystems. Appl. Environ. Microbiol. 77, 7439–7444 (2011).
Vernette, C. et al. The Ocean Gene Atlas v2.0: online exploration of the biogeography and phylogeny of plankton genes. Nucleic Acids Res. 50, W516–W526 (2022).
Haro-Moreno, J. M. et al. Ecogenomics of the SAR11 clade. Environ. Microbiol. 22, 1748–1763 (2020).
Zhao, Z., Amano, C., Reinthaler, T., Orellana, M. V. & Herndl, G. J. Substrate uptake patterns shape niche separation in marine prokaryotic microbiome. Sci. Adv. 10, eadn5143 (2024).
Li, C.-Y. et al. Ubiquitous occurrence of a dimethylsulfoniopropionate ABC transporter in abundant marine bacteria. ISME J. 17, 579–587 (2023).
Curson, A. R. J., Todd, J. D., Sullivan, M. J. & Johnston, A. W. B. Catabolism of dimethylsulphoniopropionate: microorganisms, enzymes and genes. Nat. Rev. Microbiol. 9, 849–859 (2011).
Moore, C. M. et al. Processes and patterns of oceanic nutrient limitation. Nat. Geosci. 6, 701–710 (2013).
Ustick, L. J. et al. Metagenomic analysis reveals global-scale patterns of ocean nutrient limitation. Science 372, 287–291 (2021).
Noell, S. E. et al. SAR11 cells rely on enzyme multifunctionality to metabolize a range of polyamine compounds. mBio 12, e0109121 (2021).
Steindler, L., Schwalbach, M. S., Smith, D. P., Chan, F. & Giovannoni, S. J. Energy starved Candidatus Pelagibacter ubique substitutes light-mediated ATP production for endogenous carbon respiration. PLoS ONE 6, e19725 (2011).
Smith, D. P. et al. Proteomic and transcriptomic analyses of “Candidatus Pelagibacter ubique” describe the first PII-independent response to nitrogen limitation in a free-living Alphaproteobacterium. mBio 4, e00133-12 (2013).
Smith, D. P. et al. Proteome remodeling in response to sulfur limitation in “Candidatus Pelagibacter ubique”. mSystems 1, e00068-16 (2016).
Tripp, H. J. The unique metabolism of SAR11 aquatic bacteria. J. Microbiol. 51, 147–153 (2013).
Suttle, C. A., Chan, A. M. & Fuhrman, J. A. Dissolved free amino acids in the Sargasso Sea: uptake and respiration rates, turnover times, and concentrations. Mar. Ecol. Prog. Ser. 70, 189–199 (1991).
Clifford, E. L. et al. Crustacean zooplankton release copious amounts of dissolved organic matter as taurine in the ocean. Limnol. Oceanogr. 62, 2745–2758 (2017).
Kiene, R. P. & Williams, L. P. H. Glycine betaine uptake, retention, and degradation by microorganisms in seawater. Limnol. Oceanogr. 43, 1592–1603 (1998).
Norris, N., Levine, N. M., Fernandez, V. I. & Stocker, R. Mechanistic model of nutrient uptake explains dichotomy between marine oligotrophic and copiotrophic bacteria. PLoS Comput. Biol. 17, e1009023 (2021).
Kamennaya, N. A., Geraki, K., Scanlan, D. J. & Zubkov, M. V. Accumulation of ambient phosphate into the periplasm of marine bacteria is proton motive force dependent. Nat. Commun. 11, 2642 (2020).
Zubkov, M. V., Martin, A. P., Hartmann, M., Grob, C. & Scanlan, D. J. Dominant oceanic bacteria secure phosphate using a large extracellular buffer. Nat. Commun. 6, 7878 (2015).
Elias, M. et al. The molecular basis of phosphate discrimination in arsenate-rich environments. Nature 491, 134–137 (2012).
Louca, S., Parfrey, L. W. & Doebeli, M. Decoupling function and taxonomy in the global ocean microbiome. Science 353, 1272–1277 (2016).
Dinsdale, E. A. et al. Functional metagenomic profiling of nine biomes. Nature 452, 629–632 (2008).
Azam, F. & Worden, A. Z. Microbes, molecules, and marine ecosystems. Science 303, 1622–1624 (2004).
Worden, A. Z. et al. Rethinking the marine carbon cycle: factoring in the multifarious lifestyles of microbes. Science 347, 1257594 (2015).
Schwalbach, M. S., Tripp, H. J., Steindler, L., Smith, D. P. & Giovannoni, S. J. The presence of the glycolysis operon in SAR11 genomes is positively correlated with ocean productivity. Environ. Microbiol. 12, 490–500 (2010).
Lidbury, I., Murrell, J. C. & Chen, Y. Trimethylamine N-oxide metabolism by abundant marine heterotrophic bacteria. Proc. Natl Acad. Sci. USA 111, 2710–2715 (2014).
Peter, M. F. et al. Structural and mechanistic analysis of a tripartite ATP-independent periplasmic TRAP transporter. Nat. Commun. 13, 4471 (2022).
Elbourne, L. D. H., Tetu, S. G., Hassan, K. A. & Paulsen, I. T. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life. Nucleic Acids Res. 45, D320–D324 (2017).
Armenteros, J. J. A. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423 (2019).
Clifton, B. E. & Jackson, C. J. Ancestral protein reconstruction yields insights into adaptive evolution of binding specificity in solute-binding proteins. Cell Chem. Biol. 23, 236–245 (2016).
McKellar, J. L. O., Minnell, J. J. & Gerth, M. L. A high-throughput screen for ligand binding reveals the specificities of three amino acid chemoreceptors from Pseudomonas syringae pv. actinidiae. Mol. Microbiol. 96, 694–707 (2015).
Ehrhardt, M. K. G., Warring, S. L. & Gerth, M. L. Screening chemoreceptor-ligand interactions by high-throughput thermal-shift assays. Methods Mol. Biol. 1729, 281–290 (2018).
Fernández, M. et al. High-throughput screening to identify chemoreceptor ligands. Methods Mol. Biol. 1729, 291–301 (2018).
Chen, X. et al. Metabolism of chiral sulfonate compound 2,3-dihydroxypropane-1-sulfonate (DHPS) by Roseobacter bacteria in marine environment. Environ. Int. 157, 106829 (2021).
Pilson, M. E. Q. An Introduction to the Chemistry of the Sea (Cambridge Univ. Press, 2013).
Duvvuri, H., Wheeler, L. C. & Harms, M. J. pytc: open-source Python software for global analyses of isothermal titration calorimetry data. Biochemistry 57, 2578–2583 (2018).
Velazquez-Campoy, A. & Freire, E. Isothermal titration calorimetry to determine association constants for high-affinity ligands. Nat. Protoc. 1, 186–191 (2006).
Badarau, A. et al. FutA2 is a ferric binding protein from Synechocystis PCC 6803. J. Biol. Chem. 283, 12520–12527 (2008).
Koropatkin, N., Randich, A. M., Bhattacharyya-Pakrasi, M., Pakrasi, H. B. & Smith, T. J. The structure of the iron-binding protein, FutA1, from Synechocystis 6803. J. Biol. Chem. 282, 27468–27477 (2007).
Chen, C. Y., Berish, S. A., Morse, S. A. & Mietzner, T. A. The ferric iron-binding protein of pathogenic Neisseria spp. functions as a periplasmic transport protein in iron acquisition from human transferrin. Mol. Microbiol. 10, 311–318 (1993).
Hirata, K. et al. ZOO: an automatic data-collection system for high-throughput structure analysis in protein microcrystallography. Acta Crystallogr. D 75, 138–150 (2019).
Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010).
Yamashita, K., Hirata, K. & Yamamoto, M. KAMO: towards automated data processing for microcrystals. Acta Crystallogr. D 74, 441–449 (2018).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Vagin, A. & Teplyakov, A. molrep: an automated program for molecular replacement. J. Appl. Crystallogr. 30, 1022–1025 (1997).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D 53, 240–255 (1997).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D 75, 861–877 (2019).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004).
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Res. 51, D418–D427 (2023).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2016).
Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a Web browser. BMC Bioinf. 12, 385 (2011).
Niehaus, T. D., Elbadawi-Sidhu, M., de Crécy-Lagard, V., Fiehn, O. & Hanson, A. D. Discovery of a widespread prokaryotic 5-oxoprolinase that was hiding in plain sight. J. Biol. Chem. 292, 16360–16367 (2017).
Knorr, S. et al. Widespread bacterial lysine degradation proceeding via glutarate and l-2-hydroxyglutarate. Nat. Commun. 9, 5071 (2018).
Acknowledgements
B.E.C. was supported by a JSPS Postdoctoral Fellowship for Overseas Researchers from the Japan Society for the Promotion of Science. P.L. gratefully acknowledges funding from the Okinawa Institute of Science and Technology. U.A. gratefully acknowledges funding from Alon Scholarships (The Council for Higher Education, Israel). The synchrotron radiation experiments were performed at BL32XU of SPring-8 with the approval of the Japan Synchrotron Radiation Research Institute (JASRI) (proposal no. 2023A2731). We thank A. Vardi for critical reading of the initial manuscript; D. Kozome for assistance with X-ray data collection; P. Jain for assistance with synthesis and NMR; Y. Iinuma and O. Smith for assistance with GC–MS; and the Instrumental Analysis Section and Sequencing Section at OIST for providing instrument access and training.
Author information
Contributions
B.E.C. and P.L. conceived the study. B.E.C. and G.-I.U. performed cloning, protein expression and purification. B.E.C. performed the remaining experimental and computational work; B.E.C., U.A., C.J.J. and P.L. analysed the data and wrote the paper. P.L. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Gavin Thomas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Distribution of Ca. P. ubique HTCC1062 SBPs among SAR11 bacteria.
(a) SBP sequences belonging to the SBP PFAM clans CL0177 and CL0144 from candidate order ‘Pelagibacterales’ were obtained from the InterPro database and clustered at 35% sequence identity. The graph shows the number of sequences in each SBP cluster containing ≥8 sequences. Clusters that are represented in Ca. P. ubique HTCC1062 are shown in green. (b) Distribution of homologs of SBPs from Ca. P. ubique HTCC1062 in SAR11 strains with complete genome sequences. Homologs sharing <50% sequence identity with the corresponding protein in Ca. P. ubique HTCC1062 are not shown. Ecotype classifications are taken from ref. 34, and descriptions are taken verbatim from ref. 2.
Extended Data Fig. 2 The SBPs SAR11_1179 and SAR11_1238 are phosphate- and iron(III)-binding proteins.
(a) Thermal denaturation of SAR11_1179 in the absence (black) or presence (red) of 1 mM Na2HPO4 measured by DSF (ΔTM = 10.9 ± 0.4 °C, mean ± s.d., n = 3 technical replicates). (b-c) Representative ITC data for titration of 23 µM SAR11_1179 with 185 µM Na2HPO4 in the absence (b) or presence (c) of 28 mM Na2SO4. Fitting the data to the one-set-of-sites model gave a Kd of 133 ± 28 nM in the absence of sulfate and 892 ± 122 nM in the presence of 28 mM sulfate (mean ± s.d., n = 3 or 4 replicate titrations; significantly different by two-tailed t-test on log10 Kd values; P = 3.03 × 10⁻⁵, t = 14.28, df = 5, difference between means = 0.831, 95% confidence interval = 0.681 to 0.981). (d) UV-visible spectrum of SAR11_1238 purified from E. coli, without addition of ligand, showing presence of endogenously bound iron(III). (e-f) UV-visible spectra of unliganded SAR11_1238 titrated with iron(III) delivered as (e) iron(III) citrate or (f) ammonium iron(II) sulfate (see Methods section for further explanation). (g-h) Titration of SAR11_1238 with iron(III) delivered as (g) iron(III) citrate or (h) ammonium iron(II) sulfate, monitored by absorbance at 440 nm. Discrete data points from four (g) or two (h) technical replicates (independent titrations) are shown. The line represents a fit to the linear portion of the titration. (i) Competitive titration of iron(III)-bound SAR11_1238 with citrate, monitored by absorbance at 440 nm. Results from two technical replicates (independent titrations) are shown as discrete data points.
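The t-test statistics quoted in this caption can be cross-checked against one another. The following is a minimal sketch, not the authors' analysis: it assumes that the reported confidence interval is the usual interval for a difference between means (difference ± critical t × standard error) and takes the two-sided 95% critical value of Student's t for 5 degrees of freedom (about 2.571) from standard tables. The function and variable names are illustrative only.

```python
# Values quoted in the caption for the two-tailed t-test on log10(Kd).
DIFF_LOG10_KD = 0.831  # difference between mean log10 Kd values
T_STATISTIC = 14.28    # reported t statistic
# Assumption: two-sided 95% critical value of Student's t for df = 5,
# taken from standard tables rather than computed here.
T_CRIT_95_DF5 = 2.571


def summarize(diff, t_stat, t_crit):
    """Recover the standard error, the 95% CI and the implied fold change."""
    standard_error = diff / t_stat        # since t = diff / SE
    half_width = t_crit * standard_error  # CI half-width
    return {
        "standard_error": standard_error,
        "ci_low": diff - half_width,
        "ci_high": diff + half_width,
        # A difference in log10 units corresponds to a ratio of
        # geometric-mean Kd values.
        "fold_change": 10 ** diff,
    }


result = summarize(DIFF_LOG10_KD, T_STATISTIC, T_CRIT_95_DF5)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the recovered interval agrees with the reported 0.681 to 0.981, and the difference of 0.831 log10 units corresponds to a roughly 6.8-fold weaker affinity for phosphate in the presence of 28 mM sulfate, close to the ratio of the arithmetic mean Kd values (892 nM / 133 nM ≈ 6.7).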
Extended Data Fig. 3 Identification of co-purified SBP ligands by GC-MS and X-ray crystallography.
For each protein, the left-hand graph shows an extracted ion chromatogram of the protein sample (i.e., the co-purified ligand extracted from the protein and subjected to trimethylsilyl derivatization) (top) and the corresponding chemical standard (bottom, reflected). The right-hand graph shows the mass spectrum of the major chromatographic peak for the protein sample (top) and the corresponding standard (bottom, reflected). i.s., internal standard (α-methylglucoside). For SAR11_1361, due to the lack of a distinctive ion in the mass spectrum of bis(trimethylsilyl) succinate, the presence of succinate is illustrated by comparison with a blank run (reagent only) shown in orange. The three additional peaks in the protein sample correspond to glycerol (from protein purification), succinate, and α-methylglucoside (internal standard), in that order. For SAR11_0655 (a) and SAR11_0769 (b), further evidence for the identity of the co-purified ligand was attained from the crystal structure of the protein obtained without addition of exogenous ligand. Electron density for the co-purified ligand is shown by mFo - dFc omit maps contoured at various levels (in the case of SAR11_0655, to distinguish between carbon atoms and oxygen/nitrogen atoms).
Extended Data Fig. 4 Non-canonical binding mode of l-arginine to SAR11_1210.
(a-c) Crystal structure of SAR11_1210 complexed with l-arginine (1.32 Å). The large domain, small domain, and hinge regions are shown in green, orange, and purple, respectively. (a) Overall structure. (b) Electron density for the l-arginine molecule, shown by an mFo - dFc omit map contoured at +3σ. (c-d) Comparison of binding modes of l-arginine to (c) SAR11_1210 and (d) a homologous lysine-/arginine-/ornithine-binding protein from Geobacillus stearothermophilus (Kd 39 nM, PDB ID 2Q2A), which shows the amino acid binding motif typical of this SBP family. Residues are numbered according to the homologous positions in SAR11_1210. (e-g) Effect of E108A substitution on binding affinity of SAR11_1210 for l-arginine. (e, f) Competitive ITC titration of an equimolar mixture of SAR11_1210 and SAR11_1210(E108A) with l-arginine. (g) Thermodynamic parameters for interaction of l-arginine with the WT and E108A variants of SAR11_1210. Bars represent mean of technical replicates (separate titrations), shown as individual data points (WT, n = 5; E108A, n = 3).
Extended Data Fig. 5 Binding mode of β-d-glucose to SAR11_0769.
(a-b) Representative ITC data for titration of SAR11_0769 with d-glucose, fitted to (a) the two-sets-of-sites binding model, or (b) a competitive binding model accounting for the two anomeric forms of d-glucose. (c-e) Crystal structure of SAR11_0769 complexed with β-d-glucose (1.86 Å). (c) Overall structure. (d) Electron density for the β-d-glucose molecule and neighboring water molecules, shown by an mFo - dFc omit map contoured at +3σ. The density for the anomeric hydroxyl group is clearly resolved. (e) Binding mode of β-d-glucose.
Extended Data Fig. 6 Abundance of SBP genes from SAR11 (Ca. P. ubique HTCC1062) in the global ocean metagenome.
(a) The abundance of each gene in surface samples from the Tara Oceans OM-RGCv2+G metagenomic dataset is shown. Data were obtained from the Ocean Gene Atlas v2.0 using an e-value cut-off of 10⁻³⁰ and filtered using a sequence identity threshold of 40% for ABC SBPs and 55% for TRAP and TTT SBPs. Abundance at each location is expressed as the fraction of mapped reads and represented by point area on a linear scale. (b) Box-and-whisker plots comparing abundance of each SBP in epipelagic/surface (SRF, n = 83, solid color), deep chlorophyll maximum (DCM, n = 53, stripes), and mesopelagic (MES, n = 38, checkerboard) samples from the OM-RGCv2+G metagenome dataset (center line, median; box limits, 25th and 75th percentiles; whiskers, maximum and minimum). Statistical comparisons are given in Supplementary Table 12.
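The two-stage filtering described in this caption (an e-value cut-off followed by a transporter-class-specific identity threshold) can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the `Hit` record layout, the example values and the inclusive comparisons (≤ for e-value, ≥ for identity) are hypothetical, and only the numerical thresholds are taken from the caption.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds quoted in the caption.
EVALUE_CUTOFF = 1e-30
MIN_IDENTITY = {"ABC": 40.0, "TRAP": 55.0, "TTT": 55.0}  # percent


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one homolog retrieved from a gene catalog."""
    gene_id: str
    sbp_type: str           # "ABC", "TRAP" or "TTT"
    evalue: float
    identity: float         # percent identity to the query SBP
    mapped_fraction: float  # fraction of mapped reads in one sample


def passes_filters(hit: Hit) -> bool:
    """Apply the e-value cut-off, then the class-specific identity threshold."""
    # Assumption: both comparisons are inclusive.
    return (hit.evalue <= EVALUE_CUTOFF
            and hit.identity >= MIN_IDENTITY[hit.sbp_type])


def summed_abundance(hits: Iterable[Hit]) -> float:
    """Sum the mapped-read fractions of all hits that pass both filters."""
    return sum(h.mapped_fraction for h in hits if passes_filters(h))


# Made-up example hits, for illustration only.
example_hits = [
    Hit("hit_1", "ABC", 1e-45, 62.0, 2e-5),   # kept
    Hit("hit_2", "ABC", 1e-12, 71.0, 4e-5),   # rejected: e-value too large
    Hit("hit_3", "TRAP", 1e-60, 48.0, 3e-5),  # rejected: identity below 55%
    Hit("hit_4", "TTT", 1e-33, 57.5, 1e-5),   # kept
]
kept_ids = [h.gene_id for h in example_hits if passes_filters(h)]
print(kept_ids, summed_abundance(example_hits))
```

Keeping the identity thresholds in a lookup keyed by transporter class makes the stricter requirement for TRAP and TTT SBPs explicit and easy to adjust.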
Extended Data Fig. 7 Abundance of SBP genes from SAR11 (Ca. P. ubique HTCC1062) in the global ocean metatranscriptome.
(a) The abundance of each gene in surface samples from the Tara Oceans OM-RGCv2+T metatranscriptomic dataset is shown. Data were obtained from the Ocean Gene Atlas v2.0 using an e-value cut-off of 10⁻³⁰ and filtered using a sequence identity threshold of 40% for ABC SBPs and 55% for TRAP and TTT SBPs. Abundance at each location is expressed as the fraction of mapped reads and represented by point area on a linear scale. (b) Box-and-whisker plots comparing abundance of each SBP in epipelagic/surface (SRF, n = 103, solid color), deep chlorophyll maximum (DCM, n = 49, stripes), and mesopelagic (MES, n = 26, checkerboard) samples from the OM-RGCv2+T metatranscriptomic dataset (center line, median; box limits, 25th and 75th percentiles; whiskers, maximum and minimum). Statistical comparisons are given in Supplementary Table 12.
Extended Data Fig. 8 Phylogenetic analysis of selected SBPs from Ca. P. ubique HTCC1062.
(a) Maximum-likelihood phylogenies of 1000 homologs of SAR11_0655, SAR11_0953, and SAR11_1336 from the UniProtKB Reference Proteomes and Swiss-Prot databases. The positions of the SAR11 SBPs are indicated by an arrow. Nodes are colored by taxonomy and the fraction of binding site residues conserved relative to the corresponding protein in Ca. P. ubique HTCC1062. (b) Taxonomic distribution of sequences displayed in (a). In the case of SAR11_0655, only sequences belonging to the clade indicated in (a) were considered.
Extended Data Fig. 9 Genome context of SAR11_0655 and SAR11_1361 suggests additional metabolic capabilities of Ca. P. ubique HTCC1062.
ABC transporter genes (including SBP genes) are shown in orange, while genes putatively involved in metabolism of the transported substrates are shown in red. The genomic regions shown are bounded by non-coding regions of ≥75 bp. (a) Genome context of SAR11_0655 (9,248 bp). SAR11_0662–SAR11_0664 are homologous to the pxpABC genes from E. coli (sequence identity 30.0% overall). pxpABC encodes 5-oxoprolinase, which catalyzes ATP-dependent hydrolysis of l-pyroglutamate to l-glutamate (ref. 90). (b) Genome context of SAR11_1346 and SAR11_1361 (14,635 bp). SAR11_1354 shows high sequence identity (41.5%) to csiD from E. coli encoding glutarate 2-hydroxylase, which converts glutarate to l-2-hydroxyglutarate and is involved in catabolism of l-lysine to α-ketoglutarate (ref. 91). Although the operon in Ca. P. ubique lacks the remainder of the l-lysine catabolic pathway, it does contain a FAD-dependent oxidoreductase of unknown function (SAR11_1353), which may convert l-2-hydroxyglutarate to α-ketoglutarate by analogy with the l-lysine catabolic pathway. Abbreviations: αKG, α-ketoglutarate. (c) Experimental confirmation of in vitro glutarate 2-hydroxylase activity of SAR11_1354 by 1H-NMR (500 MHz, D2O). Complete conversion of α-ketoglutarate and glutarate to succinate and 2-hydroxyglutarate was observed. Reaction conditions: 1 mM glutarate, 1 mM α-ketoglutarate, 100 µM Fe(NH4)2(SO4)2, 5 µM SAR11_1354 in 20 mM ammonium acetate buffer, 16 h incubation at 24 °C. Right, reaction mixture; left, no enzyme control. The asterisked peak corresponds to acetate. Functional validation of SAR11_0662-0664 and SAR11_1353 was also attempted, but SAR11_0662 and SAR11_1353 could not be expressed in soluble form in E. coli.
Extended Data Fig. 10 Contrasting phylogenetic distributions of SAR11_0769 and SAR11_1361.
(a) Maximum-likelihood phylogenies of 500 homologs of SAR11_0769 and SAR11_1361 from the UniProtKB Reference Proteomes and Swiss-Prot databases. The positions of SAR11_0769 and SAR11_1361 are indicated by an arrow. Nodes are colored by taxonomy and the fraction of binding site residues conserved relative to the corresponding protein in Ca. P. ubique HTCC1062. SAR11_0769 is widely distributed among bacteria, while SAR11_1361 appears to be limited mainly to SAR11 bacteria and a small range of other marine Alphaproteobacteria. (b) Expanded view of a clade of the SAR11_1361 phylogeny (indicated in a) showing protein sequences with a similar binding site to SAR11_1361 (suggesting a similar function).
Supplementary information
Supplementary Information
Supplementary Methods; Supplementary Notes 1–10; Supplementary Figs. 1–12; Supplementary Tables 1, 2, 4–8 and 10–13 and references.
Supplementary Data 1
Data from individual ITC experiments.
Supplementary Data 2
Binding affinities of previously reported SBPs.
Supplementary Table 3
Full list of ligands used for high-throughput screening of SBP function.
Supplementary Table 9
Sequences of oligonucleotides and synthetic genes used in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Clifton, B.E., Alcolombri, U., Uechi, GI. et al. The ultra-high affinity transport proteins of ubiquitous marine bacteria. Nature (2024). https://doi.org/10.1038/s41586-024-07924-w
DOI: https://doi.org/10.1038/s41586-024-07924-w