Molecular simulations unravel the molecular principles that mediate selective permeability of carboxysome shell protein

Bacterial microcompartments (BMCs) are nanoscale proteinaceous organelles that encapsulate enzymes from the cytoplasm using an icosahedral protein shell that resembles viral capsids. Of particular interest are the carboxysomes (CBs), which sequester the CO2-fixing enzymes ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) to enhance carbon assimilation. The carboxysome shell serves as a semi-permeable barrier for passage of metabolites in and out of the carboxysome to enhance CO2 fixation. How the protein shell directs influx and efflux of molecules in an effective manner has remained elusive. Here we use molecular dynamics and umbrella sampling calculations to determine the free-energy profiles of the metabolic substrates, bicarbonate, CO2 and ribulose bisphosphate and the product 3-phosphoglycerate associated with their transition through the major carboxysome shell protein CcmK2. We elucidate the electrostatic charge-based permeability and key amino acid residues of CcmK2 functioning in mediating molecular transit through the central pore. Conformational changes of the loops forming the central pore may also be required for transit of specific metabolites. The importance of these in-silico findings is validated experimentally by site-directed mutagenesis of the key CcmK2 residue Serine 39. This study provides insight into the mechanism that mediates molecular transport through the shells of carboxysomes, applicable to other BMCs. It also offers a predictive approach to investigate and manipulate the shell permeability, with the intent of engineering BMC-based metabolic modules for new functions in synthetic biology.

) and ribulose-1,5-bisphosphate (RuBP) into the carboxysome. Then carbonic anhydrase (CA) in the carboxysome lumen dehydrates HCO 3 − to CO 2 and provides high levels of CO 2 close to Rubisco. Rubisco catalyzes the carboxylation of RuBP by adding CO 2 to generate 3-phosphoglycerate (3-PGA), which is transported across the shell and is metabolized via the Calvin-Benson-Bassham cycle. C, Structural representation of the major carboxysome shell building component, CcmK2 (PDB: 2A1B). The CcmK2 complex presents a sixfold symmetry, involving a central pore that mediates metabolite flow. CcmK2 has concave and convex faces; the C-terminal tails of CcmK2, highlighted in purple, are located at the concave side. was the first BMC protein that has been structurally determined 22 . The protein subunits of CcmK2 are arranged in hexameric units with a six-fold symmetry (Fig. 1C). The central pore is ~ 7 Å in diameter, surrounded by six copies of the backbone amide nitrogen atom of Serine 39 (Ser39). The predominant protein surface of CcmK2 is negatively charged, whereas the central pore of CcmK2 has a strong positive electrostatic potential, ascribed to the positively charged residues Arginine 11 (Arg11) and Lysine 36 (Lys36) that are 10 Å away from the narrowest point of the pore in each subunit and solvent exposed to the lumen and cytoplasm, respectively 22 (Supplementary Figures 1,2). The CcmK2 C-terminal tail contains a 5-residue helix on the concave side and has been deduced to play roles in BMC domain assembly and protein-protein interaction 22 (Fig. 2A, B). We performed MD simulations of the full-length CcmK2 from Syn6803 (CcmK2-full, PDB: 2A1B) 22 , the C-terminal truncated CcmK2 from Syn6803 (CcmK2-∆C, PDB: 3CIM) 27 , and the full-length CcmK2 from Synechococcus elongatus PCC 7942 (Syn7942) (PDB: 4OX7) in explicit water for 200 ns. The results revealed that the C-terminal tails of CcmK2 exhibited structural flexibility during simulations (Fig. 2C). These C-terminal residues have a lower resolution and display a higher degree of flexibility than the core residues in the crystal structures, resulting in the disorder of crystal forms 27,28 . A similar fashion of protein contacts has been illustrated in virus capsid assembly: interactions through flexible termini act as conformational switches that mediate virus capsid formation 29,30 .
We conducted root mean square fluctuation (RMSF) analysis to measure the per residue fluctuation around their average positions. High values of RMSFs are seen at the C-termini (Fig. 2C), as a consequence of the high C-terminal flexibility during the course of simulations. Other regions of increased RMSF peaks correspond to regions of high flexibility such as loops and turns in CcmK2. By contrast, no significant difference in the RMSF profiles of the core regions of CcmK2 was recorded, indicative of the stable conformation of the core. To diminish the system complexity, we chose the experimentally obtained CcmK2-∆C structure (PDB: 3CIM) 27 , with a high resolution of 1.3 Å, in the following simulations to explore the molecular transport through the central  www.nature.com/scientificreports/ pore, instead of creating in silico a C-terminal deletion structure based on the full-length CcmK2. Since the MD simulation lengths cannot sample the position of the flexible termini exhaustively to find a stable conformation relative to the whole protein, using the CcmK2-∆C structure eliminates this uncertainty.
Molecular transport of the pore is mediated by electrostatic charge. The pore of CcmK2 has been proposed to be the route by which metabolites pass through the shell in either direction 22 . We conducted US simulations on CcmK2-∆C (PDB: 3CIM) in the solvent containing HCO 3 − , 3-PGA, O 2 and CO 2 , respectively. We determined the free-energy (ΔG) profiles associated with the transition of each metabolite initiating simulations in both directions, along the axis (z) relative to the center of mass of the CcmK2 hexamer (x, y, z) = (0, 0, 0) (Fig. 3A). To assess the US convergence, we divided each window into 5 equal-length trajectories to calculate the error bars, which did not display significant systematic changes (Supplementary Figure 4). We also analyzed the autocorrelation functions at 3 different US windows (Supplementary Figure 5) and obtained around 2 ns for the autocorrelation times, shorter than the US simulation lengths.
As shown in Fig. 3B, the local free energy differences are calculated relative to bulk solvent (at z = − 25 Å, + 25 Å). It is evident that the four metabolites experience three shared local free-energy minima/ maxima, representing their specific interactions with the key amino acid residues (Fig. 3). O 2 and CO 2 have the same local free-energy minima localized around the side chains of Arg11 (z = ~ − 11 Å) and Lys36 (z = ~ 5 Å) as well as a free energy barrier, ΔG ‡ of 3.8 kcal mol −1 and 3.5 kcal mol −1 respectively, in proximity to Ser39 (z = ~ − 6 Å) (Fig. 3B). This indicates that both the CO 2 and O 2 have to pass a small free energy barrier to transition through the pore from one side of Ser39 to the other.
The free-energy minima of HCO 3 − are around the side chains of residues near the center of the pore, which is formed by the K-I-G-S motif 31 ; Ser39 where z is ~ − 5 Å, Arg11 where z is ~ − 12 Å, and Lys36 where z is ~ 6 Å. The lowest of the energy wells (k T ) occurs at the Ser39 bottle neck, with a ΔG of ~ − 0.6 kcal mol −1 . It is the same for 3-PGA except that there is no minimum at Arg11. These free-energy minima correlate with the binding pockets of CcmK2 for metabolites in 3D space ( Fig. 3C-F), indicating the importance of residues Ser39, Lys36 and Arg11 in mediating metabolite flux through the pore. Consistently, recent studies on 1,2-propanediol utilization (1,2-PDU) microcompartments have also verified the role of the equivalent Ser40 of the shell hexamer PduA in channeling metabolite traffic 32,33 . Given that the Ser39, Lys36 and Arg11 residues are conserved among Additionally, we calculated the free energy of K + and Clusing metabolite-free US simulations. The cation K + displays a maximum between z = ~ − 16 to 5, with the highest ΔG of ~ 7.6 kcal mol −1 occurring around Ser39 z = ~ − 4. This suggests that it is unfavorable for cations to transition through the CcmK2 pore. In contrast, the small anion Cl − has the smoothest energy profile and is most similar to that of HCO 3 − . Overall, it experiences much smaller barrier heights for a favorable transition (Fig. 3B). The most favorable free energy occurs when 3-PGA transits around Ser39 with a ΔG of ~ -7.5 kcal mol −1 (Fig. 3B). This could be rationalized by the tendency for hydrogen bonds to be formed between the carboxyl group of 3-PGA and the hydroxyl group of Ser39. The local minimum of 3-PGA on the convex side is higher than around Lys36 on the concave face of CcmK2. In contrast, the local minimum of HCO 3 − around Arg11 (z = − 12 Å) on the convex face of CcmK2 is approximately equal to the minimum around Lys36 (z = 6 Å) on the concave side. There is also a shared local maximum centered at z = 0-2 Å between the minima around Lys36 and the minima at Ser39.
These results suggest that energetically favorable interactions around and in the center of the pore increase the likelihood of transit and could drive transit of HCO 3 − and 3-PGA in both directions. It is possible that 3-PGA preferentially transits from the convex to the concave face. The energy profile suggests more favorable interactions occur on the concave side of Ser39 (Fig. 3B). While HCO 3 − US does not indicate a preference, it is likely in the cellular environment HCO 3 − would preferentially transit from the concave to the convex face with a concentration gradient. This is in agreement with the shell protein orientation characterized in the intact BMC shells 34 and the PduA nanotubes 35 , but distinct from the proposed orientation in recent simulations of the α-carboxysome shell hexamer CsoS1A 36 . The pore, and the binding pockets at the key residues on either side, are accessible to water. Solvent effects, such as the likes of salinity, merit further investigation ( We extended the thermodynamic analysis above to the study of the dynamic characteristics associated with the transition of O 2 , CO 2 , and HCO 3 − molecules through the pore. We analyzed the free energy profiles obtained in Fig. 3B, within the reaction rate theory framework. Considering a constant prefactor (pre-exponential coefficient) in the Arrhenius/Kramers equation and a barrierless transition process for HCO 3 − , we measured the ratio of the rates associated with the transition of O 2 (respectively CO 2 ) and HCO 3 − compounds through the pore. This yields k O2 /k HCO3− = ~ 3.5 × 10 −4 and k CO2 /k HCO3− = ~ 6.6 × 10 −4 , which shows significant differences (3 orders of magnitude) in terms of the characteristic times for the transition through the pore.
Conformational change of the central pore during metabolite transit. To examine whether potential conformational changes of the CcmK2 pore are associated with transit of metabolites, we measured the pore diameter in the presence and absence of contacts between individual metabolites and the residues Arg11, Ser39 and Lys36 during the time course of simulations ( Fig. 5), using HOLE 37 to analyze 2500 equidistant frames of a 50 ns MD trajectory independently (see Materials and methods). The pore size of CcmK2-∆C in the crystal structure (PDB: 3CIM) was 2.55 Å determined using HOLE, and it varied between 1 and 6 Å during the course of simulations (Fig. 5). There were no significant differences found in the pore diameter in the absence and presence of contacts between CO 2 or O 2 and the three residues. It is worth noting that the sample size of contacts between these metabolites and the select residues is small, likely because the substrates with relatively small size have more space to move freely in the pore area without making contacts. On the contrary, a significant difference in the pore size was discerned for HCO 3 − between no contacts and contacts with Lys36 ( Fig. 5, Supplementary Table 2). This reveals that transit of HCO 3 − may first require binding to Lys36 on the concave side of the protein to open up the pore. Significant changes in the pore diameter were also detected in the presence and absence of contacts between 3-PGA and all three of the selected residues of interest, indicating that the passage of 3-PGA through the pore may require a conformational change to increase the pore diameter (Fig. 5). Interactions formed by key charged residues. Lys36 (left) and Arg11 (right) shown as van de Waals spheres at the concave and convex sides of the pore respectively. Interacting conserved Glu71 (turquoise) and Glu35 (green) residues, as well as nearby residues within 4 Å (Tyr34 and Arg41, Leu73) and water molecules are shown as sticks highlighting solvent accessibility.  3B) and may induce a conformational change of the CcmK2 pore during transit (Fig. 5). Likewise, RuBP also has a large size (~ 3-4 Å wide, ~ 9.4 Å long) with respect to the pore diameter (Fig. 6A). We calculated the acidic dissociation constants based on the Hammett and Taft equations using the Epik method (black) and empirically calculated physico-chemical parameters obtained from ionization site-specific regression equations (red) annotated on Fig. 6A. The pKa calculations exhibited consistency regardless of the pKa calculation methods, suggesting the most likely protonation state of RuBP and 3-PGA molecules at ~ pH7.
To estimate their interactions with the CcmK2 pore, we conducted molecular docking simulations of 3-PGA and RuBP with CcmK2-∆C structure using SwissDock 38 and Glide extra precision (XP) docking 39 . Snapshots of the most stable binding poses for RuBP (Fig. 6B) and 3-PGA (Fig. 6C) from both the concave and convex faces illustrate the negative relative free energy changes experienced by RuBP and 3-PGA when in the final docking pose in the CcmK2 pore: ΔG is -9.96 kcal•mol −1 for RuBP (Fig. 6D) and − 9.98 kcal•mol −1 for 3-PGA (Fig. 6E). The FullFitness score for RuBP (− 3383 kcal mol −1 ) in this pose is similar to 3-PGA (− 3275 kcal mol −1 ) (Supplementary Figure 6). These results verify a highly stable binding pocket for 3-PGA and RuBP in the narrowest point of the CcmK2 pore when the phosphate groups of these metabolites interact with the Ser39 hydroxyl groups. All three of the phosphate oxygen groups of both RuBP and 3-PGA can be stabilized by hydrogen bonding to Ser39. This binding pocket also contributes to a local free-energy minimum for 3-PGA, as observed in US calculations (z = ~ − 5 Å) (Fig. 3). This emphasizes a good correlation between our molecular docking and US simulations results, despite the known lower reliability of docking simulations compared to full MD simulations. These binding poses of 3-PGA are directly compared between US and docking in Fig. 7. Interestingly, despite the size Figure 5. Pore diameter of CcmK2-∆C in the presence and absence of contacts with metabolites during simulations. Violin plots of the pore diameters of CcmK2 determined by HOLE from the full MD simulations trajectories for each metabolite, depicting the conformational dynamics of the pore during metabolite transition. The two categories "contact" and "no contact" refer to the presence or absence, respectively, of the metabolite within the cutoff distance of 3.5 Å and angle of 30° of the reside. p values were calculated using the Wilcoxon rank-sum test (Supplementary Table 2). These diameter measurements were made at the region corresponding to the minimum constriction along the pore axis z. www.nature.com/scientificreports/ difference between 3-PGA and RuBP, the free energy detected when docking RuBP and 3-PGA to CcmK2 is relatively similar. This might suggest that there are relatively strong interactions between the binding pocket and RuBP or 3-PGA. This binding pocket is alike for both molecules, but accessibility is equally challenging due to the small diameter of the pore and large sizes of the metabolites. As the binding poses observed are solvent-accessible and make use of the charged phosphate groups of the metabolites bound in the polar center of the pore, we do not expect significant changes in their protonation states upon binding. Nevertheless, a significant change in pH that affects the protonation states of these metabolites might result in the changes in binding energies and poses.
Effects of pore mutation on in vivo physiology. Our simulation results suggest that the Ser39 residue in CcmK2 is a key residue in the charged-based mechanism that mediates the flux of metabolites into and out of the carboxysome. To corroborate the in-silico findings, we investigated the significance of Ser39 on www.nature.com/scientificreports/ carboxysome activities and cell physiology by constructing a CcmK2-S39A point mutation. The carboxysome of Syn6803 in addition to CcmK2 contains another major shell protein CcmK1 21 , whereas in Syn7942, CcmK2 proteins act as the only major shell components in the carboxysome shell, as shown by a previous proteomic study 40 . The CcmK2 proteins of Syn7942 and Syn6803 present a high sequence similarity and the Ser39 residues are highly conserved (Supplementary Figure 3). To prevent the complementary effects of CcmK1, whose role is still unclear, we generated the CcmK2-S39A mutant in Syn7942 and examined cell growth and carbon fixation activities of the Syn7942 mutants including knockouts of the other shell proteins.
We compared the growth of CcmK2-S39A in three conditions: air, NaHCO 3 supplemented, and 4% CO 2 . The CcmK2-S39A strain could grow at 4% CO 2 , with a similar growth rate with WT (Fig. 8A). By contrast, the growth of CcmK2-S39A was greatly impeded under the air and HCO 3 − supplemented conditions, compared with those of WT under the same conditions (Fig. 8B). This raises the possibility that the S39A mutation may preclude transport of bicarbonate into carboxysomes, or prevents other functions of the pore, such as luminal CO 2 retention. We also determined that the distinct carbon-fixation activities of the WT and CcmK2-S39A cells grown under 4% CO 2 (Fig. 8C). CcmK2-S39A has a lower apparent V max (10.0 ± 1.5 µmol•min −1 •OD −1 , n = 4) and a higher K m(RuBP) (1.7 ± 0.6 mM, n = 4) compared with the WT (V max = 28.3 ± 2.0 µmol•min −1 •OD −1 , K m(RuBP) = 0.7 ± 0.4 mM, n = 4), revealing the affected Rubisco activity caused by the pore mutation. The increased RuBP binding affinity K m(RuBP) might act as a physiological adaption to compensate for a decreased propensity of RuBP transit into the carboxysome, especially at a low concentration of RuBP.
We further generated the CcmK2-S39A mutation in ΔccmK3, ΔccmK4, ΔccmK3K4 and ΔccmP Syn7942 strains, respectively. It is evident that CcmK2-S39A had a notable influence on the growth of all these strains in the air and HCO 3 − supplemented conditions and resulted in a detectable reduction in carbon-fixation activity (Supplementary Figure 7). The effects of point mutations of CcmK2 Arg11 and Lys36 residues on shell permeability merit further investigation.

Discussion
Selective permeability of the carboxysome shell allows the generation of a local microenvironment with an optimized luminal concentration of metabolites, crucial for carbon assimilation of this specialized organelle. In particular, the carboxysome shell was proposed to facilitate the diffusion of HCO 3 − and probably provides a barrier to O 2 and reduces CO 2 leakage into the cytosol 41 , and thus, plays roles in minimizing the unproductive oxygenation and improving the carboxylation of Rubisco encased by the carboxysome shell. The shell is also permeable to protons, leading to similar pH conditions in the carboxysome compartment lumen and the cytoplasm 42 . Here, we conducted a systematic characterization of the principles underlying the permeability of the major carboxysome shell protein CcmK2 12 , using an array of computational approaches. Our study suggests that the mechanism of metabolite transit through the CcmK2 pore represents a combination of electrostatic charge-based transition and the conformational change of the pore. Our simulations results, combined with the recent findings which have demonstrated that the concave surfaces of shell proteins in the uniformly-oriented shell 25 face out of the BMC 34,35 , allow us to propose a model for the metabolite flux across the carboxysome shell (Fig. 9). This may be dependent on the protonation states of both the metabolites and amino acid side chain residues in the pore. In different protonation states under distinct pH conditions, transition of metabolites through the pore may vary. It is likely that carboxysomes in the cytoplasm exist exclusively at close to neutral pH due to homeostasis, regardless of the external pH 43 . At pH = ~ 7, the metabolites characterized here may exist as an ensemble of protonation states; for example, bicarbonate ions will exist as both carbonic acid and hydrogen carbonate. These two protonation states may bind and dock differently inside the pore of CcmK2. This is an  www.nature.com/scientificreports/ important consideration for both the result interpretation in this work and for future studies, which may require constant pH simulations 44,45 that were outside the scope of this study. We show that the selective permeability of CcmK2 is dependent on the electrostatic charges of molecules. It is energetically favorable for the negatively charged HCO 3 − to move through the pore, but less favorable for CO 2 and O 2 to pass through the pore, akin to the experimental findings 41,46 . This is of physiological significance for improving carbon fixation. Bicarbonate represents the dominant fraction of carbon dioxide in the aqueous cyanobacteria-growing environment and the cytoplasm. It is pumped across the cytoplasmic membranes of cyanobacteria into the cytoplasm by HCO 3 − transporters, resulting in a great accumulation of bicarbonate within the cell to a level up to 1000-fold higher than exogenous bicarbonate 47 . The cytosolic HCO 3 − then passes across the shell and is converted to CO 2 by CA that is co-encapsulated with Rubisco within the carboxysome (Fig. 9), leading to elevated levels of CO 2 close to Rubisco to favor the carboxylation activities of Rubisco. Although freeenergy simulations could not define explicitly the preferable direction of HCO 3 − transit through the pore (Fig. 3), the HCO 3 − gradient across the shell (a high level of HCO 3 − outside the shell and a falloff of the HCO 3 − level within the compartment in the presence of CA) may play the decisive role in driving the flow of HCO 3 − into the carboxysome. In addition, both transits of CO 2 and O 2 through the pore were revealed to be energetically unfavorable (Fig. 3). This suggests that the shell predominately composed of CcmK2 may act as a barrier to CO 2 leakage from the carboxysome into the cytosol, as well as O 2 influx into the carboxysome, therefore facilitating the CO 2 -fixing activities and diminishing the oxygenation of Rubisco. From the present computational data alone, it is hard to conclude to what extent the transition of O 2 and CO 2 through CcmK2 is blocked. Further simulation data and validation of the force field with particular respect to hydration free energies and partition coefficients are required to investigate the extent of the exclusion of O 2 and the retention of CO 2 .
This mechanism of metabolite transit through the pore described in this study may be applicable to not only β-carboxysomes but also α-carboxysomes, based on the recent simulations study on the α-carboxysome shell protein CsoS1A and the β-carboxysome shell protein CcmK4 36 . Moreover, experimental studies have suggested that the α-carboxysome shell could impede the entrance of CO 2 into the carboxysome 41 and obviate its loss from the carboxysome lumen 46 .
Transit of the relatively large metabolites, such as 3-PGA and RuBP, through the pore of CcmK2 is potentially challenging. It remains to be tested whether it is possible for small molecules, or larger metabolites, or both, to move through the space between adjoining CcmK hexamers. Though the 3-PGA transit is energetically favorable (Fig. 3), it could induce a detectable conformational change of CcmK2 (Fig. 5). Given that RuBP is larger than 3-PGA, the RuBP transit through the CcmK2 pore is presumably more difficult. By contrast, the pseudohexamer CcmP, another shell protein in the β-carboxysome, bears a larger central pore (~ 13 Å) compared with CcmK2 48,49 . It may act as a conduit for large metabolites, as a 3-PGA molecule may fit the pore of CcmP 49 and a glycerol molecule has been observed in a binding pocket of the CcmP open structure 48 . The mechanism that controls RuBP and 3-PGA transit through the shell awaits further investigation. Moreover, cations, like Mg 2+ that are required in the active site of Rubisco, may not be able to pass through the central pore of CcmK2 as easily as these large metabolites. It remains to be tested whether cations could transit through the CcmK2 pore or the pores of other shell proteins i.e. CcmK3, CcmK4 and CcmL, or through the gaps surrounding shell proteins generated by specific protein arrangement 34 and dynamic self-assembly and interactions 25,50 .
Simulations revealed a 1-6 Å variation in the pore size of CcmK2 during metabolite passage (Fig. 5). This conformational change consists primarily of the loops forming the central pore region twisting, and the rotation of the Ser39 side chains into and away from the center of the pore. Greater variations in the pore size of shell proteins have been characterized in the open and closed forms of the pseudo-hexamers CcmP 48 , CsoS1D 51 , and EutL 52 . This suggests that pore conformational change could provide a means for selective metabolite transport (especially for large metabolite molecules) of shell proteins in the BMC counterparts. There was a distinct phenotype for each knockout mutant (Fig. 8, Supplementary Figure 7). This may also suggest that there may be distinct roles for each shell protein or at least each class of BMC-T, BMC-P and BMC-H proteins, including the transit of the larger metabolites.
The rise of synthetic biology has spurred bioinspired engineering of BMC-based organelles for biotechnological applications, such as novel metabolic factories and molecule scaffolds, in light of the self-assembly, encapsulation and modular nature of BMC structures. Manipulating shell permeability is a pivotal strategy for design and construction of new BMCs to direct molecular transport and favor specific metabolic reactions. This study indicates that the mechanism controlling the permeability of the shell is complex and may be different between the different types of BMC as well as between the alpha and beta carboxysomes. The conserved residues of shell proteins that are vital for tuning the pore permeability, especially Ser39 of CcmK2, could be important targets in bioengineering to fine-tune molecular passage through beta-carboxysome shell structures. This study provides a predictive methodology to gain a more complete biophysical understanding of the mechanistic basis underlying molecular passage through the carboxysome shell. A deeper knowledge of the permeability of BMC shell proteins will inform strategies for the design and construction of new BMC-based nanoreactors to improve cell metabolism.

Materials and methods
Cell culture and construction of genetic mutants. Syn7942 cultures were grown at 30 °C under constant white light illumination of 30 μE•m −2 s −1 , in BG11 medium 53 . Cultures were under one of three conditions, "Air"-atmospheric condition, "HCO 3 − "-BG11 supplied with NaHCO 3 in every 24 h to reach a final concentration of 150 mM, and "CO 2 "-4% CO 2 supplied in air. Cell growth was monitored at OD = 750 nm by a spectrophotometer (Jenway 6300). BG11 was supplemented with 10 mM TES buffer to increase its buffering www.nature.com/scientificreports/ capacity and minimize pH changes due to NaHCO 3 addition 54 . E. coli cultures were grown at 37 °C with constant shaking in Luria-Broth (LB). Point mutation of CcmK2 (CcmK2-S39A) from Syn7942 was generated using the In-Fusion® approach in DH5α E. coli. Plasmids containing ccmK2-S39A and a kanamycin resistance cassette were transformed into Syn7942 cells to replace the native ccmK2 gene in the genome, using Lambda RED Recombination 55 as described previously [56][57][58][59] . Syn7942 knockout mutants ΔccmK3, ΔccmK4, ΔccmK3K4 and ΔccmP were generated in a similar fashion by replacing the gene with a spectinomycin resistance cassette. Segregation of recombinant genes was verified by PCR and agarose gel electrophoresis. Mutants were selected for by the addition of 100 µg mL −1 kanamycin or spectinomycin where appropriate.
Rubisco activity assays. In vivo carbon fixation from NaH 14 CO 3 to 14 C 3-PGA was monitored as previously described 11,59 . In brief, Syn7942 cells were incubated at 30 °C with 25 mM NaH 14 CO 3 for 2 min before permeabilization with 0.03% (w/v) mixed alkyltrimethylammonium bromide (Sigma-Aldrich) and RuBP (Sigma-Aldrich) was then added to initiate the reaction. The reaction was terminated after 5 min and radioactivity measurements were carried out using a scintillation counter (Tri-Carb; Perkin-Elmer). Four biological repeats were prepared for each strain used in the kinetic analysis. Data processing was performed using SigmaPlot and figures were prepared using Origin.
Crystal structure preparation. The full-length CcmK2 structures (PDB: 2A1B, 4OX7) 22,28 and the CcmK2 C-terminal deletion structure (PDB: 3CIM) 27 were obtained from the Protein Data Bank. Atoms in the unassigned regions of electron density maps of PDB structures were assigned manually and potential rotamers were fixed to a single conformation manually. Coot 60,61 was used to view the electron density map and replace missing atoms and rotamers with our inferred most likely conformation.

NAMD MD and US simulations. Umbrella sampling (US) molecular dynamics (MD) simulations were
performed with the NAMD 2.12 program 62 using the CHARMM36 force field (FF) 63 , to explore the passage of metabolites across the narrow central pore. Each system was built up using the CHARMM-GUI webserver 64 and the ligands (HCO 3 − , CO 2 , O 2 and 3-PGA) were automatically parameterized with CGenFF using their equilibrium structures obtained with the Gaussian 09 program package 65 at the B3LYP/Def2-TZVP + GD3 level of theory. The protonation state of 3-PGA was chosen according to the microspecies distribution at pH = 7 predicted by the MarvinSketch software of ChemAxon. CcmK2-∆C cyclic hexamer and the ligands were solvated in a hexagonal prism TIP3 water box of height 70 Å, then chloride ions were added to 0.15 M, and potassium ions were added to neutralize the system. For each system, corresponding to the four different types of metabolites, 10 molecules were randomly placed in the simulation box.
The MD protocol consisted of the following steps: (a) energy minimization over 2500 steps; (b) equilibration over 2 ns at constant pressure and temperature (p = 1.01325 bar, T = 303.15 K) with a root-mean-square deviation (RMSD) constraint of 1.0 kcal•mol −1 •Å −2 applied on the protein backbone atoms; (c) 2 ns run in the NPT ensemble without constraints; (d) steered molecular dynamics (SMD) simulation-to obtain the starting positions for the subsequent umbrella sampling (US) simulations-with a pulling velocity of 10 Å ns −1 , and SMD force of 7 kcal•mol −1 •Å −2 applied on all heavy atoms of the small molecule; (e) 10 ns US production runs for each umbrella window in the NPT ensemble. Trajectories were run with a time step of 2 fs and the collective variables employed in US simulations were printed out in each step and used for the analysis. Constant temperature was set by a Langevin thermostat with a thermostat damping coefficient of 1 ps −1 with a collision frequency of 5 ps −165 . All of the bonds and angles involving hydrogen atoms were constrained by the SHAKE algorithm 65 . We used the particle mesh Ewald method 66 for the long-range electrostatics in combination with a 12 Å cutoff for the evaluation of the non-bonded interactions.
The collective variable used as reaction coordinate (z) to describe the passage of ligands is defined as the projection of the molecule's center of mass to the z-axis passing through the center of the pore (the positive z values correspond to the concave side of the protein). For each small molecule, 105 umbrella windows were run at steps of 0.5 Å (− 26.0 ⩽ z ⩽ 26.0 Å) using a spring constant of 5.0 kcal•mol −1 Å −2 . Moreover, the motion of the small molecule was constrained to the interior of a cylinder centered to the z-axis with z min = − 28 Å and z max = 28 Å, whereas the maximum of the radial distance (b) from the z axis set to 15 Å. In case of Cl − and K + ions, 4 and 7 ions were selected in the cylinder, respectively by taking into account the V cylinder /V simulation box ratio and concentration of ions. Their motion was completely free inside the cylinder (i.e. no umbrella bias applied) defined above and their trajectories interpreted as independent simulations during analysis. The free energy profiles and surfaces of substrate passage through the BMC pore, 1D(z) and 2D(z,b), were obtained using the weighted histogram analysis method (WHAM) 67,68 , and the dynamic histogram analysis method (DHAM) 68 , respectively. The mean and standard error for the 1D(z) free energy profiles were obtained by dividing the simulations for each US window into 5 equal-length segments, and analyzed these independently.

Molecular dynamics and molecular docking simulations. Gromacs 69 was used to perform 200 ns
MDs of CcmK2-∆C and CcmK2-full (PDB: 2A1B and 4OX7). All atom simulations were carried out in the GROMACS v2018.1 package with the CHARMM36 force field 63 and the TIP3P model for the explicit solvent model. The leapfrog integrator was used to integrate the equations of motion every 2 fs. To restrict bond lengths the LINCS algorithm and PME with a real space cut-off at 14 Å were applied. RMSD was calculated by least square fitting of the protein to the crystal structure and then calculating the pairwise RMSD at each time step. RMSD was calculated for the entire hexamer assembly to take structural drift of the assembly in whole into con-Scientific Reports | (2020) 10:17501 | https://doi.org/10.1038/s41598-020-74536-5 www.nature.com/scientificreports/ sideration. Per residue RMSF were calculated for each chain and plotted as boxplots, to compare the flexibility of the three CcmK2 structures. Two sets of independent docking simulations were carried out in SwissDock 38 and Glide 39 , to characterize the binding poses of RuBP and 3-PGA with the 3CIM structure. Binding poses were scored using their FullFitness and clustered. Clusters were then ranked according to the average FullFitness of their elements 70 .
Determining the pore diameter during metabolite flux. CPPTRAJ 71 from Amber 72 was used to align the trajectory files to the reference structure of CcmK2 (PDB: 3CIM) using the RMSD of alpha carbon atoms. The trajectories were then split into individual PDB files for each time frame. HOLE 37 was used to measure the radius of the pore and given as an output of the diameter. This procedure was run for every timeframe that was split from the trajectory. These diameter measurements were made at the region corresponding to the minimum constriction along the pore axis z. VMD was used to find all the contacts, within a cutoff distance of 3.5 Å and angle of 30°, between each metabolite and the residues Arg11, Ser39 and Lys36. The number of contacts with each residue in each time frame was recorded. The pore diameter data was split into two categories, "no contact" when the metabolite did not fall within the cutoff of the specified residue, and "contact" when 1 or more contacts occurred. P values were calculated using the Wilcoxon rank-sum test from the SciPy python library, and violin plots were produced using Matplotlib.