Higher order scaffoldin assembly in Ruminococcus flavefaciens cellulosome is coordinated by a discrete cohesin-dockerin interaction

Cellulosomes are highly sophisticated molecular nanomachines that participate in the deconstruction of complex polysaccharides, notably cellulose and hemicellulose. Cellulosomal assembly is orchestrated by the interaction of enzyme-borne dockerin (Doc) modules to tandem cohesin (Coh) modules of a non-catalytic primary scaffoldin. In some cases, as exemplified by the cellulosome of the major cellulolytic ruminal bacterium Ruminococcus flavefaciens, primary scaffoldins bind to adaptor scaffoldins that further interact with the cell surface via anchoring scaffoldins, thereby increasing cellulosome complexity. Here we elucidate the structure of the unique Doc of R. flavefaciens FD-1 primary scaffoldin ScaA, bound to Coh 5 of the adaptor scaffoldin ScaB. The RfCohScaB5-DocScaA complex has an elliptical architecture similar to previously described complexes from a variety of ecological niches. ScaA Doc presents a single-binding mode, analogous to that described for the other two Coh-Doc specificities required for cellulosome assembly in R. flavefaciens. The exclusive reliance on a single-mode of Coh recognition contrasts with the majority of cellulosomes from other bacterial species described to date, where Docs contain two similar Coh-binding interfaces promoting a dual-binding mode. The discrete Coh-Doc interactions observed in ruminal cellulosomes suggest an adaptation to the exquisite properties of the rumen environment.

Figure 1. Cellulosome of R. flavefaciens strain FD-1 displaying the different group-specific Coh-Doc interactions involved in assembly of the multi-enzyme complex. The scheme is color-coded to highlight the four subgroups of cohesin-dockerin specificities: dockerins and their cognate cohesins are marked in blue (Group 1 dockerins), yellow (Groups 3 and 6), green (Groups 2 and 4) and red (Group 5). Group 2 dockerins are truncated derivatives of group 4 and are omitted from the figure for simplicity. The red oval marks the complex between DocScaA and CohScaB, representing the structure reported in this work.
bound to a group 1b Doc and a ScaB Coh bound to a group 1a Doc 28, revealed the exquisite properties of rumen cellulosomes. While the first of the three is very similar to the previously described type I complexes, Coh-Doc complexes involving R. flavefaciens strain FD-1 group 1 Docs do not bear much homology with any other complexes described to date. Although these three complexes are responsible for the integration of enzymes into the primary scaffoldins, either directly or through an adaptor scaffoldin, none of them possesses a dual-binding mode as observed in other cellulosomes 22,28.
Here, we report the crystal structure of the R. flavefaciens strain FD-1 Coh-Doc complex established between ScaA Doc and the fifth cohesin of ScaB (RfCohScaB-DocScaA). Notably, ScaA Doc displays a unique sequence with an atypical Ca²⁺-binding site, owing to several sequence alterations and a 12-residue insert in the middle of the second Ca²⁺-coordination loop. Comprehensive biochemical analysis of the CohScaB-DocScaA interaction, informed by the structural data, suggests an atypical single-binding mode. Thus, in contrast to the other known cellulosomes, this work supports the view that in the R. flavefaciens cellulosome, protein assembly is the result of exclusively single-binding mode Coh-Doc interactions.

Results and Discussion
Previous studies have shown that the Doc module of R. flavefaciens strain FD-1 primary scaffoldin ScaA (RfDocScaA) interacts exclusively with Cohs 5 to 9 of scaffoldin ScaB 21,24. Intriguingly, RfDocScaA is the lone member of the group 5 dockerins and therefore exhibits a unique sequence in strain FD-1. Moreover, the known orthologues in other R. flavefaciens strains each display a similarly unique group 5-related sequence in the respective strain 20. Thus, the CohScaB-DocScaA interaction is highly specific and central to R. flavefaciens cellulosome organization. Of the five possible RfDocScaA-CohScaB complexes, the one involving the fifth ScaB cohesin (RfCohScaB5) with RfDocScaA displayed the highest levels of expression 21. Here, the structure of R. flavefaciens strain FD-1 DocScaA in complex with the fifth cohesin from ScaB, designated RfCohScaB5-DocScaA for simplicity, was solved. Established Escherichia coli co-expression strategies for the production and purification of Coh-Doc complexes generated sufficient quantities of highly pure protein complex to obtain good-quality crystals.
Structure of RfCohScaB5-DocScaA protein complex. The RfCohScaB5-DocScaA crystal structure was solved by molecular replacement (Fig. 2). The best crystals belonged to space group P2₁ with unit cell dimensions of a = 30.1 Å, b = 142.9 Å, c = 46.6 Å, α = γ = 90° and β = 90.75°. The RfCohScaB5-DocScaA complex displayed an elongated comma shape with overall dimensions of 60 × 50 × 25 Å and included residues 740-877 from RfCohScaB5 and 548-730 from RfDocScaA. The structure included two molecules of the RfCohScaB5-DocScaA heterodimer in the asymmetric unit, with each Doc coordinating two calcium (Ca²⁺) ions, as well as 1 acetonitrile and 225 water molecules. The crystallographic dimer resulted from interactions between two RfCohScaB5

Figure 2. Structure of the RfCohScaB5-DocScaA complex, with the dockerin in dark red and the cohesin in gold. The molecular surface of each module is represented in transparent colors. Under the transparent molecular surface and above the grey oval disk that marks the plane defined by the Coh 8-3-6-5 β-sheets, a ribbon representation shows the three Doc α-helices labeled α1, α2 and α3. Below the grey oval disk a ribbon representation of the cohesin shows each of the 9 β-strands, labeled from 1 to 9. Ca²⁺ ions are depicted as green spheres.

Structure of ScaB Coh5. RfCohScaB5 displays an overall typical elliptical Coh structure containing nine β-strands forming two β-sheets in an elongated β-barrel displaying a classical "jelly-roll fold". β-strands 9, 1, 2, 7, 4 comprise one sheet while β-strands 8, 3, 6, 5 are positioned on the opposite face. With the exception of β-strands 1 and 9, which align parallel to each other and close the jelly-roll, all remaining β-strands are antiparallel (Fig. 2). Notably, with the exception of a very poorly defined 3₁₀-helix formed by residues Thr-862 to Lys-864, there are no structural motifs other than β-strands (Fig. 2).
This observation contrasts with several bacterial Cohs where β-flaps are commonly found interrupting β-strand 4 or 8, including those from Acetivibrio cellulolyticus (PDB code 4UYP), Pseudobacteroides cellulosolvens (PDB code 1TYJ) or R. flavefaciens strain FD-1 ScaC Coh (PDB code 5LXV) 9,22,29. The distinct α-helix commonly found between β-strands 4 and 5 in other Cohs is also absent. This particularity is shared with the recently described structures of RfCohScaB3 (PDB code 5AOZ) and RfCohScaA2 (PDB code 5M2S), which are the closest structural homologs of RfCohScaB5 (with a Z-score of 8.1, rmsd of 1.78 Å and sequence identity of 27% over 127 aligned residues, and a Z-score of 7.9, rmsd of 1.76 Å and sequence identity of 23% over 127 aligned residues, respectively) 28. Other structural homologs include the type I Acetivibrio cellulolyticus CohScaC3 (PDB code 4UYP) with a Z-score of 8.0, rmsd of 1.81 Å and 14% sequence identity over 125 aligned residues, and the type I P. cellulosolvens CohScaB7 (PDB code 4UMS), with a Z-score of 9.0, rmsd of 1.87 Å and sequence identity of 20% over 129 aligned residues.
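As a quick arithmetic check on the crystal parameters reported above, the volume of a monoclinic unit cell follows from V = a·b·c·sin β. The sketch below is illustrative only: the function name is hypothetical, and the numbers are the cell dimensions quoted in the text.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (in cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

# Unit cell dimensions quoted in the text (lengths in angstroms, angle in degrees)
volume = monoclinic_cell_volume(30.1, 142.9, 46.6, 90.75)
print(f"Unit cell volume: {volume:.0f} cubic angstroms")
```

Because β is so close to 90°, the sine term changes the volume by less than 0.01% relative to a rectangular cell with the same edges.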
Structure of ScaA Doc. RfDocScaA comprises three α-helices, two of which (helix 1: Val-662 to Asp-677; helix 3: Lys-710 to Leu-723) are arranged in an antiparallel orientation forming a planar surface on the Doc that interacts with CohScaB5 (Fig. 2). These two helices comprise portions of the two classic Doc repeating segments, each containing a bound Ca 2+ ion in loops located at opposite ends of the module. However, much like in R. flavefaciens CttA XDoc 27 , the second repeating segment consists of an atypical variation of the EF-hand motif due to a large insertion in the Ca 2+ binding loop (Fig. 2). This is the most defining characteristic of this module and will be further discussed below. Connecting these two structural elements is yet another α-helix (helix 2) extending from Asp-682 to Asp-689. The overall tertiary structure, with the exception of the loop insertion, bears some similarities to enzyme-associated dockerins from C. thermocellum (PDB code 3P0D: Z-score of 6.9, rmsd of 1.24 Å and 25% sequence identity over 65 aligned residues; PDB code 2CCL: Z-score of 6.5, rmsd of 1.44 Å and 26% sequence identity over 61 aligned residues) and R. flavefaciens (PDB code 5M2O: Z-score of 7.6, rmsd of 1.31 Å and 27% sequence identity over 68 aligned residues). The Ca 2+ coordination in the N-terminal segment follows the typical n, n + 2, n + 4, n + 6, n + 11 plus a water molecule (at the n + 8 position) pattern. Thus, the Ca 2+ ion located at the N-terminus is coordinated by the side chains of Asp-653, Asn-655, Asp-657 and Asp-664 (both the Oδ1 and Oδ2), the latter belonging to α-helix 1 (Fig. 3A). The pentagonal bipyramid geometry of the coordination is completed by the main-chain carbonyl of Asp-659 and one water molecule (n + 8, via Asn-661) (Fig. 3A).
In contrast, the pattern of Ca 2+ coordination in the C-terminal repeat is displaced due to the 12-residue loop insertion between Pro-693 and Ser-704 (Fig. 3B). A Phe residue replaces the usual Asn/Asp at position n + 2 and provides a backbone carbonyl oxygen ligand. The Asn/Asp at position n + 4 and water at position n + 8 are absent (Fig. 3B). Therefore, the coordination follows an atypical n, n + 2, n + 18 (at the n + 6 position), n + 23 (at the n + 11 position), pattern with no water molecules involved. Thus, the C-terminal Ca 2+ coordination adopts a tetrahedral configuration involving the side chains of residues Asp-689 and Asp-712 (both the Oδ1 and Oδ2) and completed by the main-chain carbonyl groups of Phe-691 and Asp-707 (Fig. 3C). A similar atypical Ca 2+ -binding loop disruption has been observed in the R. flavefaciens RfXDocCttA structure in complex with RfCohScaE, where a 13-residue long insertion in the C-terminal loop also alters the Ca 2+ -coordination pattern in the Doc of the CttA protein, although the octahedral geometry is maintained thanks to the contribution of two water molecules (Fig. 3C) 27 . In RfXDocCttA, it was found that the loop insert, together with two other inserts, serve as structural buttresses stabilizing the X-Module-Doc relationship. However, there is no X-module dyad associated with RfDocScaA and therefore the function of the 12-residue flap remains unknown. Although the RfDocScaA loop insert and the RfXDocCttA insert have a similar location in the Doc structure, the fact they do not display significant sequence identity would appear to negate a direct evolutionary relationship.
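The two coordination patterns described above can be tabulated by adding each position offset to the first ligand residue of the loop. The following is a minimal sketch; the helper name and the dictionary layout are illustrative choices, and the residue numbers come from the text.

```python
def coordination_residues(first_residue, offsets):
    """Map each coordination position label (n, n+2, ...) to a residue number."""
    return {("n" if k == 0 else f"n+{k}"): first_residue + k for k in offsets}

# N-terminal loop: canonical pattern starting at Asp-653
# (n+8 is the position that contributes the bridging water, via Asn-661)
n_term = coordination_residues(653, [0, 2, 4, 6, 8, 11])

# C-terminal loop: atypical pattern starting at Asp-689
c_term = coordination_residues(689, [0, 2, 18, 23])

# The insert spans Pro-693 to Ser-704, i.e. 12 residues; subtracting it from
# the offsets that lie past the insert recovers the canonical positions.
INSERT_LENGTH = 704 - 693 + 1
canonical_equivalent = {k: k - INSERT_LENGTH for k in (18, 23)}

print(n_term)
print(c_term)
print(canonical_equivalent)
```

The last dictionary shows why n + 18 and n + 23 are described as occupying the n + 6 and n + 11 positions: the 12-residue insert simply shifts the numbering.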
A recent study suggested the existence of an intramolecular clasp between the N-terminal and C-terminal ends of DocScaA, which contributed to enhancing the stability of the Doc module 30 . Based on an in silico model of DocScaA from R. flavefaciens strain 17, the authors predicted a stacking interaction between an N-terminal tryptophan and a C-terminal proline 30 . By mutating those two residues a reduction in thermal and chemical stability was observed for the Doc 30 . The X-ray crystal structure of RfDocScaA, observed here in complex with RfCohScaB5, revealed the same stacking interaction between Trp-651 and Pro-727 ( Figure S1), thus supporting the involvement of this crucial contact to maintain Doc structural integrity. Furthermore, these types of aromatic interactions are commonly involved in structural stabilization and similar intramolecular clasps have been identified in other known Docs [31][32][33] . Additional intramolecular contacts established by both ends of the protein module, such as the hydrogen bonds between Cys-690 and Asp-712/Ala-728 and between Asn-687 and Val-650/ Gly-652 also provide additional structural stabilization to RfDocScaA and contribute to its compact and globular conformation.

RfCohScaB5-DocScaA complex interface. Helices 1 and 3 of RfDocScaA make numerous contacts with RfCohScaB5 β-sheets 8-3-6-5 (Fig. 4A,B). The Coh-interacting surface displays a flat topology although the loop connecting β-strands 8 and 9 is elevated from the plane defined by strands 8-3-6-5, thus leading the Coh into a closer proximity to the N-terminus of RfDocScaA helix-1. Similarly, the loop connecting β-strands 6 and 7 is also elevated in relation to the Coh plane, promoting its interaction with the middle to the C-terminal portion of helix-1. This results in the entire length of RfDocScaA helix-1 interacting with the Coh surface. In contrast, helix-3 interacts with the Coh platform predominantly through the C-terminus. Thus, RfDocScaA displays a similar mechanism of Coh recognition to Group 1 Docs that also bind to ScaA or ScaB Cohs, predominantly through a single helix 28. In contrast, R. flavefaciens group 3 and group 6 Docs interact with their Coh partners through the entire length of their two helices as previously observed in the R. flavefaciens RfCohScaC-Doc3 complex. Thus, the two Doc3 α-helices (helix 1 and helix 3) of RfCohScaC-Doc3 fully interact with CohScaC 22 (Figure S2). A large network of polar (Table 2) and non-polar interactions (Table S1)  Asn-661 and Asn-669 of RfDocScaA. The hydrogen bond network established by α-helix 1 is dominated by the interaction of Asn-669 with Glu-814 of RfCohScaB5 and Lys-670 with Thr-856 (both Oγ1 and Oδ1) and Asn-857 of RfCohScaB5 (Fig. 4B). An extra hydrogen bond is established between RfDocScaA Val-666 main-chain N and RfCohScaB5 Gln-778. In α-helix-3 the contacts are dominated by hydrophobic interactions involving Val-721, whose sidechain is positioned in the hydrophobic pocket created by Ala-811, Tyr-809, Tyr-810 and the aliphatic region of Asn-804 of RfCohScaB5. Lys-710, Ile-717, Val-720, His-722 and Leu-723 reinforce the hydrophobic contacts of α-helix-3.
The close proximity of the C-terminal portion of α-helix-3 also allows the establishment of an important hydrogen bond between Val-721 of RfDocScaA and Asn-804 of RfCohScaB5. In addition, a salt bridge is established between the Nδ1 atom of RfDocScaA His-722 and the Oε1 atom of RfCohScaB5 Glu-807.
Figure 3. In both panels the amino-acid residues involved in the metal coordination are depicted as sticks, surrounded by a mesh representation of the Refmac5 maximum-likelihood σA-weighted 2Fo−Fc electron density map contoured at 1σ (0.46 electrons/Å³). The labels show the RfDocScaA residue and coordination position numbers and also the atoms involved. Both calcium ions are depicted as purple spheres and are overlaid with an idealized geometry representation (green arrows), which is pentagonal bipyramidal for the N-terminal Ca²⁺ (Panel A) and tetrahedral for the C-terminal Ca²⁺ (Panel B). A single water molecule (Wat) completes the coordination sphere of the N-terminal Ca²⁺ ion (Panel A). The bidentate nature of the Asp-664 and Asp-712 coordination is highlighted with blue dashed lines (Panels A and B). The 12-residue insert at the C-terminal calcium coordination loop is colored in light green (Panel B). Panel C depicts the overlay of the C-terminal Ca²⁺ of RfDocScaA (purple) with the C-terminal Ca²⁺ of the group 4 dockerin of RfDocCttA (cyan), whose coordination is also disrupted by a 13-residue long insert (dark green), but maintains an octahedral geometry due to the contribution of 2 water molecules (Wat). The structure of RfDocScaA is colored tan, and the structure of RfDocCttA is colored blue.

The structure of the RfCohScaB5-DocScaA complex revealed the residues of RfCohScaB5 that recognize DocScaA. Previous work revealed that ScaB cohesins 5 to 9 display a similar binding specificity, as these Cohs bind exclusively to the singular group 5 RfScaA Doc 21. Alignment of the primary sequences of Cohs 5 to 9 (Fig. 5) provided a rationale for the conservation in binding specificity observed in these five Cohs. Thus, CohScaB5 residues Gln-778, Asn-804, Glu-807 and Thr-856, whose sidechains establish the main hydrogen bonds with DocScaA, are conserved in ScaB Cohs 6, 7 and 9. Interestingly, CohScaB8 Gln-778 and Glu-807 are replaced by hydroxy amino acids (Fig. 5).
Whether these differences correspond to a lower affinity for DocScaA remains to be explored. CohScaB5 Ala-811 is also conserved in CohScaB6 to 9. Ala-811 lies in the hydrophobic pocket that accommodates the sidechains of DocScaA Val-662 and Val-721. In CohScaB1-4, this Ala is replaced by a Lys that will not allow these hydrophobic contacts and very likely result in steric clash with CohScaA. Thus, Ala-Lys replacement is an important determinant of Coh-Doc specificity within the R. flavefaciens cellulosome.
RfDocScaA displays an exclusive binding specificity, as it is the only Doc that is able to recognize ScaB cohesins 5 to 9. The alignment of RfDocScaA with the Doc sequences of ScaA scaffoldins recently discovered in diverse R. flavefaciens strains 17 revealed the degree of conservation of residues involved in the recognition of ScaB Coh ( Figure S3). Thus, within the 5 ScaA Doc homologues analyzed, residues Asn-661, Val-666, Asn-669 and Val-721 are completely conserved and Val-662 is replaced by an Ile in 2 strains. This conservation reinforces the importance of these residues for DocScaA's ability to recognize CohScaB5.
Figure 4. In both panels the most important residues involved in Coh-Doc recognition are depicted in stick configuration, with a dark background label for the Doc residues and a light background label for the Coh residues, using the DocScaA and CohScaB5 numbering. Solid black lines mark hydrogen-bond interactions. Ca²⁺ ions are depicted as purple spheres. In all panels, the transparent grey disk marks the plane defined by the 8-3-6-5 β-sheet, where the β-strands form a distinctive dockerin-interacting plateau.

RfScaA presents a single binding mode. Initially, non-denaturing gel electrophoresis (NGE) was used to probe the importance of RfDocScaA residues for Coh recognition (Figure S4). The data revealed that single mutant derivatives of RfDocScaA retain the capacity to interact with its protein partner, suggesting that the amino acid substitutions explored in this study had a marginal impact on affinity. Thus, to gain more insight into the driving forces of Coh-Doc recognition, the binding thermodynamics of RfDocScaA to RfCohScaB5 were measured by isothermal titration calorimetry (ITC) at 308 K, which approximates the temperature of the rumen. The data (Table 3, Fig. 6) revealed a stoichiometry of 1:1 and a Ka of ~10⁸ M⁻¹, similar to what was previously observed in other Coh-Doc interactions of R. flavefaciens 22,28. However, an accurate determination of the Ka was not possible as the affinity was close to the upper sensitivity range of the technique. The affinity of the RfDocScaA mutant derivatives described above for RfCohScaB5 was also explored by ITC. An alanine substitution of RfDocScaA residue Asn-661 resulted in a ~100-fold reduction in the affinity for RfCohScaB5 (Table 3, Fig. 6). Even though the alanine substitutions of residues Val-662, Asn-669, Lys-670 and His-722 did not result in a decreased Ka, the associated standard errors were lower than those observed for the wild-type interaction, which may indicate a reduction in affinity.
The low impact that the alanine substitutions had on the affinity of the RfCohScaB5-DocScaA interaction may reflect the inherent hydrophobic nature of the alanine sidechain and its ability to significantly compensate for the substitution. Overall, single mutations of DocScaA contacting residues seem to have little to no effect on the affinity for the Coh partner. However, combining any two of the tested valine mutations (Val-662, Val-666, Val-721) into RfDocScaA double mutants resulted in a ~10-fold reduction in the affinity for RfCohScaB5. Mutating all three RfDocScaA valines decreased the Ka approximately 1000-fold relative to the estimated affinity of the wild-type interaction (Table 3). The RfDocScaA Asn-661Ala/Asn-669Ala double mutant derivative completely lost its capacity for RfScaBCoh5 recognition (Table 3). These data suggest that both polar and hydrophobic interactions play an important role in stabilizing the RfCohScaB5-DocScaA interaction, with particularly relevant contributions provided by Val-662, Val-666 and Val-721. A close inspection of the RfCohScaB5-DocScaA complex structure suggests that RfDocScaA residue Asn-661 does not play a critical role in RfCohScaB5 recognition when compared with other residues, such as Val-662 and Val-666. However, Asn-661 is critically involved in the coordination of the N-terminal Ca²⁺, which may explain the decreased affinity observed when it is substituted with Ala. The methyl side-chain of Ala is unable to contribute to Ca²⁺ coordination, which is critical for maintaining the Doc fold, and the substitution would thus negatively impact the interaction between the two modules. The thermogram resulting from the interaction between RfScaBCoh5 and the Asn-661Ala RfDocScaA mutant is displayed in Fig. 6. Interestingly, the signals in the binding isotherm of this interaction appear to be broader, suggestive of a difference in the kinetics of the interaction with RfScaBCoh5 compared to the wild-type RfDocScaA.
Thus, the decreased affinity revealed by RfDocScaA single and multiple The observation that the Asn-661/Asn-669 mutant did not bind to RfCohScaB5 suggests that RfDocScaA presents a single-binding mode; although Asn-661 substitution affected Ca²⁺ coordination. It is plausible that under these conditions the symmetry-related helix-3 in the Doc structure could replace helix-1, supporting the recognition of RfScaBCoh5 through a symmetry-related interface. When Doc modules present a dual-binding mode, mutation of one or two residues positioned in the same helix usually has no effect on affinity, as a symmetry-related functional binding site can assume Coh recognition through a 180° rotation of the Doc when binding its protein partner 34. In addition, it has proven difficult to crystallize dual-binding mode complexes as these types of interactions present conformational heterogeneity that precludes crystal formation. Thus, this initial observation strongly suggested that RfDocScaA presents a single-binding mode. To analyze the nature of the structural symmetry observed within RfDocScaA, the structure of RfDocScaA was overlaid with itself after a rotation of 180° in the Coh plane (Fig. 7A). The overlay suggests that residues Asn-661 and Asn-669 are replaced by Thr-709 and Ile-717, respectively, when the Doc is rotated by 180°, suggesting a disruption of the capacity of RfDocScaA to recognize the Coh at these positions (Fig. 7A). Furthermore, the symmetry-related residues for valines 662, 666 and 721 are all polar and therefore do not allow establishment of the extensive hydrophobic platform created by this critical valine triad. Overall, these observations suggest that the asymmetric nature of RfDocScaA leads to a unique mode for the formation of the RfCohScaB5-DocScaA complex.
This contrasts with a large majority of Coh-Doc complexes where a dual-binding mode is observed, including those involving the binding of primary to adaptor scaffoldins as is the case for RfCohScaB5-DocScaA. Thus, the symmetrical nature of Acetivibrio cellulolyticus DocScaA, which was previously shown to display a dual-binding mode by binding to cohesin AcCohScaB3 in two distinct orientations (43), is easily demonstrated when its structure is overlaid with itself after a 180° rotation (Fig. 7B).
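To put the fold reductions in Ka reported above for the RfDocScaA mutants on an energy scale, the change in binding free energy can be estimated as ΔΔG = RT ln(fold reduction). This is a minimal sketch assuming the ITC temperature of 308 K given in the text; the function name is hypothetical, and the fold values are the approximate figures quoted above rather than fitted constants from Table 3.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
T = 308.0         # ITC temperature stated in the text, K

def ddg_from_fold_reduction(fold, temperature=T):
    """Free energy penalty (kJ/mol) for a given fold reduction in Ka: RT * ln(fold)."""
    return R * temperature * math.log(fold) / 1000.0

# Approximate fold reductions quoted in the text
for label, fold in [("valine double mutants", 10),
                    ("Asn-661Ala", 100),
                    ("valine triple mutant", 1000)]:
    print(f"{label}: ~{fold}-fold lower Ka, ddG of about {ddg_from_fold_reduction(fold):.1f} kJ/mol")
```

Each 10-fold loss in affinity at 308 K costs a little under 6 kJ/mol, so the triple valine mutant corresponds to roughly three times the penalty of the double mutants.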

Conclusions
The assembly of enzyme subunits into the R. flavefaciens strain FD-1 cellulosome involves groups 1, 3 and 6 enzyme-borne Docs. Groups 3 and 6 Docs present essentially the same specificity, although with a reversed binding mode, and recruit primarily hemicellulases to the multi-enzyme complex through binding to the Coh of the RfScaC adaptor scaffoldin. RfScaC contains a group 1 Doc that, like the remaining 95 group 1 Docs, specifically binds Cohs of primary scaffoldin RfScaA as well as Cohs 1 to 4 of adaptor scaffoldin RfScaB. Thus, group 1 Docs represent the major group of Docs, which recruit the largest number of enzymes to ruminal cellulosomes. Previous studies have revealed that group 1, 3 and 6 Docs essentially display a single-binding mode mechanism 22,28. This contrasts with previous observations on the cellulosomes of C. thermocellum 34, A. cellulolyticus 29 and C. cellulolyticum 35, in which Docs used to assemble the microbial enzymes into cellulosomes display a dual-binding mode. The 2-fold internal symmetry of dual-binding mode Docs permits binding to the Coh partner in two alternate positions related by 180°. The observation that the dual-binding mode is highly conserved in bacterial Docs suggests that it might contribute to enhancing flexibility and accessibility in highly populated cellulosomes. Here, we have elucidated the structure of the unique RfDocScaA in complex with the fifth Coh of RfScaB. The data revealed that, like groups 1, 3 and 6 Docs, the key RfDocScaA lacks the internal symmetry previously observed in most cellulosomal Docs. Thus, the recent work on R. flavefaciens strain FD-1 cellulosomal protein-protein interactions reveals that the dual-binding mode is not universal to all cellulosomal systems and suggests that ruminal cellulosomes are assembled predominantly through single-binding mode Docs.
This is a rather striking observation as the dual-binding mode was believed to universally improve the flexibility of highly populated cellulosomal systems. While it is possible that the dual-binding mode allows Docs, in cellulosomes with a limited scaffoldin repertoire, to explore a larger space by having alternate conformations, it is also possible that the dual-binding mode represents an adaptation to the physicochemical properties of different ecological niches. The fact that CAZymes have spread through bacteria and fungi essentially through horizontal gene transfer suggests that the same mechanism operated to exchange the other components of cellulosomal systems 36,37. As such, it is probable that all Docs evolved from a common ancestral sequence through a mechanism involving gene duplication and subsequent horizontal gene transfer. Thus, it is likely that the dual-binding mode is a property maintained from ancestral Docs, and it should confer a competitive selective advantage in the majority of the cellulosome systems described to date. The biochemical factors that constitute the driving selective force for the evolution of single-binding mode Docs remain to be elucidated. It is possible that the highly stable physicochemical properties of the rumen do not require highly flexible Coh-Doc interactions for efficient cellulosome assembly and function, but this hypothesis remains to be tested.

Methods
Gene synthesis and DNA cloning. Docs are highly unstable when produced recombinantly in Escherichia coli. To promote stability, R. flavefaciens FD-1 DocScaA (WP_009986657.1 residues 648-730) was co-expressed in vivo with CohScaB5 (WP_009986658.1 residues 737-880). The immediate binding of DocScaA to CohScaB5 is believed to confer immediate stabilization of the Doc structure. The genes encoding the two proteins were designed to maximize expression in E. coli, synthesized in vitro (NZYTech Portugal) and cloned into pET28a (Merck Millipore, Germany) under the control of separate T7 promoters. The DocScaA-encoding gene was at the 5′ end while the CohScaB5-encoding gene was at the 3′ end of the synthetic DNA. A T7 terminator sequence (to terminate transcription of the dockerin gene) and a T7 promoter sequence (to control transcription of the cohesin gene) were incorporated between the sequences of the two genes. NheI and NcoI recognition sites at the 5′ end and XhoI and SalI at the 3′ end were specifically inserted to allow subcloning into pET28a (Merck Millipore, Germany), such that the sequence encoding a six-residue His tag could be introduced either at the N-terminus of the dockerin (through digestion with NheI and SalI, incorporating the additional sequence MGSSHHHHHHSSGLVPRGSHMAS N-terminal of the Doc) or at the C-terminus of the cohesin (by cutting with NcoI and XhoI, which incorporates the additional sequence LEHHHHHH C-terminal of the Coh). Thus, the two pET28a plasmid derivatives led to the expression of protein complexes with the engineered hexa-histidine either located at the dockerin or the cohesin. The two separate plasmids were used to express RfCohScaB5-DocScaA complexes in E. coli. The sequences of DocScaA and CohScaB5 are presented in Table S2. Recombinant cohesins and dockerins were produced individually by using two distinct cloning strategies. 
First, to express the cohesin individually the previously described cohesin-tagged version of the pET28 derivative was digested with BglII to remove the dockerin sequence. This strategy gave a pET28a derivative encoding the recombinant cohesin CohScaB5 fused to a C-terminal hexa-histidine tag. The DocScaA-encoding gene was cloned into the pHTP2 vector (NZYtech, Lisbon, Portugal) following the manufacturer's protocol. Dockerin genes were isolated from R. flavefaciens FD-1 genomic DNA by PCR and using the primers shown in Table S3. The recombinant dockerin encoded by the pHTP2 derivatives contained an N-terminal thioredoxin A and an internal hexa-histidine tag for increased protein stability and solubility. Sequences of all plasmids produced were verified by Sanger sequencing.
To identify the Doc residues that modulate Coh recognition, several TrxADocScaA protein derivatives were produced using site directed mutagenesis. PCR amplification of the Doc-containing plasmid, using the primers presented in Table S3, allowed the production of seven DocScaA protein derivatives, namely N661A, V662A, V666A, N669A, K670A, V721A, H722A. Each of the newly generated gene sequences was fully sequenced to confirm that only the desired mutation accumulated in the nucleic acid.
Expression and purification of recombinant proteins. Initial expression studies revealed that when the polyhistidine tag was located at the Doc N-terminal end in RfCohScaB5-DocScaA complexes, the expression levels of both Coh and Doc were elevated. Expressing the cohesin with the histidine tag led to the accumulation of unbound cohesin, which suggests that either the cohesin expresses at higher levels than the dockerin or that the untagged dockerin was less stable. Therefore, the construct encoding the protein complex with the tagged dockerin was subsequently selected to upscale the production of the RfCohScaB5-DocScaA complex. BL21 (DE3) E. coli cells were transformed with the vector containing the construct and grown at 37 °C to an OD₆₀₀ of 0.5. Isopropyl β-D-1-thiogalactopyranoside (1 mM) was added to induce recombinant protein expression, followed by incubation at 19 °C for 16 hours. After harvesting by centrifugation for 15 min at 5000 × g, the cells were resuspended in 20 mL of immobilized-metal affinity chromatography (IMAC) binding buffer (50 mM HEPES, pH 7.5, 10 mM imidazole, 1 M NaCl, 5 mM CaCl₂). The cells were disrupted by sonication and the cell-free supernatant was then recovered by centrifugation for 30 min at 15,000 × g. After loading the soluble fraction onto a HisTrap™ nickel-charged Sepharose column (GE Healthcare, UK), initial purification was carried out by IMAC in an FPLC system (GE Healthcare, UK) using conventional protocols with a 35 mM imidazole wash and a 35-300 mM imidazole elution gradient. After selecting the fractions containing the cohesin-dockerin complex, the buffer of the purified samples was changed to 50 mM HEPES, pH 7.5, containing 200 mM NaCl and 5 mM CaCl₂ using a PD-10 Sephadex G-25M gel-filtration column (Amersham Pharmacia Biosciences, UK). Gel-filtration chromatography using a HiLoad 16/60 Superdex 75 column (GE Healthcare, UK) was used as a second purification step.
The purified complex samples were concentrated in an Amicon Ultra-15 centrifugal device with a 10-kDa cutoff membrane (Millipore, USA) and washed three times with molecular biology grade water (Sigma) containing 0.5 mM CaCl₂. The final protein concentration was adjusted to 45 mg mL⁻¹. Protein concentration was estimated in a NanoDrop 2000c spectrophotometer (Thermo Scientific, USA) using a molar extinction coefficient (ε) of 31 065 M⁻¹ cm⁻¹. The storage buffer consisted of molecular biology grade water containing 0.5 mM CaCl₂. 14% (w/v) SDS-PAGE gels were used to confirm the purity and molecular mass of the recombinant complexes.
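The spectrophotometric concentration estimate follows the Beer-Lambert law, c = A / (ε·l). The sketch below is only an illustration of that arithmetic, not the authors' procedure: the function names, the 1 cm path length, the example absorbance reading and the molecular mass used for the mg mL⁻¹ conversion are all assumptions (the text gives only ε = 31 065 M⁻¹ cm⁻¹).

```python
# Minimal Beer-Lambert sketch. Only the extinction coefficient comes from the
# text; the path length, absorbance and molecular mass are hypothetical inputs.

EPSILON_M_CM = 31065.0  # molar extinction coefficient from the text, M^-1 cm^-1


def molar_concentration(absorbance, epsilon=EPSILON_M_CM, path_cm=1.0):
    """Return the concentration in mol/L from c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


def mass_concentration(absorbance, molar_mass_da, epsilon=EPSILON_M_CM, path_cm=1.0):
    """Return the concentration in mg/mL (numerically equal to g/L)."""
    # mol/L multiplied by g/mol gives g/L, which equals mg/mL.
    return molar_concentration(absorbance, epsilon, path_cm) * molar_mass_da


if __name__ == "__main__":
    # Hypothetical reading for a diluted sample; not a value from the paper.
    a = 0.62
    print(f"{molar_concentration(a) * 1e6:.1f} uM")
```

Concentrated samples such as the 45 mg mL⁻¹ stock would normally be diluted before measurement, so the result has to be multiplied by the dilution factor.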
His GraviTrap gravity-flow nickel-charged Sepharose columns (GE Healthcare, UK) were used to purify the TrxADocScaA mutant derivatives and CohScaB5 used in native PAGE and ITC experiments.

Nondenaturing gel electrophoresis (NGE). For the NGE experiments, the proteins were kept in the IMAC elution buffer (50 mM HEPES, pH 7.5, 300 mM imidazole, 1 M NaCl, 5 mM CaCl₂). Each of the TrxADocScaA variants, at a concentration of 15 μM, was incubated in the presence and absence of 15 μM CohScaB5 for 30 min at room temperature and separated on a 10% native polyacrylamide gel. Electrophoresis was carried out at room temperature. The gels were stained with Coomassie Blue. Complex formation was detected by the presence of an additional band displaying a lower electrophoretic mobility than the individual modules.

Isothermal titration calorimetry.
All ITC experiments were carried out at 308 K. The buffer consisted of 50 mM HEPES pH 7.5, 0.5 mM CaCl₂ and 0.5 mM TCEP. The purified TrxADocScaA variants and CohScaB5 were diluted to the required concentrations and filtered using a 0.45-μm syringe filter (PALL). Protein concentration was estimated in a NanoDrop 2000c spectrophotometer (Thermo Scientific, USA) using a molar extinction coefficient (ε) of 31 065 M⁻¹ cm⁻¹. During titrations, the Doc constructs were stirred at 307 revolutions/min in the reaction cell and titrated with 28 successive 10 μL injections of CohScaB5 at 220-s intervals. Integrated heat effects, after correction for heats of dilution, were analyzed by nonlinear regression using a single-site model (Microcal ORIGIN version 7.0, Microcal Software, USA). The fitted data yielded the association constant (K_A) and the enthalpy of binding (ΔH). Other thermodynamic parameters were calculated using the standard thermodynamic equation −RT ln K_A = ΔG = ΔH − TΔS, where R is the gas constant and T the absolute temperature.
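The derived parameters follow directly from the fitted K_A and ΔH through the standard relation ΔG = −RT ln K_A = ΔH − TΔS. The sketch below shows that calculation at the experimental temperature of 308 K; the function name, the choice of kcal mol⁻¹ as the energy unit and the example K_A and ΔH values are assumptions for illustration, not results from this study.

```python
import math

# Gas constant in kcal mol^-1 K^-1; the energy unit is an assumption, since
# the text does not state the units used for the binding enthalpy.
R_KCAL = 1.987204e-3


def binding_thermodynamics(k_a, delta_h, temperature=308.0):
    """Return (delta_g, t_delta_s) in the same energy units as delta_h.

    k_a         -- association constant in M^-1 (must be positive)
    delta_h     -- binding enthalpy in kcal/mol
    temperature -- absolute temperature in K (308 K in the text)
    """
    if k_a <= 0:
        raise ValueError("the association constant must be positive")
    delta_g = -R_KCAL * temperature * math.log(k_a)  # dG = -RT ln K_A
    t_delta_s = delta_h - delta_g                    # from dG = dH - T dS
    return delta_g, t_delta_s


if __name__ == "__main__":
    # Hypothetical fit results, chosen only to exercise the function.
    dg, tds = binding_thermodynamics(k_a=1.0e7, delta_h=-12.0)
    print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol")
```

Because TΔS is obtained by difference, any error in the fitted ΔH or K_A propagates directly into the entropy term.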
X-ray crystallography, structural determination and refinement. Several crystallization conditions were tested by the sitting-drop vapor-diffusion method with the aid of an Oryx8 robotic nanodrop dispensing system (Douglas Instruments, UK) 38 . The commercial kits JCSG+ HT96 (Molecular Dimensions, UK), Crystal Screen, PEG/Ion (Hampton Research, California, USA), and an in-house screen (80 factorial) were used for the screening. 1 µl drops of 12.5, 25 and 45 mg ml⁻¹ RfCohScaB5-DocScaA were mixed with 1 µl reservoir solution at room temperature. The resulting plates were then stored at 292 K. Crystal formation was observed under two conditions (0.1 M HEPES pH 7.5, 1.2 M sodium citrate; 2.1 M DL-malic acid pH 7.0) approximately 180 days after the plates were set up (maximum crystal dimensions ~50 × 50 × 20 μm). These crystals were cryoprotected with mother solution containing 20-30% glycerol and flash-cooled in liquid nitrogen. Preliminary X-ray diffraction experiments revealed that these crystals were of very poor quality, mainly due to high mosaicity. Optimization plates based on the two original hits were set up. Two additive plates (one for each original condition) were also set up using the HT Additive Screen (Hampton Research, California, USA). The additive screen drops consisted of 0.8 µl protein + 0.8 µl optimization condition + 0.2 µl stock additive solution. This approach generated several good quality crystals. X-ray diffraction data were collected on beamline PROXIMA-1 at the Soleil Synchrotron, Saint-Aubin, France using a PILATUS 6M detector (Dectris Ltd) from crystals cooled to 100 K with a Cryostream (Oxford Cryosystems Ltd). A systematic grid search was carried out on all of these crystals to select the best diffracting part of each crystal. EDNA 39 and iMosflm 40 were used for strategy calculation during data collection.
All data sets were processed using the Fast_dp and xia2 41 packages, which use the programs XDS 42 , POINTLESS and SCALA 43 from the CCP4 suite 44 . Data-collection statistics are given in Table 1.
The best diffracting crystal was formed in one of the additive screen conditions (0.1 M HEPES pH 7.5, 1.2 M sodium citrate, 4% v/v acetonitrile). It diffracted to a resolution of 1.4 Å and belonged to the monoclinic space group P2₁. Phaser MR was used to carry out molecular replacement 45 . The best solution was found using a cohesin from R. flavefaciens strain 17 ScaB (unreleased) and an ensemble of 3 R. flavefaciens FD-1 dockerins (Doc1a from 5M2O, Doc1b from 5M2S and Doc3 from 5LXV) produced with Dali 46 . The cohesin had a sequence identity of 33.0% and the dockerins between 22% (Doc3) and 34% (Doc1b). Two copies of the heterodimeric RfCohScaB5-DocScaA complex were present in the asymmetric unit. The partial model was completed with Buccaneer 47 and with manual modeling in COOT. It was then refined using REFMAC5 48 and PDB REDO 49 interspersed with model adjustment in COOT. The final round of refinement was performed using the TLS/restrained refinement procedure with each module as a single group, giving the final model (Protein Data Bank code 5N5P, Table 1). The root mean square deviations of bond lengths, bond angles and torsion angles, together with other indicators, were continuously monitored using validation tools in COOT and MOLPROBITY. A summary of the refinement statistics is provided in Table 1.