Structures of a non-ribosomal peptide synthetase condensation domain suggest the basis of substrate selectivity

Non-ribosomal peptide synthetases are important enzymes for the assembly of complex peptide natural products. Within these multi-modular assembly lines, condensation domains perform the central function of chain assembly, typically by forming a peptide bond between two peptidyl carrier protein (PCP)-bound substrates. In this work, we report structural snapshots of a condensation domain in complex with an aminoacyl-PCP acceptor substrate. These structures allow the identification of a mechanism that controls access of acceptor substrates to the active site in condensation domains. The structures of this complex also allow us to demonstrate that condensation domain active sites do not contain a distinct pocket to select the side chain of the acceptor substrate during peptide assembly but that residues within the active site motif can instead serve to tune the selectivity of these central biosynthetic domains.

Figure 2 and Pg 7, line 1. I tend to think of "symmetry-related" in crystals as referring to interactions between asymmetric units. Thus I do not think that molecule B should be highlighted as "symmetry" nor described as docked to a "symmetry-related" C3 domain (pg 7). The description on Pg 5, 3 lines from the bottom, is more accurate.
Apo and holo are generally used to describe a protein +/- a cofactor. Thus, using "apo" in panels A and B is not accurate, with "unloaded" being correct (as in the legend). Please also list the contour level and any carve radius in the legend.
Figure 5, panels A and B. Glu2702 is mislabeled as 2707. This is incorrect in the legend as well.
Pg 11. The authors should provide a few sentences giving a brief description of the Spycatcher/Spytag system so that the reader does not need to go to the reference to know whether this is a short peptide, two large protein domains, or a biotin/streptavidin-based system.
Pg 13. "zwitterioinic" (typo for "zwitterionic").

Supplementary Material
A table of contents for the SI can help guide the reader.
Table S1. The Resolution entry for all four datasets is formatted as "Low Resolution limit (High Resolution limit)". While everyone would understand what is meant, the format is a bit unusual, as in every other entry of the table the entry not in parentheses is the data for the complete data set and the entry in parentheses is the highest resolution shell. Resolution data should simply be presented as "48.14-2.18" or, if preferred, "48.14-2.18 (2.3-2.18)".
Table S4. Please define ΔiG.
Figure S1. Please state in the legend whether the aligned halves are the N-terminal or C-terminal subdomains.
Figures S8, S9, and S10, detailing the plausible mechanisms from the supplementary discussion, are not present in the SI but rather are part of Figure 5 of the main text.
Please consider expanding the legend to Figure S14, including a label for the X-axis. I understand the motivation of this figure and believe I understand how the information is presented, but it took me a while to interpret this.

Andrew M. Gulick, University at Buffalo
Reviewer #2 (Remarks to the Author):
This is a paper that reports on the structure of an NRPS module with the aminoacyl group (stable analog) bound. This is one of the missing structures in the understanding of NRPS assemblies. It was also so good to see that PPant ejection is still being used. I discovered it in 2007 and published it in 2008 before starting my own career, so it brings back some wonderful memories.
It was an enjoyable read. Some comments and questions. All are minor and even if the authors have no answer, the paper would still be appropriate for NComms.
This is a bit of an open-ended question, but it could have a big impact if there was such a trend, and it may be possible to answer these questions with this unique structure. Are there conserved residues in the C domain that bind PPant? Are there conserved residues in the C domain that define acyl selectivity? I see that you did a pocket analysis. It is not entirely clear what this is and what the training set was, but perhaps look for two details: 1) are there specific residues associated with the active site depending on the substrate that is loaded; 2) are there polarities of residues that define substrate specificities. The tricky part is that you have to divide the active site into donor and acceptor regions and then ask these questions. It may be that it is currently not possible to identify a true donor or acceptor region in the C domain with the available structures, but if it is, then it could be very interesting.
"This forced us to explore alternatives to thioester-tethered amino acids, and we chose to use an analog of the aminoacyl-CoA with a thioether in place of the reactive thioester." I see from the structure that the carbonyl is also missing; can the authors clarify what the actual non-hydrolyzable thioether is in this section? (I saw the structures are shown in the SI, but it would be nice to mention what the mimic is in this section.) Perhaps just change to the following: "….a thioether, missing the reactive carbonyl, that makes the thioester susceptible to nucleophiles."
Regarding HHXXXDG vs HHXXXDE, based on sequence gazing and the understanding from the new structure, is there a trend linking this motif to the type of amino acids or acids that are coupled? I know you tested the E to G mutation in vitro, but in vivo do you think the condensations become more promiscuous? I.e. is this another gatekeeper for selectivity for condensation built into the system after A domain activation? I know the discussion has a description of selectivity, but it is limited in the context of the condensation reaction.
Figure S11: I would not call it HRMS, as the resolution is quite poor compared to FT instruments and the isotopes for proteins are not detected. The mass accuracy is OK, not amazing, but works. Maybe just specify the resolution of the MS instead so you don't have to specify HRMS. The masses have one too many significant figures for the PPant ejections. For example, 621.3177, found: 621.3049 is a 0.0128 Da difference. This should be 621.32, found: 621.30 based on this mass accuracy. The deuterated experiment is a nice confirmation that eliminates the need for the super high mass accuracy.
The paper does not have a data availability section (coordinates, MS data, etc.).
Of course, due to my personal bias I would also encourage the authors to make the MS/MS spectra of all the synthetic compounds publicly available, not just as copies in a PDF.
Congrats, it was a joy to read, and stay healthy.
Pieter Dorrestein
Reviewer #3 (Remarks to the Author):
The paper by Izoré et al. presents state-of-the-art multidisciplinary research on C-domain structure and selectivity in non-ribosomal peptide biosynthesis. Four novel crystallographic structures are reported with a resolution of ~2 Å. Expert analysis of these structures followed by biochemical characterization, site-directed mutagenesis, bioinformatic analysis and molecular modeling provided novel insights into the roles of key residues of the C-domain. Most notable is the new understanding of the role of the HHxxxDE pattern and R2577. A crystallographic structure of a C-domain in complex with an acceptor PCP-domain bearing a substrate is reported for the first time, thus shaping further steps towards our understanding of the structure-function relationship in NRPSs and related proteins.
The take-home message of the presented manuscript is clear. I believe the new crystallographic structures, enriched by extensive multidisciplinary analysis, have the capacity to make an immediate contribution to the mechanistic understanding of non-ribosomal peptide biosynthesis, though some parts of the work require clarification or revision.
My criticism is focused on the handling of the conformational plasticity of NRPSs and their complexes. As the authors clearly state in the introduction, "NRPSs are highly flexible and the interactions between individual domains change during the process of chain assembly". In this context, my major points are as follows:
1. First and foremost, if NRPSs and their complexes are known or expected to be flexible, then how was it even possible to characterize the reported PDB structures? What novel approach was used in the work that made it possible to characterize such complexes?
2. If NRPSs and their complexes are flexible, then I would expect a series of conformational variants to exist in equilibrium. The authors also continuously refer to their PDB models as "structural snapshots". What conformational and functional states of the complexes are reflected in the reported PDB "snapshots" compared to the conformational ensemble of NRPSs and their complexes? Bioinformatic comparison of the obtained complexes with related PDB structures and also molecular modeling was reported, mostly in the Supplementary material, but I did not immediately find the answer to my questions. Thus I believe this issue was not addressed sufficiently clearly and should be presented, possibly, as a dedicated section. How flexible are the reported complexes? How representative are these "static" models in the "dynamic" context of the conformational ensemble of NRPSs and their complexes? Would they move/rearrange during MD simulation? The molecular modeling part left the impression that the PDB complexes were stable during MD. How does this conclusion coexist with the notion of NRPSs being highly flexible?
3. The authors report molecular docking and transition state calculations. These are well-known and widely used approaches, but they all demand a single "static" high-quality model as input. In practice, this usually means that the protein in question should be relatively "rigid". I had the impression that the authors used the original "static" PDB models as input to these methods. Since NRPSs and their complexes are considered highly flexible, I wonder about the validity of such an approach. Either there is a justification for the use of "static" models of the complexes in question for docking and QM, or additional work should be performed, e.g. docking into several structural states and snapshots obtained from the PDB or molecular dynamics.
Other issues:

1. Regarding the PDB structures
The four crystallographic models present the primary result of the work, and yet the respective PDB codes are not mentioned even once in the main text. Instead, the codes and explanation of their contents are buried deep in the Supplementary data. I would like to see the PDB codes and a brief explanation of the contents of the respective 3D structures in a dedicated section of the main text. The figures obtained using the original models should also cite their codes. I even suggest additionally providing a list of the codes in the abstract, for the convenience of users.
Then, unless there is some specific rule against it that I am not aware of, I would like to see all four PDB structures (i.e. files which I can load into PyMol and view in 3D) before submitting my final decision. In recent years, there has been a growing number of PDB entries containing obvious human-visible errors. The formal validation reports are interesting, but without the PDB models themselves they do not show the full picture. So I believe an extra check of the PDB files by an independent pair of eyes will be useful to both the authors and further users of this data.

2. Bioinformatic analysis
A multiple sequence alignment tool was used to align sequences of individual domains for bioinformatic analysis. Sequence alignment is limited to relatively close homologs with a high level of pairwise sequence similarity (at least ~30-40% is needed to hope for a meaningful comparison, as everything below 40% is commonly considered the "twilight zone" of sequence alignment). However, the level of sequence identity between the aligned sequences was not stated in the Supplementary material (Bioinformatic methods section). The level of pairwise sequence similarity should be stated. If it is below 40%, other methods should probably be used (e.g. 3D alignment, profile-based alignment, structure-guided sequence alignment, etc.), or additional justification should be provided.
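To make the request concrete: pairwise identity can be read directly off an existing alignment. The sketch below is a minimal illustration, not the authors' pipeline. It assumes gaps are written as "-" and counts identity only over columns where both sequences carry a residue; other conventions divide by the alignment length or by the shorter sequence and give different numbers. The two short sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two rows taken from the same alignment.

    Counts only columns where neither sequence has a gap
    (one of several conventions in use).
    """
    if len(a) != len(b):
        raise ValueError("both sequences must come from the same alignment")
    # Keep only columns in which both sequences carry a residue
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (not real C domain sequences)
seq1 = "HH-IVSDG"
seq2 = "HHLI-SDE"
print(f"{percent_identity(seq1, seq2):.1f}% identity")
```

Reporting the full distribution of such pairwise values (e.g. its median and minimum) would show how much of a dataset falls below the threshold the reviewer mentions.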
The authors discuss sequence conservation statistics in the manuscript (e.g. Gly is found in 80% of DCL domains at the position equivalent to R2577, etc.). It usually makes sense to calculate such statistics after the removal of redundant, duplicated, incomplete, etc. sequences (i.e. after applying a redundancy filter). No such filter was stated in the methods; it seems the 1456 LCL and 593 DCL sequences were collected as is and may contain redundant information. The same concern also applies to the calculation of correlation coefficients.
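A redundancy filter of the kind described above can be sketched as greedy clustering, similar in spirit to tools such as CD-HIT: sequences are visited from longest to shortest, and a sequence is kept only if it falls below an identity threshold against every sequence already kept. This is an illustrative sketch, not a reimplementation of any specific tool; it assumes the sequences are already aligned, and the 90% threshold and the sequence names are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over columns where both aligned sequences carry a residue."""
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def redundancy_filter(aligned: dict[str, str], threshold: float = 90.0) -> list[str]:
    """Greedy filter: keep a sequence only if it is below `threshold` percent
    identity to every sequence kept so far. Longer (ungapped) sequences first."""
    order = sorted(aligned, key=lambda name: len(aligned[name].replace("-", "")), reverse=True)
    kept: list[str] = []
    for name in order:
        if all(percent_identity(aligned[name], aligned[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical aligned sequences; C_b is an exact duplicate of C_a
alignment = {
    "C_a": "HHLISDGA",
    "C_b": "HHLISDGA",
    "C_c": "HHIVSDEA",
}
print(redundancy_filter(alignment))
```

Conservation percentages and correlation coefficients would then be computed only over the retained sequences, so that near-identical entries do not inflate the statistics.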
3. Molecular modeling
I am concerned by the molecular dynamics protocol. An orthorhombic box with a 10 Å buffer around the protein molecule was used for simulation of a potentially flexible protein-protein complex. Such a small gap between the protein and the box edge means that anticipated relative movement of stacked subunits during the simulation would cause them to interact with their images in neighboring cells (assuming that periodic boundary conditions were used, which, I think, was not clearly stated but is implied by the protocol). One of many potential side effects of such a small-box setup could be artificial overstabilization of the complex. If the authors indeed expect their complexes to exhibit intrinsic conformational flexibility, a box with a buffer of 30 Å or so should probably be used instead. Even such a large box would not be sufficient if subunits move too much, but in the absence of major rearrangements in the protein-protein complexes such a box would give a valid simulation, ruling out artificial overstabilization. The authors should clarify their simulation protocol and the choice of this crucial parameter, or perform additional simulations.
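The box-size concern reduces to simple arithmetic. For an orthorhombic box built with a buffer b on every side, a rigid solute in a fixed orientation sits 2b from its nearest periodic image, so any relative movement or rotation of subunits eats directly into that margin, while the box volume (and hence the amount of solvent) grows steeply with b. The sketch below assumes periodic boundary conditions and uses hypothetical complex dimensions; it is not derived from the authors' system.

```python
def box_for_buffer(extent: tuple[float, float, float], buffer: float):
    """Orthorhombic box edges for a solute of the given extent (Å) with `buffer` Å
    on each side.

    Returns (box edges, closest solute-image gap, box volume). The gap assumes a
    rigid solute that neither moves nor rotates; conformational change reduces it.
    """
    box = tuple(e + 2.0 * buffer for e in extent)
    image_gap = min(b - e for b, e in zip(box, extent))
    volume = box[0] * box[1] * box[2]
    return box, image_gap, volume


extent = (80.0, 60.0, 50.0)  # hypothetical dimensions of a didomain complex, Å
for buffer in (10.0, 30.0):
    box, gap, volume = box_for_buffer(extent, buffer)
    print(f"buffer {buffer:.0f} Å: box {box}, image gap {gap:.0f} Å, volume {volume:.3g} Å^3")
```

In this hypothetical case, raising the buffer from 10 Å to 30 Å triples the image separation (20 Å to 60 Å) at the cost of a 3.3-fold larger box volume, which is the trade-off behind the reviewer's request to justify the chosen value.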
I wonder what the rationale was for selecting the OPLS3e force field for modeling, given the two key aspects of the study: (1) the need to model the expected conformational plasticity and (2) the need to model crucial interactions involving charged residues (R2577 in particular).
The PROPKA version should be stated, as different releases of this particular software are known to give significantly different results.
The "Results" section of the main text mentions the terms "docked" / "docking" multiple times. The authors should clarify the meaning of the term, as it was not immediately clear to me. Does it refer to a predictive computational method called "molecular docking", or does it refer to a "stacking" of protein subunits observed in the crystallographic complexes obtained experimentally?
There is a section called "Supplementary Discussion" discussing a computational investigation of the catalytic mechanism. I could not quickly find a reference to this section in the main text, nor were the QM calculations discussed in the Methods. Thus, I was not able to immediately understand the value of this supplementary section, its results and its relevance to the study.
Reviewer #4 (Remarks to the Author):
In "Understanding condensation domain selectivity in non-ribosomal peptide biosynthesis: structural characterization of the acceptor bound state", Cryle and colleagues present a nice series of structures of a PCP-C didomain, in which the PCP domain is docked at the acceptor site of the adjacent C domain. Structures in which the PCP is loaded with apo PPant show this moiety to curl away from the active site, but when the nearby residue R2577 is mutated to glycine (R2577G), or when a propylamine representing glycine is attached, the PPant enters the active site.
The authors set up a useful PCP2-C3-PCP3 system with Spycatcher to specifically load PCP2 and PCP3 with different PPant moieties and use it to present Gly, Ala, Leu or Phe to the C domain, and to analyze three C domain mutations.
Overall, the manuscript is well written and scientifically sound. It is not clear from the data presented whether some specific points, such as the putative gating role for R2577, are a general feature of NRPS biology, but this paper will be a nice addition to the NRPS literature.

Specific comments and questions:
About R2577: SI Figure S6: R2577 is conserved in 70% of LCL domains. Can the authors comment on why they think the next most common residues are G, A, Q and S? Also, what is the conservation in other C-type domains, for example Cyc domains or starter C domains? It seems odd that this is not conserved in DCL domains given that the acceptor substrate has L chirality.
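Conservation figures such as "conserved in 70% of LCL domains" are per-column residue frequencies in an alignment, and the "next most common residues" are simply the following entries in that ranking. A minimal sketch of the calculation, assuming a list of already aligned sequences with "-" as the gap character; the ten toy sequences are hypothetical and chosen only to reproduce a 70% figure.

```python
from collections import Counter


def column_frequencies(aligned: list[str], column: int) -> list[tuple[str, float]]:
    """Percentage of sequences carrying each residue (or gap '-') at one
    alignment column, most frequent first."""
    residues = [seq[column] for seq in aligned]
    counts = Counter(residues)
    total = len(residues)
    return [(res, 100.0 * n / total) for res, n in counts.most_common()]


# Ten hypothetical aligned sequences; column 0 stands in for the R2577 position
aligned = [res + "HHLISDG" for res in "RRRRRRRGAQ"]
for residue, pct in column_frequencies(aligned, 0):
    print(f"{residue}: {pct:.0f}%")
```

The same function applied separately to LCL, DCL, Cyc or starter C domain subsets would give the per-class comparison the reviewer asks about.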
P8: "LCL*". What does the asterisk signify here?
P10: "R2577 now forms specific interactions with two of the carbonyl oxygen atoms in the Ppant arm (3.7 Å and 3.8 Å)". These are long distances and would be quite weak interactions.
P13: "One hypothesis for the role of this residue would be to prevent the unwanted "pass-through" of donor substrates without elongation (e.g. from PCP2 to PCP3)." Is pass-through a likely event, given that the nucleophile in the pass-through reaction (PPE thiol) is 3 atoms away from the nucleophile for peptide bond formation (amino group)? If this were an important mechanism, pass-through should be observable in the R2577G mutant. Is it?
About the position of PCP-PPant-Glystab: P10: "is its close proximity (3.6 Å) to the amino group of the Glystab moiety". Similar to the comment above, 3.6 Å is too far for deprotonation; a shift of some kind needs to be invoked.
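The distance comments above can be checked against a simple geometric criterion. The sketch below computes donor-acceptor heavy-atom distances from coordinates and flags those beyond an assumed 3.5 Å hydrogen-bond cutoff; that cutoff is a common convention rather than a fixed rule, and the pair labels and coordinates are hypothetical placeholders, not values from the deposited models.

```python
import math

HBOND_CUTOFF = 3.5  # assumed donor-acceptor heavy-atom cutoff (Å); conventions vary


def classify(d: float, cutoff: float = HBOND_CUTOFF) -> str:
    """Label a donor-acceptor distance relative to the assumed cutoff."""
    return "within cutoff" if d <= cutoff else "beyond cutoff (weak at best)"


# Hypothetical pairs and coordinates (Å), NOT taken from the deposited models
pairs = {
    "Arg N ... PPant O, pair 1 (hypothetical)": ((0.0, 0.0, 0.0), (3.7, 0.0, 0.0)),
    "Arg N ... PPant O, pair 2 (hypothetical)": ((0.0, 0.0, 0.0), (0.0, 3.8, 0.0)),
    "His N ... Gly N (hypothetical)": ((1.0, 1.0, 1.0), (1.0, 1.0, 4.6)),
    "short reference pair (hypothetical)": ((0.0, 0.0, 0.0), (2.9, 0.0, 0.0)),
}
for label, (a, b) in pairs.items():
    d = math.dist(a, b)  # Euclidean distance, Python 3.8+
    print(f"{label}: {d:.2f} Å, {classify(d)}")
```

Run over the real atom coordinates, a check like this makes explicit which contacts quoted in the text fall outside a conventional hydrogen-bonding range.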
P6: "The overall orientation of the PCP domain relative to the C domain is similar to what has been observed in the structures of SrfA-C 12 and ObiF1 (PDB ID 6N8E)8 (SI Figure S2A-B)". Are there crystal contacts involving the PCP domain, other than the PCP interacting with the acceptor site of the C domain? This is asked because there are some crystal contacts in the PCP domain of SrfA-C, and the packing in ObiF1 prevents its PCP from being able to assume the position seen in AB3403 and LgrA. It would be good to be able to state that PCP2 interacts with C3 only at the acceptor site.
Figure 5: The glycine analog is missing its carbonyl group to make it stable. Would a bona fide PPant-glycine be able to assume the position observed? Figure 5b makes it appear as though the carbonyl would clash with H2697.
P10: "It is important to note that Glystab sits in a different position to the aminoacyl mimic in a previous model of a C domain bound to the acceptor substrate -in these structures the aminoacyl mimic does not enter into the active site as far as observed in our GlyStab-PCP2-C3 complex.11" Please elaborate with a more quantitative description, or preferably, a supplemental figure.
P10: "A significant energy barrier is observed for proton transfer from the zwitterionic intermediate to the imidazole group of the active site histidine residue, suggesting the mechanism of peptide bond formation in C domains relies on specific base catalysis. This may explain why the mutation of this central histidine residue does not completely abolish activity in some C domains, as an active site water molecule could instead play the role of an alternate specific base." This passage seems confusing -how can there be multiple specific bases? Also, I believe the suggestion that water can accept a proton in C domains is similar to the conclusion of reference 11, so that should be cited here.
Other: Figure S11 / Figure 6: PPant ejection assays are notoriously difficult, so it is not surprising to see somewhat noisy mass spectra, and the deuterium shift is a welcome control. However, given the background in Figure S11, a more detailed description of how the quantitation that led to the percentages listed in Figure 6 was performed is warranted, as is inclusion of the mass spectra for the other experiments in Figure 6. (Note that the second P is not capitalized in the title "Ppant ejection".)
Abstract: "we report the first structural snapshots", "previously uncharacterized". Most journals do not allow primacy claims.

We are very grateful for the time and effort shown by these reviewers and appreciate their comments, which we have incorporated into our revised manuscript. We have addressed all the points that were raised, and we believe that this has certainly improved our revised manuscript.

Reviewer #1 (Remarks to the Author):
The manuscript by Izoré and colleagues describes the crystal structure of the condensation domain from the fuscachelin A biosynthetic protein FscG. The authors describe several structures of the PCP2-C3 didomain construct as well as the PCP3 domain. Importantly, the asymmetric unit of the didomain structures contains two independent molecules in which the PCP of one chain is docked in the acceptor site of the condensation domain of the other molecule. Thus, the structure illustrates the positioning of the holo-PCP in the acceptor site of the condensation domain. While other similar structures have been determined, the critical advance of this manuscript is that the authors have loaded the PCP with the aminoacyl thioether analog of the pantetheine, thus demonstrating for the first time the structure of a holo-PCP bound in the condensation domain acceptor site that is loaded with its amino acid substrate.
This allows the authors to make several observations. First, the authors use bioinformatic analysis to demonstrate that there is not an obvious "specificity pocket", in contrast to what is seen with the NRPS adenylation domains. Additionally, the authors explore the role of several residues in the pantetheine pocket and the active site. Arg2577 is proposed to play a role in gating the pantetheine tunnel to control access to the active site. Glu2702, which replaces the canonical glycine of the C-domain HHxxxDG motif, is proposed to position the acceptor amino acid in the condensation domain for small amino acid substrates. The more common glycine is used with larger amino acids, where interactions with the substrate side chains can probably better position the α-amino group for the condensation reaction. This paper is an important addition to our knowledge of the structural mechanisms of NRPS condensation domains. The structures are well determined and supported by functional and bioinformatic analysis. I have a few suggestions the authors may wish to consider to help clarify the writing and analysis in a few places.
Thank you very much for this positive review!
Pg 4. The sentence within the last paragraph of the introduction "The C domain is shown to be tolerant of a range of aliphatic acceptor amino acid acceptor substrates, with the limited acceptance rationalized..." is a bit of a contradiction (tolerant/limited acceptance). Perhaps include the word "small" to read "...tolerant of a small range ...". (Also note the duplication of "acceptor".)
Both changes have been made as suggested.
Figure 2 and Pg 7, line 1. I tend to think of "symmetry-related" in crystals as referring to interactions between asymmetric units. Thus I do not think that molecule B should be highlighted as "symmetry" nor described as docked to a "symmetry-related" C3 domain (pg 7). The description on Pg 5, 3 lines from the bottom, is more accurate.
We have adjusted the figure and the text to remove references to symmetry-related molecules, and instead refer to these as the second chain in the asymmetric unit.
Apo and holo are generally used to describe a protein +/- a cofactor. Thus, using "apo" in panels A and B is not accurate, with "unloaded" being correct (as in the legend). Please also list the contour level and any carve radius in the legend.
The figure has been adjusted as suggested, and the missing information included in the legend as requested.
Figure 5, panels A and B. Glu2702 is mislabeled as 2707. This is incorrect in the legend as well.
Thank you for spotting this mistake; we have corrected it in both the figure and the caption.
Pg 11. The authors should provide a few sentences giving a brief description of the Spycatcher/Spytag system so that the reader does not need to go to the reference to know whether this is a short peptide, two large protein domains, or a biotin/streptavidin-based system.
We have included a brief description of this system to provide the reader with the necessary background to this useful protein ligation technique.

Supplementary Material
A table of contents for the SI can help guide the reader.
We have added a table of contents for the SI as suggested.
Table S1. The Resolution entry for all four datasets is formatted as "Low Resolution limit (High Resolution limit)". While everyone would understand what is meant, the format is a bit unusual, as in every other entry of the table the entry not in parentheses is the data for the complete data set and the entry in parentheses is the highest resolution shell. Resolution data should simply be presented as "48.14-2.18" or, if preferred, "48.14-2.18 (2.3-2.18)".
Updated as suggested.
The aligned halves in this figure were the C-terminal halves; this has now been clarified in the legend.
Figures S8, S9, and S10, detailing the plausible mechanisms from the supplementary discussion, are not present in the SI but rather are part of Figure 5 of the main text.

Thank you for picking up this oversight; we have corrected these errors in figure citation.
Please consider expanding the legend to Figure S14, including a label for the X-axis. I understand the motivation of this figure and believe I understand how the information is presented, but it took me a while to interpret this.
Thank you for this point; we have adjusted this figure to include an X-axis label and extended the legend.

Reviewer #2 (Remarks to the Author):
This is a paper that reports on the structure of an NRPS module with the aminoacyl group (stable analog) bound. This is one of the missing structures in the understanding of NRPS assemblies. It was also so good to see that PPant ejection is still being used. I discovered it in 2007 and published it in 2008 before starting my own career, so it brings back some wonderful memories.

It is an incredibly useful technique; thank you!
It was an enjoyable read. Some comments and questions. All are minor and even if the authors have no answer, the paper would still be appropriate for NComms.
Thank you very much; we appreciate your comments and insights. We have adjusted the figure as suggested; we have also changed the colour scheme in Figure 6 along the same lines. We have added this overlay in a new panel in Figure 4 as suggested.
This is a bit of an open-ended question, but it could have a big impact if there was such a trend, and it may be possible to answer these questions with this unique structure. Are there conserved residues in the C domain that bind PPant? Are there conserved residues in the C domain that define acyl selectivity? I see that you did a pocket analysis. It is not entirely clear what this is and what the training set was, but perhaps look for two details: 1) are there specific residues associated with the active site depending on the substrate that is loaded; 2) are there polarities of residues that define substrate specificities. The tricky part is that you have to divide the active site into donor and acceptor regions and then ask these questions. It may be that it is currently not possible to identify a true donor or acceptor region in the C domain with the available structures, but if it is, then it could be very interesting.
Thank you for prompting us to do this. We have explored the conservation of the PPant-interacting residues (shown in SI Figure S5) and found trends similar to those seen with R2577, i.e. these residues are broadly conserved according to the stereochemical selectivity of the C domain (LCL or DCL). This analysis has been added to the SI as SI Figure S11 and is also briefly discussed in the main text. To clarify the analyses performed, we have added text to the manuscript describing our analyses of the C domain pocket residues, which did not reveal any patterns of correlation. Due to the highly variable structures of the donor substrates and the lack of structural data on how these are accommodated within a C domain, we did not attempt to analyse the donor site as we did for the acceptor site.
"This forced us to explore alternatives to thioester-tethered amino acids, and we chose to use an analog of the aminoacyl-CoA with a thioether in place of the reactive thioester." I see from the structure that the carbonyl is also missing; can the authors clarify what the actual non-hydrolyzable thioether is in this section? (I saw the structures are shown in the SI, but it would be nice to mention what the mimic is in this section.) Perhaps just change to the following: "….a thioether, missing the reactive carbonyl, that makes the thioester susceptible to nucleophiles."
We have adjusted the description here as suggested.
Regarding HHXXXDG vs HHXXXDE, based on sequence gazing and the understanding from the new structure, is there a trend linking this motif to the type of amino acids or acids that are coupled? I know you tested the E to G mutation in vitro, but in vivo do you think the condensations become more promiscuous? I.e. is this another gatekeeper for selectivity for condensation built into the system after A domain activation? I know the discussion has a description of selectivity, but it is limited in the context of the condensation reaction.
Figure S16), we see that there is a shift towards smaller acceptor substrates when the G is replaced with another (larger) residue. We see from our in vitro studies that the replacement of the E with G does decrease activity in this specific case, and our computational substrate docking studies support the role of the E in stabilising the desired pose of the acceptor amine group.

However, as the G to E substitution is relatively rare (it is more commonly A or H), we do not yet fully understand the role of different residues at this position.
Figure S11: I would not call it HRMS, as the resolution is quite poor compared to FT instruments and the isotopes for proteins are not detected. The mass accuracy is OK, not amazing, but works. Maybe just specify the resolution of the MS instead so you don't have to specify HRMS. The masses have one too many significant figures for the PPant ejections. For example, 621.3177, found: 621.3049 is a 0.0128 Da difference. This should be 621.32, found: 621.30 based on this mass accuracy. The deuterated experiment is a nice confirmation that eliminates the need for the super high mass accuracy.
We have adjusted the manuscript in line with these points and are very happy to hear that you agree with the use of the deuterated sample to aid in the clarity of these experiments.
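The point about significant figures follows from the size of the mass error. A minimal sketch of the standard absolute (Da) and relative (ppm) error calculation for the example pair quoted by the reviewer; the function name is illustrative.

```python
def mass_error(calculated: float, observed: float) -> tuple[float, float]:
    """Return (absolute error in Da, relative error in ppm), observed minus calculated."""
    delta = observed - calculated
    ppm = delta / calculated * 1e6
    return delta, ppm


calc, found = 621.3177, 621.3049  # example pair quoted by the reviewer
delta, ppm = mass_error(calc, found)
print(f"error: {delta:.4f} Da ({ppm:.1f} ppm)")
# With an error of roughly 0.01 Da, two decimal places is the meaningful precision
print(f"calc. {calc:.2f}, found {found:.2f}")
```

An error of about 20 ppm (about 0.013 Da) means the third and fourth decimal places carry no information, which is why two decimal places are sufficient here.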
The paper does not have a data availability section (coordinates, MS data, etc.).

Thank you for pointing out this oversight! We have added this section.
Of course, due to my personal bias I would also encourage the authors to make the MS/MS spectra of all the synthetic compounds publicly available, not just as copies in a PDF.
We have now uploaded these data to the ProteomeXchange Consortium via the PRIDE partner repository to make them publicly accessible.

Reviewer #3 (Remarks to the Author):
The paper by Izoré et al. presents state-of-the-art multidisciplinary research on C-domain structure and selectivity in non-ribosomal peptide biosynthesis. Four novel crystallographic structures are reported with a resolution of ~2 Å. Expert analysis of these structures followed by biochemical characterization, site-directed mutagenesis, bioinformatic analysis and molecular modeling provided novel insights into the roles of key residues of the C-domain. Most notable is the new understanding of the role of the HHxxxDE pattern and R2577. A crystallographic structure of a C-domain in complex with an acceptor PCP-domain bearing a substrate is reported for the first time, thus shaping further steps towards our understanding of the structure-function relationship in NRPSs and related proteins.
The take-home message of the presented manuscript is clear. I believe, the new crystallographic structures enriched by extensive multidisciplinary analysis have the capacity to make an immediate contribution to mechanistic understanding of non-ribosomal peptide biosynthesis, though some parts of the work require clarification or revision.
My criticism focuses on the handling of the conformational plasticity of NRPSs and their complexes. As the authors clearly state in the introduction, "NRPSs are highly flexible and the interactions between individual domains change during the process of chain assembly". In this context, my major points are as follows:

1. First and foremost, if NRPSs and their complexes are known or expected to be flexible, then how was it even possible to characterize the reported PDB structures?
The confusion here seems to have arisen due to our poor choice of language in the sentence "NRPSs are highly flexible and the interactions between individual domains change during the process of chain assembly". We have now clarified this point and replaced this sentence with the following: "NRPS complexes are highly flexible, with domains connected by flexible linkers that allow the interactions between them to change during the process of chain assembly. However, the individual domains (and certain didomain complexes that represent meta-stable points along the catalytic pathway) are less dynamic and can be more readily studied by methods such as X-ray crystallography."

What novel approach was used in the work that made it possible to characterize such complexes?
While crystallography of complete NRPS complexes (i.e. with all NRPS modules) would be complicated by the flexible arrangement of the modular domains, by focusing only on the reasonably stable PCP-C3 didomain construct we were able to crystallize and characterize this complex.
2. If NRPSs and their complexes are flexible, then I would expect a series of conformational variants to exist in equilibrium. The authors also repeatedly refer to their PDB models as "structural snapshots". Which conformational and functional states of the complexes are reflected in the reported PDB "snapshots", compared with the conformational ensemble of NRPSs and their complexes? Bioinformatic comparison of the obtained complexes with related PDB structures, as well as molecular modeling, was reported, mostly in the Supplementary material, but I did not immediately find the answer to my questions. I therefore believe this issue was not addressed sufficiently clearly and should be presented, possibly as a dedicated section. How flexible are the reported complexes? How representative are these "static" models in the "dynamic" context of the conformational ensemble of NRPSs and their complexes?

3. The authors report molecular docking and transition state calculations. These are well-known and widely used approaches, but they all demand a single "static" high-quality model as input. In practice, this usually means that the protein in question should be relatively "rigid". I had the impression that the authors used the original "static" PDB models as input to these methods. Since NRPSs and their complexes are considered highly flexible, I wonder about the validity of such an approach. Either there is a justification for the use of "static" models of the complexes in question for docking and QM, or additional work should be performed, e.g. docking into several structural states and snapshots obtained from the PDB or from molecular dynamics.
Molecular Docking. The reviewer is correct that the substrate-docking calculations were performed based on the static PDB models of the C domain (obtained from the 1.90 Å Glystab-loaded crystal structure). As the reviewer mentions, the use of a rigid receptor is a "well-known and widely used" approach for identifying key residues involved in substrate binding (i.e. to guide or support experimental data), as we were doing here. We do not believe it is necessary to perform substrate-docking calculations using an ensemble-based approach, for several reasons:

Regarding the dynamics of the C domain:
As the reviewer correctly mentions, molecular docking "demands a single 'static' high-quality model as input". We believe that the structure of the C domain used for molecular docking was both high-quality (1.90 Å) and rigid enough not to require ensemble-based approaches. Certainly, our MD simulation initiated from the C domain structure did highlight fast protein motions (ps to ns timescales), including side-chain motions (Figure S5a,b) and a small amount of "breathing" between the two halves of the C domain (Figure S5c,d). To clarify this, we have expanded the text on p. 10 to read as follows: "In order to determine whether the intrinsic mechanistic preference of the amide bond-forming reaction is stepwise or concerted, we calculated the reaction of a model donor, acceptor, and imidazole base in solution with density functional theory (Figure 5C, see Supplementary Discussion for details of the mechanistic investigation). The attack of the model amine on the thioester strongly prefers a stepwise mechanism in which N-C bond formation precedes N deprotonation by the imidazole, rather than a concerted mechanism in which these two events take place simultaneously. Therefore we predict that the enzyme-catalyzed amide bond formation likely involves a similar sequence, with a distinct zwitterionic (oxyanion/ammonium) intermediate (Figure 5D)."

Other issues:

Regarding the PDB structures
The four crystallographic models represent the primary result of the work, and yet the respective PDB codes are not mentioned even once in the main text. Instead, the codes and explanations of their contents are buried deep in the Supplementary data. I would like to see the PDB codes and a brief explanation of the contents of the respective 3D structures in a dedicated section of the main text. The figures prepared from the original models should also cite their codes. I even suggest additionally providing a list of the codes in the abstract, for the convenience of users.
We have now added the PDB code references for our structures in the main text. As each structure is introduced in the course of the manuscript, we feel an additional paragraph listing them is not required.
Furthermore, unless there is some specific rule against it that I am not aware of, I would like to see all four PDB structures (i.e. files which I can load into PyMOL and view in 3D) before submitting my final decision. In recent years there has been a growing number of PDB entries containing obvious, human-visible errors. The formal validation reports are interesting, but without the PDB models themselves they do not show the full picture. I therefore believe an extra check of the PDB files by an independent pair of eyes will be useful both to the authors and to further users of these data.
We have happily included these files in the revision for the reviewers to access if they so desire.

Bioinformatic analysis
A multiple sequence alignment tool was used to align sequences of individual domains for bioinformatic analysis. Sequence alignment is limited to relatively close homologs with a high level of pairwise sequence similarity (at least ~30-40% is needed to hope for a meaningful comparison, as everything below 40% is commonly considered the "twilight zone" of sequence alignment). However, the level of sequence identity between the aligned sequences was not stated in the Supplementary material (Bioinformatic methods section). The level of pairwise sequence similarity should be stated. If it is below 40%, other methods should probably be used (e.g. 3D alignment, profile-based alignment, structure-guided sequence alignment, etc.), or additional justification should be provided.
C domains have been shown to be highly conserved homologs (Rausch et al., 2005, doi: 10.1093/nar/gki885; Ziemert et al., 2012, doi: 10.1371), so the MSAs are valid. We nevertheless calculated the average distances of the conserved areas of the C domains to be 49.4%.
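The quantity the reviewer asks for, pairwise identity across an MSA, can be illustrated with a minimal Python sketch. It assumes one common definition (identical residues divided by columns where neither sequence has a gap) and averages over all sequence pairs; this is not necessarily how the 49.4% value above was obtained, and the function names and toy alignment are hypothetical, not real C-domain sequences.

```python
from itertools import combinations

def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)

def mean_pairwise_identity(aligned: list[str]) -> float:
    """Average identity over every pair of sequences in the alignment."""
    pairs = list(combinations(aligned, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment, for illustration only
toy = ["HHIVSDG", "HHLLSDG", "HH-ISDA"]
print(round(mean_pairwise_identity(toy), 1))  # 68.3
```

Gap handling changes the result noticeably (dividing by the full alignment length instead would lower every value), so stating the definition alongside the reported percentage matters.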
In "Understanding condensation domain selectivity in non-ribosomal peptide biosynthesis: structural characterization of the acceptor bound state", Cryle and colleagues present a nice series of structures of a PCP-C didomain, in which the PCP domain is docked at the acceptor site of the adjacent C domain. Structures in which the PCP is loaded with apo PPant show this moiety curling away from the active site, but when the nearby residue R2577 is mutated to glycine (R2577G), or when a propylamine representing glycine is attached, the PPant enters the active site.
The authors set up a useful PCP2-C3-PCP3 system with Spycatcher to specifically load PCP2 and PCP3 with different PPant moieties and use it to present Gly, Ala, Leu or Phe to the C domain, and to analyze three C domain mutations.
Overall, the manuscript is well written and scientifically sound. It is not clear from the data presented whether some specific points, such as the putative gating role for R2577, are a general feature of NRPS biology, but this paper will be a nice addition to the NRPS literature.

Thank you very much for your time and helpful comments!
Specific comments and questions:

About R2577:

SI Figure S6: R2577 is conserved in 70% of LCL domains. Can the authors comment on why they think the next most common residues are G, A, Q and S? Also, what is the conservation in other C-type domains, for example Cyc domains or starter C domains? It seems odd that this residue is not conserved in DCL domains, given that the acceptor substrate has L chirality.
We do not yet fully understand the significance of different potential mutations at this position; however, we now also see that the PPant-interacting residues seem to follow the same trend. As suggested, we performed an analysis of the R2577 position in starter C domains (152) and have incorporated this into SI Figure S6.

P8: "LCL*" What does the asterisk signify here?
Apologies for the confusion: this asterisk refers to the footnote where this nomenclature is described.
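A conservation figure such as "R2577 is conserved in 70% of LCL domains" amounts to a residue frequency at a single alignment column. The Python sketch below shows that calculation; the toy alignment, the column index and the function name `column_frequencies` are hypothetical, and mapping residue number 2577 onto a column would require the actual MSA used in SI Figure S6.

```python
from collections import Counter

def column_frequencies(aligned: list[str], col: int) -> list[tuple[str, float]]:
    """Residue frequencies (%) at one alignment column, gaps excluded, most common first."""
    residues = [seq[col] for seq in aligned if seq[col] != "-"]
    counts = Counter(residues)
    total = len(residues)
    return [(aa, 100.0 * n / total) for aa, n in counts.most_common()]

# Hypothetical toy alignment; column 1 stands in for the position of interest
toy = ["VRL", "VRL", "IRL", "VRM", "VRL", "VRL", "IRL", "VGL", "VAL", "VQL"]
for aa, pct in column_frequencies(toy, 1):
    print(aa, f"{pct:.0f}%")
```

Running the same function on separate alignments of LCL, DCL, Cyc and starter C domains would give the per-subtype comparison the reviewer asks about.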
P10: "R2577 now forms specific interactions with two of the carbonyl oxygen atoms in the Ppant arm (3.7 Å and 3.8 Å)" These are long distances and would be quite weak interactions.
We have altered the phrase from "specific interactions" to "weak interactions" as suggested.
P13: "One hypothesis for the role of this residue would be to prevent the unwanted "pass-through" of donor substrates without elongation (e.g. from PCP2 to PCP3)." Is pass-through a likely event, given that the nucleophile in the pass-through reaction (PPE thiol) is 3 atoms away from the nucleophile in peptide bond formation (amino group)?

About the position of PCP-PPant-Glystab:

P10: "is its close proximity (3.6 Å) to the amino group of the Glystab moiety" Similar to the comment above, 3.6 Å is too far for deprotonation; a shift of some kind needs to be invoked.
We agree that the His is not close enough to deprotonate the amino group as it attacks the carbonyl group. However, once the tetrahedral intermediate is formed, the His is close enough to the amino group for proton transfer to occur, as our calculations show that C-N bond formation alters the position of the amino group relative to the His.
P6: "The overall orientation of the PCP domain relative to the C domain is similar to what has been observed in the structures of SrfA-C (ref. 12) and ObiF1 (PDB ID 6N8E; ref. 8) (SI Figure S2A-B)" Are there crystal contacts involving the PCP domain, other than the PCP interacting with the acceptor site of the C domain? I ask because there are some crystal contacts in the PCP domain of SrfA-C, and the packing in ObiF1 prevents its PCP from being able to assume the position seen in AB3403 and LgrA. It would be good to be able to state that PCP2 is only interacting with C2 at the acceptor site.

Therefore, we feel that any minor artifacts in the PCP orientation that may be caused by crystal packing will not affect the primary conclusions of this work.

Supporting Figure (for reviewers only). Crystal contacts involving the PCP-domain.
Shown here is the interface between one PCP domain and the neighbouring C domain within the same asymmetric unit (from the WT PCP2-C3 PPant structure). Similar packing interactions are observed for the other PCP domain within the asymmetric unit, as well as in the other two PCP-C3 complex structures reported in this work.

Figure 5: The glycine analog lacks its carbonyl group in order to make it stable. Would a bona fide PPant-glycine be able to assume the position observed? Figure 5b makes it appear as though the carbonyl would clash with H2697.
While we do not have a crystal structure with PPant-glycine, we have performed computational docking of bona fide PPant-glycine into the structure of the C3 domain; this provides some insight into how PPant-glycine would likely dock in the C3 domain (SI Figure S14). In particular, the reviewer was concerned that the carbonyl group would clash with H2697. The top poses obtained during computational docking of the alternate substrates (including PPant-Gly) do highlight some flexibility in the position of the carbonyl group of the PPant, as can be seen in Figure S14: sometimes the carbonyl points towards His2697, but in other poses it points towards the opposite side of the tunnel (e.g. towards Met2917). We therefore believe that bona fide PPant-glycine would bind in a very similar way to Glystab-PPant, although the PPant arm may bind in a slightly different orientation to allow additional room for the carbonyl group if needed.
P10: "It is important to note that Glystab sits in a different position to the aminoacyl mimic in a previous model of a C domain bound to the acceptor substrate; in these structures the aminoacyl mimic does not enter into the active site as far as observed in our GlyStab-PCP2-C3 complex (ref. 11)." Please elaborate with a more quantitative description or, preferably, a supplemental figure.
This is an excellent suggestion, and we have added a comparison of these two structures as a new figure in the supplementary information (SI Figure S12).
P10: "A significant energy barrier is observed for proton transfer from the zwitterionic intermediate to the imidazole group of the active site histidine residue, suggesting the mechanism of peptide bond formation in C domains relies on specific base catalysis. This may explain why the mutation of this central histidine residue does not completely abolish activity in some C domains, as an active site water molecule could instead play the role of an alternate specific base." This passage seems confusing: how can there be multiple specific bases? Also, I believe the suggestion that water can accept a proton in C domains is similar to the conclusion of reference 11, so that should be cited here.
We agree that this was confusingly written. The terms specific and general base catalysis follow classical physical organic chemistry nomenclature, and we have rewritten the relevant parts of the manuscript to concentrate simply on the results of our studies (in terms of C domain mechanism) rather than on descriptions using the specific/general terminology. The reference has been added as indicated; thanks for pointing this out!

Other: Figure S11 / Figure 6: PPant ejection assays are notoriously difficult, so it is not surprising to see somewhat noisy mass spectra, and the deuterium shift is a welcome control. However, given the background in Figure S11, a more detailed description of the quantitation that led to the percentages listed in Figure 6 is warranted, as is inclusion of the mass spectra for the other experiments in Figure 6. (Note that the second P is not capitalized in the title "Ppant ejection".)

The experiments we used to quantify the conversion of tripeptide into tetrapeptide shown in Figure 6 were not performed using PPant ejection. Instead, the analysis of these reactions was performed using HRMS/MS2 on samples that had been cleaved from the PCP domains by the addition of methylamine. Examples of the analyses for each of these reactions are shown in SI Figures S28-S33, which are now referenced in the caption of Figure 6 and in the main text. Furthermore, we have now uploaded all HRMS data to the ProteomeXchange Consortium via the PRIDE partner repository to make them generally available. We also fixed the error in the title of SI Figure S11; thanks for spotting it!

Abstract: "we report the first structural snapshots", "previously uncharacterized". Most journals do not allow primacy claims.

Thanks for pointing out our oversight; we have modified the abstract to remove these.