A framework for identifying the recent origins of mobile antibiotic resistance genes

Since the introduction of antibiotics as therapeutic agents, many bacterial pathogens have developed resistance to antibiotics. Mobile resistance genes, acquired through horizontal gene transfer, play an important role in this process. Understanding from which bacterial taxa these genes were mobilized, and whether their origin taxa share common traits, is critical for predicting which environments and conditions contribute to the emergence of novel resistance genes. This knowledge may prove valuable for limiting or delaying future transfer of novel resistance genes into pathogens. The literature on the origins of mobile resistance genes is scattered and based on evidence of variable quality. Here, we summarize, amend and scrutinize the evidence for 37 proposed origins of mobile resistance genes. Using state-of-the-art genomic analyses, we supplement and evaluate the evidence based on well-defined criteria. Nineteen percent of reported origins did not fulfill the criteria to confidently assign the respective origin. Of the curated origin taxa, >90% have been associated with infection in humans or domestic animals, some taxa being the origin of several different resistance genes. The clinical emergence of these resistance genes appears to be a consequence of antibiotic selection pressure on taxa that are permanently or transiently associated with the human/domestic animal microbiome.


Overall impression of the work
The manuscript is well written, interesting, and tackles relatively successfully an important question in antibiotic resistance. I cannot easily think of another source for this type of codified pipeline for approaching and validating the origins of ARGs. Where the manuscript could still benefit is in its explanations for certain decisions and the definitions it uses. For example, why are [text illegible in source] efflux mechanisms?), what is meant by recent ARG origin (evolutionarily recent? Recent since the clinical use of antibiotics?), and others (see specific comments below). While I believe it is important to address these, I overall find the approach and topic of this manuscript compelling.

Specific comments
1. The authors describe their analysis pipeline in good detail and in some cases (e.g. line 101) describe their reasoning for setting numerical cut-offs. However, in other cases (e.g. lines 105, 117) it appears that numerical cut-offs are arbitrary. Could the authors respond why these cut-offs were used and potentially add text to the manuscript that describes how the results of their pipeline change (or don't change) using stricter or more relaxed cut-offs? This would speak to the robustness of the analysis pipeline and the major conclusions of the manuscript.
2. I was surprised and interested to see the conclusion that essentially all ARGs originate from Proteobacterial pathogens, when my understanding has been that antimicrobial-producing bacteria (largely non-Proteobacterial) are likely the originators of many ARGs (e.g. Benveniste and Davies, 1973; Perry, Waglechner, Wright 2016 and many other Wright lab papers). The authors state in lines 326-331 that they know ARG origins are in Proteobacteria because they are so well studied, but it seems like this could be backwards: because the databases used have a strong pathogen/Proteobacterial bias, this might act as a confound in the analyses.
3. Related to comment #2, the authors could also clarify a bit more what they mean by recent origins of ARGs. Reading the manuscript, it seems like the authors shift a little in their usage, going from where an ARG originates in the environment to where an ARG first becomes mobilized. For example, an ARG originates from some anonymous soil bacterium but is captured by an Acinetobacter sp., which places it on a plasmid; which of these two is really the origin depends on a precise definition, which is not clearly given. A schematic figure might help?
4. The discussion of mobilizable ARGs seems incomplete without mention or reference to the many findings from functional metagenomic studies that explicitly test mobilization and frequently capture mobilization elements (e.g. Pehrsson et al. 2016 identical TEM bla found across multiple or Forsberg et al. 2014 showing pathogen ARGs to be closely associated with mobilizable elements).
5. Did the authors search for evidence of phage genes as well as evidence for plasmids, transposons, IS, etc.? What sort of genomic signal would ARG mobilization by transduction leave, and are there examples where this has occurred?
6. Could the authors clarify more how they arrived at this particular set of antibiotics/ARGs to study? It is noted (lines 293-296) that tetracycline resistance was omitted due to lack of origin hypotheses; is that the case for tetX homologs as well? I am also curious why chloramphenicol and the widespread cat and phosphotransferase genes are not discussed.
Reviewer #3 (Remarks to the Author):
I read the manuscript from Ebmeyer et al titled "A framework for identifying the recent origins of mobile antibiotic resistance genes" with great interest. The authors have re-evaluated the evidence for claims of the origin of mobile ARGs by appealing to both the literature and examination of available genome synteny and by synthesizing the approaches into a set of criteria. I believe this is an important and timely topic of interest to many readers in antimicrobial resistance and has the potential to be cited by others in this area. As the manuscript stands, I cannot recommend publication without some additional rigor in the methods and materials section (see specific comments) that would enhance reproducibility and allow interested parties to follow along more closely. In particular, phylogenetic trees were produced, but it is not clear if these are trees of genomic contexts (nucleotide data), or ARG sequences (protein sequences), or both, and there is no mention of these trees in the results. Similarly, virtually none of the genomic synteny results are presented except in summary, making these results difficult to evaluate. Can the genomic synteny results be provided as supplementary data? I have no sense for how many loci or different species are represented for each ARG. The discussion firmly situates this work in the larger context of the (recent) origins of ARGs. Given that the ARGs with putatively identifiable origins are restricted to Gram-negative organisms in general, and Proteobacteria specifically, I believe it would be useful for the authors to comment on whether these criteria may be modified to accommodate the 96% of mobile ARGs for which origin cannot currently be assigned.

Specific comments:
Lines 27-33: I would like some additional citations here. I am not familiar with ISCR and it would be helpful to be pointed to some reviews of this topic to orient the reader.
Line 43: Dan Andersson, in particular, has contributed much to this topic in the literature. Of particular relevance for this you may cite Andersson and Hughes Nat Rev Microbiol 2010;8(4):260-271.
Lines 73-79: How many articles were identified by this literature search procedure? How many matching articles were retained? The reference list in the supplementary file 1 contains 77 citations. Having performed a similar search in the past, I have noticed that many articles using the phrase 'origin' report no such thing.
Lines 81-86: How often did an article claim to source the origin of an ARG without being associated with a MGE?
Line 92: Is the pipeline available for review?
Lines 95-96: How were these novel ARGs identified? In the supplementary file?
Line 95: What components of CARD were downloaded? I assume the protein sequences of all antibiotic resistance determinants? I believe these are also versioned, so which version was used? Similarly, the ResFinder database is also versioned through the bitbucket repository (https://bitbucket.org/genomicepidemiology/resfinder_db.git); which version was used?
Line 97: Is the 'resulting database' the intersection of CARD and ResFinder db by alignment, or the union, to include everything in at least one of the databases? I want to understand what the mapping/alignment procedure between the two databases accomplished.
Line 98: To clarify, 80% query coverage cutoff?
Lines 99-100: These cutoff criteria are relaxed with respect to the database vs database cutoff criterion. Were you only including hits in closely related genera, or more distant taxa? It's very curious to me that the resulting set was restricted entirely to Proteobacteria.
Lines 100-102: GenBank assemblies vary greatly with respect to contiguity and completeness. Did you have any criteria for excluding highly fragmented assemblies? 10 kbp upstream and downstream might not be available in such assemblies, particularly if your goal is to identify ARG associations with IS and Tn sequences, which can be highly repetitive and difficult to assemble (and subject to a higher chance of misassembly).
Line 120: FastTree or FastTree2? Which version, and what parameters were used to estimate these trees (substitution model, CAT approximation and categories)? Why produce trees only to never refer to them in the remainder of the manuscript? Were the datasets so large that non-approximate ML was infeasible?
Lines 148-149: I appreciate the distinction made between evolutionary time and functional time. I fear some readers might be confused between the origin of a gene and the origin of a gene as a mobilized ARG, which I believe is the intention in this manuscript. There is some debate in the literature between the idea of a 'proto-resistance' gene (Morar and Wright. Annu Rev Genet 2010;44:25-51) and a resistance gene being strictly a gene that confers resistance to a clinically relevant antibiotic in the context of a pathogen (Martínez et al, Nat Rev Microbiol 2015;13:116-123).
Line 157: I find this idea intriguing, that a potential ARG may be mobilized within a restricted set of taxa.
Lines 177-179: In your view, are these criteria, particularly #5, likely to hold true outside of Proteobacteria, and what constitutes independence in related taxa? See Pawlowski et al Nat Comm 2016;7:13803 for an investigation of the genetic context/synteny of ARGs in Paenibacillus.
Line 202: Unsurprising; Proteobacteria are not known as producers of the antibiotics that are, or are derived from, natural products. Fluoroquinolones are synthetic, however, so we would never expect any of these to be producers, and this is worth pointing out.
Lines 208-209: Given the historically biased nature of the sequence databases towards organisms of medical interest, this isn't surprising.
Lines 250-251: I do not know if the authors can say these are correct determinations, only that their criteria support the initial determination of origin.
Line 340: Can you provide an example or citation of this? Most of the examples in Table 1 have origins in a particular species, but several only have origins in a genus. Are the factors preventing inference of origins related to the level of sequence diversity of the ARG, the level of genomic diversity in host species, the degree to which an ARG has proliferated outside of its origin taxa, or something else?
Line 369: all to data -> all to date

I found all of my comments to be thoroughly addressed and think that the revised manuscript is significantly improved in its clarity. I do think it might be worthwhile for the authors to read and possibly incorporate/discuss (potentially in the context of line ~373) Jiang et al 2017 Nat Comm "Dissemination of antibiotic resistance genes from antibiotic producers to pathogens" (though it is possible the authors have already done this and I missed it).
Aside from this I only noted a few typographical errors and think the manuscript is in very good shape.
Line 108 less -> fewer
Line 222 tetX -> TetX
Line 293 need an 'is' in there
Line 302 has -> have
Line 336 an ARG -> a mobilizable ARG
Line 370 manynovel -> many novel
Line 399 Delete one instance of 'often'

Reviewer #3 (Remarks to the Author):
Having read the revised manuscript from Ebmeyer et al, I would like to express appreciation for the effort the authors put into their revisions. The additional details, particularly in the methods, and the provision of the pipeline will clearly help readers follow along/reproduce this work.
Specific comments related to numbered points in the rebuttal:
10-11. I think these answers, in combination with the response to point 4 for referee #2, adequately clear up some confusion about the work the word 'origin' is performing: to recognize an ARG's origin as a mobilized element vs where (and when) an ARG first arises. Literature claiming 'origins' sometimes relies on preconceived notions of the direction in which genes move. If the perception is that a pathogen species is the destination of an ARG, sometimes the non-pathogen species is automatically considered the origin even in cases where the gene is not associated with a mobile element.
12. I appreciate that no claim was made for the presentation of production quality software, but insofar as the logic of the methods is represented by steps in software I find it sometimes easier to follow along, so much thanks.
17. As referee #2 raised the same issue in point 3, it seems clear to me this is a bias towards the increased diversity and attention paid to Proteobacteria in the sequence databases.
20. This was a lot of work, and I appreciate this greatly, thank you.
1. I found all of my comments to be thoroughly addressed and think that the revised manuscript is significantly improved in its clarity. I do think it might be worthwhile for the authors to read and possibly incorporate/discuss (potentially in the context of line ~373) Jiang et al 2017 Nat Comm "Dissemination of antibiotic resistance genes from antibiotic producers to pathogens" (though it is possible the authors have already done this and I missed it).
We are glad to see that the reviewer is happy with our revisions. In response to this comment, we have incorporated a short discussion of the results of the mentioned study in lines 227-232.
2. Aside from this I only noted a few typographical errors and think the manuscript is in very good shape.
Line 108 less -> fewer
Line 222 tetX -> TetX
Line 293 need an 'is' in there
Line 302 has -> have
Line 336 an ARG -> a mobilizable ARG
Line 370 manynovel -> many novel
Line 399 Delete one instance of 'often'
We are thankful the reviewer pointed out these errors and have corrected them.
Reviewer #3 (Remarks to the Author):
Having read the revised manuscript from Ebmeyer et al, I would like to express appreciation for the effort the authors put into their revisions. The additional details, particularly in the methods, and the provision of the pipeline will clearly help readers follow along/reproduce this work.
Specific comments related to numbered points in the rebuttal:

3. (Points 10-11) I think these answers, in combination with the response to point 4 for referee #2, adequately clear up some confusion about the work the word 'origin' is performing: to recognize an ARG's origin as a mobilized element vs where (and when) an ARG first arises. Literature claiming 'origins' sometimes relies on preconceived notions of the direction in which genes move. If the perception is that a pathogen species is the destination of an ARG, sometimes the non-pathogen species is automatically considered the origin even in cases where the gene is not associated with a mobile element.
We are glad that the reviewer finds our definition of 'origin' sufficiently clear. We thank the reviewer once again for this important comment.

4. (Point 12) I appreciate that no claim was made for the presentation of production quality software, but insofar as the logic of the methods is represented by steps in software I find it sometimes easier to follow along, so much thanks.
We are glad the reviewer is content with our provided code.

5. (Point 17) As referee #2 raised the same issue in point 3, it seems clear to me this is a bias towards the increased diversity and attention paid to Proteobacteria in the sequence databases.
Indeed, and we address the bias towards proteobacterial species in GenBank in lines 224-227 of the revised manuscript.

6. (Point 20) This was a lot of work, and I appreciate this greatly, thank you.
We are happy the reviewer appreciates our effort and are in turn grateful for the reviewers' constructive comments on the manuscript.