Apparent nosocomial adaptation of Enterococcus faecalis predates the modern hospital era

Enterococcus faecalis is a commensal and nosocomial pathogen, which is also ubiquitous in animals and insects, representing a classical generalist microorganism. Here, we study E. faecalis isolates ranging from the pre-antibiotic era in 1936 up to 2018, covering a large set of host species including wild birds, mammals, healthy humans, and hospitalised patients. We sequence the bacterial genomes using short- and long-read techniques, and identify multiple extant hospital-associated lineages, with last common ancestors dating back as far as the 19th century. We find a population cohesively connected through homologous recombination, a metabolic flexibility despite a small genome size, and a stable large core genome. Our findings indicate that the apparent hospital adaptations found in hospital-associated E. faecalis lineages likely predate the “modern hospital” era, suggesting selection in another niche, and underlining the generalist nature of this nosocomial pathogen.

A second point that seems ambiguous, if not incorrect, is the interpretation of the study that examined the origins of the genus Enterococcus. Lines 93-96 seem to suggest it was a study of MDR hospital isolates of E. faecalis and E. faecium. This reviewer's understanding is that a wide representation of enterococcal species was studied, including commensal and clinical representatives of E. faecalis and E. faecium, and that the survival traits that positioned select lineages of those species to emerge as potential pathogens capable of endemic residence and spread within hospitals were common to all or nearly all enterococci. This becomes important in understanding the later reference to the origin of HA clusters (lines 377-393). The prediction from that previous report would be that the traits that favor survival in harsh environments, which are present in all or nearly all enterococcal species, would not be those that lead to the genesis of new sequence types associated with hospital endemicity, but would be necessary precursors, very likely also associated with their generalist success in many hosts, as observed decades ago by Mundt and others. That is, sequence types related to those found now in hospitals should be represented in older collections, and might even be enriched among lineages to which humans are most frequently exposed. The data presented do not seem to change that view. Some global assessments of gene content between the common hospital clusters observed here and elsewhere are made, but a rigorous comparison of specific gene content between these lineages and commensal isolates from the community is not clearly laid out.
The inclusion of isolates from wild birds is a nice augmentation of the collection. However, it is common to observe flocks of seagulls feeding in garbage dumps, or Canada geese feeding in city parks. Just as it is important to distinguish infection-associated isolates from commensal surveillance isolates among hospital-associated strains, when comparing human-associated E. faecalis with wild animal-associated E. faecalis it is important to distinguish birds that spend substantial feeding time in ecosystems dominated by humans from those that feed at sites free from human influence. The authors refer to findings from a recent study of marine penguins that inhabit an ecology with extremely little human influence. It is unlikely that that level of isolation also applies to the birds from which isolates were obtained here, but that should be clarified. The point is that "non-human origin" or "wild" for isolate sources doesn't equate with lack of human influence. In fact, the authors note that the paper on penguin isolates referred to failed to find anything but intrinsic AMR genes, but then make a general statement that appears to be an overinterpretation: "…antibiotic usage and concomitant selective pressure specifically in wild birds is indeed negligible…". In this reviewer's view, given global levels of pharmaceutical pollution, that statement currently likely applies only to birds in the Antarctic and a few other extremely isolated ecosystems, and specifically not to birds that even temporarily share human-influenced habitats, from city dumps to crop fields, orchards, vineyards and other agricultural sites. The extent to which the presence of non-core genome AMR genes identified in this wild bird set is influenced by human activity, or somehow reflects a level of naturally circulating resistance in truly wild ecologies, seems highly arguable without clearer information on the provenance of those isolates and the habits of their hosts.
The bottom line is that 1) compared to its abundance in nature, and specifically its abundance as a commensal in humans, E. faecalis is a rare cause of disease, 2) not all hospital isolates are necessarily infection-derived or hospital endemic, and 3) a bird may be wild, but its enterococci may still be heavily influenced by human activity, including widespread use of chemically diverse agents with antimicrobial activity.
All of that said, these points do not detract from the technical sophistication and rigor of the analyses conducted, only their interpretation. The extensive analyses made add substantially by filling important knowledge gaps in the field.

Specific:
The manuscript is well edited and the data are well presented in this reviewer's view.

Reviewer #2 (Remarks to the Author):
The study by Pontinen AK et al. presents the analysis of 2,027 genomes of E. faecalis isolates collected over a time span of 82 years and recovered from human and non-human sources in different countries. The authors provide robust evolutionary analyses showing that E. faecalis behaves as a generalist organism. This is a novel and interesting study that has not been previously performed. Moreover, the contribution of genomic data for such a human pathogen is valuable. Even though the methods are sound, the authors could provide deeper analysis, and some conclusions should be reconsidered in light of the results presented. In addition to the genomic analysis and the presented results, the authors could perform an analysis of the virulence and antibiotic resistance traits potentially contributing to the nosocomial adaptation of E. faecalis.
I have a few suggestions to improve the clarity of the manuscript:
Line 64: The assertion of a stable core genome size is not supported by the linear regression analysis, as the model does not fit the data (very low R-squared values) and there is high variance among samples. Please rephrase this to match the data.
Line 138: Since there is a correlation between STs and PPs, the authors could consider representing this correlation in Figure 1.
Line 147: The use of the core genome in such a diverse and extensive population of genomes would limit the ortholog groups available to identify host specificity among the groups, thus reducing the probability of identifying deep branches in the phylogeny, in particular when using a high threshold (99%) to include an ortholog group in the analysis. Please acknowledge this possible limitation in the analysis and the result.
Line 159: Please show the Heaps' law alpha value to identify whether the pangenome is open or closed.
Lines 167-168: By definition, PopPUNK is a k-mer-based analysis not restricted to the core genome; please clarify and correct the sentence accordingly.
Lines 189-209: As mentioned before, the results presented in the whole section are of limited value, as the fit of the linear regression of genome size through time is very poor (very low R-squared values). Additionally, the plot clearly does not show a trend among the samples but high variance. Please reevaluate the model; a transformation of the genome size into log scale or kb scale, rather than bp, may help. Also, a comparison of regression models of the genome sizes across hosts would strengthen the analysis if the data show that there is no difference among them.
Lines 210-252: The high diversity of plasmids found in the collection, even from early isolates, is interesting. How did those plasmids expand or disappear in the more recent isolates? Were clusters correlated with the presence/absence of such plasmids?
Line 252: Is the word "clones" correct, or should it be "strains" or "lineages"?
Lines 280-308: Were the recombinant regions composed only of phage sequences? Were there any genes between the flanking regions and, if so, what possible functions were present? Would those be able to confer any ecological advantage or disadvantage?
Lines 360-362: That is what would be expected from the analysis of the genomic data, but no functional association among the collection was performed. Is this true for these data? Are broad molecular functions present in all the identified clusters? Are those functions host-associated or not? The presence of mercury and arsenic resistance genes in specific PPs would indicate a degree of specialization among those groups (lines 243-252). Is there a way to measure how generalist or specialist a genome is?
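
To illustrate the Line 159 request, the following is a minimal sketch of one way a Heaps' law alpha could be estimated from a gene presence/absence matrix. It is not the authors' method. It assumes the commonly used convention in which the number of new genes found when the N-th genome is added is modelled as kappa * N^(-alpha), with alpha <= 1 read as an open pangenome and alpha > 1 as a closed one. The function names `heaps_alpha` and `classify`, the number of permutations, and the toy matrix are all hypothetical and chosen only for illustration; the fit is a plain log-log least-squares regression on permutation-averaged counts rather than any specific published optimiser.

```python
import numpy as np


def heaps_alpha(presence, n_perm=100, seed=0):
    """Estimate Heaps' law parameters from a gene presence/absence matrix.

    presence: array of shape (n_genomes, n_genes), truthy where a gene is present.
    Returns (alpha, kappa) for the fit new_genes(N) ~ kappa * N**(-alpha).
    """
    rng = np.random.default_rng(seed)
    presence = np.asarray(presence, dtype=bool)
    n_genomes = presence.shape[0]
    new_counts = np.zeros((n_perm, n_genomes - 1))

    for p in range(n_perm):
        # Add genomes in a random order and count genes not seen before.
        order = rng.permutation(n_genomes)
        seen = presence[order[0]].copy()
        for i, g in enumerate(order[1:]):
            new_counts[p, i] = np.count_nonzero(presence[g] & ~seen)
            seen |= presence[g]

    # Average over permutations, then fit a straight line in log-log space.
    mean_new = new_counts.mean(axis=0)
    n = np.arange(2, n_genomes + 1)
    keep = mean_new > 0  # log(0) is undefined, so drop empty steps
    if keep.sum() < 2:
        raise ValueError("too few steps contribute new genes to fit a power law")
    slope, intercept = np.polyfit(np.log(n[keep]), np.log(mean_new[keep]), 1)
    return float(-slope), float(np.exp(intercept))


def classify(alpha):
    """Assumed convention: alpha <= 1 is open, alpha > 1 is closed."""
    return "open" if alpha <= 1 else "closed"


# Toy input (hypothetical): 20 genomes sharing 50 core genes,
# each carrying 3 genes found in no other genome.
n_genomes, n_core, n_unique = 20, 50, 3
core = np.ones((n_genomes, n_core), dtype=bool)
unique = np.repeat(np.eye(n_genomes, dtype=bool), n_unique, axis=1)
toy = np.hstack([core, unique])

alpha, kappa = heaps_alpha(toy, n_perm=10)
print(f"alpha = {alpha:.3f}, kappa = {kappa:.3f}, pangenome looks {classify(alpha)}")
```

In the toy matrix every added genome contributes the same number of new genes, so the fitted alpha is close to zero and the pangenome is classed as open. With real data, the input would be the gene presence/absence table produced by the pangenome analysis, and the estimate should be reported together with the number of permutations used.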