Genomic epidemiology of vancomycin-resistant Enterococcus faecium (VREfm) in Latin America: Revisiting the global VRE population structure

The prevalence of vancomycin-resistant Enterococcus faecium varies across geographical regions, yet little is known about its population structure in Latin America. Here, we provide a complete genomic characterization of 55 representative Latin American VREfm recovered from 1998-2015 in 5 countries. We found that the VREfm population in the region is structured into two main clinical clades without geographical clustering. To place our regional findings in context, we reconstructed the global population structure of VREfm by including 285 genomes from 36 countries from 1946-2017. Our results differ from previous studies, showing an early branching of animal-related isolates and a further split of clinical isolates into two sub-clades, all within clade A. The overall phylogenomic structure was highly dependent on recombination (54% of the genome), and the split between clades A and B is estimated to have occurred more than 3585 years BP. Furthermore, while the branching of animal isolates and clinical clades was predicted to have occurred ∼894 years BP, our molecular clock calculations suggest that the split within the clinical clade occurred ∼371 years BP. By including isolates from Latin America, we present novel insights into the population structure of VREfm and revisit the evolution of this pathogen.


Introduction
Enterococci are predominantly non-pathogenic gastrointestinal commensal  In terms of antibiotic resistance, one of the most relevant antibiotic resistance traits acquired by enterococci is resistance to vancomycin due to the van gene clusters 8. Furthermore, vancomycin-resistant E. faecium (VREfm) frequently exhibits resistance to ampicillin and high-level resistance to aminoglycosides 9,10. Indeed, the World Health Organization (WHO) has categorized VREfm as a priority agent for which the finding of new and effective therapeutic strategies is imperative 11. VREfm is widely distributed in hospitals around the world, with the prevalence varying according to geographical location. In US hospitals, VREfm is an important clinical pathogen, particularly in immunosuppressed and critically-ill patients 1,12.

The National Health-Care Safety Network reported that 82% of E. faecium recovered from bloodstream infections in the US were vancomycin-resistant, whereas only 9.8% of E. faecalis were resistant to vancomycin 12. In Europe, prevalence rates of VREfm vary widely by country, but according to the European analysing the population structure of E. faecium, since high rates of recombination in the MLST loci often occur in these organisms 18 Table 1). We constructed a pangenome (29,503 orthogroups) and core genome (978 orthogroups). Using the core genome, we built and showed 7 early-branching subclades that included 73 genomes (mostly from animal sources) rather than a split into clades A1 and A2.

Following these animal-related early branches, we observed a split into two main subclades (Supp. Figure 2B). Overall, these subclades were related to clinical sources, exhibiting a high similarity in terms of prevalence of antibiotic resistance and virulence determinants (Supplementary table 3). We refer to them as clinically- Our results indicate that VREfm is widely present in Latin America but that its frequency and population structure seem to vary from country to country. As Previous studies estimated that the separation between clades A and B occurred 2776 ± 818 y.a. 3, a time frame that is similar to our results. However, the split between animal branches and the clinically-related subclades was previously reported to have occurred 74 ± 30 y.a., which is much more recent than what we found.

Our study could be subject to sampling bias due to the small sample size of genomes from Latin America, but we attempted to include as many and as diverse genomes     Table 1). According to the source, the E. faecium genomes were grouped into different CRISPRfinder 56 and BLASTX searches using Cas system proteins 57 as templates.

All BLASTX hits were selected if they had an identity percentage higher than or equal to 95% and a coverage of at least 80% of the target sequence. For BLASTN searches, hits were selected if they had an identity percentage higher than 90%  Next, the model was tested on the whole dataset of PBP5 sequences and had a 100% specificity with 96% sensitivity, which resulted in 6 cases of major errors where the isolate was resistant but predicted to be susceptible. were the same as above with a chain length of 300 million steps, a burn-in of 80 million steps, and a random starting tree.
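The BLASTX selection rule stated above (identity of at least 95% and coverage of at least 80% of the target sequence) can be illustrated with a minimal sketch. This is not the authors' script: the `Hit` record, the function names, and the way coverage is computed (aligned span on the subject divided by subject length) are assumptions made for illustration. The field names simply mirror the `pident`, `sstart`, `send` and `slen` columns available in BLAST tabular output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit (hypothetical record; fields mirror BLAST tabular columns)."""
    query: str
    subject: str
    pident: float  # percent identity of the alignment
    sstart: int    # alignment start on the subject (target), 1-based
    send: int      # alignment end on the subject (target), 1-based
    slen: int      # full length of the subject (target) sequence


def target_coverage(hit: Hit) -> float:
    """Percentage of the target sequence spanned by the alignment (assumed definition)."""
    aligned = abs(hit.send - hit.sstart) + 1
    return 100.0 * aligned / hit.slen


def filter_hits(hits, min_identity=95.0, min_coverage=80.0):
    """Keep hits meeting both thresholds (defaults follow the BLASTX rule in the text)."""
    return [
        h for h in hits
        if h.pident >= min_identity and target_coverage(h) >= min_coverage
    ]


# Toy hits with made-up values, for illustration only.
hits = [
    Hit("contig1", "cas1", 97.0, 1, 90, 100),   # passes both thresholds
    Hit("contig2", "cas1", 99.0, 1, 50, 100),   # fails coverage (50%)
    Hit("contig3", "cas2", 94.9, 1, 100, 100),  # fails identity
]
kept = filter_hits(hits)
```

The thresholds are parameters so that the same function could be reused for other searches with different cut-offs.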

The second phylogenetic reconstruction included the genomes grouped into the clade corresponding to the previously designated clade A 3. We performed pairwise comparisons of the assemblies with Mummer 3.23 62 against the reference genome Aus0085 (CP006620.1). The identified variants and the reference sequence were used to create a multiple whole genome alignment and, with it, we built a guide tree with RAxML 60 using the abovementioned parameters. This guide tree was later used to identify the recombinant regions in the alignment with ClonalFrameML 33 for each isolate. Those regions were then removed from the alignment, which was used to produce an MCC tree with BEAST. The same run parameters as above were used with a 50-million step burn-in. Finally, a strict molecular clock analysis was performed on clade A strains. We       ring shows which isolates were recovered from blood.
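The recombination-filtering step described in the Methods above (recombinant regions detected per isolate are removed from the whole genome alignment before dating) can be sketched as follows. This is a simplified illustration, not the authors' pipeline. It assumes that the regions are already available as 1-based, inclusive (start, end) coordinates per isolate, and it masks them with `N` rather than deleting columns, so that alignment coordinates stay unchanged. The function name and the input layout are hypothetical.

```python
def mask_regions(alignment, regions, mask_char="N"):
    """Mask recombinant regions in each aligned sequence.

    alignment: dict mapping isolate name -> aligned sequence (equal lengths)
    regions:   dict mapping isolate name -> list of (start, end) tuples,
               assumed 1-based and inclusive
    Returns a new dict; isolates without listed regions are left unchanged.
    """
    masked = {}
    for name, seq in alignment.items():
        chars = list(seq)
        for start, end in regions.get(name, []):
            lo = max(start, 1) - 1      # convert to 0-based index
            hi = min(end, len(chars))   # clip to sequence length
            chars[lo:hi] = mask_char * (hi - lo)
        masked[name] = "".join(chars)
    return masked


# Toy alignment and region list, made up for illustration only.
alignment = {"isolateA": "ACGTACGT", "isolateB": "ACGTACGT"}
regions = {"isolateA": [(3, 5)]}
masked = mask_regions(alignment, regions)
```

Masking keeps every sequence at its original length, which is convenient when the masked alignment is passed on to tree-building software; deleting the affected columns outright would be an alternative reading of "removed".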