Comparative analysis of prophages carried by human and animal-associated Staphylococcus aureus strains spreading across the European regions

Staphylococcus aureus is a major human and animal pathogen, and animal-associated S. aureus strains pose a potential zoonotic risk to humans. Acquisition of phage-related genomic islands shapes the diversity of the S. aureus species. This study characterized and compared the genome architecture, distribution, and evolutionary relationships of 65 complete prophages carried by human- and animal-associated S. aureus strains spreading across European regions. The analyzed prophage genomes showed a mosaic architecture with extensive variation in genome size. Phylogenetic analysis resolved seven clades, and prophages of animal-associated S. aureus were scattered across all of them. S. aureus strains sharing the same SCCmec type and clonal complex tended to harbor similar prophage sequences, suggesting that phage-mediated horizontal gene transfer is more frequent among them. The presence of various virulence factors in prophages of animal-associated S. aureus suggested that these prophages could have greater pathogenic potential than prophages of human-associated S. aureus. This study showed that S. aureus phages are dispersed among several S. aureus serotypes and across European regions. Further study of phage functional genomics is needed to understand phage-host interactions and could help trace the transmission of S. aureus strains.


Results
To study prophage diversity, genome sequences of S. aureus strains associated with human (Homo sapiens) infections (n = 34), bovine (Bos taurus) infections (n = 22), and dog (Canis lupus familiaris) infections (n = 4) were retrieved from the NCBI database. The geographical locations of the selected S. aureus strains are shown in Figure S1. The genomes and their features are listed in Table S1.
Identification and general genomic features of S. aureus prophages. PHASTER identified a total of 170 prophages: 101 (46 complete, 14 questionable, and 41 incomplete) in the genomes of S. aureus associated with human infections, 59 (16 complete, 8 questionable, and 35 incomplete) in the genomes of S. aureus associated with B. taurus infections, and 10 (3 complete, 6 questionable, and 1 incomplete) in the genomes of S. aureus associated with C. lupus familiaris infections. The distribution of prophages in the S. aureus genomes is shown in Figure S2. The 65 intact/complete prophages were selected based on PHASTER scores (Table S2). All identified complete prophages belonged to the Siphoviridae family and had temperate lifestyles. Of the 65 analyzed prophages, 57 were extracted from methicillin-resistant S. aureus (MRSA) strains and 8 from methicillin-sensitive S. aureus (MSSA) strains: 4 prophages from 4 MSSA strains associated with human infections (S. aureus HD1410, S. aureus I3, S. aureus SA13-192, and S. aureus SA14-639) and 4 prophages from 4 MSSA strains associated with bovine and dog infections (S. aureus 483, S. aureus 909, S. aureus C3489, and S. aureus C5086). Among the MRSA strains, S. aureus SA G6 and SA G8 were previously reported as hospital-associated MRSA (HA-MRSA) 25 , while the other MRSA strains carried Staphylococcal Cassette Chromosome mec (SCCmec) types IV and V, which are regarded as community-associated MRSA (CA-MRSA) 26 . MLST analysis revealed that S. aureus strains associated with animal infections belonged to sequence types ST398 (clonal complex CC398) and ST151 (CC151), while S. aureus strains associated with human infections belonged to ST398 (CC398), ST8 (CC8), ST5 (CC5), and others (Table S2).
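As a quick consistency check, the per-host prophage counts in this paragraph can be summed in a few lines of Python. The numbers are copied from the text; the dictionary layout and function names are illustrative choices, not part of the study's workflow.

```python
# Prophage counts reported by PHASTER per host species, copied from the text.
# Tuple order: (complete/intact, questionable, incomplete).
COUNTS = {
    "Homo sapiens": (46, 14, 41),
    "Bos taurus": (16, 8, 35),
    "Canis lupus familiaris": (3, 6, 1),
}


def per_host_totals(counts):
    """Total number of prophages detected per host."""
    return {host: sum(row) for host, row in counts.items()}


def overall_totals(counts):
    """Return (all prophages, complete prophages) summed over hosts."""
    total = sum(sum(row) for row in counts.values())
    complete = sum(row[0] for row in counts.values())
    return total, complete


print(per_host_totals(COUNTS))
# {'Homo sapiens': 101, 'Bos taurus': 59, 'Canis lupus familiaris': 10}
print(overall_totals(COUNTS))  # (170, 65)
```

The complete-prophage total of 65 matches the 57 MRSA-derived plus 8 MSSA-derived prophages described above.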
The genome sizes of the intact/complete prophages ranged from approximately 24.8 to 87.8 kb, and their GC content varied between 32.16% and 35.38%. The highest number of coding sequences (CDS; 132) was found in phiG4-3 (S. aureus SA G6) and the lowest (29) in phiH7-2 (S. aureus strain H7).
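GC content is the percentage of G and C bases in a sequence. A minimal Python sketch, assuming ambiguous symbols such as N are left out of the denominator (the paper does not say how its tools treat them); the input below is a made-up fragment, not a prophage sequence.

```python
def gc_content(seq: str) -> float:
    """GC content (%) of a nucleotide sequence.

    Assumption: ambiguous symbols such as N are excluded from the denominator,
    so only unambiguous A/C/G/T bases are counted.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Toy fragment for illustration only; it is not taken from any prophage.
print(gc_content("ATGCATATTAAAGCTAN"))  # 25.0
```

Applied to a whole prophage region, the same function would yield values comparable to the 32.16-35.38% range reported above.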
Sequence clustering and phylogenetic relationship of the prophages. Sequence clustering was performed by aligning the whole-genome sequences of the 65 intact/complete prophages carried by human- and animal-associated S. aureus. The resulting phylogenetic tree grouped the prophages into 7 clades (Fig. 1; Table 1). Prophages carried by human- and animal-associated S. aureus strains from different infection sites and geographical locations were dispersed randomly across all clades. Although we expected prophages of animal-associated and human-associated S. aureus strains to form separate clusters, this was not the case: no correlation was observed between clade and host, site of infection, or geographical location.
Comparative genome analyses of S. aureus prophages. The pan-genome analysis of the 65 intact prophages of S. aureus is shown in Fig. 2. The reference pan-genome was 107,158 bp long with 90 CDS features; however, no core genome was observed, and the accessory genome of each individual prophage was unique. This indicates that all the prophages differed in their genetic components. Clade/subclade-wise pan-genome analysis was therefore performed to identify genes shared within each clade/subclade; the observations are given below.
Staphylococcus aureus prophages of clade 1. Clade 1 encompasses 15 prophages with a total sequence size of 122,268 bp encoding 168 CDS features. The sequences shared 19.84% similarity, and the shared region encodes 28 common CDS features (Fig. 1; Table 1).
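The core/accessory distinction used here can be illustrated with plain Python sets. This is a simplified, gene-family-level sketch, not the sequence-level comparison performed with GView or Spine/AGEnt; the prophage names and gene-family labels are hypothetical. The toy data are chosen so that, as in this study, no family is shared by all prophages.

```python
from functools import reduce


def pan_core_accessory(genomes):
    """Partition the gene families of a set of prophages.

    genomes: dict mapping prophage name -> set of gene-family identifiers.
    Returns (pan, core, accessory, unique), where unique maps each prophage
    to the families found in no other prophage.
    """
    sets = list(genomes.values())
    pan = set().union(*sets)
    core = set(reduce(set.intersection, sets)) if sets else set()
    accessory = pan - core
    unique = {}
    for name, families in genomes.items():
        others = set().union(*(s for other, s in genomes.items() if other != name))
        unique[name] = families - others
    return pan, core, accessory, unique


# Hypothetical gene-family labels for three hypothetical prophages.
toy = {
    "phiA": {"int", "rep", "ter", "sak"},
    "phiB": {"int", "ter", "portal", "scn"},
    "phiC": {"rep", "portal", "hlb"},
}
pan, core, accessory, unique = pan_core_accessory(toy)
print(len(pan), sorted(core), len(accessory))  # 7 [] 7
print(unique)  # {'phiA': {'sak'}, 'phiB': {'scn'}, 'phiC': {'hlb'}}
```

Restricting the input to the prophages of a single clade gives the clade-wise shared set, which is the idea behind the per-clade analysis.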
Putative virulence factors associated with S. aureus prophages. Prophages of S. aureus associated with human and animal infections were found to harbor virulence factor-encoding genes that may play roles in immune evasion, tissue invasion, adherence, and iron uptake, or that encode toxins. The virulence factor-encoding genes associated with the prophages are compared in a heatmap (Fig. 5; Table 2). Clades 3 and 5 showed the highest occurrence of immune evasion genes, and clade 7 showed the highest prevalence of toxin-encoding genes. The lowest prevalence of virulence-encoding genes was observed in clades 2 and 4. The prophages of clades 1 and 6, as well as prophage phiH15-2 of clade 5, carried the clpP and virE genes. Some prophages of clade 1 also carried the PVL-encoding genes (lukF-PV and lukS-PV), the yopX gene of the type III secretion system, and the sak (staphylokinase), scn (staphylococcal complement inhibitor), and chp (chemotaxis inhibitory protein) genes. In addition, genes encoding β-hemolysin/sphingomyelinase C (hlb), elastin-binding protein (ebp), and tyrosine recombinase (xerD) were detected in clade 1 prophages. Prophage phiCM124-1 of clade 2 carried the iron-regulated surface determinant (isd) gene cluster (isdA-isdH). The genes carried by clade 3 prophages mostly included hlb, sak, scn, and chp; in addition, yopX and the enterotoxin genes sep and sec were detected in some clade 3 prophages. The xerD gene was also carried by prophages phiH4-2, phiH2-2, phiH1-1, and phiI3-4 of clades 6 and 4. Clade 5 prophages carried genes such as sak, scn, chp, and hlb, and some also carried sea, yopX, and eap/map. Clade 6 prophages carried lukF-PV, lukS-PV, clpP, xerD, virE, and yopX.
However, prophage phiRD.3-2 carried the geh gene encoding a lipase, which might be a virulence factor and is associated with the lysogenic conversion of S. aureus 30 . Most prophages of clade 7, especially phi909-4 and phi483-4, carried toxin-encoding genes, namely the leukocidin-related gene lukE and the enterotoxin genes seg, sei, sem, sen, and seo. These prophages also carry genes encoding HNH endonuclease, a key component of phage DNA packaging machines 1,31,32 , and a VRR-NUC domain protein (virus-type replication-repair nuclease). The variation in virulence factor-encoding genes within and among clades reflects the genomic diversity and evolution of the prophages and may contribute to the emergence of highly pathogenic S. aureus strains.
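The heatmap summarizes, per clade, how often each virulence gene is detected. A minimal sketch of that aggregation in Python, assuming gene presence calls are already available for each prophage; the clade labels reuse the paper's numbering, but the records themselves are hypothetical.

```python
from collections import defaultdict


def prevalence_by_clade(records, genes):
    """Fraction of prophages in each clade that carry each gene.

    records: iterable of (clade, set_of_detected_genes) pairs.
    genes: gene names to report (the heatmap columns).
    """
    by_clade = defaultdict(list)
    for clade, detected in records:
        by_clade[clade].append(detected)
    return {
        clade: {g: sum(g in s for s in sets) / len(sets) for g in genes}
        for clade, sets in sorted(by_clade.items())
    }


# Hypothetical presence calls for four unnamed prophages (illustration only).
records = [
    ("3", {"sak", "scn", "chp", "hlb"}),
    ("3", {"sak", "scn", "sep"}),
    ("7", {"lukE", "seg", "sei"}),
    ("7", {"seg", "sem"}),
]
table = prevalence_by_clade(records, ["sak", "hlb", "seg"])
print(table)
# {'3': {'sak': 1.0, 'hlb': 0.5, 'seg': 0.0}, '7': {'sak': 0.0, 'hlb': 0.0, 'seg': 1.0}}
```

Each inner value is what one heatmap cell would encode; clade-level statements such as "highest prevalence of toxin genes in clade 7" correspond to comparing these fractions across rows.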

Discussion
S. aureus causes moderate to severe infections in humans and animals 5,6 . S. aureus strains that infect animals pose a potential zoonotic risk and a threat to public health 9,10,33 , and transmission of S. aureus strains from animals to humans occurs commonly 34 . Animal-associated S. aureus strains may spread to the human population through various routes, such as contact with contaminated meat products or with infected farmers, butchers, and veterinary staff. Contaminated effluent released from animal farms or veterinary hospitals could be another route of transmission of S. aureus from animals to humans 35 . CC398 (ST398) and CC151 (ST151) are the most frequently identified S. aureus clone types in bovines in European regions 6,36 . These CCs have been reported to be transmitted to humans and are considered emerging zoonotic agents 37 .
In addition, excessive or improper use of antibiotics in veterinary hospitals and animal husbandry promotes an antibiotic-induced SOS response in S. aureus strains 38,39 . This response triggers phage induction and increases the frequency of phage-mediated horizontal gene transfer (HGT) between animal- and human-associated strains 40 . For this reason, we selected prophages of CC398 and CC151 strains associated with animals for comparison with prophages of CC398 and other CC strains associated with human infections. The pathogenicity of S. aureus is mainly driven by the acquisition of mobile genetic elements (MGEs) such as staphylococcal chromosome cassettes, insertion sequence (IS) elements, plasmids, genomic islands (νSa) or pathogenicity islands (SaPIs), prophages, integrative conjugative elements (ICEs), and transposons, because these elements carry genes encoding proteins involved in antibiotic resistance and pathogenicity 7,19,33,41 . Acquisition of prophages may increase genome plasticity while maintaining the architecture of the S. aureus genome, facilitating adaptation to diverse conditions during infection 42,43 . Prophages also confer novel virulence properties that expand the pathogenic spectrum and increase the severity of human and animal S. aureus infections 13 .
Figure caption: Comparison of prophages in clade 3, representing prophage phiG4-3 (subclade 3C) sequenced in our previous study and 6 prophages (subclade 3A) associated with S. aureus spreading in France, Spain, and Germany. Phages and countries of origin are indicated on the right; prophage names in black, red, and blue indicate prophages associated with Homo sapiens, Bos taurus, and Canis lupus familiaris, respectively, and names in green indicate prophages extracted from our previous studies 25,26 . Grey shaded regions are homologous regions.
In this study, we characterized and compared the intact prophage regions of S. aureus strains associated with infections at various sites in humans and animals from different geographical locations, to understand their relatedness, their genomic architecture, and the pathogenicity of their host S. aureus strains. The analyzed prophage genomes are mosaic in architecture, with extensive variation in genome size and GC content (Table S2). All the identified complete prophages belonged to the Siphoviridae family and had temperate lifestyles. Based on genomic architecture, most of the analyzed prophages consist of five functional modules, which are also found in the Siphoviridae family (Figs. 3, 4) 2 . Gene content varied between 29 and 132 CDS. These prophages were identified as dsDNA temperate phages that can integrate into the host chromosome during infection and behave as "quiescent" prophages 44 . Large phage genomes have been reported to contain more gene insertions, such as transposons, self-splicing introns, and homing endonucleases, as well as many genes of unknown function between the structural protein genes 45 . Prophages often encode 'morons', which are not required for phage replication but may confer a benefit on their bacterial host 46 .
Figure caption (fragment): …prophage sequence of our previous study 25 and 3 prophages of subclade 7B associated with S. aureus reported from Italy and the Netherlands. Phages and countries of origin are indicated on the right; prophage names in black, red, blue, and purple indicate prophages associated with Homo sapiens, Bos taurus, Canis lupus familiaris, and the reference, respectively, and names in green indicate prophages extracted from our previous studies 25,27 . Grey shaded regions are homologous regions.
Temperate phages have also been reported to recombine with other prophages in the host genome, resulting in high variation in gene content and genome size in the Siphoviridae family 47 . All the prophage genomes carried by human- and animal-associated S. aureus showed a mosaic-like structure, indicating frequent HGT among the prophages; this facilitates the development of new phage variants, which in turn may lead to the emergence of new pathogenic S. aureus strains. Phylogenetic analysis showed that prophages from the same host strain were dispersed across different clades rather than appearing in a single clade (Fig. 1). This finding suggests that each S. aureus strain carries two or more different prophages with unique features. A bacterial cell carrying one or more prophages is a lysogen, and lysogeny provides immunity to infection by phages of the same group 48 .
Prophages are part of the accessory genome of a bacterium; however, the identified prophages themselves have a pan-genome of 107,158 bp. Notably, the identified prophages did not have a core genome conserved among all prophages across their phylogeny (Fig. 2); a similar finding has been reported previously 49 . Functional modules with low sequence similarity may result from recombination of two or more prophages within host genomes or from horizontal exchange of functional modules between related phages 50 . Furthermore, the presence in the prophage genomes of variable MGEs, bacterial genes, and genes of unknown function, thought to have been acquired from different S. aureus strains, suggests that these prophages have undergone several HGT events, resulting in highly variable prophage genomes 45 and the rapid emergence of new phages 23,51 . The low sequence similarity among the identified prophage genomes made it difficult to define their core genome. To overcome this limitation, we performed clade/subclade-wise analyses of the prophage genomes based on gene-by-gene alignment at a finer synteny level. Prophages carried by S. aureus associated with human or bovine infections have relatively larger genomes than prophages of S. aureus associated with dogs. In the Fig. 3A synteny, phiAB333-3 and phiP333-3 showed the highest sequence identity; their host S. aureus strains share SCCmec type IVc, ST8, and geographical origin (France), although the hosts were associated with human skin and nares infections. Similarly, in the Fig. 3B synteny, phiVET1913R-2 and phiVET1914R-1 showed the closest relationship within the clade; these two prophages originated from the same country (Netherlands), and their host S. aureus strains carried the same SCCmec type (Vc) and ST398, although they were associated with throat and nasal infections.
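The pattern described here, highly similar prophages in hosts that share SCCmec type and ST, can be expressed as a simple grouping of host metadata. The sketch below encodes only the four prophages named in this paragraph; it flags candidate pairs for closer synteny comparison and is not evidence of HGT by itself. The function name is hypothetical.

```python
from collections import defaultdict

# Host metadata (SCCmec type, sequence type, country) for the four prophages
# discussed in this paragraph.
HOSTS = {
    "phiAB333-3": ("IVc", "ST8", "France"),
    "phiP333-3": ("IVc", "ST8", "France"),
    "phiVET1913R-2": ("Vc", "ST398", "Netherlands"),
    "phiVET1914R-1": ("Vc", "ST398", "Netherlands"),
}


def shared_type_groups(hosts):
    """Group prophages whose host strains share SCCmec type and ST.

    Only groups with at least two members are returned; country is ignored.
    """
    groups = defaultdict(list)
    for phage, (sccmec, st, _country) in hosts.items():
        groups[(sccmec, st)].append(phage)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


for key, members in shared_type_groups(HOSTS).items():
    print(key, members)
```

Country is deliberately left out of the key, since the clade 4 and Fig. 4B examples show highly similar prophages in hosts from different countries.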
In clade 4, phiI3-1 and phiHD1410-1 shared 100% sequence similarity, although they are carried by S. aureus from different geographical locations (Germany and Denmark); their host strains were both identified as MSSA and belong to the same clonal complex, CC30. Similarly, the Fig. 3D synteny revealed the highest degree of sequence similarity between prophages carried by S. aureus strains (SCCmec IVa/ST398) associated with bovine infections and prophage phi23237-1. In the Fig. 4B synteny, the host strains of prophages phi53180-1-4 and phi81629-1 were associated with bovine and human infections and came from distant geographical locations (Italy and Denmark); however, both host strains carried SCCmec type IVa and ST398 (CC398), and accordingly their prophages showed high sequence similarity. In subclade 7B, the S. aureus strains isolated in the Netherlands from bovine milk were identified as MSSA strains and shared ST151, and accordingly their prophages showed high sequence similarity (Fig. 4D). The phylogenetic analyses showed no difference in the clustering patterns of prophages carried by HA-MRSA and CA-MRSA strains: the prophages phiG5-2, phiG5-3, and phiG5-5 carried by an HA-MRSA strain clustered with prophages of CA-MRSA strains in clade 1, subclade 3B, and subclade 7B. In addition, the prophages carried by MSSA strains were closely related to one another and clustered in subclade 3B (phiI3-2, phiC5086-2, phiSA13-192-2, and phiSA14-639-2), clade 4 (phiI3-1 and phiHD1410-1), and clade 7 (phi483-3, phi909-4, phiC3489-2, and phiC5086-2) (Fig. 1). This similar clustering pattern of prophages is
Table 2. Details of the putative virulence factors associated with prophages of S. aureus.
The present study characterized and compared prophages carried by human- and animal-associated S. aureus strains reported from different geographical locations and infection sites. This comparative study revealed the diversity of prophages of S. aureus associated with humans or animals. In our study, all CC398 strains were MRSA, and CC398 was highly prevalent among animal-associated S. aureus strains. Prophages carried by CC398 strains associated with animals and humans were dispersed across different clades (Fig. 1). The presence of similar genetic elements in prophages from animal- and human-associated S. aureus suggests that prophages may have played a major role in epidemiological changes. The mosaic nature of the prophage genomes suggests that genetic exchange occurs among S. aureus strains via phages. Moreover, the virulence factor genes (VFGs) in the prophage genomes may help S. aureus adapt to different environmental niches, promote pathogenesis, and facilitate its evolution. The immune evasion cluster (IEC), a human niche-specific adaptation of S. aureus, was identified in prophages harbored by both human- and animal-associated S. aureus. Although the IEC is highly human-specific, our findings revealed that its presence could not differentiate between phages of human- and animal-associated S. aureus. The presence of various virulence factors in the prophage genomes of animal-associated S. aureus suggests that these prophages could have greater pathogenic potential than prophages of human-associated S. aureus.
This study also showed that prophages carried by human-associated S. aureus strains of different serotypes and from different geographical locations were scattered across all clades, indicating that these phages are widely distributed across European regions. Comparative studies of prophages carried by human- and animal-associated S. aureus strains are crucial for investigating S. aureus transmission from humans to animals and vice versa, and for better understanding their evolutionary relationships and diversity.

Methods
Data collection and identification of prophages. A total of 60 whole genomes of S. aureus strains reported to cause human and animal infections across European regions were used in this study. Of these, 54 were retrieved from the NCBI database, and six genome assemblies were from our previous studies 25,27 . The S. aureus strains originated from Austria (n = 7), Denmark (n = 5), France (n = 12), Germany (n = 11), Hungary (n = 3), Italy (n = 9), the Netherlands (n = 11), and Spain (n = 2). The genome sequences were analyzed for SCCmec type 61 and multilocus sequence type (MLST) 62 using the web-based servers provided by the Center for Genomic Epidemiology. The prophages in these genomes were analyzed for diversity according to geographical location and the host species infected by the S. aureus strains. Details of the whole genomes used in this study are presented in Table S1.
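MLST assigns a sequence type (ST) by matching the allele numbers at seven housekeeping loci against a profile table. The sketch below shows only that final lookup step (allele calling from the assembly is omitted) and is not the Center for Genomic Epidemiology's implementation. The two profiles are given as we understand the PubMLST S. aureus scheme and should be verified there before any real use.

```python
# The seven housekeeping loci of the S. aureus MLST scheme, in profile order.
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")

# Allele profiles for two STs named in this study. Values are given as we
# understand the PubMLST S. aureus scheme; verify against PubMLST before use.
PROFILES = {
    (3, 3, 1, 1, 4, 4, 3): "ST8",
    (3, 35, 19, 2, 20, 26, 39): "ST398",
}


def assign_st(alleles):
    """Map a locus -> allele-number dict to an ST name, or None if unknown."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    return PROFILES.get(tuple(alleles[locus] for locus in LOCI))


sample = dict(zip(LOCI, (3, 35, 19, 2, 20, 26, 39)))
print(assign_st(sample))  # ST398
```

Returning None for an unmatched profile mirrors how a novel allele combination would be reported as an unknown ST rather than forced into the nearest known one.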
General genomic features of the putative prophages. The PHAge Search Tool Enhanced Release (PHASTER) 63 was used to identify and annotate prophage sequences in the 60 S. aureus genomes. Prophage regions with a PHASTER score below 70 were considered incomplete, those scoring between 70 and 90 questionable, and those scoring above 90 intact/complete. The intact prophage sequences were extracted from their host S. aureus genomes, open reading frames (ORFs) were predicted using GeneMarkS 64 , and the predicted genes were searched against the NCBI database using BLASTP 65 . The lifestyles of the intact prophages were classified using PHACTS (Phage Classification Tool Set) 66 . tRNAscan-SE v.1.21 was used to identify tRNA-coding regions in the prophage sequences 67 . Finally, the intact/complete prophage genomes were re-annotated using Prokka 1.14 68 .
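The PHASTER completeness thresholds translate directly into a small classification function. This is a sketch: which category a score of exactly 70 or 90 falls into is an assumption here, and the region names and scores are hypothetical.

```python
def phaster_category(score: float) -> str:
    """Completeness category for a PHASTER prophage-region score.

    Cut-offs: below 70 incomplete, 70-90 questionable, above 90 intact.
    Assigning exactly 70 and 90 to "questionable" is an assumption made
    for this sketch.
    """
    if score < 70:
        return "incomplete"
    if score <= 90:
        return "questionable"
    return "intact"


# Hypothetical region scores, used only to show the filtering step.
regions = {"region1": 150, "region2": 80, "region3": 40, "region4": 90}
intact = [name for name, s in regions.items() if phaster_category(s) == "intact"]
print(intact)  # ['region1']
```

Only regions passing this filter would go on to ORF prediction, lifestyle classification, and re-annotation.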
Sequence clustering and phylogenetic relationship of the prophages. PHASTER identified a total of 65 intact prophage sequences in the S. aureus strains. These nucleotide sequences were aligned using MAFFT (Multiple Alignment using Fast Fourier Transform) v7.475 69 . The aligned sequences were then analyzed in SplitsTree4 70 to generate hierarchical clusters, which were displayed as a phenogram using the BioNJ algorithm 71 .
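Distance-based methods such as BioNJ build a tree from a matrix of pairwise distances computed from the alignment. The paper does not state which distance was used, so the sketch below assumes an uncorrected p-distance with pairwise deletion of gapped columns; it produces the distance matrix only and does not implement BioNJ. The three-sequence alignment is hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def distance_matrix(alignment):
    """Symmetric pairwise p-distance matrix as a dict keyed by (name, name)."""
    names = list(alignment)
    dist = {(n, n): 0.0 for n in names}
    for i, j in combinations(names, 2):
        dist[i, j] = dist[j, i] = p_distance(alignment[i], alignment[j])
    return names, dist


# Hypothetical three-sequence alignment, for illustration only.
aln = {"p1": "ACGTAC-T", "p2": "ACGTTCAT", "p3": "TCGTAC-A"}
names, dist = distance_matrix(aln)
for i in names:
    print(i, [round(dist[i, j], 3) for j in names])
```

Pairwise deletion keeps more sites per comparison than dropping every gapped column globally, which matters for mosaic prophage alignments with many indels; the trade-off is that distances are computed over different site sets.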
Comparative genomic analyses of S. aureus prophages. The intact prophage sequences were analyzed in silico for antibiotic resistance genes (ARGs) and virulence factor genes (VFGs) using CARD 72 and the VirulenceFinder-2.0 server 73 , respectively. A heatmap illustrating the presence or absence of VFGs was generated using Morpheus 74 . The intact prophage sequences were used for a pan-genome comparison with TBLASTX in the GView server 75 (https://server.gview.ca/), using prophage phiH14-1 as the seed genome. Furthermore, the prophage sequences of each cluster or clade were analyzed for core and accessory genomes using the Spine and AGEnt version 0.3.1 web server 76 . The numbers of core and accessory genes of the prophages in the gene pool of each cluster or clade were extracted, and a flower plot was generated using the plotrix package in RStudio 1.3 (RStudio Team, 2020) 77 . The intact prophage sequences of each cluster were aligned using Easyfig version 2.2.3 78 . Easyfig alignments were performed on selected groups of prophages, based on their clusters, to show regions of sequence identity with their closest phages (phiNM-3, phiStauST398-3, and phi2958PVL) as identified by PHASTER.
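The presence/absence heatmap rests on turning gene hits into a binary matrix. A sketch assuming thresholds of 90% identity and 60% coverage, which we believe are VirulenceFinder's default web settings; the paper does not report the thresholds it used, and the `Hit` record and hit values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    prophage: str
    gene: str
    identity: float  # percent identity to the reference gene
    coverage: float  # percent of the reference gene length covered


def presence_matrix(hits, min_identity=90.0, min_coverage=60.0):
    """Prophage x gene presence/absence (1/0) matrix from thresholded hits.

    Prophages that appear in hits but have no passing hit get an all-zero
    row; genes with no passing hit are dropped from the columns.
    """
    kept = [h for h in hits if h.identity >= min_identity and h.coverage >= min_coverage]
    prophages = sorted({h.prophage for h in hits})
    genes = sorted({h.gene for h in kept})
    present = {(h.prophage, h.gene) for h in kept}
    rows = [[int((p, g) in present) for g in genes] for p in prophages]
    return prophages, genes, rows


# Hypothetical hits for two unnamed prophages (illustration only).
hits = [
    Hit("p1", "sak", 99.5, 100.0),
    Hit("p1", "scn", 98.0, 100.0),
    Hit("p2", "sak", 85.0, 100.0),   # fails the identity threshold
    Hit("p2", "sea", 97.0, 40.0),    # fails the coverage threshold
    Hit("p2", "scn", 100.0, 100.0),
]
prophages, genes, rows = presence_matrix(hits)
print(genes)  # ['sak', 'scn']
for p, row in zip(prophages, rows):
    print(p, row)
```

A matrix of this shape, with prophages as rows and genes as columns, is the kind of input a heatmap tool such as Morpheus displays.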

Use of human subjects or animals in research.
This article does not contain any studies involving human or animal participants performed by any of the authors.

Data availability
The following information was supplied regarding data availability: The S. aureus genome sequences used in this study are available at https://www.ncbi.nlm.nih.gov/nucleotide/ or https://www.patricbrc.org/remote under the accession numbers given in Table S1.