Genomic comparisons and phylogenetic analysis of mastitis-related staphylococci with a focus on adhesion, biofilm, and related regulatory genes

Mastitis is a common and costly disease on dairy farms, commonly caused by Staphylococcus spp., though the various species are associated with different clinical outcomes. In the current study, we performed genomic analyses to determine the prevalence of adhesion, biofilm, and related regulatory genes in 478 staphylococcal isolates from clinical and subclinical mastitis cases whose genomes were deposited in public databases. The most prevalent adhesin genes (ebpS, atl, pls, sasH and sasF) were found in both clinical and subclinical isolates. However, the ebpS gene was absent in subclinical isolates of Staphylococcus arlettae, S. succinus, S. sciuri, S. equorum, S. gallinarum, and S. saprophyticus. In contrast, the coa, eap, emp, efb, and vWbp genes were present more frequently in clinical (vs. subclinical) mastitis isolates and were highly correlated with the presence of the biofilm operon (icaABCD) and its transcriptional regulator, icaR. Co-phylogenetic analyses suggested that many of these adhesin, biofilm, and associated regulatory genes could have been horizontally disseminated between clinical and subclinical isolates. Our results further suggest that several adhesin, biofilm, and related regulatory genes, which have been overlooked in previous studies, may be of use for virulence profiling of mastitis-related Staphylococcus strains or as potential targets for vaccine development.

www.nature.com/scientificreports/
adhesins in genomes of Staphylococcus spp. isolates from cattle 5. In addition, the ica genes, which are associated with the synthesis of polysaccharide intercellular adhesin (PIA), are thought to play a crucial role in biofilm development in these bacteria. Molecular epidemiology-based methods such as specific PCR assays, MLST, and PFGE have been used to analyze the genetic diversity and virulence factors and to track the dissemination of Staphylococcus spp. infections, but they have limitations 1. Accordingly, the current work aimed to ascertain the prevalence of adhesion and biofilm genes by investigating whole genome sequences of Staphylococcus spp. from clinical and subclinical mastitis cases, to evaluate the phylogenetic relationships of these isolates, and to determine whether any adhesin or biofilm genes are associated with acute bovine/bubaline mastitis.
In strains associated with clinical mastitis, the ebpS (83.8%), atl (83.8%), sasF (83.8%), sasH (77.5%), rbf (56.3%), tcaR (56.3%), sarA (56.3%), sigB (56.3%), pls (55.0%), sasA (47.5%), pls (37.5%) and sasC (30.0%) genes were most frequently detected. The carriage of adhesin/biofilm related genes in isolates associated with subclinical mastitis was less frequent (e.g., ebpS (68.8%), atl (68.3%), sasF (68.1%), atl (51.3%), rbf (
Phylogenetic analyses reveal no clear relationship between clinical and subclinical isolates. Analysis of the 16S rRNA genes from the genome sequences of the Staphylococcus spp. from bovine and buffalo mastitis cases revealed that the clinical and subclinical isolates (n = 478) are present in a wide variety of clades and do not show any clear relationship (Supplementary Fig. 1). The 16S rRNA gene phylogeny also indicated that the mastitis-related S. aureus, S. epidermidis, and S. capitis have a close phylogenetic relationship. These species also possess many adhesion genes (avg. no. = 26, 11, and 17, respectively), followed by S. chromogenes and S. warneri (avg. no. = 9 and 12, respectively). S. capitis has a close phylogenetic relationship to the species that are mainly associated with clinical mastitis (S. aureus and S. epidermidis). S. chromogenes, which was implicated in cases of clinical (n = 23/80) and subclinical mastitis (n = 61/398), is most closely related to S. agnetis and S. hyicus, species that were only associated with subclinical mastitis. The "subclinical species" S. saprophyticus, S. xylosus, S. gallinarum and S. arlettae formed a distinct node, with few strains involved in clinical mastitis and with most of these strains not carrying known adhesion/biofilm related genes. The "subclinical species" S. warneri and S. pasteuri were also phylogenetically related and carried biofilm/adhesion associated genes (n = 35; avg. no. of genes = 12 and 10, respectively).
No specific pattern was observed between clinical and subclinical strains based on ebpS, rbf, sarA, sasH, sigB, and tcaR gene phylogeny  Fig. S13). Although the co-phylogenetic analysis suggested that HGT might be taking place, further investigation of the association of these virulence genes with mobile genetic elements, such as transposons and S. aureus pathogenicity islands (SaPIs), must be carried out to test the HGT hypothesis.
Data analysis indicates adhesion and biofilm genes exclusively related to clinical isolates. Hierarchical Table 1) contained most of the strains that harbored a typical pattern of nine genes (rbf, pls, sasF, sarA, atl, sasH, sigB, tcaR and ebpS) in both clinical and subclinical isolates. This pattern is also demonstrated in the heatmap of the gene frequency (Fig. 4).

Discussion
Staphylococcus spp. are the most common etiologic agents of mastitis, with S. aureus considered the most important and coagulase-negative and non-aureus staphylococci considered less significant 6. Based on 16S rRNA identification of the 478 available genome sequences, S. chromogenes (28.7%) and S. simulans (20.0%) were the staphylococcal species most frequently associated with clinical mastitis. S. aureus was the next most prevalent species (18.7%) associated with clinical mastitis and it was rarely (3.3%) associated with subclinical

Table 2. Relative and absolute frequency of adhesin, biofilm genes, and related regulatory genes of staphylococcal species associated with clinical and subclinical mastitis.

Protein                                         Gene  Clinical    Subclinical
Clump factor A                                  clfA  18.7% (15)  9.05% (36)
Clump factor B                                  clfB  16.2% (13)  3.27% (13)
Collagen adhesion                               cna   6.25% (5)   4.77% (19)
Fibronectin binding protein A                   fnbA  21.2% (17)  7.29% (29)
Fibronectin binding protein B                   fnbB  11.2% (9)   6.03% (24)
Elastin binding protein                         ebpS  83.8% (67)  68.8% (274)
Staphylococcal protein A                        spa   20.0% (16)  7.04% (28)
Ser-Asp rich fibrinogen-binding protein C       sdrC  17.5% (14)  9.80% (39)
Ser-Asp rich fibrinogen-binding protein D       sdrD  7.50% (6)   2.26% (9)
Ser-Asp rich fibrinogen-binding protein E       sdrE  22.5% (18)  11.5% (46)
Staphylocoagulase                               coa   17.5% (14)  3.52% (14)
Extracellular adherence protein Eap/Map         eap   17.5% (14)  3.52% (14)
Extracellular matrix protein-binding protein    emp   17.5% (14)  3.27% (13)
Fibrinogen binding protein                      efb   17.5% (14)  3.77% (15)
Secreted von Willebrand factor-binding protein  vWbp  18.7% (15)  6.78% (27)
Bifunctional autolysin                          atl   83.8% (67)

3%). These findings are consistent with a growing number of studies which report that coagulase-negative staphylococci are emerging pathogens associated with mastitis and persistence of intramammary infection in bovines worldwide 7. As observed in this study, S. chromogenes and S. simulans were among the most common species found in clinical mastitis cases in a recent Canadian study 8. Adherence is considered the first step of staphylococcal infection and the presence of biofilm aids in the process. Accordingly, adhesion-related genes are thought to be key virulence factors 8. In the current study, the most frequently observed adherence and biofilm-forming genes were ebpS, atl, pls, sasH, sasF, rbf, tcaR, sarA and sigB in both clinical and subclinical isolates. The genomes of some subclinical species, i.e., S. arlettae, S. succinus, S. sciuri, S. equorum, S. gallinarum and S. saprophyticus, lacked most of the adherence and biofilm genes, which could indicate that these species are more likely to be contaminants associated with the milk microbiota 9,10 rather than subclinical mastitis agents. Nevertheless, S. equorum, S. sciuri, S. gallinarum, and S. succinus isolates have been associated with skin and urinary tract infections in humans and mice [11][12][13], which indicates that they can potentially harbor virulence genes. These findings support previous reports on the high prevalence of the ebpS gene in subclinical mastitis staphylococcal isolates from China, Iran and Poland [14][15][16]. The elevated incidence of this gene in these isolates was attributed to the fact that it mediates binding to surface proteins or soluble elastin peptides on host cells, which is important because cell binding is the first step of staphylococcal infection 14. The atl gene was the second most frequently detected gene. It encodes an autolytic protein that can cause the lysis of other bacteria that compete with Staphylococcus spp. for the acquisition of nutrients in the milk 17. This gene is also associated with bacterial internalization and secretion of proteins in S. aureus 18. Its presence in most mastitis isolates could be attributed to the fact that it is implicated in diverse functions such as bacterial attachment to surfaces, lysis-mediated biofilm formation and secretion of cytoplasmic proteins from the staphylococcal cell wall. The atl gene has also been implicated in adherence to fibronectin, heparin, and gelatin 19, which could confer an advantage during infection, as heparin is released by mast cells and basophils at the site of tissue damage 20. The same could be noted about the pls gene, which encodes the plasmin-sensitive protein that has a role in adherence and is an important virulence factor in a mouse septic arthritis model 4.
To date, and atl genes have not been well studied, even in recent genomic comparison studies of S. aureus isolates from bovine mastitis 21,22.
The surface proteins encoded by sasH and sasF play an essential role in virulence because they can bind to host extracellular matrix and plasma components. They have recently been reported as the most prevalent adhesins in a genome comparison study of 24 bovine-associated staphylococcal isolates, with all isolates positive for both genes 5. The sasH gene is significantly associated with invasive disease isolates due to its ability to inhibit the oxidative burst and promote S. aureus survival in neutrophils 20, thus allowing the organism to avoid the bovine immune response and colonize the mammary gland. In the current study, the sasH gene was not only present in S. aureus, but was also detected in S. chromogenes, S. haemolyticus, S. simulans, S. agnetis, S. capitis, and S. warneri. In contrast, little is known about sasF and its role in virulence, but it is believed that it may have an important role in thromboembolic lesions 23 and in advanced stages of mastitis, when capillary damage is caused by S. aureus 24.
The coa, eap, emp, efb and vWbp genes were most frequently present in clinical mastitis isolates and their presence was highly correlated with the presence of the icaABCD and icaR genes. This correlation was not observed in subclinical isolates. The presence of coagulase is commonly associated with virulence since it is known that coa-positive strains are more resistant to neutrophil activities than those which lack the gene 25. Similarly, vWbp, another known coagulase in Staphylococcus, likely has a similar effect 26. The eap gene was only recently described in strains of S. aureus from subclinical mastitis cases in China 14. Manual examination of the S. aureus SAMN02603524 (NC_021670.1) genome revealed that the emp gene is located 300 nucleotides downstream from the vWbp gene (Supplementary Fig. 14), but no other close spatial relationships were observed with the other genes. Although the eap, emp and vWbp genes do not share a common promoter, there is evidence that their expression is regulated by a conserved octanucleotide sequence (COS) and, since they are involved in modulating the immune response to S. aureus infections or antibiotics, it is possible to assume that the emp gene would work with the vWbp gene in S. aureus immune response evasion 27,28. The eap gene product has recently been shown to suppress the formation of "neutrophil extracellular traps" (NETs), which are thought to function as a neutrophil-mediated extracellular trapping mechanism 29. The icaABCD operon contains the most studied Staphylococcus biofilm-forming genes, which are also the most frequently reported in mastitis isolates, highlighting the potential of these isolates to form biofilm 30. In the current study, the icaC gene was the most prevalent in clinical and subclinical isolates, in contrast to a previous report in which icaA and icaD were the most prevalent 31. In addition, both coagulase-negative Staphylococcus (CoNS) and S. aureus possessed the icaA and/or icaD gene in this study, in contrast with a previous finding that icaA was observed only in CoNS strains while icaD was found in both CoNS and S. aureus. The most prevalent biofilm regulatory genes detected were rbf, tcaR, sarA and sigB, which agrees with a previous finding of a high prevalence of sarA and tcaR in S. aureus isolates from bovine subclinical mastitis 32. The rbf gene is an important biofilm regulatory gene and its inactivation results in a biofilm-negative phenotype 33. It has recently been shown that rbf mutants exhibit significantly increased pathogenicity compared to wild-type S. aureus strains 34, suggesting an important role in host adaptation. The rbf gene product negatively regulates hemolytic activity by repressing the expression of the hla and psmA genes. It also upregulates sarX, which, in turn, activates the icaADBC locus, leading to biofilm production 35.
The tcaR gene increases the production of PIA by regulating the expression of the icaADBC operon and the spa, sasF and sarS genes 35. Given the high frequency of the sasF gene observed in this study, detection of its transcriptional regulators was not unexpected. The SarA family of transcriptional regulator proteins is responsible for controlling many target genes involved in virulence. Most notably, SarA is responsible for regulating the agr locus, which is a pivotal regulator of virulence in S. aureus 36. In a recent study, the sarA gene was detected in all 84 S. aureus isolates from mastitis cases in Xinjiang, China 36. The RNA polymerase sigma factor (SigB) has a central role in stress homeostasis. This protein contributes to the synthesis of several virulence determinants defining staphylococcal pathogenesis, including the transcriptional activation of many surface proteins (such as clfA and fnbA) while downregulating the production of secreted toxins and proteases (such as Aur, SspA, SspB) 37.
Phylogenetic analysis of Staphylococcus spp. related to mastitis has been done before, but studies to date have focused on comparatively few isolates, mainly of S. aureus, and observed that strains of different origin clustered together 38. In this study, phylogenetic analysis of the 16S rRNA genes indicated that S. aureus, S. epidermidis, S. caprae and S. capitis have a close relationship, as observed in human clinical isolates 39. These species also all possessed and shared a large number of adhesion genes. In a previous study, some authors suggested that dairy cows can be subclinically infected with S. aureus subtypes that can cause clinical mastitis if the right conditions are present 38. In the current study, some clinical and subclinical strains clustered together based on their 16S rRNA sequences, but they had different biofilm and coagulase gene contents. This study also shows that S. chromogenes isolates from cases of clinical and subclinical mastitis were closely related to S. agnetis and S. hyicus, suggesting that these species could have the same potential as S. chromogenes to become emerging mastitis agents 40.
S. aureus is reported to acquire and disseminate SaPIs through HGT events mediated by phages 41. Moreover, S. aureus colonization of different host species is known to be facilitated by the HGT of virulence factors across different staphylococcal species 42. It is further known that biofilm growth can increase the rate of HGT of virulence determinants such as antibiotic resistance genes 43. In this study, the co-phylogenetic analysis suggested that HGT may occur among clinical and subclinical isolates of S. chromogenes, S. aureus, and S. simulans (mainly of ebpS, rbf, sarA, tcaR, and pls), and of the sigB gene in S. aureus. Moreover, the phylogenetic relationships of the adhesion and biofilm genes ebpS, sasH, atl, sarA, rbf and tcaR differ from the 16S rRNA phylogenetic distribution. This finding is consistent with the notion that HGT occurs among clinical and subclinical isolates 44. It is therefore tempting to speculate that virulence factors may arise in staphylococcal species not generally associated with clinical mastitis through known Staphylococcus HGT mechanisms, but further study is needed to demonstrate this.
Although the arbitrary source of the isolates used (478 staphylococcal spp. isolated from clinical and subclinical mastitis cases in Brazil, Canada, India, the Netherlands, and the United States that had been sequenced) might have introduced some biases, a number of the adhesin, biofilm, and related regulatory genes identified in this study might be useful for virulence profiling or as targets for vaccine development for mastitis-related Staphylococcus species.

Methods
Genomic data. The genomes of Staphylococcus spp. from clinical (n = 80) and subclinical (n = 398) mastitis cases worldwide were downloaded from the National Center for Biotechnology Information (NCBI). An initial advanced search of the NCBI BioSample database with "mastitis" and "staphylococcus" as keywords resulted in 925 entries. After this initial step, only complete genomes that were identified as mastitis isolates from Bos taurus or Bubalus bubalis, and in which the isolate was the sole agent associated with the disease, were evaluated. For more detailed information, the associated publications or BioSample descriptions were also evaluated (Supplementary Table 1). It was assumed that mastitis states were classified according to the clinical presentation and standard triage test described by Radostits et al. 3. Genomes in this study were from bacteria isolated in Brazil 45
Genome annotation and adhesion-related gene identification. Genomes were annotated using Rapid Annotation using Subsystem Technology (RAST) 47,48. The sequences of the genes classified as adhesion/adhesins or implicated in biofilm formation and their respective regulatory genes were downloaded and analyzed manually. The genes were considered to encode adhesins or to play a role in biofilm formation based on their classification in the VFDB reference database for bacterial virulence factors 49 and/or in the RAST annotation engine 47. 16S rRNA gene sequences were obtained from the complete genomes using the Basic Rapid Ribosomal RNA Predictor (Barrnap) v 0.9 (https://github.com/tseemann/barrnap).

Data analysis. The presence or absence of selected genes was used in hierarchical clustering analysis with PAST software v4.03 50. Clusters of the isolates were created based on the most and least frequent genes. The Spearman test was used to analyze the correlation between the presence/absence of adhesin and biofilm genes in both clinical and subclinical mastitis isolates (a coefficient close to 1.0 indicates a high correlation). Gene profiling by frequency heatmaps was calculated using NumPy v1.20.3 51. Graphs were made using Matplotlib v3.4.2 52 and, when needed, with R software v4.1.0 53. The statistical significance of gene presence and mastitis state was obtained by logistic regression with R software.
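The frequency and correlation calculations described in this section can be illustrated with a minimal Python sketch. It is not the authors' code: the toy presence/absence vectors and the function names (average_ranks, pearson, spearman, gene_frequency) are hypothetical, and it uses only the standard library rather than PAST, NumPy or R. It assumes each gene is coded 1 (present) or 0 (absent) per isolate and computes the Spearman coefficient as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gene_frequency(column):
    """Return (percentage, count) of isolates that carry a gene."""
    count = sum(column)
    return 100.0 * count / len(column), count


# Hypothetical toy data: one 0/1 entry per isolate (not taken from the study).
presence = {
    "coa":  [1, 1, 0, 0, 1, 0],
    "icaA": [1, 1, 0, 0, 1, 0],
    "ebpS": [1, 1, 1, 0, 1, 1],
}

rho_coa_icaA = spearman(presence["coa"], presence["icaA"])
rho_coa_ebpS = spearman(presence["coa"], presence["ebpS"])
ebpS_percent, ebpS_count = gene_frequency(presence["ebpS"])
```

As a check against Table 2, gene_frequency applied to a vector with 67 ones among 80 clinical isolates gives 83.75%, which rounds to the 83.8% reported for ebpS. For binary data the Spearman coefficient coincides with the phi coefficient, so a value near 1.0 means two genes are almost always present or absent together.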