Genomic analysis of European bovine Staphylococcus aureus from clinical versus subclinical mastitis

Intramammary infections (IMI) with Staphylococcus aureus are a common cause of bovine mastitis and can result in both clinical (CM) or subclinical mastitis (SCM). Although bacterial isolates of S. aureus differ in their virulence potential it is largely unclear which bacterial virulence factors are responsible for increased clinical severity. We performed a genome wide association study and used a generalized linear mixed model to investigate the correlation between gene carriage, lineage and clinical outcome of IMI in a collection of S. aureus isolates from cattle with CM (n = 125) and SCM (n = 151) from 11 European countries. An additional aim was to describe the genetic variation of bovine S. aureus in Europa. The dominant lineages in our collection were clonal complex (CC) 151 (81/276, 29.3%), CC97 (54/276, 19.6%), CC479 (32/276, 11.6%) and CC398 (19/276, 6.9%). Virulence and antimicrobial resistance (AMR) gene carriage was highly associated with CC. Among a selection of nine virulence and AMR genes, CC151, CC479 and CC133 carried more virulence genes than other CCs, and CC398 was associated with AMR gene carriage. Whereas CC151, CC97 were widespread in Europe, CC479, CC398 and CC8 were only found in specific countries. Compared to CC151, CC479 was associated with CM rather than SCM (OR 3.62; 95% CI 1.38–9.50) and the other CCs were not. Multiple genes were associated with CM, but due to the clustering within CC of carriage of these genes, it was not possible to differentiate between the effect of gene carriage and CC on clinical outcome of IMI. Nevertheless, this study demonstrates that characterization of S. aureus CC and virulence genes helps to predict the likelihood of the occurrence of CM following S. aureus IMI and highlights the potential benefit of diagnostics tools to identify S. aureus CC during bovine mastitis.

Heatmaps of the BLAST score ratio (BSR) 25 of all S. aureus genes annotated as (putative) SAs or SSLs by prokka 26 demonstrated that bovine S. aureus CCs differ in their carriage of these immune evasion factors (Supplementary Figs. S1, S2). Notable differences in SA carriage were the high number of SAs (up to 12 different SAs) genes carried by CC151, whereas CC398 isolates lacked SAs. Although most SSLs were detected in all S. aureus, the BSR of these SSLs differed between S. aureus CCs. However, the SSL-7, SSL-8, SSL-9 genes were only absent in CC479 S. aureus. Furthermore, two genes that were annotated as unnamed SSLs (GenBank references: WP_143564871.1 and WP_124375191) were exclusively found among CC97 isolates ( Supplementary Fig. S2).
To screen for target genes that could differentiate between major ruminant CCs in a PCR-based assay, the presence of potential CC exclusive genes was also investigated ( Supplementary Fig. S3, Supplementary Table S1). The highest number of CC-exclusive genes were found for CC479 (n = 17), followed by CC20 (n = 4), CC151 (n = 3), CC8 (n = 2) and CC133 (n = 2). For CC97, CC398, only a single unique gene was identified and no CC exclusive genes were found for CC1 isolates.
Heterogeneous spatial distribution of CCs across Europe. There was a significant difference in the distribution of S. aureus CCs between different countries (Fisher's Exact Test, p < 0.001). Whereas CC151 (10 out 11 countries) and CC97 (9 out 11) S. aureus were detected in almost all countries, CC398 (6 out 11) and CC479 (5 out 11) were considerably less widespread in our collection ( Table 2). The CC398 lineage was predominantly     www.nature.com/scientificreports/ found in isolates from Poland and Spain. Isolates belonging to CC8 originated exclusively from either Italy or Bavaria region in Germany. Also, all CC50 isolates originated from Denmark ( Table 2).

Pangenome of bovine S. aureus and phylogenetic analysis.
To investigate the phylogeny of our bovine S. aureus isolates, the pangenome of the entire collection was determined (n = 5720 genes) and a phylogenetic tree was constructed based on a super-alignment of the 1953 genes in the core genome (genes present in > 99% of isolates) (Fig. 1). A second phylogenetic tree was built based on the binary presence or absence of genes in the accessory genome (Fig. 2). Within both phylogenetic trees, isolates clustered within CC and two  In general, the phylogenetic tree based on the core genome was similar to the tree based on the accessory genome. In addition, the phylogenetic tree shows considerable variation in accessory gene content, most notably within CC151 and CC97, but almost no variation in accessory gene carriage was observed among CC479 and CC133 S. aureus isolates (Fig. 2).
Staphylococcus aureus CC479 is associated with CM. In order to further study the association between S. aureus CC and clinical outcome of IMI, a generalised linear mixed model (GLMM) was built with CM versus SCM as the outcome variable, CC as predictor variable and the country of orgin of isolates as a random effect. For this model, isolates belonging to CCs with n < 10 (CC9, CC50, CC49, CC7, CC45, CC5, CC101, CC22, CC30, and CC425) were clustered together in a single category labeled as 'other' . Including CC improved the fit over the model compared to the model with only random effects (ANOVA, p = 0.003) and isolates belonging to CC479 were more likely to originate from CM than from SCM (OR 3.62; 95% CI 1.38-9.50, p < 0.01)

Clinical manifestation of mastitis is associated with a large number of different genes.
To study genetic differences that could underlay variation in pathogenity of bovine S. aureus isolates, we performed a genome wide association study (GWAS) and 153 genes were associated with CM isolates. In agreement with the results from the GLMM, genes exclusively present in the CC479 isolates (n = 59) had the highest OR, ranging from 4.6 to 5.6 [Benjamini-Hochberg Procedure (BHP); p-values < 0.05]. Among the CM-associated genes, a total of 20 genes matched our inclusion criteria for significant genes of interest (best pairwise comparision p < 0.1; BHP p < 0.1) and the OR of these genes ranged from 1.95 to 3.42 (Table 3). Among these genes, the hypothetical potentially bacteriophage derived DUF3310 domain-containing protein (OR 2.35) was the only gene associated with CM that was detected among all main lineages. Carriage of SA genes seM, seN, seI, seG, seO www.nature.com/scientificreports/ and seU, as well as the accessory gene regulator (agr) D type II gene was associated with CM and these genes were present in all CC479 isolates and the majority of CC151 isolates (Table 3). In addition, three genes from within the S. aureus pathogenicity island bovine 3 (SaPI bov3 ) were also associated with CM and were carried by all CC151, CC479 and 77% of CC97 isolates. Furthermore, 61 genes were associated with SCM (i.e. an OR < 1), from which 10 genes matched our selection criteria and the OR of these genes ranged between 0.09 and 0.44 (Table 4). Most genes (8/10) coded for hypothetical proteins and the genes with predicted function were identified as an antitoxin YezG family protein (OR 0.32) and a putative DNA binding protein (OR 0.29). The SCM-associated genes were mostly carried by CC1, CC20, and CC8 S. aureus, but were always absent from CC479 and most CC151 isolates (Table 4).

Discussion
There is a large diversity in the carriage of virulence and AMR genes between bovine S. aureus lineages 6,9 , but it is unclear which genetic differences underly the observed variation in pathogenicity following bovine IMI. Therefore, this study aimed to identify genetic differences between S. aureus isolated from CM and SCM in dairy cattle. A secondary goal of the study was to describe the diversity of bovine S. aureus lineages in Europe and their carriage of immune evasion factors. We found CC479 to be strongly associated with CM rather than with SCM cases.
Although eighteen different CCs were present in our isolate collection, most S. aureus belonged to a limited number of CCs, with the five CCs (CC151, CC97, CC479, CC133 and CC398) making up more than 75% of all isolates. All these CCs have previously been associated with bovine mastitis 6,19,27,28 . The distribution of S. aureus CCs differed between geographical locations. Although the isolates in our collection were not a random sample, they did originate from 254 unique herds with a maximum of one CM and one SCM isolate from the same herd. The prevalence of S. aureus CCs per country from our study is in line with studies performed in Denmark 29 , Germany 6 and The Netherlands 30 . Nevertheless, these isolates cannot be expected to fully represent the genetic diversity of bovine S. aureus on a national level. Because in addition to a relative low number of isolates, there likely was some clustering within certain regions in several of the countries, which may have resulted in an underestimation of the true genetic diversity in the population. We must also note that the aim of our sampling Table 4. Odds Ratio, carriage rate per clonal complex and GenBank references of genes associated with subclinical mastitis based on a genome wide association study performed on 276 S. aureus isolates obtained from bovine clinical and subclinical mastitis in 11 European countries. a Gene annotation by prokka and confirmed using BLAST search of gene sequence. b Odds Ratio of causing clinical rather than subclinical mastitis from the Genome wide association study. c Best pairwise comparison P value from GWAS performed using scoary. d Carriage rate of gene per clonal complex. e Includes isolates belonging to CC9, CC50, CC5, CC49, CC7, CC45, CC101, CC20, CC30 and CC425.   20,21 , did identify genetic differences between CM and SCM S. aureus isolates, but possibly, some other associations may have been missed in our approach.
A key finding of the current study was the association between S. aureus belonging to CC479 and CM. This corresponds with a previous study which reported that experimental infection with CC479 S. aureus results in more severe clinical signs and a higher bacterial load than infection with a CC151 S. aureus strain 32 . In addition, the association between CC479 and CM was also observed in our previous analysis of Dutch mastitis isolates 20 . The underlaying mechanisms for this apparent increased virulence of CC479 remains unknown, but we have suggested a SNP in the repressor of toxins (rot) gene, resulting in an increased production of LukMF' by CC479 isolates as a possible cause 20 . Our GWAS identified several genes associated with clinical outcome of IMI, but as S. aureus belonging to the same CC share a similar (virulence) gene profile, there was a strong link between carriage of these genes and lineage. Therefore, it was not possible to differentiate between the effect of individual genes/sets of genes and S. aureus lineage on the clinical outcome of IMI. Interestingly, genes associated with CM by GWAS which were all present in CC479 isolates, were also almost all present in CC151 S. aureus (Table 3), whereas the latter CC had an approximately even distribution of isolates originating from CM (51%) and SCM (49%). Since CC151 and CC479 share most CM-associated genes, these genes are likely spuriously associated to CM, due to confounding by other factors within CC479. Differences in gene expression are more likely causes of the increased virulence of CC479, in line with our hypothesis regarding the non-functional rot gene 18 . We confirmed that this mutation was present in all CC479 isolates within our collection (results not shown). It is likely that the absence of functional rot affects the expression levels of multiple virulence genes within CC479 S. aureus 20 , likely leading to substantially increased probability of causing CM. Future research, e.g. infection studies with genetically modified CC479 isolates lacking potential virulence factors or with restored rot function, are required to identify these potential causal mechanisms.
The results from our GLMM suggest that CC8 and CC1 are less likely to cause CM in cows and most of the genes that were associated with CM were carried by CC8 and CC1 isolates. Previous work identified that CC8 predominantly make up the S. aureus genotype B, a highly contagious subtype of S. aureus 28,33 . We only detected CC8 S. aureus among isolates from Italy and Germany, suggesting that this lineage is not widespread throughout Europe. Indeed, reports on CC8/GTB S. aureus primarily originate from Switzerland and Austria 28,34,35 . In contrast, the CC1 lineage was detected in six different countries in this study, although 50% of the CC1 isolates were collected in Poland.
Phylogenetic trees constructed based on core genome alignment and binary presence and absence of accessory genes clustered the isolates perfectly within the assigned CC. This demonstrates that MLST is an adequate genotyping technique, but Fig. 2 shows that considerable variation exists in the accessory genome among isolates within a CC. Interestingly, there was very limited variation in accessory gene carriage among CC479 and CC133 S. aureus isolates and this suggests a rapid clonal expansion of these specific S. aureus clones. The CC133 lineage is considered a ruminant-adapted lineage but is mostly associated with small ruminants 16 . Therefore, it is possible that the highly similar bovine CC133 S. aureus isolates represent a subgroup of CC133 S. aureus that is adapted to cattle and this subgroup, as well as the highly similar CC479 isolates, is considerably more successful at infecting the bovine mammary gland than other CC133, CC479 S. aureus.
Although our analyses showed associations between specific genes and CC with CM, it is important to note that none of these genes were essential for CM in our collection. Clones that were associated with CM cases based on CC or gene carriage were also cultured from SCM and vice versa. Besides pathogen factors, the clinical outcome of an IMI depends on host factors, such as breed, stage of lactation, environmental factors and their interaction 31,36 . Nevertheless, from a diagnostic perspective, the identification of S. aureus isolates with an increased risk of developing into CM (e.g. CC479) or highly contagious isolates (e.g. CC8) can support farmers and veterinarians in deciding on the most appropriate intervention strategy to control S. aureus in a herd. For Streptococcus uberis, it has been demonstrated that MALDI-TOF MS performed on milk samples can identify S. uberis isolates with increased CM risk 37 . However, it is not certain if this technique is able to distinguish between different CCs of S. aureus. There are several PCR based test that detect pathogen-specific genes in milk and an assay for the detection of CC8 (genotype B) S. aureus has already been developed 44 . We identified multiple CCspecific genes that could be employed to differentiate between common ruminant-associated S. aureus lineages (Supplementary Table S1).
In summary, this study identified that a limited number of S. aureus CCs are responsible for bovine mastitis in Europe and that CC479 is strongly associated with CM. Although our analysis shows specific genes are associated with CM, the presence of these genes in CC associated with SCM indicates it is likely that differential gene expression rather than gene carriage affects clinical presentation of IMI. This study shows that the type of infecting S. aureus influences the clinical outcome of IMI in dairy cattle. Therefore, identification of S. aureus CC can help predict the likelihood of the occurrence of CM following S. aureus IMI and highlights the potential benefit of diagnostics tools to identify S. aureus CC during bovine mastitis.

Methods
Bovine mastitis isolates. Twelve  Clinical mastitis was defined as visible signs of inflammation of the udder, indicated by a hard swollen quarter and/or abnormal milk; SCM was defined as a high somatic cell count (> 200,000 cells/mL) while no clinical signs of mastitis could be observed.. The definitions were communicated to all participating researchers and were stated in a workbook to ensure that a uniform definition of CM and SCM was applied by all participants. No constraints were put on duration or chronicity of infections. The clinical status (CM or SCM) was observed and recorded at the moment of sampling.
Isolates were recultured from their transport media onto to sheep blood agar plates, and incubated overnight at 37 °C. Next, single colonies were picked and grown in 2 mL Todd Hewitt Broth (THB) (Sigma, St. Louis, MO, USA) for 16 h at 37 °C with agitation. Bacterial glycerol stocks (25% glycerol) were made by adding 0.5 mL of bacterial broth to 0.5 mL 50% glycerol solution in distilled water. Furthermore, DNA was extracted using a simple boiling protocol to confirm bacterial species by PCR targeting the S. aureus specific femA gene, as described by Hoekstra et al. 20 . Isolates that were negative for the femA PCR or lacked mandatory metadata were excluded from the final collection. For each herd, a maximum of one isolate from a CM case and one from a SCM case was allowed and if multiple CM or SCM isolates were donated from the same herd, one was selected using the random number function of Excel 2015 (Microsoft, Redmond, WA, USA). DNA extraction, genome sequencing and multilocus sequence typing. DNA for whole genome sequencing was extracted using the DNeasy UltraClean Microbial Kit (Qiagen, Venlo, The Netherlands) according to the manufacturer's instructions. Purity and DNA yield were measured by spectrophotometry and wholegenome sequencing was performed using Illumina HiSeq sequencing (Illumina Inc., San Diego, CA, United States). Multilocus sequence type was determined based on genome data. Each sequence type was assignment to a CC based on eBURST analysis using PHYLOViZ Online 38 .
Annotation of genomes, pan-genome analysis and phylogenetic analyses. After quality control of whole genome sequence results, genomes were annotated using prokka v1.11 26 and the pan/core genome (genes present in 99% < of genomes) determined using roary v3.12 39 . Alignment of the core genome was performed using MAFFT v7.407 and the phylogenetic trees was built with Fasttree v2.1 40 . Subsequently, trees were visualized using iTOL v3.6 41 and trees were rooted using the CC22 clade. The large-scale BLAST score ratio (LS-BSR) pipeline was used to obtain matrixes with BLAST score ratio of each annotated gene 25 . For each isolate, the presence of the genes encoding leukocidin LukMF' (lukM, GenBank accession: 1262967; lukF′, GenBank accession: 1262954), ruminant-specific Staphylococcal complement inhibitor variant (scn, Genbank accession: ADN53656.1), SaPI encoded ruminant specific vWFbp variant (SaPI vWFbp, GenBank accession: HM234507.1), enterotoxin type I (seI, GenBank accession: EFC00985.1), enterotoxin L (seL, GenBank accession: BAO65763.1), toxic shock syndrome toxin 1 (tsst-1, GenBank accession: WP_001035596.1), penicillin-hydrolyzing class A beta-lactamase (blaZ, GenBank accession: WP_000733621.1), tetracycline resistance protein type M (tetM, Gen-Bank accession: AKI94996.1) and penicillin binding protein 2A (mecA, GenBank accession: WP_104447100.1) was determined using LS-BSR output. Genes were identified using a threshold value of BSR of > 0.9 compared to the reference gene. In addition, a heatmap of LS-BSR score of isolates was created for genes annotated as SLL or SA by prokka 26  Statistical analysis. The GLMM analysis was performed using the lme4 package 44 of the R statistical software version v3.5.4 43 . The model used clinical manifestation of mastitis (SCM, CM) as outcome variable and CC of S. aureus was a fixed effect. The country of origin of isolates was used as a random effect in the model. To reduce the number of levels within the variable CC, CCs represented by < 10 isolates were grouped together into a category 'Other' . Furthermore, the association between CC and country of origin of mastitis isolates was investigated by Fisher's exact test and association between genetic cluster and clinical manifestation was investigated using the Pearson's Chi-squared test. Both tests were performed using the R statistical software version v3.5.4 43 . GWAS. Based on the pangenome analysis in roary, GWAS was performed using scoary v1.6.16 45 . Default settings of scoary were used during our analysis with an initial threshold of naive p < 0.01. Genes that matched our inclusion criteria (BHP p < 0.1; best pairwise comparison p value of p < 0.1) were considered significant results of interest.