Metagenomic deep sequencing reveals association of microbiome signature with functional biases in bovine mastitis

Milk microbiomes significantly influence the pathophysiology of bovine mastitis. To assess the association between microbiome diversity and bovine mastitis, we compared the microbiome of clinical mastitis (CM, n = 14) and healthy (H, n = 7) milk samples through deep whole metagenome sequencing (WMS). A total of 483.38 million reads generated from both metagenomes were analyzed through PathoScope (PS) and MG-RAST (MR), and mapped to 380 bacterial, 56 archaeal, and 39 viral genomes. We observed distinct shifts and differences in abundance between the microbiome of CM and H milk in phyla Proteobacteria, Bacteroidetes, Firmicutes and Actinobacteria with an inclusion of 68.04% previously unreported and/or opportunistic strains in CM milk. PS identified 363 and 146 bacterial strains in CM and H milk samples respectively, and MR detected 356 and 251 bacterial genera respectively. Of the identified taxa, 29.51% of strains and 63.80% of genera were shared between both metagenomes. Additionally, 14 archaeal and 14 viral genera were found to be solely associated with CM. Functional annotation of metagenomic sequences identified several metabolic pathways related to bacterial colonization, proliferation, chemotaxis and invasion, immune-diseases, oxidative stress, regulation and cell signaling, phage and prophases, antibiotic and heavy metal resistance that might be associated with CM. Our WMS study provides conclusive data on milk microbiome diversity associated with bovine CM and its role in udder health.

of analytical tool, while the rest of the genera had relatively lower mean abundances (<0.10%). In contrast, the H milk metagenomes also had higher mean relative abundances of Acinetobacter (52.90%) in PS and MR pipelines followed by Pseudomonas (22.81%), Micromonospora (10.57%), Eubacterium (5.37%), Catenibacterium (2.12%), and Ralstonia (0.12%), and the rest of the genera had much lower abundances (<0.10%). In general, MR detected higher numbers of microbial genera than PS (Supplementary Tables 2 and 3), however results from the both tools were concordant, with 98.00% of the total microbial abundance composed of shared genera (Supplementary Table 4; Data 2). CM-associated bacteria changes at the strain level. We further investigated whether strain level relative abundances of the bacteria differed between CM and H samples (Figs 3 and 4). The CM milk metagenome had significantly (p = 0.001) higher number of bacterial strains than the H milk, and among the detected strains, 62.85% had unique association with bovine CM, and 7.63% were solely found in H milk (Fig. 1a). The presence of few predominating bacterial species in both categories of samples suggests that the crucial differences might be occurring at the strain level, and most of the species identified in each sample were represented by a single strain. The CM milk metagenome was dominated by 26 strains (7.16%) of Acinetobacter species while Pseudomonas, Streptococcus, Corynebacterium, Staphylococcus, Enterococcus, Bacillus and Escherichia species were represented by 22,16,12,11,8,7 and 6 different strains, respectively. However, in both metagenomes, Acinetobacter johnsonii XBB1 remained as the most abundant strain with a relative abundance of 39 Importantly, this study demonstrated that 68.04% of the detected bacterial strains were exclusively found in CM milk metagenome, and among them Pantoea dispersa EGD-AAK13, Klebsiella oxytoca, Kluyvera intermedia, Shewanella oneidensis MR-1, Kluyvera ascorbata ATCC 33433, Klebsiella aerogenes KCTC 2190, Kluyvera cryocrescens NBRC 102467, Acinetobacter pittii PHEA-2, Pseudomonas mendocina ymp and Acinetobacter gyllenbergii NIPH 230 were the most predominant strains. Furthermore, most of these strains were previously unreported and possibly played an opportunistic role in the mammary gland pathogenesis (Supplementary Data 2; Table 5).

CM-associated changes of archaea and viruses at the genus level.
Another noteworthy finding of this study is the detection of archaeal (relative abundance 0.13%, p = 0.025) and viral (relative abundance 0.38%, p = 0.019) components of the microbiome both in CM and H milk samples (Fig. 5a,  (12.30%), Methanocaldococcus (2.59%), Methanobrevibacter (1.85%), Thermococcus (1.79%) and Methanosphaera (1.53%) archaeal genera with a lower relative abundance (<0.05%) of the rest of the genera (Fig. 5a). Interestingly, none of the archaeal genera were detected in one CM sample (Ctg3C2 Microbial metabolic functions associated with cM. MR simultaneously analyzed and compared the taxonomic composition and functional profiles of our metagenomic sequences in several ways. On average, the putative genes with known and unknown protein functions were 3.94% and 5.51%, respectively, suggesting that a large proportion of the genes encoding for different functional properties are yet unknown (Supplementary Data 1). By comparing the number of genes assigned to each KEGG pathway between the groups, we found a www.nature.com/scientificreports www.nature.com/scientificreports/ series of significant differences (p = 0.001) that lead to the functional divergence between the CM and H milk microbiome groups. The PCoA analysis of functional components showed significant differences between the CM and H samples indicating significant functional differences (p = 0.035) ( Supplementary Fig. 4). In the comparative analysis, we found that genes associated with metabolism (central carbohydrate, amino acids, cofactors, vitamins, prosthetic groups and pigment), substrate dependence, clustering-based subsystems, cell motility (bacterial chemotaxis, flagellar assembly, invasion of epithelial cells), phases, prophages, transposable elements and plasmids, regulation and cell signaling, stress response, virulence, disease and defense, and immune and infectious diseases were significantly (p < 0.05) overrepresented and positively correlated with bovine CM (Figs 6 and 7; Supplementary Data 3).
Genes associated with citrate synthase (CS, gltA), fumarate hydratase class I (fumA, fumB), oxidative phosphorylation, bacterial translation, ribosome biogenesis and tRNA amino-acylation were significantly enriched in the metabolic pathways of CM associated microbiomes. The CM associated microbiome had significantly (p < 0.001) higher relative abundance (50.51%) of genes coding for benzoate degradation than the H milk biomes www.nature.com/scientificreports www.nature.com/scientificreports/ (36.41%). The CM samples had upregulation of genes for energy metabolism including carbon, sulfur, and methane metabolism compared to the H samples. The relative abundance of genes encoding ABC transporter (38.97%) and bacterial chemotaxis (68.61%) were significantly higher in CM microbes than those detected in H milk microbiomes (p < 0.005). Among the pathways in infectious diseases, genes coding for epithelial cell signaling, epithelial cells invasion, Legionellosis, Vibrio cholerae pathogenic cycle, Staphylococcus aureus, Salmonella and pathogenic Escherichia coli infection were most abundant in the CM metagenomes. Likewise, there was a predominant abundance of genes responsible for glutathione S-transferase (GST), breakpoint cluster region protein (BCR1), fumarate hydratase class II (fumC) and pyruvate kinase (pk) in different pathways causing mammary gland inflammation. The CM milk microbiomes had a significantly (p < 0.001) higher number of reads (64.29%) coding for severely combined immune deficient gene adenosine deaminase (ADA) compared to H milk microbes (28.58%) ( Supplementary Fig. 5). Furthermore, sporulation related hypotheticals and CRISPR-associated proteins We found that the CM microbiome had significantly higher abundance of genes encoding for oxidative stress (36.46%), pathogenicity islands (10.13%), phage related transposable elements (19.48%), phage packaging machinery (6.37%), phage replication (6.70%) and phage regulatory gene expression (7.10%) compared to those of H milk biomes (p < 0.003). However, the phage lysogenic conversion related genes remained higher in abundance among the healthy milk microbes. A deeper look at microbial genes associated with regulation and cell signaling revealed that CM microbes had significantly higher expression of this gene compared to healthy milk microbiome (p = 0.001). Within this subsystem, genes coding for two-component regulatory system BarA-UvrY (SirA; CM = 85.78% vs H = 67.41%), pericellular trafficking and cell invasion-the membrane type-1 matrix metalloproteinase (MT1-MMP; CM = 86.59% vs H = 73.80%), programmed cell death (CM = 55.00% vs H = 28.57%) and intra-membrane regulatory proteolytic pathway-endoplasmic reticulum chaperon grp78 (BiP; CM = 92.85% vs H = 71.42%) were predominantly found to be associated with the onset of bovine CM. We also identified novel associations of biofilm formation (BF) properties among the microbes identified in both metagenomes. The relative abundance of genes coding for protein YjgK cluster linked to biofilm formation, biofilm PGA synthesis, deacetylase PgaB, N-glycosyltransferase PgaC and auxiliary protein PgaD were over-expressed www.nature.com/scientificreports www.nature.com/scientificreports/ in mastitis-causing pathogens (p = 0.035). In contrast, the genes coding for quorum sensing (QS) in particular to QS in Yersinia, Pseudomonas and Vibrio remained overexpressed in H milk metagenomes. Moreover, of the reads assigned to different levels of SEED subsystems (6.45 million), 2.63% mapped against 30 and 28 different resistance to antibiotic and toxic compounds (RATC) genes in CM and H milk metagenomes, respectively ( Fig. 8; Supplementary Data 3). Among them, genes encoding multidrug resistance (efflux pumps, mdtABCD cluster, CmeABC operon), methicillin, vancomycin, and compounds (arsenic and chromium) resistance had two-fold higher relative abundances in CM microbiomes than H milk microbiomes. There was 5 to 7-fold overexpression of multidrug resistance to MAR locus and mercury resistance genes in CM microbes than in H milk organisms. In addition, CM-causing microorganisms harbored two additional resistance genes; multidrug resistance to operon (mdtRP) and aminoglycoside adenyltransferase (Supplementary Data 3).

Discussion
Over the past decade, metagenomics has helped shed light on the 'known unknown' component of the milk microbiome and enabled insights into its taxonomic composition, dynamics, and importance to cows udder health homeostasis. Metagenomic deep sequencing (WMS) of bovine milk has uncovered previously overlooked microbial populations of high complexity with potential roles in regulation of overall microbiome composition and function, and in the onset, progression, and treatment strategies of bovine CM. Yet today, 16Sr RNA partial gene sequencing remains the dominant technique for characterizing milk microbiomes, and findings are limited to bacterial identification at the genus level 5,9,18 , though this method has serious inherent limitations 19 . However, little is known about the association of other microbes (archaea and viruses), microbiome shift and particularly functional changes. The noteworthy findings of the present WMS study are the taxonomic profiling of bacteria at both the species and/or strain-level, the possible association of the archaeal and viral fractions with bacterial mastitis, and the crosstalk between the identified microbiomes and their functional genomics in the association of bovine CM.
The findings generated from shotgun metagenomic data are much higher in taxonomic resolution and predicted protein functions and are consistent with previous 16S rRNA partial gene based studies 1,9,18 . The core bacteria associated with bovine CM such as Acinetobacter, Pseudomonas, Klebsiella, Escherichia, Enterobacter, Staphylococcus, Streptococcus, Bacillus, Pantoea, Shewanella, Ralstonia etc. remained consistent with the metagenomic data regardless of the analytic tool. Though CM milk samples had relatively higher taxonomic abundance, their abundance remained more inconsistent corroborating several recent findings 5,18,20 . To date, around 50 www.nature.com/scientificreports www.nature.com/scientificreports/ bacterial genera have been reported in bovine milk through 16Sr RNA-based targeted amplicon sequencing 1,9,18,21 , while our current WMS study detected 356 and 251 bacterial genera in CM and H milk, respectively indicating the increased discriminatory power of this cutting-edge technology in identifying taxa in the milk microbiome 10,16 . The observed increase in phylum-level signature of Proteobacteria, Bacteroidetes, Firmicutes and Actinobacteria in CM milk independent of quarter, parity, and breeds of the cows is mostly consistent with many of the previous studies 5,9,21 . Furthermore, the CM milk metagenome had an inclusion of 68.04% previously unreported bacterial species and/or strains, most of which are opportunistic in nature. Until now, no substantial information was available regarding the association of different strains of Acinetobacter with bovine mastitis 22 . In a recent study, association of Acinetobacter causing bubaline CM 7 has been reported, supporting our present findings. The H milk metagenome had higher relative abundance of soil or environmental microbes (Micromonospora) 23 and animal skin microbes (Pseudomonas) 24 , which can potentially act as opportunistic infection leading to disease. Klebsiella pneumoniae is an opportunistic environmental pathogen, and transmission of this bacterium might occur from contaminated feces and bedding materials 25 . The gut microbiome plays a key role in maintenance of nutrition, host defense, and immune development 26 , and we revealed a close association between the gut microbiome and milk microbes in the pathogenesis of bovine CM as also reported previously 27 . Additional support for this finding includes the potential existence of an endogenous entero-mammary pathway, through which gut bacteria migrate to the mammary gland, and this could explain the predominating presence of gut bacteria such as the phyla Proteobacteria, Bacteroidetes, Firmicutes, Actinobacteria, Fusobacteria and Tenericutes, with Acinetobacter, Campylobacter, Bacillus, Enterobacter, Staphylococcus, Streptococcus and Kocuria genera in CM milk 26,27 . These pathogens use very efficient strategies to evade host defenses in order to colonize and invade mammary tissues through adhesion 28 , thereby damaging host cells and fighting with cow immune systems to produce clinical and/or chronic mastitis [28][29][30] .
Our study marks an additional step towards identifying the significant co-occurrence of archaea and viruses with bacterial population in bovine milk. In comparison to bacteria, the relative abundance and diversity of archaea 31 and viruses 32 remain substantially lower. Currently, there is no extensive evidence supporting the role of archaea and viruses in the pathogenesis of bovine mastitis. However, these microbes mostly seize the opportunity during the pathophysiological changes in the mammary glands created by bacteria 33 . Thus, it is hypothesized that archaea might follow the exact mechanisms of bacterial pathogens producing bovine CM 31   www.nature.com/scientificreports www.nature.com/scientificreports/ major bacterial phyla found in our samples: Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria. This corresponded with an increased relative abundance of these bacterial taxa in CM milk samples together with an overrepresentation of Caudovirales taxa compared with H milk samples 34 . In addition, we found a significant association of Herpesvirales (Macavirus and Rhadinovirus genera) with bovine CM 34,35 . Our current findings demonstrated that viruses neither cause bovine mastitis directly nor play a role in the initiation of the disease process, but later, when bacterial infection of the udder occurs, they replicate in the immune and epithelial cells of the udder and/or milk ducts and may act as a predisposing factor as well as a primary etiological agent for more severe and prolonged mastitis 36 .
The KEGG pathways and SEED subsystems of the MR pipeline uncovered significant differences in microbial metabolic functions in both CM and healthy samples 5,37 as supported by several previous reports on mastitis in lactating cows 9 and women 5 . The CM microbiome had significantly higher abundance of Proteobacteria and Bacteroidetes, which are well-known utilizers of milk oligosaccharides through one carbon metabolism 38 . Genes associated with the TCA cycle (gltA, fumA) and energy metabolism (oxidative phosphorylation) remained overexpressed in CM microbiomes, which might be associated with host-pathogen interactions during the progression of bovine mastitis 39 . Increased benzoate degradation by different strains of Acinetobacter and Klebsiella in CM metagenome through TCA cycle is thought to promote bacterial growth and virulence factors expressed during pathogenesis 40 . To elucidate the role of bacterial cell to cell communication in bovine mastitis, we found that genes coding for bacterial chemotaxis (cheBR, motB, rbsB and tsr) remained predominantly abundant in CM milk microbiomes suggesting their role in the early phase of mastitis for attachment to or entry into the udder tissues and virulence regulation 41 . The p38 signaling pathway exerts its biological effects in the pathophysiology of bovine CM through several complex biologic processes including expression of many cytokines, transcription factors, cell surface receptors, enzymes and oxidative stress mediators 42,43 . The up-regulation of genes coding for programmed-cell death during host-pathogen interactions in CM is associated with increased secretion of bacterial toxins or pro-inflammatory mediators 44 . Biofilm formation can be a strain specific or genetically linked trait, representing a selective advantage in pathogenesis of mastitis. The relative overexpression of genes encoding the protein YjgK cluster linked to biofilm formation and biofilm PGA synthesis in CM microbiomes is in accordance with several earlier reports 45 . Moreover, biofilm formation can also be harmful to host tissues since they can promote the phagocyte release of lysosomal enzymes, proliferation of reactive oxygen and nitrogen species, and transfer of antibiotic resistance 45 . The observed increased abundance of genes for primary immune diseases (e.g., adenosine deaminase) in CM pathogens is responsible for inhibition of T cell maturation and lymphocytic proliferation 46 , very low CD4 www.nature.com/scientificreports www.nature.com/scientificreports/ count 47 , cell-to-cell communication 47 and therefore could be used as a selective marker for bovine CM diagnosis. CRISPR/Cas systems are present in both pathogenic and commensal organisms found in bovine milk and play critical roles during the pathogenesis of mastitis by evading the hosts defense system particularly under stress conditions 48 . The type III and IV secretion systems found on the pathogenicity islands of CM associated microbes are capable of producing immunosuppression in cows by delivering effector proteins 9,49 . Phages, which are the regulators of bacterial population, play important and diverse roles in all bacterial ecosystems 36 , but their precise impact on the milk microbiome is far from being understood. The relative overrepresentation of genes coding for phage-related transposable elements, phage packaging machinery, phage replication and phage regulatory gene expression in CM microbes may suggest that bacteriophages participate in the horizontal gene transfer among the members of bovine milk microbiomes and ultimately to mammary gland pathogens 34 .
Bovine milk microbiomes are a wide source of resistance to antibiotic and toxic compounds (RATC) genes and the pathogenic bacteria within this potential reservoir are becoming more resistant. The current metagenomic deep sequencing provides a wealth of information not only on RATC genes, but on the entire gene content thereby enabling the identification of the community composition and metabolic profile. We found that all of the samples in both metagenomes harbored RATC genes (2.63%) indicating their wide and indiscriminate use in Bangladeshi dairy farms. However, most of the resistant genes in RATC functional groups remained predominantly higher in CM milk microbes. While our knowledge of the uncontrolled spread of antibiotics resistant genes in bovine mastitis pathogens 50 is increasing, information on heavy metal resistance is not yet available. This worrisome trend in increasing RATC against mastitis pathogens has become a major concern for the dairy farmers of Bangladesh, given the seriousness of such problems; effective therapies using alternative medicines are needed for successful prevention and control of bovine mastitis.

conclusions
In this study, the metagenomics of milk samples from bovine with clinical mastitis (CM) and healthy (H) controls clearly show that the microbiome composition in CM milk samples are significantly different from H milk. Furthermore, some of the detected microbes (bacteria, 68.04%, archaea, 31.82% and viruses, 40.00%) are solely found only in CM samples. The co-relations of the microbiome composition and functional metagenomics in the progression of clinical mastitis are also evidenced by abundance differences in metabolic pathways related to bacterial colonization, proliferation, chemotaxis and invasion, immune-diseases, oxidative stress, regulation and cell signaling, antimicrobial resistant genes, biofilm formation, phage and prophases etc. between CM and H samples. The presence of human pathogens including Escherichia coli O157:H7 str. Sakai, Salmonella enterica subsp. enterica serovar Typhi str. CT18, Salmonella enterica subsp. enterica serovar Typhimurium str. LT2, Bacillus cereus ATCC 14579 etc. in bovine milk and RATC genes in milk microbiome are serious concerns for public and animal health since diseased animals are improperly handled in Bangladesh. Because of the limitations we faced with fewer samples, it would be interesting to conduct similar trials using a larger sample size with a different animal population (breed, parity, body condition, lactation) and matrices prior to undertaking a metagenomics sequencing venture to elucidate the progression of the disease. Furthermore, such studies would also be enhanced by the inclusion of gut microbiome sampling in addition to the milk samples for direct testing of microbial transfer across this axis.

Methods
Study population and sampling. Details of study population and collected samples are presented in Supplementary Table 1. A total of 21 milk samples (14, CM and 7, H) from 21 lactating crossbred cows at their early stage of lactation (within 10-40 days of calving) were collected from three districts of Bangladesh (Chattogram = 12, Dhaka = 3, Gazipur = 6). Cows were diagnosed with CM using the California mastitis test 51 . Two CM and one H milk samples were collected from the same farm. Approximately 15-20 ml of milk from each cow was collected in a sterile falcon tube during the morning milking (8.0-10.0 am) with an emphasis on pre-sampling disinfection of teat-ends and hygiene during sampling 1,51 . The samples were kept in an ice box (at 4 °C temp) immediately after collection, transported to the laboratory following similar transport protocols, and stored at −20 °C until DNA extraction. DNA extraction and sequencing. Genomic DNA (gDNA) was extracted by an automated DNA extraction platform (Promega, UK) following previously described protocols 5,17 . DNA quantity and purity was determined with NanoDrop (ThermoFisher, USA) by measuring 260/280 absorbance ratios. Sequencing libraries were prepared with Nextera XT DNA Library Preparation Kit 52 according to the manufacturer's instructions and paired-end (2 × 150 bp) sequencing was performed on a NextSeq 500 machine (Illumina Inc., USA) at the George Washington University Genomics Core facility. Our metagenomic DNA yielded 483.38 million reads with an average of 23.01 million (maximum = 35.10 million, minimum = 6.77 million) reads per sample (Supplementary Data 1).

Microbiome community analysis.
We analyzed the WMS data using mapping-based and assembly-based hybrid methods of PathoScope 2.0 (PS) 53 and MG-RAST 4.0 (MR) 8 , respectively. In PS analysis, a 'target' genome library was constructed containing all bacterial and archaeal sequences from the NCBI Database (https://en.wikipedia.org/wiki/National_Center_for_Biotechnology_Information) using the PathoLib module. The reads were (2019) 9:13536 | https://doi.org/10.1038/s41598-019-49468-4 www.nature.com/scientificreports www.nature.com/scientificreports/ then aligned against the target libraries using the very sensitive Bowtie2 algorithm 16,17 and filtered to remove the reads aligned with the cattle genome (bosTau8) and human genome (hg38) as implemented in PathoMap (-very-sensitive-local -k 100-score-min L, 20, 1.0). Finally, the PathoID 54 module was applied to obtain accurate read counts for downstream analysis. In these samples, an average of 12.90 million aligned reads per sample mapped to the target reference genome libraries (96.24%) after filtering the cow and human genome (Supplementary Data 1). The raw sequences were simultaneously uploaded in MR server (release 4.0) with proper embedded metadata and were subjected to the quality filter containing dereplication and removal of host DNA by screening 55 for taxonomic and functional assignment. Diversity analysis. Alpha diversity (diversity within samples) was estimated using the Shannon index for both PS and MR reads. To test beta diversity (differences in the organismal structure) of the milk microbiome, a principal coordinate analysis (PCoA) was performed based on weighted-UniFrac distances (for PS data) through Phyloseq R 56 and Bray-Curtis dissimilarity matrix 57 for MR data. In addition, non-metric multidimensional scaling (NMDS) on PS data was also used for beta diversity 58 analysis between the sample groups 59 . Taxonomic abundance was determined by applying the "Best Hit Classification" option using the NCBI database as a reference with the following settings: maximum e-value of 1 × 10 −30 , minimum identity of 95% for bacteria, 60% for archaea and viruses and a minimum alignment length of 20 as the set parameters. The phylogenetic origin of the metagenomic sequences was projected against the NCBI taxonomic tree and determined by the lowest common ancestor (LCA) with the same cutoff mentioned above. Two phylogenetic trees consisting of 363 and 146 bacterial strains in CM and H metagenomes, respectively, with >80% taxonomic identity were constructed using the neighbor-joining method in Clustal W (version 2.1) 60  Statistical analysis. The characteristics of cows with and without CM were compared using Fisher's exact test for categorical variables and Mann-Whitney U test for quantitative variables 5,7-9 . The Shapiro-Wilk test was used to check normality of the data and the non-parametric test Kruskal-Wallis rank sum test was used to evaluate differences in the relative percent abundance of taxa in CM and H groups. The statistical analyses for the MR data were initially performed by embedded calls to statistical tests in the pipeline and validated further using SPSS (SPSS, Version 23.0, IBM Corp., NY USA) using above mentioned tests. For the functional abundance profiling, the statistical tests were applied at different KEGG and SEED subsystem levels in the MR pipeline. Differences between the pipelines were evaluated using ANOVA and the Friedman rank sum test. A significance level of alpha = 0.05 was used for all tests 8 .

Data Availability
The sequence data reported in this paper have been deposited in the NCBI database (BioProject PRJNA529353) and are available from the corresponding author upon reasonable request.