Comparative genomics and metabolomics analysis of Riemerella anatipestifer strain CH-1 and CH-2

Riemerella anatipestifer is a major pathogenic microorganism in poultry causing serositis with significant mortality. Serotype 1 and 2 were most pathogenic, prevalent, and liable over the world. In this study, the intracellular metabolites in R. anatipestifer strains RA-CH-1 (serotype 1) and RA-CH-2 (serotype 2) were identified by gas chromatography-mass spectrometer (GC–MS). The metabolic profiles were performed using hierarchical clustering and partial least squares discriminant analysis (PLS-DA). The results of hierarchical cluster analysis showed that the amounts of the detected metabolites were more abundant in RA-CH-2. RA-CH-1 and RA-CH-2 were separated by the PLS-DA model. 24 potential biomarkers participated in nine metabolisms were contributed predominantly to the separation. Based on the complete genome sequence database and metabolite data, the first large-scale metabolic models of iJL463 (RA-CH-1) and iDZ470 (RA-CH-2) were reconstructed. In addition, we explained the change of purine metabolism combined with the transcriptome and metabolomics data. The study showed that it is possible to detect and differentiate between these two organisms based on their intracellular metabolites using GC–MS. The present research fills a gap in the metabolomics characteristics of R. anatipestifer.

Riemerella anatipestifer (RA) is a Gram-negative pathogen with a capsule and belongs to the family of Flavobacteriaceae. Previous studies have reported that the genetic modification in RA was occurred simply and rapidly by the natural transformation [1][2][3][4][5][6][7][8] . RA infection is a major epidemic disease of fowls, which causes septicemia and is characterized by serious fibrinous polyserositis. So far, at least 21 serotypes RA were reported in the world [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] . Serotypes 1 and 2 of RA were the dominating serotypes in China. The virulence and metabolic capability in two serotypes have seen some differences across and within serotypes. While some research has been carried out on many pathogenicity and resistance factors in the two strains [24][25][26][27][28] data about the metabolic pathways required for growth and infection, particularly the natural utilization of carbon and nitrogen sources has remained unclear.
Metabolism plays an important role in host-microbe interactions whether acute or persistent infections 29 . Researches on the types and contents of metabolites, as the final products of gene expression in the organism, are a critical supplement to genomics and proteomics research. Previous studies have shown that RA cannot grow on common nutrition agar or MacConkey nutrition agar and is slightly more challenging to grow in vitro. Unlike many other Flavobacteriaceae family germs, RA cannot ferment carbohydrates, although some strains can produce acid by glucose, maltose, inositol and fructose 30 . Compared with other gram-negative bacterial infection of the domestic birds, the reaction of oxidase and catalase in RA were positively tested. Moreover, acid and alkaline phosphatase, gelatinase, esterase C4, ester lipase C8, α-glucosidase and leucine-, valine-and

Results
The metabolite profile of two RA species. To find more intracellular metabolites in RA, a non-target metabolomics approach was used for RA metabolic profiling. Figure 1a showed that the total ion chromatogram (TIC) of samples from the two RA strains, thus illustrating the significant variations of metabolic profiling. 81 different metabolites were confirmed as listed in Table 1. 39 compounds were identified by pure standard compounds and MS-library match. These compounds were divided into nine classes of chemical substances, including nucleotides, organic acids, amino acids, phosphates, sugars, fatty acids, amines, polyols and others (nicotinamide, urea, parabanic acid and 2,4,6-tritert-butylbenzenethiol). The chemical classes of amino acids, organic acids, nucleotides, phosphates, sugars, fatty acids accounted for 28%, 15%, 11%, 11%, 11% and 10% of all identified metabolites, respectively. Relatively few polyols (8%) and putrescine involved in amine (1%) were identified (Fig. 1b).

Metabolites correlation analysis in RA-CH-1 and RA-CH-2. Correlation analysis was performed to
search for the metabolite-metabolite potential relationships in these two organisms. The results allowed the identified chemicals concerning one other. Particularly, the metabolite-metabolite correlations of the differential metabolites were compared in two sample combinations. As shown in Fig. 2, 1053 of 3240 metabolitemetabolite correlations found a statistically correlation (p < 0.05). Out of 1053 statistical correlations, 874 had a significant positive correlation and 179 were negative. The same chemical class of metabolites was found to have positive correlations. Most of the amino acids and the polyols seem to be negatively correlated by comparison to other metabolites. Meanwhile, many fatty acids and amino acids including hexadecanoic acid, cysteine, octadecanoic acid, proline, pentadecanoic acid, glutamic acid, suberyl glycine and tetradecanoic acid found a significant inverse correlation with other metabolites. However, most of the organic acids and amino acids had positive correlations in comparison to other metabolites.

Construction and application of the metabolic models in RA.
To insight the function and the relationship of the metabolites, the 81 identified metabolites were classified by hierarchical clustering analysis (HCA). HCA results showed that metabolites were clustered in 9 groups (Fig. 3). Metabolites with similar metabolic patterns have similar functions and participate in the same metabolic pathway. We identified 56 metabolites involved in 7 main metabolisms by the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The 7 main metabolisms include amino acid metabolisms, carbohydrate metabolisms, energy metabolisms, lipid metabolisms, nucleotide metabolisms, cofactors and vitamin metabolisms and biosynthesis of secondary metabolites.
In an attempt to understand the complex interactions between metabolites and gene products in RA, we constructed the draft genome-scale metabolic (GSM) model of the two organisms based on genomic and metabolite data. The formulation of biomass components directly influences the essentiality of each gene. The available relative abundances of RA were DNA, RNA, proteins, cell wall, lipids and cofactor (Supplementary Table S1). The models were gap-filled in simulated media to force a minimum flux of 0.1 through the bio1 reaction. 16 new reactions were added to the draft RA models, while 1 existing reactions were made reversible (Supplementary Table S2). As shown in Table 2, the GSM model of RA-CH-1 (iJL463) consists of 788 reactions, 841 metabolites and 463 genes, while the GSM model of RA-CH-2 (iDZ470) consists of 801 reactions, 862 metabolites and 470 genes. No mass imbalance was found at neutral pH. A detailed description of the models containing all network reactions was detailed in the supplemental data (iJL463 in Supplementary Table S3 and iDZ470 in Supplementary  Table S4). Most of the reactions are broadly concentrated in lipid, amino acid, and carbohydrate metabolism. Flux balance analysis (FBA) of iJL463 and iDZ470 reveals a diversity of the predicted metabolites that are essential for growth in RA (see Supplementary Tables S5, S6). To further separate the groups, supervised PLS-DA was carried out by fitting tested samples. By PLS-DA analysis, excellent separation of the groups of RA-CH-1and RA-CH-2 was achieved, suggesting significant differences in metabolites between the two groups. Subsequently, a cluster of 100 permutated models from two components was visualized using validation plots. The quality of the supervised models and the goodness of fit were evaluated by the values of R 2 and Q 2 . The values of R 2 and Q 2 in the exact test were below the original ones, which indicated that the model was reliable. Moreover, the value of Q 2 = − 0.299 was obtained, indicating that the model was not over-fit.    Table 3. As shown in the hierarchical clustering heat map (Fig. 4a), the HCA results showed significant variation between the two organisms. 7 metabolites were detected at higher concentrations in RA-CH-1 samples than RA-CH-2 samples, while 17 metabolites had lower contents in RA-CH-1 samples than RA-CH-2 samples. The 7 higher metabolites included urea, glucose-6-phosphate, citramalic acid, guanine, fructose-6-phosphate, pyruvic acid and lactic acid, while the 17 lower metabolites included 7 nucleotides, 4 sugars, 3 amino acids, 1 organic acid, 1 phosphate and 1 fatty acid. Besides, we found that there was a strong correlation among 24 biomarkers. To compare metabolic pathway activities from metabolite profiles between the two organisms, we have correlated metabolomics datasets and metabolic pathways activities using Pathway Activity Profiling (PAPi) 45 . A variety of metabolic pathways participated in the metabolism of amino acids, secondary metabolites, carbon, energy, lipids, cofactors and nucleotides (Fig. 4b). As compared with RA-CH-2, RA-CH-1 did not display much difference in the metabolism of amino acids. However, among secondary metabolites, streptomycin biosynthesis, phenylpropanoid biosynthesis, novobiocin biosynthesis and carbapenem biosynthesis were up-regulated in RA-CH-2. These metabolisms are essential for RA adapted to their environment and improve the survivability of RA against abiotic or biotic stresses. The metabolism of cofactors and vitamins was up-regulated in RA-CH-2 compared to RA-CH-1. This indicated that the metabolic compounds related to this pathway were active in RA-CH-2. Most subgroups of carbohydrate metabolism in RA-CH-2 were up-regulated, suggesting that carbohydrates biosynthesized supply energy for fatty acid metabolism. For example, pyruvate, propanoate and pentose phosphate metabolism were all upregulated in RA-CH-2, indicating that their metabolism was activated.
To observe the metabolite-by-metabolite differences for compounds, nucleotides in two strains were with the maximum changes compared to other metabolites, but purine metabolism did not change a lot (0.39-fold higher in RA-CH-2). Figure 4c represents a schematic of purine metabolism. AMP is converted to adenosine by 5′-nucleotidase (EC 3.1.3.5) and then to adenine by Purine nucleoside phosphorylase (PNP, EC 2.4.2.1). The concentration of Adenosine in RA-CH-2 was two-fold higher than that in RA-CH-1, while the concentration of Adenine in RA-CH-2 was 0.3-fold higher in RA-CH-1.To explore whether differential expression of particular genes would cause reaction flux changes, the reactions associated with genes change were check by the transcriptome analysis (see Supplementary Table S8), the up-regulation of Adenosine in RA-CH-2 could be the reason for the level of 5′-nucleotidase expression significantly increased (1.73-fold change) compared with that in RA-CH-1. Meanwhile, the level of Purine nucleoside phosphorylase expression was significantly reduced (0.86-fold change) in RA-CH-2.

Discussion
Metabolomics is a group of indicators for high-throughput detection and data processing, dynamic metabolic changes in the overall, especially for intercellular metabolism, genetic variation and environmental changes. Like the other three omics techniques, metabolomics plays an important role in systems biology. Metabolomics data sets have become more comprehensive and provide substantial evidence for the molecular mechanisms 46 . The cellular metabolism of microbial cell responds very quickly, generating marked changes in intracellular metabolites as low as 0.1 mM/s [47][48][49] . The quenching of metabolism is an important precondition for accurate measurement of the concentrations of identified metabolites. The bacterial membrane and cell wall might damage during quenching causing the leakage of intracellular metabolites 50 . As a gram-negative bacterium, the cell wall structure of RA is narrower and less robust compared with yeasts and gram-positive bacteria. It seems that RA may be very susceptible to the leakages during quenching. In this study, we choose cold methanol as a quenching solution. Several studies have revealed that cold methanol is the currently most used quenching solution allowing removal of extracellular metabolites [51][52][53][54] .
After quenching and chemical derivatization, over a hundred GC-MS peaks were detected and 81 intracellular metabolites of the two organisms in the TSB medium were identified. In this study, we reveal compositional differences by multivariate analysis. The PCA model of RA is very useful to detect outliers, but the PCA score plot   www.nature.com/scientificreports/ does not give a good representation of the class difference between the groups of two strains. For PCA, the group separations are exposed only when the inter-class variation is more than the intra-class variation. The group differentiation was influenced by many factors, like sample preparation problems 55 , experimental deviation 56 and inadequate data pretreatments 57 . In contrast with PCA, PLS aggressively over-fit models to the sets and contributed scores in which groups are separated 58 . Meanwhile, validation is a critical step in guarantying the quality of PLS model 59 . As a result, PLS-DA analysis generates excellent group separation of two strains samples. The 81 metabolites include organic acids, nucleotides, phosphates, amino acids, sugars, fatty acids, polyol and amine. We detected a few new metabolites that had not been shown before in Flavobacteriaceae, such as Inositol 1-phosphate (Ins-1P). Ins-1P is primarily formed by Myo-inositol-phosphate synthase (MIPS) catalyzing glucose-6-phosphate (Glc-6P) 60 , which is an important precursor substance of phosphatidyl-myo-inositol and mycothiol 61 . Ins-1P is an essential metabolite for growth and virulence in some bacteria 62,63 .
Putrescine as only one polyamine was identified in two strains. Putrescine is mainly related to biofilm formation 64 and protein synthesis 65 , so putrescine metabolic pathways may be considered as a potential novel target for antibiotics [66][67][68] . The metabolism of putrescine biosynthesis in microorganisms has two alternative pathways. The first putrescine biosynthesis pathway begins with the conversion of ornithine to putrescine by ornithine decarboxylase (ODC, EC 4.1.1.17), which is encoded by speC gene. Another one is synthesized from  Table 3. Differential intracellular metabolites between RA-CH-1 and RA-CH-2 groups. a *p < 0.05; **p < 0.01; ***p < 0.001. b FC: fold change in RA-CH-2/RA-CH-1. . Ornithine is also detected in this study. It seems that the second pathway exists in RA. In Fig. 2, the variation trend of Ornithine does not correlate with Putrescine. Meanwhile, the result of ODC and ADC enzyme active is negative and positive by decarboxylase test (data not shown in this study), respectively. It suggests that the metabolite profiles provide insight and inferences into a lot of information quickly, but it remains for the analysis.

Metabolites
On the other hand, here, we have shown that the cellular metabolism of purine (Fig. 4c). These pathways have a pivotal role in the control of the intracellular nucleotides concentration. Although the concentration of Adenosine in RA-CH-2 was 2 times higher than that in RA-CH-1, the concentration of Adenine was regulated by the transcription and expression of Purine nucleoside phosphorylase. Linking the metabolomics data to preexisting transcriptomics data on RA can be useful in the next steps to explain the difference in the metabolism.
In this study, the application of metabolomics was first used to research the metabolic profiling and biomarkers of RA. This study revealed that RA-CH-1 and RA-CH-2 have different metabolic profiles. A large portion of different virulence and serotype phenotypes can be attributed to known virulence-associated secondary www.nature.com/scientificreports/ metabolites. Using the metabolome dataset, we obtained a classification model that differentiates RA-CH-1 and RA-CH-2 with good accuracy. Further, to establish a GSM model of two strains are useful frameworks to explore the versatility of RA and to understand the molecular pathogenesis. The combination of genomics and metabolomics has greatly promoted the research to identify interacting pathways. Metabolomics can also be used to bridge genotype to the phenotype of RA.

Materials and methods
Strains and growth conditions. RA wild strains CH-1 (BioSample ID: SAMN02603758) and CH-2 (BioSample ID: SAMN02602992) were isolated and keep in our lab. Briefly, two strains were grown in 500 ml flasks overnight at 37 °C and 150 rpm in 200 ml tryptic soy broth (TSB) and used as seed broth. The overnight cultures were added in a new 300 ml TSB medium and made the initial value of OD 600 to 0.1. The flasks were incubated at 37 °C with shaking at 180 rpm. For untargeted metabolomics analysis, the samples were collected after 8 h of incubation. The growth curve was shown in Supplementary Fig. S1. The value of OD 600 in RA is about 3.2. The medium was autoclaved at 121 °C for 15 min. 8 biological samples of each strain were separately processed in the same conditions and same culture time.
Quenching and chemical derivatization of intracellular metabolites. For cell quenching, 200 ml cultures were rapidly transferred to centrifuge tubes containing 800 ml 60% methanol solution (− 40 °C) and maintained at − 20 °C in a refrigerated bath for 5 min. The mixture was centrifuged for 5 min at 7500 g and − 10 °C. The supernatant was removed rapidly and the cell suspension was washed twice with 10 ml cold 0.85% NaCl solution. After washing, the cell suspension was frozen at low temperature by the vacuum freeze dryer. The 10 mg freeze-dried powder samples were extracted with 1 ml 50% methanol (− 20 °C) and vortexed for 30 s. To improve the precision of GC analysis, 60 μl 0.2 mg/ml nonadecylic acid-methanol solution and 60 μl of 10 mM d4-alanine-methanol solution were added to 1 ml extract as internal standard (IS). To ensure effective extraction, the tubes were treated by the freeze-thawing method with liquid nitrogen for 5 min and repeated 3 times. The extract was centrifuged for 10 min at 13,000g and 4 °C. The supernatant was blow-dried by vacuum concentration. The samples were resuspended in 60 μl 15 mg/ml methoxyamine pyridine solution and reacted for 120 min at 37 °C. 60 μl BSTFA reagent (containing 1% TMCS) was used for the derivatization reactions. The reaction mixture was derivatized within 90 min at 37 °C. After low-temperature centrifugation, the supernatant was used for GC-MS analysis.

GC-MS analysis.
To get the high-quality data in high throughput analysis, the quality control (QC) method was used to monitor analytical accuracy from the pooled samples and report the data quality as previously described 69 . The pooled mixtures were prepared by 20 μl of all the biological test samples.
The samples were analyzed by Agilent 7890A/5975C GC/MSD System. The complex mixture of compounds was separated on GC column HP-5MS (30 m × 250 μm × 0.25 μm, Agilent) coated with 5% phenyl/95% methylpolysiloxane. To separate the derivatives, the helium was the carrier gas set at a constant flow of 1 ml/min. 1 µl sample was injected by a split mode with a split ratio of 20:1 using the auto-sampler. The injection port temperature was set at 280 °C, the transfer line temperature set to 150 °C and the ion source temperature adjusted to 230 °C. The programs of temperature-rise were as followed: 60 °C initially for 2 min, increased to 300 °C at 10 °C/min and 300 °C was maintained for 5 min. The range of mass spectrometry was set from 35 to 750 (m/z) by a full-scan method.
Genome-scale metabolic reconstruction of R. anatipestifer. The genomic data of RA-CH-1 (Gen-Bank: CP003787.1) and RA-CH-2 (GenBank: CP004020.1) were re-annotation by RAST prokaryotic genome annotation server 70 . The draft metabolic models were generated by Model SEED (https ://model seed.org/genom es/) based on the gene re-annotation 71 . The growth conditions of RA are unknown, so RA is auto-completed in simulated media (Supplementary Table S7). The gene-protein-reaction (GPR) associations were available as an Excel file and showed the relationship among genes, their corresponding proteins and the reactions catalyzed by the proteins. The Excel files of the models were converted to SBML format to be fully compatible with COBRA Toolbox version 3.0 (https ://openc obra.githu b.io/cobra toolb ox/stabl e/) 72 . Because no detailed information on the biomass composition of RA has been found, the biomass composition was assembled in the Model SEED, which accounts for DNA, RNA, amino acids, nucleotides, cell wall and cofactors. DNA was calculated from the genome sequence of RA-CH-1 and RA-CH-2. DNA coefficients are calculated by first computing the GC content of the chromosome. Then the molar fractions of the deoxynucleotides are set according to the GC content. The other metabolites and their coefficients are approximations garnered from E.coli model iAF1260 73 . The metabolites (e.g., lipids, cofactors, cell wall components) are included only if the genome annotation includes evidence for functional roles associated with the biosynthesis or utilization of the metabolites. Lack of fully connected metabolic pathways may lead to contain multiple gaps due to incomplete or inconsistent annotations, the draft metabolic models were checked as a gap-filling process in TSB medium to allow biomass formation. After the gap-filling procedure, the mass balance of all reactions in the RA models was checked with elementary and charge balance by Check Model Mass Balance app in KBase (https ://www.kbase .us/) 74 . To predict simulating biomass production, flux balance analysis (FBA) was used to verify the growth rate and the rate of production in the computational medium. The information of the medium in silicon used for FBA was listed in Supplementary  Table S9. GC-MS data processing. For the metabolite identification, the Agilent GC-MS 5975 Data Analysis software was used to convert the raw GC-MS data into NetCDF format. The peaks were identified, filtered and Scientific Reports | (2021) 11:616 | https://doi.org/10.1038/s41598-020-79733-w www.nature.com/scientificreports/ aligned via XCMS software version 1.42.0 using XCMS's default settings with the following changes: xcmsSet (fwhm = 3, snthresh = 3, mzdiff = 0.5, step = 0.1, steps = 2, max = 300), group (bw = 2, minfrac = 0.3, max = 300) 75 . Metabolites were annotated using the Automatic Mass Spectral Deconvolution and Identification System (AMIDS) version 2.73 based on Wiley Registry and National Institute of Standards and Technology (NIST). Metabolites were confirmed via comparison of mass spectra and retention indices to the Golm Metabolome Database (GMD, http://gmd.mpimp -golm.mpg.de/) using a cut-off value of 70%. The signal integration area of each metabolite was normalized to the internal standard (Nonadecylic acid and d4-Alanine) for each sample.
Statistical analysis. The normalized data were checked by mean-centering and unit variance (UV) scaling methods before the multivariate statistical analysis. The PCA and PLS-DA were applied to the normalized samples of the intracellular metabolite by the Simca-P version 13.0 software (Umetrics, Kinnelon). Cross-validation was used to check the quality of PCA and PLS-DA models, and 100 random permutations testing were carried out to guard against the over-fitting of PLS-DA models. The discriminating metabolites were obtained using a statistically significant threshold of variable influence on projection (VIP > 1.0). Values obtained from the PLS-DA model were validated via the Mann-Whitney-Wilcoxon test (p < 0.05). The metabolites with the values of p < 0.05 and VIP > 1.0 were chosen as discriminating metabolites between RA-CH-1 and RA-CH-2. HCA was performed using Euclidean distance methods and visualized by the pheatmap package version 1.0.8 in the R language. Metabolite correlation was assessed using the Pearson Correlation Coefficient and corresponding p values were also calculated using the Cor. test version 3.2.1 function in R. Identified metabolites were mapped onto general biochemical pathways according to the annotation in KEGG [76][77][78] . Pathway Activity Profiling (PAPi) algorithm was used to predict and compare the relative activity of different metabolic pathways by PAPi package version 1.14.0 in R.

Data availability
The genome sequences described in this manuscript have been submitted to the National Center for Biotechnology Information (NCBI) under accession codes PRJNA172646 (whole genome and assembly data of RA-CH-1) and PRJNA183917 (whole genome and assembly data of RA-CH-2). Raw spectral data for measurement of RA-CH-1 and RA-CH-2 by GC-MS were uploaded to the Open Science Framework [Doi: https ://doi. org/10.17605 /OSF.IO/X6AMD ]. The datasets generated during and/or analyzed in the current study are available from the corresponding author on reasonable request.