Comparative in-silico proteomic analysis discerns potential granuloma proteins of Yersinia pseudotuberculosis

Yersinia pseudotuberculosis is one of the three pathogenic species of the genus Yersinia. Most studies of Y. pseudotuberculosis pathogenesis have focused on proteins related to the Type III secretion system, a well-known primary virulence factor in pathogenic Gram-negative bacteria, including Y. pseudotuberculosis. Information on the factors involved in Y. pseudotuberculosis granuloma formation is scarce. In the present study, we used a computational approach to identify proteins that might be involved in the formation of Y. pseudotuberculosis granuloma. A comparative proteome analysis and identification of conserved orthologous proteins were performed between two different genera of bacteria, Mycobacterium and Yersinia, whose only common pathogenic trait is the ability to form necrotizing granuloma. Orthologous proteins were comprehensively analyzed in the proteomes of seven bacterial species: M. tuberculosis, M. bovis and M. avium paratuberculosis (the known granuloma-forming Mycobacterium species); Y. pestis and Y. frederiksenii (non-granuloma-forming Yersinia species); Y. enterocolitica (which forms micro-granuloma); and Y. pseudotuberculosis (a prominent granuloma-forming Yersinia species). In silico proteome analysis indicated that seven proteins (UniProt ids A0A0U1QT64, A0A0U1QTE0, A0A0U1QWK3, A0A0U1R1R0, A0A0U1R1Z2, A0A0U1R2S7, A7FMD4) might play a role in Y. pseudotuberculosis granuloma. The probable involvement of these seven proteins was examined by transcriptome data analysis and by mapping them on a composite protein-protein interaction map of experimentally proven M. tuberculosis granuloma proteins (RD1 locus proteins, ESAT-6 secretion system proteins and intra-macrophage secreted proteins). Although additional experiments involving knockout of each of these seven proteins are required to confirm their role in Y. pseudotuberculosis granuloma, our study can serve as a basis for further studies on Y. pseudotuberculosis granuloma.

Functional characterization of common orthologs of Yps and Mycobacterium spp. and their probable involvement in granuloma formation. A comparative analysis of protein conservation in the three Mycobacterium spp. and four Yersinia spp. revealed seven Yps proteins that were present in the three Mycobacterium spp. but absent from the other Yersinia species (Table 2). Since the only feature common to the three Mycobacterium spp. and Yps is their capability to form macro-granuloma, these seven proteins might play a role in granuloma formation. To examine the role of these proteins in Yps granuloma, all seven proteins were functionally characterized using the UniProtKB, STRING and KEGG databases. Of the seven proteins, two were functionally uncharacterized, while the functions of the other five were known. The UniProt ids, interaction partners and pathway information of the seven proteins are presented in Table 3.
'*' represents proteins shared among all seven proteomes; '$' and '#' denote intra-genus conserved proteins of Mycobacterium spp. and Yersinia spp., respectively.
www.nature.com/scientificreports/
However, the expression of the remaining two proteins (UniProt ids A0A0U1QT64 and A0A0U1QTE0) could not be ascertained.
Validation of the role of identified Yps granuloma proteins using experimentally identified Mtb granuloma proteins. Earlier studies have shown that Mtb RD1 locus proteins 28, ESAT-6 secretion system proteins 29 and intra-macrophage secreted proteins 30 play an important role in the formation and regulation of granuloma. To ascertain the probable involvement of the seven proposed Yps proteins in Yps granuloma formation, we constructed a composite protein-protein interaction (PPI) network map of the Mtb RD1 locus proteins, ESAT-6 secretion system proteins and intra-macrophage secreted proteins. The Mtb orthologs of the proposed Yps granuloma proteins were then mapped on the composite PPI network. Interestingly, all the mapped proteins showed moderate to strong connections with other proteins of the network (Fig. 4).

Discussion
Identification of orthologous proteins/genes is a useful method for determining relatedness among different taxonomic groups, viz. genera, species and strains. In the present study, this approach was used to identify Yps proteins that might be involved in granuloma formation. A comparative in silico analysis of the conserved orthologs of the Yen, Ype, Yps, Yfr, Map, Mbov and Mtb proteomes was performed to predict proteins that might be involved in Yps granuloma formation. Initially, we analyzed the genomic and proteomic relatedness among the seven species. At the genomic level, the analysis used pairwise comparison of ANI values, which are routinely used as a measure of overall similarity between two genome sequences 31. Our results reiterated those of previous phylogenetic studies, namely that the number of homologs shared between organisms increases with their evolutionary relatedness 32. At the proteome level, comparison was based on pairwise conservation of orthologs. The evolutionary relationships inferred at the genomic (Fig. 1a) and proteomic (Fig. 1b) levels were the same. The higher number of proteins conserved between Mbov and Mtb reflected the close relationship between these two members of the Mtb complex, in line with 16S rRNA gene sequence-based phylogenetic studies of Mbov and Mtb 33. Also, Mbov and Mtb each shared fewer proteins with Map (an atypical mycobacterium) than with each other, consistent with an earlier 16S rRNA-based phylogenetic study 34. In the pairwise comparison of conserved proteins among the four Yersinia species, Ype and Yps shared the maximum number of proteins. As with Mtb and Mbov, the large number of proteins conserved between Ype and Yps can be attributed to their phylogenetic proximity 3.
An earlier study has also reported that Ype evolved from Yps 35, which might explain their closeness. On the other hand, Yfr, an opportunistic pathogen 36, shared more proteins with Yen than with Ype or Yps. Earlier phylogenetic studies using multilocus sequence typing have also shown that Ype and Yps belong to the same cluster and that the Yen and Yfr clusters are close to each other 32. Our proteome-level results are consistent with this relatedness. We also observed an interesting pattern in the orthologs shared by Map with the Yersinia species: Map shared more orthologs with Yen and Yfr than with Ype and Yps. This might be because Map, Yen and Yfr cause gastrointestinal infections and hence occupy the same niche, which might have resulted in horizontal transfer of genes among them 37.
The enrichment of proteins of sulfur-containing amino acid metabolism in the intra-genus protein cluster of Mycobacterium spp. indicates their importance for the survival of Mycobacterium spp. Earlier reports also suggested that sulfur-containing amino acids help Mtb withstand oxidative stress and nutrient starvation and adapt to dormancy 38,39. Proteins of the sulfur-containing amino acid synthesis pathway were present exclusively in Mycobacterium spp., and this pathway has also been proposed as a potential target for anti-TB therapy 40. Analysis of the intra-genus protein cluster of Yersinia spp. revealed conservation of proteins involved in reproduction and the mitotic cell cycle. Overall, with a few exceptions, the functions of most proteins were conserved in both the inter- and intra-genus orthologous protein clusters. This also indicates that, although the sequences of core proteins might have diverged, their functionality has been retained during evolution. Proteins that were unique to a single species, with no orthologs in the other species, were considered unique proteins. No significant functional enrichment was observed among these species-specific unique proteins, possibly because they belong to different biological pathways, have diverse molecular functions and reside in different cellular components.
The main objective of this study was to identify proteins that might help in Yps granuloma formation. Therefore, the functions of the seven proteins common to Yps and the three Mycobacterium spp. were critically analyzed to investigate their probable role in Yps granuloma formation. The protein with the UniProt id A0A0U1QT64 was an uncharacterized protein with an ATPase domain. Because ATPase domains can unfold protein substrates, proteins harboring them are known to be involved in protein degradation. ATPase domains are also essential for intracellular protein degradation because macromolecular assemblies such as the proteasome confine their proteolytic activity to an inner nano-compartment that is accessible only to unfolded protein substrates 41. This suggests that the proteolytic machinery is functionally linked to the unfolding machinery (AAA+ ATPase domain proteins) and that this linkage has been preserved throughout evolution 42,43. In Mtb, the proteostasis network provides protection from different stresses and from host immunity. This machinery comprises a complex network of chaperones, proteases and a eukaryotic-like proteasome (functionally linked to the AAA+ ATPase domain protein family), which helps the bacterium evade host immunity by maintaining the integrity of the mycobacterial proteome 44. Besides this, the AAA+ ATPase domain protein family also plays a significant role in the recognition of virulence factors secreted by the ESAT-6 secretion system (ESX-1) 45, a type VII secretion system of Mtb that is capable of forming pores and rupturing phagosomes 46. This leads to cell toxicity, necrosis and ultimately cell death 47. On the basis of the functional roles of its constituent domains in other organisms, this protein might play a role in the formation of Yps granuloma.
The protein with the UniProt id A0A0U1QTE0 is a functionally uncharacterized protein with a formylglycine-generating sulfatase enzyme domain. Such proteins (e.g. EgtB) are reported to be involved in the synthesis of ergothioneine (EGT), a histidine-derived thiol 48. EGT reportedly enables pathogens to withstand the hostile host environment during the initial phase of infection 49. EGT-synthesizing proteins are found in prokaryotes, whereas plants and animals (including humans) do not produce EGT 50. In addition, macrophages exposed to EGT show increased cytokine production, which enhances Th17 polarization of CD4+ T cells. EGT has been reported to act as a TLR agonist 51 with immune-enhancing properties, which may bring more cells into contact. This indicates that EgtB might be involved in attracting more cells to the site of granuloma formation and thereby help in the process of granuloma formation.
The protein with the UniProt id A0A0U1QWK3 was a member of the ABC transporter family (yadG); ABC transporters are integral membrane proteins responsible for the active transport of ligands across biological membranes 52. The gene encoding this ABC transporter ATP-binding protein is present as a pseudogene in Ype (a species closely related to Yps) but is active in Yps 53. These transporters couple ATP hydrolysis to the uptake and efflux of solutes across the membrane in both bacterial and eukaryotic cells. They are considered important bacterial virulence factors because of their role in nutrient uptake and in the secretion of toxins and antimicrobial agents in the host 54. In Yersinia and Mycobacterium, iron uptake is important for infection and survival in host macrophages 55,56. Also, an ABC transporter system of Mycobacterium is similar to the Yersinia YbtPQ system 56. Since ABC transporter proteins are also enriched in tubercular granuloma 30, these proteins might also play an important role in Yps granuloma.
The protein with the UniProt id A0A0U1R1R0 is an enzyme, 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (hpaE). The gene encoding this protein is also annotated as a pseudogene in Ype but is actively expressed in Yps 53. Aldehydes are highly reactive chemical moieties that trigger oxidative stress in both prokaryotes and eukaryotes, which makes them toxic to cells. Enzymes with an aldehyde dehydrogenase domain (ALDHs) play an important role in the metabolism of both endogenous and exogenous aldehydes. Earlier studies have shown increased production of ALDHs in bacteria to cope with environmental and chemical stress 57. Both Yersinia and Mycobacterium are intracellular pathogens and hence have to combat oxidative stress inside the host cell. Previous reports also indicated that oxidoreductase enzymes were present in tubercular granuloma 30. This suggests that this protein might also play an important role in Yps granuloma.
The protein with the UniProt id A0A0U1R1Z2 belongs to glycoside hydrolase family 10 (DJ40_3168). Glycoside hydrolase domains are present in cellulases, which play a crucial role in degrading plant cellulose and bacterial cell walls 58. It has been reported that, to transition from the active, disease-causing state to the persistent state, Mtb dissolves the polysaccharide biofilm in the mammalian host 59, which suggests a role for glycoside hydrolases in Mtb virulence. The link between Mtb persistence and biofilm also indicates that this protein might have an important role in Yps granuloma.
The protein with the UniProt id A0A0U1R2S7 is a bacterial regulatory protein of the TetR family (yxaF). Its gene is also present as a pseudogene in Ype but is active in Yps 53. Proteins of this family contain a TetR DNA-binding helix-turn-helix (HTH) domain. Proteins containing HTH domains function as DNA-binding transcriptional regulators, controlling gene expression by binding to the major groove of DNA. In mycobacteria, these proteins regulate the expression of mycobacterial membrane protein family transporters, which are critical for exporting fatty acids and other lipids important for mycobacterial virulence 60. A high rate of lipid transport and metabolism also helps bacteria survive in diverse environments. In Mtb, TetR-family proteins have been found to induce necrosis in the lungs 61. These functions of the orthologous proteins in Mtb indicate that A0A0U1R2S7 might also play a significant role in necrosis of Yps granuloma.
The protein with the UniProt id A7FMD4 was the enzyme 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (ispH), which is required in the DOXP/MEP pathway. The DOXP pathway plays an important role in the pathogenic potential of mycobacterial species. Disruption of the DOXP pathway in Mtb hinders its ability to prevent acidification of the phagosome, resulting in decreased intracellular survival 62. These enzymes are present in these bacteria but absent in humans 63,64. In M. avium subsp. paratuberculosis, mutants of gcpE (ispG, another enzyme of this pathway) were reported to be less efficient in tissue colonization in mice or calves 65,66, further supporting the importance of this pathway in virulence. Thus, enzymes of the DOXP pathway have also been proposed as potential drug targets against Mtb 67.
Yersinia and mycobacteria formed two distinct branches on the cladograms drawn on the basis of ANI and PCOP. Despite these differences at the genomic and proteomic levels, there are similarities between the granulomas formed by Yps and Mycobacterium spp. For example, Yps granuloma is characterized by central necrosis (caseation) and micro-abscesses, which are also common in tubercular granuloma 24. Interestingly, functional enrichment revealed that Yps proteins were involved in lipid, phospholipid, isoprenoid, aldehyde and pyruvate metabolism, similar to the mechanism of granuloma formation in Mtb 30. Lipid metabolism is also associated with caseation of granuloma and dissemination of bacteria to neighboring tissues and organs, and it increases bacterial infectivity 68. Despite these similarities, there are also subtle differences between the granulomas formed by Yps and Mycobacterium spp. For example, chaperones are well-known virulence factors in the formation of tubercular granuloma and are required for bacterial virulence, detoxification and adaptation in Mtb 30,69, but chaperone proteins were absent from the proposed Yps granuloma proteins. Interestingly, protein-protein interaction and metabolic pathway mapping of the interacting partners of the seven proteins common to Yps and Mycobacterium spp. revealed that most of these interacting proteins have been proposed as potential drug targets in Mtb (Table 3). Hence, the seven Yps proteins proposed in the present study might be explored as drug targets against Yps. We also examined whether the seven Yps proteins interact with each other, but no interaction was observed among them, possibly because each is involved in a different biological pathway. This also indicates that, as in Mtb, diverse mechanisms might underlie granuloma formation in Yps 70.
A comparison of in vivo- and in vitro-derived transcriptomes revealed that the expression pattern of Yps during early-phase infection was similar to the in vitro expression pattern at 37 °C, whereas the expression pattern of persistent Yps was approximately similar to that of bacteria grown in vitro at 26 °C 26. Hence, to validate our findings regarding the expression of the seven proteins, we used the RNA-seq expression data derived from Yps during in vitro growth at 26 °C and 37 °C (GSE55292) 26. Five of the seven proteins were expressed during in vitro growth. However, the expression of the remaining two proteins (UniProt ids A0A0U1QT64 and A0A0U1QTE0) could not be ascertained. This probably happened because we used Yps strain IP 31758 for our genomic and proteomic analysis, whereas the transcriptomic data used in our study were derived from Yps strain YPIII, a plasmid-cured strain, and the two proteins (UniProt ids A0A0U1QT64 and A0A0U1QTE0) are encoded on the plasmid. We also could not detect their expression in RNA-seq data generated from in vivo-derived total RNA samples. This might be due to the very low abundance of Yps transcripts, which leads to low coverage of Yps ORFs 26.
Granuloma formation is primarily a host-defence mechanism that restricts the spread of bacteria. However, some pathogens use it as a protective shell in which to survive until conditions become favourable, at which point the pathogen resumes its activity and starts multiplying. The best-studied example of a granuloma-forming pathogen is Mtb. Experimental studies in Mtb suggested that RD1 locus proteins 28, ESAT-6 secretion system proteins 29 and proteins of the intra-macrophage secretome 30 are mainly involved in Mtb granuloma. Several studies have indicated the importance of the RD1 locus and the ESAT-6 secretion system in Mtb granuloma 71-73. Mtb strains lacking RD1 proteins failed to induce granuloma 74. Thus, proteins of the RD1 region, the ESAT-6 secretion system and the intra-macrophage secretome are important for Mtb granuloma formation. Even after an extensive literature survey, we could not find any study on the mechanistic details of granuloma formation in Yps. Hence, we constructed a composite PPI network encompassing the proteins of the Mtb RD1 locus, ESAT-6 secretion system and intra-macrophage secretome, and mapped on it the Mtb orthologs of the proposed Yps granuloma proteins. Interestingly, all the mapped Mtb proteins showed moderate to strong connections with other proteins of the network. This suggests that the seven proteins identified in this study might be important for Yps granuloma formation.
Scientific Reports | (2020) 10:3036 | https://doi.org/10.1038/s41598-020-59924-1
To summarize, using a comparative in silico proteome analysis of Yps with Map, Mbov and Mtb, we identified seven Yps proteins that were absent in Yen, Yfr and Ype. In silico functional characterization and validation with experimentally identified Mtb granuloma proteins support the proposal that these seven proteins might play a role in Yps granuloma. However, additional experiments involving knockout of each of these seven proteins are required to confirm their role in Yps granuloma. Moreover, these seven proteins might not be the only proteins responsible for Yps granuloma and, despite the stringent parameters adopted, many potential granuloma proteins might have been missed. A detailed functional characterization of Yps proteins is required to unravel the complex mechanisms underlying Yps granuloma formation. Nevertheless, our study provides useful insights and can serve as a basis for further studies on Yps granuloma.

Materials and Methods
Genomes and proteomes used for analysis. The proteome and genome data sets used in the present study were obtained from UniProtKB (release 2017_09) 75 (Table 1).

Determination of relatedness and distinctiveness among different species.
To determine the genomic relatedness among the seven bacterial species, Average Nucleotide Identity (ANI) was calculated using OrthoANI 76. To estimate the evolutionary distance among the proteomes of different species, the percentage of conserved proteins (PCOP) was calculated 77. The ANI and PCOP values were used to construct Neighbor-Joining (NJ) trees 78 using MEGA 79.
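As a rough illustration of how the ANI and PCOP values can be turned into NJ trees, the Python sketch below computes PCOP and runs a plain neighbor-joining algorithm. It is not the MEGA implementation used in the study, and several details are assumptions: PCOP is taken to follow the commonly used definition 100 × (C_A + C_B) / (T_A + T_B), similarities are converted to distances as 100 - value, and the function names `pcop` and `neighbor_joining` are hypothetical. The distance matrix is a textbook five-taxon example, not data from this study.

```python
from itertools import combinations


def pcop(conserved_a, conserved_b, total_a, total_b):
    """Percentage of conserved proteins between two proteomes A and B.

    Assumed formulation: 100 * (C_A + C_B) / (T_A + T_B), where C_X is the number
    of proteins of X with a conserved counterpart in the other proteome and T_X is
    the total number of proteins in X.
    """
    return 100.0 * (conserved_a + conserved_b) / (total_a + total_b)


def neighbor_joining(names, dist):
    """Build a Newick tree by neighbor joining.

    dist maps frozenset({x, y}) to the distance between taxa x and y; a similarity
    such as ANI or PCOP (in %) can be turned into a distance as 100 - value.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r_i: total distance from node i to all other current nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to i and j
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distance from the new internal node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))]
                                              + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.3f});"


# Textbook five-taxon distance matrix (illustrative, not data from this study)
EXAMPLE = {frozenset(p): v for p, v in {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}.items()}
tree = neighbor_joining("abcde", EXAMPLE)
```

On this example the algorithm first joins a and b (branch lengths 2 and 3) and returns `((c:4.000,(a:2.000,b:3.000):3.000),(d:2.000,e:1.000):2.000);`. With real data, the pairwise ANI or PCOP percentages for the seven species would be converted with 100 - value and passed in the same `frozenset`-keyed form.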
Identification of orthologous proteins. All possible combinations of the seven proteomes were generated, and orthologous proteins in each combination were identified using InParanoid (version 4.1) with default parameters. InParanoid performs reciprocal BLAST searches and identifies orthologs on the basis of sequence similarity 80. For each ortholog, InParanoid reports a score in the range 0.5-1. In this study, two proteins were considered orthologs if their InParanoid score was ≥0.8.
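InParanoid's full algorithm also clusters in-paralogs around each seed ortholog pair and scores them; the Python sketch below only illustrates its reciprocal-best-hit core and the score cut-off of 0.8 used in this study. The tuple layout, the function names and the idea of reading hits from BLAST tabular output are assumptions for illustration, not InParanoid's interface.

```python
def best_hits(hits):
    """Highest-scoring subject for each query.

    hits: iterable of (query, subject, bit_score) tuples, e.g. columns 1, 2 and 12
    of BLAST tabular output (-outfmt 6).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def filter_orthologs(scored_pairs, cutoff=0.8):
    """Keep (a, b) pairs whose ortholog score (0-1 scale) is at least the cutoff."""
    return [(a, b) for a, b, score in scored_pairs if score >= cutoff]
```

With hypothetical hits where protein a3's best hit is b1 but b1's best hit is a1, only the mutually best pairs survive, which is the property that makes reciprocal searches more specific than one-way BLAST.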
Clustering of orthologous sequences. On the basis of the proteomes in which a set of orthologous proteins was present, orthologous sequences were categorized into mutually exclusive clusters. Each cluster represented a specific set of proteins not shared with other clusters. For example, the inter-genus ortholog cluster contained proteins present in all seven proteomes. Similarly, the intra-genus ortholog clusters contained proteins present only in Yersinia or only in Mycobacterium.
Functional enrichment of proteins. Functional annotation of each protein cluster was done by assigning gene ontology (GO) terms, which were retrieved from the Gene Ontology Consortium 81. The functional enrichment of orthologous protein clusters was performed using the topGO package (v2.24.0) of Bioconductor 82. In the present work, GO-slim terms, a high-level view of GO terms, were used to determine the enriched functions. These terms were extracted from the GO annotation dataset using the GO Slim Mapper OWL Tool (https://github.com/owlcollab/owltools.git). For enrichment, the proteins of the seven proteomes were divided into seven broad categories ("test datasets") based on the clusters of orthologous sequences, and a distinct background was used for each test dataset.
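The clustering rule and a per-term enrichment test can be sketched in Python as follows. This is a simplified stand-in, not the topGO code used in the study: `enrichment_p` computes the one-sided Fisher's exact (hypergeometric upper-tail) p-value for a single GO term against a chosen background, which corresponds to topGO's classic Fisher test but ignores the GO-graph-aware algorithms topGO also provides. The sketch assumes that ortholog groups spanning the seven proteomes have already been assembled from the pairwise InParanoid results, and the function names are hypothetical.

```python
from collections import defaultdict
from math import comb


def cluster_by_pattern(groups):
    """Split ortholog groups into mutually exclusive clusters.

    groups maps an ortholog-group id to the set of proteomes in which it occurs;
    groups with exactly the same presence pattern fall into the same cluster.
    """
    clusters = defaultdict(set)
    for group_id, proteomes in groups.items():
        clusters[frozenset(proteomes)].add(group_id)
    return dict(clusters)


def enrichment_p(k, n_annotated, n_test, n_background):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    k: test proteins annotated with the GO term
    n_annotated: background proteins annotated with the term
    n_test: size of the test dataset (drawn from the background)
    n_background: size of the background
    """
    upper = min(n_annotated, n_test)
    hits = sum(comb(n_annotated, x) * comb(n_background - n_annotated, n_test - x)
               for x in range(k, upper + 1))
    return hits / comb(n_background, n_test)
```

Because clusters are keyed by the exact set of proteomes, the cluster keyed by {Yps, Mtb, Mbov, Map} holds groups present in Yps and the three Mycobacterium spp. and absent from the other Yersinia proteomes, which is the pattern that defined the seven candidate proteins. The group ids used to try this out (e.g. "g1", "g2") would be hypothetical.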

Set I (inter-genus conserved proteins). The inter-genus conserved set included proteins conserved across all seven proteomes. For functional enrichment of this set, the combined GO-slim terms of all seven complete proteomes were used as the background.
Set II (intra-genus conserved proteins). The intra-genus conserved set included proteins present in all species of the genus Mycobacterium or Yersinia. For functional enrichment of this set, the collective GO-slim terms of the proteomes of the respective genus were used as the background.
Set III (Conserved in Ype and Mycobacterium spp.). This set contained proteins common to Ype and the three Mycobacterium spp. Functional enrichment of Set III proteins was determined using all four Yersinia spp. as the background.
Set IV (Conserved in Yen and Mycobacterium spp.). Functional enrichment of Yen proteins whose orthologs were present in all Mycobacterium spp. (test dataset) was determined against all four Yersinia spp. (background).
Set V (Conserved in Yfr and Mycobacterium spp.). The enriched functions of Yfr proteins whose orthologs were present in Mycobacterium spp. (test dataset) were determined against all four Yersinia spp. (background).
Set VII (within species). This set included proteins unique to a particular species. For functional enrichment of these proteins, GO-slim terms retrieved from the complete proteome of the same species were used as the background.
Characterization of Yps proteins involved in granuloma formation. Since the aim of the current study was to identify Yps proteins involved in granuloma formation, only those Yps proteins whose orthologs were present in the three Mycobacterium spp. were functionally characterized. Functional annotation followed a four-step process: (a) domain information for each protein was collected from UniProtKB; (b) the interaction partners of each protein were identified using the STRING database (https://string-db.org/) 83 and characterized; (c) the metabolic pathways involving the interacting protein partners were identified using the KEGG database 84; and (d) information on whether the interacting proteins and/or pathways have been used as drug targets was retrieved from the published literature.
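Steps (b) and (c) can be scripted against the public STRING and KEGG REST services. The Python sketch below only builds the request URLs, so it runs offline; fetching (for example with `urllib.request.urlopen`) and parsing are left out. The `required_score` of 400 and `limit` of 10 are assumed values not stated in the study, the function names are hypothetical, and any taxon ID or KEGG gene identifier passed in is a placeholder to be replaced with the real ones for the Yps strain.

```python
from urllib.parse import quote, urlencode

# Base URLs of the public STRING and KEGG REST services
STRING_API = "https://string-db.org/api"
KEGG_API = "https://rest.kegg.jp"


def string_partners_url(identifier, species, required_score=400, limit=10):
    """URL for STRING's interaction_partners endpoint (TSV output).

    required_score uses STRING's 0-1000 scale; 400 ("medium confidence") is an
    assumed default, not a value stated in the study.
    """
    params = {"identifiers": identifier, "species": species,
              "required_score": required_score, "limit": limit}
    return f"{STRING_API}/tsv/interaction_partners?{urlencode(params)}"


def kegg_pathways_url(kegg_gene_id):
    """URL for KEGG's 'link' operation, listing pathways that contain a gene.

    kegg_gene_id has the form '<organism code>:<gene id>'.
    """
    return f"{KEGG_API}/link/pathway/{quote(kegg_gene_id, safe=':')}"
```

For instance, `string_partners_url("A7FMD4", TAXON_ID)` would request partners of the ispH protein once `TAXON_ID` is set to the strain's NCBI taxon ID; step (d), the drug-target literature check, remains manual.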
Mapping of predicted Yps proteins on the composite PPI network of experimentally identified Mtb granuloma proteins. A composite interaction network of RD1 locus proteins 28, ESAT-6 secretion system proteins 29 and the intra-macrophage secretome of Mtb 30 was created using the STRING database at a confidence score of 0.150. The Mtb orthologs of the seven proposed Yps granuloma proteins were mapped on this interaction network.
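A confidence score of 0.150 corresponds to `required_score=150` on STRING's 0-1000 scale. The Python sketch below builds a STRING network request for a list of Mtb proteins and then counts, for each protein, its number of edges and summed edge score, one simple way to quantify the "moderate to strong connections" of the mapped orthologs. The species ID 83332 (M. tuberculosis H37Rv), the TSV column names and the degree-based measure are assumptions, not the study's procedure; the sample rows and protein names P1 to P4 are made up.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlencode


def string_network_url(identifiers, species=83332, required_score=150):
    """URL for STRING's network endpoint (TSV output).

    species=83332 (M. tuberculosis H37Rv) is an assumption. Multiple identifiers
    are joined with carriage returns, which urlencode turns into %0D.
    """
    params = {"identifiers": "\r".join(identifiers), "species": species,
              "required_score": required_score}
    return "https://string-db.org/api/tsv/network?" + urlencode(params)


def connectivity(tsv_text, min_score=0.150):
    """Degree and summed edge score per protein from STRING network TSV text.

    Assumes the columns preferredName_A, preferredName_B and score (0-1 scale).
    """
    stats = defaultdict(lambda: [0, 0.0])
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        score = float(row["score"])
        if score < min_score:
            continue  # edge below the confidence cut-off
        for name in (row["preferredName_A"], row["preferredName_B"]):
            stats[name][0] += 1
            stats[name][1] += score
    return {name: (deg, round(total, 3)) for name, (deg, total) in stats.items()}


# Made-up rows in the assumed TSV layout (not results from this study)
SAMPLE = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "P1\tP2\t0.999\n"
    "P1\tP3\t0.700\n"
    "P3\tP4\t0.100\n"
)
mapped = ["P1", "P3"]  # hypothetical Mtb orthologs mapped on the network
net = connectivity(SAMPLE)
mapped_stats = {p: net.get(p, (0, 0.0)) for p in mapped}
```

Here the P3-P4 edge falls below 0.150 and is ignored, so P1 ends up with two edges and P3 with one; on a real network, the same per-protein summary would be read off for each mapped Mtb ortholog.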