Elucidating the network features and evolutionary attributes of intra- and interspecific protein–protein interactions between human and pathogenic bacteria

Acharya, Debarun; Dutta, Tapan K.

doi:10.1038/s41598-020-80549-x

Published: 08 January 2021

Scientific Reports volume 11, Article number: 190 (2021)

Abstract

Host–pathogen interaction is one of the most powerful determinants involved in coevolutionary processes covering a broad range of biological phenomena at molecular, cellular, organismal and/or population level. The present study explored host–pathogen interaction from the perspective of human–bacteria protein–protein interaction based on large-scale interspecific and intraspecific interactome data for human and three pathogenic bacterial species, Bacillus anthracis, Francisella tularensis and Yersinia pestis. The network features revealed a preferential enrichment of intraspecific hubs and bottlenecks for both human and bacterial pathogens in the interspecific human–bacteria interaction. Analyses unveiled that these bacterial pathogens interact mostly with human party-hubs that may enable them to affect desired functional modules, leading to pathogenesis. Structural features of pathogen-interacting human proteins indicated an abundance of protein domains, providing opportunities for interspecific domain-domain interactions. Moreover, these interactions do not always occur with high-affinity, as we observed that bacteria-interacting human proteins are rich in protein-disorder content, which correlates positively with the number of interacting pathogen proteins, facilitating low-affinity interspecific interactions. Furthermore, functional analyses of pathogen-interacting human proteins revealed an enrichment in regulation of processes like metabolism, immune system, cellular localization and transport apart from divulging functional competence to bind enzyme/protein, nucleic acids and cell adhesion molecules, necessary for host-microbial cross-talk.

Introduction

Pathogen–host interaction is a classic example of an evolutionary arms race, in which sustained coevolution continuously shapes the genomes and life-history characteristics of both hosts and pathogens. Whether a disease develops depends on the survival, reproduction, and transmission of the pathogen within the host, which are countered by host resistance and immune system components.

Pathogen–host interactions are better understood from molecular perspectives, where pathogens hijack and manipulate the host’s cellular machinery and immune system components for their growth, thereby establishing a pathogen-host protein–protein interaction (PHPPI) network inside a host¹. In plants, such interactions are mediated by pathogen effectors, which are pathogen proteins, translocated inside host cells and target particular host genes/proteins to interfere with host cellular mechanisms, eventually causing infections². In human-pathogen interaction, proteins from both human and pathogen are involved in the PHPPI network that ultimately leads to either disease progression or elimination of pathogen from the human body. The human protein–protein interaction represents a scale-free distribution, where the majority of the proteins interact with only a few proteins while there are a few proteins that interact with a large number of proteins. Such a distribution increases the robustness of the human PPI network against random pathogen attacks. Therefore, in order to cause pathogenicity, pathogens target particular human proteins (directed attack) for their growth and establishment³. Conversely, the strategy of the human cellular system is to resist the pathogen attack by hindering its growth and ultimately eliminating it, which is mostly mediated by the human immune system components^4,5. Pathogens that evade the immune system can be killed by targeted therapeutics like broad-spectrum or specific antibiotics. However, with the increasing ability of pathogens to evade both the human immune system and antibiotics⁶, it has become more difficult to counter such infectious agents. The human–pathogen interactome is now considered very important for studying pathogenic disease, as it provides crucial information on the virulence factors along with their interactions essential for pathogenicity at the system level^1,7. 
The accumulation of PHPPI data in the last decade paved the way for system-level analyses with the whole interactome, leading to a better understanding of the pathogenicity, disease progression, and human–pathogen coevolution for a better therapeutic approach to prevent and cure infections.

A detailed analysis of interspecific pathogen–human protein–protein interaction revealed that pathogen proteins mainly interact with proteins having high centrality values in the human PPI network. These include hubs and bottlenecks, proteins having a high degree and betweenness centrality, respectively^2,8,9. Although both groups are functionally important components of the human PPI network and are often essential for host survival^10,11, a phenomenon known as the “centrality-lethality rule”^12,13, it is not known which group interacts more with pathogen proteins. Additionally, these proteins evolve at a slower rate^14,15,16,17, providing an opportunity for a sustainable host–pathogen interaction over a long evolutionary time scale, which benefits the pathogen species. Moreover, the higher connectivity of these pathogen-interacting hub proteins may increase the influence of pathogen proteins on the components of the human PPI network. Hubs and their interacting partners form functional modules, each assigned to a specific function, in which hubs may act either as intramodular or party-hubs (participating in the same functional module as their interacting partners) or as intermodular or date-hubs (participating in different functional modules with different interacting partners). It would therefore be informative to know which of these hub types interacts more often with pathogen proteins, as this can help identify the functional modules that pathogens target for pathogenicity and disease progression.

Most studies of human–pathogen interactions have focused on viral infections, where viruses hijack the human transcriptional machinery to synthesize their proteins. The viral proteins evolve in a very sophisticated manner, and their interactions with human proteins often involve short linear motifs (SLiMs) present in the latter^18,19. However, the interspecific PPI data between human and the majority of bacterial pathogens are not comprehensive. Thus, very little is known about human–bacteria protein–protein interaction, in which pathogenic bacteria also interact with human hubs and hijack immune system components to evade the host immune response¹. In the present study, we explored the attributes of human–bacteria protein–protein interaction for three pathogenic bacterial species, Bacillus anthracis, Francisella tularensis, and Yersinia pestis, for which large-scale interspecific PPI data are available. All three are listed as ‘Category A bioterrorism agents’. In addition, in silico approaches were used to examine various aspects of the human–bacteria protein–protein interaction network and its participants, to better understand the mechanism of pathogenicity and disease progression.

Results and discussion

Hubs and Bottlenecks in pathogen-interacting and non-interacting human proteins

The human–bacteria protein–protein interaction networks for three bacterial pathogens, namely Bacillus anthracis, Francisella tularensis, and Yersinia pestis, were analyzed to understand the network features of bacterial protein-interacting human proteins. In general, protein–protein interaction (PPI) data contain many false positives and false negatives. Here, we selected the three bacterial species that have the highest number of interspecific interactions with human proteins verified by multiple databases. Additionally, the PPI data are not yet comprehensive, and therefore all interpretations are based on the currently available data. It has been previously reported that pathogen proteins mainly interact with highly connected host proteins (host-hubs)^1,20. In this study, we classified the human proteins into four groups: (a) not interacting with any bacterial pathogen, (b) interacting with only one pathogen, (c) interacting with exactly two pathogens and (d) interacting with all three pathogens. The human protein–protein interaction network was constructed using the PICKLE database, where PPIs supported by at least two of five widely used PPI databases (BIOGRID²¹, MINT²², HPRD²³, DIP²⁴ and IntAct²⁵) were considered true interactions. The final data contain 11,815 proteins involved in 61,273 high-quality interactions, representing a little less than half of the human proteome. Comparing the proportion of hubs, we observed that the pathogen-interacting human proteins include a higher proportion of hubs and bottlenecks than the non-interacting group (Supplementary Table S3). The pathogen-interacting proteins also have a higher mean number of interacting partners (degree centrality) than the non-interacting group, for both hubs and nonhubs.
Additionally, human proteins that interact with more bacterial pathogens have a higher proportion of hubs and higher mean interacting partners than those interacting with fewer pathogens (Table 1). This suggests that pathogenic proteins preferentially target human hubs and bottlenecks that comprise functionally most important proteins in the human protein interaction network, which in turn, may damage the functional implication of the network. The high degree centrality of pathogen-interacting human proteins may also ensure the pathogens’ establishment within the human host via its control over a broad range of target human proteins. When human proteins were classified into hub-bottlenecks, hub-nonbottleneck, nonhub-bottleneck, and nonhub-nonbottleneck based on these two centrality measures, the highest proportion of pathogen-interacting proteins was obtained in the hub-bottleneck class. More interestingly, the hub-nonbottleneck and nonhub-bottleneck possess no significant difference, which indicates that hubs and bottlenecks are equally targeted by proteins of these pathogens (Fig. 1).

Table 1 Proportion of hubs and bottlenecks in human proteins based on their interactions with bacterial pathogens.

Moreover, the whole protein interaction network can be subdivided into many functional modules, with each distinct module representing a specific function. Based on modularity, the hubs which belong to the same functional module as their interacting partners are known as intramodular hubs or party hubs, and those having interacting partners that belong to different functional modules are known as intermodular hubs or date hubs. To evaluate the preferential interaction of pathogen proteins with any one class of these hubs, the human party- and date hubs were identified using co-expression values of human proteins and their interacting partners and their interacting interface (see “Materials and methods”). Based on the above, the proportion of party hubs was found to be significantly higher in pathogen-interacting proteins, signifying pathogen proteins target some of the functional modules for their benefit (Table 2).

Table 2 Proportion of party-hubs and date-hubs in pathogenic bacteria-interacting and non-interacting human proteins.

Hubs and Bottlenecks in human-interacting and non-interacting bacterial proteins

A scale-free network topology follows a power-law node degree distribution, comprising a few nodes with much higher degree centrality than most other nodes. Such a network is resilient against random attacks, which applies to human and pathogenic bacterial networks alike (Supplementary Fig. 1). In order to disrupt the human PPI network, pathogen proteins need to act against particular human proteins via non-random, directed interactions. Pathogen proteins with high degree centrality may be potential candidates for such disruption because of their inherently high interaction ability. To explore this further, we subdivided the pathogen proteins into hubs or nonhubs based on their degree centrality, and into bottlenecks or nonbottlenecks based on betweenness centrality (see “Materials and methods”). Following this classification, the network properties of human-interacting and non-interacting pathogen proteins were explored, and we observed that the bacterial proteins that interact with human proteins are significantly enriched in hubs and bottlenecks of the bacterial PPI network. These hub proteins also have a higher mean number of interacting partners (Table 3), indicating that the human-interacting pathogen proteins have the potential to interact with multiple types of proteins in the intraspecific PPI network, which may facilitate interspecific host–pathogen interactions.

Table 3 Proportion of hubs and bottlenecks in bacterial pathogens’ PPI network in human-interacting and non-interacting proteins.

Gene essentiality of pathogen-interacting human proteins

Genes indispensable to the survival and reproduction of an organism are considered essential genes^26,27. Proteins encoded by such genes are associated with vital molecular functions and are under strong purifying selection. It was observed that the pathogen-interacting proteins comprise a higher proportion of essential proteins, which, however, may be due to their enrichment among hubs^10,28. When we considered hub and nonhub proteins separately, the pathogen-interacting proteins were still enriched in essential proteins in both groups, suggesting that these deadly pathogens may disrupt vital functions of the host, thereby facilitating pathogenicity and disease progression (Fig. 2).

Evolutionary rates of pathogen-interacting and noninteracting human proteins

The evolutionary rate of a protein describes the change in its amino acid sequence over time. As hubs are evolutionarily more conserved than nonhubs and are also enriched in pathogen-interacting proteins, they are expected to show a slower evolutionary rate. However, very little is known regarding the differences in evolutionary rate between pathogen-interacting and noninteracting hubs. Considering pathogen-interacting/-noninteracting hubs/nonhubs, a comparison of the evolutionary rate, measured as the dN/dS ratio using 1:1 mouse and chimpanzee orthologs²⁹, revealed a slower evolutionary rate in hub proteins. Moreover, among the pathogen-interacting and noninteracting hubs, the former show a slower evolutionary rate (Fig. 3), suggesting that the evolutionarily more conserved hubs are more likely to be targeted by pathogens. This is beneficial from the pathogens’ perspective, as it may allow an efficient pathogen–host protein–protein interaction over a long evolutionary time scale.

Intrinsic disorder of pathogen-interacting and noninteracting human proteins

The function of a protein is generally mediated by its proper three-dimensional configuration. However, certain amino acid residues or stretches in a protein’s sequence do not let the protein fold into a definite conformation, and the resulting flexibility often facilitates productive protein–protein interactions. Such residues/regions of a protein are known as intrinsically disordered residues/regions. Intrinsically disordered proteins lack a distinct three-dimensional structure but can adopt a definite conformation upon interaction with other proteins, facilitating low-affinity interactions with high specificity³⁰. Proteins that are highly connected in a protein network are usually rich in these regions³¹, which may play an important role in the interactions between host and pathogen proteins. Although bacterial proteins are less disordered than human proteins^32,33, the disordered regions in human proteins may be utilized by the bacterial pathogens as potential regions for interaction. To address this, the IUPred algorithm was used to identify the disordered residues in pathogen-interacting and non-interacting proteins³⁴. The proportion of disordered proteins (P_{disordered}) among the pathogen-interacting proteins is significantly higher than among the non-interacting proteins (P_{disordered, interacting} = 59.73%, N_{interacting} = 2677; P_{disordered, noninteracting} = 49.07%, N_{noninteracting} = 9136; Z = 9.706, P < 1.00 × 10⁻⁴), suggesting that they may play an important role in pathogen–host interactions.
Additionally, when the total number and percentage of disordered regions and residues of individual proteins were considered, we found that pathogen-interacting proteins have a higher number and mean percentage of long disordered regions and disordered residues (Supplementary Table S4), indicating that human proteins with intrinsically disordered regions and residues are more prone to pathogen attack. Because smaller disordered segments can also be important for interaction, we also considered proteins having disordered stretches ≥ 15 residues long, which gave a consistent result (Supplementary Table S4).

To further test this observation, the number of interacting pathogen proteins from each of the three bacteria was calculated for each human protein, and it shows a significant positive correlation with the disorder content of the human protein (Supplementary Table S5). When the human proteins were grouped into five bins based on their disorder content (see “Materials and methods”), the proportion of pathogen-interacting genes increased gradually with increasing disorder content up to 80% (Fig. 4). Together, these results suggest that protein intrinsic disorder plays a major role in host–pathogen interactions.

Molecular recognition features (MoRFs) in pathogen-interacting and noninteracting human disordered proteins

We also considered Molecular Recognition Features (MoRFs), which are specialized elements 5–25 residues long, located within the disordered regions of proteins, that undergo a disorder-to-order transition upon binding to their interacting partners. To understand whether the disordered regions in pathogen-interacting human proteins can serve as binding sites for pathogen proteins, we explored the MoRFs within the human disordered proteins using the fMoRFpred³⁵ webserver. The pathogen-interacting human proteins were found to be richer in MoRFs than their noninteracting counterparts (MoRF_regions_{interacting} = 1.017, MoRF_regions_{noninteracting} = 0.931, P = 3.949 × 10⁻²; MoRF_residues_{interacting} = 15.035, MoRF_residues_{noninteracting} = 12.765, P = 3.718 × 10⁻⁹; Mann–Whitney U test, N_{interacting} = 1599, N_{noninteracting} = 4472), which may favour interspecific protein–protein interaction.

Protein domains in pathogen-interacting and non-interacting human proteins

Although protein intrinsic disorder facilitates protein–protein interaction by providing flexibility to the protein structure³⁶, protein domains, the most conserved and functionally essential parts of a protein, serve a distinct role in such interactions³⁷. More specifically, a protein–protein interaction can be viewed as an interaction between domains of different proteins. Therefore, proteins with a greater number of domains may have a higher probability of interacting with other proteins. To study the influence of protein domains on human–bacteria interaction, the mean number of domains of pathogenic bacteria-interacting and noninteracting human proteins was calculated using the Interpro repository³⁸. The pathogen-interacting proteins contain a higher number of domains than the noninteracting ones (P = 6.73 × 10⁻¹⁶, Mann–Whitney U test). However, the higher number of domains in pathogen-interacting human proteins may be attributable to the abundance of hubs among them. We therefore divided the data into hubs and nonhubs. Interestingly, within both hubs and nonhubs, the pathogen-interacting proteome has a higher number of domains (P_hub = 8.60 × 10⁻⁵, P_nonhub = 6.58 × 10⁻⁷). Additionally, the proteins interacting with more pathogens have a higher number of protein domains (P = 2.41 × 10⁻¹⁵, Kruskal–Wallis test) (Fig. 5). This suggests that proteins with a higher domain number have a higher probability of interaction with pathogen proteins, facilitated via interspecific domain–domain interaction.

Functional enrichment analysis of pathogen-interacting proteins

The association of party hubs with pathogen proteins indicates that these bacterial pathogens mostly target particular functional modules of the human proteome for the establishment of pathogenicity and progression of the disease. For a detailed insight, the functional enrichment of the pathogen-interacting human proteins was studied using the Humanmine³⁹ and Gorilla⁴⁰ webservers. The top 10 enriched Gene Ontology (GO) terms matched in both datasets were identified for the two GO domains ‘Biological Process’ and ‘Molecular Function’ (Supplementary Table S6). The pathogen-interacting proteins were enriched in processes such as regulation of biological/cellular processes, cellular localization, immune system, interspecies interaction between organisms, regulation of cellular (metabolic) processes, regulation of nitrogen compound metabolic processes, regulation of primary metabolic processes, and vesicle-mediated transport processes. These proteins were also enriched in functions such as RNA binding, enzyme/protein binding, nucleic acid binding, protein-containing complex biomolecule binding, cadherin binding, cell adhesion molecule binding, transcription factor binding, chromatin binding, and kinase binding. This functional enrichment suggests that, during pathogenesis, these pathogens primarily regulate processes related to the immune system, cellular localization and transport, apart from influencing the binding of host macromolecules and cell-adhesion molecules necessary for host-microbial cross-talk.

Materials and methods

Protein–protein interaction datasets

The human–bacteria protein-interaction data for the three bacterial species, namely Bacillus anthracis, Francisella tularensis, and Yersinia pestis, were obtained from four well established host–pathogen interactome databases: APID (Agile Protein Interactome Dataserver, http://cicblade.dep.usal.es:8080/APID/init.action#tabr1)⁴¹, MENTHA (https://mentha.uniroma2.it/)⁴², HPI-DB (Host Pathogen Interaction Database, http://hpidb.igbb.msstate.edu/index.html)⁴³ and PHISTO (Pathogen Host Interaction Search Tool, http://www.phisto.org/browse.xhtml)⁴⁴. The binary interactions reported in at least three of the four databases were used in this study to define the pathogen-interacting human proteins. The human proteins and their sequences were obtained from Uniprot (https://www.uniprot.org/)⁴⁵. The human proteins with no reported interaction with any pathogen protein in any of the databases were considered as pathogen-non-interacting proteins (Supplementary Table S1).
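The consensus rule described above (keep a human–pathogen pair only if at least three of the four databases report it) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `consensus_pairs`, the short database labels, and the toy identifiers are all hypothetical, and it assumes that identifiers have already been mapped to a common namespace.

```python
from collections import defaultdict

def consensus_pairs(db_to_pairs, min_support=3):
    """Return (human, pathogen) pairs reported by at least `min_support` databases."""
    support = defaultdict(set)
    for db_name, pairs in db_to_pairs.items():
        for human_id, pathogen_id in pairs:
            support[(human_id, pathogen_id)].add(db_name)
    return {pair for pair, dbs in support.items() if len(dbs) >= min_support}

# Toy input with made-up identifiers (not real database records).
toy = {
    "APID": {("H1", "B1"), ("H2", "B1")},
    "MENTHA": {("H1", "B1"), ("H2", "B1"), ("H3", "B2")},
    "HPIDB": {("H1", "B1"), ("H3", "B2")},
    "PHISTO": {("H1", "B1"), ("H2", "B1")},
}
kept = consensus_pairs(toy)
# Human proteins with at least one consensus interaction.
interacting_human = {human for human, _ in kept}
```

In this toy input the pair ("H3", "B2") is supported by only two databases and is therefore dropped.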

The human PPI data were obtained from PICKLE (Protein InteraCtion KnowLedgebasE) (www.pickle.gr)⁴⁵, which combines widely used protein–protein interaction databases such as BIOGRID²¹, MINT²², HPRD²³, DIP²⁴ and IntAct²⁵. We removed all self-interactions and considered only interactions supported by at least two of these databases for our study⁴⁵.

The within-species PPI data of all three bacterial pathogens were obtained from the STRING database (https://string-db.org/)⁴⁶, considering the experimentally validated interactions only. The STRING IDs were annotated to Uniprot IDs using the annotation file present in the STRING database. Reciprocal BLAST with 100% sequence identity and e-values < e⁻¹⁰ BLAST parameters was used to determine the orthologous proteins of two different pathogen strains belonging to the same species as available in pathogen–PPI and pathogen–human PPI databases. The final dataset consists of 122,546 Homo sapiens binary interactions involving 11,833 proteins, 277,210 B. anthracis binary interactions involving 3285 proteins, 53,614 F. tularensis interactions involving 1167 proteins and 135,090 Y. pestis binary interactions involving 2872 proteins. We analyzed each network using the Network Analyzer plugin of Cytoscape (version 3.7.1) to get the degree and betweenness centrality. The node degree of all the species shows power-law distributions (Supplementary Fig. S1). We subdivided the proteins of each species into hubs and nonhubs depending on their degree centrality. The top ~ 20% proteins of the node degree distribution having the highest number of interacting partners were considered as hubs, while the rest as nonhubs, according to the 20–80 rule of power-law distributions (Pareto principle)⁴⁷. Similarly, we classified the proteins into bottlenecks (proteins that are central to many paths in the network) and non-bottlenecks considering the proteins representing the top ~ 20% of betweenness centrality as bottlenecks and the rest as non-bottlenecks (Supplementary Table S2).
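The hub and bottleneck classification described above can be illustrated with a small self-contained sketch. The study used the Network Analyzer plugin of Cytoscape; the code below instead computes degree and betweenness directly (betweenness via Brandes' algorithm for unweighted graphs) and is only an approximation of that workflow. The function names, the arbitrary tie-breaking by node name at the 20% cutoff, and the toy network are assumptions made for illustration.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Undirected adjacency sets; self-interactions are dropped."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected path is counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

def top_fraction(scores, fraction=0.2):
    """Nodes in the top `fraction` by score; ties broken by name (assumption)."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:k])

# Toy network: two 4-cliques joined through a low-degree bridge node X.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H"),
         ("A", "X"), ("X", "E")]
adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
hubs = top_fraction(degree)
bottlenecks = top_fraction(betweenness(adj))
```

In the toy network the bridge node X has a low degree but lies on every path between the two cliques, so it is a nonhub-bottleneck, while E is a hub that does not make the bottleneck cutoff. This shows why the two centrality measures are treated as separate classifications.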

Party-hubs and date-hubs

For the determination of human party- and date-hubs, human gene expression data were obtained from the Human Protein Atlas⁴⁸, which contains tissue-wise RNA levels (TPM) for 37 tissues, namely the adipose tissue, adrenal gland, appendix, bone marrow, breast, cerebral cortex, cervix/uterine, colon, duodenum, endometrium, epididymis, esophagus, fallopian tube, gallbladder, heart muscle, kidney, liver, lung, lymph node, ovary, pancreas, parathyroid gland, placenta, prostate, rectum, salivary gland, seminal vesicle, skeletal muscle, skin, small intestine, smooth muscle, spleen, stomach, testis, thyroid gland, tonsil, and urinary bladder. For each interacting protein pair, the RNA levels of both the partners were correlated using the Pearson correlation coefficient (PCC). The mean PCC values for all the partners of the hub proteins were used to classify the hub further into party-hubs and date-hubs⁴⁹. We have used PRISM⁵⁰ webserver to confirm that no two interacting partners of a party hub share the same interacting surface with the latter. The hubs having a mean PCC value ≥ 0.5 were considered as party hubs and those having a PCC value < 0.5 were considered as date hubs⁵¹. We have also used mean PCC value of all proteins as the cutoff to select party-hubs (above mean) and date-hubs (below mean)¹⁴.
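The co-expression criterion above can be sketched in a few lines: compute the Pearson correlation between a hub's tissue expression profile and that of each partner, average the values, and compare the mean with the 0.5 cutoff. The function names and the toy expression vectors are hypothetical, treating a constant profile as zero correlation is an assumption, and the PRISM interface check mentioned in the text is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile carries no co-expression signal
    return cov / (sx * sy)

def classify_hub(hub, partners, expression, cutoff=0.5):
    """Label a hub 'party' or 'date' from its mean PCC with its partners."""
    pccs = [pearson(expression[hub], expression[p])
            for p in partners if p in expression]
    if not pccs:
        return None, None  # no partner with expression data
    mean_pcc = sum(pccs) / len(pccs)
    return ("party" if mean_pcc >= cutoff else "date"), mean_pcc

# Toy tissue-wise expression values (made up; the study used 37 tissues).
expression = {
    "hub1": [1.0, 2.0, 3.0, 4.0],
    "p1": [2.0, 4.0, 6.0, 8.0],
    "p2": [1.0, 2.0, 3.0, 5.0],
    "hub2": [1.0, 2.0, 3.0, 4.0],
    "q1": [4.0, 3.0, 2.0, 1.0],
    "q2": [2.0, 4.0, 6.0, 8.0],
}
label1, mean1 = classify_hub("hub1", ["p1", "p2"], expression)
label2, mean2 = classify_hub("hub2", ["q1", "q2"], expression)
```

Here hub1 is co-expressed with both partners and is labelled a party hub, whereas hub2 has one correlated and one anti-correlated partner, so its mean PCC is near zero and it is labelled a date hub.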

Human essential genes

Genes essential for human survival and reproduction, collectively known as essential human genes, were obtained from three recent experiments based on gene trap mutagenesis⁵² and high-resolution CRISPR-screening^53,54. Human genes (and their encoded proteins) considered as essential or nonessential in all the three screenings were considered as essential and nonessential, respectively. The final data consists of 768 essential and 8080 nonessential human proteins.
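The consensus rule for essentiality can be expressed compactly: a gene is retained only if all three screens agree on its status. The sketch below is illustrative only; the function name, the dictionary input format, and the gene labels are hypothetical, and it assumes that genes missing from any screen are excluded.

```python
def consensus_essentiality(screens):
    """Split genes into those called essential, or nonessential, in every screen.

    `screens` is a list of dicts mapping gene -> True (essential) / False.
    Genes absent from any screen, or with conflicting calls, are dropped.
    """
    shared = set.intersection(*(set(s) for s in screens))
    essential = {g for g in shared if all(s[g] for s in screens)}
    nonessential = {g for g in shared if not any(s[g] for s in screens)}
    return essential, nonessential

# Toy calls from three hypothetical screens.
screens = [
    {"G1": True, "G2": False, "G3": True, "G4": False},
    {"G1": True, "G2": False, "G3": False, "G4": False},
    {"G1": True, "G2": False, "G3": True},
]
essential, nonessential = consensus_essentiality(screens)
```

G3 has conflicting calls and G4 is missing from one screen, so neither is assigned to a class.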

Evolutionary rate

For the calculation of evolutionary rate of human proteins, the nonsynonymous nucleotide substitutions per nonsynonymous site (dN) and synonymous nucleotide substitutions per synonymous site (dS), were obtained from the Ensembl biomart⁵⁵, using 1:1 mouse and chimpanzee orthologs for each human protein. The mutation saturation was controlled by discarding dS values greater than 3 and the dN/dS ratio was used as evolutionary rate²⁹.
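The filtering step above amounts to computing dN/dS only when dS is within the unsaturated range. A minimal sketch follows; the function name is hypothetical, and excluding pairs with dS equal to zero (to avoid division by zero) is an assumption that the text does not state.

```python
def dn_ds_ratio(dn, ds, max_ds=3.0):
    """Return dN/dS, or None when the value should be discarded."""
    if dn is None or ds is None:
        return None  # no ortholog estimate available
    if ds <= 0 or ds > max_ds:
        return None  # dS > 3 treated as saturated; dS = 0 excluded (assumption)
    return dn / ds

# Toy (dN, dS) values for hypothetical human-mouse ortholog pairs.
pairs = {"P1": (0.1, 0.5), "P2": (0.4, 3.5), "P3": (0.02, 0.0)}
rates = {name: dn_ds_ratio(dn, ds) for name, (dn, ds) in pairs.items()}
kept_rates = {name: r for name, r in rates.items() if r is not None}
```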

Intrinsically disordered proteins

We used IUPred algorithm to predict the intrinsically disordered regions in the protein sequence. In IUPred, each amino acid residue is given a probability score based on its pairwise energy profile with respect to its interaction with other residues along the protein sequence. Residues with scores ≥ 0.50 are considered as disordered and < 0.50 as ordered³⁴. We have downloaded the ‘reviewed’ human protein sequence from Uniprot (Accession UP000005640). We discarded all proteins with < 30 amino acid residues. Proteins with a continuous stretch of ≥ 30 disordered residues were considered as proteins with long intrinsically disordered regions. We have calculated the number of these disordered stretches, the proportion of residues in the long-disordered stretches, the total number of disordered amino acid residues and the proportion of disordered amino acid residues for each human protein. Following Panda et al. 2017⁵⁶, human proteins were classified into five groups based on their disorder content: A, Ordered (having 0–20% disordered amino acid residues); B, Moderately disordered (having 20–40% disordered amino acid residues); C, Disordered (having 40–60% disordered amino acid residues); D, Highly disordered (having 60–80% disordered amino acid residues) and E, Extremely disordered (having 80–100% disordered amino acid residues).
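Given per-residue disorder scores (as produced by a predictor such as IUPred), the quantities described above can be derived as in the following sketch. It does not call IUPred itself; the scores are made up, the function names are hypothetical, and because the five groups share their boundary values (for example 20% appears in both A and B), the sketch assumes that each boundary belongs to the higher group, with 100% assigned to E.

```python
def disorder_summary(scores, threshold=0.5, long_len=30):
    """Summarize disorder from per-residue scores (score >= threshold is disordered)."""
    flags = [s >= threshold for s in scores]
    runs, run = [], 0
    for is_disordered in flags:  # collect lengths of consecutive disordered stretches
        if is_disordered:
            run += 1
        else:
            if run:
                runs.append(run)
            run = 0
    if run:
        runs.append(run)
    long_runs = [r for r in runs if r >= long_len]
    n = len(scores)
    return {
        "n_disordered": sum(flags),
        "pct_disordered": 100.0 * sum(flags) / n,
        "n_long_regions": len(long_runs),
        "pct_in_long_regions": 100.0 * sum(long_runs) / n,
    }

def disorder_bin(pct_disordered):
    """Map percent disorder to groups A-E (boundary handling is an assumption)."""
    labels = ["A", "B", "C", "D", "E"]
    return labels[min(int(pct_disordered // 20), 4)]

# Toy 100-residue protein: one 35-residue and one 10-residue disordered stretch.
scores = [0.2] * 40 + [0.8] * 35 + [0.3] * 15 + [0.9] * 10
summary = disorder_summary(scores)
group = disorder_bin(summary["pct_disordered"])
```

For this toy protein, 45 of 100 residues are disordered, only the 35-residue stretch counts as a long disordered region, and the protein falls into group C.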

Molecular recognition features

The Molecular recognition features (MoRFs) were obtained from fMoRFpred³⁵ webserver. We have selected MoRF regions of ≥ 5 residues and calculated the number of such MoRF regions and total MoRF residues for our study.

Protein domains

The Ensembl biomart⁵⁵ was used to obtain the interpro³⁸ domains of human proteins.

Functional enrichment analysis

The functional enrichment analysis was carried out using the Gene Ontology⁵⁷ based on the Humanmine³⁹ and Gorilla⁴⁰ web-servers. The gene ontology terms under the GO domains biological process and molecular function were used to determine the overrepresented biological processes and molecular functions of pathogen-interacting human proteins. The P-values determining the overrepresented GO terms were corrected using the Benjamini–Hochberg correction. The top ten GO biological process and GO molecular function terms represented in both datasets were used as overrepresented GO terms.
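In the study, the multiple-testing correction was applied within the web-servers. For readers unfamiliar with it, the standard Benjamini–Hochberg step-up adjustment can be sketched as below; the function name and the example P-values are illustrative and are not taken from the enrichment results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj_p]
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw P-value never receives a larger adjusted value than a bigger one.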

Statistical analyses

All the statistical analyses in this study have been done using in-house PERL script (for Z-test to compare percentages in different samples) and IBM SPSS 22 statistical package (for all other statistical tests)⁵⁸.
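The Z-test for comparing percentages was run with an in-house PERL script that is not shown. A common form of this test is the pooled two-proportion Z-test, sketched here in Python under the assumption that this is the variant used; the function name is hypothetical. The example values are the disordered-protein percentages and sample sizes reported in the Results.

```python
from math import erfc, sqrt

def two_proportion_z(pct1, n1, pct2, n2):
    """Pooled two-proportion Z-test on percentages; returns (Z, two-sided P)."""
    p1, p2 = pct1 / 100.0, pct2 / 100.0
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # normal-tail probability, both sides
    return z, p_two_sided

# Disordered proteins: 59.73% of 2677 interacting vs 49.07% of 9136 noninteracting.
z, p = two_proportion_z(59.73, 2677, 49.07, 9136)
```

With these inputs the pooled form gives a Z value of about 9.7, in line with the reported Z = 9.706, which suggests (but does not confirm) that the in-house script used the same formulation.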

Conclusions

Recent developments in high-throughput interspecific protein–protein interaction data have paved the way for host–pathogen interaction studies to understand detailed aspects of pathogenicity, leading to the development of platforms for host-directed therapeutic research. In this study, we explored the attributes of the human–bacteria protein–protein interaction (PPI) network for three bacterial species, Bacillus anthracis, Francisella tularensis and Yersinia pestis, for which large-scale high-throughput intraspecific and interspecific PPI data are available. We observed that the central proteins within the intraspecific human and bacterial interactomes preferentially participate in human–bacteria interaction. These include hubs and bottlenecks of both human and bacterial PPI networks. Additionally, within human hubs, party-hubs participate in the interspecific PPI network more often than date hubs. These pathogens also preferentially interact with human essential proteins, both within hubs and nonhubs, thereby assisting in disease progression. From an evolutionary perspective, these bacterial pathogens interact with evolutionarily more conserved human proteins, leading to a sustainable interaction that is helpful for the pathogen species. A detailed analysis of the structural features of host proteins revealed that the pathogen-interacting human proteins contain a higher number of protein domains and an abundance of intrinsically disordered residues and regions, which are likely to assist human–bacteria interaction by promoting high-affinity and low-affinity protein–protein interactions, respectively. Furthermore, the functional analysis of pathogen-interacting human proteins revealed an enrichment of proteins involved in various biological processes, including catalytic functions related to the binding of several biomolecules.
These enriched proteins are supposed to regulate essential metabolic and immune system processes, cellular localization, and transport and also influence the binding of host macromolecules and cell-adhesion molecules that are necessary for host-microbial cross-talks.

Data availability

All the data are available upon request.

Abbreviations

PPI: Protein–protein interaction
PHPPI: Pathogen–host protein–protein interaction
GO: Gene ontology
dN: Nonsynonymous nucleotide substitutions per nonsynonymous site
dS: Synonymous nucleotide substitutions per synonymous site

References

Durmus Tekir, S., Cakir, T. & Ulgen, K. Infection strategies of bacterial and viral pathogens through pathogen–human protein–protein interactions. Front. Microbiol. 3, 46 (2012).
Ahmed, H. et al. Network biology discovers pathogen contact points in host protein–protein interactomes. Nat. Commun. 9, 2312 (2018).
Saha, S., Sengupta, K., Chatterjee, P., Basu, S. & Nasipuri, M. Analysis of protein targets in pathogen–host interaction in infectious diseases: A case study on Plasmodium falciparum and Homo sapiens interaction network. Brief. Funct. Genom. 17, 441–450 (2017).
Bahia, D., Satoskar, A. R. & Dussurget, O. Cell signaling in host–pathogen interactions: The host point of view. Front. Immunol. 9, 221 (2018).
Blasi, F., Tarsia, P. & Aliberti, S. Strategic targets of essential host–pathogen interactions. Respiration 72, 9–25 (2005).
Neu, H. C. The crisis in antibiotic resistance. Science 257, 1064–1073 (1992).
Nicod, C., Banaei-Esfahani, A. & Collins, B. C. Elucidation of host–pathogen protein–protein interactions to uncover mechanisms of host cell rewiring. Curr. Opin. Microbiol. 39, 7–15 (2017).
Halehalli, R. R. & Nagarajaram, H. A. Molecular principles of human virus protein–protein interactions. Bioinformatics 31, 1025–1033 (2014).
Schleker, S. & Trilling, M. Data-warehousing of protein–protein interactions indicates that pathogens preferentially target hub and bottleneck proteins. Front. Microbiol. 4, 51 (2013).
He, X. & Zhang, J. Why do hubs tend to be essential in protein networks? PLoS Genet. 2, 826–834. https://doi.org/10.1371/journal.pgen.0020088 (2006).
Hahn, M. W. & Kern, A. D. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol. Biol. Evol. 22, 803–806 (2005).
Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42. https://doi.org/10.1038/35075138 (2001).
Tew, K. L., Li, X.-L. & Tan, S.-H. Functional centrality: Detecting lethality of proteins in protein interaction networks. Genome Inform. 19, 166–177 (2007).
Ekman, D., Light, S., Björklund, Å. K. & Elofsson, A. What properties characterize the hub proteins of the protein–protein interaction network of Saccharomyces cerevisiae? Genome Biol. 7, R45 (2006).
Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002).
Helsen, J., Frickel, J., Jelier, R. & Verstrepen, K. J. Network hubs affect evolvability. PLoS Biol. 17, e3000111 (2019).
Alvarez-Ponce, D., Feyertag, F. & Chakraborty, S. Position matters: Network centrality considerably impacts rates of protein evolution in the human protein–protein interaction network. Genome Biol. Evol. 9, 1742–1756 (2017).
Becerra, A., Bucheli, V. A. & Moreno, P. A. Prediction of virus-host protein–protein interactions mediated by short linear motifs. BMC Bioinform. 18, 163 (2017).
García-Pérez, C. A., Guo, X., Navarro, J. G., Aguilar, D. A. G. & Lara-Ramírez, E. E. Proteome-wide analysis of human motif-domain interactions mapped on influenza A virus. BMC Bioinform. 19, 238 (2018).
Yang, H. et al. Insight into bacterial virulence mechanisms against host immune response via the Yersinia pestis-human protein–protein interaction network. Infect. Immun. 79, 4413–4424 (2011).
Stark, C. et al. BioGRID: A general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
Chatr-Aryamontri, A. et al. MINT: The molecular INTeraction database. Nucleic Acids Res. 35, D572–D574 (2006).
Peri, S. et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. 32, D497–D501 (2004).
Xenarios, I. et al. DIP, the database of interacting proteins: A research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30, 303–305 (2002).
Hermjakob, H. et al. IntAct: An open source molecular interaction database. Nucleic Acids Res. 32, D452–D455 (2004).
Liao, B.-Y., Scott, N. M. & Zhang, J. Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol. Biol. Evol. 23, 2072–2080. https://doi.org/10.1093/molbev/msl076 (2006).
Acharya, D., Mukherjee, D., Podder, S. & Ghosh, T. C. Investigating different duplication pattern of essential genes in mouse and human. PLoS ONE 10, e0120784. https://doi.org/10.1371/journal.pone.0120784 (2015).
Chen, H. et al. New insights on human essential genes based on integrated analysis and the construction of the HEGIAP web-based platform. Brief. Bioinform. 21, 1397–1410 (2019).
Acharya, D. & Ghosh, T. C. Global analysis of human duplicated genes reveals the relative importance of whole-genome duplicates originated in the early vertebrate evolution. BMC Genom. 17, 1–14. https://doi.org/10.1186/s12864-016-2392-0 (2016).
Mészáros, B., Simon, I. & Dosztányi, Z. Prediction of protein binding regions in disordered proteins. PLoS Comput. Biol. 5, e1000376 (2009).
Dunker, A. K., Cortese, M. S., Romero, P., Iakoucheva, L. M. & Uversky, V. N. Flexible nets: The roles of intrinsic disorder in protein interaction networks. FEBS J. 272, 5129–5148 (2005).
Dunker, A. K., Romero, P., Obradovic, Z., Garner, E. C. & Brown, C. J. Intrinsic protein disorder in complete genomes. Genome Inform. 11, 161–171 (2000).
Ward, J. J., Sodhi, J. S., McGuffin, L. J., Buxton, B. F. & Jones, D. T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635–645 (2004).
Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21, 3433–3434. https://doi.org/10.1093/bioinformatics/bti541 (2005).
Disfani, F. M. et al. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics 28, i75–i83. https://doi.org/10.1093/bioinformatics/bts209 (2012).
Uversky, V. N., Oldfield, C. J. & Dunker, A. K. Intrinsically disordered proteins in human diseases: Introducing the D2 concept. Annu. Rev. Biophys. 37, 215–246 (2008).
Basu, M. K., Poliakov, E. & Rogozin, I. B. Domain mobility in proteins: Functional and evolutionary implications. Brief. Bioinform. 10, 205–216 (2009).
Hunter, S. et al. InterPro: The integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2008).
Smith, R. N. et al. InterMine: A flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28, 3163–3165 (2012).
Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: A tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinform. 10, 48 (2009).
Prieto, C. & De Las Rivas, J. APID: Agile protein interaction DataAnalyzer. Nucleic Acids Res. 34, W298–W302. https://doi.org/10.1093/nar/gkl128 (2006).
Calderone, A., Castagnoli, L. & Cesareni, G. Mentha: A resource for browsing integrated protein-interaction networks. Nat. Methods 10, 690 (2013).
Ammari, M. G., Gresham, C. R., McCarthy, F. M. & Nanduri, B. HPIDB 2.0: A curated database for host–pathogen interactions. Database 2016, baw103 (2016).
Durmuş Tekir, S. et al. PHISTO: Pathogen–host interaction search tool. Bioinformatics 29, 1357–1358 (2013).
Gioutlakis, A., Klapa, M. I. & Moschonas, N. K. PICKLE 2.0: A human protein–protein interaction meta-database employing data integration via genetic information ontology. PLoS ONE 12, e0186039 (2017).
Szklarczyk, D. et al. The STRING database in 2017: Quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. gkw937 (2016).
Newman, M. E. J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46, 323–351 (2005).
Uhlen, M. et al. Towards a knowledge-based human protein atlas. Nat. Biotechnol. 28, 1248–1250 (2010).
Han, J.-D.J. et al. Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 430, 88–93 (2004).
Baspinar, A., Cukuroglu, E., Nussinov, R., Keskin, O. & Gursoy, A. PRISM: A web server and repository for prediction of protein–protein interactions and modeling their 3D complexes. Nucleic Acids Res. 42, W285–W289 (2014).
Batada, N. N. et al. Still stratus not altocumulus: Further evidence against the date/party hub distinction. PLoS Biol. 5, e154 (2007).
Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350, 1092–1096. https://doi.org/10.1126/science.aac7557 (2015).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101. https://doi.org/10.1126/science.aac7041 (2015).
Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell https://doi.org/10.1016/j.cell.2015.11.015 (2015).
Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716. https://doi.org/10.1093/nar/gkv1157 (2016).
Panda, A., Acharya, D. & Ghosh, T. C. Insights into human intrinsically disordered proteins from their gene expression profile. Mol. BioSyst. 13, 2521–2530 (2017).
Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261. https://doi.org/10.1093/nar/gkh036 (2004).
Nie, N. H., Bent, D. H. & Hull, C. H. SPSS: Statistical Package for the Social Sciences (McGraw-Hill, New York, 1970).

Acknowledgements

We thank members of our lab for stimulating discussions on this topic and Bose Institute for financial support. We also thank Dr. Arup Panda, Tel Aviv University, Israel for technical help.

Funding

This work was supported by Bose Institute, Kolkata, India.

Author information

Authors and Affiliations

Department of Microbiology, Bose Institute, P-1/12, CIT Scheme VII M, Kolkata, West Bengal, 700 054, India
Debarun Acharya & Tapan K. Dutta

Contributions

D.A. contributed to design, acquisition, and analysis of data while D.A. and T.K.D. contributed to the concept and preparation of the manuscript. All the authors have read and approved the final manuscript.

Corresponding author

Correspondence to Tapan K. Dutta.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Acharya, D., Dutta, T.K. Elucidating the network features and evolutionary attributes of intra- and interspecific protein–protein interactions between human and pathogenic bacteria. Sci Rep 11, 190 (2021). https://doi.org/10.1038/s41598-020-80549-x

Received: 24 August 2020
Accepted: 09 December 2020
Published: 08 January 2021
DOI: https://doi.org/10.1038/s41598-020-80549-x
