Elucidating the network features and evolutionary attributes of intra- and interspecific protein–protein interactions between human and pathogenic bacteria

Host–pathogen interaction is one of the most powerful determinants involved in coevolutionary processes covering a broad range of biological phenomena at molecular, cellular, organismal and/or population level. The present study explored host–pathogen interaction from the perspective of human–bacteria protein–protein interaction based on large-scale interspecific and intraspecific interactome data for human and three pathogenic bacterial species, Bacillus anthracis, Francisella tularensis and Yersinia pestis. The network features revealed a preferential enrichment of intraspecific hubs and bottlenecks for both human and bacterial pathogens in the interspecific human–bacteria interaction. Analyses unveiled that these bacterial pathogens interact mostly with human party-hubs that may enable them to affect desired functional modules, leading to pathogenesis. Structural features of pathogen-interacting human proteins indicated an abundance of protein domains, providing opportunities for interspecific domain-domain interactions. Moreover, these interactions do not always occur with high-affinity, as we observed that bacteria-interacting human proteins are rich in protein-disorder content, which correlates positively with the number of interacting pathogen proteins, facilitating low-affinity interspecific interactions. Furthermore, functional analyses of pathogen-interacting human proteins revealed an enrichment in regulation of processes like metabolism, immune system, cellular localization and transport apart from divulging functional competence to bind enzyme/protein, nucleic acids and cell adhesion molecules, necessary for host-microbial cross-talk.


Results and discussion
Hubs and Bottlenecks in pathogen-interacting and non-interacting human proteins. The human-bacteria protein-protein interaction networks for three bacterial pathogens, namely Bacillus anthracis, Francisella tularensis, and Yersinia pestis, were analyzed to understand the network features of bacterial protein-interacting human proteins. In general, protein-protein interaction (PPI) data contain many false positives and false negatives. We therefore selected the three bacterial species that have the highest number of interspecific interactions with human proteins verified by multiple databases. Additionally, PPI data are not yet comprehensive, so all interpretations are made from the currently available data. It has been previously reported that pathogen proteins mainly interact with highly connected host proteins (host-hubs) 1,20 . In this study, we classified the human proteins into four groups: (a) not interacting with any bacterial pathogen, (b) interacting with only one pathogen, (c) interacting with only two pathogens and (d) interacting with all three pathogens. The human protein-protein interaction network was constructed using the PICKLE database, where PPIs supported by at least two of the five widely used PPI databases (BIOGRID 21 , MINT 22 , HPRD 23 , DIP 24 and IntAct 25 ) were considered true interactions. The final data contain 11,815 proteins involved in 61,273 high-quality interactions, representing a little less than half of the human proteome. Comparing the proportion of hubs, we observed that the pathogen-interacting human proteins include a higher proportion of hubs and bottlenecks than the non-interacting group (Supplementary Table S3). The pathogen-interacting proteins also have a higher mean number of interacting partners (degree centrality) than the non-interacting group, among both hubs and nonhubs. 
Additionally, human proteins that interact with more bacterial pathogens have a higher proportion of hubs and a higher mean number of interacting partners than those interacting with fewer pathogens (Table 1). This suggests that pathogen proteins preferentially target human hubs and bottlenecks, which comprise the functionally most important proteins in the human protein interaction network, and this, in turn, may impair the functioning of the network. The high degree centrality of pathogen-interacting human proteins may also ensure the pathogens' establishment within the human host via their control over a broad range of target human proteins. When human proteins were classified into hub-bottleneck, hub-nonbottleneck, nonhub-bottleneck, and nonhub-nonbottleneck classes based on these two centrality measures, the highest proportion of pathogen-interacting proteins was obtained in the hub-bottleneck class. More interestingly, the hub-nonbottleneck and nonhub-bottleneck classes do not differ significantly, which indicates that hubs and bottlenecks are equally targeted by proteins of these pathogens (Fig. 1). Moreover, the whole protein interaction network can be subdivided into many functional modules, with each distinct module representing a specific function. Based on modularity, hubs that belong to the same functional module as their interacting partners are known as intramodular hubs or party hubs, and those whose interacting partners belong to different functional modules are known as intermodular hubs or date hubs. To evaluate whether pathogen proteins preferentially interact with either class of hubs, the human party and date hubs were identified using co-expression values of human proteins and their interacting partners, together with their interacting interfaces (see "Materials and methods"). 
Based on the above, the proportion of party hubs was found to be significantly higher in pathogen-interacting proteins, signifying pathogen proteins target some of the functional modules for their benefit (Table 2).
Hubs and Bottlenecks in human-interacting and non-interacting bacterial proteins. A scale-free network topology follows a power-law node degree distribution, comprising a few nodes with a much higher degree centrality than the many other nodes. Such networks are resilient against random attacks, and this applies to the human and pathogenic bacterial networks alike (Supplementary Fig. S1). In order to disrupt the human PPI network, the pathogen proteins need to act against particular human proteins via non-random, directed interactions. The pathogen proteins with high degree centrality may be potential candidates involved in such disruption, due to their inherently high interaction ability. To explore this further, we subdivided the pathogen proteins into hubs or nonhubs based on their degree centrality, and into bottlenecks or nonbottlenecks based on betweenness centrality (see "Materials and methods"). Following this classification, the network properties of human-interacting and non-interacting pathogen proteins were explored, and it was observed that the bacterial proteins which interact with human proteins are significantly enriched in hubs and bottlenecks of the bacterial PPI network. These hub proteins also have a higher mean number of interacting partners (Table 3), indicating that the human-interacting pathogen proteins have the potential to interact with multiple types of proteins in the intraspecific PPI network, which may facilitate interspecific host-pathogen interactions.
Gene essentiality of pathogen-interacting human proteins. Genes indispensable to the survival and reproduction of an organism are considered essential genes 26,27 . Proteins encoded by such genes are associated with vital molecular functions and are under strong purifying selection. It has been observed that pathogen-interacting proteins comprise a higher proportion of essential proteins, although this may be due to their enrichment among hubs 10,28 . However, when we considered hub and nonhub proteins separately, the pathogen-interacting proteins were found to be enriched in essential proteins in both groups, suggesting that these deadly pathogens may disrupt vital functions of the host, thereby facilitating pathogenicity and disease progression (Fig. 2).
Evolutionary rates of pathogen-interacting and noninteracting human proteins. The evolutionary rate of a protein reflects the change in its amino acid sequence over time. As hubs are evolutionarily more conserved than nonhubs and are also enriched in pathogen-interacting proteins, pathogen-interacting proteins are expected to show a slower evolutionary rate. However, very little is known regarding the differences in evolutionary rate between pathogen-interacting and noninteracting hubs. Considering pathogen-interacting/-noninteracting hubs/nonhubs, a comparison of the evolutionary rate, measured as the dN/dS ratio using 1:1 Mouse and Chimpanzee orthologs 29 , revealed a slower evolutionary rate in hub proteins. Moreover, among the pathogen-interacting and noninteracting hubs, the former show a slower evolutionary rate (Fig. 3), suggesting that the evolutionarily more conserved hubs are more likely to be targeted by pathogens. This is beneficial from the pathogens' perspective, as it may allow an efficient pathogen-host protein-protein interaction to persist over large evolutionary time-scales.

Intrinsic disorder of pathogen-interacting and noninteracting human proteins. The function of a protein is generally mediated by its proper three-dimensional configuration. However, certain amino acid residues or stretches in a protein's sequence do not let the protein fold into a definite conformation, and the resulting flexibility often facilitates productive protein-protein interactions. Such residues/regions of a protein are known as intrinsically disordered residues/regions. Intrinsically disordered proteins lack a distinct three-dimensional structure but can adopt a definite conformation upon interaction with other proteins, facilitating low-affinity interactions with high specificity 30 . Proteins that are highly connected in a protein network are usually rich in these regions 31 , which may play an important role in the interactions between host and pathogen proteins. Although bacterial proteins are less disordered than human proteins 32,33 , the disordered regions in human proteins are expected to be utilized by the bacterial pathogens as potential regions for interaction. To test this, the IUPred algorithm was used to identify the disordered residues in pathogen-interacting and non-interacting proteins 34 . The proportion of disordered proteins (P_disordered) among the pathogen-interacting proteins is significantly higher than among the non-interacting proteins (P_disordered_interacting = 59.73, N_interacting = 2677, P_disordered_noninteracting = 49.07, N_noninteracting = 9136, Z = 9.706, P < 1.00 × 10^−4), suggesting that they may play an important role in pathogen-host interactions. 
Table 2. Proportion of party-hubs and date-hubs in pathogenic bacteria-interacting and non-interacting human proteins.
Additionally, when the total number and percentage of disordered regions and residues of individual proteins were considered, we found that pathogen-interacting proteins have a higher number and mean percentage of long disordered regions and disordered residues (Supplementary Table S4), indicating that human proteins with intrinsically disordered regions and residues are more prone to pathogen attack. Because smaller disordered segments can also be important for interaction, we also considered proteins having disordered stretches of ≥ 15 residues, which gave a consistent result (Supplementary Table S4). [...] Table S5). When the human proteins were binned into five bins based on their disorder content (see "Materials and methods"), it was observed that the proportion of pathogen-interacting genes increases gradually with increasing disorder content up to 80% (Fig. 4). Together, these results suggest that protein intrinsic disorder plays a major role in host-pathogen interactions.
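The comparison of disordered-protein proportions reported in this section (59.73% of 2677 pathogen-interacting proteins versus 49.07% of 9136 non-interacting proteins) can be checked with a short sketch. It assumes a pooled two-proportion Z-test with a two-sided normal P value; the text does not name the exact variant of the test, and the function name `two_proportion_z` is illustrative only. Under that assumption the statistic comes out close to the reported Z = 9.706, with small differences expected because the input percentages are rounded.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion Z statistic and two-sided normal P value.

    p1, p2 are proportions in [0, 1]; n1, n2 are the group sizes.
    Assumption: pooled standard error (the paper does not state the variant).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Values taken from the text above (percentages converted to proportions).
z, p_value = two_proportion_z(0.5973, 2677, 0.4907, 9136)
print(f"Z = {z:.3f}, P = {p_value:.2e}")
```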

Molecular recognition features (MoRFs) in pathogen-interacting and noninteracting human disordered proteins.
We also considered the Molecular Recognition Features, or MoRFs, which are 5-25 residue long specialized elements located within the disordered regions of proteins that undergo a disorder-to-order transition upon binding to their respective interacting partners. Here, to understand whether the disordered regions in pathogen-interacting human proteins can serve as binding sites for pathogen proteins, we explored the MoRFs within the human disordered proteins using the fMoRFpred 35 webserver. The pathogen-interacting human proteins were found to be richer in MoRFs than their noninteracting counterparts (MoRF_regions_interacting = 1.017, MoRF_regions_noninteracting = 0.931, P = 3.949 × 10^−2; MoRF_residues_interacting = 15.035, MoRF_residues_noninteracting = 12.765, P = 3.718 × 10^−9, Mann-Whitney U test, N_interacting = 1599, N_noninteracting = 4472), suggesting that the enrichment of these regions in pathogen-interacting human proteins may favour interspecific protein-protein interaction.

Protein domains in pathogen-interacting and non-interacting human proteins.
Although protein intrinsic disorder facilitates protein-protein interaction by providing flexibility to a protein's structure 36 , protein domains, the most conserved and functionally essential parts of a protein, serve a distinct role in such interactions 37 . More specifically, a protein-protein interaction can be viewed as an interaction between domains of different proteins. Therefore, proteins with a greater number of domains may have a higher probability of interacting with other proteins. To study the influence of protein domains on human-bacteria interaction, the mean number of domains of pathogenic bacteria-interacting and noninteracting human proteins was calculated using the InterPro repository 38 . It was observed that the pathogen-interacting proteins contain a higher number of domains than the noninteracting ones (P = 6.73 × 10^−16, Mann-Whitney U test). However, the higher number of domains in pathogen-interacting human proteins may be attributable to the abundance of hubs among them. Thus, we divided the data into hubs and nonhubs. Interestingly, within both hubs and nonhubs, the pathogen-interacting proteins have a higher number of domains (P_hub = 8.60 × 10^−5, P_nonhub = 6.58 × 10^−7). Additionally, the proteins interacting with more pathogens hold a higher number of protein domains (P = 2.41 × 10^−15, Kruskal-Wallis test) (Fig. 5). This suggests that proteins with a higher domain number have a higher probability of interacting with pathogen proteins, facilitated via interspecific domain-domain interaction.

Functional enrichment analysis of pathogen-interacting proteins. The association of party hubs with pathogen proteins indicates that these bacterial pathogens mostly target particular functional modules of the human proteome for the establishment of pathogenicity and progression of the disease. For a detailed insight, the functional enrichment of the pathogen-interacting human proteins was studied using the HumanMine 39 and GOrilla 40 webservers. The top 10 enriched Gene Ontology (GO) terms matched in both datasets were recorded for both GO domains, 'Biological Process' and 'Molecular Function' (Supplementary Table S6). The pathogen-interacting proteins were found to be enriched in processes like regulation of biological/cellular processes, cellular localization, immune system, interspecies interaction between organisms, regulation of cellular (metabolic) processes, regulation of nitrogen compound metabolic processes, regulation of primary metabolic processes, and vesicle-mediated transport processes. These proteins were also shown to be enriched in functions like RNA binding, enzyme/protein binding, nucleic acid binding, protein-containing complex biomolecule binding, cadherin binding, cell adhesion molecule binding, transcription factor binding, chromatin binding, and kinase binding. The above functional enrichment suggests that during pathogenesis, these pathogens primarily regulate the processes related to the immune system, cellular localization and transport, apart from influencing the binding of host macromolecules and cell-adhesion molecules, necessary for host-microbial cross-talk.

Materials and methods
[...] 44 . The binary interactions reported in no fewer than three of the four databases were used in this study as the pathogen-interacting human proteins. The human proteins and their sequences were obtained from Uniprot (https://www.uniprot.org/) 45 . 
The human proteins with no reported interaction with any pathogen protein in any of the databases were considered as pathogen-noninteracting proteins (Supplementary Table S1). The human PPI data were obtained from PICKLE (Protein InteraCtion KnowLedgebasE) (www.pickle.gr) 45 , which combines the globally used protein-protein interaction databases BIOGRID 21 , MINT 22 , HPRD 23 , DIP 24 and IntAct 25 . We removed all self-interactions and considered interactions supported by at least two of these databases for our study 45 .
The within-species PPI data of all three bacterial pathogens were obtained from the STRING database (https://string-db.org/) 46 , considering the experimentally validated interactions only. The STRING IDs were mapped to Uniprot IDs using the annotation file present in the STRING database. Reciprocal BLAST with 100% sequence identity and e-values < e−10 was used to determine the orthologous proteins of two different pathogen strains belonging to the same species as available in the pathogen-PPI and pathogen-human PPI databases. The final dataset consists of 122,546 Homo sapiens binary interactions involving 11,833 proteins [...]. We analyzed each network using the Network Analyzer plugin of Cytoscape (version 3.7.1) to get the degree and betweenness centrality. The node degree of all the species shows power-law distributions (Supplementary Fig. S1). We subdivided the proteins of each species into hubs and nonhubs depending on their degree centrality. The top ~ 20% of proteins in the node degree distribution, having the highest number of interacting partners, were considered as hubs and the rest as nonhubs, according to the 20-80 rule of power-law distributions (Pareto principle) 47 . Similarly, we classified the proteins into bottlenecks (proteins that are central to many paths in the network) and non-bottlenecks, considering the proteins in the top ~ 20% of betweenness centrality as bottlenecks and the rest as non-bottlenecks (Supplementary Table S2).
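The hub and bottleneck classification described above can be illustrated with a minimal sketch. It is not the Cytoscape Network Analyzer workflow used in the study: it assumes an undirected, unweighted edge list, computes betweenness with Brandes' algorithm in plain Python, and breaks ties at the 20% boundary alphabetically, which is an arbitrary choice made here. The toy network and all function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-interactions are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {n: b / 2.0 for n, b in score.items()}


def top_fraction(scores, fraction=0.2):
    """Nodes ranked in the top `fraction` by score (ties broken by node name)."""
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    k = max(1, round(fraction * len(ranked)))
    return set(ranked[:k])


# Hypothetical toy network: two 4-protein cliques joined through protein X.
clique1, clique2 = ["A", "B", "C", "D"], ["E", "F", "G", "H"]
edges = [(u, v) for c in (clique1, clique2) for i, u in enumerate(c) for v in c[i + 1:]]
edges += [("A", "X"), ("X", "E")]

adj = build_adjacency(edges)
degree = {n: len(nbrs) for n, nbrs in adj.items()}
bc = betweenness(adj)
hubs = top_fraction(degree)         # top ~20% by degree centrality
bottlenecks = top_fraction(bc)      # top ~20% by betweenness centrality
print("hubs:", sorted(hubs), "bottlenecks:", sorted(bottlenecks))
```

In this toy case X has a low degree but lies on every path between the two cliques, so it falls into the nonhub-bottleneck class, which shows why the two centrality measures are treated separately.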
Party-hubs and date-hubs. For [...] 49 . We used the PRISM 50 webserver to confirm that no two interacting partners of a party hub share the same interacting surface with the latter. The hubs having a mean PCC value ≥ 0.5 were considered as party hubs and those having a mean PCC value < 0.5 were considered as date hubs 51 .
We also used the mean PCC value of all proteins as an alternative cutoff to select party-hubs (above mean) and date-hubs (below mean) 14 .
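The thresholding rule above can be sketched as follows. The sketch assumes that PCC denotes the Pearson correlation coefficient between the expression profile of a hub and that of each interacting partner, averaged over partners; the expression values, protein names and the function `classify_hub` are hypothetical and only illustrate the ≥ 0.5 cutoff. The interface check with PRISM is not modelled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def classify_hub(hub, partners, expression, cutoff=0.5):
    """Label a hub 'party' if its mean PCC with its partners is >= cutoff, else 'date'."""
    pccs = [pearson(expression[hub], expression[p]) for p in partners]
    mean_pcc = sum(pccs) / len(pccs)
    return ("party" if mean_pcc >= cutoff else "date"), mean_pcc


# Hypothetical expression profiles across four conditions.
expression = {
    "hub1": [1, 2, 3, 4], "p1": [2, 4, 6, 8], "p2": [1, 2, 3, 5],
    "hub2": [1, 2, 3, 4], "q1": [4, 3, 2, 1], "q2": [2, 4, 6, 8],
}
label1, mean1 = classify_hub("hub1", ["p1", "p2"], expression)
label2, mean2 = classify_hub("hub2", ["q1", "q2"], expression)
print(label1, round(mean1, 3), label2, round(mean2, 3))
```

Passing the mean PCC of all proteins as `cutoff` would give the alternative classification mentioned above.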
Human essential genes. Genes essential for human survival and reproduction, collectively known as essential human genes, were obtained from three recent experiments based on gene trap mutagenesis 52 and high-resolution CRISPR-screening 53,54 . Human genes (and their encoded proteins) classified as essential or nonessential in all three screenings were considered as essential and nonessential, respectively. The final data consist of 768 essential and 8080 nonessential human proteins.
Evolutionary rate. For the calculation of evolutionary rate of human proteins, the nonsynonymous nucleotide substitutions per nonsynonymous site (dN) and synonymous nucleotide substitutions per synonymous site (dS), were obtained from the Ensembl biomart 55 , using 1:1 mouse and chimpanzee orthologs for each human protein. The mutation saturation was controlled by discarding dS values greater than 3 and the dN/dS ratio was used as evolutionary rate 29 .
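A minimal sketch of the rate calculation and saturation filter described above is given below. The cutoff dS > 3 comes from the text; treating non-positive dS values as unusable (to avoid division by zero) is an added assumption, and the example records are invented.

```python
def evolutionary_rate(dn, ds, max_ds=3.0):
    """Return dN/dS, or None when dS is saturated (> max_ds) or not positive.

    The dS > 3 cutoff follows the text; rejecting dS <= 0 is an assumption
    made here so that the ratio is always defined.
    """
    if ds <= 0 or ds > max_ds:
        return None
    return dn / ds


# Hypothetical (protein, dN, dS) records for one ortholog comparison.
records = [("P1", 0.10, 0.50), ("P2", 0.50, 3.50), ("P3", 0.02, 0.40)]
rates = {name: evolutionary_rate(dn, ds) for name, dn, ds in records}
kept = {name: r for name, r in rates.items() if r is not None}
print(kept)
```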
Intrinsically disordered proteins. We used the IUPred algorithm to predict the intrinsically disordered regions in each protein sequence. In IUPred, each amino acid residue is given a probability score based on its pairwise energy profile with respect to its interaction with other residues along the protein sequence. Residues with scores ≥ 0.50 are considered disordered and those with scores < 0.50 ordered 34 . We downloaded the 'reviewed' human protein sequences from Uniprot (Accession UP000005640). We discarded all proteins with < 30 amino acid residues. Proteins with a continuous stretch of ≥ 30 disordered residues were considered as proteins with long intrinsically disordered regions. We calculated the number of these disordered stretches, the proportion of residues in the long disordered stretches, the total number of disordered amino acid residues and the proportion of disordered amino acid residues for each human protein. Following Panda et al. 2017 56 , human proteins were classified into five groups based on their disorder content: A, Ordered (having 0-20% disordered amino acid residues); B, Moderately disordered (having 20-40% disordered amino acid residues); C, Disordered (having 40-60% disordered amino acid residues); D, Highly disordered (having 60-80% disordered amino acid residues) and E, Extremely disordered (having 80-100% disordered amino acid residues).
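The per-protein disorder statistics and the five bins described above can be sketched as follows, starting from a list of per-residue scores such as those produced by IUPred. The sketch does not call IUPred itself; the score list is invented, the function names are hypothetical, and because the stated bin ranges share their boundary values, the assumption made here is that each bin includes its lower bound.

```python
def disorder_summary(scores, threshold=0.5, min_long=30):
    """Summarize per-residue disorder scores for one protein.

    Residues with score >= threshold count as disordered; a run of at least
    `min_long` consecutive disordered residues counts as a long region.
    """
    flags = [s >= threshold for s in scores]
    runs, current = [], 0
    for disordered in flags:
        if disordered:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    long_runs = [r for r in runs if r >= min_long]
    n = len(scores)
    return {
        "percent_disordered": 100.0 * sum(flags) / n,
        "n_long_regions": len(long_runs),
        "percent_in_long_regions": 100.0 * sum(long_runs) / n,
    }


def disorder_bin(percent):
    """Map percent disordered residues to groups A-E (lower bound inclusive, assumed)."""
    for label, upper in (("A", 20), ("B", 40), ("C", 60), ("D", 80)):
        if percent < upper:
            return label
    return "E"


# Hypothetical 100-residue protein: one 35-residue and one 5-residue disordered run.
scores = [0.2] * 50 + [0.8] * 35 + [0.3] * 10 + [0.6] * 5
summary = disorder_summary(scores)
group = disorder_bin(summary["percent_disordered"])
print(summary, group)
```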

Conclusions
Recent developments in high-throughput interspecific protein-protein interaction data have paved the way for host-pathogen interaction studies to understand detailed aspects of pathogenicity, leading to the development of platforms for host-directed therapeutic research. In this study, we explored the attributes of the human-bacteria protein-protein interaction (PPI) network using the interactome data of three bacterial species, Bacillus anthracis, Francisella tularensis and Yersinia pestis, for which large-scale high-throughput intraspecific and interspecific PPI data are available. It was observed that the central proteins within the intraspecific human and bacterial interactomes preferentially participate in human-bacteria interaction. This includes hubs and bottlenecks of both human and bacterial PPI networks. Additionally, among human hubs, party-hubs participate in the interspecific PPI network more often than date hubs. It was also revealed that these pathogens preferentially interact with human essential proteins, both within hubs and nonhubs, thereby assisting in disease progression. From an evolutionary perspective, these bacterial pathogens interact with evolutionarily more conserved human proteins, leading to a sustainable interaction that is helpful for the pathogen species. A detailed analysis of host proteins' structural features revealed that the pathogen-interacting human proteins contain a higher number of protein domains and an abundance of intrinsically disordered residues and regions, which are likely to assist human-bacteria interaction by promoting high-affinity and low-affinity protein-protein interactions, respectively. Furthermore, functional analysis of pathogen-interacting human proteins revealed an enrichment of proteins involved in various biological processes, including catalytic functions related to the binding of several biomolecules. 
These enriched proteins are supposed to regulate essential metabolic and immune system processes, cellular localization, and transport and also influence the binding of host macromolecules and cell-adhesion molecules that are necessary for host-microbial cross-talks.