Network-assisted investigation of virulence and antibiotic-resistance systems in Pseudomonas aeruginosa

Pseudomonas aeruginosa is a Gram-negative bacterium of clinical significance. Although the genome of PAO1, a prototype strain of P. aeruginosa, has been extensively studied, approximately one-third of the functional genome remains unknown. With the emergence of antibiotic-resistant strains of P. aeruginosa, there is an urgent need to develop novel antibiotic and anti-virulence strategies, which may be facilitated by an approach that explores P. aeruginosa gene function in systems-level models. Here, we present a genome-wide functional network of P. aeruginosa genes, PseudomonasNet, which covers 98% of the coding genome, and a companion web server to generate functional hypotheses using various network-search algorithms. We demonstrate that PseudomonasNet-assisted predictions can effectively identify novel genes involved in virulence and antibiotic resistance. Moreover, an antibiotic-resistance network based on PseudomonasNet reveals that P. aeruginosa has common modular genetic organisations that confer increased or decreased resistance to diverse antibiotics, which accounts for the pervasiveness of cross-resistance across multiple drugs. The same network also suggests that P. aeruginosa has developed a mechanism of trade-off in resistance across drugs by altering genetic interactions. Taken together, these results demonstrate the usefulness of a genome-scale functional network for investigating pathogenic systems in P. aeruginosa.

$$\mathrm{LLS} = \ln\left(\frac{P(L|E)/P(\neg L|E)}{P(L)/P(\neg L)}\right)$$
where P(L|E) and P(¬L|E) are the probabilities of gold-standard positive and negative links, respectively, given the experimental data, and P(L) and P(¬L) are the probabilities of gold-standard positive and negative links, respectively, before the experimental data are provided. For data sets in which each gene pair is associated with a continuous score (e.g., a correlation coefficient), we calculated LLS scores for bins containing equal numbers of gene pairs using gold-standard gene pairs. Those LLS scores and their corresponding data scores (the mean data score of each bin) were used to fit regression models, which were then used to map individual data-intrinsic scores to LLS scores in a continuous manner for both gold-standard and unlabeled gene pairs. Finally, we retained only gene pairs whose likelihood of functional association was significantly higher than expected by random chance (e.g., LLS > 0.7, or about twofold more likely than random chance).
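The scoring step can be illustrated with a minimal Python sketch. It assumes the log-likelihood form LLS = ln[(P(L|E)/P(¬L|E)) / (P(L)/P(¬L))], estimated from counts of gold-standard positive and negative gene pairs. The function name, argument names, and counts are hypothetical and are not taken from the original pipeline.

```python
import math

def log_likelihood_score(pos_in_data, neg_in_data, pos_total, neg_total):
    """Estimate LLS = ln((P(L|E)/P(~L|E)) / (P(L)/P(~L))) from counts.

    pos_in_data / neg_in_data: gold-standard positive / negative pairs
        supported by the data set (or by one score bin of it).
    pos_total / neg_total: all gold-standard positive / negative pairs.
    """
    if min(pos_in_data, neg_in_data, pos_total, neg_total) <= 0:
        raise ValueError("all counts must be positive")
    posterior_odds = pos_in_data / neg_in_data  # odds given the data
    prior_odds = pos_total / neg_total          # odds before the data
    return math.log(posterior_odds / prior_odds)

# Hypothetical counts: positives are enriched twofold in this bin,
# so the score equals ln(2), which is close to the 0.7 cutoff.
example_lls = log_likelihood_score(20, 10, 1000, 1000)
```

With equal numbers of positive and negative gold-standard pairs overall, a bin with twice as many positives as negatives scores ln 2, which is why a cutoff near 0.7 corresponds to a twofold enrichment.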
Since we inferred functional associations from nine distinct data sets (Table 1), a functional association between genes could be supported by multiple LLSs that pass the cutoff. For links with multiple LLSs, we integrated the scores using the weighted sum (WS) method as described in 2 :
$$WS = L_0 + \sum_{i=1}^{n} \frac{L_i}{D \cdot i}, \quad \text{for all } L \geq T$$
where L represents LLS (L_0 is the maximum LLS of a given functional link), and i is the index of all other LLSs in rank order. D is a free parameter used as a weight factor, and T is a minimum threshold of LLS. If the data types to be integrated are not correlated at all, summation of all scores (e.g., naïve Bayes) would be the optimal method for integration. In contrast, if the data types are completely correlated, taking the maximum of the scores would give the best integration. In general, however, data types are partially correlated, so giving partial credit to additional scores with an appropriate weight improves the integrated network. We chose the free parameters of the weighted sum that gave the best precision-recall performance of the integrated network.
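A minimal sketch of the weighted sum integration follows. It assumes the form WS = L_0 + Σ L_i/(D·i) over all scores at or above the threshold T, with L_0 the largest score and i the rank of each remaining score. The function name and the example values of D and T are hypothetical.

```python
def weighted_sum(lls_scores, D, T):
    """Integrate several LLS values for one link into a single score.

    Scores below T are discarded. The largest remaining score gets full
    credit, and the i-th next score (i = 1, 2, ...) is divided by D * i.
    """
    kept = sorted((s for s in lls_scores if s >= T), reverse=True)
    if not kept:
        return 0.0
    best, rest = kept[0], kept[1:]
    return best + sum(s / (D * i) for i, s in enumerate(rest, start=1))

# Hypothetical link supported by three data sets; 1.0 falls below T.
integrated = weighted_sum([3.0, 1.0, 2.0], D=2.0, T=1.5)
```

A very large D makes the result approach the maximum score, which is the appropriate limit for fully correlated data types, whereas a smaller D gives more credit to the additional evidence.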

Co-functional links inferred from co-citation of P. aeruginosa genes (CC)
The original co-citation algorithm was based on the idea that two functionally related P. aeruginosa genes tend to be cited in the abstract of the same research article. 3 However, some articles mention gene names only in the main text. To improve search sensitivity, we searched PubMed Central (PMC, http://www.ncbi.nlm.nih.gov/pmc/) for articles containing "Pseudomonas aeruginosa" in the abstract and any P. aeruginosa gene name in the full text. As a result, we found a total of 8,029 articles containing P. aeruginosa gene names, and then assigned a probability of association between genes using a one-tailed Fisher's exact test.
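The co-citation test can be sketched as a hypergeometric tail probability, which is what a one-tailed Fisher's exact test computes for a 2x2 table. This is an illustration only: the function name and the article counts are hypothetical, and only the total of 8,029 articles comes from the text.

```python
from math import comb

def cocitation_pvalue(n_both, n_a, n_b, n_total):
    """One-tailed Fisher's exact test for over-representation.

    Returns P(X >= n_both), where X is the number of articles citing
    both genes if the n_a articles citing gene A and the n_b articles
    citing gene B were drawn independently from n_total articles.
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)

# Hypothetical counts: gene A in 30 articles, gene B in 25, both in 10.
p_value = cocitation_pvalue(10, 30, 25, 8029)
```

A small p-value indicates that the two genes are cited together more often than expected by chance.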

Co-functional links inferred from co-expression of P. aeruginosa genes (CX)
We downloaded 34 microarray data sets, each containing at least 8 gene expression samples, from the Gene Expression Omnibus (GEO) 4 on October 30th, 2012. To infer functional association between two genes, we calculated the Pearson correlation coefficient between the expression vectors of all gene pairs. Of the 34 microarray data sets tested, we were able to infer functional links from 12 (Supplementary Table S7).
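The co-expression measure can be sketched as follows. This is a minimal pure-Python illustration of the Pearson correlation between expression vectors. The gene identifiers and expression values are made up, and the handling of constant vectors (returning NaN) is an assumption rather than the authors' choice.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)

def coexpression_scores(expression):
    """Correlate every pair of genes in a {gene: [values per sample]} dict."""
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }

# Made-up expression values for three hypothetical genes across 4 samples.
scores = coexpression_scores({
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
})
```

In a pipeline like the one described, these correlation values would be the data-intrinsic scores that are subsequently mapped to LLS.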

Co-functional links inferred from correlation of protein domain profiles (DP)
Domains are recurring functional motifs of proteins. Because domains are the structural, functional and evolutionary units of proteins, proteins that share a similar set of domains are likely to be functionally associated. Using profiles of domain occurrence in proteins from the InterPro database, 5 we measured the likelihood of functional association given the tendency of domains to co-occur between two proteins. To learn a more informative co-occurrence pattern, we used a weighted version of the mutual information score, in which higher weights were given to rarer domains under the assumption that rare domains carry more pathway-specific information.

Co-functional links inferred from genomic contexts (PG and GN)
We used genomic context information of P. aeruginosa genes to discover functional associations between genes with two different methods, phylogenetic profiling 6 and gene neighborhood. 7 The similarity of phylogenetic profiles between two P. aeruginosa genes reflects the degree of co-inheritance of the two genes during speciation, because functional constraints between functionally coupled genes largely determine the co-inheritance pattern. We first ran BLASTP to compare all P. aeruginosa protein sequences against all protein sequences from 1,626 bacterial genomes, 122 archaeal genomes and 396 eukaryotic genomes. Phylogenetic profile matrices of the BLAST hit scores were constructed, and the association between profiles was measured by mutual information scores. For P. aeruginosa genes, we found that the similarity of phylogenetic profiles within each of two domains of life (archaea and bacteria) performed better than that across all 2,144 genomes in retrieving gold-standard functional links. We integrated the two domain-specific networks into a single phylogenetic profile network.
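The profile comparison can be illustrated with a minimal sketch of mutual information between two phylogenetic profiles. As a simplifying assumption, the profiles here are binarized presence/absence vectors (1 if a BLASTP hit exists in a genome, 0 otherwise). The text describes profiles built from BLAST hit scores, whose discretization is not specified here, so this is not the exact procedure. All names and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two discrete profiles.

    Each profile lists one state per reference genome, for example
    1 = homolog present and 0 = homolog absent.
    """
    n = len(profile_a)
    if n == 0 or n != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    count_a = Counter(profile_a)
    count_b = Counter(profile_b)
    count_ab = Counter(zip(profile_a, profile_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a, p_b = count_a[a] / n, count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical profiles across four reference genomes.
co_inherited = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
unrelated = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
```

Two genes that are always present or absent together across genomes share maximal information, whereas genes with statistically independent profiles score zero.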
For network inference by genomic neighborhood across 1,746 prokaryote genomes, we used two approaches to measure the genomic neighborhood, distance-based gene neighbourhood (DGN) and probability-based gene neighbourhood (PGN), as described in our previous work 7 . For the DGN measure, we took the median value of chromosomal distance between orthologs of the two query genes with BLASTP E-values < 1 across the 1,746 reference prokaryote genomes. Each median distance value was normalized using the number of genomes in which orthologs of the two genes co-occurred, giving greater weight to gene pairs conserved in a larger number of prokaryote genomes. For the PGN measure, we calculated the probability of two genes being separated by fewer than d genes in a genome containing N genes as follows:
$$P(\leq d) = \frac{2d}{N-1}$$
We then calculated the product of the above probability across the m reference genomes containing orthologs of the two query genes:
$$X = \prod_{i=1}^{m} P_i(\leq d_i)$$
To calculate the likelihood that two genes belong to the same conserved neighbourhood, we determined the probability of obtaining a value of X that is smaller than the observed value:
$$P_m(X \leq x) = x \sum_{k=0}^{m-1} \frac{(-\ln x)^k}{k!}$$
Because we previously found complementarity between these two methods, we integrated them into a single genomic neighborhood network using the weighted sum method as described above.
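The PGN calculation can be sketched in a few lines of Python. This sketch assumes the per-genome probability P(≤d) = 2d/(N−1), the product X of these probabilities over the m genomes, and the tail probability P_m(X ≤ x) = x Σ_{k=0}^{m−1} (−ln x)^k / k!, a formulation commonly used for this kind of measure. It should be checked against the cited previous work, and the function names and example values are hypothetical.

```python
import math

def neighbor_probability(d, n_genes):
    """Assumed chance that two genes lie within d genes of each other
    in a genome of n_genes genes: 2d / (N - 1), capped at 1."""
    return min(1.0, 2.0 * d / (n_genes - 1))

def pgn_pvalue(observations):
    """Probability of a product X at least as small as the observed one.

    observations: list of (d, n_genes) tuples, one per reference genome
    that contains orthologs of both query genes.
    """
    probs = [neighbor_probability(d, n) for d, n in observations]
    m = len(probs)
    x = math.prod(probs)
    if x <= 0.0:
        return 0.0  # the genes are adjacent (d = 0) in at least one genome
    neg_log_x = -math.log(x)
    return x * sum(neg_log_x ** k / math.factorial(k) for k in range(m))

# Hypothetical pair found 5 genes apart in two genomes of 101 genes each.
one_genome = pgn_pvalue([(5, 101)])
two_genomes = pgn_pvalue([(5, 101), (5, 101)])
```

With a single genome the result equals the per-genome probability. Each additional genome in which the pair stays close lowers the probability that the proximity arose by chance.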

Co-functional links by orthology-based transfer from Escherichia coli and bacterial protein-protein interactions
Associalogs are conserved functional associations transferred from different species by orthology. 8 We transferred conserved functional links between P. aeruginosa genes from E. coli and from bacterial protein-protein interactions (PPIs). All transferred co-functional associations were re-scored by Inparanoid-weighted LLS (IWLLS). 8 Here, A' and B' are P. aeruginosa genes, A and B are orthologous genes from E. coli or other bacteria, and the functional association transferred to A'-B' from A-B is weighted by how likely A-A' and B-B' are to be orthologous according to Inparanoid. 9 We transferred our previously published E. coli co-citation (EC-CC) and co-expression (EC-CX) networks 10 and bacterial protein-protein interactions (BA-HT and BA-LC). The BA-LC network is based on PPIs derived from literature-curated small-scale experiments. These interactions were collected from four databases: BIND, 11 DIP, 12 IntAct, 13 and MPIDB. 14 We collected PPIs of E. coli and H. pylori as well as those of P. aeruginosa, because P. aeruginosa PPIs are very limited. The BA-HT network is based on high-throughput PPI assays, collected from six high-throughput yeast two-hybrid experiments for H. pylori, 15 Treponema pallidum, 16 Synechocystis sp. PCC6803, 17 Mesorhizobium loti, 18 Campylobacter jejuni, 19 and E. coli 20 and from four high-throughput tandem affinity purification experiments followed by mass spectrometry analysis for E. coli [21][22][23] and M. pneumoniae. 24 These bacterial PPI networks were then converted into P. aeruginosa protein networks by the associalog method described above.
The distribution of the number of connections indicates that PseudomonasNet is a broad-scale small-world network, which is characterized by a connectivity distribution that has a power-law regime followed by a sharp cutoff, such as an exponential decay of the tail (r² = 0.999).
Similar degree distributions have been observed in task-driven social networks (e.g., boards of directors) and in other functional gene networks.