FIGURE 3
FROM:
Understanding biological functions through molecular networks
Jing-Dong Jackie Han
Figure 3.

An example of data integration by a probabilistic model. Heterogeneous dataset types can be evaluated against gold standard positive (GSP) and gold standard negative (GSN) sets of functional relationships, for example protein-protein interactions (PPIs). The potential of a protein/gene pair to form a true functional relationship can be scored as a likelihood ratio (LR): how much more likely the observed evidence is for true positive interactions than for true negative interactions, according to the GSP and GSN datasets. Treating the data types as independent, a naïve Bayesian model can be used to integrate heterogeneous data. Each interaction is assigned an LR within a data type. When evidence arises from more than one dataset within a data type, the maximal LR among those datasets is used for the gene pair. The LRs given by the different data types are then multiplied to generate a final prediction score for a potential functional relationship. Given an acceptable confidence level, a final integrated network can be obtained in which each edge represents the likelihood of forming the functional relationship. PCC, GO, SSBP and DDI stand for Pearson Correlation Coefficient, Gene Ontology, Smallest Shared Biological Process and Domain-Domain Interaction, respectively. Adapted from ref. 41.
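
The scoring scheme in the caption can be illustrated with a minimal Python sketch. It is not the implementation behind the figure. The function names, the input layout (a mapping from data type to the LRs reported by each dataset of that type) and the numbers in the usage example are hypothetical. The sketch assumes that the LR for a given piece of evidence is estimated as the fraction of GSP pairs carrying that evidence divided by the fraction of GSN pairs carrying it, and it raises an error rather than smoothing when no GSN pair carries the evidence.

```python
from math import prod


def likelihood_ratio(gsp_hits, gsp_total, gsn_hits, gsn_total):
    """Estimate LR = P(evidence | GSP) / P(evidence | GSN) from counts.

    Assumption: probabilities are plain frequencies with no smoothing.
    """
    if gsn_hits == 0:
        raise ValueError("no GSN pair carries this evidence; LR is undefined here")
    return (gsp_hits / gsp_total) / (gsn_hits / gsn_total)


def integrate_pair(lrs_by_type):
    """Combine the evidence for one gene pair.

    lrs_by_type maps a data type (e.g. "PCC") to the list of LRs given by
    each dataset of that type. Within a type the maximal LR is kept; across
    types the LRs are multiplied, following the naive Bayes independence
    assumption.
    """
    per_type = [max(lrs) for lrs in lrs_by_type.values() if lrs]
    return prod(per_type)  # prod([]) is 1, i.e. no evidence leaves the odds unchanged


def build_network(pair_evidence, cutoff):
    """Keep the gene pairs whose integrated LR reaches the chosen cutoff."""
    network = {}
    for pair, lrs_by_type in pair_evidence.items():
        score = integrate_pair(lrs_by_type)
        if score >= cutoff:
            network[pair] = score
    return network


# Hypothetical evidence for two gene pairs (illustrative values only).
pair_evidence = {
    ("geneA", "geneB"): {"PCC": [4.0, 2.5], "DDI": [3.0]},
    ("geneA", "geneC"): {"PCC": [0.5], "GO": [1.5]},
}
network = build_network(pair_evidence, cutoff=10.0)
print(network)
```

With these illustrative inputs, only the first pair passes the cutoff: its two PCC datasets contribute the larger LR (4.0), which is multiplied by the DDI LR (3.0). The cutoff plays the role of the acceptable confidence level mentioned in the caption.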
