Box 2. Function prediction using the MRF method
From: Network-based prediction of protein function
Roded Sharan, Igor Ulitsky & Ron Shamir
doi:10.1038/msb4100129
The Markov random field (MRF) model provides a probabilistic framework for modeling the mutual influence of random variables via a neighborhood system. Given a network of influence, the state of any random variable is assumed to be conditionally independent of the states of all other random variables, given the states of its immediate neighbors. In the function prediction setting, each random variable corresponds to a protein, and its states correspond to functional annotations. The joint distribution of the random variables can be shown to factorize over the cliques (Box 1) of the network (Besag, 1974). That is, the probability of an assignment of discrete states x=(x1,...,xN) is
$$P(x) = \frac{1}{Z}\exp\Big(\sum_{c \in C} H_c(x_c)\Big),$$
where N is the total number of variables, Z is a normalizing constant, C is the set of all cliques in the network, $H_c$ is a potential function associated with clique c, and $x_c$ is the assignment of states to the members of c.
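To make the factorization concrete, the following minimal sketch enumerates the cliques of a tiny, hypothetical four-protein network and computes P(x) by brute force, including Z. The network (`EDGES`) and the potential function `H` are invented for illustration only. Summing over all 2^N assignments is feasible only for a handful of variables. The sketch also checks the local Markov property stated above: the conditional distribution of a protein does not change when non-neighbors change.

```python
import itertools
import math

# Hypothetical toy network: a triangle (0, 1, 2) plus protein 3 attached to 2.
N = 4
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
ADJ = {frozenset(e) for e in EDGES}

def all_cliques(n):
    """Brute-force list of every clique (fully connected node subset), singletons included."""
    out = []
    for size in range(1, n + 1):
        for nodes in itertools.combinations(range(n), size):
            if all(frozenset(p) in ADJ for p in itertools.combinations(nodes, 2)):
                out.append(nodes)
    return out

C = all_cliques(N)

def H(clique, xc):
    """Hypothetical clique potential: a small bias towards state 1 for single
    nodes, and a bonus when all members of a larger clique share one state."""
    if len(clique) == 1:
        return 0.2 * xc[0]
    return 0.5 * (len(clique) - 1) if len(set(xc)) == 1 else 0.0

def unnormalized(x):
    # exp(sum over cliques c of H_c(x_c)), where x_c restricts x to the members of c
    return math.exp(sum(H(c, tuple(x[i] for i in c)) for c in C))

STATES = list(itertools.product((0, 1), repeat=N))
Z = sum(unnormalized(x) for x in STATES)  # 2**N terms: only viable for tiny N

def prob(x):
    return unnormalized(x) / Z

def conditional(v, x):
    """P(x_v = 1 | all other states); Z cancels in the ratio."""
    on = unnormalized(x[:v] + (1,) + x[v + 1:])
    off = unnormalized(x[:v] + (0,) + x[v + 1:])
    return on / (on + off)
```

Protein 3 touches only protein 2. For that reason, `conditional(3, x)` returns the same value whatever the states of proteins 0 and 1 are, while it does change with the state of protein 2.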
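Exact inference in this general model is computationally hard. Hence it is common to assign zero potentials to all cliques of size greater than 2, and to further homogenize the model by associating the same potential function with all cliques of the same size. For such a homogeneous second-order MRF, we have
$$P(x) = \frac{1}{Z}\exp\Big(\sum_{i=1}^{N} H_1(x_i) + \sum_{(i,j) \in E} H_2(x_i, x_j)\Big),$$
where $H_1$ and $H_2$ are the potential functions shared by all single nodes and by all interacting pairs, respectively, and E is the set of interactions (edges) in the network.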
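Deng et al (2003) treat one function at a time. To obtain a second-order MRF model, they assume that the probability of a 0/1 annotation over the entire network is proportional to $\exp(\alpha N_{01} + \beta N_{11} + N_{00})$. Here $\alpha$ and $\beta$ are parameters that weight the contributions of the different terms, and $N_{ij}$ is the number of interacting pairs with assignment $i,j$ (unordered). They combine this with the a priori probability of an assignment in which $N_1$ proteins are assigned 1. That prior depends on the frequency f of the function and is proportional to $(f/(1-f))^{N_1}$. The result is a homogeneous second-order MRF whose singleton and pairwise potentials are
$$H_1(x_i) = x_i \ln\frac{f}{1-f}, \qquad H_2(x_i, x_j) = \begin{cases} \alpha & \text{if } x_i \neq x_j,\\ \beta & \text{if } x_i = x_j = 1,\\ 1 & \text{if } x_i = x_j = 0. \end{cases}$$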
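Hence, the probability that protein v is assigned the function, given the annotations of its neighbors N(v), is
$$P\big(x_v = 1 \mid x_{N(v)}\big) = \operatorname{logit}\Big(\ln\frac{f}{1-f} + (\beta - \alpha)\,N(v,1) + (\alpha - 1)\,N(v,0)\Big),$$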
where $N(v,i)$ is the number of neighbors of v that are assigned $i \in \{0,1\}$, and logit denotes the logistic function $\mathrm{logit}(x) = 1/(1+e^{-x})$ (strictly speaking, the inverse of the logit). Deng et al (2003) estimate the two parameters of the model using a quasi-likelihood method and apply Gibbs sampling to infer the unknown functional annotations.
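The model above can be sketched in a few functions. This is a minimal illustration, not Deng et al's implementation. It assumes a hypothetical edge list over proteins indexed 0..n-1. It also assumes illustrative values for the two edge-weighting parameters (called `alpha` and `beta` here) and for the function frequency `f`. The quasi-likelihood parameter estimation is not implemented. The sketch does three things:

1. It scores a full 0/1 assignment from the counts N1, N01, N11 and N00.
2. It evaluates the closed-form conditional probability that a protein has the function, given its neighbors.
3. It runs a plain Gibbs sampler that keeps annotated proteins fixed and resamples the others from that conditional. Each unknown protein's probability is estimated as the fraction of post-burn-in sweeps in which it is assigned 1.

```python
import math
import random

def log_score(x, edges, alpha, beta, f):
    """log[(f/(1-f))**N1 * exp(alpha*N01 + beta*N11 + N00)], i.e. log P(x) up to log Z."""
    n1 = sum(x)
    n01 = sum(1 for i, j in edges if x[i] != x[j])
    n11 = sum(1 for i, j in edges if x[i] == 1 and x[j] == 1)
    n00 = len(edges) - n01 - n11
    return n1 * math.log(f / (1 - f)) + alpha * n01 + beta * n11 + n00

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def neighbors(edges, n):
    nb = [[] for _ in range(n)]
    for i, j in edges:
        nb[i].append(j)
        nb[j].append(i)
    return nb

def p_one(v, x, nb, alpha, beta, f):
    """P(x_v = 1 | neighbors) =
    logistic(log(f/(1-f)) + (beta - alpha)*N(v,1) + (alpha - 1)*N(v,0))."""
    n_1 = sum(x[u] for u in nb[v])
    n_0 = len(nb[v]) - n_1
    return logistic(math.log(f / (1 - f)) + (beta - alpha) * n_1 + (alpha - 1) * n_0)

def gibbs(edges, n, known, alpha, beta, f, sweeps=2000, burn_in=500, seed=0):
    """Estimate P(x_v = 1) for each unannotated protein by Gibbs sampling.
    `known` maps a protein index to its fixed 0/1 annotation."""
    rng = random.Random(seed)
    nb = neighbors(edges, n)
    # Annotated proteins keep their label; the others start at random states.
    x = [known.get(v, rng.randint(0, 1)) for v in range(n)]
    unknown = [v for v in range(n) if v not in known]
    counts = {v: 0 for v in unknown}
    for sweep in range(sweeps):
        for v in unknown:
            # Resample protein v from its conditional given current neighbor states.
            x[v] = 1 if rng.random() < p_one(v, x, nb, alpha, beta, f) else 0
        if sweep >= burn_in:
            for v in unknown:
                counts[v] += x[v]
    return {v: counts[v] / (sweeps - burn_in) for v in unknown}
```

The closed-form conditional is just the difference of two calls to `log_score`, with `x_v` set to 1 and to 0, passed through the logistic function. All terms that do not involve v cancel. With this parameterization, each neighbor assigned 1 shifts the log-odds by `beta - alpha`, and each neighbor assigned 0 shifts it by `alpha - 1`.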
