Box 2. Function prediction using the MRF method

FROM:

Network-based prediction of protein function

Roded Sharan, Igor Ulitsky & Ron Shamir

doi:10.1038/msb4100129


The Markov random field (MRF) model provides a probabilistic framework for modeling the mutual influence of random variables via a neighborhood system. Given a network of influence, the state of any random variable is assumed to be conditionally independent of the states of all other random variables, given the states of its immediate neighbors. In the function prediction setting, each random variable corresponds to a protein, and its states correspond to functional annotations. The joint distribution of the random variables can be shown to factorize over the cliques (Box 1) of the network (Besag, 1974). That is, the probability of a certain assignment of discrete states $x=(x_1,\ldots,x_N)$ is

$$P(x) = \frac{1}{Z}\exp\Big(\sum_{c \in C} H_c(x_c)\Big),$$

where $N$ is the total number of variables, $Z$ is a normalizing constant, $C$ is the set of all cliques in the network, $H_c$ is a potential function associated with clique $c$ and $x_c$ is the assignment of states to the members of $c$.
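As a rough illustration of the clique factorization, the sketch below enumerates every binary assignment of a tiny network and normalizes the resulting weights. It assumes the Gibbs form $P(x) = \exp(\sum_c H_c(x_c))/Z$ with potentials entering the exponent positively; the function name `joint_distribution`, the toy triangle network and the `agree` potential are hypothetical choices made only for this example. The enumeration visits all $2^N$ assignments, so it is feasible only for very small $N$.

```python
import itertools
import math


def joint_distribution(n_vars, cliques, potentials):
    """Return {assignment: probability} for binary variables 0..n_vars-1.

    cliques    -- list of tuples of variable indices
    potentials -- dict mapping each clique to a function of its members' states
    Assumes P(x) = exp(sum_c H_c(x_c)) / Z (an illustrative sign convention).
    """
    weights = {}
    for x in itertools.product((0, 1), repeat=n_vars):
        exponent = sum(potentials[c](tuple(x[i] for i in c)) for c in cliques)
        weights[x] = math.exp(exponent)
    z = sum(weights.values())  # normalizing constant Z
    return {x: w / z for x, w in weights.items()}


def agree(states):
    """Hypothetical potential: reward cliques whose members share one state."""
    return 1.0 if len(set(states)) == 1 else 0.0


# Toy network: a triangle on variables 0, 1, 2 (three edges plus the 3-clique).
cliques = [(0, 1), (1, 2), (0, 2), (0, 1, 2)]
potentials = {c: agree for c in cliques}
dist = joint_distribution(3, cliques, potentials)
```

With these made-up potentials, the two uniform assignments receive the largest probability, because all four cliques agree in them.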

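Inference in this general model is computationally hard. It is therefore common to assign zero potentials to all cliques of size greater than 2, and to further homogenize the model by associating the same potential function with all cliques of the same size. For such a homogeneous second-order MRF, we have

$$P(x) = \frac{1}{Z}\exp\Big(\sum_{v} H_1(x_v) + \sum_{(u,v) \in E} H_2(x_u, x_v)\Big),$$

where $H_1$ and $H_2$ are the potentials shared by all cliques of size 1 and 2, respectively, and $E$ is the set of interacting pairs.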

Deng et al (2003) treat one function at a time. To obtain a second-order MRF model, they assume that the probability of a 0/1 annotation over the entire network is proportional to $\exp(\alpha N_{01} + \beta N_{11} + N_{00})$, where $\alpha$ and $\beta$ are parameters weighting the contributions of the different terms and $N_{ij}$ is the number of interacting pairs with the (unordered) assignment $i,j$. Combining this with the a priori probability of an assignment with $N_1$ ones, which depends on the frequency $f$ of the function and is proportional to $(f/(1-f))^{N_1}$, they obtain a homogeneous second-order MRF for which

$$P(x) = \frac{1}{Z}\exp\Big(N_1 \log\frac{f}{1-f} + \alpha N_{01} + \beta N_{11} + N_{00}\Big).$$

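Hence, the probability that protein $v$ is assigned the function, given the annotations of its neighbors $N(v)$, is

$$P(x_v = 1 \mid x_{N(v)}) = \mathrm{logit}\Big(\log\frac{f}{1-f} + \beta N(v,1) + \alpha N(v,0) - \alpha N(v,1) - N(v,0)\Big),$$

where $N(v,i)$ is the number of neighbors of $v$ that are assigned $i \in \{0,1\}$ and logit is the logistic function $\mathrm{logit}(x) = 1/(1+e^{-x})$. The positive terms are the contributions of $v$ and its incident pairs to the exponent when $x_v = 1$, and the negative terms are the corresponding contributions when $x_v = 0$. Deng et al (2003) estimate the two parameters of the model using a quasi-likelihood method and apply Gibbs sampling to infer the unknown functional annotations.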
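The conditional probability and the Gibbs sampling step can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of Deng et al (2003): it assumes the joint weight $\exp(N_1 \log(f/(1-f)) + \alpha N_{01} + \beta N_{11} + N_{00})$, from which the conditional follows as the logistic function of $\log(f/(1-f)) + \beta N(v,1) + \alpha N(v,0) - \alpha N(v,1) - N(v,0)$. The function names, the toy network, the values of `f`, `alpha` and `beta`, and the sweep and burn-in counts are all hypothetical; the quasi-likelihood parameter estimation is not shown.

```python
import math
import random


def logistic(t):
    """Logistic function 1 / (1 + e^-t), called logit in the text."""
    return 1.0 / (1.0 + math.exp(-t))


def log_weight(x, edges, f, alpha, beta):
    """Unnormalized log-probability of a full 0/1 assignment x (dict)."""
    n1 = sum(x.values())
    n01 = sum(1 for u, v in edges if x[u] != x[v])
    n11 = sum(1 for u, v in edges if x[u] == 1 and x[v] == 1)
    n00 = sum(1 for u, v in edges if x[u] == 0 and x[v] == 0)
    return n1 * math.log(f / (1 - f)) + alpha * n01 + beta * n11 + n00


def conditional_prob(v, x, neighbors, f, alpha, beta):
    """P(x_v = 1 | states of the neighbors of v)."""
    n1 = sum(x[u] for u in neighbors[v])   # N(v,1)
    n0 = len(neighbors[v]) - n1            # N(v,0)
    t = math.log(f / (1 - f)) + beta * n1 + alpha * n0 - alpha * n1 - n0
    return logistic(t)


def gibbs_marginals(x_init, unknown, neighbors, f, alpha, beta,
                    sweeps=2000, burn_in=200, seed=0):
    """Estimate P(x_v = 1) for unannotated proteins by Gibbs sampling."""
    rng = random.Random(seed)
    x = dict(x_init)                       # known annotations stay fixed
    counts = {v: 0 for v in unknown}
    for sweep in range(sweeps):
        for v in unknown:
            p = conditional_prob(v, x, neighbors, f, alpha, beta)
            x[v] = 1 if rng.random() < p else 0
        if sweep >= burn_in:               # discard early, unmixed samples
            for v in unknown:
                counts[v] += x[v]
    return {v: counts[v] / (sweeps - burn_in) for v in unknown}


# Toy network (hypothetical): A and B are annotated, C and D are unknown.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

f, alpha, beta = 0.2, 0.5, 1.5             # illustrative values only
x = {"A": 1, "B": 1, "C": 0, "D": 0}       # C and D start at an arbitrary state
posterior = gibbs_marginals(x, ["C", "D"], neighbors, f, alpha, beta)
```

Here `posterior` maps each unannotated protein to the fraction of retained samples in which it carried the function. The conditional can be checked against the joint model by comparing `conditional_prob` with the ratio of the two joint weights obtained by setting $x_v$ to 1 and to 0.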
