The availability of whole-genome
sequences for many species has made it possible to structurally
characterize many of the proteins that make up the proteome,
including those with unknown function. However, it has so far
been difficult to infer the function of a protein from its
sequence and structure alone, without experimental characterization.
To help elucidate the functions of unknown proteins, scientists
at the Howard Hughes Medical Institute have developed a new
informatics tool called ProKnow that enables one to infer function
for unknown proteins based on existing knowledge of the structural
features of proteins with known functions.
Currently used methods for assigning function are often based on the assumption that similar sequences have descended from a common ancestor and therefore share similar function. However, several reports suggest that this approach is not particularly accurate. A more accurate annotation of function can be obtained by using protein folds, sequence motifs, domains and sequence orthology, or by the use of algorithms based on the identification of functionally significant residues. But for all these methods, some existing knowledge of sequence or structural similarity is essential.
Recent insights into the complexity of protein function add further challenges to protein characterization. The existence of moonlighting proteins, which behave differently depending on cellular context, has led to attempts to study proteins in their native environments. In addition, even proteins that share the same fold and active-site architecture have been shown to have completely different functions.
Learning systems based on statistical theory, such as support vector machines, have recently been developed using information about the properties of amino-acid residues, such as hydrophobicity and polarity. Neural networks that have been trained on protein features are another promising tool. However, both technologies are limited in their coverage and accuracy.
When an uncharacterized protein is submitted to ProKnow, the
tool extracts its features, such as three-dimensional fold, motifs,
sequence and functional linkages (such as those from the Database
of Interacting Proteins, DIP, http://dip.doe-mbi.ucla.edu),
and maps them to the same features within the ProKnow
knowledgebase. There, each feature is linked to functions that
are described (annotated) using the controlled vocabulary of
Gene Ontology. The functions that are linked to most of the
protein's features are statistically weighted to give a final list
of putative functions, each with a statistical score. Using ProKnow,
the authors were able to correctly assign functions to 70% of
proteins overall, with 93% coverage of the function annotations
for 1,507 distinct folded proteins.
The authors plan to update the knowledgebase regularly and
to include additional algorithms in future releases to improve
prediction accuracy.
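
The feature-to-function mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledgebase contents, the counts, the feature names and the function `rank_functions` are all hypothetical, and the scoring is a simple normalized vote rather than the statistical weighting that ProKnow uses. Each feature found in the toy knowledgebase distributes one vote across the Gene Ontology terms it is linked to, and the votes are averaged over the matched features.

```python
from collections import defaultdict

# Hypothetical toy knowledgebase (invented counts, for illustration only):
# (feature type, feature value) -> {GO term: number of annotated proteins}
KNOWLEDGEBASE = {
    ("fold", "TIM barrel"): {"GO:0003824": 8, "GO:0005215": 2},
    ("motif", "P-loop"): {"GO:0005524": 9, "GO:0003824": 1},
    ("interaction", "partner_kinase"): {"GO:0005524": 4, "GO:0003824": 2},
}


def rank_functions(features, knowledgebase=KNOWLEDGEBASE):
    """Rank GO terms for a protein described by a list of features.

    Assumption: every matched feature contributes the relative frequency
    of each GO term linked to it, and scores are averaged over matched
    features. This stands in for ProKnow's weighting, which is not
    reproduced here.
    """
    scores = defaultdict(float)
    matched = 0
    for feature in features:
        term_counts = knowledgebase.get(feature)
        if not term_counts:
            continue  # feature absent from the knowledgebase: no evidence
        matched += 1
        total = sum(term_counts.values())
        for term, count in term_counts.items():
            scores[term] += count / total
    if matched == 0:
        return []
    ranked = [(term, score / matched) for term, score in scores.items()]
    # Highest score first; ties broken alphabetically by GO identifier.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    query = [
        ("fold", "TIM barrel"),
        ("motif", "P-loop"),
        ("interaction", "partner_kinase"),
        ("fold", "not in knowledgebase"),
    ]
    for term, score in rank_functions(query):
        print(f"{term}\t{score:.3f}")
```

Because the scores are averaged over matched features, they sum to 1 and can be read as a rough share of the evidence for each term. A protein with no features in the knowledgebase returns an empty list, which mirrors the point that such methods depend on existing knowledge.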
ORIGINAL RESEARCH PAPER
1. Pal, D. & Eisenberg, D. Inference of protein function
from protein structure. Structure 13, 121–130 (2005).