
As interest grows in the still-young field of organelle proteomics, inventive in silico strategies are essential if researchers are to construct accurate hypotheses from mountains of raw data. Computational and experimental approaches have a symbiotic relationship, explains Vamsi Mootha of the Broad Institute of MIT and Harvard University: “They complement each other—you can't tease them apart. In order to support high-quality computational approaches, you need to begin with high-quality datasets.”

Mootha recently illustrated this relationship with a 'smarter' in silico approach for identifying mitochondrial proteins (Calvo et al., 2006). Earlier strategies largely emphasized motif-based predictors, but the Mootha group's 'Maestro' program takes a more holistic approach. It integrates eight different 'predictors', based on both structural and experimental data, into a score that estimates the likelihood of mitochondrial localization. After training Maestro on a 'gold standard' set of known positive and negative controls, Mootha's team recovered hundreds of known mitochondrial proteins and confidently identified nearly 500 proteins not previously known to be mitochondrial. Notably, Maestro also tentatively identified genes associated with several human mitochondrial diseases, including at least one gene whose product had not previously been recognized as mitochondrial.
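One common way to fold several independent predictors into a single localization score is a naive-Bayes-style sum of log-likelihood ratios, and the minimal sketch below shows that idea. It is an illustration under stated assumptions, not Maestro's actual code: the predictor names (`targeting_signal`, `coexpression`, `ms_detected`), their likelihood values and the 10% prior are all hypothetical, standing in for quantities that would be estimated from a gold-standard training set.

```python
import math

# Hypothetical likelihood tables. For each predictor: P(observed value | mitochondrial)
# under "pos" and P(observed value | not mitochondrial) under "neg", as they might be
# estimated from gold-standard positive and negative controls.
LIKELIHOODS = {
    "targeting_signal": {"pos": {True: 0.6, False: 0.4}, "neg": {True: 0.1, False: 0.9}},
    "coexpression": {"pos": {"high": 0.5, "low": 0.5}, "neg": {"high": 0.1, "low": 0.9}},
    "ms_detected": {"pos": {True: 0.7, False: 0.3}, "neg": {True: 0.05, False: 0.95}},
}


def combined_log_odds(features, prior_pos=0.1):
    """Prior log-odds plus one log-likelihood ratio per predictor.

    Assumes the predictors are conditionally independent given the class
    (the naive Bayes assumption). Predictors without a likelihood table are
    skipped, so missing evidence simply contributes nothing to the score.
    """
    score = math.log(prior_pos / (1 - prior_pos))
    for name, value in features.items():
        table = LIKELIHOODS.get(name)
        if table is None:
            continue
        score += math.log(table["pos"][value] / table["neg"][value])
    return score


def posterior(log_odds):
    """Convert log-odds into a probability of mitochondrial localization."""
    return 1 / (1 + math.exp(-log_odds))


# A hypothetical protein with a predicted targeting signal that was also seen by mass spec.
p = posterior(combined_log_odds({"targeting_signal": True, "ms_detected": True}))
print(round(p, 3))  # 0.903
```

Because each predictor adds its own term, a protein can reach a high score through several moderate lines of evidence even when no single motif-based predictor is decisive.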

Søren Brunak of the Technical University of Denmark and his colleagues recently described an alternative computational strategy for organelle proteomics, using in silico methods to predict protein complexes in the nucleolus (Hinsby et al., 2006). They began by constructing an interaction atlas for a collection of known human nucleolar proteins from publicly available interaction data. They then analyzed each putative complex component by component, using dozens of protein features to predict the likelihood of nucleolar localization. Using conservative parameters, Brunak's team confidently predicted 15 nucleolar complexes; several were expected, but many were surprising from a functional standpoint (for example, those containing proteins involved in DNA repair). The work also revealed 11 new nucleolar proteins, which were confirmed by experimental data from Brunak's collaborator Matthias Mann, in a process the two call 'reverse proteomics'.
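The atlas-then-filter workflow can be sketched roughly as follows. This is a minimal illustration, not the procedure Hinsby et al. used: the protein identifiers and scores are hypothetical, a "putative complex" is assumed to be a known nucleolar seed plus its direct interaction partners, each protein's nucleolar probability is assumed to come from some upstream feature-based predictor, and the rule "keep a complex if its mean member score clears 0.8" is an arbitrary conservative choice.

```python
from collections import defaultdict
from statistics import mean


def build_atlas(interactions):
    """Build an undirected interaction graph (adjacency sets) from protein pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_complexes(graph, seeds):
    """Hypothetical complex definition: each seed plus its direct partners."""
    return {seed: frozenset({seed} | graph[seed]) for seed in seeds if seed in graph}


def predict_nucleolar(complexes, scores, known, cutoff=0.8):
    """Keep complexes whose mean member score clears the cutoff.

    Proteins without a score count as 0.0, which keeps the filter conservative.
    Members of kept complexes that are not already known nucleolar proteins are
    returned as candidate new nucleolar proteins.
    """
    kept, new_proteins = {}, set()
    for seed, members in complexes.items():
        if mean(scores.get(p, 0.0) for p in members) >= cutoff:
            kept[seed] = members
            new_proteins |= members - known
    return kept, new_proteins


# Hypothetical example: A, B and C are known nucleolar proteins; X, Y and Z are not.
atlas = build_atlas([("A", "B"), ("A", "X"), ("C", "Y"), ("C", "Z")])
complexes = candidate_complexes(atlas, seeds={"A", "B", "C"})
scores = {"A": 0.95, "B": 0.9, "X": 0.85, "C": 0.9, "Y": 0.2, "Z": 0.3}
kept, new = predict_nucleolar(complexes, scores, known={"A", "B", "C"})
print(sorted(kept), sorted(new))  # ['A', 'B'] ['X']
```

The complex seeded by C is discarded because two of its partners score poorly, while X survives as a candidate new nucleolar protein; in a real study such candidates are exactly what experimental follow-up would then confirm or reject.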

Both groups made smart use of existing datasets, and Mootha suggests that more data should mean more options for future computational efforts. “More generally,” he says, “if we get different types of really good functional genomics data sets, it might be possible to reconstruct all organelles in silico.” Both approaches, however, illustrate the value of conservative cutoffs that eliminate 'junk' data and ensure confidence in one's analysis. “Mapping something often means to throw a lot of information away, and this is, I think, what we try to do with our work,” says Brunak. “We would rather not waste the precious time of the experimentalists!”
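One way to make a cutoff 'conservative' in a measurable sense is to set it from the scores of gold-standard negatives so that only a small, chosen fraction of known negatives would pass. The sketch below illustrates that idea under stated assumptions; it is not the procedure either group reported, and the function names, the `max_fpr` parameter and the example scores are hypothetical.

```python
def conservative_cutoff(negative_scores, max_fpr=0.01):
    """Return a cutoff such that at most max_fpr of the gold-standard
    negatives score strictly above it.

    int() rounds the allowed number of false positives down, and ties at the
    cutoff are excluded, so the realized false-positive rate never exceeds
    max_fpr (it can only come out lower).
    """
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives permitted above the cutoff
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def call_positives(scores, cutoff):
    """Report only candidates scoring strictly above the cutoff."""
    return {name for name, score in scores.items() if score > cutoff}


# Hypothetical scores for ten known negatives and three unlabeled candidates.
negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
cutoff = conservative_cutoff(negatives, max_fpr=0.1)
print(cutoff, sorted(call_positives({"p1": 0.97, "p2": 0.92, "p3": 0.5}, cutoff)))
# 0.9 ['p1', 'p2']
```

Lowering `max_fpr` raises the cutoff and discards more borderline candidates, trading some true hits for fewer wasted experiments, which is the trade-off Brunak describes.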