The Negatome is a database of non-interacting protein pairs that can be used for training protein-protein interaction prediction algorithms.
Who cares about negative results? It's fairly safe to say that most researchers would not try to publish a paper focused on what they did not find, and that even if they did, they would be hard-pressed to find a journal willing to publish it. That is not to say, however, that negative results lack scientific value; in fact, they can be quite useful.
In bioinformatics, for example, training machine learning classifiers such as protein-protein interaction prediction tools requires both a positive and a negative dataset. Dmitri Frishman, along with co-first authors Pawel Smialowski and Philipp Pagel and their colleagues at the Technical University of Munich and the German Research Center for Environmental Health, recently introduced a well-curated database of protein pairs that are unlikely to interact, aptly called the “Negatome.”
Protein interactions underlie almost all biological functions; the entire network of interactions is known as the interactome. We are still far from mapping the complete interactome of any cellular organism, so good prediction tools for generating hypotheses are needed. But although well-curated positive datasets of protein-protein interactions exist for several organisms, establishing with certainty which proteins do not interact is extremely difficult. Beyond the sparse evidence in the literature, “there is no technique that can conduct a large-scale measurement of non-interacting pairs,” explains Frishman.
In the past, others have used information about cell localization to construct a negative training dataset, based on the hypothesis that proteins found in different cellular compartments are unlikely to interact. But this is not ideal, explains Frishman: “If you use protein localization to train your predictor, you end up with a predictor of co-localization and not necessarily of interaction between proteins.”
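To make Frishman's concern concrete, here is a minimal sketch of localization-based negative sampling. The protein names, compartment labels, and function names are hypothetical toy data, not anything from the Negatome authors; the point is only that when every negative is, by construction, a cross-compartment pair, a single "same compartment?" feature separates the classes perfectly.

```python
from itertools import combinations

def localization_negatives(compartment, positives):
    """Return pairs of proteins annotated to different compartments,
    skipping any pair already listed as a known interaction."""
    known = {frozenset(p) for p in positives}
    negatives = []
    for a, b in combinations(sorted(compartment), 2):
        if compartment[a] != compartment[b] and frozenset((a, b)) not in known:
            negatives.append((a, b))
    return negatives

def same_compartment(pair, compartment):
    """True if both proteins in the pair share a compartment annotation."""
    a, b = pair
    return compartment[a] == compartment[b]

# Hypothetical toy annotations.
compartment = {"P1": "nucleus", "P2": "nucleus", "P3": "cytosol", "P4": "membrane"}
positives = [("P1", "P2")]

negatives = localization_negatives(compartment, positives)
print(len(negatives))  # 5 cross-compartment pairs

# Co-localization alone perfectly separates positives from negatives here,
# so a classifier can learn localization instead of interaction.
print([same_compartment(p, compartment) for p in positives])  # [True]
print([same_compartment(p, compartment) for p in negatives])  # [False, False, False, False, False]
```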
When the researchers compared their non-interacting protein pairs with the STRING database, a vast resource of experimental and predicted protein-protein interactions based on physical, genetic and functional evidence, they found that fewer than 10% of their negative pairs were functionally associated in STRING. Because STRING is not limited to physical interactions, however, many of these functional associations are likely to be false positive hits rather than evidence of direct binding. The Negatome itself almost certainly contains some false negatives as well: some of the pairs it lists as non-interacting may well interact in a particular biological context.
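The STRING comparison amounts to counting how many negative pairs also appear among a set of associated pairs. A minimal sketch, assuming both datasets have already been reduced to unordered pairs of protein identifiers; the identifiers and the function name are hypothetical, and real STRING records also carry confidence scores, which this toy version ignores.

```python
def fraction_flagged(negative_pairs, associated_pairs):
    """Fraction of negative pairs that also appear, in either order,
    among the functionally associated pairs."""
    if not negative_pairs:
        return 0.0
    assoc = {frozenset(p) for p in associated_pairs}
    flagged = sum(1 for p in negative_pairs if frozenset(p) in assoc)
    return flagged / len(negative_pairs)

# Hypothetical toy data: ("B", "A") matches ("A", "B") because pairs are unordered.
negatome_like = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
string_like = [("B", "A"), ("E", "F")]
print(fraction_flagged(negatome_like, string_like))  # 0.25
```

Using frozensets makes the comparison order-independent, which matters because interaction databases do not agree on which partner is listed first.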
Beyond training protein-protein interaction predictors, the Negatome could also be used to judge the quality of high-throughput interactome screens, such as two-hybrid screens, which have been criticized for high false positive rates. “If you think about these famous 'hairballs,' these huge networks of interactions, use of the Negatome would be a way to erase some of the edges, if a particular edge is stated as being false,” notes Frishman.
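Erasing edges in this way can be sketched as a set-membership filter over an undirected edge list. This is a minimal illustration with hypothetical protein names and a hypothetical function name, not the authors' procedure; a flagged edge is a candidate for removal or closer scrutiny, not proof that the interaction is false.

```python
def prune_edges(edges, negative_pairs):
    """Split screen edges into those kept and those listed as non-interacting.
    Edges are treated as undirected, so (a, b) and (b, a) match."""
    negatives = {frozenset(p) for p in negative_pairs}
    kept = [e for e in edges if frozenset(e) not in negatives]
    removed = [e for e in edges if frozenset(e) in negatives]
    return kept, removed

# Hypothetical screen output and negative pairs.
screen_edges = [("X", "Y"), ("Y", "Z"), ("Z", "W"), ("X", "W")]
negatome_like = [("Z", "Y"), ("Q", "R")]

kept, removed = prune_edges(screen_edges, negatome_like)
print(kept)     # [('X', 'Y'), ('Z', 'W'), ('X', 'W')]
print(removed)  # [('Y', 'Z')]
```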
The Negatome currently contains data mostly for mammalian proteins, but Frishman and his colleagues plan to keep adding new literature evidence and structure-based data from the Protein Data Bank (PDB), which should steadily improve the resource. Perhaps the wider scientific community will see the value of the Negatome and be encouraged to make negative results, in many different fields, more widely available.