To the editor:
A network of genes and proteins extends through the scientific literature, touching on phenotypes, pathologies and gene function. We report the development of an information system that provides this network as a natural way of accessing the more than ten million abstracts in PubMed. By using genes and proteins as hyperlinks between sentences and abstracts, we convert the information in PubMed into one navigable resource and bring all the advantages of the internet to scientific literature investigation. Moreover, this literature network can be superimposed on experimental interaction data (e.g., yeast two-hybrid data from Drosophila melanogaster¹ and Caenorhabditis elegans²) to make possible a simultaneous analysis of new and existing knowledge. The network, called Information Hyperlinked over Proteins (iHOP), contains half a million sentences and 30,000 different genes³ from humans, mice, D. melanogaster, C. elegans, zebrafish, Arabidopsis thaliana, yeast and Escherichia coli (Fig. 1).
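The idea of using genes as hyperlinks between sentences can be illustrated with a minimal sketch. It is not the iHOP implementation, whose text-mining methods are described elsewhere; the synonym dictionary, the example sentences and the function names `genes_in`, `build_index` and `neighbours` are all hypothetical, and gene recognition is reduced to a plain dictionary lookup.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionary: lower-cased surface form -> canonical symbol.
SYNONYMS = {
    "snf1": "SNF1",
    "mig1": "MIG1",
    "glc7": "GLC7",
    "reg1": "REG1",
}

# Toy sentences written for this example; they stand in for sentences
# taken from abstracts and are not quotations from PubMed.
SENTENCES = [
    "SNF1 phosphorylates MIG1 in response to glucose limitation.",
    "REG1 targets GLC7 to dephosphorylate SNF1.",
    "MIG1 represses transcription of glucose-repressed genes.",
]


def genes_in(sentence):
    """Return the canonical gene symbols mentioned in a sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {SYNONYMS[t] for t in tokens if t in SYNONYMS}


def build_index(sentences):
    """Map each gene to the ids of the sentences that mention it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for gene in genes_in(sentence):
            index[gene].add(sid)
    return index


def neighbours(gene, index, sentences):
    """Genes one step away from `gene`, each with the ids of the linking sentences."""
    links = defaultdict(set)
    for sid in index.get(gene, ()):
        for other in genes_in(sentences[sid]) - {gene}:
            links[other].add(sid)
    return links


INDEX = build_index(SENTENCES)
STEP = neighbours("SNF1", INDEX, SENTENCES)
```

In this sketch, following the link from one gene returns the genes it shares a sentence with, together with the sentences that justify each link, so the reader can always inspect the source text behind a connection.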
Whereas conventional keyword and related-article searches⁴ result in long and not always informative lists of abstracts, navigation along the gene network allows for a stepwise and controlled exploration of the information space. Each step through the network produces information about one single gene and its interactions. Exploration of this gene-guided information network is intuitive and follows the associative organization of human memory⁵, in which information is retrieved by connecting similar concepts⁶. The precision of gene name and synonym identification in iHOP ranges between 87% and 99% depending on the organism. However, because researchers in iHOP move between sentences taken directly from source abstracts, they always retain control over the reliability and relevance of information. This is an advantage over systems that translate the protein network from the literature into graphical representations⁷, because these representations could give users a misleading sense of confidence and obscure the relevance of individual associations.
The iHOP system shows that distant medical and biological concepts can be related by surprisingly few intermediate genes; the shortest path between any two genes involves, on average, only four steps. We believe that this highly connected network will make human literature research more intuitive and efficient and also create a theoretical basis for the development of new automatic retrieval algorithms.
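As a rough illustration of how such a path-length statistic could be computed, the following sketch runs a breadth-first search over a small undirected graph and averages the shortest-path length over all connected pairs. The graph `GRAPH` and its node names are hypothetical, and this is not claimed to be the procedure used for iHOP.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected gene graph: an edge means two genes share a sentence.
GRAPH = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}


def shortest_path_length(graph, source, target):
    """Breadth-first search; return the number of steps, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_shortest_path(graph):
    """Average shortest-path length over all connected pairs of genes."""
    lengths = []
    for a, b in combinations(sorted(graph), 2):
        d = shortest_path_length(graph, a, b)
        if d is not None:
            lengths.append(d)
    return sum(lengths) / len(lengths) if lengths else float("nan")


MEAN_STEPS = mean_shortest_path(GRAPH)
```

Running one search per pair is acceptable for a toy graph; for a network with tens of thousands of genes, a single breadth-first search from each source node (or sampling of source nodes) would be the more practical choice.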
URL. The iHOP server is publicly accessible at http://www.pdg.cnb.uam.es/UniPub/iHOP/. Detailed descriptions of the text-mining methods and the technical architecture of iHOP will be published elsewhere.
References
1. Giot, L. et al. Science 302, 1727–1736 (2003).
2. Li, S. et al. Science 303, 540–543 (2004).
3. Hoffmann, R. & Valencia, A. Trends Genet. 19, 79–81 (2003).
4. Kim, W., Aronson, A.R. & Wilbur, W.J. Proc. AMIA Symp. 319–323 (2001).
5. Koch, C. & Laurent, G. Science 284, 96–98 (1999).
6. Motter, A.E., de Moura, A.P., Lai, Y.C. & Dasgupta, P. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 65, 065102 (2002).
7. Jenssen, T.K., Laegreid, A., Komorowski, J. & Hovig, E. Nat. Genet. 28, 21–28 (2001).
8. Blaschke, C. & Valencia, A. Genome Inform. Ser. Workshop Genome Inform. 12, 123–134 (2001).
Acknowledgements
We thank the US National Library of Medicine for making MEDLINE publicly available and M. Tress and R. Allende for discussion. This work was supported in part by the ORIEL and TEMBLOR EC projects.
Hoffmann, R., Valencia, A. A gene network for navigating the literature. Nat Genet 36, 664 (2004). https://doi.org/10.1038/ng0704-664