Abstract
We have carried out automated extraction of explicit and implicit biomedical knowledge from publicly available gene and text databases to create a gene-to-gene co-citation network for 13,712 named human genes by automated analysis of titles and abstracts in over 10 million MEDLINE records. The associations between genes have been annotated by linking genes to terms from the medical subject heading (MeSH) index and terms from the gene ontology (GO) database. The extracted database and accompanying web tools for gene-expression analysis have collectively been named 'PubGene'. We validated the extracted networks by three large-scale experiments showing that co-occurrence reflects biologically meaningful relationships, thus providing an approach to extract and structure known biology. We validated the applicability of the tools by analyzing two publicly available microarray data sets.
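The abstract describes the co-citation network only at a high level. The sketch below is a rough illustration under stated assumptions. It treats two genes as linked when both are mentioned in the same title or abstract. It matches gene names with a small, hypothetical alias lexicon and annotates each link with the MeSH headings of the supporting records. The record fields (`pmid`, `text`, `mesh`), the lexicon entries and the function names are illustrative assumptions. This is not the PubGene implementation.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical lexicon: upper-cased alias -> canonical gene symbol.
# A real system would draw aliases from a curated nomenclature source.
GENE_LEXICON = {
    "BCL2": "BCL2", "BCL-2": "BCL2",
    "MYC": "MYC", "C-MYC": "MYC",
    "TP53": "TP53", "P53": "TP53",
}

def build_pattern(lexicon):
    # Try longer aliases first so "BCL-2" wins over shorter overlapping aliases.
    aliases = sorted(lexicon, key=len, reverse=True)
    alt = "|".join(re.escape(a) for a in aliases)
    # Whole-token match: no letter, digit or hyphen directly before or after.
    return re.compile(r"(?<![A-Za-z0-9-])(" + alt + r")(?![A-Za-z0-9-])",
                      re.IGNORECASE)

def genes_in(text, pattern, lexicon):
    """Return the set of canonical gene symbols mentioned in `text`."""
    return {lexicon[m.group(1).upper()] for m in pattern.finditer(text)}

def build_cocitation_network(records, lexicon):
    """Link genes co-mentioned in the same record and tally the records' MeSH terms.

    records: iterable of dicts with assumed keys "pmid", "text" (title plus
    abstract) and "mesh" (list of MeSH headings).
    Returns {(gene_a, gene_b): {"count": int, "pmids": list, "mesh": Counter}}.
    """
    pattern = build_pattern(lexicon)
    edges = defaultdict(lambda: {"count": 0, "pmids": [], "mesh": Counter()})
    for rec in records:
        genes = sorted(genes_in(rec["text"], pattern, lexicon))
        # Each unordered gene pair is counted at most once per record.
        for a, b in combinations(genes, 2):
            edge = edges[(a, b)]
            edge["count"] += 1
            edge["pmids"].append(rec["pmid"])
            edge["mesh"].update(rec.get("mesh", []))
    return dict(edges)

# Toy records (invented for illustration, not real MEDLINE entries).
records = [
    {"pmid": 1, "text": "BCL-2 and c-Myc cooperate in lymphoma.",
     "mesh": ["Lymphoma", "Apoptosis"]},
    {"pmid": 2, "text": "p53 regulates BCL2 expression.", "mesh": ["Apoptosis"]},
    {"pmid": 3, "text": "MYC and BCL2 in B-cell lymphoma; TP53 status.",
     "mesh": ["Lymphoma"]},
]
net = build_cocitation_network(records, GENE_LEXICON)
for pair, edge in sorted(net.items()):
    print(pair, edge["count"], edge["mesh"].most_common(1))
```

The network stores the number of supporting records on each link, so weak associations can be filtered out with a count threshold. The naive whole-token matching shown here is only a stand-in for real gene-name recognition. It misses spelling variants and can produce false positives for symbols that are also ordinary words.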
Acknowledgements
We thank the National Library of Medicine for access to MEDLINE; D. Tjeldvoll for help with installation and use of the Graphviz software, and contributions to programming on early versions of the web-interface; H.-C. Aasheim and Ø. Fodstad for discussions; and S. Bade, W.P. Kuo, S. Vinterbo and D. Warren for comments on the manuscript. This work was supported in part by grants from the Norwegian Cancer Society. T.K.J. was supported by grant 134422/410 from the Norwegian Research Council.
Cite this article
Jenssen, TK., Lægreid, A., Komorowski, J. et al. A literature network of human genes for high-throughput analysis of gene expression. Nat Genet 28, 21–28 (2001). https://doi.org/10.1038/ng0501-21