Introduction

Completion of the Human Genome Project and advances in genomic technology have stimulated the emergence of new multidisciplinary research areas. One such area is human genome epidemiology,1 which uses population-based, epidemiological methods to assess the relationship of human genetic variation to health and disease. Human genome epidemiology (HuGE) focuses on the study of gene–disease associations, gene–gene and gene–environment interactions, and the evaluation of genetic tests.2 The Human Genome Epidemiology Network (HuGENet) (http://www.cdc.gov/genomics/hugenet/default.htm) is a global collaboration of individuals and organizations committed to developing and disseminating population-based human genome epidemiological information. HuGENet promotes quality reporting of genetic associations, as well as the systematic and quantitative synthesis of rapidly evolving information on gene–disease associations.

HuGENet maintains a knowledge base called HuGE Navigator,3 which contains published data on genetic associations and human genome epidemiology. Since 2001, the data have been extracted weekly from PubMed, curated, and deposited in the knowledge base.4 Recently, an automatic literature screening tool (GAPscreener) based on machine-learning techniques (Support Vector Machine) has been used for routine literature screening.5 This new method has significantly increased the sensitivity of the screening process to an estimated 97.5%. We also have developed and implemented a novel data-mining method6 to extract author profiles and geographical information from PubMed records.

We developed a Web-based application called HuGE Watch as one component of HuGE Navigator. HuGE Watch can be used to track the evolution of published studies in near-real time. It gives researchers, health care practitioners, and funding agencies a quick and easy way to assess the current status of research in this field.

Implementation

The HuGE Navigator knowledge base is based on an open source infrastructure7 developed by HuGENet. The Web-based HuGE Watch application was built on J2EE technology (http://java.sun.com/javaee/) and on other Java open source frameworks such as Hibernate (http://www.hibernate.org/) and Struts (http://struts.apache.org/). We used the open source jCharts library (http://jcharts.krysalis.org/) to generate dynamic charts and the Google Maps API (http://www.google.com/apis/maps/documentation/) to build geographic maps.

HuGE Navigator records are indexed using several automatic and manual processes. A utility retrieves and parses the author information (including institute and country) from the affiliation string in PubMed records6 and assigns it to all articles by that author. Relevant concept terms are indexed using Concept Unique Identifiers from the Unified Medical Language System (UMLS),8 based on MeSH indexing of PubMed records. Concepts positioned under the disease category in the MeSH tree structure (http://www.nlm.nih.gov/bsd/disted/mesh/tree.html) are used for indexing by disease. The database curator indexes the gene symbol using the Entrez GeneID (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene) and assigns a knowledge category (ie, genotype prevalence, gene–disease association, gene–gene interaction, gene–environment interaction, pharmacogenomics/toxicogenomics, and genetic testing) and study type (ie, observational study, meta-analysis, HuGE review, genome-wide association study, and clinical trial) to each record. Although the database is updated weekly, it lags PubMed by 1 week because of the curation process.
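The affiliation-parsing step can be illustrated with a minimal sketch. It is not the published data-mining method6; it simply assumes that an affiliation string is a comma-separated list beginning with the institute and ending with the country, optionally followed by an e-mail address. The function name parse_affiliation and the small KNOWN_COUNTRIES lookup are hypothetical.

```python
import re

# Hypothetical, deliberately tiny lookup (assumption); a real utility would
# need a full country list plus handling for states, abbreviations, etc.
KNOWN_COUNTRIES = {"USA", "United States", "UK", "United Kingdom",
                   "Hungary", "China", "Japan"}

# Loose pattern for an e-mail address appended to the affiliation string.
EMAIL_RE = re.compile(r"\S+@\S+")


def parse_affiliation(affiliation):
    """Guess (institute, country) from a free-text affiliation string."""
    # Drop any e-mail address, then trailing spaces and punctuation.
    cleaned = EMAIL_RE.sub("", affiliation).strip(" .;,")
    parts = [p.strip() for p in cleaned.split(",") if p.strip()]
    if not parts:
        return None, None
    # Assumption: the first segment names the institute.
    institute = parts[0]
    # Scan from the end: the country is usually the last recognizable segment.
    for part in reversed(parts):
        if part in KNOWN_COUNTRIES:
            return institute, part
    return institute, None
```

Under these assumptions, a made-up string such as "Example University, Department of Epidemiology, Budapest, Hungary. someone@example.org" yields the institute "Example University" and the country "Hungary"; strings with no recognizable country return None for the country and would need manual review.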

Features

Overall temporal trends and spatial distributions

HuGE Watch users can view publication patterns by year or by country. Patterns can be displayed for each of four parameters: number of relevant publications, number of relevant investigators, number of diseases, and number of genes. Patterns for each parameter can be specified further by category (genotype prevalence, gene–disease association, gene–gene interaction, gene–environment interaction, pharmacogenomics, genetic testing) or study type (observational study, genome-wide association, meta-analysis, HuGE review). Temporal trends are displayed as line charts, generated dynamically from the weekly updated literature data in the database and the user's selection. Geographic distributions are presented using Google Maps, with features that the user can manipulate (eg, zoom in or out) (Figure 1).
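The aggregation behind such a temporal trend can be sketched as follows. This is an illustration under stated assumptions, not the HuGE Watch implementation (which is Java-based): each record is assumed to be a dictionary with the hypothetical keys pmid, year, authors, diseases, genes, and category, and the function name yearly_trends is likewise invented for this example.

```python
from collections import defaultdict


def yearly_trends(records, category=None):
    """Count distinct publications, investigators, diseases, and genes per year.

    The record keys used here are assumptions, not the actual database schema.
    """
    buckets = defaultdict(lambda: {"publications": set(),
                                   "investigators": set(),
                                   "diseases": set(),
                                   "genes": set()})
    for rec in records:
        # Optional filter by knowledge category.
        if category is not None and rec["category"] != category:
            continue
        bucket = buckets[rec["year"]]
        bucket["publications"].add(rec["pmid"])
        bucket["investigators"].update(rec["authors"])
        bucket["diseases"].update(rec["diseases"])
        bucket["genes"].update(rec["genes"])
    # Reduce each set to its size; years come out in ascending order.
    return {year: {name: len(values) for name, values in bucket.items()}
            for year, bucket in sorted(buckets.items())}
```

Sets are used so that an investigator, disease, or gene appearing in several records within the same year is counted once. Grouping by a country field instead of year would give the corresponding spatial distribution.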

Figure 1. One view in HuGE Watch.

Temporal trends and spatial distributions for specific genes or diseases, alone or in combination

Publication patterns for specific genes or diseases, alone or in combination, can be viewed by using the two integrated components (Genopedia and Phenopedia) in HuGE Navigator. After searching for a specific gene or disease term in Genopedia or Phenopedia, users can view the publication patterns for that query by clicking the HuGE Watch icon on the summary page (Figure 2).

Figure 2. Temporal trends and spatial distributions of literature in a gene–disease combination view.

Journal rankings by the number of publications

In the HuGE Published Literature section, the Journal option allows users to view journals ranked by the number of publications in human genome epidemiology. Journal ranks also can be viewed for different itemized categories and study types.
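A ranking of this kind amounts to a filtered frequency count. The sketch below assumes records with the hypothetical keys journal and study_type; the function name rank_journals is also an assumption, and the code is meant only to illustrate the idea rather than the application's actual logic.

```python
from collections import Counter


def rank_journals(records, study_type=None, top=10):
    """Return (journal, publication count) pairs, most frequent first.

    The record keys "journal" and "study_type" are assumptions.
    """
    counts = Counter(
        rec["journal"] for rec in records
        if study_type is None or rec["study_type"] == study_type
    )
    return counts.most_common(top)
```

Passing a study type such as "HuGE review" restricts the ranking to that kind of article; the same pattern would apply to a category filter.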

Most studied genes and diseases

In the Genes Studied and Diseases Studied sections, the publication option allows users to view genes and diseases ranked by numbers of publications. The same information can be viewed for different itemized categories and study types. Publication patterns by year and by country also can be viewed for each gene or each disease term.

Examples of scenarios for HuGE Watch users

  1. A quick citation analysis can provide an overview of human genome epidemiology research regionally or globally. For example, Adany and Pocsai9 published a review of genetic epidemiology literature in Europe; a more comprehensive, updated review can now be done via HuGE Watch with just a few clicks.

  2. The recent boom in genome-wide association research depends on collaboration to achieve the necessary large sample sizes. HuGE Watch combined with other components of HuGE Navigator could be used to identify potential collaborators in other countries.

  3. Investigators can use HuGE Watch to help decide on the target journal for their manuscripts to increase the possibility of acceptance. For example, the American Journal of Epidemiology (AJE) has published the most HuGE review articles (47 of 71). Users might have a better chance of publishing such papers in AJE.

Conclusion

As the study of genetic associations and human genome epidemiology continues to grow throughout the world, research collaboration and synthesis are becoming increasingly important.10 The HuGE Watch application is a valuable tool for monitoring and supporting these efforts. By documenting the rapid expansion of this field in near-real time, HuGE Watch provides a point of reference not only for investigators but also for funding agencies, publishers, and other stakeholders in genomics research.