GWAS Integrator: a bioinformatics tool to explore human genetic associations reported in published genome-wide association studies

Yu, Wei; Yesupriya, Ajay; Wulf, Anja; Hindorff, Lucia A; Dowling, Nicole; Khoury, Muin J; Gwinn, Marta

doi:10.1038/ejhg.2011.91


Short Report
Published: 25 May 2011

Wei Yu¹,
Ajay Yesupriya¹,
Anja Wulf¹,
Lucia A Hindorff²,
Nicole Dowling¹,
Muin J Khoury¹ &
Marta Gwinn¹

European Journal of Human Genetics volume 19, pages 1095–1099 (2011)

Abstract

Genome-wide association studies (GWAS) have successfully identified numerous genetic loci that are associated with phenotypic traits and diseases. GWAS Integrator is a bioinformatics tool that integrates information on these associations from the National Human Genome Research Institute (NHGRI) GWAS Catalog, SNAP (SNP Annotation and Proxy Search), and the Human Genome Epidemiology (HuGE) Navigator literature database. The tool includes robust search and data mining functions that can be used to quickly identify relevant associations from GWAS, as well as proxy single-nucleotide polymorphisms (SNPs) and potential candidate genes. Query-based University of California Santa Cruz (UCSC) Genome Browser custom tracks are generated dynamically on the basis of users’ selected GWAS hits or candidate genes from the HuGE Navigator literature database (http://www.hugenavigator.net/HuGENavigator/gWAHitStartPage.do). The GWAS Integrator may help enhance inference on potential genetic associations identified from GWAS.

Introduction

The completion of the human genome and HapMap projects, combined with advances in high-throughput genotyping techniques, has resulted in an explosion of genome-wide association studies (GWAS).¹ These studies interrogate hundreds of thousands to a few million genetic variants and have identified a large number of loci associated with phenotypic traits or disease outcomes. As a result of their early and continued success, the number of published GWAS has steadily increased each year, from just two in 2005 to 238 in 2010 (as of 8 December; data from the statistics page in GWAS Integrator). To help the research community find these publications and further explore the reported associations, the National Human Genome Research Institute (NHGRI) has established and maintains the NHGRI GWAS Catalog (http://www.genome.gov/26525384), an online, regularly updated database of single nucleotide polymorphism (SNP)-trait associations from GWAS.² We have developed the GWAS Integrator, a bioinformatics tool that offers a robust search capacity and a set of data mining functions by integrating information from the NHGRI GWAS Catalog with data from other established bioinformatics resources, including HapMap (http://hapmap.ncbi.nlm.nih.gov/), the Human Genome Epidemiology (HuGE) Navigator (http://www.hugenavigator.net/), SNP Annotation and Proxy Search (SNAP) (http://www.broadinstitute.org/mpg/snap/ldsearch.php) and the University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway).

Implementation

The GWAS Integrator was built on J2EE technology (http://java.sun.com/javaee/) and on other Java open-source frameworks, including Hibernate (http://www.hibernate.org/), Struts (http://struts.apache.org/), and jCharts (http://jcharts.krysalis.org/). The database is populated and updated with SNP-trait associations from the NHGRI GWAS Catalog each week when new associations are available; details about the selection criteria for these associations are available on the NHGRI GWAS Catalog website. Chromosomal locations of the associated SNPs and relevant proxy SNPs are downloaded from SNAP and UCSC and added to the database as needed (NCBI Build 36/UCSC Version 18 (hg18)). Records from the NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene) are used as standards for gene information, including chromosomal location. As a component of the HuGE Navigator,³ the GWAS Integrator can take advantage of the established informatics infrastructure used in this integrated knowledge base of human genome epidemiology. The HuGE literature database includes PubMed abstracts indexed with MeSH terminology (http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh). Using the MeSH tree hierarchies and the Unified Medical Language System Metathesaurus to map different synonyms of phenotype/disease terms to a standard code enhances the search capacity of the GWAS Integrator. In addition, genes indexed in the HuGE literature database can be used to identify relevant candidate genes. The detailed schema for the HuGE Navigator database can be found in the paper by Yu et al.⁴
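The mapping of synonyms to a standard code, and the use of a tree hierarchy to broaden a query, can be illustrated with a minimal Python sketch. The tool itself is a Java application and its code is not shown in this report; the synonym table, the codes, the tree numbers, and the function names below are all hypothetical placeholders rather than real MeSH or UMLS identifiers or the HuGE Navigator schema.

```python
# Hypothetical synonym table: free-text label -> standard concept code.
# Codes are illustrative placeholders, not real MeSH or UMLS identifiers.
SYNONYM_TO_CODE = {
    "breast cancer": "D_BREAST",
    "breast neoplasms": "D_BREAST",
    "mammary carcinoma": "D_BREAST",
    "neoplasms": "D_NEOPLASM",
}

# Hypothetical tree numbers: a descendant's number extends its ancestor's number.
CODE_TO_TREE = {
    "D_NEOPLASM": "T01",
    "D_BREAST": "T01.002",
}


def normalize(term):
    """Map a free-text phenotype term to its standard code (None if unknown)."""
    return SYNONYM_TO_CODE.get(term.strip().lower())


def expand(code):
    """Return the code plus every code located beneath it in the tree."""
    root = CODE_TO_TREE[code]
    return sorted(
        c for c, tree in CODE_TO_TREE.items()
        if tree == root or tree.startswith(root + ".")
    )
```

Under these assumptions, a search for any listed synonym resolves to the same code, and a search on a broad term can be expanded to the narrower terms beneath it before records are matched.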

Features

Robust search capacity

Users can perform free text searches of data extracted from published GWAS. Searchable terms include the disease/trait, gene name/gene symbol/gene alias, rs number, first author name, journal, chromosome region, platform, PubMed ID, and any text in the publication title or abstract. Search results can be filtered by variant, gene, region, trait, publication, author, journal, and year, as well as by ‘hit’ (ie, the SNP-trait association identified in a GWAS). Results can be filtered multiple times. The filtering function also can be used to obtain a quick snapshot of GWAS published in a particular research field. For example, a user can easily get descriptive statistics for GWAS on breast cancer, including the number of variants that have been studied, the number of GWAS publications, etc (Figure 1a).
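Repeated filtering followed by a descriptive summary can be sketched as follows. This is only an illustration in Python, assuming a simplified record with five fields; the `Hit` class, the field names, and the placeholder records are hypothetical and do not reflect the actual database schema or real GWAS results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One SNP-trait association (simplified, hypothetical record layout)."""
    rs_id: str
    gene: str
    trait: str
    pubmed_id: str
    year: int


def filter_hits(hits, **criteria):
    """Keep hits whose fields equal every given criterion; calls can be chained."""
    return [
        h for h in hits
        if all(getattr(h, field) == value for field, value in criteria.items())
    ]


def summarize(hits):
    """Count hits and the distinct variants, genes, and publications among them."""
    return {
        "hits": len(hits),
        "variants": len({h.rs_id for h in hits}),
        "genes": len({h.gene for h in hits}),
        "publications": len({h.pubmed_id for h in hits}),
    }


# Placeholder records for illustration only; not real associations.
HITS = [
    Hit("rs_a", "GENE1", "breast cancer", "pmid_1", 2007),
    Hit("rs_b", "GENE2", "breast cancer", "pmid_1", 2007),
    Hit("rs_a", "GENE1", "breast cancer", "pmid_2", 2009),
    Hit("rs_c", "GENE3", "type 2 diabetes", "pmid_3", 2009),
]
```

Calling `summarize(filter_hits(HITS, trait="breast cancer"))` gives the kind of snapshot described above, and passing the filtered list back into `filter_hits` with another criterion narrows it further.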

Data mining capacity

A series of data mining capacities can be used to further explore search results.

Variant->proxy function

This function provides information on SNP proxies related to the variants (SNPs) of the selected GWAS hits. Users can define configuration parameters for proxy SNP retrieval, such as the HapMap release version, HapMap population, and r² cutoff (Figure 1b).
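For readers unfamiliar with the r² cutoff, the sketch below computes r² between two biallelic loci using the standard definition, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B, and then applies a cutoff. The GWAS Integrator retrieves proxy SNPs from SNAP rather than computing r² itself, so this is purely illustrative; the function names and the default cutoff of 0.8 are assumptions.

```python
def r_squared(p_ab, p_a, p_b):
    """r² between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def proxies_above(candidates, cutoff=0.8):
    """Keep (rs_id, r2) pairs with r2 >= cutoff, strongest first."""
    kept = [(rs, r2) for rs, r2 in candidates if r2 >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

A higher cutoff returns fewer proxies that are more tightly correlated with the selected variant; a lower cutoff widens the set at the cost of weaker correlation.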

Variant->UCSC function

This function dynamically creates an SNP custom track to display selected GWAS hits in the UCSC Genome Browser. Users can select the SNP to center the display in the UCSC Genome Browser using a dropdown menu, which lists all the rs numbers for the selected GWAS hits. The ‘Window Size’ field defines the display range around the centered SNP in the UCSC Genome Browser; for example, when 500 kb is specified in the Window Size field, the UCSC Genome Browser will display 250 kb on each side of the centered SNP. Users can also include proxy SNPs in the SNP custom track, or create a separate custom track for genes indexed in the HuGE literature database related to the query (Figure 1c).
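The window arithmetic and the general shape of a UCSC custom track can be sketched in Python as below. The `browser position` and `track` lines and the BED rows follow the UCSC custom-track text format; the function names, the track name, and the input structure are assumptions made for illustration and are not taken from the GWAS Integrator code.

```python
def window(center, size):
    """Return (start, end) of a window of `size` bp centered on `center`."""
    half = size // 2
    return max(1, center - half), center + half


def custom_track(snps, center_rs, window_size, name="GWAS_hits"):
    """Build UCSC custom-track text: browser line, track line, BED rows.

    snps maps rs number -> (chromosome, 1-based position).
    """
    chrom, pos = snps[center_rs]
    start, end = window(pos, window_size)
    lines = [
        f"browser position {chrom}:{start}-{end}",
        f'track name="{name}" description="Selected GWAS hits" visibility=pack',
    ]
    for rs, (c, p) in sorted(snps.items()):
        # BED is 0-based and half-open: a SNP at 1-based position p is [p-1, p).
        lines.append(f"{c}\t{p - 1}\t{p}\t{rs}")
    return "\n".join(lines)
```

With a window size of 500 kb and a SNP at position 1,000,000, `window` returns 750,000 to 1,250,000, that is, 250 kb on each side of the centered SNP.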

Variant->GWAS function

This function uses proxy SNPs to identify additional GWAS hits that may be related to the user-selected GWAS hits. Users can define configuration parameters for proxy SNP retrieval, such as HapMap release version, HapMap population, and r² cutoff (Figure 1d).
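One way to picture this lookup is as a join between a proxy table and a table of reported hits. The Python sketch below assumes both tables are already in memory as dictionaries; the function name, the data layout, and the default cutoff of 0.8 are hypothetical and are not drawn from the tool's implementation.

```python
def related_hits(selected, proxies, hits_by_snp, cutoff=0.8):
    """Find GWAS hits reported for proxies of the selected SNPs.

    proxies: rs number -> list of (proxy_rs, r2) pairs;
    hits_by_snp: rs number -> list of reported trait labels.
    Returns sorted (selected_rs, proxy_rs, trait) triples.
    """
    chosen = set(selected)
    found = set()
    for rs in chosen:
        for proxy, r2 in proxies.get(rs, []):
            # Skip weak proxies and SNPs the user has already selected.
            if r2 < cutoff or proxy in chosen:
                continue
            for trait in hits_by_snp.get(proxy, []):
                found.add((rs, proxy, trait))
    return sorted(found)
```

Lowering the cutoff admits more distant proxies and therefore more candidate hits, which is why the result should be read as hits that may be related rather than as confirmed shared signals.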

Variant->gene function

This function lists all genes that fall into the region around the selected GWAS hits. Users can define the genomic distance around the hits. Genes that are also indexed in the HuGE literature database and reported with the query term are highlighted with a hyperlink to the corresponding Genopedia⁵ record in HuGE Navigator (Figure 1e).
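The region lookup amounts to an interval-overlap test on a single chromosome. The Python sketch below is a minimal illustration under the assumption that gene coordinates are available as simple tuples; the function name and data layout are hypothetical, and whether the tool requires a gene to overlap the region or to lie entirely within it is not stated in this report (overlap is assumed here).

```python
def genes_near(snp, genes, distance):
    """List gene symbols whose span overlaps [pos - distance, pos + distance].

    snp: (chromosome, position);
    genes: list of (symbol, chromosome, start, end) tuples.
    """
    chrom, pos = snp
    lo, hi = pos - distance, pos + distance
    return sorted(
        symbol
        for symbol, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start <= hi and g_end >= lo
    )
```

Setting `distance` to 0 returns only genes that contain the SNP itself, and larger values progressively include neighbouring genes.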

Proxy reference search

Users can also search for variant-trait associations using proxy SNPs. For example, searching with ‘rs663129’ will lead to six proxy SNPs that have GWAS hits.

Real-time tracking

The statistics page presents an overview of published GWAS, including total numbers of publications, hits, reported genes, genic SNPs, intergenic SNPs, variants, and disease/traits. Temporal trends are displayed graphically for each item. A top 10 list is generated and displayed in web tables, including variant, gene, chromosome region, disease/trait, first author, and journal. As of 10 February 2011, the database contains 4817 GWAS hits, representing 475 disease/traits and 3920 variants from 796 publications.

Conclusion

GWAS of phenotypic traits and diseases have successfully identified a large number of genetic loci for further investigation by replication, meta-analysis, imputation of untyped loci, resequencing, identification of functional polymorphisms, and analysis of gene–gene and gene–environment interactions.⁶ By integrating relevant information from multiple data sources, the GWAS Integrator helps researchers to quickly identify GWAS of interest, examine the findings in the context of other genetic and epidemiologic research, perform on-line data mining, and make inferences that can inform future studies. Including the GWAS Integrator in the HuGE Navigator allows it to take advantage of the established informatics infrastructure of one of the most comprehensive repositories of published genetic associations – the HuGE literature database. The dynamic generation of a custom track is an efficient way to access all features offered by the UCSC Genome Browser. Ongoing collaboration between CDC and NHGRI in collecting and synchronizing GWAS data will ensure that the GWAS data sources remain up to date. As a new application in HuGE Navigator, GWAS Integrator was built to interconnect with other applications already in the system, such as Genopedia and Gene Prospector, so that users can navigate directly to related information. Although GWAS Integrator database content is currently limited to the NHGRI GWAS Catalog, we plan to implement a feature that allows users to import their own GWAS data for data mining.

References

1. Hardy J, Singleton A: Genomewide association studies and human disease. N Engl J Med 2009; 360: 1759–1768.
2. Hindorff LA, Sethupathy P, Junkins HA et al: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 2009; 106: 9362–9367.
3. Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ: A navigator for human genome epidemiology. Nat Genet 2008; 40: 124–125.
4. Yu W, Yesupriya A, Wulf A, Qu J, Khoury MJ, Gwinn M: An open source infrastructure for managing knowledge and finding potential collaborators in a domain-specific subset of PubMed, with an example from human genome epidemiology. BMC Bioinformatics 2007; 8: 436.
5. Yu W, Clyne M, Khoury MJ, Gwinn M: Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 2010; 26: 145–146.
6. Panoutsopoulou K, Zeggini E: Finding common susceptibility variants for complex disease: past, present and future. Brief Funct Genomic Proteomic 2009; 8: 345–352.

Author information

Authors and Affiliations

Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, GA, USA
Wei Yu, Ajay Yesupriya, Anja Wulf, Nicole Dowling, Muin J Khoury & Marta Gwinn
Office of Population Genomics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Lucia A Hindorff

Corresponding author

Correspondence to Wei Yu.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

About this article

Cite this article

Yu, W., Yesupriya, A., Wulf, A. et al. GWAS Integrator: a bioinformatics tool to explore human genetic associations reported in published genome-wide association studies. Eur J Hum Genet 19, 1095–1099 (2011). https://doi.org/10.1038/ejhg.2011.91

Received: 21 February 2011
Revised: 05 April 2011
Accepted: 21 April 2011
Published: 25 May 2011
Issue Date: October 2011
DOI: https://doi.org/10.1038/ejhg.2011.91
