The Genetic Association Database

Becker, Kevin G; Barnes, Kathleen C; Bright, Tiffani J; Wang, S Alex

doi:10.1038/ng0504-431

Correspondence
Published: 01 May 2004

Nature Genetics volume 36, pages 431–432 (2004)

To the editor:

The increasing availability of polymorphism data has allowed more gene association studies to be carried out, and the number of published genetic association studies is growing rapidly. Studies done secondarily to successful linkage studies over the last decade have also fueled the increase in published association studies. Although there are single-nucleotide polymorphism and human variation databases¹,², there is currently no public repository for genetic association data. It is difficult to query association data in a systematic manner or to integrate association data with other molecular databases. OMIM³, the main repository of genetic information for mendelian disorders, is largely text based and is of a historical narrative design, making it difficult to compare large sets of molecular data. Moreover, OMIM archives mature, high-quality data of high significance, the standard in rare mendelian disorders. Although these data are useful, OMIM does not routinely collect findings of lower significance or negative findings. The study of nonmendelian, common complex disorders is often a struggle to find disease relevance amid lower significance values and often conflicting evidence. Negative data are often not reported or are marginalized into obscure and less accessible scientific journals, resulting in a publication bias favoring positive genetic associations⁴. Here, we describe the development of a genetic association database (GAD; http://geneticassociationdb.nih.gov) that aims to collect, standardize and archive genetic association study data and to make them easily accessible to the scientific community.

There are no standards for designing, implementing, interpreting or reporting association studies (e.g., sample size, replication, significant P values), although guidelines have been suggested⁴,⁵,⁶,⁷. The literature is filled with alternative, idiosyncratic and arbitrary gene names and gene symbols, as well as a continuum of phenotypic descriptions. Studies using arbitrary nomenclature continue to be published, making cross-comparison and meta-analysis difficult. One goal of GAD is to standardize molecular nomenclature in the archival process by including official HUGO gene symbols. After this assignment, each record is annotated with links to molecular databases (LocusLink, GeneCards, HapMap, etc.) and reference databases (PubMed, CDC), among others. Once records are standardized, association data can be integrated systematically with other molecular databases, data mining tools, annotation and future sources of molecular data (e.g., gene interactions, quantitative trait loci). Moreover, cross-comparison and meta-analysis of studies become more efficient.
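The standardization step described above can be illustrated with a minimal Python sketch. It is not GAD's actual code (the letter mentions Perl modules, which are not shown). The alias table, the record layout and the function names are assumptions made for illustration; a real pipeline would load aliases from the official HUGO nomenclature files rather than hard-code them.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative alias table (assumption): maps symbols as they might appear in
# papers to official symbols. Keys are upper-cased for case-insensitive lookup.
ALIAS_TO_OFFICIAL: Dict[str, str] = {
    "IL-6": "IL6",
    "IFNB2": "IL6",
    "DCP1": "ACE",
}


def normalize_symbol(
    reported: str, aliases: Dict[str, str] = ALIAS_TO_OFFICIAL
) -> Optional[str]:
    """Return the official symbol for a reported one, or None if unknown."""
    # Simplification: upper-casing would mishandle official symbols that
    # contain lowercase letters.
    key = reported.strip().upper()
    if key in aliases.values():
        return key  # already an official symbol known to the table
    return aliases.get(key)


def standardize_records(
    records: List[dict], aliases: Dict[str, str] = ALIAS_TO_OFFICIAL
) -> Tuple[List[dict], List[dict]]:
    """Split records into (standardized, unresolved) by gene symbol."""
    standardized, unresolved = [], []
    for rec in records:
        official = normalize_symbol(rec["gene"], aliases)
        if official is None:
            unresolved.append(rec)  # left for manual curation
        else:
            standardized.append({**rec, "gene": official})
    return standardized, unresolved
```

Keeping unresolved records separate, rather than guessing, reflects the curation problem the letter describes: arbitrary symbols cannot always be mapped automatically.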

There are three main components of GAD: a web interface, Perl modules and the database, which uses the Oracle RDBMS. The database has three layers; gene and disease data are organized into a large fact table in a middle layer with dimensional views on the top layer. The bottom layer contains the tools for adding, editing, batch loading and downloading data to and from the database.
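The fact-table-with-views layout can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 in place of the Oracle RDBMS named in the text, and the table name, column names and view definitions are hypothetical. The inserted rows are placeholders, not GAD data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Middle layer: one wide fact table. Top layer: narrower views over it.
# All names below are hypothetical.
conn.executescript(
    """
    CREATE TABLE association_fact (
        id            INTEGER PRIMARY KEY,
        gene_symbol   TEXT NOT NULL,
        disease       TEXT NOT NULL,
        disease_class TEXT,
        associated    TEXT CHECK (associated IN ('Y', 'N')),
        p_value       REAL,
        pubmed_id     INTEGER
    );
    CREATE VIEW gene_view AS
        SELECT gene_symbol, disease, associated, pubmed_id
        FROM association_fact;
    CREATE VIEW disease_view AS
        SELECT disease_class, disease, gene_symbol, associated
        FROM association_fact;
    """
)

# Placeholder rows standing in for curated records (bottom-layer loading).
rows = [
    ("GENE_A", "disease X", "class 1", "Y", None, None),
    ("GENE_B", "disease X", "class 1", "N", None, None),
]
conn.executemany(
    "INSERT INTO association_fact "
    "(gene_symbol, disease, disease_class, associated, p_value, pubmed_id) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Query a view rather than the fact table, e.g. positive associations only.
positive = conn.execute(
    "SELECT gene_symbol FROM disease_view "
    "WHERE disease = ? AND associated = 'Y'",
    ("disease X",),
).fetchall()
```

Because the views only select columns from the fact table, a record added or corrected once is visible through every view.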

We identify data fields common to genetic association studies, such as disease phenotypes, sample sizes, significance values, population information and allele descriptions. These fields are grouped into five views relevant to disease phenotypes (Disease View), gene-based molecular data (Gene View), chromosomal and mutation information (CH-SNP-Hap View), Reference View and All View. Table 1 shows a summary of the current contents in the database.

Table 1 Current contents of the GAD

Query tools include keyword-search functions that permit field-specific searches, advanced combinatorial queries and pull-down selections of controlled vocabularies (Fig. 1). Batch searches are done against an aggregate table, allowing the user to input a list of genes (300) at once. In this way, batch results from high-throughput assays, such as microarrays, proteomics, cDNA sequencing and SAGE (serial analysis of gene expression), can be rapidly queried in the context of human disease associations.

**Figure 1: A simple search of positive associations for the disease schizophrenia.**
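The batch search against an aggregate table might look like the following minimal Python sketch. The function names and the in-memory dictionary are assumptions, the gene and disease values are placeholders, and treating the "(300)" in the text as an upper bound per query is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

MAX_BATCH = 300  # assumption: reads the "(300)" in the text as a per-query cap


def build_aggregate(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collapse (gene, disease) records into one entry per gene."""
    aggregate: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in records:
        aggregate[gene].add(disease)
    return aggregate


def batch_lookup(
    genes: Iterable[str], aggregate: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Look up many gene symbols at once, preserving input order."""
    # Normalize case, drop blanks and remove duplicates while keeping order.
    unique = list(dict.fromkeys(g.strip().upper() for g in genes if g.strip()))
    if len(unique) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} genes per batch query")
    # Genes with no recorded association map to an empty list.
    return {g: sorted(aggregate.get(g, ())) for g in unique}


# Placeholder records, e.g. standing in for a microarray hit list lookup.
aggregate = build_aggregate(
    [("GENE_A", "disease X"), ("GENE_A", "disease Y"), ("GENE_B", "disease X")]
)
hits = batch_lookup(["gene_a", "GENE_C", "gene_a"], aggregate)
```

Pre-aggregating per gene means each symbol in the input list costs a single lookup, which is what makes a several-hundred-gene query practical.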

Of particular interest are phenotypic descriptions captured at multiple levels. A top level 'disease class' is assigned, followed by 'disease' from the original paper. If studies recognize clinical subphenotypes, endophenotypes or intermediate phenotypes, these are noted in 'narrow phenotype'. Moreover, certain alleles have defined molecular characteristics and are noted under 'molecular phenotype'. These molecular and pathway variants may have a closer relationship to a polymorphism than to the end-stage complex phenotype, such as altered transcription due to a promoter polymorphism (IL6) or serum levels of ACE. Using this hierarchical phenotypic assignment makes it easier to consider molecular phenotypes in the context of end-stage disease. In some cases, although independent end-stage diseases may not share overt similarities at a clinical level, the genetic factors that contribute to those diseases may be shared at a molecular level⁸,⁹. The development of a hierarchy of phenotypes, from broad to specific, may allow classification of diseases, subphenotypes and molecular parameters of disease and their relationship to complex traits.
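One way to picture the four phenotype levels is as a small record type. The sketch below is hypothetical: the class, its methods and the grouping helper are not from GAD, and the values used are placeholders. The four field names follow the levels named in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Phenotype:
    """Phenotype annotation from broad to specific (hypothetical layout)."""

    disease_class: str
    disease: str
    narrow_phenotype: Optional[str] = None
    molecular_phenotype: Optional[str] = None

    def path(self) -> Tuple[str, ...]:
        """Return the levels that are filled in, broadest first."""
        levels = (
            self.disease_class,
            self.disease,
            self.narrow_phenotype,
            self.molecular_phenotype,
        )
        return tuple(level for level in levels if level is not None)


def shared_molecular_phenotypes(
    phenotypes: Iterable[Phenotype],
) -> Dict[str, List[str]]:
    """Find molecular phenotypes recorded under more than one disease."""
    by_molecular = defaultdict(set)
    for p in phenotypes:
        if p.molecular_phenotype is not None:
            by_molecular[p.molecular_phenotype].add(p.disease)
    return {m: sorted(d) for m, d in by_molecular.items() if len(d) > 1}


# Placeholder annotations, not taken from the database.
annotations = [
    Phenotype("class 1", "disease X", molecular_phenotype="marker M"),
    Phenotype("class 2", "disease Y", "subtype S", "marker M"),
    Phenotype("class 2", "disease Z"),
]
shared = shared_molecular_phenotypes(annotations)
```

Grouping on the most specific level is what lets clinically unrelated diseases surface together when they share a molecular phenotype.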

GAD is an archive of published genetic association studies that provides a comprehensive, public, web-based repository of molecular, clinical and study parameters for >5,000 human genetic association studies at this time. This approach will allow the systematic analysis of complex common human genetic disease in the context of modern high-throughput assay systems and current annotated molecular nomenclature.

References

1. Thorisson, G.A. & Stein, L.D. Nucleic Acids Res. 31, 124–127 (2003).
2. Sherry, S.T. et al. Nucleic Acids Res. 29, 308–311 (2001).
3. Hamosh, A. et al. Nucleic Acids Res. 30, 52–55 (2002).
4. Cooper, D.N., Nussbaum, R.L. & Krawczak, M. Hum. Genet. 110, 207–208 (2002).
5. Anonymous. Nat. Genet. 22, 1–2 (1999).
6. Dahlman, I. et al. Nat. Genet. 30, 149–150 (2002).
7. Funalot, B., Varenne, O. & Mas, J.L. Nat. Genet. 36, 3 (2004).
8. Mira, M.T. et al. Nature 427, 636–640 (2004).
9. Becker, K.G. Med. Hypotheses 62, 309–317 (2004).

Author information

Authors and Affiliations

Gene Expression and Genomics Unit, 333 Cassell Drive, National Institute on Aging, National Institutes of Health, Baltimore, 21224, Maryland, USA
Kevin G Becker & Tiffani J Bright
Johns Hopkins Asthma and Allergy Center, Johns Hopkins University, 5501 Hopkins Bayview Circle, Baltimore, 21224, Maryland, USA
Kathleen C Barnes
Division of Computational Bioscience, Center for Information Technology, National Institutes of Health, Bethesda, 20892, Maryland, USA
S Alex Wang

Corresponding author

Correspondence to Kevin G Becker.
