Abstract
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
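As an illustration of the kind of evaluation summarized above, the following is a minimal sketch, not the authors' pipeline: tag SNPs are chosen in a training sample with a greedy pairwise r² rule, and the fraction of SNPs they capture is then measured in a separate test sample. The function names, the toy haplotypes, and the r² threshold of 0.8 are all assumptions made for the example.

```python
def r_squared(a, b):
    """Pairwise r^2 between two SNPs coded as 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP carries no LD information
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_tags(snps, threshold=0.8):
    """Greedy pairwise tagging: repeatedly pick the SNP that covers
    the most still-untagged SNPs at r^2 >= threshold (assumed value)."""
    untagged = set(range(len(snps)))
    tags = []
    while untagged:
        best, best_cov = None, set()
        for i in sorted(untagged):
            cov = {j for j in untagged
                   if i == j or r_squared(snps[i], snps[j]) >= threshold}
            if len(cov) > len(best_cov):
                best, best_cov = i, cov
        tags.append(best)
        untagged -= best_cov
    return tags


def fraction_captured(snps, tags, threshold=0.8):
    """Share of SNPs that are a tag or reach r^2 >= threshold with a tag."""
    captured = sum(
        1 for j in range(len(snps))
        if any(j == t or r_squared(snps[t], snps[j]) >= threshold for t in tags)
    )
    return captured / len(snps)


# Toy data (hypothetical): each inner list is one SNP across 8 haplotypes.
train = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0, 1],  # identical to SNP 0
    [1, 1, 0, 0, 1, 0, 1, 0],  # complement of SNP 0, so r^2 = 1
    [0, 1, 0, 1, 0, 1, 0, 1],
]
test = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 1, 0],  # LD with SNP 0 is weaker in this sample
    [0, 1, 0, 1, 0, 1, 0, 1],
]

tags = greedy_tags(train)
print(tags)                           # [0, 3]
print(fraction_captured(test, tags))  # 0.75
```

In this toy case the tags selected in the training sample miss one SNP in the test sample because the pairwise r² seen in training does not hold up there, which is the kind of loss a sample size evaluation is meant to quantify.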
Acknowledgements
This work was funded by the National Institute of Diabetes and Digestive and Kidney Diseases. Collection of the UK case samples was funded by Diabetes UK. This work was carried out on behalf of the International Type 2 Diabetes 1q Consortium.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
MAF distribution of SNPs in the simulated and empirical data (moderate LD and full 13 Mb region). (PDF 8 kb)
Supplementary Fig. 2
Characteristics of the 3 regions of variable LD studied in the empirical data. (PDF 24 kb)
Supplementary Fig. 3
Training set sample size and tSNP selection method on the capture of common variation in the empirical data. (PDF 8 kb)
Supplementary Fig. 4
Training set sample size and tSNP selection method on the capture of unmeasured common variation in the simulated data. (PDF 9 kb)
Supplementary Fig. 5
Training set sample size and tSNP selection method on the capture of rare variation in the empirical data. (PDF 11 kb)
Supplementary Fig. 6
Tagging SNP performance at different r² thresholds. (PDF 10 kb)
Supplementary Fig. 7
Correlation between training and test set pairwise r² values. (PDF 11 kb)
About this article
Cite this article
Zeggini, E., Rayner, W., Morris, A. et al. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat Genet 37, 1320–1322 (2005). https://doi.org/10.1038/ng1670
DOI: https://doi.org/10.1038/ng1670