Abstract
High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, with a genotypic and sequence allele frequency correlation of r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.
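The comparison between sequence-derived and genotype-derived allele frequencies reported above can be illustrated with a minimal sketch. It is not the authors' discovery algorithm, which is not described in this abstract. It assumes the simplest possible estimator (the fraction of reads carrying the alternate allele at a site) and genotypes coded as alternate-allele counts of 0, 1 or 2. The function names are hypothetical.

```python
from math import sqrt


def read_allele_frequency(ref_reads, alt_reads):
    """Naive estimate (an assumption): fraction of reads carrying the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def genotype_allele_frequency(genotypes):
    """Allele frequency from diploid genotypes coded as alternate-allele counts (0, 1, 2)."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(genotypes) / (2 * len(genotypes))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

Applying `read_allele_frequency` to each SNP's read counts and `genotype_allele_frequency` to the same animals' genotypes gives two lists that `pearson_r` can compare, which is the kind of quantity the reported r = 0.67 summarizes.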
Acknowledgements
J.F.T. and R.D.S. were supported by National Research Initiative grants 2005-35205-15448, 2005-35604-15615, 2006-35205-16701 and 2006-35616-16697 from the US Department of Agriculture Cooperative State Research, Education and Extension Service. C.P.V.T., T.S.S., and L.K.M. were supported by National Research Initiative grant 2006-35205-16888 from the US Department of Agriculture Cooperative State Research, Education, and Extension Service and by Projects 1265-31000-081D and 1265-31000-090-00D from the United States Department of Agriculture Agricultural Research Service. T.P.L.S. was supported by Project 5438-31000-073D from the US Department of Agriculture Agricultural Research Service. L.K.M. was also supported by National Research Initiative grant 2006-35205-17878 from the US Department of Agriculture Cooperative State Research, Education and Extension Service. We gratefully acknowledge the early prepublication access under the Fort Lauderdale conventions to the draft bovine genome sequence provided by the Baylor College of Medicine Human Genome Sequencing Center and the Bovine Genome Sequencing Project Consortium.
Author information
Contributions
C.P.V.T. and L.K.M. developed and implemented the SNP discovery algorithm; J.F.T. and C.P.V.T. performed SNP discovery modeling; C.P.V.T., T.S.S. and L.K.M. performed in silico genome analysis; W.C.W. suggested the reduced representation strategy; T.P.L.S. constructed the RRLs; J.F.T., R.D.S., T.P.L.S. and T.S.S. identified cows for DNA pools; T.S.S. managed the DNA collection; C.T.L. genotyped the discovery animals and managed the assay synthesis; C.D.H. sequenced the RRLs; L.K.M., T.S.S., R.D.S., S.S.M. and T.P.L.S. conducted pilot validations; and C.P.V.T., J.F.T., T.S.S. and T.P.L.S. coordinated manuscript writing and editing.
Ethics declarations
Competing interests
C.T.L. and C.D.H. are employees of Illumina, Inc.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3, Supplementary Table 1, Supplementary Methods (PDF 1163 kb)
About this article
Cite this article
Van Tassell, C., Smith, T., Matukumalli, L. et al. SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods 5, 247–252 (2008). https://doi.org/10.1038/nmeth.1185
DOI: https://doi.org/10.1038/nmeth.1185