Abstract
Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes (refs 1–11). Here we report 48,196 candidate SNPs, located primarily in the coding regions of genes, that we detected by statistical analysis of human expressed sequence tags (ESTs). We used Bayesian inference to weigh the evidence for true polymorphism against sequencing error, misalignment or ambiguity, and misclustered or chimaeric EST sequences, drawing on data such as raw chromatogram peak height, sharpness, overlap and spacing, sequencing error rates, sequence-context sensitivity and cDNA library origin. We validated the predictions in three independent ways: comparison with 54 genes independently screened for SNPs, verification of HLA-A polymorphisms, and restriction fragment length polymorphism (RFLP) testing. These confirmed 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding-region polymorphism provides a public resource for mapping disease genes (available at http://www.bioinformatics.ucla.edu/snp).
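The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' method. It scores one EST alignment column by comparing two hypotheses: a monomorphic site whose mismatches are sequencing errors, versus a biallelic SNP. It uses only per-base PHRED qualities, converted with the standard relation P = 10^(-Q/10) (Ewing & Green, 1998), and ignores chromatogram shape, sequence context, library origin and misclustering. The function names, the prior of 1e-3, the uniform-over-other-bases error model and the grid of minor-allele frequencies are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a PHRED quality Q to an error probability, P = 10**(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def base_likelihood(call: str, true_base: str, err: float) -> float:
    """P(call | true base): correct with probability 1 - err, otherwise one of
    the three other bases chosen uniformly (a simplifying assumption)."""
    return 1.0 - err if call == true_base else err / 3.0


def _logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def snp_posterior(
    reads: Sequence[Tuple[str, float]],
    prior_snp: float = 1e-3,  # hypothetical prior that a site is polymorphic
    freq_grid: Sequence[float] = tuple(i / 20 for i in range(1, 11)),  # 0.05..0.5
) -> float:
    """Posterior probability that one alignment column is a biallelic SNP
    rather than a monomorphic site plus sequencing errors.

    reads: (base_call, phred_quality) pairs, one per EST covering the column.
    freq_grid: minor-allele frequencies, averaged with equal weight.
    """
    if not reads:
        raise ValueError("need at least one read")
    ranked = [b for b, _ in Counter(b for b, _ in reads).most_common()]
    major = ranked[0]
    # Candidate minor allele: the second most common call; if every read
    # agrees, any other base gives the same likelihood.
    minor = ranked[1] if len(ranked) > 1 else next(b for b in BASES if b != major)
    # Cap the error at 0.75 (a uniformly random call) so Q = 0 reads avoid log(0).
    errs = [min(phred_to_error(q), 0.75) for _, q in reads]

    # H0: every read comes from the major allele; mismatches are errors.
    log_l0 = sum(math.log(base_likelihood(b, major, e)) for (b, _), e in zip(reads, errs))

    # H1: each read samples the minor allele with frequency f.
    log_l1_by_f = [
        sum(
            math.log((1 - f) * base_likelihood(b, major, e) + f * base_likelihood(b, minor, e))
            for (b, _), e in zip(reads, errs)
        )
        for f in freq_grid
    ]
    log_l1 = _logsumexp(log_l1_by_f) - math.log(len(freq_grid))

    # Bayes' rule in log space: P(H1 | reads).
    log_h1 = math.log(prior_snp) + log_l1
    log_h0 = math.log(1 - prior_snp) + log_l0
    return math.exp(log_h1 - _logsumexp([log_h1, log_h0]))


if __name__ == "__main__":
    clear = [("A", 40)] * 5 + [("G", 40)] * 5  # balanced, high-quality split
    noisy = [("A", 40)] * 9 + [("G", 5)]       # one low-quality mismatch
    print(snp_posterior(clear), snp_posterior(noisy))
```

With these defaults the balanced high-quality column scores close to 1, whereas the column with a single Q5 mismatch stays below 0.01, because one low-quality discrepancy is better explained as a sequencing error than as a second allele.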
References
Li, W. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
Lai, E., Riley, J., Purvis, I. & Roses, A. A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics 54, 31–38 (1998).
Nickerson, D.A. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).
Pennisi, E. A closer look at SNPs suggests difficulties. Science 281, 1787–1789 (1998).
Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. & Kwok, P.Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8, 748–754 (1998).
Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
Brookes, A.J. The essence of SNPs. Gene 234, 177–186 (1999).
Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).
Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–246 (1999).
Masood, E. Consortium plans free SNP map of human genome. Nature 398, 545–546 (1999).
Schuler, G. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).
Buetow, K.H., Edmonson, M.N. & Cassidy, A.B. Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet. 21, 323–325 (1999).
Picoult-Newberg, L. et al. Mining SNPs from EST databases. Genome Res. 9, 167–174 (1999).
Marth, G.T. et al. A general approach to single nucleotide polymorphism discovery. Nature Genet. 23, 452–456 (1999).
Jackson, A.L. & Loeb, L.A. The mutation rate and cancer. Genetics 148, 1483–1490 (1998).
Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E.J. Gene discovery in dbEST. Science 265, 1993–1994 (1994).
Ingram, V.M. Abnormal human haemoglobin. III. The chemical difference between normal and sickle cell haemoglobins. Biochim. Biophys. Acta 36, 402–411 (1959).
Baur, E.W. & Motulsky, A.G. Hemoglobin Tacoma: a β-chain variant associated with increased Hb A2. Humangenetik 1, 621–634 (1965).
Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
Maeda, M. et al. A simple and rapid method for HLA-DQA1 genotyping by digestion of PCR-amplified DNA with allele specific restriction endonucleases. Tissue Antigens 34, 290–298 (1989).
Acknowledgements
We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about Unigene and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DEFG0387ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.
Cite this article
Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000). https://doi.org/10.1038/79981