Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences

Irizarry, Kris; Kustanovich, Vlad; Li, Cheng; Brown, Nik; Nelson, Stanley; Wong, Wing; Lee, Christopher J.

doi:10.1038/79981

Letter
Published: October 2000

Nature Genetics volume 26, pages 233–236 (2000)

A Correction to this article was published on 01 November 2000

Abstract

Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes^{1–11}. Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms, and restriction fragment length polymorphism [RFLP] testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).
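
The idea of weighing true polymorphism against sequencing error can be illustrated with a minimal Python sketch. It is not the authors' model: it uses only phred-style per-base error probabilities and ignores the chromatogram features, alignment ambiguity, misclustering and library-origin terms named in the abstract. The function names, the prior of 1 polymorphic site per 1,000 bases and the fixed minor-allele fraction of 0.5 are all assumptions made for illustration.

```python
import math


def phred_to_error(q):
    """Convert a phred quality score to an error probability, 10^(-Q/10).

    Capped at 0.75 (a uniformly random base call) so that a Q0 base does not
    produce a zero likelihood.
    """
    return min(10 ** (-q / 10), 0.75)


def base_likelihood(observed, true_base, err):
    """P(observed base | true base), spreading errors evenly over the 3 other bases."""
    return 1 - err if observed == true_base else err / 3


def snp_posterior(reads, major, minor, prior=1e-3, minor_freq=0.5):
    """Posterior probability that an alignment column is a two-allele SNP.

    reads      : list of (base, phred_quality) pairs observed in the column
    major      : consensus allele (the only allele under the "no SNP" hypothesis)
    minor      : candidate alternative allele
    prior      : assumed prior probability that a site is polymorphic (hypothetical)
    minor_freq : assumed fraction of reads drawn from the minor allele (hypothetical)
    """
    log_l0 = 0.0  # log-likelihood: every read comes from the major allele
    log_l1 = 0.0  # log-likelihood: reads are a mixture of major and minor alleles
    for base, q in reads:
        err = phred_to_error(q)
        p_major = base_likelihood(base, major, err)
        p_minor = base_likelihood(base, minor, err)
        log_l0 += math.log(p_major)
        log_l1 += math.log((1 - minor_freq) * p_major + minor_freq * p_minor)

    # Bayes' rule in log-odds form, evaluated in a numerically stable way.
    log_odds = math.log(prior) - math.log(1 - prior) + log_l1 - log_l0
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Three high-quality reads supporting the alternative allele: strong SNP evidence.
supported = [("G", 40)] * 4 + [("A", 40)] * 3
# A single low-quality mismatch: more plausibly a sequencing error.
lone_error = [("G", 40)] * 6 + [("A", 10)]

print(snp_posterior(supported, "G", "A"))
print(snp_posterior(lone_error, "G", "A"))
```

Under these assumptions the first column yields a posterior close to 1, while the second stays well below 0.01, because one low-quality mismatch is explained more cheaply by error than by a rare polymorphism. A fuller treatment would integrate over the allele frequency rather than fix it, and would add the other evidence sources the abstract lists.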

**Figure 1: Prediction and subsequent validation of a SNP in megakaryocyte potentiating factor.**

**Figure 2: Comparison of candidate SNPs with catalogued *HLA-A* polymorphisms.**

**Figure 3: A map of a 1.4-Mb region of genomic contig NT_001454 (22q13.1) containing 42 genes, 47 SNPs and 2 microsatellites.**

Accession codes

Accessions

GenBank/EMBL/DDBJ

References

1. Li, W. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).
2. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
3. Lai, E., Riley, J., Purvis, I. & Roses, A. A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics 54, 31–38 (1998).
4. Nickerson, D.A. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).
5. Pennisi, E. A closer look at SNPs suggests difficulties. Science 281, 1787–1789 (1998).
6. Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. & Kwok, P.Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8, 748–754 (1998).
7. Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
8. Brookes, A.J. The essence of SNPs. Gene 234, 177–186 (1999).
9. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).
10. Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–246 (1999).
11. Masood, E. Consortium plans free SNP map of human genome. Nature 398, 545–546 (1999).
12. Schuler, G. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).
13. Buetow, K.H., Edmonson, M.N. & Cassidy, A.B. Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet. 21, 323–325 (1999).
14. Picoult-Newberg, L. et al. Mining SNPs from EST databases. Genome Res. 9, 167–174 (1999).
15. Marth, G.T. et al. A general approach to single nucleotide polymorphism discovery. Nature Genet. 23, 452–456 (1999).
16. Jackson, A.L. & Loeb, L.A. The mutation rate and cancer. Genetics 148, 1483–1490 (1998).
17. Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E.J. Gene discovery in dbEST. Science 265, 1993–1994 (1994).
18. Ingram, V.M. Abnormal human haemoglobin. III. The chemical difference between normal and sickle cell haemoglobins. Biochim. Biophys. Acta 36, 402–411 (1959).
19. Baur, E.W. & Motulsky, A.G. Hemoglobin Tacoma - a β-chain variant associated with increased Hb A2. Humangenetik 1, 621–634 (1965).
20. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
21. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
22. Maeda, M. et al. A simple and rapid method for HLA-DQA1 genotyping by digestion of PCR-amplified DNA with allele specific restriction endonucleases. Tissue Antigens 34, 290–298 (1989).

Acknowledgements

We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about UniGene; and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DEFG0387ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.

Author information

Authors and Affiliations

Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California, USA
Kris Irizarry & Christopher J. Lee
Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, USA
Vlad Kustanovich & Stanley Nelson
Department of Statistics, University of California, Los Angeles, Los Angeles, California, USA
Cheng Li & Wing Wong
Department of Pediatrics, University of California, Los Angeles, Los Angeles, California, USA
Stanley Nelson
Graduate Program in Computer Science, University of California, Los Angeles, Los Angeles, California, USA
Nik Brown

Corresponding author

Correspondence to Christopher J. Lee.

Supplementary information

Tables A and B (DOC 40 kb)

About this article

Cite this article

Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000). https://doi.org/10.1038/79981

Received: 29 November 1999
Accepted: 10 July 2000
Issue Date: October 2000
DOI: https://doi.org/10.1038/79981
