  • Letter

Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences

A Correction to this article was published on 01 November 2000

Abstract

Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes (refs 1–11). Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations, namely comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms and restriction fragment length polymorphism (RFLP) testing, verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).
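
The idea of weighing true polymorphism against sequencing error can be illustrated with a deliberately simplified sketch. It is not the model used in this work, which also draws on chromatogram features, sequence context, alignment quality and library origin. The sketch looks at a single alignment column and compares two hypotheses: every read derives from one allele and mismatches are base-calling errors, or the site carries two alleles. The function names, the prior of one SNP per 1,000 bases, the fixed minor-allele frequency and the use of phred-style quality scores as the only evidence are all assumptions made for illustration.

```python
from collections import Counter
from math import exp, log


def phred_to_error(quality: float) -> float:
    """Convert a phred-style quality score to a base-calling error probability."""
    # Cap at 0.75 so a quality of 0 means "any of the four bases is equally likely".
    return min(10 ** (-quality / 10), 0.75)


def base_likelihood(observed: str, true_base: str, err: float) -> float:
    """P(observed base | true base), spreading errors evenly over the 3 other bases."""
    return 1 - err if observed == true_base else err / 3


def snp_posterior(reads, prior_snp=1e-3, minor_freq=0.5):
    """Posterior probability that one alignment column is a biallelic SNP.

    reads      -- list of (base, phred_quality) pairs, one per EST covering the site
    prior_snp  -- assumed prior probability of a SNP at any site (hypothetical value)
    minor_freq -- assumed frequency of the second allele among reads (hypothetical value)
    """
    ranked = Counter(base for base, _ in reads).most_common(2)
    if len(ranked) < 2:
        return 0.0  # only one base observed: no candidate second allele to score
    major, minor = ranked[0][0], ranked[1][0]

    log_l_error = 0.0  # hypothesis 0: single allele, mismatches are errors
    log_l_snp = 0.0    # hypothesis 1: two alleles, each read samples one of them
    for base, quality in reads:
        err = phred_to_error(quality)
        p_major = base_likelihood(base, major, err)
        p_minor = base_likelihood(base, minor, err)
        log_l_error += log(p_major)
        log_l_snp += log((1 - minor_freq) * p_major + minor_freq * p_minor)

    # Bayes' rule in log-odds form, then converted back to a probability.
    log_odds = log(prior_snp) - log(1 - prior_snp) + log_l_snp - log_l_error
    if log_odds < 0:
        odds = exp(log_odds)
        return odds / (1 + odds)
    return 1 / (1 + exp(-log_odds))


if __name__ == "__main__":
    confident = [("A", 40)] * 6 + [("G", 40)] * 2   # two high-quality variant reads
    doubtful = [("A", 40)] * 7 + [("G", 10)]        # one low-quality variant read
    print(f"two high-quality G reads: {snp_posterior(confident):.4f}")
    print(f"one low-quality G read:   {snp_posterior(doubtful):.6f}")
```

Under these assumptions, two high-quality reads supporting a second base push the posterior close to 1, whereas a single low-quality mismatch is explained far more cheaply as a sequencing error. Fixing the minor-allele frequency at 0.5 keeps the sketch short but penalizes rare alleles; a fuller treatment would integrate over allele frequency.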

Figure 1: Prediction and subsequent validation of a SNP in megakaryocyte potentiating factor.
Figure 2: Comparison of candidate SNPs with catalogued HLA-A polymorphisms.
Figure 3: A map of a 1.4-Mb region of genomic contig NT_001454 (22q13.1) containing 42 genes, 47 SNPs and 2 microsatellites.

References

  1. Li, W. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).

  2. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).

  3. Lai, E., Riley, J., Purvis, I. & Roses, A. A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics 54, 31–38 (1998).

  4. Nickerson, D.A. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).

  5. Pennisi, E. A closer look at SNPs suggests difficulties. Science 281, 1787–1789 (1998).

  6. Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. & Kwok, P.Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8, 748–754 (1998).

  7. Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).

  8. Brookes, A.J. The essence of SNPs. Gene 234, 177–186 (1999).

  9. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).

  10. Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–246 (1999).

  11. Masood, E. Consortium plans free SNP map of human genome. Nature 398, 545–546 (1999).

  12. Schuler, G. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).

  13. Buetow, K.H., Edmonson, M.N. & Cassidy, A.B. Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet. 21, 323–325 (1999).

  14. Picoult-Newberg, L. et al. Mining SNPs from EST databases. Genome Res. 9, 167–174 (1999).

  15. Marth, G.T. et al. A general approach to single nucleotide polymorphism discovery. Nature Genet. 23, 452–456 (1999).

  16. Jackson, A.L. & Loeb, L.A. The mutation rate and cancer. Genetics 148, 1483–1490 (1998).

  17. Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E.J. Gene discovery in dbEST. Science 265, 1993–1994 (1994).

  18. Ingram, V.M. Abnormal human haemoglobin. III. The chemical difference between normal and sickle cell haemoglobins. Biochim. Biophys. Acta 36, 402–411 (1959).

  19. Baur, E.W. & Motulsky, A.G. Hemoglobin tacoma—a β-chain variant associated with increased hb A2. Humangenetik 1, 621–634 (1965).

  20. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

  21. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).

  22. Maeda, M. et al. A simple and rapid method for HLA-DQA1 genotyping by digestion of PCR-amplified DNA with allele specific restriction endonucleases. Tissue Antigens 34, 290–298 (1989).

Acknowledgements

We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about Unigene and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DEFG0387ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.

Author information

Corresponding author

Correspondence to Christopher J. Lee.

About this article

Cite this article

Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000). https://doi.org/10.1038/79981

  • DOI: https://doi.org/10.1038/79981
