Letter
Nature Genetics 23, 452–456 (1999)
doi:10.1038/70570

A general approach to single-nucleotide polymorphism discovery

Gabor T. Marth¹, Ian Korf¹, Mark D. Yandell¹, Raymond T. Yeh¹, Zhijie Gu², Hamideh Zakeri², Nathan O. Stitziel¹, LaDeana Hillier¹, Pui-Yan Kwok² & Warren R. Gish¹

¹ Washington University Department of Genetics and Genome Sequencing Center, St. Louis, Missouri, USA.

² Washington University Division of Dermatology, St. Louis, Missouri, USA.

Correspondence should be addressed to Gabor T. Marth (gmarth@watson.wustl.edu) or Pui-Yan Kwok (kwok@im.wustl.edu).
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits [1]. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2–5). We present here a unified approach to the discovery of variations in genetic sequence data from arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence [6,7] as a template on which to layer often unmapped, fragmentary sequence data [8–11], and to use base quality values [12] to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence, we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach with accurate SNP predictions in human ESTs (expressed sequence tags) aligned to finished and working-draft-quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
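The abstract does not give the POLYBAYES equations, so the following is only a rough illustration of the underlying idea: weighing each observed base by its quality value to compute the probability that one alignment column is polymorphic. It is a simplified Bayesian model chosen for illustration, not the POLYBAYES model. It assumes phred-scaled base qualities (error probability 10^(-q/10)), sequencing errors spread evenly over the three other bases, each read drawn from an independent chromosome, at most two alleles per site, and a hypothetical prior `prior_snp`. All function names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a phred-scaled base quality q to an error probability, 10^(-q/10)."""
    return 10.0 ** (-q / 10.0)


def base_likelihood(observed: str, true: str, error: float) -> float:
    """P(observed base | true base); errors assumed equally likely to give any other base."""
    return 1.0 - error if observed == true else error / 3.0


def snp_posterior(column, prior_snp=0.003):
    """Posterior probability that a single alignment column is polymorphic.

    column: list of (base, phred_quality) pairs, one per aligned read; each read is
    assumed to come from an independent chromosome (illustrative assumption).
    prior_snp: hypothetical prior probability that any given site is biallelic.
    """
    n = len(column)
    if n < 2:
        return 0.0  # a single read cannot reveal two alleles
    reads = [(base, phred_to_error(q)) for base, q in column]

    def mono_lik(allele):
        # Likelihood of all observed bases if every read truly carries `allele`.
        return math.prod(base_likelihood(obs, allele, e) for obs, e in reads)

    # Monomorphic hypotheses: one of four alleles, sharing prior mass (1 - prior_snp).
    mono = {a: mono_lik(a) for a in BASES}
    evidence_mono = sum((1.0 - prior_snp) / 4.0 * lik for lik in mono.values())

    # Biallelic hypotheses {A, B}: each read carries A or B with probability 1/2,
    # conditioned on both alleles being present, so the all-A and all-B
    # assignments are subtracted from the mixture and the rest renormalized.
    half_n = 0.5 ** n
    evidence_poly = 0.0
    for a, b in itertools.combinations(BASES, 2):
        mixture = math.prod(
            0.5 * base_likelihood(obs, a, e) + 0.5 * base_likelihood(obs, b, e)
            for obs, e in reads
        )
        lik = (mixture - half_n * mono[a] - half_n * mono[b]) / (1.0 - 2.0 * half_n)
        lik = max(0.0, lik)  # guard against tiny negatives from floating-point cancellation
        evidence_poly += prior_snp / 6.0 * lik

    return evidence_poly / (evidence_mono + evidence_poly)


if __name__ == "__main__":
    # Two high-quality A reads and two high-quality G reads: strong evidence of a SNP.
    print(snp_posterior([("A", 30), ("A", 30), ("G", 30), ("G", 30)]))
    # A single low-quality G among high-quality A reads: most likely a sequencing error.
    print(snp_posterior([("A", 30), ("A", 30), ("A", 30), ("G", 5)]))
```

Under these assumptions, a column split evenly between two alleles at high quality scores close to 1, whereas a lone low-quality discrepancy scores low, which is the kind of distinction between true allelic variation and sequencing error that the abstract attributes to base quality values.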

Nature Genetics
ISSN: 1061-4036
EISSN: 1546-1718
© 1999 Nature Publishing Group