Nature Genetics
26, 233 - 236 (2000)
doi:10.1038/79981
Genome-wide analysis of single-nucleotide polymorphisms in human expressed
sequences

Kris Irizarry1, Vlad Kustanovich2, Cheng Li3, Nik Brown5, Stanley Nelson2,4,
Wing Wong3 & Christopher J. Lee1

1 Department of Chemistry & Biochemistry, University of California,
Los Angeles, Los Angeles, California, USA.
2 Department of Human Genetics, University of California,
Los Angeles, Los Angeles, California, USA.
3 Department of Statistics, University of California,
Los Angeles, Los Angeles, California, USA.
4 Department of Pediatrics, University of California,
Los Angeles, Los Angeles, California, USA.
5 Graduate Program in Computer Science, University of California,
Los Angeles, Los Angeles, California, USA.
Correspondence should be addressed to Christopher J. Lee (leec@mbi.ucla.edu).

Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution
marker set for accelerating the mapping of disease genes (refs 1-11).
Here we report 48,196 candidate SNPs detected by statistical analysis of human
expressed sequence tags (ESTs), associated primarily with coding regions of
genes. We used Bayesian inference to weigh evidence for true polymorphism
versus sequencing error, misalignment or ambiguity, misclustering or chimaeric
EST sequences, assessing data such as raw chromatogram height, sharpness,
overlap and spacing, sequencing error rates, context-sensitivity and cDNA
library origin. Three separate validations (comparison with 54 genes
screened for SNPs independently, verification of HLA-A polymorphisms,
and restriction fragment length polymorphism (RFLP) testing) verified
70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold
more true HLA-A SNPs than previous analyses of the EST data. We found
SNPs in a large fraction of known disease genes, including some disease-causing
mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis
of human coding region polymorphism provides a public resource for mapping
of disease genes (available at
http://www.bioinformatics.ucla.edu/snp).
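
To illustrate the idea of weighing true polymorphism against sequencing error, the following is a minimal sketch rather than the method used in this work. It assumes that all chromatogram evidence has already been collapsed into a single per-base error probability, that only two alleles are considered at a site, and that the prior SNP rate (`prior_snp`) and minor-allele frequency (`minor_freq`) are fixed, hypothetical values. The function name `snp_posterior` and all parameters are illustrative only.

```python
from math import prod


def snp_posterior(bases, error_probs, major, minor,
                  prior_snp=1e-3, minor_freq=0.5):
    """Posterior probability that an alignment column is a true SNP.

    bases       -- observed base calls from the aligned ESTs at one position
    error_probs -- per-base probability that each call is a sequencing error
    major/minor -- the two candidate alleles (assumed known in this sketch)
    prior_snp   -- assumed prior probability that a site is polymorphic
    minor_freq  -- assumed frequency of the minor allele if the site is a SNP
    """

    def emit(observed, allele, err):
        # Probability of the observed call given the true allele:
        # correct with probability 1 - err, else one of 3 wrong bases.
        return 1.0 - err if observed == allele else err / 3.0

    # Hypothesis "no SNP": every read derives from the major allele,
    # so any minor-allele call must be a sequencing error.
    like_error = prod(emit(b, major, e) for b, e in zip(bases, error_probs))

    # Hypothesis "SNP": each read derives from the minor allele with
    # probability minor_freq, otherwise from the major allele.
    like_snp = prod(
        (1.0 - minor_freq) * emit(b, major, e) + minor_freq * emit(b, minor, e)
        for b, e in zip(bases, error_probs)
    )

    # Bayes' rule over the two hypotheses.
    numerator = prior_snp * like_snp
    return numerator / (numerator + (1.0 - prior_snp) * like_error)


# Three high-quality reads per allele: strong evidence for a SNP.
strong = snp_posterior(list("AAAGGG"), [0.001] * 6, "A", "G")

# A single low-quality mismatch: more plausibly a sequencing error.
weak = snp_posterior(list("AAAAAG"), [0.001] * 5 + [0.2], "A", "G")
```

In this toy model, several high-quality reads supporting each allele push the posterior towards 1, whereas a lone low-quality mismatch leaves it near 0. Sources of evidence named in the text, such as misalignment, misclustering and cDNA library origin, are not modelled here.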