Nature Genetics
22, 231 - 238 (1999)
doi:10.1038/10290
Characterization of single-nucleotide polymorphisms in coding regions of human genes

Michele Cargill1,7, David Altshuler1,2,7, James Ireland1, Pamela Sklar1,3, Kristin Ardlie1, Nila Patil5, Charles R. Lane1, Esther P. Lim1, Nilesh Kalyanaraman1, James Nemesh1, Liuda Ziaugra1, Lisa Friedland1, Alex Rolfe1, Janet Warrington5, Robert Lipshutz5, George Q. Daley1,4 & Eric S. Lander1,6

1 Whitehead Institute/MIT Center for Genome Research, One Kendall Square, Building 300, Cambridge, Massachusetts 02139, USA.
Departments of 2 Endocrinology, 3 Psychiatry and 4 Hematology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
5 Affymetrix, Inc., Santa Clara, California 95051, USA.
6 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
7 These authors contributed equally to this work.

Correspondence should be addressed to Eric S. Lander: lander@genome.wi.mit.edu

A major goal in human genetics is to understand the role of common genetic
variants in susceptibility to common diseases. This will require characterizing
the nature of gene variation in human populations, assembling an extensive
catalogue of single-nucleotide polymorphisms (SNPs) in candidate genes and
performing association studies for particular diseases. At present, our knowledge
of human gene variation remains rudimentary. Here we describe a systematic
survey of SNPs in the coding regions of human genes. We identified SNPs in
106 genes relevant to cardiovascular disease, endocrinology and neuropsychiatry
by screening an average of 114 independent alleles using 2 independent screening
methods. To ensure high accuracy, all reported SNPs were confirmed by DNA
sequencing. We identified 560 SNPs, including 392 coding-region SNPs (cSNPs)
divided roughly equally between those causing synonymous and non-synonymous
changes. We observed different rates of polymorphism among classes of sites
within genes (non-coding, degenerate and non-degenerate) as well as between
genes. The cSNPs most likely to influence disease, those that alter the amino
acid sequence of the encoded protein, are found at a lower rate and with lower
allele frequencies than silent substitutions. This likely reflects selection
acting against deleterious alleles during human evolution. The lower allele
frequency of missense cSNPs has implications for the compilation of a comprehensive
catalogue, as well as for the subsequent application to disease association.
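
To illustrate why lower allele frequencies matter for compiling a catalogue, the following is a minimal sketch of how the chance of observing a variant at least once depends on its frequency when 114 alleles are screened. It is not the authors' analysis: it assumes that alleles are sampled independently and that a variant is detected whenever at least one copy is present in the sample. The function name `detection_probability` and the example frequencies are hypothetical choices made for illustration.

```python
def detection_probability(allele_frequency: float, n_alleles: int = 114) -> float:
    """Probability that at least one of n_alleles sampled alleles carries the variant.

    Assumes independent sampling and perfect detection (illustrative assumptions,
    not taken from the paper). The default of 114 is the average number of
    alleles screened per gene reported in the abstract.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must lie between 0 and 1")
    if n_alleles < 0:
        raise ValueError("n_alleles must be non-negative")
    # P(variant absent from every sampled allele) = (1 - p)^n
    return 1.0 - (1.0 - allele_frequency) ** n_alleles


# Hypothetical example frequencies, chosen only to show the trend.
for freq in (0.005, 0.01, 0.05, 0.10):
    print(f"frequency {freq:.3f}: detection probability {detection_probability(freq):.3f}")
```

Under these assumptions, rarer variants are more likely to be missed in a sample of fixed size, which is consistent with the abstract's point that the lower frequency of missense cSNPs affects how complete a catalogue can be.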