A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms

Abstract

We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
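
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the quoted density is a plain average (total available sequence divided by SNP count), which the abstract implies but does not spell out.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All names here are illustrative; only the three input numbers come from the text.

SNP_COUNT = 1_420_000      # "1.42 million single nucleotide polymorphisms"
KB_PER_SNP = 1.9           # "one SNP every 1.9 kilobases"
EXONIC_SNPS = 60_000       # "60,000 SNPs fall within exons"


def implied_sequence_bp(snp_count: int, kb_per_snp: float) -> float:
    """Sequence length (in bp) implied by a SNP count and an average spacing.

    Assumes the spacing is a simple average over the available sequence.
    """
    return snp_count * kb_per_snp * 1_000


def fraction(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


implied_bp = implied_sequence_bp(SNP_COUNT, KB_PER_SNP)
exonic_fraction = fraction(EXONIC_SNPS, SNP_COUNT)

print(f"Implied available sequence: {implied_bp / 1e9:.2f} Gb")
print(f"Share of SNPs estimated to fall in exons: {exonic_fraction:.1%}")
```

Under that assumption, the stated density corresponds to roughly 2.7 Gb of available sequence, and the estimated exonic SNPs make up about 4% of the map.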

Figure 1: Distribution of SNP coverage across intervals of finished sequence.
Figure 2: Distribution of heterozygosity.

Acknowledgements

The SNP Consortium (TSC), the Wellcome Trust and the National Human Genome Research Institute funded SNP discovery and data management at Cold Spring Harbor Laboratories, The Sanger Centre, Washington University in St. Louis, and the Whitehead/MIT Center for Genome Research. Work in P.Y.K.'s laboratory is supported in part by grants from The SNP Consortium and the National Human Genome Research Institute. P.Y.K. thanks Q. Li, M. Minton, R. Donaldson and S. Duan for technical assistance. D.M.A. was supported during a phase of this work by a Postdoctoral Fellowship for Physicians from the Howard Hughes Medical Institute. For a full list of contributors to the TSC programme, see http://www.snp.cshl.org.

Author information

Correspondence to David R. Bentley or David Altshuler.

Additional information

(contributing institutions are listed alphabetically).
