Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection

An Addendum to this article was published on 29 March 2011


We report a large-scale analysis of the patterns of genome-wide genetic variation in soybeans. We resequenced a total of 17 wild and 14 cultivated soybean genomes to an average of approximately 5× depth and >90% coverage using the Illumina Genome Analyzer II platform. We compared the patterns of genetic variation between wild and cultivated soybeans and identified higher allelic diversity in wild soybeans. We identified a high level of linkage disequilibrium in the soybean genome, suggesting that marker-assisted breeding of soybean will be less challenging than map-based cloning. We report linkage disequilibrium block location and distribution, and we identified a set of 205,614 tag SNPs that may be useful for QTL mapping and association studies. The data here provide a valuable resource for the analysis of wild soybeans and should facilitate future breeding and quantitative trait analysis.
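To make the terms "linkage disequilibrium" and "tag SNPs" concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes phased, biallelic haplotypes coded as 0/1, uses the common r² measure of LD, and picks tag SNPs with a simple greedy rule. The function names, the toy data and the 0.8 threshold are all illustrative assumptions rather than values taken from the article.

```python
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic SNPs, given 0/1 alleles on the same haplotypes."""
    n = len(a)
    p_a = sum(a) / n                      # frequency of allele 1 at SNP a
    p_b = sum(b) / n                      # frequency of allele 1 at SNP b
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that covers the most untagged SNPs.

    haplotypes: dict mapping a SNP id to its list of 0/1 alleles.
    threshold:  assumed r^2 cutoff above which one SNP is taken to tag another.
    """
    ids = list(haplotypes)
    covers = {s: {s} for s in ids}        # every SNP tags itself
    for s, t in combinations(ids, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)
    untagged = set(ids)
    tags = []
    while untagged:
        best = max(ids, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical toy data: six haplotypes, four SNPs.
toy = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],   # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0],   # mirror image of snp1
    "snp4": [0, 1, 0, 1, 0, 1],   # only weakly correlated with snp1
}
tags = greedy_tag_snps(toy)
```

In this toy set the first three SNPs are in complete LD with one another, so one of them is enough to represent the group, while the fourth needs its own tag. High LD over long stretches therefore means fewer markers are needed to capture the common variation in a region.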


Figure 1: Analysis of the phylogenetic relationship, population structure and LD decay of wild and cultivated soybeans.
Figure 2: Summary of resequencing data of 17 wild and 14 cultivated soybean accessions.
Figure 3: Patterns of LD blocks in two genomic regions.



T. Han, X. Yan, H. Liao, B. Zhuang and Y.-K. Lau provided valuable advice, information and other aid. This work was partially supported by the Hong Kong RGC General Research Fund 468610 (to H.-M.L.), the Hong Kong UGC AoE Center for Plant and Agricultural Biotechnology Project AoE-B-07/09 and a special fund from the Resource Allocation Committee, The Chinese University of Hong Kong (to H.-M.L. and S.S.-M.S.). We also acknowledge the funding support from the National Natural Science Foundation of China (30725008), the Chinese 973 program (2007CB815703; 2007CB815705), Chinese Ministry of Agriculture (948 program), the Shenzhen Municipal Government of China and grants from Shenzhen Bureau of Science Technology & Information, China (ZYC200903240077A; CXB200903110066A). We thank L. Goodman for assistance in editing the manuscript.

Author information




H.-M.L., G.Z., S.S.-M.S. and Jun Wang managed the project. H.-M.L., X.X., X.L., N.Q. and G.Y. designed the experiments and led the data analysis. W.H., B.W., J.L., W.C., M.J. and Jian Wang contributed to DNA sequencing and bioinformatics. F.-L.W., M.-W.L. and G.S. prepared samples and contributed to data analysis. H.-M.L., X.X. and X.L. wrote the manuscript.

Corresponding authors

Correspondence to Hon-Ming Lam or Jun Wang or Samuel Sai-Ming Sun or Gengyun Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 and Supplementary Tables 1–7 (PDF 845 kb)


About this article

Cite this article

Lam, HM., Xu, X., Liu, X. et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet 42, 1053–1059 (2010).
