A reference genome for common bean and genome-wide analysis of dual domestications

Journal name:
Nature Genetics


Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.

At a glance


  1. Figure 1: Structure of the P. vulgaris genome and synteny with the G. max genome.

    (a) Gray lines connect duplicated genes. (b) Chromosome structure with centromeric and pericentromeric regions in black and gray, respectively (scale is in Mb). (c) Gene density in sliding windows of 1 Mb at 200-kb intervals. (d) Repeat density in sliding windows of 1 Mb at 200-kb intervals. (e) Recombination rate based on the genetic and physical mapping of 6,945 SNPs and SSRs. (f,g) First syntenic region (f) and second G. max syntenic region (g) due to a lineage-specific duplication resulting in two chromosome segments for every segment in P. vulgaris.

  2. Figure 2: Geographic distribution of sampled genotypes.
  3. Figure 3: Evolution and domestication of common bean.

    (a) Divergence of the wild Mesoamerican and Andean common bean pools. The wild Andean gene pool diverged from the wild Mesoamerican gene pool ~165,000 years ago, with a small founding population and a strong bottleneck that lasted ~76,000 years. The bottleneck was followed by an exponential growth phase extending to the present day. Asymmetric gene flow between the two pools had a key role in maintaining genetic diversity, especially in the Andean population, with average migration rates M21 = 0.135 (wild Mesoamerican to wild Andean) and M12 = 0.087 (wild Andean to wild Mesoamerican). This scenario conforms to the Mesoamerican origin model of the common bean, with an Andean bottleneck that predated domestication. (nanc, size of ancestral population; tdiv, start of bottleneck; nb, size of bottleneck population; tb, length of bottleneck.) (b) Population genomic analysis based on SNP data from the resequencing of DNA pools for common bean. The size of the circle for each pool is proportional to the π value for the pool. For reference, π = 0.0061 for the wild Mesoamerican (MA) pool. FST statistics, representing the differentiation of any two pools, are noted on the lines (not proportional) connecting pools. Data are average statistics across all 10-kb sliding windows (2-kb step), discarding windows with <50% called bases. Land, landrace; N, north; S, south; C, central. (c) Variation in seed size in common bean. The seeds of wild Mesoamerican and Andean beans (two each) are smaller than the seeds corresponding to the reference genotype (G19833) and the multiple market classes of common beans grown in the United States (navy to light red kidney).

  4. Figure 4: Differentiation and reduction in diversity during the domestication of common bean.

    (a,b) Genome-wide view in 10-kb/2-kb sliding windows of differentiation (FST) and reduction in diversity (π ratio) statistics associated with domestication within the common bean Mesoamerican (a) and Andean (b) gene pools. Log10 π ratios less than zero are not shown. Lines represent the 90%, 95% and 99% tails for the empirical distribution of each statistic.

  5. Figure 5: Genome-wide association analysis of seed weight.

    (a) A 280-member panel of Mesoamerican cultivars was grown in 4 locations in the United States. Phenotypic data were coupled with 34,799 SNP markers and analyzed using a mixed model that controlled for population structure and genotype relatedness. (b) A close-up view of the GWAS results for seed weight and linkage disequilibrium (r²) around a 1.23-Mb Mesoamerican sweep window on Pv07. The positions of candidate genes for domestication are noted by asterisks above the GWAS display. The candidates range from Phvul.007G094299 to Phvul.007G099700 (Supplementary Note).
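
The window-based statistics described in the captions for Figures 3 and 4 (10-kb windows advanced in 2-kb steps, windows with fewer than 50% called bases discarded, log10 π ratios, and empirical 90%, 95% and 99% tails) can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the per-site input format (a list with `None` for uncalled bases), the direction of the ratio (wild over landrace) and the nearest-rank rule for the tails are all assumptions made for illustration.

```python
import math


def sliding_windows(length, size=10_000, step=2_000):
    """Yield half-open (start, end) windows that fit entirely within `length` bases."""
    start = 0
    while start + size <= length:
        yield start, start + size
        start += step


def window_mean(values, start, end, min_called=0.5):
    """Mean of called per-site values in [start, end).

    Returns None when fewer than `min_called` of the sites were called
    (uncalled sites are represented by None; this encoding is an assumption).
    """
    called = [v for v in values[start:end] if v is not None]
    if len(called) < min_called * (end - start):
        return None
    return sum(called) / len(called)


def log10_pi_ratio(pi_wild, pi_land, size=10_000, step=2_000):
    """Return (window_start, log10(pi_wild / pi_landrace)) for each usable window.

    The wild-over-landrace direction is assumed here; windows that fail the
    called-base filter or have zero diversity in either pool are skipped.
    """
    results = []
    for start, end in sliding_windows(len(pi_wild), size, step):
        wild = window_mean(pi_wild, start, end)
        land = window_mean(pi_land, start, end)
        if wild is None or land is None or wild <= 0 or land <= 0:
            continue
        results.append((start, math.log10(wild / land)))
    return results


def empirical_tail(values, fraction):
    """Nearest-rank value at `fraction` of the sorted empirical distribution."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]
```

With per-site diversity lists for a wild pool and a landrace pool, `log10_pi_ratio(pi_wild, pi_land)` gives one value per retained window, and `empirical_tail([r for _, r in ratios], 0.99)` gives a cutoff of the kind drawn as a line in Figure 4. Under the assumed ratio direction, positive values mark windows where the landraces carry less diversity than the wild pool.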


Author information

  1. These authors contributed equally to this work.

    • Jeremy Schmutz &
    • Phillip E McClean


  1. US Department of Energy Joint Genome Institute, Walnut Creek, California, USA.

    • Jeremy Schmutz,
    • G Albert Wu,
    • Shengqiang Shu,
    • Kerrie Barry,
    • Mansi Chovatia,
    • David M Goodstein,
    • Uffe Hellsten,
    • Mei Wang,
    • Ming Zhang &
    • Daniel S Rokhsar
  2. HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA.

    • Jeremy Schmutz,
    • Jane Grimwood &
    • Jerry Jenkins
  3. Department of Plant Sciences, North Dakota State University, Fargo, North Dakota, USA.

    • Phillip E McClean,
    • Sujan Mamidi,
    • Samira Mafi Moghaddam,
    • Rian Lee &
    • Juan M Osorno
  4. Corn Insects and Crop Genetics Research Unit, US Department of Agriculture–Agricultural Research Service, Ames, Iowa, USA.

    • Steven B Cannon
  5. Soybean Genomics and Improvement Laboratory, US Department of Agriculture–Agricultural Research Service, Beltsville, Maryland, USA.

    • Qijian Song,
    • David L Hyten,
    • Gaofeng Jia,
    • Josiane Rodrigues &
    • Perry B Cregan
  6. Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA.

    • Carolina Chavarro,
    • Mirayda Torres-Torres,
    • Dongying Gao,
    • Brian Abernathy,
    • Michael Gonzales &
    • Scott A Jackson
  7. CNRS, Université Paris–Sud, Institut de Biologie des Plantes, UMR 8618, Saclay Plant Sciences (SPS), Orsay, France.

    • Valerie Geffroy,
    • Manon M S Richard &
    • Vincent Thareau
  8. Institut National de la Recherche Agronomique (INRA), Université Paris–Sud, Unité Mixte de Recherche de Génétique Végétale, Gif-sur-Yvette, France.

    • Valerie Geffroy
  9. Department of Agricultural and Natural Sciences, Tennessee State University, Nashville, Tennessee, USA.

    • Matthew Blair
  10. Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado, USA.

    • Mark A Brick
  11. Department of Plant Sciences, University of California, Davis, Davis, California, USA.

    • Paul Gepts
  12. Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA.

    • James D Kelly
  13. Arizona Genomics Institute, University of Arizona, Tucson, Arizona, USA.

    • Dave Kudrna,
    • Yeisoo Yu &
    • Rod A Wing
  14. Vegetable and Forage Crop Research Unit, US Department of Agriculture–Agricultural Research Service, Prosser, Washington, USA.

    • Phillip N Miklas
  15. Panhandle Research and Extension Center, University of Nebraska, Scottsbluff, Nebraska, USA.

    • Carlos A Urrea
  16. Present addresses: Pioneer Hi-Bred International, Inc., Johnston, Iowa, USA (D.L.H.) and Genética e Melhoramento, Federal University of Viçosa, Viçosa, Brazil (J.R.).

    • David L Hyten &
    • Josiane Rodrigues


J.S., P.E.M., D.S.R. and S.A.J. conceived the study and jointly wrote the manuscript with S.B.C. Genomic clones and DNA were provided by R.A.W., Y.Y., D.K., R.L. and M.B. The following analyses were performed by the indicated authors: repeat annotation, D.G.; identification of resistance genes, V.G., M.M.S.R. and V.T.; genetic mapping, P.B.C., Q.S., J.R., D.L.H. and G.J.; sequencing, assembly and/or annotation, J.G., J.J., S.S., K.B., M.C., D.M.G., U.H., M.W. and M.Z.; comparative, population and/or evolutionary analyses, S.M., G.A.W., S.B.C., C.C., S.M.M., B.A., M.T.-T. and M.G.; and GWAS, S.M.M., M.A.B., P.G., J.D.K., P.N.M., J.M.O. and C.A.U.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (10,280 KB)

    Supplementary Figures 1–25, Supplementary Tables 1–15 and 18–22, and Supplementary Note

Excel files

  1. Supplementary Table 16 (567 KB)

    Mesoamerican domestication candidates.

  2. Supplementary Table 17 (243 KB)

    Andean domestication candidates.
