Nature 463, 178-183 (14 January 2010) | doi:10.1038/nature08670; Received 19 August 2009; Accepted 12 November 2009

There is a Corrigendum (6 May 2010) associated with this document.

Genome sequence of the palaeopolyploid soybean

Jeremy Schmutz1,2, Steven B. Cannon3, Jessica Schlueter4,5, Jianxin Ma5, Therese Mitros6, William Nelson7, David L. Hyten8, Qijian Song8,9, Jay J. Thelen10, Jianlin Cheng11, Dong Xu11, Uffe Hellsten2, Gregory D. May12, Yeisoo Yu13, Tetsuya Sakurai14, Taishi Umezawa14, Madan K. Bhattacharyya15, Devinder Sandhu16, Babu Valliyodan17, Erika Lindquist2, Myron Peto3, David Grant3, Shengqiang Shu2, David Goodstein2, Kerrie Barry2, Montona Futrell-Griggs5, Brian Abernathy5, Jianchang Du5, Zhixi Tian5, Liucun Zhu5, Navdeep Gill5, Trupti Joshi11, Marc Libault17, Anand Sethuraman1, Xue-Cheng Zhang17, Kazuo Shinozaki14, Henry T. Nguyen17, Rod A. Wing13, Perry Cregan8, James Specht18, Jane Grimwood1,2, Dan Rokhsar2, Gary Stacey10,17, Randy C. Shoemaker3 & Scott A. Jackson5

  1. HudsonAlpha Genome Sequencing Center, 601 Genome Way, Huntsville, Alabama 35806, USA
  2. Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA
  3. USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, Iowa 50011, USA
  4. Department of Bioinformatics and Genomics, 9201 University City Blvd, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, USA
  5. Department of Agronomy, Purdue University, 915 W. State Street, West Lafayette, Indiana 47906, USA
  6. Center for Integrative Genomics, University of California, Berkeley, California 94720, USA
  7. Arizona Genomics Computational Laboratory, BIO5 Institute, 1657 E. Helen Street, The University of Arizona, Tucson, Arizona 85721, USA
  8. USDA, ARS, Soybean Genomics and Improvement Laboratory, B006, BARC-West, Beltsville, Maryland 20705, USA
  9. Department Plant Science and Landscape Architecture, University of Maryland, College Park, Maryland 20742, USA
  10. Division of Biochemistry & Interdisciplinary Plant Group, 109 Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
  11. Department of Computer Science, University of Missouri, Columbia, Missouri 65211, USA
  12. The National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, New Mexico 87505, USA
  13. Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA
  14. RIKEN Plant Science Center, Yokohama 230-0045, Japan
  15. Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
  16. Department of Biology, University of Wisconsin-Stevens Point, Stevens Point, Wisconsin 54481, USA
  17. National Center for Soybean Biotechnology, Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211, USA
  18. Department of Agronomy and Horticulture, University of Nebraska, Lincoln, Nebraska 68583, USA

Correspondence to: Scott A. Jackson5. Correspondence and requests for materials should be addressed to S.A.J. (Email:

This article is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation, and derivative works must be licensed under the same or similar licence.


Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the number in the poplar genome, which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and by numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

