Haplotype-resolved sweet potato genome traces back its hexaploidization history

Here we present the 15 pseudochromosomes of sweet potato, Ipomoea batatas, the seventh most important crop in the world and the fourth most significant in China. Using a novel haplotyping method based on genome assembly, we produced a half haplotype-resolved genome from ~296 Gb of paired-end sequence reads, amounting to roughly 67-fold coverage. Phylogenetic tree analysis of homologous chromosomes allowed us to estimate that two recent whole-genome duplication events occurred about 0.8 and 0.5 million years ago. This half haplotype-resolved hexaploid genome represents the first successful attempt to investigate the complexity of chromosome sequence composition directly in a polyploid genome by sequencing the polyploid organism itself rather than any of its simplified proxy relatives. Adapting and applying our approach should provide higher resolution in future investigations of genome structure, especially for similarly complex genomes.
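The sequencing figures quoted in the abstract (~296 Gb of reads at roughly 67-fold coverage) imply a total genome size, since coverage is total sequenced bases divided by genome size. The short sketch below works through that arithmetic. It assumes the coverage was computed against the full hexaploid genome, which the abstract does not state; the function names are illustrative and are not taken from the paper.

```python
def implied_genome_size_gb(total_bases_gb: float, fold_coverage: float) -> float:
    """Genome size implied by a read total and a fold-coverage figure.

    coverage = total_bases / genome_size, so genome_size = total_bases / coverage.
    """
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_bases_gb / fold_coverage


def monoploid_size_gb(genome_size_gb: float, ploidy: int) -> float:
    """Average size of one chromosome set, assuming equally sized sets."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return genome_size_gb / ploidy


# Figures from the abstract: ~296 Gb of reads, ~67-fold coverage.
total_gb = implied_genome_size_gb(296.0, 67.0)
# Assumption: the 67-fold figure refers to the whole hexaploid genome (ploidy 6).
per_set_gb = monoploid_size_gb(total_gb, 6)

print(f"Implied genome size: {total_gb:.2f} Gb")    # 4.42 Gb
print(f"Implied size per set: {per_set_gb:.2f} Gb")  # 0.74 Gb
```

Because both inputs are approximate, the outputs are order-of-magnitude checks rather than measured genome sizes.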

Published in Nature Plants.




Acknowledgements

We thank J. Dai and Z. Nikoloski for helpful discussions during the haplotyping. We thank J. Zhu and Y. Jiang from Purdue University, J. Yu from Iowa State University and G. Gheysen from Ghent University for their invaluable comments during proofreading. J. Yang acknowledges support from the Alexander von Humboldt Foundation (Forschungsstipendium für erfahrene Wissenschaftler). M-H. Moeinzadeh acknowledges support from the IMPRS-CBSC doctoral programme. This project was funded by the International Science & Technology Cooperation Program of China (2015DFG32370), the National Natural Science Foundation of China (31201254, 31361140366, 31501353), the National High Technology Research and Development Program of China (2011AA100607-4, 2012AA101204-3), the Chinese Academy of Sciences (2012KIP518), the China Postdoctoral Science Foundation (2012M520945), the Shanghai Municipal Afforestation & City Appearance and Environmental Sanitation Administration (G102410, F122422, F132427, G142434, G152429) and the Science and Technology Commission of Shanghai Municipality (14DZ2260400, 14ZR1414100).

Author information

Author notes

  1. Jun Yang and M-Hossein Moeinzadeh contributed equally to this work.


Affiliations

  1. Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, 3888 Chenhua Road, Shanghai, 201602, China

    • Jun Yang & Shanshan Zhao
  2. Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany

    • Jun Yang, M-Hossein Moeinzadeh, Peng Xiao, Stefan Haas & Martin Vingron
  3. National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200032, China

    • Weijuan Fan, Gaifang Deng, Hongxia Wang, Fenhong Hu & Peng Zhang
  4. Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany

    • Jun Yang & Alisdair R. Fernie
  5. Tai’an Academy of Agricultural Sciences, 16 Tailai Road, Tai’an, 271000, Shandong, China

    • Guiling Liu, Jianli Zheng & Zhe Sun
  6. Max Planck Institute for Molecular Genetics, Sequencing Core Facility, Ihnestraße 63-73, 14195, Berlin, Germany

    • Heiner Kuhl, Stefan Boerno & Bernd Timmermann
  7. Max Planck Institute for Molecular Genetics, Otto-Warburg-Laboratory: Computational Epigenomics Group, Ihnestraße 63-73, 14195, Berlin, Germany

    • Johannes Helmuth


Authors

Jun Yang, M-Hossein Moeinzadeh, Heiner Kuhl, Johannes Helmuth, Peng Xiao, Stefan Haas, Guiling Liu, Jianli Zheng, Zhe Sun, Weijuan Fan, Gaifang Deng, Hongxia Wang, Fenhong Hu, Shanshan Zhao, Alisdair R. Fernie, Stefan Boerno, Bernd Timmermann, Peng Zhang & Martin Vingron


Contributions

J.Y., M-H.M., H.K., A.R.F., B.T., P.Z. and M.V. planned and coordinated the project and wrote the manuscript. G.-L.L., J.-L.Z. and Z.S. supplied the newly bred cultivar, Taizhong6. W.-J.F., G.-F.D., H.-X.W. and S.-S.Z. prepared genomic DNA. H.K. conducted the primary genome assembly and repeat sequence identification. J.Y. and M-H.M. conducted haplotyping and genome evolution analysis. S.B. managed part of the sequencing work. J.H., P.X., S.H. and F.-H.H. supported and inspired part of the analysis.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Peng Zhang or Martin Vingron.

Electronic supplementary material

  1. Supplementary Information

    Supplementary Figures 1–9, Supplementary Note.

  2. Supplementary Table 1

    Statistics of QC-passed reads and mapped sequence data obtained from all libraries.

  3. Supplementary Table 2

    List of putative gene clusters; a yellow background indicates the eight gene clusters shown in Fig. 4.