Haplotype-resolved sweet potato genome traces back its hexaploidization history

Abstract

Here we present the 15 pseudochromosomes of sweet potato, Ipomoea batatas, the seventh most important crop in the world and the fourth most significant in China. By using a novel haplotyping method based on genome assembly, we have produced a half haplotype-resolved genome from ~296 Gb of paired-end sequence reads amounting to roughly 67-fold coverage. By phylogenetic tree analysis of homologous chromosomes, it was possible to estimate the time of two recent whole-genome duplication events as occurring about 0.8 and 0.5 million years ago. This half haplotype-resolved hexaploid genome represents the first successful attempt to investigate the complexity of chromosome sequence composition directly in a polyploid genome, using sequencing of the polyploid organism itself rather than any of its simplified proxy relatives. Adaptation and application of our approach should provide higher resolution in future genomic structure investigations, especially for similarly complex genomes.
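
As a rough sanity check on the sequencing figures quoted above, the two numbers can be combined to see what total genome size they imply. The sketch below assumes the usual definition of coverage (total sequenced bases divided by genome size); the function name is hypothetical and this is not a calculation taken from the paper.

```python
# Figures quoted in the abstract (both approximate).
total_bases_gb = 296.0   # ~296 Gb of paired-end sequence reads
fold_coverage = 67.0     # roughly 67-fold coverage


def implied_genome_size_gb(total_gb: float, coverage: float) -> float:
    """Genome size implied by: coverage = total bases / genome size.

    Hypothetical helper for illustration only.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_gb / coverage


size_gb = implied_genome_size_gb(total_bases_gb, fold_coverage)
print(f"Implied genome size: ~{size_gb:.1f} Gb")
```

Under that assumption the quoted figures correspond to roughly 4.4 Gb of sequence for the hexaploid genome as a whole.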

Fig. 1: Outline of current sweet potato genome assembly.
Fig. 2: Summary of variations.
Fig. 3: Illustration of seed-finding algorithm.
Fig. 4: Identified gene clusters in present I. batatas genome.
Fig. 5: Evolutionary history of cultivated I. batatas revealed by phylogenetic analysis of homologous chromosome regions.

Acknowledgements

We thank J. Dai and Z. Nikoloski for helpful discussions during the haplotyping. We thank J. Zhu and Y. Jiang from Purdue University, J. Yu from Iowa State University and G. Gheysen from Ghent University for their invaluable comments during proofreading. J. Yang acknowledges support from the Alexander von Humboldt Foundation (Forschungsstipendium für erfahrene Wissenschaftler). M-Hossein Moeinzadeh acknowledges support from the IMPRS-CBSC doctoral programme. This project was funded by the International Science & Technology Cooperation Program of China (2015DFG32370), the National Natural Science Foundation of China (31201254, 31361140366, 31501353), the National High Technology Research and Development Program of China (2011AA100607-4, 2012AA101204-3), the Chinese Academy of Sciences (2012KIP518), the China Postdoctoral Science Foundation (2012M520945), the Shanghai Municipal Afforestation & City Appearance and Environmental Sanitation Administration (G102410, F122422, F132427, G142434, G152429) and the Science and Technology Commission of Shanghai Municipality (14DZ2260400, 14ZR1414100).

Author information

J.Y., M-H.M., H.K., A.R.F., B.T., P.Z. and M.V. planned and coordinated the project and wrote the manuscript. G.-L.L., J.-L.Z. and Z.S. supplied the newly bred cultivar, Taizhong6. W.-J.F., G.-F.D., H.-X.W. and S.-S.Z. prepared genomic DNA. H.K. conducted the primary genome assembly and repeat sequence identification. J.Y. and M-H.M. conducted haplotyping and genome evolution analysis. S.B. managed part of the sequencing work. J.H., P.X., S.H. and F.-H.H. supported and inspired part of the analysis.

Correspondence to Peng Zhang or Martin Vingron.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–9, Supplementary Note.

Supplementary Table 1

Statistics of QC-passed reads and mapped sequence data obtained from all libraries.

Supplementary Table 2

List of putative gene clusters; a yellow background indicates the eight gene clusters shown in Fig. 4.
