Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data

Abstract

Construction of a chromosome-level assembly is a vital step in achieving the goal of a ‘Platinum’ genome, but it remains a major challenge to assemble and anchor sequences to chromosomes in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology serves as a robust tool to dramatically advance chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often aim to reconstruct a haploid representation, and therefore have limited power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that is capable of building allele-aware, chromosomal-scale assemblies for autopolyploid genomes using Hi-C paired-end reads with innovative ‘prune’ and ‘optimize’ steps. Application to simulated data showed that ALLHiC can phase allelic contigs and substantially improve ordering and orientation when compared to other mainstream Hi-C assemblers. We applied ALLHiC to an autotetraploid and an autooctoploid sugarcane genome and successfully constructed phased chromosomal-level assemblies, revealing allelic variations present in these two genomes. The ALLHiC pipeline enables de novo chromosome-level assembly of autopolyploid genomes, separating each allele. Haplotype chromosome-level assembly of allopolyploid and heterozygous diploid genomes can be achieved using ALLHiC, overcoming obstacles in assembling complex genomes.

Fig. 1: Overview of major steps in the ALLHiC algorithm.
Fig. 2: Description of Hi-C scaffolding problem in the autopolyploid genome and application of the pruning approach for haplotype phasing.
Fig. 3: Network graph of Hi-C links based on a synthetic genome by combining the genome sequences of rice subspecies O. sativa spp. japonica (green dots) and O. sativa spp. indica (orange dots).
Fig. 4: Validation of partitioning on five simulations derived from the rice Nipponbare genome.
Fig. 5: Comparison of four Hi-C scaffolding algorithms, ALLHiC, LACHESIS, SALSA2 and 3D-DNA.
Fig. 6: Hi-C scaffolding of the autotetraploid sugar cane genome S. spontaneum AP85-441.

Data availability

The Hi-C data (O. sativa L. japonica cv. Nipponbare, O. sativa L. indica cv. 93-11 and S. spontaneum L. AP85-441) generated in this study have been deposited in the GSA database (http://gsa.big.ac.cn) under BioProject No. PRJCA001420 and accession No. CRA001597. Other published datasets used for ALLHiC testing are listed in Supplementary Table 1.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2016YFD0100305 to H.T.), a National Natural Science Foundation of China grant (No. 31701874 to X.Z.) and the Fuzhou Science and Technology project (No. 2017-N-33 to X.Z.). We also thank the Fujian provincial government for a Fujian ‘100 Talent Plan’ award (to H.T.).

Author information

Contributions

H.T. and X.Z. designed and implemented the ALLHiC software. X.Z., H.T., S.Z. and Q.Z. tested the software on various datasets. X.Z., H.T. and R.M. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Haibao Tang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Plants thanks Jean Marc Aury, Jay Ghurye and Yves van de Peer for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–5, Supplementary Tables 1–6 and Supplementary Figs. 1–36.

Reporting Summary

About this article

Cite this article

Zhang, X., Zhang, S., Zhao, Q. et al. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019). https://doi.org/10.1038/s41477-019-0487-8

  • DOI: https://doi.org/10.1038/s41477-019-0487-8
