Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations

Zhou, Chenxi; Olukolu, Bode; Gemenet, Dorcus C.; Wu, Shan; Gruneberg, Wolfgang; Cao, Minh Duc; Fei, Zhangjun; Zeng, Zhao-Bang; George, Andrew W.; Khan, Awais; Yencho, G. Craig; Coin, Lachlan J. M.

doi:10.1038/s41588-020-00717-7

Technical Report
Published: 30 October 2020

Nature Genetics volume 52, pages 1256–1264 (2020)

Abstract

Despite advances in sequencing technologies, assembly of complex plant genomes remains elusive due to polyploidy and high repeat content. Here we report PolyGembler for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate method with which to detect and fix assembly errors. Using simulated data, we demonstrate that our approach is of high accuracy and outperforms three existing state-of-the-art genetic mapping tools. Particularly, our approach is more robust to the presence of missing genotype data and genotyping errors. We used our method to construct pseudomolecules for allotetraploid lawn grass utilizing PacBio long reads in combination with restriction site-associated DNA sequencing, and for diploid Ipomoea trifida and autotetraploid potato utilizing contigs assembled from Illumina reads in combination with genotype data generated by single-nucleotide polymorphism arrays and genotyping by sequencing, respectively. We resolved 13 assembly errors for a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.
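
To make the idea of grouping contigs by genetic linkage concrete, here is a minimal Python sketch. It is not PolyGembler's algorithm. It assumes, for illustration only, that each contig has already been reduced to one 0/1 parental haplotype indicator per offspring, which is a strong simplification for polyploids. The recombination fraction between two contigs is estimated as the fraction of offspring in which the indicators differ, folded at 0.5 because the relative phase is unknown. Contigs are then merged by single linkage whenever the estimate is at most a hypothetical threshold `max_rf`. The function names, the toy data and the threshold are all assumptions.

```python
from itertools import combinations


def recombination_fraction(a, b):
    """Estimate the recombination fraction between two contigs.

    `a` and `b` hold one 0/1 haplotype indicator per offspring; None marks
    a missing call. The estimate is folded at 0.5 (phase is unknown).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.5  # no shared data: treat the pair as unlinked
    mismatches = sum(x != y for x, y in pairs)
    rf = mismatches / len(pairs)
    return min(rf, 1.0 - rf)


def linkage_groups(haplotypes, max_rf=0.2):
    """Group contigs by single linkage using a union-find structure.

    `max_rf` is an illustrative threshold, not a value from the paper.
    """
    parent = {name: name for name in haplotypes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(haplotypes, 2):
        if recombination_fraction(haplotypes[u], haplotypes[v]) <= max_rf:
            parent[find(u)] = find(v)

    groups = {}
    for name in haplotypes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data: four hypothetical contigs scored in eight offspring.
toy = {
    "ctg1": [0, 0, 1, 1, 0, 1, 0, 1],
    "ctg2": [0, 0, 1, 1, 0, 1, 1, 1],     # one mismatch with ctg1
    "ctg3": [1, 0, 1, 0, 1, 0, 0, 1],
    "ctg4": [1, 0, 1, 0, None, 0, 0, 1],  # one missing call
}
print(linkage_groups(toy))  # [['ctg1', 'ctg2'], ['ctg3', 'ctg4']]
```

Single linkage is used here only because it is short to write. It chains groups together through a single spurious low estimate, so a real pipeline would need a more error-tolerant clustering step.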

**Fig. 2: Pseudomolecule construction for simulated datasets.**

**Fig. 3: The ITR_r1.0 scaffold Itr_sc000015 is a misassembly.**

**Fig. 4: Pseudomolecule construction for M9 × M19 and B2721 mapping populations.**

**Fig. 5: Collinear plots between the *Z. japonica* pseudomolecules constructed from PolyGembler and the *O. sativa* chromosomes.**

Data availability

Data for the simulation studies, including comparisons with other methods and studies of M9 × M19 I. trifida and the B2721 potato, are available from http://data.genomicsresearch.org/Projects/polyGembler. Data for the 12601ab1 × Stirling potato mapping population were provided by C. Hackett. Data for the Z. japonica mapping population Carrizo × El Toro are available from the NCBI repository under the accession number SRP055007. The whole-genome PacBio sequence data for the Z. japonica cultivar Yaji are available from the NCBI repository under the accession number SRP110561. Data related to the PGSC version 4.03 pseudomolecules are available from http://solanaceae.plantbiology.msu.edu. The I. trifida de novo genome assembly ITR_r1.0 is available from http://sweetpotato-garden.kazusa.or.jp. The I. trifida de novo genome assembly NCNSP0306 is available from http://sweetpotato.plantbiology.msu.edu. Release 7 of the O. sativa reference genome is available from http://phytozome.jgi.doe.gov. The genome assembly of the Z. japonica accession Nagirizaki is available from http://zoysia.kazusa.or.jp. Source data are provided with this paper.

Code availability

The software PolyGembler, presented in this article, and its documentation are publicly available at GitHub (https://github.com/c-zhou/polyGembler).

Acknowledgements

We thank F. Diaz for developing the M9 × M19 I. trifida mapping population and M. David for extracting and quantifying DNA from the M9 × M19 cross. The 12601ab1 × Stirling Infinium 8303 potato array data were provided by C. A. Hackett. This research was supported by grants from the Bill & Melinda Gates Foundation (OPP1052983) and Australian Research Council (DP170102626 awarded to L.J.M.C.). The work at the International Potato Center (CIP) was carried out as part of the Consultative Group for International Agricultural Research (CGIAR) Research Program on Roots, Tubers and Bananas, which is supported by CGIAR Fund Donors (http://www.cgiar.org/about-us/our-funders/). This research was also supported by use of the NeCTAR Research Cloud, by QCIF and by the University of Queensland’s Research Computing Centre. The NeCTAR Research Cloud is a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy.

Author information

Authors and Affiliations

Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
Chenxi Zhou, Minh Duc Cao & Lachlan J. M. Coin
Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia
Chenxi Zhou & Lachlan J. M. Coin
Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
Bode Olukolu & G. Craig Yencho
Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
Bode Olukolu
International Potato Center, Lima, Peru
Dorcus C. Gemenet, Wolfgang Gruneberg & Awais Khan
CGIAR Excellence in Breeding Platform, International Maize and Wheat Improvement Center, Nairobi, Kenya
Dorcus C. Gemenet
Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
Shan Wu & Zhangjun Fei
Department of Statistics, North Carolina State University, Raleigh, NC, USA
Zhao-Bang Zeng
Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
Zhao-Bang Zeng
Data61, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Queensland, Australia
Andrew W. George
Department of Plant Pathology and Plant–Microbe Biology, Cornell University, Geneva, NY, USA
Awais Khan
The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
Lachlan J. M. Coin

Contributions

C.Z. and L.J.M.C. designed the study and wrote the software. A.K., D.C.G. and W.G. developed and provided the I. trifida mapping population materials. B.O., D.C.G., S.W. and W.G. generated data for the M9 × M19 I. trifida mapping population. C.Z. performed the analysis. C.Z. and L.J.M.C. wrote the manuscript. L.J.M.C., G.C.Y., A.K., M.D.C., A.W.G., Z.-B.Z. and Z.F. supervised the project. All authors contributed to editing the final manuscript.

Corresponding author

Correspondence to Lachlan J. M. Coin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Pseudomolecule construction for 20× tetraploid simulated GBS data.

A total of 42,715 SNPs located on 678 scaffolds were used for linkage analysis. These scaffolds, totaling ~482 Mb, covered approximately 99.6% of the genome. a, Dot plot of the RF estimates for scaffold pairs mapped to the same reference chromosome. The x- and y-axes represent the physical distances and the estimated RFs, respectively. b, Histogram of the RF estimates for scaffold pairs mapped to different reference chromosomes. c, Collinear plots of pseudomolecules mapped to reference chromosomes. The x- and y-axes represent physical positions (Mb) on the reference chromosomes and pseudomolecules, respectively. Each line represents a collinear block between the reference chromosome and the pseudomolecule. The diagonal line in each plot indicates a high correlation between the reference chromosome and the pseudomolecule constructed from scaffolds.

Extended Data Fig. 2 Collinear plots between the Ipomoea nil reference chromosomes and pseudomolecules constructed from the Ipomoea trifida genotype data.

The x- and y-axes represent the physical positions (Mb) on the reference chromosomes and pseudomolecules, respectively. Each line represents a collinear block between the Ipomoea nil reference chromosome and the pseudomolecule.

Extended Data Fig. 3 Genetic linkage map construction from the Infinium 8303 SNP array data of the Stirling × 12601ab1 mapping population.

a, Dot plot of the RF estimates for scaffold pairs mapped to the same PGSC v4.03 pseudomolecule. The x- and y-axes represent the physical distances and the estimated RFs, respectively. b, Histogram of the RF estimates for scaffold pairs mapped to different PGSC v4.03 pseudomolecules. c, Comparison between the genetic linkage map constructed by the proposed method and the PGSC v4.03 pseudomolecules. Twelve genetic linkage groups corresponding to 12 pseudomolecules were constructed. In each plot, the x-axis represents the positions (Mb) on the PGSC v4.03 pseudomolecules, and the y-axis represents the positions (cM) on the genetic linkage map.

Extended Data Fig. 4 Genetic linkage map constructed from the Infinium 8303 SNP array data of the B2721 mapping population with TetraploidSNPMap.

Each dot represents a SNP. The x-axis represents the positions (Mb) on the PGSC v4.03 pseudomolecules, and the y-axis represents the positions (cM) on the genetic linkage map. The genetic linkage map comprises a total of 4,745 SNPs including 56 SNPs located on the unplaced PGSC v4.03 scaffolds (red) and 76 SNPs placed in incorrect PGSC v4.03 pseudomolecules (blue). Since the physical positions of the red and blue dots cannot be determined, they were set to zero in the plots.

Extended Data Fig. 5 Genetic linkage map constructed from the Infinium 8303 SNP array data of the Stirling × 12601ab1 mapping population with TetraploidSNPMap.

Each dot represents a SNP. The x-axis represents the positions (Mb) on the PGSC v4.03 pseudomolecules, and the y-axis represents the positions (cM) on the genetic linkage map. The genetic linkage map comprises a total of 3,593 SNPs including 54 SNPs located on the unplaced PGSC v4.03 scaffolds (red) and 35 SNPs placed in incorrect PGSC v4.03 pseudomolecules (blue). Since the physical positions of the red and blue dots cannot be determined, they were set to zero in the plots.

Extended Data Fig. 6 Collinear plots between the pseudomolecules of Zoysia japonica accession Yaji and Nagirizaki.

The x- and y-axes represent the positions (Mb) on the pseudomolecules. Each line represents a collinear block between the pseudomolecules.

Extended Data Fig. 7 Relationship between the number of genetic markers and computational resources required for the haplotype phasing algorithm.

The x- and y-axes (on a logarithmic scale) represent the number of genetic markers and the consumption of resources, respectively. a, CPU time. b, Memory. Each point in the plot is the average over 30 independent experiments (Intel® Xeon® Processor E5-2667 v3 CPU, 3.20 GHz). An error bar of one standard deviation is shown at each point.

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Figs. 1 and 2 and Tables 1–6.

Reporting Summary

Source data

Source Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

About this article

Cite this article

Zhou, C., Olukolu, B., Gemenet, D.C. et al. Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations. Nat Genet 52, 1256–1264 (2020). https://doi.org/10.1038/s41588-020-00717-7

Received: 10 March 2017
Accepted: 15 September 2020
Published: 30 October 2020
Issue Date: November 2020
DOI: https://doi.org/10.1038/s41588-020-00717-7
