Abstract

Plant genomes are often highly repetitive and polyploid, which makes assembling them challenging. The introduction of short-read technologies 10 years ago substantially increased the number of available plant genomes, but these assemblies are generally incomplete and fragmented, and only a few reach chromosome scale. More recently, Pacific Biosciences and Oxford Nanopore have commercialized sequencing technologies that can read long DNA fragments (from kilobases to megabases). Combined with efficient algorithms, these technologies yield high-quality assemblies in terms of contiguity and completeness of repetitive regions1,2,3,4. However, even though long-read assemblies exhibit high contig N50s (>1 Mb), these methods are still insufficient to resolve genome organization at the chromosome level. Here, we describe a strategy based on long reads (MinION or PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies. We demonstrate its applicability by generating high-quality genome sequences for two new dicotyledon morphotypes, Brassica rapa Z1 (yellow sarson) and Brassica oleracea HDEM (broccoli), and one new monocotyledon, Musa schizocarpa (banana). All three assemblies have contig N50s of >5 Mb and contain scaffolds that represent entire chromosomes or chromosome arms.
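Contig N50, the contiguity metric quoted in the abstract, is the length L such that contigs of length L or longer together cover at least half of the total assembly size. The minimal Python sketch below illustrates the calculation. It assumes the assembly is given as a plain list of contig lengths in base pairs and uses the common "at least half" convention. The function name `n50` and the contig lengths in the usage example are hypothetical and are not taken from the assemblies described here.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly size.
    """
    # Sort contigs from longest to shortest.
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    if total == 0:
        raise ValueError("N50 is undefined for an empty assembly")

    running = 0
    for length in sorted_lengths:
        running += length
        # The first contig that brings the cumulative length to at least
        # half of the total assembly size defines the N50.
        if running * 2 >= total:
            return length

    # Not reached for non-negative lengths with a positive total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs (not data from the article).
    contigs = [8_000_000, 6_000_000, 3_000_000, 2_000_000, 1_000_000]
    print(n50(contigs))  # prints 6000000
```

In this example the total size is 20 Mb. The 8 Mb contig alone covers less than half of that total, but adding the 6 Mb contig brings the cumulative length to 14 Mb, so the N50 is 6 Mb. Assemblers and QC tools can differ slightly in edge-case conventions, for example strictly more than half versus at least half.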


Data availability

The genome assemblies, gene predictions and genome browsers are freely available at http://www.genoscope.cns.fr/plants. The Illumina, MinION and PromethION data, the assemblies and the annotations are available in the European Nucleotide Archive under the following projects: PRJEB26620 (B. rapa), PRJEB26621 (B. oleracea) and PRJEB26661 (M. schizocarpa). Germplasm for these genomes will be made freely and publicly available to the entire community. M. schizocarpa germplasm is available at Bioversity International Transit Center under ITC number ITC0926. B. rapa ssp. trilocularis (genotype Z1) is available at the Plant Genetic Resources of Canada and B. oleracea ssp. italica (genotype HDEM) is available at the Biological Resource Center BrACySol, Rennes, France. All supporting data are included in the Supplementary Information.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).

  2. Jiao, W. B. & Schneeberger, K. The impact of third generation genomic technologies on plant genome assembly. Curr. Opin. Plant Biol. 36, 64–70 (2017).

  3. Michael, T. P. et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9, 541 (2018).

  4. Schmidt, M. H. et al. De novo assembly of a new Solanum pennellii accession using nanopore sequencing. Plant Cell 29, 2336–2348 (2017).

  5. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).

  6. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).

  7. Du, H. et al. Sequencing and de novo assembly of a near complete indica rice genome. Nat. Commun. 8, 15324 (2017).

  8. Edger, P. P. et al. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. Gigascience 7, 1–7 (2018).

  9. Dassanayake, M. et al. The genome of the extremophile crucifer Thellungiella parvula. Nat. Genet. 43, 913–918 (2011).

  10. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).

  11. Raymond, O. et al. The Rosa genome provides new insights into the domestication of modern roses. Nat. Genet. 50, 772–777 (2018).

  12. Cheng, F. et al. Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nat. Genet. 48, 1218–1224 (2016).

  13. Cai, C. C. et al. Brassica rapa genome 2.0: a reference upgrade through sequence re-assembly and gene re-annotation. Mol. Plant 10, 649–651 (2017).

  14. Wang, X. W. et al. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035–1039 (2011).

  15. Parkin, I. A. et al. Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biol. 15, R77 (2014).

  16. D’Hont, A. et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217 (2012).

  17. Martin, G. et al. Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 17, 243 (2016).

  18. Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 30, 771–776 (2012).

  19. Sakai, H. et al. The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome. Sci. Rep. 5, 16780 (2015).

  20. Wang, X. et al. Genomic analyses of primitive, wild and cultivated citrus provide insights into asexual reproduction. Nat. Genet. 49, 765–772 (2017).

  21. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol. 33, 623–630 (2015).

  22. Golicz, A. A. et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 13390 (2016).

  23. Schranz, M. E. et al. Characterization and effects of the replicated flowering time gene FLC in Brassica rapa. Genetics 162, 1457–1468 (2002).

  24. Goubet, P. M. et al. Contrasted patterns of molecular evolution in dominant and recessive self-incompatibility haplotypes in Arabidopsis. PLoS Genet. 8, e1002495 (2012).

  25. Shiba, H. et al. Genomic organization of the S-locus region of Brassica. Biosci. Biotechnol. Biochem. 67, 622–626 (2003).

  26. Bachmann, J. A., Tedder, A., Laenen, B., Steige, K. A. & Slotte, T. Targeted long-read sequencing of a locus under long-term balancing selection in Capsella. G3 (Bethesda) 8, 1327–1333 (2018).

  27. Kim, D., Jung, J., Choi, Y. O. & Kim, S. Development of a system for S locus haplotyping based on the polymorphic SLL2 gene tightly linked to the locus determining self-incompatibility in radish (Raphanus sativus L.). Euphytica 209, 525–535 (2016).

  28. Yang, J. H. et al. The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat. Genet. 48, 1225–1232 (2016).

  29. Jarvis, D. E. et al. The genome of Chenopodium quinoa. Nature 542, 307–312 (2017).

  30. Jiao, W. B. et al. Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data. Genome Res. 27, 778–786 (2017).

  31. Reyes-Chin-Wo, S. et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 8, 14953 (2017).

  32. Teh, B. T. et al. The draft genome of tropical fruit durian (Durio zibethinus). Nat. Genet. 49, 1633–1641 (2017).

  33. Gawel, N. J. & Jarret, R. L. A modified CTAB DNA extraction procedure for Musa and Ipomoea. Plant Mol. Biol. Rep. 9, 262–266 (1991).

  34. Risterucci, A. M. et al. A high-density linkage map of Theobroma cacao L. Theor. Appl. Genet. 101, 948–955 (2000).

  35. Engelen, S. & Aury, J. M. Fastxtend tool (Genoscope/CEA, 2015); http://www.genoscope.cns.fr/fastxtend/

  36. Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).

  37. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).

  38. Vaser, R. et al. Ra assembler. v. git commit 65bedfe (Faculty of Electrical Engineering and Computing, University of Zagreb, 2017); https://github.com/rvaser/ra

  39. Ruan, J. et al. SMARTdenovo assembler. v. git commit 3d9c22e (Agricultural Genomics Institute, China, 2015); https://github.com/ruanjue/smartdenovo

  40. Wick, R. et al. Filtlong tool. v. git commit 8d81024 (University of Melbourne, 2017); https://github.com/rrwick/Filtlong

  41. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).

  42. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).

  43. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).

  44. de Givry, S., Bouchez, M., Chabrier, P., Milan, D. & Schiex, T. CARHTA GENE: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics 21, 1703–1704 (2005).

  45. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  46. RepeatMasker Open-4.0 (Institute for Systems Biology, 2013); http://www.repeatmasker.org

  47. Chalhoub, B. et al. Plant genetics. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 345, 950–953 (2014).

  48. Morgulis, A., Gertz, E. M., Schaffer, A. A. & Agarwala, R. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J. Comput. Biol. 13, 1028–1040 (2006).

  49. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  50. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).

  51. Dubarry, M. et al. Gmove a tool for eukaryotic gene predictions using various evidences (poster). F1000Res. 5, 681 (2016).

  52. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).

  53. Marcais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).

  54. Nattestad, M. Dot (DNAnexus, 2017); http://github.com/dnanexus/dot

  55. Dereeper, A. et al. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 36, W465–W469 (2008).


Acknowledgements

This work was supported by the Genoscope, the Commissariat à l’Energie Atomique et aux Energies Alternatives (CEA) and France Génomique (ANR-10-INBS-09-08). We are grateful to Oxford Nanopore Technologies (ONT) for early access to the MinION device through the MinION Access Programme, and we thank their staff for technical help. Work by X.V. and M.G. is financially supported by Région Hauts-de-France, the Ministère de l’Enseignement Supérieur et de la Recherche (CPER Climibio) and the European Fund for Regional Economic Development.

Author information

Author notes

  1. These authors contributed equally: Caroline Belser, Benjamin Istace, Erwan Denis, Marion Dubarry.

Affiliations

  1. Genoscope, Institut de biologie François-Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, Evry, France

    • Caroline Belser
    • Benjamin Istace
    • Erwan Denis
    • Marion Dubarry
    • Wahiba Berrabah
    • Stefan Engelen
    • Arnaud Lemainque
    • Benjamin Noel
    • Valérie Barbe
    • Corinne Cruaud
    • Jean-Marc Aury
  2. CIRAD, UMR AGAP, Montpellier, France

    • Franc-Christophe Baurens
    • Guillaume Martin
    • Angélique D’Hont
  3. AGAP, Université Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France

    • Franc-Christophe Baurens
    • Guillaume Martin
    • Angélique D’Hont
  4. IGEPP, INRA, Agrocampus Ouest, Université Rennes 1, BP35327, Le Rheu, France

    • Cyril Falentin
    • Anne-Marie Chèvre
    • Régine Delourme
    • Gwenaëlle Deniot
    • Philippe Duffé
    • Maria Manzanares-Dauleux
    • Jérôme Morice
    • Mathieu Rousseau-Gueutin
  5. Université Lille, CNRS, UMR 8198—Evo-Eco-Paleo, Lille, France

    • Mathieu Genete
    • Xavier Vekemans
  6. Génomique Métabolique, Genoscope, Institut de biologie François Jacob, CEA, CNRS, Université d’Evry, Université Paris-Saclay, Evry, France

    • France Denoeud
    • Patrick Wincker

Authors

  1. Caroline Belser
  2. Benjamin Istace
  3. Erwan Denis
  4. Marion Dubarry
  5. Franc-Christophe Baurens
  6. Cyril Falentin
  7. Mathieu Genete
  8. Wahiba Berrabah
  9. Anne-Marie Chèvre
  10. Régine Delourme
  11. Gwenaëlle Deniot
  12. France Denoeud
  13. Philippe Duffé
  14. Stefan Engelen
  15. Arnaud Lemainque
  16. Maria Manzanares-Dauleux
  17. Guillaume Martin
  18. Jérôme Morice
  19. Benjamin Noel
  20. Xavier Vekemans
  21. Angélique D’Hont
  22. Mathieu Rousseau-Gueutin
  23. Valérie Barbe
  24. Corinne Cruaud
  25. Patrick Wincker
  26. Jean-Marc Aury

Contributions

C.F., G.D., F.-C.B., E.D. and C.C. extracted the DNA. C.C. and A.L. optimized and performed the sequencing. E.D., W.B. and V.B. generated the optical maps. P.D., R.D. and M.M.-D. generated the genetic map for the B. oleracea HDEM accession. B.I., C.B. and J.-M.A. performed the genome assemblies. G.M. performed the anchoring of the M. schizocarpa scaffolds. C.F., J.M. and M.R.-G. performed the anchoring of the B. oleracea scaffolds. M.D. and J.-M.A. performed the anchoring of the B. rapa scaffolds. M.D. and B.N. performed the gene prediction for the genome assemblies. B.I., C.B., M.D., F.D., J.-M.A. and S.E. performed the bioinformatic analyses. X.V. and M.G. performed the S-locus annotation of the two Brassicaceae genomes. B.I., C.B., M.D. and J.-M.A. wrote the article. A.D., A.-M.C., P.W. and J.-M.A. supervised the study.

Competing interests

The authors declare no competing interests. B.I., S.E., C.C., P.W. and J.-M.A. are part of the MinION Access Programme and J.-M.A. received travel and accommodation expenses to speak at ONT conferences.

Corresponding author

Correspondence to Jean-Marc Aury.

Supplementary information

  1. Supplementary Information

    Supplementary Tables 1–21 and Supplementary Figures 1–19.

  2. Reporting Summary

  3. Supplementary File 2

    Detailed information about the 105 plant genome assemblies.

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/s41477-018-0289-4
