Abstract
Most fresh bananas belong to the Cavendish and Gros Michel subgroups. Here, we report chromosome-scale genome assemblies of Cavendish (1.48 Gb) and Gros Michel (1.33 Gb), defining three subgenomes, Ban, Dh and Ze, with Musa acuminata ssp. banksii, malaccensis and zebrina as their major ancestral contributors, respectively. The insertion of repeat sequences in the Fusarium oxysporum f. sp. cubense (Foc) tropical race 4 RGA2 (resistance gene analog 2) promoter was identified in most diploid and triploid bananas. We found that the receptor-like protein (RLP) locus, including Foc race 1-resistant genes, is absent in the Gros Michel Ze subgenome. We identified two NAP (NAC-like, activated by apetala3/pistillata) transcription factor homologs specifically and highly expressed in fruit that directly bind to the promoters of many fruit ripening genes and may be key regulators of fruit ripening. Our genome data should facilitate the breeding and super-domestication of bananas.
Data availability
Genome assemblies of Cavendish, Gros Michel and Zebrina v2.0 have been deposited into NCBI under GenBank numbers JAVVNX000000000, JAVVNW000000000 and JAVVNV000000000 and in the National Genomics Data Center BioProject database (https://ngdc.cncb.ac.cn/bioproject/) under the accession number PRJCA019650. Genome assemblies with annotations and results of ChIP–seq and DNase-seq can be accessed at FigShare (https://figshare.com/projects/Origin_and_evolution_of_the_triploid_cultivated_banana_genome/178041). Raw data used for the assemblies, including PacBio, Illumina and Hi-C data, are available through the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) under the BioProject PRJNA1017453 with SRA accessions from SRR23425440 to SRR23425472 and from SRR23885547 to SRR23885549. Fifty-eight RNA-seq datasets were downloaded from NCBI BioProject accessions PRJNA381300, PRJNA394594 and PRJNA598018. DNA methylation data were downloaded from NCBI BioProject PRJNA381300.
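For readers who want to retrieve the raw reads, the two SRA accession ranges quoted above can be expanded into an explicit list. The following is a minimal sketch, not part of the published pipeline: it assumes both ranges are contiguous and inclusive (the text gives only their endpoints), and the helper name expand_accessions is hypothetical.

```python
def expand_accessions(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range, e.g. SRR23425440..SRR23425472.

    Assumption: every integer between the two endpoints is a valid run
    accession; the source states only the endpoints of each range.
    """
    prefix = first.rstrip("0123456789")
    if last.rstrip("0123456789") != prefix:
        raise ValueError("accessions must share the same prefix")
    width = len(first) - len(prefix)
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Range endpoints as quoted in the Data availability statement.
RANGES = [
    ("SRR23425440", "SRR23425472"),
    ("SRR23885547", "SRR23885549"),
]

accessions = [acc for first, last in RANGES
              for acc in expand_accessions(first, last)]

if __name__ == "__main__":
    print(f"{len(accessions)} run accessions")
    for acc in accessions:
        print(acc)
```

The resulting list can be written to a file and passed to whichever SRA download tool is preferred; the runs actually registered under BioProject PRJNA1017453 should be checked against it before downloading.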
Code availability
Custom code and scripts for mapping the origins of chromosomal segments are available at FigShare (https://doi.org/10.6084/m9.figshare.21229205.v1; ref. 70). All public software used in this study is provided in the accompanying Nature Portfolio Reporting Summary.
References
Rouard, M. et al. Three new genome assemblies support a rapid radiation in Musa acuminata (wild banana). Genome Biol. Evol. 10, 3129–3140 (2018).
Langhe, E. D., Vrydaghs, L., Maret, P. D., Perrier, X. & Denham, T. Why bananas matter: an introduction to the history of banana domestication. Ethnobot. Res. Appl. 7, 322–326 (2008).
D'Hont, A. et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217 (2012).
Wang, Z. et al. Musa balbisiana genome reveals subgenome evolution and functional divergence. Nat. Plants 5, 810–821 (2019).
Davey, M. W. et al. A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genomics 14, 683 (2013).
de Jesus, O. N. et al. Genetic diversity and population structure of Musa accessions in ex situ conservation. BMC Plant Biol. 13, 41 (2013).
Martin, G. et al. Genome ancestry mosaics reveal multiple and cryptic contributors to cultivated banana. Plant J. 102, 1008–1025 (2020).
Kallow, S. et al. Maximizing genetic representation in seed collections from populations of self and cross-pollinated banana wild relatives. BMC Plant Biol. 21, 415 (2021).
Martin, G. et al. Chromosome reciprocal translocations have accompanied subspecies evolution in bananas. Plant J. 104, 1698–1711 (2020).
Baurens, F. C. et al. Recombination and large structural variations shape interspecific edible bananas genomes. Mol. Biol. Evol. 36, 97–111 (2019).
Belser, C. et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun. Biol. 4, 1047 (2021).
Belser, C. et al. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nat. Plants 4, 879–887 (2018).
Cenci, A. et al. Unravelling the complex story of intergenomic recombination in ABB allotriploid bananas. Ann. Bot. 127, 7–20 (2021).
Martin, G. et al. Interspecific introgression patterns reveal the origins of worldwide cultivated bananas in New Guinea. Plant J. 113, 802–818 (2023).
Lescot, T. Genetic diversity of banana in figures. FruiTrop 189, 58–62 (2008).
Stokstad, E. Banana fungus puts Latin America on alert. Science 365, 207–208 (2019).
Maxmen, A. CRISPR might be the banana’s only hope against a deadly fungus. Nature 574, 15 (2019).
Busche, M. et al. Genome sequencing of Musa acuminata dwarf Cavendish reveals a duplication of a large segment of chromosome 2. G3 10, 37–42 (2020).
Carreel, F. et al. Ascertaining maternal and paternal lineage within Musa by chloroplast and mitochondrial DNA RFLP analyses. Genome 45, 679–692 (2002).
Christelová, P. et al. Molecular and cytological characterization of the global Musa germplasm collection provides insights into the treasure of banana diversity. Biodivers. Conserv. 26, 801–824 (2017).
Wang, X., Yu, R. & Li, J. Using genetic engineering techniques to develop banana cultivars with Fusarium wilt resistance and ideal plant architecture. Front. Plant Sci. 11, 617528 (2020).
Stokstad, E. GM banana shows promise against deadly fungus strain. Science 358, 979 (2017).
Dale, J. et al. Transgenic Cavendish bananas with resistance to Fusarium wilt tropical race 4. Nat. Commun. 8, 1496 (2017).
Tripathi, L., Ntui, V. O. & Tripathi, J. N. CRISPR/Cas9-based genome editing of banana for disease resistance. Curr. Opin. Plant Biol. 56, 118–126 (2020).
Ahmad, F. et al. Genetic mapping of Fusarium wilt resistance in a wild banana Musa acuminata ssp. malaccensis accession. Theor. Appl. Genet. 133, 3409–3418 (2020).
Lü, P. et al. Genome encode analyses reveal the basis of convergent evolution of fleshy fruit ripening. Nat. Plants 4, 784–791 (2018).
Thomas, B. C., Pedersen, B. & Freeling, M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 16, 934–946 (2006).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).
Schneeberger, K. et al. Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc. Natl Acad. Sci. USA 108, 10249–10254 (2011).
Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom. Bioinform. 2, lqaa026 (2020).
Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44, D1141–D1147 (2016).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, 10.1–10.14 (2009).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).
Stubbs, T. M. et al. Multi-tissue DNA methylation age predictor in mouse. Genome Biol. 18, 68 (2017).
Broad Institute. Picard toolkit. GitHub https://broadinstitute.github.io/picard (2019).
Zhang, Y. et al. Model-based analysis of ChIP–Seq (MACS). Genome Biol. 9, R137 (2008).
Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Ramírez-González, R. H. et al. The transcriptional landscape of polyploid wheat. Science 361, eaar6089 (2018).
Li, P. et al. RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17, 852 (2016).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
He, Z. et al. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res. 44, W236–W241 (2016).
Li, X. et al. Custom code and scripts for mapping the origins of chromosomal segments. FigShare https://doi.org/10.6084/m9.figshare.21229205.v1 (2023).
Acknowledgements
We thank G. Riddihough (Life Science Editors) for text editing. X.L. acknowledges funding from the National Natural Science Foundation of China (32370687). P.L. acknowledges funding from the National Natural Science Foundation of China (32372666) and Construction of Plateau Discipline of Fujian Province (102/71201801104). L.Z. acknowledges funding from the National Natural Science Foundation of China (32272750). Y.V.d.P. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (no. 833522) and from Ghent University (Methusalem funding, BOF.MET.2021.0005.01).
Author information
Contributions
L.Z. conceived and designed the project. P.L., Z.C., Y.Y., W.Z., S.X., Y.X., J.W. and H.L. collected the samples and extracted DNA and RNA. L.Z., P.L., J.W. and S.Y. coordinated the Illumina and PacBio sequencing. X.Z., M.J. and X. Chang assembled the genomes and performed the Hi-C data analyses. X.Z., C.Z. and X. Wang conducted protein-coding gene and repetitive sequence annotations. L.Z. and X.L. performed phylogenetic analyses. X.L., X. Chen and L.Z. performed comparative genomic analysis. X.L., X.Z., Q.W. and X. Wen performed the RNA-seq analysis. P.L. and S.Y. performed ChIP–seq experiments, DNase-seq experiments and bioinformatic analysis of ChIP–seq, DNase-seq and WGBS data. X.L., P.L., S.Y. and X.Z. wrote the manuscript draft. L.Z., P.L., S.Y., X.L., X.Z., Y.V.d.P., Z.L., Z.W., J.H. and J.-M.A. reviewed and revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Jordi Garcia-Mas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data
Extended Data Fig. 1 Genome assemblies of Cavendish and Gros Michel.
a, BUSCO completeness assessments of the genome assemblies of Cavendish, Gros Michel, and four diploid wild banana species (Banksii, DH-Pahang, Zebrina, and Calcutta 4). Cavendish* was assembled by Busche et al. (ref. 18). Zebrina v1.0 was assembled by Rouard et al. (ref. 1), and Zebrina v2.0 is our assembly based on nanopore long reads. The abbreviations of banana species refer to Fig. 1a. b, Macrosyntenic comparison of the entire Cavendish, Gros Michel and three diploid wild banana genomes (Banksii, DH-Pahang, and Zebrina), with each chromosome colored according to sub-genomes (Ban in blue, Dh in orange, and Ze in green).
Extended Data Fig. 2 Macrosyntenic comparison of the entire Cavendish and three diploid wild banana genomes: Banksii (a), DH-Pahang (b), and Zebrina (c).
Each chromosome set is colored according to sub-genomes (Ban in blue, Dh in orange, and Ze in green). The abbreviations of banana species refer to Fig. 1a.
Extended Data Fig. 3 Macrosyntenic comparison of the entire Gros Michel and three diploid wild banana genomes: Banksii (a), DH-Pahang (b), and Zebrina (c).
Each chromosome set is colored according to sub-genomes (Ban in blue, Dh in orange, and Ze in green). The abbreviations of banana species refer to Fig. 1a.
Extended Data Fig. 4 Examples of high-quality Cavendish and Zebrina genome assemblies.
a–d, NBS-LRR cluster, RLK cluster, RLP cluster, and RLP/LRR cluster on Ze03, Ze01, Dh10, and Ze10 of Cavendish, respectively, none of which was assembled in the previously published Cavendish assembly. Cavendish* was assembled by Busche et al. (ref. 18). e,f, NBS-LRR cluster on chromosome 3 (e) and RLP/LRR cluster on chromosome 10 (f) of our Zebrina v2.0 assembly, with lengths of 280 kb and 370 kb, respectively; both regions are large gaps in the published Zebrina v1.0 (ref. 1). Each resistance gene is colored on the micro-synteny plot (NBS-LRR in blue, RLK in pink, RLP in red, LRR in green, and other genes in yellow). The abbreviations of banana species refer to Fig. 1a.
Extended Data Fig. 5 Phylogenetic tree of banana RLPs in the Foc race 1-associated QTL (referred to as the RLP locus; ref. 25).
The purple stars denote RLPs located in the Ze sub-genome, while the two red stars denote RLPs found only in the Ze sub-genome of Cavendish. The abbreviations of banana species refer to Fig. 1a.
Extended Data Fig. 6 A model of MaNAP4/5 regulation of banana fruit ripening.
In the model, the genes directly regulated by MaNAP4/5 are key genes in the fruit ripening process.
Extended Data Fig. 7 Sub-genome dominance in the triploid banana genome.
a, Statistical comparison of categories of syntenic triad homoeolog expression bias. P-values were determined by one-way ANOVA with Tukey’s HSD test (n = 26 tissues of each category) within the suppression and dominance categories, and P-values less than 0.05 are highlighted in red. For box plots in this study, the middle line represents the median, the lower and upper edges of the box represent the first and third quartiles, the end of the lower whisker represents the smallest value at most 1.5× inter-quartile range from the lower edge of the box, and the end of the upper whisker represents the largest value at most 1.5× inter-quartile range from the upper edge of the box. b,c, Total number (b) and length (c) of DNase-hypersensitive sites (DHSs) detected in mature green and ripe fruits. d–f, Sub-genome distribution of MaNAP4/5 binding motifs (d), sites (e) and genes (f). g, Distribution of NBS-LRR resistance genes in the sub-genomes.
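The box plot convention described in this legend can be made concrete with a short calculation. The sketch below is illustrative only: the quartile method (linear interpolation between order statistics) is an assumption, since the legend does not say which quantile definition was used, and the function names and example values are hypothetical rather than taken from the study.

```python
def quantile(sorted_vals: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics.

    Assumption: this is one common convention; plotting libraries differ.
    """
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_stats(values: list[float]) -> dict:
    """Return the median, box edges, whisker ends, and points beyond them."""
    if not values:
        raise ValueError("values must not be empty")
    vals = sorted(values)
    q1, median, q3 = (quantile(vals, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Smallest observation no more than 1.5 x IQR below the box.
        "lower_whisker": min(v for v in vals if v >= low_fence),
        # Largest observation no more than 1.5 x IQR above the box.
        "upper_whisker": max(v for v in vals if v <= high_fence),
        "outliers": [v for v in vals if v < low_fence or v > high_fence],
    }


# Hypothetical example values, not data from the paper.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value}")
```

In this example the value 30 lies more than 1.5× the inter-quartile range above the third quartile, so the upper whisker ends at 9 and 30 would be drawn as a separate point.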
Supplementary information
Supplementary Information
Supplementary Notes 1 and 2 and Figs. 1–12.
Supplementary Tables
Supplementary Tables 1–16.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Yu, S., Cheng, Z. et al. Origin and evolution of the triploid cultivated banana genome. Nat Genet 56, 136–142 (2024). https://doi.org/10.1038/s41588-023-01589-3
DOI: https://doi.org/10.1038/s41588-023-01589-3