Musa balbisiana genome reveals subgenome evolution and functional divergence

Banana cultivars (Musa ssp.) are diploid, triploid and tetraploid hybrids derived from Musa acuminata and Musa balbisiana. We presented a high-quality draft genome assembly of M. balbisiana with 430 Mb (87%) assembled into 11 chromosomes. We identified that the recent divergence of M. acuminata (A-genome) and M. balbisiana (B-genome) occurred after lineage-specific whole-genome duplication, and that the B-genome may be more sensitive to the fractionation process compared to the A-genome. Homoeologous exchanges occurred frequently between A- and B-subgenomes in allopolyploids. Genomic variation within progenitors resulted in functional divergence of subgenomes. Global homoeologue expression dominance occurred between subgenomes of the allotriploid. Gene families related to ethylene biosynthesis and starch metabolism exhibited significant expansion at the pathway level and wide homoeologue expression dominance in the B-subgenome of the allotriploid. The independent origin of 1-aminocyclopropane-1-carboxylic acid oxidase (ACO) homoeologue gene pairs and tandem duplication-driven expansion of ACO genes in the B-subgenome contributed to rapid and major ethylene production post-harvest in allotriploid banana fruits. The findings of this study provide greater context for understanding fruit biology, and aid the development of tools for breeding optimal banana cultivars.

The B-genome is more sensitive to fractionation than the A-genome. Compared to other monocots, the Musa lineage exhibits a relatively slower evolutionary rate as demonstrated previously 17 (Supplementary Fig. 4). Phylogenetic analyses based on 519 single-copy orthologous genes indicated a very recent divergence time of about 5.4 MYA for the A-and B-genomes, with their common ancestor having diverged from Poaceae ~134 MYA ( Supplementary Fig. 5). This estimate is consistent with a divergence time of 4.6 MYA between the A-and B-genomes, which was estimated based on 17 bacterial artificial chromosome (BAC) clones that contained 23.5 Kb of coding sequence 18 . Our estimate is more recent than the 20.9 MYA (estimated by three genes and one internal transcribed spacer region) or the 27.9 MYA (estimated by 19F genes) 19,20 . However, the increased sampling of informative characters in our genome-wide study for phylogenetic analyses should contribute to a more accurate divergence time estimation.
Three rounds of whole-genome duplication (WGD) events (α-, βand γ-WGD) have occurred in the Musa lineage 1 , which was validated by our four-fold synonymous third-codon transversion (4dTv) analysis ( Supplementary Fig. 6). WGD is frequently, and almost always, followed by diploidization and fractionation, which includes chromosome rearrangement, gene loss and biased retention 21 . After diploidization and divergence, A-and B-genomes start to evolve independently. To investigate the evolutionary differences between the A-and B-genomes after divergence, we performed three types of comparative analysis: (1) assessment of gene family expansion/contraction between A-and B-genomes by comparison to 14 other plant genomes; (2) comparison of structural variation in the A-and B-genomes by synteny analysis; and (3) comparison of synteny between the ancestral syntenic block and the A/B-genomes.
We analysed the gene family clustering and expansion/contraction of banana genomes of M. acuminata (A-genome) and M. balbisiana (B-genome), compared to 14 other plant genomes using OrthoMCL and CAFE 22,23 (Supplementary Table 7 and Supplementary Fig. 7). We found 9,038 gene families that were conserved in M. balbisiana, M. acuminata, Oryza sativa, Brachypodium distachyon and Vitis vinifera. In contrast, 348 and 639 gene families were specific to the A-and B-genome, respectively ( Supplementary Fig. 8). After their divergence, 1,761 gene families were expanded and 203 were contracted in the A-genome, while 392 gene families were expanded and 1,008 contracted in the B-genome (Supplementary Fig. 9 and Supplementary Tables 8 and 9). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the significantly expanded gene families (P < 0.05) in the B-genome suggested that it was enriched in the photosynthesis and biosynthesis of secondary metabolite pathways, including those associated with the metabolism of inositol, starch and sucrose, linoleic acid and arachidonic acid (Supplementary Fig. 10 and Supplementary Table 10). Plants produce a high diversity of secondary metabolites with prominent functions in defence against a variety of herbivores and pathogens, in addition to the mitigation of various types of abiotic stresses 24 . Thus, these observations are consistent with the association of the B-genome with improved vigour and tolerance to both biotic and abiotic stresses 2 . Synteny analysis indicated a high level of genomic colinearity and sequence similarity between the A-and B-genomes ( Supplementary  Fig. 11). We identified 72 large syntenic blocks between the A-and B-genomes, including 15 large blocks each containing over 900 gene pairs. These 72 syntenic blocks comprised 75.02% of A-genome space (containing 23% transposable elements) and 68.01% of B-genome space (containing 22% transposable elements) (Supplementary Table 11). We also identified two large translocations and two inversions between the A-and B-genome after their divergence. One large reciprocal translocation comprises 7.09 Mb on chromosome (chr)1 of the B-genome and 7.03 Mb on chr:3 of the A-genome, and one large inversion of 9.39 Mb on chr 5 of the B-genome and 8. 83 Mb on chr 5 of the A-genome ( Fig. 1 and Supplementary Fig. 11). These translocations and inversions were also supported by the rearrangements based on genetic mapping 25 , and can serve to introduce novel genetic diversity into the A-and B-genomes 26 .
Previously, 12 Musa ancestral blocks were assembled that represented the ancestral genome before α/β-WGD 1,27,28 . We identified 97 syntenic blocks resulting from α/β-WGD events in the B-genome by comparison to the 12 Musa ancestral blocks. These blocks contained 24,639 genes and represented 70.10% of all gene models involved in WGD-driven regions. We also identified 100 syntenic blocks that contained 26,780 genes (75.61%) in the A-genome (Supplementary Table 12). Of these ancestral Musa α/β blocks, 56.89% (15,236) and 60.59% (14,930) were singletons in the Aand B-genome, respectively, suggesting that genome fractionation (gene loss) and diploidization processes had occurred extensively after WGD events in the Musa lineage (Supplementary Table 12).
Taken together, our results indicated that the B-genome exhibited less expansion and more contraction of gene families, less syntenic coverage ratio and a higher singleton ratio in the ancestral blocks compared to the A-genome after divergence. Cycles of WGD followed by diploidization and fractionation have occurred across land plants, and are important in determining chromosome structure and gene content. Consequently, these processes have significantly contributed to the evolutionary success of plants 21   and fractionation processes involve a series of evolutionary events, including repetitive DNA loss, chromosome rearrangements and complex patterns of gene loss 29,30 . The above evidence supports the hypothesis that the B-genome was more sensitive to fractionation than the A-genome after their divergence.
Genetic diversity in polyploid bananas and functional divergence of subgenomes. Polyploid species usually exhibit vigorous growth, including high-quality production and high fitness 31 . Most banana cultivars are polyploid and exhibit various levels of ploidy and genomic background 32 . The genetic classification of some bananas is discordant, as is the case for Pelipita. Previous studies have shown that its karyotype comprises eight A and 25 B chromosomes as opposed to the predicted 11 A and 22 B chromosome distribution 33 . Understanding the genetic diversity and genomic constitution of Musa accessions would inform genomic group classifications, in addition to conservation and breeding strategies. Therefore, we resequenced five triploid bananas and four diploid bananas to investigate their genetic diversity (Supplementary Table 13). Simultaneous alignment of re-sequencing data to the A-and B-subgenomes identified the uniquely mapped reads that were used to analyse coverage depth, variations calling and homoeologous exchanges on each chromosome (Supplementary Table 14). Homoeologous exchanges were characterized by read coverage that showed a chromosomal region with a duplicated copy from the corresponding homoeologous subgenome 34 . These analyses confirmed that genome constitutions for the banana accessions were, in most cases, consistent with previous genome group classifications based on morphological traits. We identified 48 segmental homoeologous exchanges in the accession FenJiao, including nine from the B-to the A-subgenome and 39 in the reverse direction ( Fig. 2a and Supplementary Table 15). We also found four segmental homoeologous exchanges from the B-to the A-subgenome in the accession Kamaramasenge, and replacement of chromosome 10 of the B-subgenome by the A-subgenome. (Fig. 2b and Supplementary  Table 15). For the accession Pelipita, chromosomes 2, 7 and 11 of the A-subgenome were replaced by the B-subgenome and there were 18 segmental homoeologous exchanges on chromosomes 6, 9 and 10 ( Fig. 2c and Supplementary Table 15). This indicates the eight A and 25 B chromosome constitution of Pelipita, consistent with previous genomic in situ hybridization studies 33 . This classification is further supported by phylogenetic analyses based on genotyping data ( Supplementary Fig. 12). A total of 18,475,661 single-nucleotide polymorphisms (SNPs), 1,425,391 small insertions and deletions and 220,452 structural variations were identified in the samples (Supplementary Tables 16-18). Analysis of gene and SNP density on the chromosomes indicated that SNPs were preferentially located on non-gene-rich regions ( Supplementary Fig. 13). There were ~2.5fold SNPs on the A-genome of Pisang_Mas and Pisang_Kra compared to the B-genome of Balbisiana (Supplementary Table 16). The nucleotide diversity (π,0.0059) of A-subgenomes was higher than that of the B-subgenomes (0.0031) in accessions Fenjiao, Pelipita and Kamaramasenge.
Gene family expansion and contraction analysis of M. acuminata and M. balbisiana in comparison to other sequenced genomes indicated that there are 83 gene families significantly expanded in the A-genome (and conversely contracted in the B-genome). These families included plant-pathogen interactions, glycosphingolipid biosynthesis-ganglio series and glycosaminoglycan degradation pathways among others (Supplementary Table 19). Conversely, 33 gene families were significantly expanded in the B-genome (and contracted in the A-genome). These families included those involved in photosynthesis, metabolic pathways and ribosome, among others (Supplementary Table 20). This indicates that the Aand B-genomes may have functionally diverged at the genome level during their respective genome evolution.
To explore the transcription of allopolyploid subgenomes, we assessed the expression of homoeologue genes from the A-and B-subgenomes of the triploid FenJiao. Expression levels were measured within different tissues, at different stages of fruit development and ripening and in banana seedlings after abiotic stress treatments (Supplementary  gene pairs were identified between the A-and B-subgenomes based on the gene alignment method with cumulative identity percentage (CIP) ≥ 60% and cumulative alignment length percentage (CALP) ≥ 60% (refs. 35,36 ). Among the homoeologue gene pairs, 81.83% were further supported by syntenic analysis (Supplementary Table 22). Expression of all homoeologue gene pairs was assessed to determine the distribution of expression fold change of B/A in the triploid 'FJ' . The log 2 (RPKM B/RPKM A) was 1.2/1, where RPKM is reads per kilobase million, which differed from the genomic constitution value of 2/1 ( Supplementary Fig. 14). This result could be explained by dosage compensation, wherein the triploid expression levels are reduced to a diploid state 37,38 . We further characterized 1,075 and 4,032 homoeologue gene pairs that showed expression dominance in the A-and B-subgenome, respectively (Supplementary Table 23). KEGG enrichment analysis indicated that genes with expression dominance in the B-subgenome were associated with 2-oxocarboxylic acid metabolism and the arginine biosynthesis pathway (q-value < 0.05) (Supplementary Table 24), whereas those showing expression dominance in the A-subgenome were not significantly enriched in KEGG pathways. Non-synonymous/synonymous substitution (Ka/Ks) ratios were calculated for all homoeologue gene pairs between the A-and B-subgenomes. The Ka/Ks ratios of genes with expression dominance in the A-subgenome (median, 0.157) were slightly lower than those in the B-subgenome (median, 0.196) and non-dominant genes (median, 0.186) ( Supplementary Fig. 15).
We then constructed a gene co-expression network for those genes with expression dominance using weighted gene co-expression network analysis (WGCNA) 39 . The results indicated that 87 and 295 genes with dominance expression interacted with 4,302 and 4,612 genes in the A-and B-subgenome, respectively. KEGG pathway enrichment analysis suggested that genes in the co-expression network of the A-and B-subgenomes were commonly associated with starch and sucrose metabolism (ko00500) and other metabolic pathways. In particular, ubiquinone and other terpenoid-quinone biosynthesis, photosynthesis-antenna proteins, carotenoid biosynthesis and other glycan degradation pathways were specially enriched in the A-subgenome (q-value < 0.05) (Supplementary Fig. 16 and Supplementary Table 25). In contrast, selenocompound metabolism and cyanoamino acid metabolism pathways were particularly enriched in the B-subgenome (q-value < 0.05) (Supplementary Fig. 17 and Supplementary Table 26). Overall, these results further support the hypothesis of functional divergence between the A-and B-genomes at the transcriptional level.

Expression dominance of homoeologue gene pairs in the ethylene biosynthesis pathway and expansion of the ACO family affect fruit ripening.
Ethylene plays a key role in the regulation of climacteric fruit ripening post-harvest 40 . The core enzymatic steps in the ethylene biosynthesis pathways are well characterized, and include S-adenosyl-l-methionine synthase (SAMS), 1-aminocyclopropane-1-carboxylic acid synthase (ACS) and ACO 41,42 (Fig. 3a). However, the expansion and expression dominance of these homoeologue gene pairs during polyploid fruit ripening remains largely unknown.
We identified 12 SAMS, 11 ACS and 11 ACO genes from the A-genome and 10 SAMS, 11 ACS and 18 ACO genes in the B-genome, which represents a significant expansion compared to the seven other sequenced plant species among the monocots and eudicots 22 (Supplementary Tables 27 and 28). We further characterized 28 homoeologue gene pairs from the A-and B-genomes (Supplementary Table 29). These gene pairs displayed similar expression profiles in the BaXiJiao (Musa AAA group, cv. Cavendish, BX), the A-subgenome of FenJiao (Musa ABB group, cv Pisang Awak, FJ) and the B-subgenome of FJ (Fig. 3b and  Supplementary Tables 30-32). Interestingly, eight gene pairs exhibited homoeologue expression dominance in the B-subgenome and five gene pairs were dominantly expressed in the A-subgenome of FJ (Fig. 3b and Supplementary Tables 33 and 34). The dominant expression of these genes in various tissues may be related to the fundamental role of ethylene biosynthesis.
Both ACS and ACO have previously been demonstrated as limiting enzymes within ethylene biosynthesis 41 . The expression abundance of MA-ACS1 (AB021906) and MA-ACO1 (X91076) is consistent with ethylene production during the post-harvest bananaripening period 43 . Of the ten ACS gene pairs, MaACS7/MbACS7, which is a homologue of MA-ACS1, exhibited high expression levels during fruit ripening and was dominantly expressed in the B-genome (Fig. 3b). MbACS6 and MbACS7 are paralogous in a large syntenic block (the block contains 19 gene pairs), and maintain synteny and close evolutionary relationships to MaACS6 and MaACS7, respectively, suggesting that these genes duplicated from WGD ( Fig. 3c and Supplementary Fig. 18). Of the nine ACO gene pairs, three (MaACO2/MbACO6, MaACO3/MbACO7 and MaACO8/MbACO13) exhibited high expression levels during fruit ripening and were dominantly expressed in the B-subgenome (Fig. 3b). MaACO2/MbACO6 and MaACO3/MbACO7 are in the syntenic block of chr:5 and chr:6 between the A-and B-genome, respectively, and belong to the same phylogenetic clade (Fig. 3d,e and Supplementary Table 35). These results indicated that MaACO2/MbACO6 and MaACO3/MbACO7 originated independently and developed crucial functions during fruit ripening.
We also observed high expression levels (log 2 RPKM > 4 in at least one stage of fruit ripening in the B-genome) of homoeologue gene pairs, probably related to fruit softening (pectin methylesterases, galactosidases, expansions and pectatelyase) 44 , cell wall modification (xyloglucan endotransglucosylase/hydrolases, fasciclin-like arabinogalactan proteins and β-d-xylosidase) 45,46 and aroma production (alcohol dehydrogenases) 40 . These genes are closely involved in fruit ripening and are regulated by ethylene 44 . Almost all of these gene pairs showed expression dominance in the B-subgenome of FJ during fruit ripening, which is the same as the dominance expression of gene pairs related to ethylene biosynthesis (Supplementary Fig. 19 and Supplementary Tables 36 and 37). This co-dominance of homoeologue gene pairs in the B-subgenome further supports the significant contribution of the B-genome to ethylene biosynthesis and fruit ripening.
Gene duplication is a major mechanism that generates new genetic diversity as a basis for evolutionary innovation in eukaryotes 47 . Compared to the 11 ACO genes within the A-genome, the ACO genes in the B-genome expanded significantly to 18 members. The expansion of ACO genes, including MbACO2, -3, -4, -5 in chr 3, MbACO8, -9, -11 in chr 6 and MbACO16, -18 in scaffolds, was driven by tandem duplications in the B-genome (Fig. 3b,d). Of the ACO genes that were expanded in the B-genome, MbACO2 and MbACO3 showed strong expression levels during the fruit-ripening stages with log 2 RPKM > 11 at 6 days post-harvest (DPH) in FJ, which is coincident with the ethylene climacteric period (Fig. 3b,f and Supplementary Table 38). In addition, MbACO8, -9, -11, -16, -18 exhibited high expression levels in roots and fruits at 0 days after flowering (DAF) (Fig. 3b and Supplementary Table 38). These genes belonged to the same cluster and their expression patterns were highly concordant with their duplication (Fig. 3d,e), suggesting that the expansion and evolution of ACO genes in the B-genome contributed to tissue development and fruit ripening.
Ripening of FJ was more rapid than BX during post-harvest ripening. BX required 8 and 14 DPH to reach the more-green-thanyellow and full-yellow stages, respectively, whereas FJ required 3 and 6 DPH to reach these stages, respectively ( Supplementary  Fig. 20). The dominant expression of ethylene biosynthesis and fruit ripening-related gene pairs and the expansion of the ACO family in the B-genome could have contributed to increased ethylene production and faster fruit ripening in FJ compared to BX.           Table 43). In contrast, most of the starch synthesis-related genes that were unique to the A-or B-genome exhibited low expression levels ( Supplementary Fig. 22).
Within the starch degradation pathway, genome annotation indicated that families AMY and BMY had an obvious expansion in the banana A-and B-genomes compared to seven other plant species (Fig. 4b and Supplementary Table 40). We characterized 21 homoeologue gene pairs within this pathway from the A-and B-genomes. Among these gene pairs, 11 had dominant expression and were associated with the B-subgenome during fruit ripening ( Fig. 4c Table 48) . MbAMY-1, -2, -3, -4, -5 and MbAMY-6, -7, -8 showed close proximity and an evolutionary relationship, suggesting that these genes duplicated from a tandem copy (Fig. 4d and Supplementary  Fig. 23). MbBMY-5, -8 and MbBMY-6, -12 are two paralogous pairs in syntenic blocks of the B-genome and showed close evolutionary relationships, suggesting that these genes duplicated from WGD ( Fig. 4e and Supplementary Fig. 24). Taken together, these results suggest that tandem-driven AMY duplication and WGD-driven BMY duplication in the B-genome contribute to starch degradation.
The dominant expression of genes involved in starch biosynthesis in the B-subgenome could have led to increased starch accumulation during fruit development in FJ relative to BX (Fig. 4f,g and  Supplementary Fig. 21). In addition, the dominant expression of genes related to starch degradation in the B-subgenome also probably led to elevated starch degradation in FJ (Fig. 4c,f,g). Therefore, the active starch metabolic pathway in the B-genome leads to marked starch accumulation and degradation during the fruit development and ripening processes, respectively.

Discussion
Most cultivated bananas are triploid, having evolved from two wild diploid species, M. acuminata and M. balbisiana 6 . Our previous M. acuminata genome sequencing efforts provided genomic resources to inform banana breeding 1 . However, there is also an urgent need to develop the genomic resources of M. balbisiana, which is a crucial contributor to cultivated varieties. Improved understanding of the B-genome structure, subgenome evolution, homoeologous exchange, genetic diversity in polyploidy bananas and gene expansion and expression patterns will help in the design and application of breeding strategies for novel banana cultivars with improved traits.
Cycles of WGD followed by diploidization events have occurred across land plants, and have significantly contributed to their evolutionary success 24,25 . Our results indicate that the B-genome may be more sensitive to fractionation than the A-genome after WGD, although the A-and B-subgenomes have diverged very recently. Variation in genomic structure between the A-and B-genomes consists of chromosome rearrangements and gene loss during diploidization, which has resulted in the functional divergence of subgenomes in polyploidy bananas. This divergence is supported by differential enrichment of expanded/contracted gene families between the A-and B-genomes and the expression dominance of homoeologue genes from A-and B-subgenomes in triploids. Although homoeologue expression dominance has been identified in certain polyploid species [57][58][59][60] , the relationship between homoeologue expression dominance and functional divergence (especially in regard to ethylene biosynthesis/starch metabolism) of subgenomes in triploids remains to be elucidated. Thus, these results provide an important basis for the improvement of agriculturally important traits by focusing selection on transcriptionally dominant genes. It is worth noting that homoeologous exchanges may obscure the signal of expression dominance in subgenomes of allopolyploids 61 . The extensive homoeologue exchanges in allopolyploid bananas may remove many progenitor genome conflicts that result in subgenome biases in gene content and expression. Thus, homoeologous exchanges may contribute to the diversity of homoeologue expression dominance and induce a series of rapid genetic and epigenetic modifications for agronomic traits 61 .
Previous studies have suggested an important role of ethylene production in fruit ripening and starch metabolism with regard to fruit quality post-harvest 27,43,55 . However, the genetic mechanisms underlying polyploid fruit ripening are less well known. Here, we analysed biological processes related to these two pathways in triploid bananas. We identified significant genomic expansions and dominant expression of homoeologue genes in the B-genome at the pathway level of gene families and, most notably, in the ACO family, known to play a critical role in ethylene production. Our analysis revealed the origin and evolution of crucial gene families in these pathways, particularly for the independent origin of the MaACO2/MbACO6 and MaACO3/MbACO7 gene pairs and their specific function in fruit ripening. Moreover, we identified that this tandem event has led to expansion of ACO genes in the B-genome and to strong expression of these genes during fruit ripening. Our analysis also indicated a potential link between the dominant expression of homoeologue genes and the expansion of gene families with fruit ripening and starch metabolism. Previous studies have demonstrated that the B-genome is associated with improved vigour and tolerance to both biotic and abiotic stresses. Consequently, the B-genome is a target for banana breeding programmes 2 . Here, we highlight that the B-genome is of great importance in ethylene biosynthesis and post-harvest banana ripening, which will further our understanding of ripening mechanisms in polyploid fruit.
The M. balbisiana genome assembly, along with our previously acquired M. acuminata genome data, may aid in the discovery of cultivar-specific sequences that are related to important cultivarspecific traits, including shelf life, quality and stress tolerance. Thus, these resources can be used to build molecular characterization strategies for various cultivars to assist in rapid quality control and the conservation of biodiversity. The data from this study also pave the way for whole-genome association studies, germplasm improvement and genetic modification of bananas to meet increasing commercial demands. These genomic resources and results also reinforce the use of the banana as an appropriate model to study subgenome evolution and fruit biology in triploid variants. Due to the sterility and seedlessness of banana cultivars, further efforts will be needed to leverage the key gene resources from precisely characterized germplasms to achieve effective breeding schemes.

Sample collection.
A double haploid of the wild diploid genotype PKW; 2n = 2x = 22 was provided by the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD) for genome sequencing. Fresh, unexpanded leaves were harvested and then frozen immediately with liquid nitrogen to preserve genomic DNA for isolation. High-molecular weight genomic DNA was extracted using a standard cetyltrimethyl ammonium bromide (CTAB) method 62 . DNA integrity was assessed by agarose gel electrophoresis (concentration of agarose gel, 1%; voltage, 150 V; electrophoresis time, 40 min). Finally, DNA was purified from the gel using a QIAquick Gel Extraction kit (QIAGEN).
Library construction and sequencing. One paired-end and eight mate-pair libraries were constructed for short-read sequencing on the Illumina HiSeq 2000 platform, which generated around 86.34 Gb (166× coverage) of high-quality data. For long-read DNA sequencing, 5.79 million SMRT long reads (58.99 Gb data, 113× coverage) were sequenced using the PacBio Sequel system with libraries of 20-kb insert size; sub-reads had a mean length of 10.2 kb and N50 length of 16.6 kb. One Hi-C library was prepared and sequenced on Illumina NovaSeq 6000 to generate 71.96 Gb (138× coverage) of high-quality data 63 (Supplementary Table 1). Additional details are available in the Supplementary note. Genome assembly. De novo assembly of DH-PKW was performed using wtdbg (v.1.2.8; https://github.com/ruanjue/wtdbg) based on PacBio data (only reads longer than 1 kb were used in assembly). The assembled genome was corrected for two rounds using the 'wtdbg-cns' programme in the wtdbg package. We then used the algorithm Arrow (https://github.com/PacificBiosciences/GenomicConsensus), which takes into account all of the underlying data and the raw quality values inherent in SMRT sequencing, to polish the assembly again for the final consensus accuracies. The final consensus contigs were scaffolded using the SSPACE-standard programme 64 (v.3.0) with meta-pair reads from libraries of insert size 2-20 kb. Based on Hi-C data, 430.02-Mb scaffolds were anchored to 11 pseudo-molecules using LACHESIS software 13 (Supplementary Fig. 2). Chromosomes were numbered according to the linkage group nomenclature adopted for M. acuminata. Additional details regarding genome assembly are provided in the Supplementary note.
Evaluation of assembly quality. BUSCO (v.3) was used to assess assembly completeness 11 . We mapped 29,610 M. acuminata-expressed sequence tags (ESTs) to the assembled genome using BLAT 65 (v.35) with default parameters. In total, 93.59% of the ESTs were aligned to the genome with identity >90%. Additionally, BWA 66 v.0.7.12 (aln -l 35) was used to map 59× Illumina reads to the assembly, and 96.11% of the reads were mapped to the assembled genome. Additional details are available in the Supplementary note.

Genome annotation.
Repetitive sequences within the M. balbisiana genome were identified by a combination of homology-based and de novo approaches (Supplementary Table 1). Gene structures were annotated iteratively using three main approaches (ab initio predictions, homologue proteins and transcriptome data) that were combined using MAKER 15 (v.2.31.8) (Supplementary Table 5). Gene functions were annotated according to the best match of the alignments using blastp 67 (E-value < 1 × 10 -5 ) against the Swiss-Prot 68 , TrEMBL 68 Table 6).

Gene family analysis.
A total of 500,142 genes from 16 plant species with available whole-genome sequences were used for gene family clustering analysis. BLAST 67 (v.2.2.26; -p blastp) was used to generate pairwise protein sequence alignments with E-values of < 1 × 10 -5 . Then, OrthoMCL 22 was used to cluster similar genes by setting the main inflation value at 1.5 and using default settings for the other parameters. These analyses resulted in 39,358 gene families comprising 393,700 genes from the 16 species (Supplementary Table 7 and Supplementary Fig. 7).
We identified 519 single-copy gene families shared among the 16 species, and constructed a phylogenetic tree using MrBayes (v.3.1.2) 75 software with the general time-reversible model ( Supplementary Fig. 4). Divergence times for the 16 species were also estimated based on fourfold degenerate sites of all singlecopy orthologous genes using the MCMCTree programme in the PAML package (v.4.4) 76 ( Supplementary Fig. 5).
CAFÉ (v.2.1) 23 was used to analyse the expansion and contraction of gene families. A random birth-and-death model was used to assess gene gain or loss in gene families across the specified phylogenetic tree (Supplementary Fig. 9). Families with P < 0.05 were considered as significant expansion or contraction, and pathway enrichment analysis of these families was conducted using the enrichment pipeline 77 . Additional details are available in the Supplementary note.
Whole-genome alignment analysis. MCSCAN 78 (parameters: -a -e 1e-5 -s 5) was used to detect collinearity within M. acuminata (A-genome) and M. balbisiana (B-genome) and among various species. Syntenic blocks containing at least ten gene pairs were obtained. All of the orthologous and paralogous gene pairs were extracted from the syntenic blocks for calculation of 4dTv 79 distances using the HKY substitution model 80 (Supplementary Fig. 8). Additional details are available in the Supplementary note.
Orthologous gene pair analysis. BLAST 67 (v.2.2.26, -p blastp) was used to align M. acuminata proteins to M. balbisiana proteins for identification of orthologous genes. The value 1 × 10 -5 was used as a cut-off to define the raw orthologues. We then filtered the BLAST results using two parameters (CIP ≥ 60% and CALP ≥ 60% (ref. 35 ). We identified 25,717 orthologous gene pairs (81.83% consistency with syntenic blocks) between M. acuminata and M. balbisiana using these two parameters (Supplementary Table 22 Table 13). BaXiJiao, Gros_Michel and FenJiao were obtained from the Tissue Culture Centre of the Chinese Academy of Tropical Agricultural Sciences (CATAS). Pelipita, Pisang_Mas, Pisang_Kra, M. balbisiana and Kamaramasenge were provided by the Bioversity International Musa Transit Centre. Genomic DNA was extracted from fresh leaves of seedlings at the five-leaf stage using the CTAB method 62 .
Analysis of homoeologous exchanges. Assessment of read coverage depth was used to detect homoeologous exchanges between the A-and B-subgenomes. We inferred these based on cases where reads coverage of a given region on one parental genome was significantly high while the orthologous one was low or had no coverage. High coverage indicates duplication, and low or no coverage indicates loss 34 . The uniquely mapped paired-end reads were used to calculate the coverage depth of each sample on the A-and B-genomes ( Supplementary Figs. 25-27).
According to coverage depth, we detected homoeologous exchanges in the triploids 'FenJiao (ABB)' , 'Pelipita (ABB)' and 'Kamaramasenge (AAB)' (Supplementary  Table 15). Additional details are available in the Supplementary note. Transcriptome analysis. Banana fruits at different stages of development (0, 20 and 80 DAF) and ripening (8 and 14 DPH for BX, 3 and 6 DPH for FJ) were obtained from the banana plantation at the Institute of Tropical Bioscience and Biotechnology (Chengmai, Hainan, 20° N, 110° E). Two-month-old BX and FJ banana seedlings were obtained from the Tissue Culture Centre of CATAS and cultured in Hoagland's solution. Seedlings at the five-leaf stage were treated with 200 mM mannitol for 7 days, 300 mM NaCl for 7 days and low-temperature conditions (4 °C) for 22 h. Fruit, root and leaf samples were frozen in liquid nitrogen and stored at −80 °C until RNA extraction for transcriptome analysis.
Total RNAs were isolated using a plant RNA extraction kit (TIANGEN). Total RNA (3 μg) from each sample was converted to complementary DNA using a RevertAid First-Strand cDNA Synthesis Kit (Fermentas). cDNA libraries were constructed using TruSeq RNA Library Preparation Kit v.2, and were subsequently sequenced on the Illumina HiSeq 2000 platform using the Illumina RNA-Seq protocol. Two biological replicates were used for each sample. A total of 159.14 Gb (Supplementary Table 21) of high-quality clean data were produced. Gene expression levels were calculated as RPKM 86 . Differentially expressed genes were identified by methods previously established with the read count of two replicates for each gene (fold change ≥2; false discovery rate ≤0.001) 87 . Additional details are available in the Supplementary note.
Weighted gene co-expression network analysis. Gene expression patterns for all identified genes were used to construct a co-expression network using WGCNA (v.1.47) 39 . Genes without expression detected in all tissues were removed before analyses. Soft thresholds were set based on the scale-free topology criterion employed in ref. 88 . An adjacency matrix was developed using squared Euclidean distance values, and the topological overlap matrix was calculated for unsigned network detection using the Pearson method. Co-expression coefficients >0.55 between the target genes were then selected. Finally, the network connections were visualized using cytoscape 89 .  Table 51) for homologue-based searches with the criteria similarity >80% and coverage >80%. We then confirmed the presence of the conserved domain within all protein sequences and removed members without a complete domain.

Identification of gene families involved in ethylene biosynthesis
Determination of total starch content. Banana pulp was immersed in 0.5% sodium bisulfite for 10 min to prevent browning, and then dried at 40 °C for 24 h. Pulp was then ground and centrifuged. The residue was suspended in 5 ml of 80% Ca(NO 3 ) 2 in a boiling water bath for 10 min then centrifuged for 4 min at 4,000 r.p.m. The supernatant was transferred to a 20-ml volumetric flask and the residue was extracted twice with 80% Ca(NO 3 ) 2 , which yielded a combined extract volume of 20 ml. All experiments were repeated three times. The total starch content was assessed following methods described by Yang et al. 90 .
Scanning electron microscopy. Samples were fixed in stubs using double-faced tape and coated with a 10-nm platinum layer in a Bal-tec MEDo020 coating system (Kettleshulme). The prepared samples were analysed using an FEI Quanta 600 FEG scanning electron microscope (FEI Co.). Observations were performed in secondary electron mode while operating at 15 kV.
Measurement of ethylene production during fruit post-harvest stage. Ethylene production was measured by enclosing fruit samples in an airtight container for 2 h at 25 °C. After incubation, 1 ml of the headspace gas was withdrawn and injected into a gas chromatograph (GC-2010, Shimadzu) fitted with a flame ionization detector and an activated alumina column. Ethylene production measurements were obtained as recommended by the manufacturer.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability Raw sequence reads for gene assembly and gene expression for all samples were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under the BioProject (PRJNA432894). Genome assembly and annotation of DH-PKW was submitted to NCBI (PYDT00000000). A full data availability statement is included in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
One double haploid (DH-PKW) sample from M. balbisiana was used for genome assembly. Nine different genotypes (AAA, ABB, AA, BB, AAB) of banana accessions were used for resequencing. Two cultivated varieties of BaXiJiao and FenJiao were used for transcriptomic analysis and forty samples were collected from different tissues and treatments including development and postharvest ripening process of fruits, and osmotic, salt and low temperature treatments. Two biological replicates were used for each sample.

Replication
For gene expression profiling of BaXijiao and FenJiao, we produced RNA-seq data of fruits, roots, and leaves with two biological replicates.