The draft genome of tropical fruit durian (Durio zibethinus)

Durian (Durio zibethinus) is a Southeast Asian tropical plant known for its hefty, spine-covered fruit and sulfury and onion-like odor. Here we present a draft genome assembly of D. zibethinus, representing the third plant genus in the Malvales order and first in the Helicteroideae subfamily to be sequenced. Single-molecule sequencing and chromosome contact maps enabled assembly of the highly heterozygous durian genome at chromosome-scale resolution. Transcriptomic analysis showed upregulation of sulfur-, ethylene-, and lipid-related pathways in durian fruits. We observed paleopolyploidization events shared by durian and cotton and durian-specific gene expansions in MGL (methionine γ-lyase), associated with production of volatile sulfur compounds (VSCs). MGL and the ethylene-related gene ACS (aminocyclopropane-1-carboxylic acid synthase) were upregulated in fruits concomitantly with their downstream metabolites (VSCs and ethylene), suggesting a potential association between ethylene biosynthesis and methionine regeneration via the Yang cycle. The durian genome provides a resource for tropical fruit biology and agronomy.

gene prediction (Fig. 1a). Using the Maker pipeline 13 , we incorporated 151,593 protein sequences from four plant species and 43,129 transcripts assembled from D. zibethinus Musang King RNA-seq data. 45,335 gene models were identified, with an average coding-sequence length of 1.7 kb and an average of 5.8 exons per gene. The vast majority of gene predictions (42,747) were supported by homology to known proteins, existence of known functional domains, or the presence of expressed transcripts (Supplementary Fig. 2). Using Blast2GO 14 35  30  25  20  15  10  5  0   40  35  30  25  20  15  10  5  0  30  25  20  15  10  5  0  30  25  20  15  annotated 35,975 predicted gene models with Gene Ontology (GO) terms. The annotated gene models were involved in processes such as defense response, fruit development, and carbohydrate and lipid metabolism (Fig. 1c), which may be of interest in the study of genetic variation between domesticated cultivars.
To verify the sensitivity of our gene predictions and the completeness and proper haplotig merging of our assembly, we checked core gene statistics using BUSCO 15 . Our gene predictions recovered 1,300 of the 1,440 (90.3%) highly conserved core proteins in the Embryophyta lineage, of which 68.1% were single-copy genes and 22.2% were duplicated. We checked whether this high duplication rate (22.2%) indicated unmerged haplotigs in our assembly, using coverage statistics from the Illumina short reads. Among these duplicated genes, 93.1% had mean read coverage within 1 s.d. of the mean read coverage for single-copy core genes, showing that these duplicated genes likely exist as independent and distinct copies in the genome (Supplementary Fig. 3). The high number of independent duplicate genes is suggestive of a recent whole-genome duplication (WGD) event in the durian lineage, similar to the scenarios in G. raimondii (11.5% duplicate genes) and G. arboreum (12.2%), both of which had recent WGDs, and distinct from the scenario in T. cacao (1.2%), which has not undergone a recent WGD. Visualization of syntenic blocks between durian and cacao confirmed that chromosome-level blocks in cacao are duplicated across multiple durian chromosomes (Fig. 1a).

Comparative phylogenomics shows durian paleopolyploidy
To investigate the evolution of distinct durian traits, we compared the durian genome to ten other sequenced plant species. These included three plants in the same Malvales order (G. raimondii, G. arboreum, and T. cacao), six plants in the same Eudicots clade (Arabidopsis thaliana, Populus trichocarpa (poplar), Glycine max (soybean), Vitis vinifera (grape), Coffea canephora (coffee), and Carica papaya (papaya)), and Oryza sativa japonica (rice) as an outgroup.
Gene family clustering with OrthoMCL 16 identified 32,159 gene families consisting of 327,136 genes ( Fig. 2a; for clarity, only durian, T. cacao, G. arboreum, G. raimondii, and A. thaliana gene families are shown). 607 gene families, consisting of 1,764 genes, were unique to durian (Supplementary Table 4). Durian shared the most gene families with the other Malvales plants (cotton and cacao), consistent with the placement of these three species in the same taxonomic order. Durian, cotton, and cacao also had similar numbers of gene families as Arabidopsis (within 99% of each other), further validating the accuracy and completeness of our gene predictions at the gene family level.
We derived 47 single-copy orthologous genes among the 11 plant species for phylogenetic analysis, taking the intersection of 344 singlecopy gene families found by OrthoMCL and 395 single-copy gene families present across diverse plant species 17 (Supplementary Table 5).
We used BEAST2 (ref. 18) to generate and date phylogenetic trees on the basis of Bayesian analysis. Consistent with the phylogenetic ordering of durian, cotton, and cacao first proposed by Alverson et al. 19 , our results suggest that cacao first diverged from the shared duriancotton lineage 62-85 million years ago (95% highest posterior density (HPD) interval), followed by the divergence of durian and cotton 60-77 million years ago (Fig. 2b). For the other plant species, phylogenetic placement and estimated speciation dates were mostly in broad consensus with other studies 20,21 .
Ancient WGDs (also known as paleopolyploidization events) are widespread in plant lineages and represent a powerful evolutionary force for the development of novel gene functions and the emergence of new species 22 . All core eudicots share an ancient WGD termed the γ event 23 , and, among the 11 plant species analyzed, at least eight additional ancient WGDs have previously been identified 24 . To investigate WGDs in the durian lineage, we identified syntenic regions across the durian, cotton, and cacao genomes, with each region consisting of at least five collinear homologous genes (Supplementary Table 6). We found 60% of the cacao genome represented by syntenic regions in the durian genome. Of these syntenic regions in durian, 25% were present in one copy, 37% in two copies, 36% in three copies, and the remainder (2%) in four or more copies (Supplementary Fig. 4). The observation that 75% of the genomic regions syntenic between durian and cacao are present in multiple copies in durian strongly suggests that the durian lineage underwent a WGD after speciation from cacao. The K s (synonymous substitution rate) distribution between syntenic durian genes also exhibited a peak characteristic of a recent WGD (K s = 0.24; Fig. 2c).
A WGD has also previously been shown for the cotton lineage, after its speciation with cacao 25,26 . We investigated whether the WGDs associated with cotton and durian reflect the same event or represent two distinct evolutionary events, by resolving the ordering and dates of the WGDs alongside the durian-cotton divergence event. We extracted paralogous pairs of durian and cotton genes arising from their respective WGDs, established orthologous relationships between them, and estimated their phylogenetic relationships and divergence times (Supplementary Table 7). Our analysis showed that paralogs within cotton and durian (originating from the WGD) each predated their orthologs (originating from the durian-cotton divergence; posterior probability for the phylogenetic tree branches ≥0.9981; Fig. 2d). This provides evidence that durian and cotton share a WGD that likely occurred before their lineages diverged and places the known cotton-specific WGD also within the durian lineage.

Upregulated gene expression in sulfur and ripening
To investigate biological processes associated with durian fruit traits, we sequenced expressed RNAs (RNA-seq) from different plant organs of the Musang King cultivar, including ripe fruit arils and stem, leaf, and roots. In addition, we sequenced RNA from ripe fruit arils of two other durian cultivars (Monthong and Puang Manee). We performed three independent comparative transcriptomic analyses:  A r t i c l e s Gene set enrichment analysis (GSEA) 27 showed that genes upregulated in durian fruits relative to non-fruit organs were associated with distinct cellular pathways, such as sulfur-, ripening-, and flavorrelated processes ( Fig. 3a and Supplementary Table 8). Enrichment in gene sets related to sulfur comprised five clusters, of which acidthiol ligase enzymes (containing a number of key enzymes in carbon-sulfur reactions and flavonoid production) and methionine metabolic pathways (of which MGL is a key member) 28 contained the most significantly upregulated leading-edge genes. Ripening-related gene sets included genes regulated by the MADS-BOX transcription factor family 29,30 , SEPALLATA transcription factor family, and ethylene-related genes such as ACS (aminocyclopropane-1-carboxylic acid synthase), a key ethylene-production enzyme involved in ripening 31 . Other upregulated processes related to taste and odor included triterpenoid metabolism, which has been associated with the intensely bitter taste of Ganoderma lucidum (Lingzhi) 32 , and lipid-related genes involved in the production of C 6 volatile compounds (hexanal and hexanol), which have been associated with the green odor profiles of apples, tomatoes, and bananas, as well as undesirable rancid odors when present at high levels 33,34 . Conversely, processes enriched in non-fruit organs included photosynthetic pathways and the regulation of nitrogen metabolism, likely required by roots and leaves to manufacture amino acids.
To determine whether the fruit-enriched processes are specific to durian, we next compared the transcriptomes of durian fruits to those from the fruits of five other species. GSEA showed that sulfurrelated pathways, lipid-oxidation pathways, and ripening pathways related to ethylene as well as protein degradation were upregulated in durian as compared to other fruits, whereas some flavor-related pathways were downregulated (Fig. 3b). These results suggest that both sulfur-related pathways and specific ripening processes related to ethylene could be highly regulated in durian fruits, even in comparison to other climacteric fruits. A prominent role for these pathways in durian is supported in the literature, as VSCs have been identified as important components in durian odor 5,6,35 . Ethylene production and cellular respiration are also increased in durian fruits during the climacteric stage of ripening and are associated with important physiological changes, including odor production, pulp softening, sugar release, and water loss followed by dehiscence 1,36 .
We also investigated whether differences in sulfur-, ripening-, and flavor-related pathways might be associated with the distinct and multifarious odor descriptors ascribed to different durian cultivars. GSEA results showed that all three pathways were significantly upregulated (sulfur metabolic process, P = 0.0019 by Kolmogorov-Smirnov test; response to ethylene, P = 0.0009; flavonoid metabolic process, P = 0.0009) in Musang King as compared to Monthong and Puang Manee  Peak boundaries corresponding to recent WGDs (green shading) were derived from Gaussian mixture modeling. (d) Inferred phylogenetic tree of paralogous durian and G. raimondii genes originating from their respective WGDs. Numbers represent branching posterior probabilities. The tree topology shows that the divergence of durian and cotton paralogs (derived from a WGD) predated the divergence of durian-cotton orthologs (derived from speciation) with high branching posterior probability, suggesting a shared WGD occurring before durian-cotton speciation (60-65 million years ago; green circle).
( Fig. 3c). These findings correlate well with the stronger perceived taste and smell of the Musang King cultivar.

Genomic expansions of volatile sulfur compound genes
Durian aroma comprises at least two major groups of volatile compounds: (i) sulfur-containing volatiles, such as thiols, disulfides, and trisulfides, which may contribute to the distinct roasty, onion-like odor of durian, and (ii) esters, which may contribute to a sweeter, fruity odor 5,6,37 . The former group (VSCs) is of particular interest, as its odor descriptors most closely reflect the distinctive nature of the durian aroma. Interestingly, analysis of the durian genome identified evolutionary gene family expansions of the MGL gene (four copies in durian, three copies in G. arboreum, two copies in G. raimondii, and one copy in T. cacao, A. thaliana, and O. sativa; Fig. 4a). Studies in microbes and plants have shown that MGL is a major functional contributor to VSC biosynthesis, degrading the sulfurcontaining amino acids cysteine and methionine into methanethiol, which is further broken down into di-and trisulfides 1,38,39 (Fig. 4b).
Phylogenetic analysis of MGL proteins from durian and other species identified two evolutionary MGL subgroups (posterior probability > 0.9999; Fig. 4c). The first group, referred to as MGLa, included singular MGL genes from T. cacao and both cotton species, as well as two durian MGL genes, MGLa1 (Duzib1.0C006G000732) and MGLa2 (Duzib1.0C011G000528). This group likely represents the ancestral, conserved MGL gene family. The second MGL group included only the two remaining duplicated copies of MGL from durian, suggesting that it comprises MGL genes that have diverged in sequence after genome duplication. We denoted the MGL genes in this second group MGLb1 (Duzib1.0C010G000484) and MGLb2 (Duzib1.0C010G000488).
To investigate the functions of the MGLa and MGLb genes in durian, we examined differences in their expression between durian fruit arils and non-fruit organs (stem, root, and leaf). MGLb1 and MGLb2 were consistently upregulated in fruits, with MGLb2 in particular showing dramatically higher expression (DESeq2; 12.73-and 2,241-fold increase relative to non-fruit organs, P = 2.07 × 10 −4 and 9.26 × 10 −100 for MGLb1 and MGLb2, respectively; Fig. 4d and Supplementary Table 9). In contrast, among the MGLa genes, MGLa1 exhibited decreased expression in fruits (7.14-fold decrease, P = 4.44 × 10 −3 ). These results suggest a novel function for durian MGLb in durian fruits. A r t i c l e s To investigate potential mechanisms by which MGL might have expanded from a single copy in the durian-cacao ancestor to four copies in durian, we compared the MGL-containing genomic regions in cacao and durian. The two durian MGLa genes were present on two separate scaffolds, both of which showed regional synteny to the cacao MGL region. In contrast, the two durian MGLb genes were present in close proximity on the same scaffold, which also displayed synteny with the cacao MGL region (Fig. 4e). This suggests that MGL initially expanded in the durian lineage via a large-scale genomic duplication event (such as a WGD), followed by a subsequent tandem duplication event that doubled the durian MGLb genes.
Besides MGL, we also observed significant genomic expansions at the pathway level in gene families related to sulfur metabolism (false discovery rate (FDR) = 0.032), fatty acid metabolism (FDR = 0.018), and ethylene processes (FDR = 1.76 × 10 −7 ) ( Fig. 4a  and Supplementary Table 10). These included the APS (ATP sulfurylase) family, involved in hydrogen sulfide biosynthesis; OASA1 (cysteine synthase), involved in maintaining sulfur levels; the THIK (3-ketoacyl-COA thiolase) family, involved in the production of lipid volatiles; XBAT32 (an E3 ubiquitin-protein ligase), a regulator of ethylene biosynthesis; and ETR2 (ethylene receptor) and AHK5 (histidine kinase), involved in ethylene signaling. Interestingly, some genes in these pathways were expanded in both durian and cotton, supporting findings of the importance of ethylene processing in cotton 21 . Taken together, these results suggest that a WGD event in durian's lineage led to the expansion and diversification of pathways related to durian  volatiles, such as those involved in sulfur processing (including MGL), lipid volatiles, and ethylene. Upregulation of these genes in durian may be linked to increased VSC production, thereby contributing to durian odor.

Potential association between VSC production and ripening
In climacteric plant species, ethylene sensing has been shown to act as a major trigger of fruit ripening. Ethylene production is a three-step reaction, beginning with transfer of the adenosyl group in ATP to L-methionine by SAM synthase, followed by conversion of l-methionine to 1-aminocyclopropane-1-carboxylic acid and ethylene by the enzymes ACS and ACO, respectively 40 . Our genomic and transcriptomic findings suggest a potential association between durian odor and fruit ripening, based on the central role of methionine in both processes. In this model (Fig. 5a), methionine acts as a precursor amino acid for both VSCs and ethylene via MGL and ACS activity, respectively, and both MGL and ACS are upregulated in durian fruits (Figs. 4d and 5b). We also observed upregulation of other genes plausibly related to durian ripening, including PME40, which encodes a pectinesterase that has an important role in fruit softening in strawberries 41 ; C71BV, which encodes a cytochrome P450 upregulated in banana fruits 42 ; and BXL1, which encodes a β-d-xylosidase involved in cell wall modification of ripening tomatoes 43 . To study associations between MGL, ACS, VSCs, and ethylene, we profiled six durian fruits at different ripening stages (six arils for each stage): 1 month post-anthesis, 2 months post-anthesis, 2.5 months postanthesis (ripe), and over 3 d post-abscission (fully ripe). qRT-PCR analysis of MGLa and MGLb in durian arils over the post-anthesis period demonstrated a coordinated spike in expression over the ripening period, in concert with increasing VSC levels as detected by profiling with headspace gas chromatography coupled with mass spectrometry (GC-MS) and increasing ethylene levels ( Fig. 5c and Supplementary  Tables 11-13). The availability of global mass spectrometry profiles also allowed us to identify different types of VSCs such as sulfide complexes and disulfide analogs (details of different VSCs are provided in Supplementary Table 11 A r t i c l e s and ethylene remained high for post-abscission fruits maintained over 3 d at room temperature (Fig. 5d).
The usage of methionine as a precursor for both odor (VSCs) and ripening (ethylene) in durian suggests the presence of a rapidly sulfur-depleted environment in durian fruits, as (i) sulfur is a scarce nutrient, (ii) VSC production via MGL is a sulfur sink, and (iii) 5-methylthioadenosine (MTA), the byproduct of ACS, contains a reduced sulfur group. Consistent with a low-sulfur state, we observed upregulation in fruits of several genes related to sulfur sensing and scavenging, including SDI1, a sulfur-sensing gene, and upregulation of BGL, which encodes a β-glucosidase that cleaves sulfur from glucosinolate 44 . An additional pathway for salvaging sulfur from MTA is the Yang cycle, which allows sulfur moieties from MTA to be recycled back to methionine 45,46 . In summary, our results suggest that the complex aroma of durian is possibly linked to durian fruit ripening, as both ethylene and VSC production are connected to the same regulatory processes and metabolites. However, we emphasize that, in the current study, the potential associations between MGL expression, VSC generation, and ethylene production are correlative and that further functional experiments will be required to establish a direct causal link between these molecular entities.

DISCUSSION
Although durian is well known as a delicacy in tropical food, research focused on it has been hampered by an absence of genetic resources. The D. zibethinus Musang King draft genome thus provides a useful resource for durian agronomy and is, to our knowledge, the first and most complete genome assembly of durian. The D. zibethinus assembly also represents the first from the Helicteroideae subfamily and is thus also valuable for evolutionary phylogenomic studies, similar to cotton and cacao, whose genomes are available and for which genetic research is actively pursued.
Ancient polyploidization events are important evolutionary drivers for the emergence of new phenotypes and new species 22 . Once duplication occurs, novel or specialized functions can arise in redundant alleles, and new regulatory mechanisms can develop through genomic rearrangements. Our observation that durian likely shares the previously described cotton WGD has two implications. First, other subfamilies within Malvaceae, after divergence from the Byttnerioideae subfamily (including cacao), are also likely to have the cotton-specific WGD. Second, this WGD event, which has been proposed to have driven the evolution of unique Gossypium traits 26 , may also have been involved in driving the evolution of unique durian traits, as well as the radiation of different Malvaceae subfamilies around the same time 47,48 .
Our current study focused on analyzing biological processes related to durian odor. VSCs have been identified as major contributors to durian smell, and we observed upregulation of sulfur-related pathways in durian fruit arils in comparison to other durian plant organs and fruits from other species. At the genomic level, we also identified expansions in gene families in sulfur-related pathways, most notably in the MGL gene, whose product has been shown to catalyze the breakdown of methionine into VSCs in microbes and plants 38,39,49 . More specifically, our analysis suggests a distinct role for MGLb genes in the durian fruit, where these genes were found to be significantly upregulated. MGL functions as a homotetramer in which each subunit requires a pyridoxal 5′-phosphate (PLP) cofactor, with the active sites located in the vicinity of PLP and the substrate-binding site 39,49 . While these active sites are conserved in both durian MGLa and MGLb protein sequences as compared to other plant and bacterial MGL sequences, regions proximal to the active sites, which are conserved in other plant species, exhibited more mutations in MGLb than in MGLa (four versus two, respectively; Supplementary Fig. 5). The impact of these mutations on the function of durian MGL gene products requires further investigation.
Our analysis also suggests a potential association between MGL and ACS in the durian fruit, whose coordination may involve the Yang cycle. Previous durian studies have also noted correlations between VSC production and increased ethylene production 36 ; however, a genetic link behind these observations has not been demonstrated. It is possible that linking odor and ripening may provide an evolutionary advantage for durian in facilitating fruit dispersal. Certain plants whose primary dispersal vectors are primates with more advanced olfactory systems show a shift in odor at ripening [50][51][52] . Similarly, durian-by emanating an extremely pungent odor at ripening-appears to have the characteristic of a plant whose main dispersal vectors are odorenticed primates rather than visually enticed animals.
The durian genome assembly creates a large scope for further studies. As an example, rapid commercialization of durian has led to the proliferation of cultivars with a wide discrepancy in prices and little way to verify the authenticity of the fruit products at scale. A high-quality genome assembly may aid in identifying cultivar-specific sequences, including SNPs related to important cultivar-specific traits (such as taste, texture, and odor), and allow molecular barcoding of different durian cultivars for rapid quality control. At a more basic level, such genetic information is vital to better understanding of durian biodiversity. The Durio genus comprises more than 30 known species, some of which do not produce edible fruit (for example, Durio singaporensis) and others that are outcompeted in nature and face extinction. Further studies will help to elucidate the ecological roles of these important and fascinating tropical plants. DNA extraction and library preparation. Genomic DNA was extracted from a fruit stalk using Plant DNAzol reagent (Life Technologies) following the manufacturer's recommendations.
SMRTbell DNA library preparation and sequencing with P6-C4 chemistry were performed in accordance with the manufacturer's protocols (Pacific Biosciences). 50 µg of high-quality genomic DNA was used to generate a 20-kb SMRTbell library using a lower end of 10 kb in size selection on the BluePippin (Sage Science). Initial titration runs were performed to optimize SMRT cell loading and yield. The genome was sequenced employing 52 SMRT cells on the PacBio RSII platform (Pacific Biosciences) by sequencing provider Icahn Mount Sinai School of Medicine.
A short-read genomic library was prepared using the TruSeq Nano DNA Library Preparation kit (Illumina) in accordance with the manufacturer's recommendations. Briefly, 1 µg of extracted intact genomic DNA was sheared (Covaris), with an insert size of 200-300 bp, and ligated to adaptors. DNA fragments were subjected to PCR, and the DNA library was used in paired-end sequencing (2 × 100 bp, ~750 million paired-end reads, ~202× coverage) on the Illumina HiSeq 2500 sequencer.
RNA extraction and library preparation. RNA-seq experiments were conducted for Musang King fruit arils (three arils, 51.6 million paired-end reads each on average) and a combined non-fruit group comprising stem (55.6 million reads), leaf (56.9 million reads), and root (53.2 million reads) profiles. We also sequenced fruit arils of Monthong and Puang Manee cultivars (three arils each, 90 million reads each on average). Our sample sizes and read depths enabled significance testing of differentially expressed genes and gene sets 53 (by DESeq2 and GSEA).
RNA was extracted using the PureLink RNA mini kit (Life Technologies) following the manufacturer's recommended protocol.
For the construction of RNA-seq libraries, 2 µg of total RNA was processed using the TruSeq RNA Sample Preparation kit (Illumina) followed by sequencing on the Illumina HiSeq 2500 platform.
Genome assembly. The main assembly was performed on full PacBio long reads using two diploid assembly approaches. CANU 54 was run with both default and high-sensitivity settings, and FALCON was run with length_cutoff = 5000 for initial mapping of seed reads for the error-correction phase. The CANU assembly had a N50 value indicating a less contiguous assembly, and the assembly did not substantially improve even after applying Redundans, a tool that detects redundant heterozygous contigs and attempts to reduce assembly fragmentation. We subsequently used FALCON because it was better able to phase the diploid genome as well as indicating better contiguity from N50 evaluations. Errors in the PacBio reads were corrected within the FALCON pipeline. Screening for contamination was handled with PacBio's whitelisting pipeline. We subsequently compared a number of length_cutoff values in FALCON for the mapping of seed reads for preassembly ranging from 2,000 to 11,000. On the basis of the contig N50 results, we chose length_cutoff = 9,500 for the preassembly step. We additionally configured DBsplit (-x500, --s400), daligner (-B128, -l4800, --s100, -k18, -h480), and overlap_filtering (--max_diff 100, --max_cov 100) with parameters optimized for our plant assembly. The results from this step were used to perform phased diploid genome assembly using FALCON-Unzip (default parameters). Subsequently, the draft assembly was polished using Arrow (see URLs) over three iterations and finally corrected using Illumina short reads with Pilon.
For additional details about genome assembly validation, see the Supplementary Note. Estimation of genome size and ploidy. We estimated genome size by flow cytometry 55 and obtained an estimate of 800 Mb (Supplementary Table 2). We additionally estimated genome size by k-mer distribution analysis with the program Jellyfish 56 (k = 31), using Illumina short reads, and obtained an estimate of 738 Mb. For additional details about ploidy estimation, see the Supplementary Note.
CHiCAGO and Hi-C library preparation and sequencing. ~500 ng of highmolecular-weight genomic DNA (mean fragment length = 150 kb) was reconstituted into chromatin in vitro for the ChiCAGO library. Chromatin was fixed with formaldehyde for the CHiCAGO library. For the Hi-C library, chromatin was fixed in place with formaldehyde in the nucleus. Fixed chromatin was digested with DpnII, 5′ overhangs filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, cross-links were reversed and the DNA was purified from protein. Purified DNA was treated to remove biotin that was not internal to the ligated fragments. The DNA was then sheared to a mean fragment size of ~350 bp, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adaptors. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 233 million (CHiCAGO) and 280 million (Hi-C) 2 × 150-bp paired-end reads, which provided 379.52× (1-to 100-kb pairs) and 4,370.55× (100-to 1,000-kb pairs) physical coverage of the genome, respectively.
Scaffolding with HiRise. The input de novo assembly, shotgun reads, CHiCAGO library reads, and Dovetail Hi-C library reads were used as input data for HiRise, a software pipeline designed specifically to use proximityligation data to scaffold genome assemblies. An iterative analysis was conducted. First, shotgun and CHiCAGO library sequences were aligned to the draft input assembly using a modified SNAP read mapper 57 . The separation between CHiCAGO read pairs mapping within draft scaffolds was analyzed by HiRise to produce a likelihood model for the genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a selected threshold. After aligning and scaffolding CHiCAGO data, Dovetail Hi-C library sequences were aligned and scaffolded following the same method. After scaffolding, shotgun sequences were used to close gaps between contigs. Scaffolding with CHiCAGO libraries made 167 breaks and 2,359 joins to the contig assembly. We additionally used Hi-C libraries to detect 7 breaks and 706 joins.
Transcriptome assembly. RNA-seq reads were preprocessed by trim_galore to remove contaminating sequences from adaptors as well as sequences with low base quality. Reads were then aligned to the reference genome using HiSat 58 . Reference-guided assembly was performed on the resulting alignment to produce a plant-organ-specific set of transcripts using StringTie 59 . A nonredundant set of transcripts observed in all samples assembled was merged using the merge functionality in StringTie.
Genome annotation. We generated a de novo repeat library from our assembly using RepeatModeler. We removed de novo repeats that matched known Arabidopsis genes (BLASTN E < 10 −5 ). We annotated our de novo repeats using the RepeatClassifier module of RepeatModeler. We combined our de novo library with the RepBase plant repeat database 60 and ran RepeatMasker on the assembly (default parameters).
We used the Maker genome annotation pipeline for gene prediction. Gene evidence was provided in the form of 151,594 protein sequences from four plant species (Arabidopsis, grape, rice, and soybean) and 178,840 transcripts assembled from our RNA-seq data. For details on how Maker was run, see the Supplementary Note.
We transferred GO annotations to our predicted genes using Blast2GO, incorporating sequence homology with Arabidopsis genes using BLASTP (best hit with E < 10 −5 ) and functional domain search with InterProScan 61 .
Short-read alignment and SNP calling. Previously preprocessed Illumina short reads were used for paired-end remapping to the reference assembly using BWA-MEM 62 . The resulting alignments were sorted using SAMtools 63 and PCR, and optical duplicates were marked and removed using Picardtools (see URLs). The processed alignments were then used to infer SNPs with FreeBayes 64 using parameters -C 5, -m 20 --F 0.01. Resulting SNP calls were further filtered to remove those with quality less than 20.

Molecular phylogenetic analysis.
We performed gene family clustering for 11 plant species using OrthoMCL (default parameters) on protein sequences. Molecular phylogenetic analysis was performed using a strict set of 47 singlecopy orthologous genes (Supplementary Table 5 Whole-genome duplication analysis. We used MCScan 67 and the CoGe Comparative Genomics Platform 68 to detect syntenic blocks (regions with at least five collinear genes) and calculate K s rates for syntenic genes. To analyze the durian and G. raimondii WGDs, we first identified paralogous durian and G. raimondii gene pairs originating from their respective WGDs. In durian and G. raimondii separately, we modeled the distribution of K s rates as a mixture model and identified syntenic gene pairs falling within the first peak ± 1 s.d. as paralogs likely derived from the most recent WGD event. Next, we identified OrthoMCL gene families consisting of one such paralogous durian pair, one such paralogous G. raimondii pair, and one gene each from cacao and Arabidopsis. In each of these 62 families, we matched each durian paralog with its cotton ortholog (among the two cotton paralogs), by setting the durian-cotton pair with the lowest synonymous distance as orthologs (Supplementary Table 7). We then used BEAST2 to estimate the phylogenetic tree and divergence times of the paralogous and orthologous genes in durian and G. raimondii, along with cacao and Arabidopsis. BEAST2 was run as described above.
Transcriptome analysis. Previously aligned RNA-seq data were counted against the predicted gene models using HTSeq-count 69 . Raw counts were fitted on a negative binomial distribution, and differential expression was tested for using DESeq2. Hidden variables were removed using the sva package and added to the study design before running DESeq2. Genes were sorted according to their log 2 -transformed fold-change values after shrinkage in DESeq2 and used for gene set analysis with fgsea's implementation of GSEA. Significant gene sets were used to perform leading-edge analysis. We detected clusters of pathways that shared many leading-edge genes using a community detection algorithm and manually curated these clusters to elucidate the important phenotype-associated pathway groups visualized on the bubble plots. Leading edges that also had a fold change of 2 were used to generate bubble plots. DESeq normalized gene expression was used in all bubble plots and heat maps. For additional details about transcriptomic comparisons against other plant species, see the Supplementary Note. Gene family expansion analysis. For gene family expansion analysis, we considered only six plants from the OrthoMCL gene family results: durian, two cotton species (G. raimondii and G. arboretum), cacao, Arabidopsis, and rice. We considered a gene family expanded in durian if (i) the number of durian gene members was greater than or equal to the number of gene members for every other plant considered, (ii) the number of durian gene members was greater than the number of gene members for cacao, Arabidopsis, and rice, and (iii) the number of gene members for all plants considered was greater than 0. These criteria were chosen to detect gene family expansions in durian, as well as expansions in both durian and the closely related cotton species. At the pathway level, we considered whether each pathway was expanded in durian by checking if its durian gene members were enriched in belonging to expanded gene families (Fisher's exact test with Benjamini-Hochberg multiple-hypothesis correction).
Quantitative real-time PCR. cDNA synthesis was performed on 200 ng of total RNA with the SensiFAST cDNA Synthesis kit (Bioline) followed by preamplification with SsoAdvanced PreAmp Supermix (Bio-Rad) according to the manufacturer's recommendations. The cDNA was diluted tenfold, and a 1-µl aliquot of cDNA was used for each qRT-PCR reaction using SsoAdvanced SYBR Green Supermix (Bio-Rad). All samples were run in triplicate for each primer combination. Thermal cycling was performed on a CFX C1000 System (Bio-Rad) in 96-well plates with an initiation step at 95 °C for 30 s, 40 cycles of denaturation at 95 °C for 5 s and extension at 60 °C for 15 s, and a final melting curve analysis on each reaction to confirm the specificity of amplification. Relative quantification of gene expression was performed using three durian housekeeping genes, CAC, DNAJ, and APT, as reference. Ct values were determined using CFXTM manager software (Bio-Rad) and exported into MS Excel (Microsoft) for statistical analysis. Real-time efficiencies (E) were calculated from the slopes of standard curves for each gene (E = 10(1/slope)).

Mass spectrometry.
Headspace solid-phase microextraction (HS-SPME) was used to identify sulfur-containing volatiles in durian fruits. Harvested fruits were kept in a room maintained at 25 °C with a relative humidity of 80-90%. 500 mg of each sample aril was ground under liquid nitrogen and transferred into a 20-ml brown glass headspace vial (Restek). 50 nmol of the internal standard, thiophene (Sigma-Aldrich), was added to each sample vial, and vials were immediately locked with air-tight headspace screw caps. An 85-µm carboxen on polymethylsiloxane (CAR/PDMS) (Supelco, Bellefonte) was used as the SPME. The SPME fiber was conditioned according to the manufacturer's instructions. Online headspace sampling, extraction, and GC-MS analysis were performed on the Agilent 7200 GC-QTOF mass spectrometer equipped with a PAL CTC Headspace autosampler. Headspace extraction and desorption were carried out accordingly. Each sample was incubated for 30 min at 40 °C, followed by extraction of the volatiles for 30 min at 40 °C and 250 r.p.m. Conditioning was performed for 5 min at 250 °C followed by injection into the GC-MS instrument for 1 min at 230 °C. The injector was operated in a 'split' mode at a ratio of 50:1, and the flow rate was maintained at 1 ml/min. An Agilent 19091S-433HP-5MS 5% Phenyl Methyl Silox (30 m × 0.25 mm × 0.25 µm) column was used to separate the volatiles. The oven temperature was increased from 40 °C to 200 °C at a rate of 5 °C/min followed by further increase to 280 °C at a rate of 40 °C/min for 5 min. The total run time was 39 min. Other parameters are as follows: ion source, EI; source temperature, 230 °C; electron energy, 70 eV; acquisition rate, 2 spectra/s; acquisition time, 500 ms/spectrum; mass range, 40-250 a.m.u. Mass spectra were analyzed using Agilent MassHunter Qualitative Analysis software version B.05.00. Identification of metabolites was performed using the NIST version 11 library based on spectral matching. Quantifications of sulfur-containing metabolites are expressed as follows: peak area of metabolite/peak area of internal standard/0.5 g = abundance/gram.

Ethylene gas measurement.
Harvested fruits were all kept in a room maintained at 25 °C with a relative humidity of 80-90%. All daily samplings for the post-abscission fruits were conducted at the same starting time to ensure consistency. Each durian fruit was placed in an airtight 8-L container fitted with gas-sampling ports connected to the ethylene gas analyzer (model F-900, based on electrochemical sensors; Felix Instruments). After 60 min, the headspace air was sampled and measured under polarcept settings following the manufacturer's recommendations. Ethylene measurements were obtained as a mean of three readings sampled at 10-min intervals.

Statistics.
To determine differentially expressed genes, we used DESeq2 to model raw gene counts for gene expression on a negative binomial model. P values were obtained from a two-tailed test with Benjamini-Hochberg multiple-hypothesis correction. To determine disregulated pathways based on gene expression, we used GSEA, which applies a Kolmogorov-Smirnov-like statistic. P values were derived from a permutation test to generate empirical null distributions. To determine significant pathways with gene family expansion, we used a two-tailed Fisher's exact test with Benjamini-Hochberg

Data exclusions
Describe any data exclusions.
As per a Reviewer's request, cultivars that did not have at least 3 biological replicates were removed from this study. This did not change the conclusions of our study.

Replication
Describe whether the experimental findings were reliably reproduced. All replications were successful.

Randomization
Describe how samples/organisms/participants were allocated into experimental groups.

Blinding
Describe whether the investigators were blinded to group allocation during data collection and/or analysis.
No blinding was used, as no randomized controlled trials were conducted, and measurements were not prone to observer bias.
Note: all studies involving animals and/or human research participants must disclose whether blinding and randomization were used.

Statistical parameters
For all figures and tables that use statistical methods, confirm that the following items are present in relevant figure legends (or the Methods section if additional space is needed).

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement (animals, litters, cultures, etc.) A description of how samples were collected, noting whether measurements were taken from distinct samples or whether the same sample was measured repeatedly.
A statement indicating how many times each experiment was replicated The statistical test(s) used and whether they are one-or two-sided (note: only common tests should be described solely by name; more complex techniques should be described in the Methods section) A description of any assumptions or corrections, such as an adjustment for multiple comparisons The test results (e.g. p values) given as exact values whenever possible and with confidence intervals noted A summary of the descriptive statistics, including central tendency (e.g. median, mean) and variation (e.g. standard deviation, interquartile range)

Clearly defined error bars
See the web collection on statistics for biologists for further resources and guidance.