The genome sequence of star fruit (Averrhoa carambola)

Wu, Shasha; Sun, Wei; Xu, Zhichao; Zhai, Junwen; Li, Xiaoping; Li, Chengru; Zhang, Diyang; Wu, Xiaoqian; Shen, Liming; Chen, Junhao; Ren, Hui; Dai, Xiaoyu; Dai, Zhongwu; Zhao, Yamei; Chen, Lei; Cao, Mengxia; Xie, Xinyu; Liu, Xuedie; Peng, Donghui; Dong, Jianwen; Hsiao, Yu-Yun; Chen, Shi-lin; Tsai, Wen-Chieh; Lan, Siren; Liu, Zhong-Jian

doi:10.1038/s41438-020-0307-3

Download PDF

Article
Open access
Published: 01 June 2020

Horticulture Research volume 7, Article number: 95 (2020) Cite this article

5154 Accesses
17 Citations
7 Altmetric
Metrics details

Subjects

Abstract

Oxalidaceae is one of the most important plant families in horticulture, and its key commercially relevant genus, Averrhoa, has diverse growth habits and fruit types. Here, we describe the assembly of a high-quality chromosome-scale genome sequence for Averrhoa carambola (star fruit). Ks distribution analysis showed that A. carambola underwent a whole-genome triplication event, i.e., the gamma event shared by most eudicots. Comparisons between A. carambola and other angiosperms also permitted the generation of Oxalidaceae gene annotations. We identified unique gene families and analyzed gene family expansion and contraction. This analysis revealed significant changes in MADS-box gene family content, which might be related to the cauliflory of A. carambola. In addition, we identified and analyzed a total of 204 nucleotide-binding site, leucine-rich repeat receptor (NLR) genes and 58 WRKY genes in the genome, which may be related to the defense response. Our results provide insights into the origin, evolution and diversification of star fruit.

The chromosome-level genome of dragon fruit reveals whole-genome duplication and chromosomal co-localization of betacyanin biosynthetic genes

Article Open access 10 March 2021

The Physalis floridana genome provides insights into the biochemical and morphological evolution of Physalis fruits

Article Open access 18 November 2021

The Gillenia trifoliata genome reveals dynamics correlated with growth and reproduction in Rosaceae

Article Open access 01 November 2021

Introduction

Wood sorrel (Oxalidaceae family) includes approximately 780 species and is distributed in both tropical and temperate areas. It contains species with various forms, including herbs, shrubs, and trees¹. Wood sorrels are important economic crops and are utilized for both ornamental decoration and medicinal applications^2,3. Based on morphological and molecular data, Oxalidaceae belongs to Oxalidales and is sister to the Connaraceae family. It can be divided into two main subfamilies: Oxalidoideae and Averrhooideae⁴. Averrhooideae differs from Oxalidoideae, an herbaceous subfamily, by having woody plants. It has four genera and is classified into two tribes: Biophyteae (Biophytum) and Averrhoeae (Dapania, Averrhoa and Sarcotheca). It is mostly distributed in tropical and subtropical regions⁵. Although these genera share synapomorphies with other wood sorrels, such as floral morphology, Averrhoa possesses several unique traits, including imparipinnate leaves, an herbaceous to papyraceous form, lateral petiolules that do not leave a stalk on the rachis after dropping, and the presence of 3–7 ovules per locule¹. Therefore, Averrhoa is a key taxon in the evolutionary assessment of wood sorrel structure, and analysis of its genome should reveal new insights into the key adaptations that contribute to the diversification within Oxalidaceae^6,7.

Averrhoa carambola, known as star fruit, originated in Asia and has been cultivated in Southeast Asia and Malaysia for many centuries^8,9,10. Because the flesh is juicy and rich in vitamin C, star fruit is a commonly consumed tropical fruit¹¹. In China, the total consumption of star fruit is approximately 2.6 million tons per year, while the annual production of star fruit in China is approximately two million tons¹¹. In addition, star fruit is widely cultivated as a street tree in southern Chinese cities due to its dense foliage and star-like fruit¹². Furthermore, the characteristics of bearing flowers and fruits on the main trunk and flowering year-round when temperatures exceed 27 °C in tropical regions¹³ make it an ideal species with which to explore economic and interesting traits (cauliflory, defined as flowering from the lower branch and trunk areas of woody plants, and high yield) at the whole-genome scale.

Cephalotus follicularis is a carnivorous plant native to southwest Australia that belongs to the monospecific family Cephalotaceae and is the only species of the order Oxalidales with a sequenced and annotated genome¹⁴. The relationship between star fruit and Cephalotus follicularis is not currently known and could be better understood through a comparison of their sequenced genomes.

Here, we present a complete genome sequence for A. carambola. Comparisons of genomic data with those from other flowering plants provide fundamental insights into the origin, evolution, adaptation, and diversification of star fruit.

Results and discussion

Genome sequencing and characterization

A. carambola has a karyotype of 2N = 2X = 22 chromosomes¹⁵. To sequence its genome, we utilized Illumina HiSeq short reads. We obtained a total of 131 Gb of raw reads with short inserts after library construction and sequencing (Supplementary Table 1). The A. carambola genome was estimated to be 357.79 Mb in size with a heterozygosity of 1.15% based on 17-K-mer analysis (Supplementary Fig. 1). Next, we utilized Oxford Nanopore Technology (ONT) and obtained a total of 52.33 Gb of long reads (Supplementary Table 1). The ONT reads were corrected and assembled to produce a 335.49 Mb genome with a contig N50 size of 4.22 Mb (Supplementary Table 2). Then, the draft assembly was polished using short reads, and BUSCO (Benchmarking Universal Single-Copy Orthologs, v3.1.0) assessment indicated that the completeness of the genome was 96.30%, suggesting that the A. carambola genome is nearly complete and of high quality (Supplementary Tables 2, 3).

We additionally used 42.76 Gb of Hi-C clean data to reconstruct physical maps by reordering and clustering the assembled scaffolds. We anchored 90.88% of the assembly (305.13 Mb) onto 11 pseudochromosomes using a hierarchical clustering strategy (Supplementary Table 4). Chromatin interaction data were used to assess the quality of the Hi-C assembly, which indicated that the assembly was of high quality (Supplementary Fig. 2). The length of the pseudochromosomes ranged from 17.89 Mb to 33.98 Mb with an N50 value of 31.25 Mb (Supplementary Tables 4, 5).

Approximately 61.3% of the A. carambola genome was found to be composed of repetitive elements (transposon elements, TEs) (Supplementary Figs. 3, 4 and Supplementary Table 6). We confidently annotated 25,419 protein-coding genes (Supplementary Table 7 and Supplementary Fig. 5), of which 21,316 (83.86%) were supported by transcriptome data (Supplementary Table 8). A total of 94.8% of annotated BUSCO gene models were identified, suggesting the near completeness of gene prediction (Supplementary Table 8). In addition, we identified 86 microRNAs, 581 transfer RNAs, 71 ribosomal RNAs and 212 small nuclear RNAs in the A. carambola genome (Supplementary Table 9).

Evolution of gene families

We then constructed a high-confidence phylogenetic tree and estimated the divergence times of 24 different plant species based on genes extracted from a total of 93 single-copy families (see Methods and Supplementary Table 10). The estimated divergence time of this set of Oxalidaceae species was 102 Mya (Supplementary Fig. 6). Next, we determined the expansion and contraction of orthologous gene families using CAFÉ 4.2 (Supplementary Fig. 7). Thirty-three gene families were expanded in the lineage leading to Oxalidales, whereas 904 families were contracted (Fig. 1). Four hundred ninety gene families were expanded in A. carambola, while 1021 gene families were contracted (Fig. 1).

**Fig. 1: The expansion and contraction of gene families.**

By comparing 24 different plant species, 504 gene families, including 8153 A. carambola genes, appeared to be unique to carambola (see Methods, Supplementary Figs. 7, 8 and Supplementary Table 10). We performed GO and KEGG enrichment analysis (Supplementary Table 11) and found that these gene families were enriched in several categories (Supplementary Tables 12–17).

Collinearity analysis

Genes are typically conserved both in function and order inside collinear fragments among closely related species. We utilized MCScanX¹⁶ to assess the collinearity among species related to Averrhoa carambola and found that all the predicted genes except TE-related genes were highly conserved in both function and order (Fig. 2 and Supplementary Figs. 9, 10).

Whole-genome duplication

Whole-genome duplication (WGD) is a process of genome doubling that dramatically increases genome complexity. One of the striking features of plant genomes is that WGD has occurred many times^17,18. WGD is a particularly important feature of angiosperm genomes. To determine whether the Averrhoa genome had undergone WGD during evolution, we used Ks (synonymous substitutions per site) distribution analysis. The paralogous gene pairs were extracted from the OrthoMCL results, and the Ks values were calculated using CodeML in the PAML package^16,19.

There was a peak between Ks values of 1.6–1.8, indicating that the A. carambola genome had undergone a WGD event. Further analysis of A. carambola and C. follicularis revealed that the common ancestor prior to the differentiation of Averrhoa and C. follicularis experienced a WGD event. This event was likely the γ event shared by most core eudicots (Fig. 3). Comparative genomic analysis between the A. carambola and C. follicularis genomes revealed a one-to-one syntenic relationship, suggesting that no WGD events occurred after speciation (Supplementary Fig. 11).

**Fig. 3: Ks distribution of *Averrhoa carambola*.**

MADS-box genes of star fruit

MADS-box genes are known to be involved in many important processes during plant development and are especially known for their roles in flowering and flower development²⁰. Because Averrhoa is well known for its cauliflory, defined as flowering from the lower branch and trunk areas of woody plants, we focused on identifying and characterizing the MADS-box genes in the Averrhoa genome in more detail.

In total, 74 putative functional MADS-box genes and two pseudogenes were identified in A. carambola (Table 1). This number is less than the number of MADS-box genes found in Arabidopsis thaliana²¹ and Theobroma cacao^22,23 but greater than the number found in C. follicularis (Table 1). A. carambola has 48 type-II MADS-box genes, which is comparable to the number in A. thaliana (45) but less than the number found in T. cacao (67) (Table 1). Phylogenetic analysis (Supplementary Fig. 12) showed that many of the type-II MADS-box genes were duplicated, except those in the B-class, PI, FLC, AGL12, and Bs clades. The duplicated type-II clades included A-class (three members), B-class AP3 (two members), C/D-class (three members), E-class (four members), AGL6 (two members), SOC1 (three members), AGL15 (two members), ANR1 (five members), SVP (15 members), and MICK* (five members) clades (Table 1). Notably, we found that the SVP (15 members) clade in A. carambola contained more genes than that in A. thaliana, C. follicularis and cocoa tree (Table 1). The large number of SVP clade members might be due to tandem duplication. In Arabidopsis, the two SVP paralogs SVP and AGL24 are involved in floral transition and development. SVP suppresses flowering by acting with FLC to negatively regulate SOC1 and FT²⁴. In contrast, AGL24 acts as a flowering activator to activate SOC1 expression during inflorescence development²⁵. We detected transcripts of all SVP-like genes except Yangtao2024516 in vegetative leaves and shoots (Fig. 4). Interestingly, expression of an SVP-like gene (Yangtao2024516) was detected in the inflorescence and flower buds (Fig. 4). These results suggested that Yangtao2024516 functions in flowering activation, and the other 14 SVP-like genes might also be related to flowering suppression. The expanded SVP clade, including members with differential expression patterns in A. carambola, might contribute to flowering regulatory networks and relate to A. carambola cauliflory.

Table 1 MADS-box genes in Averrhoa carambola, Arabidopsis thaliana, Cephalotus follicularis and Theobroma cacao.

Full size table

**Fig. 4: Expression patterns of genes involved in the flowering.**

Cauliflorous flowering occurs due to the presence of adventitious buds that remain dormant and take years to develop after their formation, when the trunk develops and thickens due to secondary growth²⁶. It has been reported that SVP2 functions in the suppression of meristem activity to prevent precocious budbreak in the perennial species kiwifruit²⁷. The flowering repressor SVP is also controlled by the autonomous pathway and directly represses SOC1 transcription in Arabidopsis²⁸. Interestingly, we found that genes involved in the autonomous flowering time pathway might repress the expression of SVP-like genes in reproductive tissues to promote flowering through activation of the floral integrator genes FLOWERING LOCUS T and SOC1 (Figs. 4, 5). Further study on the A. carambola flowering time pathways will improve our knowledge about causal genes, with applications in commercial variety selection.

**Fig. 5: Putative autonomous flowering.**

Compared with type-II MADS-box genes, only 26 putative functional type-I genes and two pseudogenes were characterized (Table 1), which may have been due to the low duplication rate and high loss rate of type-I MADS-box genes. Tandem duplications seem to have contributed to more type-I genes, especially those in the Mα group. This suggests that type-I MADS-box genes have mainly been duplicated during smaller-scale and more-recent duplications²¹. Functional studies of type-I MADS-box genes in A. carambola will advance our understanding of their role in Oxalidaceae.

Analysis of NLR genes and the WRKY gene family

Resistance (R) genes produce R proteins to provide plant disease resistance against pathogens, most of which are nucleotide-binding site leucine-rich repeat (NLR) genes. NLRs contain three classes based on their N-terminal domain structures, namely, the toll/interleukin-1 receptor (TIR) NLR (TNL), coiled-coil (CC) NLR (CNL), and resistance to powdery mildew8 (RPW8) NLR (RLN)²⁹. Here, a genome-wide identification of NLR genes in star fruit was carried out, resulting in the identification of 204 NLR genes, including homologues of known resistance genes: RNL (with 13 members), TNL (with 41 members) and CNL (with 150 members) (Table 2). There seem to be more NLR genes in A. carambola than in A. thaliana (174 genes)³⁰ but fewer than in the basal angiosperm Nymphaea colorata (342 genes)³¹. Phylogenetic analysis of the NLR genes indicated that tandem duplication contributed to the increasing number of CNL genes. Remarkably, the NLR genes in the C. follicularis genome¹⁴ showed sharp contraction after the speciation of A. carambola and C. follicularis, with only one RNL and one TNL identified (Supplementary Fig. 13).

Table 2 NLR genes in Averrhoa carambola, Arabidopsis thaliana, Cephalotus follicularis and Nymphaea colorata.

Full size table

Plant WRKY proteins are transcription factors (TFs) involved in regulating plant growth and development, as well as responses to many biotic and abiotic stresses³². An investigation of the star fruit genome revealed 58 WRKY genes. There appear to be fewer WRKY genes in star fruit than in the model plant A. thaliana (72 genes)³³ and the basal angiosperm N. colorata (69 genes)³¹. The closely related species C. follicularis contains 42 WRKY genes¹⁴ (Table 3). Phylogenetic analysis showed that WRKY genes, especially type-IIc and type-I genes, underwent expansion in A. carambola, with the exception of those in the type-IIa clade. Among the 58 WRKY genes, type I (with 10 members), type II (33 members) and type III (eight members) contained more members than in C. follicularis (six members in type I, 29 members in type II and seven members in type III) (Supplementary Fig. 14). Interestingly, there was an unknown clade present in A. thaliana and C. follicularis, which contained three members in the star fruit genome and two members in the N. colorata genome.

Table 3 WRKY genes in Averrhoa carambola, Arabidopsis thaliana, Cephalotus follicularis and Nymphaea colorata.

Full size table

Conclusion

Although star fruit is well known as a delicacy in the tropics, research on it has been hampered by the absence of genetic resources. We identified new gene families and analyzed gene family expansion and contraction by comparisons with related species. Significant changes were found in the MADS-box gene family, which might be related to the cauliflory of A. carambola. Evolutionary analysis revealed one WGD event, which was the γ event experienced by most eudicots. The A. carambola assembly represents the first from the wood sorrel family and thus provides a valuable resource for evolutionary phylogenomic studies.

Material and methods

Plant sample preparation and sequencing

Diploid A. carambola was cultivated at the South China Botanical Garden, Guangzhou, Guangdong Province, China. Fresh young leaves were collected to extract genomic DNA for 400 bp paired-end library construction and Illumina sequencing. Genome size and heterozygosity were calculated using KmerFreq and GCE based on a 17-K-mer distribution. The high-molecular-weight DNA of fresh young leaves was extracted and randomly fragmented to construct a 20 kb library for Oxford Nanopore sequencing.

Genome assembly and chromosome anchoring

The raw fastq data from ONT sequencing were transformed and filtered using MinKNOW software. Canu (v1.7) was used to correct and trim the raw ONT reads with the default parameters. The corrected and trimmed ONT reads were assembled by SMARTdenovo using 21-mers³⁴. The assembled contigs were then polished three times with Pilon (v1.22) using Illumina short reads. The quality of the genome assembly was estimated by searching for BUSCO v3.1.0 using the Embryophyta ODB 10 database.

Fresh leaves of A. carambola were used to construct a Hi-C sequencing library, including chromatin crosslinking, chromatin digestion with HindIII, biotin labeling and end repair, DNA purification and streptavidin pull-down of labeled Hi-C ligation products. The library was then sequenced using the Illumina platform, and the clean sequences were mapped to the draft genome, with valid Hi-C reads employed to correct the draft assembly. Finally, the draft genome of A. carambola was assembled into chromosomes using Lachesis (9).

Identification of repetitive sequences

Tandem repeats across the genome were predicted using Tandem Repeats Finder (v4.07b, http://tandem.bu.edu/trf/trf.html). Transposable elements (TEs) were first identified using RepeatMasker (http://www.repeatmasker.org, v3.3.0) and RepeatProteinMask based on the Repbase TE library³⁵. Next, two de novo predication software programs, RepeatModeler (http://repeatmasker.org/RepeatModeler.html) and LTR_FINDER³⁶, were used to identify TEs in the star fruit genome. Finally, repeat sequences with identities ≥ 50% were grouped into the same class.

Gene prediction and annotation

Three independent methods were used for gene prediction. Homologous sequence searching was performed by comparing protein sequences of five sequenced species against the star fruit genome using the TBLASTN algorithm with a cut-off E-value of ≤ 1e⁻⁵. Then, the corresponding homologous genome sequences were aligned against BLAST hits using GeneWise v2.4.1 to extract accurate exon–intron information³⁷. Three ab initio prediction software programs, Augustus v3.0.2³⁸, FGENESH³⁹ and GlimmerHMM⁴⁰, were employed for de novo gene prediction. Then, the homology-based and ab initio gene structures were merged into a nonredundant gene model using GLEAN. The completeness of gene prediction was evaluated using BUSCO v3.1.0. Finally, the RNA sequencing (RNA-Seq) reads were mapped to the assembly using TopHat v2.0.11⁴¹, and Cufflinks v2.2.1⁴² was applied to combine mapping results for transcript structural predictions.

The protein sequences of the consensus gene set were aligned to various protein databases, including GO (The Gene Ontology Consortium), KEGG⁴³, InterPro⁴⁴, Swiss-Prot and TrEMBL, for the annotation of predicted genes. The rRNAs were identified by aligning the rRNA template sequences from the Rfam⁴⁵ database against the genome using the BLASTN algorithm at an E-value cut-off of 1e⁻⁵. The tRNAs were predicted using tRNAscan-SE⁴⁶, and other ncRNAs were predicted by running Infernal 0.81 software against the Rfam database.

Genome evolution analysis

Gene families present in the 24 genomes were identified using OrthoMCL⁴⁷. Peptide sequences from 93 single-copy gene families were used to construct phylogenetic relationships and estimate divergence times. Alignments from MUSCLE were then converted to coding sequences. Fourfold degenerate sites were concatenated and used to estimate the neutral substitution rate per year and the divergence time. PhyML⁴⁸ was used to construct a phylogenetic tree. The Bayesian relaxed molecular clock approach was used to estimate species divergence times using the program MCMCTREE v4.0, which is part of the PAML package⁴⁹. The ‘correlated molecular clock’ and ‘JC69’ models were used. Published Arabidopsis–Papaya (54–90 Mya), P. trichocarpa–Arabidopsis (100–129 Mya), monocot–dicot (140 Mya) and angiosperm (<200 Mya) divergence times were used for calibration⁵⁰.

Transcriptome sequencing and expression analysis

The inflorescence, floral bud, leaf and shoot were collected independently from A. carambola. Total RNA was extracted according to the manufacturer’s protocol. Illumina RNA-Seq libraries were prepared and sequenced on a HiSeq 2500 system following the manufacturer’s instructions (Illumina, USA). Two biological replicates were analyzed for each sample. To estimate gene expression levels, clean reads of each sample were mapped onto the assembled genome to obtain read counts for each gene using HTSeq-count and normalized to FPKM counts⁴².

Phylogenetic reconstruction of the MADS-box gene family

For the identification of MADS-box family members in A. carambola, BLASTp was applied using known plant MADS-box proteins as reference sequences^21,51, with an E-value cut-off of ≤ 1e−5. For phylogenetic analyses, a total of 333 protein sequences, including 76 A. carambola AcMADS-box, 106 A. thaliana AtMADS-box²¹, 53 C. follicularis CfMADS-box¹⁴ and 98 T. cacao TcMADS-box^22,23, were aligned using MUSCLE⁵² (v3.8.31; http://www.drive5.com/muscle) with default parameters. A phylogenetic tree was drawn through RAxML⁵³ with the GTRGAMMA substitution model and 1000 bootstraps on the CIPRES website (https://www.phylo.org/portal2/home.action)⁵⁴. The phylogenetic tree was visualized using FigTree software (http://tree.bio.ed.ac.uk/software/figtree) (Supplementary Fig. 12).

NLR genes and WRKY TF identification

All the annotated protein sequences of Arabidopsis, C. follicularis, and N. colorata were downloaded. The NLR gene (PF00931) and WRKY TF (PF03106) models were obtained from the Pfam database (http://pfam.xfam.org/). The NLR genes and WRKY TFs were then identified using HMMER v3.2.1. The identified sequences were confirmed and filtered using the UniProt database (https://www.uniprot.org/) and the NCBI database (https://www.ncbi.nlm.nih.gov/). Then, MAFFT v7.407 was used to align multiple sequences of candidate proteins with default parameters. A maximum likelihood (ML) phylogenetic tree was constructed for the protein sequences of NLR genes and WRKY TFs from Arabidopsis^30,33, C. follicularis¹⁴, N. colorata³¹ and star fruit by FastTree with default parameters.

Data availability

The whole-genome sequence data reported in this paper have been deposited in the Genome Warehouse of the National Genomics Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number GWHABKE00000000 and are publicly accessible at https://bigd.big.ac.cn/gwh.

References

Veldkamp, J. F. in Flora Malesiana (ed. van Steenis, C. G. C. J.) 155–178 (Noordhoff, Leyden, 1971).
Khare, C. P. Indian Medicinal Plants 74–75 (Springer, New York, 2007).
Gunawardena, D. C., Jayasinghe, L. & Fujimoto, Y. Phytotoxic constituents of the fruits of Averrhoa carambola. Chem. Nat. Compd. 51, 532–533 (2015).
Article CAS Google Scholar
The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181, 1–20 (2016).
Article Google Scholar
Stevens, P. F. Angiosperm Phylogeny Website. Version 14, July 2017 [and more or less continuously updated since] (2001 onwards).
Heibl, C. & Renner, S. S. Distribution models and a dated phylogeny for Chilean Oxalis species reveal occupation of new habitats by different lineages, not rapid adaptive radiation. Syst. Biol. 61, 823–834 (2012).
Article PubMed Google Scholar
Sun, M., Naeem, R., Su, J. X., Cao, Z. Y. & Chen, Z. D. Phylogeny of the Rosidae: a dense taxon sampling analysis. J. Syst. Evol. 54, 363–391 (2016).
Article Google Scholar
Fieischmann, P., Watanabe, N. & Winterhalter, P. Enzymatic carotenoid cleavage in star fruit (Averrhoa carambola). Phytochemistry 63, 131–137 (2003).
Article CAS Google Scholar
Neto, M. M. et al. Intoxication by star fruit (Averrhoa carambola) in 32 uraemic patients: treatment and outcome. Nephrol. Dial. Transplant. 18, 120–125 (2003).
Article PubMed Google Scholar
Carolino, R. O. G. et al. Convulsant activity and neurochemical alterations induced by a fraction obtained from fruit Averrhoa carambola (Oxalidaceae: Geraniales). Neurochem. Int. 46, 523–531 (2005).
Article CAS PubMed Google Scholar
Shui, G. G. & Leong, L. P. Residue from star fruit as valuable source for functional food ingredients and antioxidant nutraceuticals. Food Chem. 97, 277–284 (2005).
Article CAS Google Scholar
Bin-Ngah, A.W., Ahmad, I. & Hassan, A. Carambola Production, Processing and Marketing in Malaysia (Inter-American Society for Tropical Horticulture (IASTH), 1992).
Khoo, H. et al. A review on underutilized tropical fruits in Malaysia. Guangxi Agr. Sci. 41, 698–702 (2010).
Google Scholar
Fukushima, K. et al. Genome of the pitcher plant Cephalotus reveals genetic changes associated with carnivory. Nat. Ecol. Evol. 1, 0059 (2017).
Article Google Scholar
Wu, Q. Study on Technology of Breeding polyploid of Averrhoa Carambola by Biotechnique. Master dissertation. Southwest Agricultural University, Chongqing (2002).
Wang, C. et al. Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res. 25, 246 (2015).
Article PubMed PubMed Central CAS Google Scholar
Cui, L. et al. Widespread genome duplications throughout the history of flowering plants. Genome Res. 16, 738–749 (2006).
Article CAS PubMed PubMed Central Google Scholar
Van de Peer, Y., Maere, S. & Meyer, A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10, 725–732 (2009).
Article PubMed CAS Google Scholar
Blanc, G. & Wolfe, K. H. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16, 1667–1678 (2004).
Article CAS PubMed PubMed Central Google Scholar
Cai, J. et al. The genome sequence of the orcid Phalaenopsis equestris. Nat. Genet. 47, 65–72 (2015).
Article CAS PubMed Google Scholar
Parenicova, L. et al. Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15, 1538–1551 (2003).
Article CAS PubMed PubMed Central Google Scholar
Argout, X. et al. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies. BMC Genomics 18, 730 (2017).
Article CAS PubMed PubMed Central Google Scholar
Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
Article CAS PubMed Google Scholar
Willmann, M. R. & Poethig, R. S. The effect of the floral repressor FLC on the timing and progression of vegetative phase change in Arabidopsis. Development 138, 677–685 (2011).
Article CAS PubMed PubMed Central Google Scholar
Gregis, V., Sessa, A., Dorca-Fornell, C. & Kater, M. M. The Arabidopsis floral meristem identity genes AP1, AGL24 and SVP directly repress class B and C floral homeotic genes. Plant J. 60, 626–637 (2009).
Article CAS PubMed Google Scholar
Oliveira, G. P. et al. Origin and development of reproductive buds in jabuticaba cv. Sabará (Plinia jaboticaba Vell). Sci. Horticulturae 249, 432–438 (2019).
Article Google Scholar
Wu, R. Kiwifruit SVP2, a dormancy regulator. Eur. J. Hort. Sci. 83, 231–235 (2018).
Article Google Scholar
Li, D. et al. A repressor complex governs the integration of flowering signals in Arabidopsis. Dev. Cell. 15, 110–120 (2008).
Article CAS PubMed Google Scholar
Shao, Z. et al. Large-scale analyses of angiosperm nucleotide-binding site-leucine-rich repeat genes reveal three anciently diverged classes with distinct evolutionary patterns. Plant Physiol. 170, 2095–2109 (2016).
Article CAS PubMed PubMed Central Google Scholar
Maleck, K. et al. The transcriptome of Arabidopsis thaliana during systemic acquired resistance. Nat. Genet. 26, 403–410 (2000).
Article CAS PubMed Google Scholar
Zhang, L. S. et al. The water lily genome and the early evolution of flowering plants. Nature 577, 79–84 (2020).
Article CAS PubMed Google Scholar
Phukan, U. et al. WRKY transcription factors: molecular regulation and stress responses in plants. Front Plant Sci. 7, 760 (2016).
Article PubMed PubMed Central Google Scholar
Wang, Q., Wang, M., Zhang, X., Hao, B., Kaushik, S. K. & Pan, Y. WRKY gene family evolution in Arabidopsis thaliana. Genetica 139, 973 (2011).
Article CAS PubMed Google Scholar
Schmidt, M. H. W. et al. De novo assembly of a new Solanum pennellii accession using nanopore sequencing. Plant Cell 29, 2336–2348 (2017).
Article CAS PubMed PubMed Central Google Scholar
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Article CAS PubMed Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–268 (2007).
Article PubMed PubMed Central Google Scholar
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Article CAS PubMed PubMed Central Google Scholar
Stanke, M., Schoffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinform. 7, 62 (2006).
Article CAS Google Scholar
Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
Article CAS PubMed PubMed Central Google Scholar
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Article CAS PubMed Google Scholar
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
Article CAS PubMed PubMed Central Google Scholar
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-Seq. experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ogata, H. et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 27, 29–34 (1999).
Article CAS PubMed PubMed Central Google Scholar
Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–215 (2009).
Article CAS PubMed Google Scholar
Griffiths, J. S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–124 (2005).
Article Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Article CAS PubMed PubMed Central Google Scholar
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Article CAS PubMed PubMed Central Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).
CAS PubMed Google Scholar
Zhang, G. Q. et al. The Apostasia genome and the evolution of orchid. Nature 549, 379–383 (2017).
Article CAS PubMed PubMed Central Google Scholar
Zdobnov, E. M. & Apweiler, R. InterProScan-an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
Article CAS PubMed Google Scholar
Letunic, L., Doerks, T. & Bork, P. SMART: recent updates, new develpoments and status in 2015. Nucleic Acids Res. 43, D257–D260 (2014).
Article PubMed PubMed Central CAS Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Miller, M. A., Pfeiffer, W. & Schwartz, T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010 Gateway Computing Environments Workshop (GCE), (ed. Institute of Electrical and Electronic Engineers) 1–8 (New Orleans, LA, 2010).

Download references

Acknowledgements

This work was supported by The National Key Research and Development Program of China (ref. 2019YFC1711103), the Fujian Agriculture and Forestry University Science and Technology Innovation Special Fund Project (ref. KFA17331A), the Natural Science Foundation of Fujian (ref. 2019J01410), and the Fujian Agriculture and Forestry University 2015 Outstanding Youth Fund Project (ref. xjq201620). The authors thank Junjie Wei and Mingtao Jiang for collecting needed samples.

Author information

These authors contributed equally: Shasha Wu, Wei Sun, Zhichao Xu, Junwen Zhai

Authors and Affiliations

Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Shasha Wu, Junwen Zhai, Xiaoping Li, Chengru Li, Diyang Zhang, Xiaoqian Wu, Liming Shen, Xiaoyu Dai, Zhongwu Dai, Yamei Zhao, Lei Chen, Mengxia Cao, Xinyu Xie, Xuedie Liu, Donghui Peng, Jianwen Dong, Siren Lan & Zhong-Jian Liu
Institute of Chinese Materia Medica, Chinese Academy of China Medical Sciences, Beijing, 100700, China
Wei Sun & Shi-lin Chen
Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100193, China
Zhichao Xu
State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin’an, Hangzhou, 311300, China
Junhao Chen
Horticulture Research Institute, Guangxi Academy of Agricultural Sciences, Nanning, 530007, China
Hui Ren
Orchid Research and Development Center, National Cheng Kung University, Tainan, 701, China
Yu-Yun Hsiao & Wen-Chieh Tsai
Institute of Tropical Plant Sciences and Microbiology, National Cheng Kung University, Tainan City, 701, China
Yu-Yun Hsiao & Wen-Chieh Tsai

Authors

Shasha Wu
View author publications
You can also search for this author in PubMed Google Scholar
Wei Sun
View author publications
You can also search for this author in PubMed Google Scholar
Zhichao Xu
View author publications
You can also search for this author in PubMed Google Scholar
Junwen Zhai
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoping Li
View author publications
You can also search for this author in PubMed Google Scholar
Chengru Li
View author publications
You can also search for this author in PubMed Google Scholar
Diyang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoqian Wu
View author publications
You can also search for this author in PubMed Google Scholar
Liming Shen
View author publications
You can also search for this author in PubMed Google Scholar
Junhao Chen
View author publications
You can also search for this author in PubMed Google Scholar
Hui Ren
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoyu Dai
View author publications
You can also search for this author in PubMed Google Scholar
Zhongwu Dai
View author publications
You can also search for this author in PubMed Google Scholar
Yamei Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Lei Chen
View author publications
You can also search for this author in PubMed Google Scholar
Mengxia Cao
View author publications
You can also search for this author in PubMed Google Scholar
Xinyu Xie
View author publications
You can also search for this author in PubMed Google Scholar
Xuedie Liu
View author publications
You can also search for this author in PubMed Google Scholar
Donghui Peng
View author publications
You can also search for this author in PubMed Google Scholar
Jianwen Dong
View author publications
You can also search for this author in PubMed Google Scholar
Yu-Yun Hsiao
View author publications
You can also search for this author in PubMed Google Scholar
Shi-lin Chen
View author publications
You can also search for this author in PubMed Google Scholar
Wen-Chieh Tsai
View author publications
You can also search for this author in PubMed Google Scholar
Siren Lan
View author publications
You can also search for this author in PubMed Google Scholar
Zhong-Jian Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Shi-lin Chen, Wen-Chieh Tsai, Siren Lan or Zhong-Jian Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wu, S., Sun, W., Xu, Z. et al. The genome sequence of star fruit (Averrhoa carambola). Hortic Res 7, 95 (2020). https://doi.org/10.1038/s41438-020-0307-3

Download citation

Received: 21 November 2019
Revised: 21 February 2020
Accepted: 02 March 2020
Published: 01 June 2020
DOI: https://doi.org/10.1038/s41438-020-0307-3

This article is cited by

Haplotype-resolved genome assembly of Coriaria nepalensis a non-legume nitrogen-fixing shrub
- Shi-Wei Zhao
- Jing-Fang Guo
- Jian-Feng Mao
Scientific Data (2023)
Genome-wide Transcriptome Analysis Reveals the Gene Regulatory Network in Star Fruit Flower Blooming
- Si Qin
- Xiao-Ping Li
- Sha-Sha Wu
Tropical Plant Biology (2023)
Chromosome-scale assembly of the Dendrobium chrysotoxum genome enhances the understanding of orchid evolution
- Yongxia Zhang
- Guo-Qiang Zhang
- Zhong-Jian Liu
Horticulture Research (2021)

Subjects

Abstract

Similar content being viewed by others

Introduction

Results and discussion

Genome sequencing and characterization

Evolution of gene families

Collinearity analysis

Whole-genome duplication

MADS-box genes of star fruit

Analysis of NLR genes and the WRKY gene family

Conclusion

Material and methods

Plant sample preparation and sequencing

Genome assembly and chromosome anchoring

Identification of repetitive sequences

Gene prediction and annotation

Genome evolution analysis

Transcriptome sequencing and expression analysis

Phylogenetic reconstruction of the MADS-box gene family

NLR genes and WRKY TF identification

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflict of interest

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links