Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa

Qu, Minghao; Fan, Xiangrong; Hao, Chenlu; Zheng, Yi; Guo, Sumin; Wang, Sen; Li, Wei; Xu, Yanqin; Gao, Lei; Chen, Yuanyuan

doi:10.1038/s41597-023-02270-4

Data Descriptor
Open access
Published: 24 June 2023

Minghao Qu^1,2,na1 (ORCID: 0009-0001-1113-6427),
Xiangrong Fan^3,4,5,na1,
Chenlu Hao^1,2,
Yi Zheng⁶,
Sumin Guo¹,
Sen Wang⁶,
Wei Li^3,4,5 (ORCID: 0000-0003-4310-2544),
Yanqin Xu^7,na2,
Lei Gao^1,8,na2 (ORCID: 0000-0002-2435-3180) &
Yuanyuan Chen^3,4,na2 (ORCID: 0000-0002-7142-0788)

Scientific Data volume 10, Article number: 407 (2023)

Abstract

Water chestnut (Trapa L.) is a floating-leaved aquatic plant with high edible and medicinal value. In this study, we present chromosome-level genome assemblies of the cultivated large-seed species Trapa bicornis and its wild small-seed relative Trapa incisa, generated using PacBio HiFi long reads and Hi-C technology. The T. bicornis and T. incisa assemblies consisted of 479.90 Mb and 463.97 Mb contigs with N50 values of 13.52 Mb and 13.77 Mb, respectively, and repeat contents of 62.88% and 62.49%, respectively. A total of 33,306 and 33,315 protein-coding genes were predicted in the T. bicornis and T. incisa assemblies, respectively. We detected 159,232 structural variants between the two genomes, affecting more than 11 thousand genes. The phylogenetic analysis indicated that the lineage leading to Trapa diverged from the lineage leading to Sonneratia approximately 23 million years ago. These two assemblies provide valuable resources for future evolutionary and functional genomic research and for molecular breeding of water chestnut.

Background & Summary

Trapa L., known as water chestnut or water caltrop, is the only genus of Trapaceae. Although the Angiosperm Phylogeny Group (APG) IV system places Trapaceae within Lythraceae, some scholars still use the term “Trapaceae” because of a handful of morphological differences between the two families¹. Trapa plants are annual floating-leaved herbs that grow naturally in temperate, subtropical and tropical regions of the Old World and are invasive in Australia and North America². They reproduce sexually and/or asexually and have a high degree of autogamy^3,4. The genus has two diversity centers: the Yangtze River Basin (central China) and the Amur River-Tumen River Basin (the border between China and Russia)⁵. Trapa plants have high edible value because of their large starchy seeds, which have a long history of consumption. In China, archaeological studies found that water chestnut was widely eaten during the Neolithic Age (7000–2000 BC), with 21 unearthed sites in the basins of the Yellow River and Yangtze River⁶. In ancient Europe, inhabitants also gathered water chestnut seeds as part of their diet between 4000 and 1000 BC⁷. The cultivation of water chestnut can be traced back to the Tang (618–907 AD) and Song (960–1279 AD) dynasties⁸ in the middle and lower reaches of the Yangtze River. At present, it is an important aquatic crop widely grown in China and India⁹. Additionally, the tender Trapa seeds, stems and leaves are used as vegetables because of their fresh and sweet taste, whereas the seed pericarps are used in traditional Chinese medicine because their bioactive components are reported to act against cancer, inflammation and atherosclerosis^10,11,12. Furthermore, Trapa has significant ecological value in improving water quality because of its strong absorption capacity for heavy metals and pollutants¹³.

A better understanding of species identification, evolutionary relationships and genetic information will greatly facilitate the effective management and sustainable utilization of wild plant resources. However, the classification of Trapa species is still open to debate because their vegetative organs are morphologically similar and their seeds are highly variable. Some scholars argued that the genus contained more than 20, 30 or 70 species, while others merged them into one or two polymorphic species¹⁴. Quantitative taxonomic studies based on morphological variation showed that Trapa species with similar seed sizes were closely related, and that all species fell into two branches, the large-seed and small-seed clusters¹⁵. This was well supported by molecular studies based on chloroplast (cp) sequences^14,16. The cp genome analysis also showed that both the geographical origin and the tubercle morphology of seeds were of great significance for deducing relationships within Trapa¹⁴. Cytological studies showed two different chromosome numbers in Trapa (2n = 2x = 48 and 2n = 4x = 96) and suggested that the tetraploid might be a hybrid of diploids¹⁷, which was supported by molecular analyses based on allozymes as well as nuclear and chloroplast DNA sequences^18,19. The existence of two distinct subgenomes was directly confirmed by the recently published chromosome-level assembly of a tetraploid Trapa natans (AABB) genome⁸. Furthermore, the resequencing data showed that large-seed species include both diploids (2n = 2x = 48, AA) and tetraploids (2n = 4x = 96, AABB), whereas small-seed species include only diploids (2n = 2x = 48, BB)⁸. However, genome sequences of representatives of the ‘AA’ and ‘BB’ genomes have not been available, although such species are very common in the genus.

Here, we sequenced the genomes of the typical cultivated species Trapa bicornis Osbeck (AA) and a small-seed species, Trapa incisa Sieb. et Zucc. (BB), to deepen the understanding of Trapa diversity and the origin of tetraploid Trapa. De novo assembly using PacBio high-fidelity (HiFi) long reads generated 479.90 and 463.97 Mb contigs for T. bicornis and T. incisa with N50 values of 13.52 and 13.77 Mb, respectively. After scaffolding with Hi-C reads, 98.0% and 98.1% of the contigs were anchored into 24 pseudo-chromosomes for T. bicornis and T. incisa, respectively. We predicted 33,306 and 33,315 protein-coding genes in the T. bicornis and T. incisa genomes, respectively. Despite good collinearity, 159,232 structural variations (SVs) were identified between the two genomes, overlapping with more than 11 thousand genes. Divergence time estimation indicated that T. bicornis and T. incisa diverged around 1.51 million years ago. The two genomes provide baseline information on the diversity of Trapa species, which will facilitate functional genomic analysis and molecular breeding of water chestnut.

Methods

Sample collection and sequencing

Seeds of T. bicornis and T. incisa were collected from Honghu (29.39°N/113.07°E), Hubei province, China (Fig. 1). Plants were cultured outdoors from March to July in water tanks at Wuhan Botanical Garden, Chinese Academy of Sciences, Hubei province, China. For each species, 90-day-old individuals were used for DNA and RNA extraction.

Genomic DNA was isolated from fresh young leaves using the cetyltrimethylammonium bromide (CTAB) method²⁰. A total of 1.5 µg DNA per sample was used as input material for Illumina paired-end library construction. Each library, with an average insert size of 350 bp, was generated using the TruSeq Nano DNA HT Sample Preparation Kit (Illumina, USA) following the manufacturer’s instructions. The libraries were sequenced on the Illumina HiSeq X Ten system. A total of 125.97 Gb and 53.14 Gb paired-end reads (PE150), covering roughly 183.38× and 112.42× of the genomes, were generated for T. bicornis and T. incisa, respectively (Table 1).

Table 1 Sequencing data of the T. bicornis and T. incisa genomes.

For PacBio long-read sequencing, about 10 µg genomic DNA was sheared into fragments of 10–20 kb using a g-TUBE (Covaris, USA). The fragmented DNA was purified with AMPure PB magnetic beads. High-fidelity (HiFi) libraries were generated using the SMRTbell Express Template Prep Kit 2.0 and sequenced on the PacBio Sequel IIe platform (Pacific Biosciences, Menlo Park, USA). A total of 24.11 Gb and 20.42 Gb HiFi reads with N50 sizes of 17,588 bp and 13,963 bp were obtained using the CCS (Circular Consensus Sequencing) software with default parameters (https://ccs.how/), covering 49.23× and 43.20× of the T. bicornis and T. incisa genomes, respectively (Table 1).
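The fold-coverage figures quoted for the HiFi data follow from dividing read yield by genome size. The sketch below reproduces that arithmetic; it assumes the genome sizes are the contig-level assembly sizes reported under Genome assembly (489.65 Mb and 472.74 Mb), which is our reading rather than something the text states explicitly, and the function name is illustrative.

```python
def sequencing_depth(yield_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage for a read yield (in Gb) and a genome size (in Mb)."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    # 1 Gb = 1000 Mb, so convert the yield before dividing.
    return (yield_gb * 1000.0) / genome_size_mb


# HiFi yields are taken from the text; the genome sizes are ASSUMED to be
# the contig-level assembly sizes reported in the Genome assembly section.
hifi = {
    "T. bicornis": (24.11, 489.65),
    "T. incisa": (20.42, 472.74),
}

depths = {species: sequencing_depth(y, g) for species, (y, g) in hifi.items()}
for species, depth in depths.items():
    print(f"{species}: {depth:.2f}x")
```

Small differences in the second decimal place relative to the reported values are expected, because the inputs are themselves rounded.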

High-throughput chromosome conformation capture (Hi-C) libraries were constructed using 5 µg DNA. DNA was crosslinked with 4% formaldehyde. The crosslinked DNA was digested with the DpnII restriction endonuclease, labelled with biotin-14-dCTP and then ligated with T4 DNA ligase. The ligated DNA was sheared into 200–600 bp fragments and sequenced on the Illumina HiSeq X Ten system in paired-end mode. About 111.79 Gb and 103.65 Gb of raw data were obtained for T. bicornis and T. incisa, respectively (Table 1).

RNA was extracted separately from roots, petioles, leaves, flowers and fruits using the Tiangen RNAprep Pure Plant Kit (Tiangen Biotech, China). Libraries were constructed using the NEBNext Ultra™ RNA Library Prep Kit (NEB, USA) according to the manufacturer’s instructions and sequenced on the Illumina NovaSeq 6000 platform. RNA-seq datasets from different tissues of the same species were combined as evidence for genome annotation. A total of 34.05 Gb and 36.68 Gb RNA-seq reads were obtained for T. bicornis and T. incisa, respectively (Table 1).

Genome assembly

The PacBio HiFi reads of each genome were assembled de novo using hifiasm v0.16.1²¹ with default parameters. The assemblies had total sizes of 489.65 Mb and 472.74 Mb, comprising 325 and 262 contigs with N50 sizes of 13.52 Mb and 13.77 Mb for T. bicornis and T. incisa, respectively (Table 2). The cleaned Hi-C reads were mapped to the corresponding contigs using Juicer v1.9.9²². The uniquely mapped reads were taken as input for the 3D-DNA pipeline v180114²³ with the parameter “-r 0”, and the resulting scaffolds were then sorted and corrected manually using Juicebox v1.11.08²⁴. Finally, 24 pseudo-chromosomes were obtained for each genome, containing 98.01% and 98.14% of the assembled contigs for T. bicornis and T. incisa, respectively (Fig. 2).
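For readers unfamiliar with the contiguity statistics used here, the following minimal sketch shows how a contig N50 and the fraction of sequence anchored to pseudo-chromosomes can be computed from sequence lengths. The toy lengths are hypothetical and are not taken from the assemblies; the function names are illustrative.

```python
from typing import Iterable, Sequence


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


def anchored_fraction(anchored: Sequence[int], all_contigs: Sequence[int]) -> float:
    """Return the fraction of assembled bases placed on pseudo-chromosomes."""
    return sum(anchored) / sum(all_contigs)


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [50, 40, 30, 20, 10]
print("N50:", n50(toy_contigs))
print("anchored:", anchored_fraction([50, 40, 30], toy_contigs))
```

In practice the lengths would be read from the assembly FASTA index rather than typed in by hand.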

Table 2 Assessment of T. bicornis and T. incisa assemblies.

We assessed the completeness of the genomes using BUSCO v5.0 (Benchmarking Universal Single-Copy Orthologs)²⁵ with the ‘embryophyta_odb10’ database. The T. bicornis and T. incisa assemblies contained 97.70% [S:85.10%, D:12.60%, F:0.90%, M:1.40%, n:1614] and 97.80% [S:84.70%, D:13.10%, F:0.80%, M:1.40%, n:1614] of the 1,614 conserved genes, respectively, which is similar to the corresponding value for the diploid T. natans (C: 96.41% [S: 84.76%, D: 11.65%, F: 0.43%, M: 3.16%, n: 1614])²⁶. Based on the Illumina PE150 reads, we assessed the consensus quality values (QV) of the two assemblies using Merqury v2020-01-29²⁷ with “k-mer = 20”. For the T. bicornis and T. incisa assemblies, the read mapping rates were 99.88% and 99.61%, respectively, and the QV values were 49.70 and 43.91, respectively (Table 2). These evaluations indicate that the two genome assemblies have high completeness, contiguity and accuracy.
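Two of the metrics above can be checked by simple arithmetic. A consensus QV is a Phred-scaled error probability (QV = -10·log10 of the per-base error rate), and the BUSCO "complete" percentage is the sum of the single-copy (S) and duplicated (D) classes. The sketch below illustrates both; the helper names and the regular expression are our own and are not part of Merqury or BUSCO.

```python
import math
import re
from typing import Dict


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


def parse_busco(summary: str) -> Dict[str, float]:
    """Extract the S, D, F, M percentages and n from a BUSCO-style summary string."""
    pairs = re.findall(r"([SDFMn])\s*:\s*([\d.]+)", summary)
    return {key: float(value) for key, value in pairs}


# Values quoted in the text for the T. bicornis assembly.
busco = parse_busco("S:85.10%, D:12.60%, F:0.90%, M:1.40%, n:1614")
complete = busco["S"] + busco["D"]
print(f"BUSCO complete: {complete:.2f}%")
print(f"QV 49.70 -> error rate {qv_to_error_rate(49.70):.2e}")
```

Under this definition, a QV near 50 corresponds to roughly one consensus error per 100,000 bases.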

Genome annotation

Custom repeat libraries for each genome were constructed by screening the genome with LTR_FINDER²⁸, LTRharvest²⁹ and RepeatModeler v2.0.2a³⁰. Non-redundant repeats from the Repbase³¹ and Dfam³² databases were then extracted and added to the custom libraries. RepeatMasker v4.1.2-p1 (http://www.repeatmasker.org) was used to identify repeat sequences based on the custom libraries. A total of 307.95 Mb (62.88%) and 295.42 Mb (62.49%) of repetitive sequences were annotated in the T. bicornis and T. incisa genomes, respectively (Table 3).

Table 3 Genome annotation of repetitive sequences and protein-coding genes.

For protein-coding gene annotation, we combined RNA-seq-based, ab initio and homology-based predictions to identify gene models. The clean RNA-seq reads were aligned to the assemblies using HISAT2 v2.2.1³³, and the alignments were then assembled into transcripts in GTF format with StringTie2 v2.1.6³⁴. TransDecoder v5.5.0³⁵ was used to identify open reading frames (ORFs) and refine exon boundaries. Ab initio gene predictions were generated by three programs: Augustus v3.3.3³⁶, SNAP v2006-07-28³⁷ and GlimmerHMM v3.0.4^38,39. Proteins from Punica granatum⁴⁰, Arabidopsis thaliana TAIR10⁴¹, Eucalyptus grandis⁴², Melaleuca alternifolia⁴³ and tetraploid Trapa natans⁸ were aligned to the genomes using TBLASTN⁴⁴, and homologous genes were identified using Exonerate v2.2.0⁴⁵. The RNA-seq evidence, ab initio predictions and homology evidence were fed to MAKER v3.01⁴⁶ to generate the final gene set. A total of 33,306 and 33,315 protein-coding genes were predicted in the T. bicornis and T. incisa genomes, respectively.

Functional annotation of the protein-coding genes was performed against five public databases, GO (http://geneontology.org/), KEGG (https://www.kegg.jp/), GenBank nr (https://www.ncbi.nlm.nih.gov/), UniProt (https://www.uniprot.org/) and InterPro (http://www.ebi.ac.uk/interpro/), using DIAMOND v2.0.13.151⁴⁷. A total of 31,360 (94.14%) and 31,406 (94.27%) genes were successfully annotated in at least one database for T. bicornis and T. incisa, respectively (Table 3). BUSCO completeness of the predicted proteins was 97.70% and 98.10% for T. bicornis and T. incisa, respectively (Table 3).

Variations between the T. bicornis and T. incisa genomes

Single nucleotide polymorphisms (SNPs) between the genomes of T. bicornis and T. incisa were detected by aligning the two assemblies with NUCmer from MUMmer4⁴⁸. We set the minimum alignment length to 100 bp and retained only uniquely matching fragments. A total of 9,449,234 SNPs were identified with the show-snps tool from MUMmer4⁴⁸ (Fig. 3).
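As an illustration of how variant calls from an assembly-to-assembly alignment can be tallied, the sketch below counts substitutions and indels from tab-delimited records. It assumes (this is an assumption about the input, not a description of the authors' scripts) that each data line starts with four columns, reference position, reference base, query base and query position, with "." marking a gap, as in the tabular output of show-snps; header lines are skipped because their first field is not an integer.

```python
from typing import Dict, Iterable

BASES = set("ACGT")


def count_variants(lines: Iterable[str]) -> Dict[str, int]:
    """Count single-base substitutions and indels in tab-delimited variant records.

    ASSUMED layout per data line: ref_pos, ref_base, query_base, query_pos, ...
    A '.' in either base column is treated as a one-base indel.
    """
    counts = {"snp": 0, "indel": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header, blank or malformed line
        ref_base, query_base = fields[1].upper(), fields[2].upper()
        if ref_base in BASES and query_base in BASES and ref_base != query_base:
            counts["snp"] += 1
        elif "." in (ref_base, query_base):
            counts["indel"] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]",
    "101\tA\tG\t98\t12\t98",
    "250\t.\tT\t247\t3\t247",
    "300\tC\tT\t298\t50\t298",
]
example_counts = count_variants(example)
print(example_counts)
```

A real run would stream the file line by line instead of holding records in a list.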

To identify SVs, the T. incisa genome was mapped to the T. bicornis genome using Minimap2⁴⁹ with the parameter “-ax asm5”. Assemblytics was used to extract unique alignments and identify SVs from them⁵⁰. Protein-coding genes overlapping with SV regions were retrieved with BEDTools v2.29.1⁵¹. The final SVs were classified into seven categories: deletion, insertion, repeat contraction, repeat expansion, tandem contraction, tandem expansion and substitution. A total of 159,232 SVs were identified between the T. bicornis and T. incisa genomes, accounting for 110.49 Mb and 140.13 Mb of sequence in the two genomes, respectively (Table 4). These SVs overlapped with 11,265 and 11,621 genes in the two Trapa genomes, respectively.
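The gene-overlap step is conceptually an interval intersection between gene coordinates and SV coordinates. The following pure-Python sketch shows the idea on BED-style (0-based, half-open) intervals; it is a simplified stand-in for the BEDTools step, not the authors' command, and the gene and SV coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (chromosome, start, end), 0-based and half-open as in BED files.
Interval = Tuple[str, int, int]


def genes_overlapping_svs(genes: Dict[str, Interval], svs: List[Interval]) -> Set[str]:
    """Return the IDs of genes that share at least one base with any SV."""
    svs_by_chrom = defaultdict(list)
    for chrom, start, end in svs:
        svs_by_chrom[chrom].append((start, end))

    hits: Set[str] = set()
    for gene_id, (chrom, gene_start, gene_end) in genes.items():
        for sv_start, sv_end in svs_by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if sv_start < gene_end and gene_start < sv_end:
                hits.add(gene_id)
                break
    return hits


# Hypothetical coordinates for illustration only.
toy_genes = {
    "gene1": ("chr1", 100, 200),
    "gene2": ("chr1", 300, 400),
    "gene3": ("chr2", 100, 200),
}
toy_svs = [("chr1", 150, 160), ("chr1", 400, 450), ("chr2", 50, 100)]
affected = genes_overlapping_svs(toy_genes, toy_svs)
print(sorted(affected))
```

This nested loop is quadratic per chromosome; for genome-scale data a sorted sweep or an interval tree (or BEDTools itself) is the practical choice.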

Table 4 Structural variations detected between the T. bicornis and T. incisa genomes.

The synteny between the published tetraploid T. natans genome and the present two diploid Trapa genomes

Our new assemblies provide a valuable resource for investigating the origin of the Trapa tetraploid and the genomic changes that followed polyploidization. The genomes of T. bicornis and T. incisa and the two subgenomes of the published tetraploid genome were aligned pairwise using MUMmer4⁴⁸ (Fig. 4). The syntenic regions were extracted from the alignments with SyRI v1.4⁵². The T. bicornis and T. incisa genomes showed the highest percentage of syntenic regions with the A and B subgenomes of T. natans, respectively, suggesting that they represent the ancestral genomes of the A and B subgenomes, respectively. The percentage of syntenic regions between the A and B subgenomes (69.01%) was higher than that between the T. bicornis and T. incisa genomes (59.81%), which is consistent with homoeologous recombination events after tetraploidization⁵³.

Comparative genomics and divergence time estimation

Using OrthoFinder v2.5.2⁵⁴, orthologous groups were constructed for 11 species: Arabidopsis thaliana⁴¹, Brassica oleracea⁵⁵, Citrus sinensis⁵⁶, Corymbia citriodora⁵⁷, Eucalyptus grandis⁴², Melaleuca alternifolia⁴³, Punica granatum⁴⁰, Sonneratia alba²⁶, Trapa bicornis, Trapa incisa and tetraploid Trapa natans⁸ (AABB), whose genome was divided into its two subgenomes. A total of 1,105 single-copy orthologues were obtained and aligned using MUSCLE v3.8.31⁵⁸. The protein alignments were then converted into the corresponding nucleotide alignments. The final alignments of the orthologous groups were concatenated to build a maximum likelihood phylogenetic tree using RAxML v8.2.12⁵⁹ with the “GTRGAMMA” model. The phylogenetic tree was visualized with iTOL v6⁶⁰. Divergence times among the species were estimated using the MCMCTree program implemented in PAML v4.9i⁶¹. Reference divergence times were obtained from TimeTree (http://timetree.org/). Three species (Citrus sinensis, Arabidopsis thaliana and Brassica oleracea) were constrained at the root of the time-calibrated phylogeny. Owing to the lack of strong morphological evidence, the relationship between Trapa and Lythraceae has historically been unclear⁶². Here, our phylogenetic tree (Fig. 5) showed that Trapa was sister to the genus Sonneratia (Lythraceae s.l.), which is also supported by previous studies based on chloroplast and nuclear sequences^14,63,64. According to the time-calibrated phylogeny, the Trapa-Sonneratia clade diverged from Punica (Lythraceae) ca 35.24 million years ago (Mya). The two genera (Trapa and Sonneratia) then diverged ca 23 Mya, and the two Trapa species with distinct genomes (T. bicornis: AA; T. incisa: BB) diverged ca 1.5 Mya.
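The conversion of protein alignments into nucleotide alignments, followed by concatenation across orthologous groups, can be sketched as below. This is a minimal illustration of codon back-translation under stated assumptions (each coding sequence is in frame, matches its protein exactly, and may carry one trailing stop codon); it is not the authors' script, and the sequences and species labels are hypothetical.

```python
from typing import Dict, List


def protein_to_codon_alignment(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned CDS onto a gapped protein alignment, codon by codon."""
    residues = aligned_protein.replace(gap, "")
    if len(cds) == 3 * (len(residues) + 1):
        cds = cds[:-3]  # ASSUMPTION: the extra codon is a trailing stop codon
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length does not match the protein length")
    codons: List[str] = []
    position = 0
    for residue in aligned_protein:
        if residue == gap:
            codons.append(gap * 3)  # one protein gap becomes three nucleotide gaps
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end, keeping one row per species."""
    species = sorted(alignments[0])
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


# Hypothetical two-species, two-gene example.
gene_a = {
    "sp1": protein_to_codon_alignment("M-K", "ATGAAATAA"),
    "sp2": protein_to_codon_alignment("MLK", "ATGCTGAAA"),
}
gene_b = {
    "sp1": protein_to_codon_alignment("MA", "ATGGCT"),
    "sp2": protein_to_codon_alignment("MA", "ATGGCC"),
}
supermatrix = concatenate([gene_a, gene_b])
print(supermatrix)
```

Keeping gaps in multiples of three preserves the reading frame, which matters if codon positions are later partitioned in the phylogenetic analysis.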

Data Records

The raw Illumina PE150 reads, PacBio HiFi long reads and Hi-C reads from T. bicornis were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession numbers SRR22185068⁶⁵, SRR22185067⁶⁶ and SRR22185066⁶⁷, within BioProject PRJNA893431⁶⁸. The RNA-seq data for the five tissues are also available under PRJNA893431⁶⁸. For T. incisa, the raw Illumina, PacBio and Hi-C sequencing data were deposited in the SRA as SRR22094614⁶⁹, SRR22094613⁷⁰ and SRR22094612⁷¹ under PRJNA894094⁷², and the RNA-seq data are available under the same BioProject accession. The genome assemblies were deposited in GenBank under accessions GCA_030064425.1⁷³ and GCA_030064435.1⁷⁴, respectively. The genomes, annotation files and raw sequencing data have also been uploaded to the National Genomics Data Center (NGDC) under PRJCA012133⁷⁵ and PRJCA012134⁷⁶.

Technical Validation

The per-base quality scores and GC content of the raw Illumina sequencing data were inspected with FastQC v0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The contig-level and chromosome-level assemblies were assessed in four ways: N50 for contiguity, QV for accuracy, BUSCO for completeness and paired-end read mapping rate for consistency with the raw data. The protein-coding genes were validated by BUSCO scores and by annotation against functional databases. In the phylogenetic tree, every branch received 100% bootstrap support.

Code availability

The scripts and command lines are available on GitHub (https://github.com/fcbayern31/A-pipeline-for-common-genomic-analysis.git). All software used is in the public domain and was run in accordance with its official instructions. Any step not specified in the Methods was executed with default parameters.

References

The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181, 1–20 (2016).
Article Google Scholar
Chen, J., Ding, B. Y. & Funston, M. Trapaceae. In Flora of China 13, 290–291 (2007).
CAS Google Scholar
Arima, S., Daigoho, M. & Hoque, M. A. Flower development and anthesis behavior in the water chestnut (Trapa sp.). Bull. Fac. Agric. 84, 83–92 (1999).
Google Scholar
Li, X., Fan, X., Chu, H., Li, W. & Chen, Y. Genetic delimitation and population structure of three Trapa taxa from the Yangtze River, China. Aquat. Bot. 136, 61–70 (2017).
Article Google Scholar
Xue, Z., Xue, J., Victorovna, K. & Ma, K. The complete chloroplast DNA sequence of Trapa maximowiczii Korsh. (Trapaceae), and comparative analysis with other Myrtales species. Aquat. Bot. 143, 54–62 (2017).
Article CAS Google Scholar
Guo, Y., Wu, R., Sun, G., Zheng, Y. & Fuller, B. T. Neolithic cultivation of water chestnuts (Trapa L.) at Tianluoshan (7000-6300 cal BP), Zhejiang Province, China. Sci. Rep. 7, 16206 (2017).
Article ADS PubMed PubMed Central Google Scholar
Karg, S. The water chestnut (Trapa natans L.) as a food resource during the 4th to 1st millennia BC at Lake Federsee, Bad Buchau (southern Germany). Environ. Archaeol. 11, 125–130 (2006).
Article Google Scholar
Lu, R. et al. Genome sequencing and transcriptome analyses provide insights into the origin and domestication of water caltrop (Trapa spp., Lythraceae). Plant Biotechnol. J. 20, 761–776 (2022).
Article CAS PubMed Google Scholar
Hummel, M. & Kiviat, E. Review of world literature on water chestnut with implications for management in North America. J. Aquat. Plant Manage. 42, 17–28 (2004).
Google Scholar
Ciou, J., Wang, C., Chen, J. & Chiang, P. Total phenolics content and antioxidant activity of extracts from dried water caltrop (Trapa taiwanensis nakai) hulls. J. Food Drug Anal. 16, 41–47 (2008).
CAS Google Scholar
Yu, H. & Shen, S. Phenolic composition, antioxidant, antimicrobial and antiproliferative activities of water caltrop pericarps extract. Lwt-Food Sci. Technol. 61, 238–243 (2015).
Article CAS Google Scholar
Kauser, A. et al. In vitro antioxidant and cytotoxic potential of methanolic extracts of selected indigenous medicinal plants. Prog. Nutr. 20, 706–712 (2018).
CAS Google Scholar
Xu, L. et al. Assessment of the nutrient removal potential of floating native and exotic aquatic macrophytes cultured in swine manure wastewater. Int. J. Environ. Res. Public Health 17, 1103 (2020).
Article CAS PubMed PubMed Central Google Scholar
Fan, X. et al. Fifteen complete chloroplast genomes of Trapa species (Trapaceae): insight into genome structure, comparative analysis and phylogenetic relationships. BMC Plant Biol. 22, 230 (2022).
Article CAS PubMed PubMed Central Google Scholar
Fan, X. et al. Analysis of morphological plasticity of Trapa L. from China and their taxonomic significance. Plant Sci. J. 34, 340–351 (2016).
Google Scholar
Wang, W., Fan, X., Li, X. & Chen, Y. The complete chloroplast genome sequence of Trapa incisa Sieb. & Zucc. (Lythraceae). Mitochondrial DNA B Resour. 6, 1732–1733 (2021).
Article PubMed PubMed Central Google Scholar
Oginuma, K., Takano, A. & Kadono, Y. Karyomorphology of some Trapaceae in Japan. Acta Phytotax. Geobot. 47, 47–52 (1996).
Google Scholar
Kim, C., Ryun, N. H. & Choi, H. Molecular genotyping of Trapa bispinosa and T. japonica (Trapaceae) based on nuclear AP2 and chloroplast DNA trnL-F region. Am. J. Bot. 97, e149–152 (2010).
Article CAS PubMed Google Scholar
Takano, A. & Kadono, Y. Allozyme variations and classification of Trapa (Trapaceae) in Japan. Aquat. Bot. 83, 108–118 (2005).
Article CAS Google Scholar
Doyle, J. & Doyle, J. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15 (1987).
Google Scholar
Cheng, H., Concepcion, G., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Article CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Article CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258 e251 (2018).
Article CAS PubMed PubMed Central Google Scholar
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Article CAS PubMed PubMed Central Google Scholar
He, Z. et al. Evolution of coastal forests based on a full set of mangrove genomes. Nat. Ecol. Evol 6, 728–749 (2022).
Article Google Scholar
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Article CAS PubMed PubMed Central Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–268 (2007).
Article PubMed PubMed Central Google Scholar
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Article PubMed PubMed Central Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110, 462–467 (2005).
Article CAS PubMed Google Scholar
Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–82 (2013).
Article CAS PubMed Google Scholar
Kim, D., Paggi, J., Park, C., Bennett, C. & Salzberg, S. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Article CAS PubMed Google Scholar
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Article CAS PubMed Google Scholar
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Article PubMed PubMed Central Google Scholar
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Article CAS PubMed Google Scholar
Majoros, W. H. & Salzberg, S. L. An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinformatics 5, 206 (2004).
Article PubMed PubMed Central Google Scholar
Luo, X. et al. The pomegranate (Punica granatum L.) draft genome dissects genetic divergence between soft- and hard-seeded cultivars. Plant Biotechnol. J. 18, 955–968 (2020).
Article CAS PubMed Google Scholar
Berardini, T. Z. et al. The Arabidopsis information resource: Making and mining the “gold standard” annotated reference plant genome. Genesis 53, 474–485 (2015).
Article CAS PubMed PubMed Central Google Scholar
Myburg, A. A. et al. The genome of Eucalyptus grandis. Nature 510, 356–362 (2014).
Article ADS CAS PubMed Google Scholar
Voelker, J., Shepherd, M. & Mauleon, R. A high-quality draft genome for Melaleuca alternifolia (tea tree): a new platform for evolutionary genomics of myrtaceous terpene-rich species. Gigabyte 2021, 1–15 (2021).
Article Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Article CAS PubMed Google Scholar
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Article PubMed PubMed Central Google Scholar
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Article CAS PubMed PubMed Central Google Scholar
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Article CAS PubMed PubMed Central Google Scholar
Marcais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Article PubMed PubMed Central Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Article CAS PubMed PubMed Central Google Scholar
Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
Article CAS PubMed PubMed Central Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Goel, M. et al. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome biol. 20, 1–13 (2019).
Article Google Scholar
Gaeta, R. T. & Chris, P. J. Homoeologous recombination in allopolyploids: the polyploid ratchet. New Phytologist 186, 18–28 (2010).
Article CAS PubMed Google Scholar
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Article PubMed PubMed Central Google Scholar
Parkin, I. A. et al. Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biol. 15, R77 (2014).
Article PubMed PubMed Central Google Scholar
Wang, L. et al. Somatic variations led to the selection of acidic and acidless orange cultivars. Nat. Plants 7, 954–965 (2021).
Article CAS PubMed Google Scholar
Healey, A. L. et al. Pests, diseases, and aridity have shaped the genome of Corymbia citriodora. Commun. Biol. 4, 537 (2021).
Article CAS PubMed PubMed Central Google Scholar
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Article CAS PubMed PubMed Central Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49, W293–W296 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Article CAS PubMed Google Scholar
Graham, S. A., Crisci, J. V. & Hoch, P. C. Cladistic analysis of the Lythraceae sensu lato based on morphological characters. Bot. J. Linn. Soc. 113, 1–33 (1993).
Article Google Scholar
Graham, S. A., Hall, J., Sytsma, K. & Shi, S. H. Phylogenetic analysis of the Lythraceae based on four gene regions and morphology. Int. J. Plant Sci. 166, 995–1017 (2005).
Article CAS Google Scholar
Huang, Y. L. & Shi, S. H. Phylogenetics of Lythraceae sensu lato: a preliminary analysis based on chloroplast rbcL gene, psaA-ycf3 spacer, and nuclear rDNA internal transcribed spacer (ITS) sequences. Int. J. Plant Sci. 163, 215–225 (2002).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22185068 (2022).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22185067 (2022).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22185066 (2022).
NCBI BioProject https://www.ncbi.nlm.nih.gov/bioproject/PRJNA893431 (2022).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22094614 (2022).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22094613 (2022).
NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/SRR22094612 (2022).
NCBI BioProject https://www.ncbi.nlm.nih.gov/bioproject/PRJNA894094 (2022).
NCBI GenBank https://www.ncbi.nlm.nih.gov/assembly/GCA_030064425.1 (2022).
NCBI GenBank https://www.ncbi.nlm.nih.gov/assembly/GCA_030064435.1 (2022).
NGDC BioProject https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA012133 (2022).
NGDC BioProject https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA012134 (2022).

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (32170395 and 82060684), the Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Talent Program of Wuhan Botanical Garden, Chinese Academy of Sciences (Y855291), Young and Middle-Aged Talents Training Program of Traditional Chinese Medicine of Jiangxi Province (2020-2) and Jiangxi University of Chinese Medicine Science and Technology Innovation Team Development Program (CXTD22002).

Author information

These authors contributed equally: Minghao Qu, Xiangrong Fan.
These authors jointly supervised this work: Yanqin Xu, Lei Gao, Yuanyuan Chen.

Authors and Affiliations

Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Plant Germplasm Research Center, Wuhan Botanical Garden, Innovative Academy of Seed Design, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
Minghao Qu, Chenlu Hao, Sumin Guo & Lei Gao
University of Chinese Academy of Sciences, Beijing, 100049, China
Minghao Qu & Chenlu Hao
Aquatic Plant Research Center, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
Xiangrong Fan, Wei Li & Yuanyuan Chen
Hubei Key Laboratory of Wetland Evolution & Ecological Restoration, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
Xiangrong Fan, Wei Li & Yuanyuan Chen
Research Center for Ecology, College of Science, Tibet University, Lhasa, Tibet, 850000, China
Xiangrong Fan & Wei Li
Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Bioinformatics Center, Beijing University of Agriculture, Beijing, 102206, China
Yi Zheng & Sen Wang
College of Pharmacy, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, 330004, China
Yanqin Xu
Hubei Hongshan Laboratory, Wuhan, Hubei, 430070, China
Lei Gao

Authors

Minghao Qu
Xiangrong Fan
Chenlu Hao
Yi Zheng
Sumin Guo
Sen Wang
Wei Li
Yanqin Xu
Lei Gao
Yuanyuan Chen

Contributions

L.G. and Y.C. conceived this project; X.F. and Y.C. collected the samples; M.Q. and C.H. performed the data analyses; M.Q. and Y.C. wrote the manuscript; L.G., Y.C., Y.X., Y.Z., S.W., W.L. and S.G. revised the manuscript. All authors have read and approved the final manuscript for publication.

Corresponding authors

Correspondence to Yanqin Xu, Lei Gao or Yuanyuan Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Qu, M., Fan, X., Hao, C. et al. Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa. Sci Data 10, 407 (2023). https://doi.org/10.1038/s41597-023-02270-4

Received: 07 November 2022
Accepted: 26 May 2023
Published: 24 June 2023
DOI: https://doi.org/10.1038/s41597-023-02270-4