Chromosome-scale genomes of commercial timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis)

Sahu, Sunil Kumar; Liu, Min; Chen, Yewen; Gui, Jinshan; Fang, Dongming; Chen, Xiaoli; Yang, Ting; He, Chengzhong; Cheng, Le; Yang, Jinlong; Sahu, Durgesh Nandini; Li, Linzhou; Wang, Hongli; Mu, Weixue; Wei, Jinpu; Liu, Jie; Zhao, Yuxian; Zhang, Shouzhou; Lisby, Michael; Liu, Xin; Xu, Xun; Li, Laigeng; Wang, Sibo; Liu, Huan

doi:10.1038/s41597-023-02420-8

Download PDF

Data Descriptor
Open access
Published: 03 August 2023

Chromosome-scale genomes of commercial timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis)

Scientific Data volume 10, Article number: 512 (2023) Cite this article

2161 Accesses
6 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Wood is the most important natural and endlessly renewable source of energy. Despite the ecological and economic importance of wood, many aspects of its formation have not yet been investigated. We performed chromosome-scale genome assemblies of three timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis) which exhibit different wood properties such as wood density, hardness, growth rate, and fiber cell wall thickness. The combination of 10X, stLFR, Hi-Fi sequencing and HiC data led us to assemble high-quality genomes evident by scaffold N50 length of 55.97 Mb (O. pyramidale), 22.37 Mb (M. ferrea) and 14.55 Mb (T. grandis) with >97% BUSCO completeness of the assemblies. A total of 35774, 24027, and 44813 protein-coding genes were identified in M. ferrea, T. grandis and O. pyramidale, respectively. The data generated in this study is anticipated to serve as a valuable genetic resource and will promote comparative genomic analyses, and it is of practical importance in gaining a further understanding of the wood properties in non-model woody species.

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Article Open access 15 April 2024

Phylogenomics and the rise of the angiosperms

Article Open access 24 April 2024

Genetic gains underpinning a little-known strawberry Green Revolution

Article Open access 19 March 2024

Background & Summary

Wood being the most important natural and permanently sustainable energy source, plays an important role as an eco-efficient alternative to fossil fuels^1,2. Wood, also known as secondary xylem, is the major structure that gives stability to woody plants and supplies all other plant tissues with water from the roots. Over the last decades, our knowledge of cellular wood formation (xylogenesis) has increased significantly^2,3,4,5. Wood is formed due to the action of the vascular cambium, which is composed of meristematic initials that generate phloem or xylem precursor cells. Hormonal signals predominate in the proliferation of the cells of the vascular cambium, and the plant hormone auxin plays a critical role in wood formation^6,7,8. Although many genes encoding the components (cellulose, xylan, glucomannan, and lignin) of wood biosynthesis have been identified and functionally characterized in poplar and Arabidopsis^8,9,10,11, how and which genes particularly affect the wood properties are largely unknown. To tailor wood for our use, it is critical to dissect the molecular and biochemical mechanisms controlling wood formation¹². The knowledge gained from such studies can be applied to genetically modify the wood quantity and quality¹³.

With the sequencing of the genomes of increasing numbers of tree species (excluding model plants)^{14,15,16,17,18,19,20}, it is now possible to uncover the molecular mechanisms controlling the formation of wood. So far, the genome sequences of several important tree species have been released like Populus trichocarpa²¹, Eucalyptus grandis²², Picea abies (Norway spruce)²³, Broussonetia papyrifera (Paper Mulberry)²⁴, Morus notabilis (Mulberry tree)²⁵, Tectona grandis (Teak tree)¹⁷, Dalbergia odorifera²⁶, Dipterocarpus turbinatus and Hopea hainanensis²⁷. Despite the ecological and economic importance of wood, not all aspects of its formation have been unveiled. Therefore, for the present study, we selected three non-model tree species which exhibit different wood properties such as wood density, hardness, growth rate, and fiber cell wall thickness (Fig. 1a, Table S1).

Ochroma pyramidale or balsa is a very fast-growing evergreen tree of Malvaceae family that can reach 25 m in 5 years, and prefer to be grown in a warm and humid environment. It is grown for the production of balsa wood, a very soft and light wood, with a coarse, open grain, and it is one of the lightest wood in trade (USDA, 2019) (Table S1). The density of dry balsa wood ranges from 40–340 kg/m³, with a typical density of about 160 kg/m³. This spongy texture of the wood is due to its large cells that are filled with water (Fig. 1a, Fig. S1).

Mesua ferrea, the ironwood, Indian rose chestnut, or cobra’s saffron, is an evergreen species of Calophyllaceae (Table S1). This slow-growing tree is named after the heaviness (high wood density) and hardness of its timber (Fig. 1a), and it typically grows in tropical and sub-tropical regions. The density is 940 to 1,195 kg/m³ (59 to 75 lb/ft3) at 15% moisture content. Since it is difficult to saw, it is mostly used for railroad ties and heavy structural timber (FAO, 2016) (Fig. S2).

Teak (Tectona grandis L.f.;2n = 2x = 36) is a tropical and deciduous hardwood tree species belonging to Lamiaceae family (Table S1). It can survive and grow in a variety of climatic and edaphic conditions, but it thrives in a warm, moist, tropical climate with a wide difference in dry and wet seasons. Teak is a highly prestigious wood because of its look, strength (intermediate wood density, Fig. 1a) and resistance to decay and is commonly used in the construction of ships, boats, furniture and aesthetic needs²⁸, including several medicinal properties²⁹ (Fig. S3).

Considering the lacunae of key genomic resources with respect to wood formation, fast growth, and genetic architecture of woody plants, we present high-quality chromosome-scale genomes of three economically important timber trees which display varying growth and wood traits. A total of 35774, 24027, and 44813 protein-coding genes were identified in M. ferrea, T. grandis and O. pyramidale, respectively. Based on the K-mer analysis, the genome sizes were estimated to be 1,884 Mb, 534 Mb and 296 Mb for O. pyramidale, M. ferrea and T. grandis, respectively (Table S2). The final assembled genomes showed 552.7 Mb and 306.08 Mb with scaffold N50 values of 0.76 Mb and 0.34 Mb for M. ferrea and T. grandis, respectively (Figs. 1a, 2, Figs. S4, S5, Table 1, Table S3). While for O. pyramidale, the preliminary genome assembly based on the sequencing reads generated by Hi-Fi technology reached 1.84 Gb with a contig N50 of 42.7 Mb (Table S3). We discovered the existence of whole genome duplications (WGDs) in all three timber trees. Noticeably, O. pyramidale retained huge numbers of WGD genes compared to others, and the post-WGD retained duplicated gene pairs likely triggered the huge expansion and expression of several wood-formation related TFs (NAC and MYBs), auxin hormone signaling, nutrition and energy supply, and CAzyme-related genes in the fast-growing O. pyramidale compared to other tree species. The comprehensive data of these forest trees will serve as a valuable genetic resource, and thus it is of practical importance in gaining further understanding of the complex process of wood biosynthesis, and the basis of the physical and chemical properties of wood.

Table 1 The assembly and annotation statistics.

Full size table

Methods

Sample collection, DNA/RNA extraction, library construction and sequencing

The fresh young leaves of M. ferrea (HCNGB_00001601) and T. grandis (HCNGB_00001711) were collected from Ruili Botanical Garden, Yunnan, China, while samples of O. pyramidale (HCNGB_ 00009295) were procured from Xishuangbanna Tropical Botanical Garden, Yunnan. The DNA was extracted using CTAB (cetyltrimethylammonium bromide) method³⁰. The amount of DNA extracted from each sample was determined using a Qubit 4.0 fluorometer (Invitrogen, USA). The quality of the extracted DNA was assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA). The DNA was considered pure if the OD260/280 ratio was between 1.8 and 2.0, and the OD260/230 ratio was between 2.0 and 2.2. The molecular weight of the DNA was evaluated using pulse-field gel electrophoresis (PFGE), and the DNA above 40 kilobases (Kb) was considered for library construction. The GEM and barcode sequences were generated based on the standard protocol (Chromium Genome Chip Kit v1, 10X Genomics, Pleasanton, USA) for M. ferrea and O. pyramidale (Fig. S4). The barcode libraries were sequenced on a BGISEQ-500 platform to generate 150-bp read pairs³¹. For Tectona grandis the library construction and sequencing were performed using the Single Tube Long Fragment Read (stLFR) technology³², a method that allows data from long DNA molecules to be sequenced using low-cost second-generation sequencing technology. The library was sequenced on a BGISEQ-500 platform to generate 100-bp and 140-bp reads for read1 and read2, respectively (Table S4). We constructed a SMRTbell target size library using 15 µg high molecular weight (HMW) genomic DNA and used the standard methodology for PacBio-HiFi sequencing. We used Sequencing Primer V2 and Sequel II Binding Kit 2.0 in Grandomics to shear genomic DNA to expected size of fragments for sequencing on a PacBio Sequel II instrument (Table S4).

We also collected branch samples from each tree species to collect xylem and phloem tissues³³. The fresh young leaves, phloem and xylem were used to extract the RNA by using the PureLink RNA Mini Kit (Thermo Fisher Scientific, Carlsbad, CA, USA). The samples with RNA integrity number (RIN) value above seven were considered for further sequencing. RNA libraries were constructed by using the TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA, USA), and were then sequenced on the BGISEQ-500 platform (paired-end, 100-bp reads or 150-bp reads) (Table S5). The RNA reads were filtered by the Trimmomatic³⁴ with the parameters: ILLUMINACLIP:adapter.fa:2:30:20:8:true HEADCROP:5 LEADING:3 TRAILING:3 SLIDINGWINDOW:5:8 MINLEN:50.

Hi-C library construction and sequencing

The Hi-C libraries were constructed by utilizing the MboI restriction enzyme and following the in situ ligation protocols³⁵. The chromatin digested with MboI was labeled at the ends with biotin-14-dATP (Thermo Fisher Scientific, Waltham, MA, USA) and employed for in situ DNA ligation. Subsequently, the DNA was extracted, purified, and sheared using Covaris S2 (Covaris, Woburn, MA, USA). Following A-tailing, pull-down, and adapter ligation, the DNA libraries were subjected to sequencing on a BGISEQ-500 to generate 100-bp read pairs (Table S6). Hi-C data enabled the identification of 42, 20 and 18 chromosomes for O. pyramidale, M. ferrea and T. grandis, respectively, which was consistent with their reported chromosome numbers (Fig. 1a)¹⁷.

Evaluation of genome size

The obtained DNA sequencing reads from the 10X and stLFR libraries were filtered using SOAPnuke³⁶ with the parameters (-l 10 -q 0.1 -n 0.01 -Q 2 -d --misMatch 1 --matchRatio 0.4). Clean reads from paired-end libraries were used to estimate genome sizes (Table S2, Table 1). To conduct the k-mer frequency distribution analysis, the following formula was employed:

$$Gen=Num* \left(Len-17+1\right)/K\_Dep$$

Where, Num = read number of reads used, Len = the read length, K = k-mer length, and K_Dep = main peak’s location.

Genome assembly

We used Supernova (version 2.1.1)³⁷, a de novo assembly program designed to assemble diploid genomes using Linked-Reads (10X, stLFR libraries sequences) using the default parameters, and exported into fasta format using the ‘pseudohap2’ style. The GapCloser³⁸ was used to fill the gap using the parameters “-l 150” for each species except Tectona grandis. For Hi-fi reads, first, we use CCS (version 6.0.0) with the parameter –min-passes 3, then samtools (version 1.11)³⁹ to convert the bam file to the fastq file. The fastq was then used as the input file for hifiasm software⁴⁰ with the default parameters.

The Hi-C reads were quality controlled and aligned to each species’ genome assembly using Juicer with default settings⁴¹. Next by using the 3D-DNA pipeline (with default parameters)⁴², an initial assembly at the superscaffold-level was automatically generated. This corrected mis-joins, arranged the scaffolds in the proper order and orientation, and organized them from the initial draft assembly. Manual inspections and refinements of the draft assembly were performed using Juicebox Assembly Tool⁴³ to ensure accuracy. From the Hi-C interaction map in Fig. 2, it appears that different chromosomes interact repeatedly (Fig. S5). This phenomenon is quite common, and has been observed in other organisms as well, such as the cotton genome⁴⁴, Kiwifruit⁴⁵, and Phoebe tree⁴⁶. There are a number of possible explanations for this phenomenon. One possibility is that it is due to the presence of repetitive DNA sequences in the genome. These sequences are often found in tandem repeats, which means that they are repeated multiple times in close proximity to each other. This can lead to increased levels of interaction between different chromosomes, as the repetitive sequences can bind to each other. Another possibility is that the repeated interaction is due to the presence of functional elements that are shared between different chromosomes. For example, there are a number of genes that are involved in DNA repair and replication, and these genes are often found on multiple chromosomes. This means that the chromosomes that contain these genes may interact with each other more frequently, as they are likely to be involved in the same biological processes.

Repeat annotation

To characterize the repeat elements based on homology, the alignment of the genome assembly was performed using Repbase v.21.01⁴⁷ and RepeatMasker v.4.0.6⁴⁸. For the de novo approach, RepeatModeler v.1.0.8⁴⁹ and LTR Finder v.1.0.6⁵⁰ were employed to construct a de novo repeat library using the genome assembly. Subsequently, RepeatMasker v.4.0.6⁴⁸ was used to identify and annotate repeat elements in the genome. Tandem repeats in the genome were annotated using TRF v. 4.07⁵¹. Finally, the repetitive regions of the genome were masked prior to gene prediction.

Both M. ferrea and T. grandis displayed comparatively low repetitive elements (REs), which accounted for 33.93% (188.11 Mb) and 31.3% (95.80 Mb) of assemblies, respectively (Fig. 2d, Table S7). While O. pyramidale contains 1269.55 Mb sequences representing 68.96% of the assembled genome, particularly long terminal repeat (LTR) Gypsy and Copia types of retrotransposons which accounted for ~64% of total genome size.

Gene annotation analysis

The gene structures were predicted using the MAKER-P pipeline (version 2.31)⁵², which relied on RNA, homologous, and de novo prediction evidence (Table 1). To obtain the RNA evidence, the clean transcriptome reads were assembled into inchworms using Trinity (version 2.0.6)⁵³, and these sequences were subsequently supplied to MAKER-P as expressed sequence tags.

To perform homologous comparisons, we obtained protein sequences from either the model plant or closely related species for each of the species being analyzed. For de novo prediction, we created several training sets to optimize various ab initio gene predictors. Initially, we employed a genome-guided approach using Trinity⁵³ to generate a set of transcripts. These transcripts were then mapped back to the genome using PASA (version 2.0.2)⁵⁴, resulting in the generation of gene models that possessed realistic gene characteristics such as size, number of exons/introns per gene, and splicing site features. The complete gene models selected in Augustus⁵⁵ were used for training. Genemark-ES (version 4.21)⁵⁶ was self-trained using default parameters.

The initial phase of MAKER-P utilized the aforementioned evidence, employing default parameters except for “est2genome” and “protein2genome,” which were set to “1.” This resulted in the generation of gene models supported solely by RNA and protein data. Subsequently, SNAP⁵⁷ was trained using these gene models. The second and final rounds of MAKER-P were executed with default parameters, culminating in the production of the final gene models.

Functional annotation

Protein-coding genes were functionally annotated by aligning the predicted amino acid sequences with public databases, using sequence similarity and domain conservation as criteria (Table 1). To identify the best matches, the annotation process involved searching the protein-coding genes against protein sequence databases, including the Kyoto Encyclopedia of Genes and Genomes (KEGG)⁵⁸, the National Center for Biotechnology Information (NCBI) non-redundant (NR) and KOG databases⁵⁹, SwissProt⁶⁰ and TrEMBL using BLASTP with an E-value cut-off of 1e-5. Then, InterProScan 55.0 (InterProScan)⁶¹ was used to identify domains and motifs based on Pfam⁶², SMART⁶³, PANTHER⁶⁴, PRINTS⁶⁵ and ProDom⁶⁶ (Table S8).

Annotation of non-coding RNAs

Ribosomal RNA (rRNA) genes were searched against the A. thaliana rRNA database using BLAST. MicroRNAs (miRNA) and small nuclear RNA (snRNA) were searched against the Rfam database⁶². tRNAscan-SE was also used to scan for tRNAs⁶⁷ (Table S9).

Identification of gene families and phylogenetic analysis

The protein-coding genes of 22 representative species were selected for gene families analysis using OrthoFinder (version 2.3.14)⁶⁸ with default parameters (Fig. 1b, Table S10). However, we only obtained 5 single-copy orthogroups, which was not enough to construct a phylogenetic tree. Therefore, we added some low-copy orthogroups (which contain more than one gene per species) to improve the robustness of our tree. Totally, 364 low-copy orthogroups were used for phylogenetic tree construction. The protein sequences from the 364 low-copy orthogroups were extracted and aligned by using MAFFT (version 7.310)⁶⁹. We used a Perl script to trim the gaps of the aligned sequences, and then concatenated them into a supergene sequence. The phylogenetic tree was subsequently constructed by IQ-TREE (version 1.6.1)⁷⁰ with “-bb 2000 -alrt 1000”. Tectona grandis genomes were compared with the previously published genome using nucmer (version 4.0.0rc1) (Fig. S6). The results showed that stLFR was able to generate high-quality sequence data that was comparable to PacBio sequencing (Table S11). By offering a more cost-effective and computationally efficient alternative to PacBio sequencing, stLFR could make genomic data more accessible to a wider range of researchers.

Next, we identified that most gene families of O. pyramidale, M. ferrea and T. grandis were commonly shared including other representative trees, but 2846, 2546 and 1246 gene families appeared to be unique in genomes of O. pyramidale, M. ferrea and T. grandis, respectively (Fig. 1c). The homolog matrix of orthogroups was analyzed to infer ancestral and lineage-specific gene family dynamics along the phylogenetic tree, resulting in the expansion of 3921, 2737 and 715 gene families, while 1217, 1434 and 2202 were contracted in genomes of O. pyramidale, M. ferrea and T. grandis, respectively. The number of expanded gene families in O. pyramidale and M. ferrea was remarkably higher than T. grandis (Fig. 1b).

Whole-genome duplication (WGD) and single-gene duplication

The DupGen_finder (version 5.16.3)⁷¹ was used to identify different modes of duplicated gene pairs, such as WGD, tandem duplication, proximal duplication, transposed duplication, dispersed duplication, and the number of gene pairs are shown in Table S12. The coding sequences were used for estimating a synonymous substitution rate (Ks) using the wgd software (version 1.0)⁷². Then, we constructed the Ks distribution plot using the Ks values of the WGD gene pairs.

Whole genome duplication (WGD) is considered to be the main factor driving genome evolution and expansion^73,74,75. The distribution of Ks for the orthologs of O. pyramidale and Durio zibethinu showed peak at ~0.19 (Fig. 3a). While their paralog peaks at 0.15 and 0.17 for O. pyramidale and D. zibethinu suggested that they experienced one lineage-specific WGD after the divergence. T. grandis and Fraxinus pennsylvanica⁷⁶ displayed individual WGD events after their divergence (Fig. 3a).

Notably, we observed a large number of paralogous genes retained in O. pyramidale genome after its specific WGD event compared to other selected species (Fig. 3b,c). KEGG enrichment analysis indicated the WGD duplicated gene pairs in O. pyramidale retained several functions such as plant hormone signal transduction, MAPK signaling pathway, and circadian rhythm (Fig. 3c, Table S13). Next, we investigated the WGD duplication-induced expansion in gene families by combining and overlapping WGD duplicated gene pairs and expanded gene families (EGFs). We identified a total of 42,855, 32,804 and 208,22 duplicated genes from O. pyramidale, M. ferrea and T. grandis genomes, respectively, which were classified into five categories, that is, the WGD duplicates, tandem duplicates (TD), transposed duplicates (TRD), proximal duplicates (PD) and dispersed duplicates (DD) (Table S14). Interestingly, WGD events contributed to the highest proportion of expansion of the gene families compared with other duplication types in O. pyramidale (89.74%) (Fig. 3b). However, O. pyramidale contains abundant TEs, a relatively low proportion of expanded gene families caused by TRD compared to M. ferrea. For M. ferrea, a higher proportion of expanded gene families was induced by TRD and TD but not WGD (Fig. 3a).

KEGG enrichment of expanded gene families induced by WGD indicated that genes related to Plant hormone signal transduction, Photosynthesis, MAPK signaling pathway, Nitrogen metabolism, alpha−Linolenic acid metabolism, Biosynthesis of unsaturated fatty acids, Fatty acid metabolism etc., might be mainly caused by WGD events in O. pyramidale (Tables S13, S4, Fig. 3d).

Identification of transcription factors (TFs), copy number normalization and gene expression analyses

The protein sequences were downloaded from the Plant Transcription Factor Database (PlantTFDB)⁷⁷, and aligned by MAFFT (version 7.310)⁶⁹, later used for HMMs (Hidden Markov Models) construction for each type of TFs by using Hmmer (version3.1b2)⁷⁸.

The protein sequences of the representative species were searched against the HMMs of different type TFs by HMMsearch (version 3.1b2)⁷⁸, and filtered with an e-value cutoff of 1e-5. Subsequently, the protein sequences of ARF, HD-Zip, WOX, MYB, and NAC gene families of each species were aligned by MAFFT⁶⁹, and the gene family trees were constructed by IQ-TREE (version 1.6.1)⁷⁰ with the parameters “-bb 1000”. Instead of directly comparing gene copy numbers among selected species, we normalized the data based on the genome size, and then performed the comparison, avoiding any sort of data bias⁷⁹.

Plant rapid growth needs extensive remodeling and modifications of cell wall to allow the cell wall to be flexible^80,81,82; we thus investigated the enzymes of carbohydrate metabolism, including the glycoside hydrolases (GH) and glycosyltransferases (GT)⁸⁰. We selected the protein sequences which are involved in the synthesis and regulation of the cell wall components, auxins, ethylene and salicylic acid in Arabidopsis as query, and blast with the protein sequences of representative species with an e-value cutoff of 1e-5, and then checked the gene function by SwissProt annotation. The two Malvaceae genomes contained 658/664 GH and 748/818 GT genes in O. pyramidale and B. ceiba, respectively, which is remarkably higher than other selected species (Fig. 4a, Table S15). Next, we investigated GTs related to the cell wall assembly. The copy number of the genes involved in cellulose synthesis of the primary cell wall in O. pyramidale was higher than M. ferrea. Also, the copy numbers of genes involved in hemicellulose synthesis of cell wall in O. pyramidale were higher than M. ferrea and other tree species (Fig. 4b, Table S16). Additionally, the copy number of GAUTs (galacturonosyltransferases) were more than two-fold higher than other selected species. GAUTs are mainly involved in pectin and/or xylan biosynthesis in cell walls, thereby affecting the overall growth of the tree and wood plasticity⁸³. Furthermore, our phylogenetic analysis using the IQ tree grouped GAUTs into eleven sub-groups, and GAUT11/8/15 and 5,6 particularly showed higher expansion in O. pyramidale compared to other species (Fig. 4c, Table S17).

Lignin is one of the main components of trees or plants, constituting around 30% of the dry mass of wood, and lignin gives trees their rigidity^12,84,85. To explore the likely reason behind the heaviness (high wood density) of M. ferrea, we compared the genes involved in the lignin biosynthesis in O. pyramidale and other tree species. We found Caffeic acid O-methyltransferases (COMT) and Cinnamoyl CoA Reductase (CCR) exhibited extensive expansion in M. ferrea (Fig. 4d, Table S18).

Data Records

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive⁸⁶ in the National Genomics Data Center⁸⁷, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA011294)⁸⁸ that are publicly accessible. The assembled contigs or scaffolds genomes are submitted to the Genome Warehouse under the accession number GWHDOCN00000000⁸⁹, GWHDOCP00000000⁹⁰ and GWHDOCQ00000000⁹¹ of O. pyramidale, M. ferrea and T. grandis. The assembled pseudo-chromosome genomes were submitted to the Genome Warehouse under the accession number GWHDOCO00000000⁹², GWHDOCR00000000⁹³ and GWHDOCS00000000⁹⁴ of O. pyramidale, M. ferrea and T. grandis. The data sets generated and analyzed during the current study are also available in the CNGB Nucleotide Sequence Archive (CNSA) under accession numbers CNP0001860 and CNP0001861.

The annotation files are available in Figshare⁹⁵. All other data generated or analyzed during this study are included in this article and its supplementary information files. WGS data for Tectona grandis⁹⁶, and Mesua ferrea⁹⁷ from PRJNA438407 were obtained from the respective accession numbers for the genome size estimation only.

Technical Validation

Completeness assessment was performed using BUSCO (Bench-marking Universal Single-Copy Orthologs) version 3.0.1⁹⁸ with Embryophyta odb9 database. From the 1,375 core Embryophyta genes, 1,338 (97.3%), 1,329 (96.6%), 1,340 (97.5%), 1,323 (96.2%), 1,340 (97.5%), 1,335 (97.0%) and 1,335 (97.1%) were identified in the M. ferrea, D. sissoo, K. senegalensis, S. macrophylla, T. grandis and O. pyramidale, respectively (Supplemental Tables 19, 20). To further evaluate the completeness of the assembled genome, we performed short reads mapping using clean raw data. In total, 95.23%, 98.54%, 99.04%, 97.68%, 97.43%, and 98.05% reads were mapped to the genomes, among which 83.56%, 87.63%, 94.39%, 82.72%, 90.55%, and 90.61% of them were properly paired to M. ferrea, D. sissoo, K. senegalensis, S. macrophylla, T. grandis and O. pyramidale, respectively. The transcriptome sequences were assembled by using Bridger tool⁹⁹, then mapping to the scaffold assembly was performed by using the BLAT software¹⁰⁰ (Table S5). The BUSCO analysis was again performed after the Hi-C assembly which gave similar results as that of 10X and stLFR genome assemblies (Supplemental Tables 21, 22).

Code availability

The codes and pipelines used in data processing were all executed according to the manual and protocols of the corresponding bioinformatics software. The specific versions of software have been described in Methods. However, the following perl script was used to trim the gaps of the aligned sequences.

#! usr/bin/perl -w

use strict;

sub usage{

print STDERR «USAGE;

usage: $0 <phy.file> <cut off> USAGE

USAGE

exit;

}

my $file = shift;

open IN,”$file” or die $!;

open OUT,” >./$file.trim.phy” or die $!;

my ($num_species,$len);

my (@name,@seq,@gap_ratio,@match_ratio);

my %new_seq;

my @u_site;

my $i = 0;

my $resultfile = “./$file.trim.phy”;

while (<IN>){

chomp;

if (/^\s+(\d+)\s+(\d+)/){

$num_species = $1;

$len = $2;

next;

}

my @temp = split;

$name[$i] = $temp[0];

@{$seq[$i]} = (split //,$temp[1]);

$i++;

}

print “$num_species\n”;

close IN;

for (my $j = 0;$j <$len;$j++){

my ($gap,$match) = (0,0);

for (my $i = 0;$i < $num_species;$i++){

if ($seq[$i][$j] eq “-“){

$gap++;

}else {

$match++;

}

$gap_ratio[$j] = $gap/$num_species*100;

$match_ratio[$j] = $match/$num_species*100;

}

for (my $i = 0;$i < $num_species;$i++){

for (my $j = 0;$j < $len;$j++){

if ($gap_ratio[$j] > = $ARGV[0]){

next;

}else {

$new_seq{$i}. = $seq[$i][$j];

}

print OUT “$name[$i]\t$new_seq{$i}\n”;

}

my $len_temp = length($new_seq{0});

print “$len_temp\n”;

‘sed -i ‘1i $num_species $len_temp’ $resultfile‘;

References

Abreu, I. N. et al. A metabolite roadmap of the wood-forming tissue in Populus tremula. New Phytol 228, 1559–1572 (2020).
CAS PubMed Google Scholar
Rodriguez‐Zaccaro, F. D. & Groover, A. Wood and water: How trees modify wood development to cope with drought. Plants, People, Planet 1, 346–355 (2019).
Google Scholar
Camargo, E. L. O., Ployet, R., Cassan-Wang, H., Mounet, F. & Grima-Pettenati, J. in Molecular Physiology and Biotechnology of Trees Advances in Botanical Research 201–233 (2019).
Tuskan, G. A. et al. Hardwood Tree Genomics: Unlocking Woody Plant Biology. Frontiers in Plant Science 9 (2018).
Wang, J. P. et al. Improving wood properties for wood utilization through multi-omics integration in lignin biosynthesis. Nat Commun 9, 1579 (2018).
ADS PubMed PubMed Central Google Scholar
Johnsson, C. et al. The plant hormone auxin directs timing of xylem development by inhibition of secondary cell wall deposition through repression of secondary wall NAC‐domain transcription factors. Physiologia plantarum 165, 673–689 (2019).
CAS PubMed Google Scholar
Tarelkina, T. V. et al. Expression Analysis of Key Auxin Biosynthesis, Transport, and Metabolism Genes of Betula pendula with Special Emphasis on Figured Wood Formation in Karelian Birch. Plants 9, 1406 (2020).
CAS PubMed PubMed Central Google Scholar
Zheng, S. et al. Two MADS-box genes regulate vascular cambium activity and secondary growth via modulating auxin homeostasis in Populus. Plant Communications, 4 (2020).
Ye, Z.-H. & Zhong, R. Molecular control of wood formation in trees. Journal of Experimental Botany 66, 4119–4131 (2015).
CAS PubMed Google Scholar
Zinkgraf, M. et al. Evolutionary network genomics of wood formation in a phylogenetic survey of angiosperm forest trees. New Phytologist 228, 1811–1823 (2020).
CAS PubMed Google Scholar
Cao, P. B. et al. Wood Architecture and Composition Are Deeply Remodeled in Frost Sensitive Eucalyptus Overexpressing CBF/DREB1 Transcription Factors. Int J Mol Sci 21 (2020).
Chanoca, A., de Vries, L. & Boerjan, W. Lignin Engineering in Forest Trees. Front Plant Sci 10, 912 (2019).
PubMed PubMed Central Google Scholar
Neale, D. B. & Kremer, A. Forest tree genomics: growing resources and applications. Nat Rev Genet 12, 111–122 (2011).
CAS PubMed Google Scholar
Fan, Y. et al. Dissecting the genome of star fruit (Averrhoa carambola L.). Horticulture research 7, 1–10 (2020).
Google Scholar
Sahu, S. K. et al. Draft Genomes of two Artocarpus plants, Jackfruit (A. heterophyllus) and Breadfruit (A. altilis). Genes 11, 27 (2020).
CAS Google Scholar
Liu, H. et al. Molecular digitization of a botanical garden: high-depth whole-genome sequencing of 689 vascular plant species from the Ruili Botanical Garden. Gigascience 8, 1–9 (2019).
ADS Google Scholar
Zhao, D. et al. A chromosomal-scale genome assembly of Tectona grandis reveals the importance of tandem gene duplication and enables discovery of genes in natural product biosynthetic pathways. Gigascience 8 (2019).
Zhao, H. et al. Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis). Gigascience 7 (2018).
Fan, Y. et al. The Clausena lansium (Wampee) genome reveal new insights into the carbazole alkaloids biosynthesis pathway. Genomics 113, 3696–3704 (2021).
CAS PubMed Google Scholar
Sahu, S. K. & Liu, H. Long-read sequencing (method of the year 2022): the way forward for plant omics research. Molecular Plant 16, 791–793 (2023).
CAS PubMed Google Scholar
Tuskan, G. A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
ADS CAS PubMed Google Scholar
Myburg, A. A. et al. The genome of Eucalyptus grandis. Nature 510, 356–362 (2014).
ADS CAS PubMed Google Scholar
Nystedt, B. et al. The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584 (2013).
ADS CAS PubMed Google Scholar
Peng, X. et al. A Chromosome-Scale Genome Assembly of Paper Mulberry (Broussonetia papyrifera) Provides New Insights into Its Forage and Papermaking Usage. Mol Plant 12, 661–677 (2019).
CAS PubMed Google Scholar
He, N. et al. Draft genome sequence of the mulberry tree Morus notabilis. Nature communications 4, 1–9 (2013).
Google Scholar
Hong, Z. et al. The chromosome-level draft genome of Dalbergia odorifera. Gigascience 9 (2020).
Wang, S. et al. The chromosome‐scale genomes of Dipterocarpus turbinatus and Hopea hainanensis (Dipterocarpaceae) provide insights into fragrant oleoresin biosynthesis and hard wood formation. Plant biotechnology journal (2021).
Palanisamy, K., Hegde, M. & Yi, J.-S. Teak (Tectona grandis Linn. f.): A Renowned Commercial Timber Species. Journal of Forest and Environmental Science 25 (2009).
Vyas, P., Yadav, D. K. & Khandelwal, P. Tectona grandis (teak) – A review on its phytochemical and therapeutic potential. Natural Product Research 33, 2338–2354 (2019).
CAS PubMed Google Scholar
Sahu, S. K., Thangaraj, M. & Kathiresan, K. DNA Extraction Protocol for Plants with High Levels of Secondary Metabolites and Polysaccharides without Using Liquid Nitrogen and Phenol. ISRN molecular biology 2012, 205049 (2012).
PubMed PubMed Central Google Scholar
Huang, J. et al. BGISEQ-500 WGS library construction. protocols. io, 1–10 (2018).
Wang, O. et al. Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly. Genome research 29, 798–808 (2019).
CAS PubMed PubMed Central Google Scholar
Song, D., Shen, J. & Li, L. Characterization of cellulose synthase complexes in Populus xylem differentiation. 187, 777–790 (2010).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
CAS PubMed PubMed Central Google Scholar
Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65 (2017).
CAS PubMed PubMed Central Google Scholar
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, gix120 (2018).
Marks, P. et al. Resolving the full spectrum of human genome variation using Linked-Reads. Genome research 29, 635–645 (2019).
CAS PubMed PubMed Central Google Scholar
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 2047-2217X–2041-2018 (2012).
Google Scholar
Li, H. et al. The sequence alignment/map format and SAMtools. bioinformatics 25, 2078–2079 (2009).
PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
ADS CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell systems 3, 95–98 (2016).
CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
ADS CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. Biorxiv, 254797 (2018).
Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proceedings of the National Academy of Sciences 119, e2208496119 (2022).
CAS Google Scholar
Xia, H. et al. Chromosome-scale genome assembly of a natural diploid kiwifruit (Actinidia chinensis var. deliciosa). Scientific Data 10, 92 (2023).
CAS PubMed PubMed Central Google Scholar
Han, X. et al. The chromosome-scale genome of Phoebe bournei reveals contrasting fates of terpene synthase (TPS)-a and TPS-b subfamilies. Plant Communications 3, 100410 (2022).
CAS PubMed PubMed Central Google Scholar
Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends in genetics 16, 418–420 (2000).
CAS PubMed Google Scholar
Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics 25, 4.10. 11–14.10. 14 (2009).
Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).
ADS CAS Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265–W268 (2007).
PubMed PubMed Central Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573–580 (1999).
CAS PubMed PubMed Central Google Scholar
Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER‐P. Current protocols in bioinformatics 48, 4.11. 11–14.11. 39 (2014).
Google Scholar
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols 8, 1494–1512 (2013).
CAS PubMed Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9, R7 (2008).
PubMed PubMed Central Google Scholar
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC bioinformatics 7, 62 (2006).
PubMed PubMed Central Google Scholar
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic acids research 33, 6494–6506 (2005).
CAS PubMed PubMed Central Google Scholar
Korf, I. Gene finding in novel genomes. BMC bioinformatics 5, 59 (2004).
PubMed PubMed Central Google Scholar
Aoki, K. F. & Kanehisa, M. Using the KEGG database resource. Current protocols in bioinformatics 11, 1.12.11–11.12.54 (2005).
Google Scholar
Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997).
ADS CAS PubMed Google Scholar
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic acids research 31, 365–370 (2003).
CAS PubMed PubMed Central Google Scholar
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
CAS PubMed PubMed Central Google Scholar
Bateman, A. et al. The Pfam protein families database. Nucleic acids research 32, D138–D141 (2004).
CAS PubMed PubMed Central Google Scholar
Letunic, I., Doerks, T. & Bork, P. SMART 6: recent updates and new developments. Nucleic acids research 37, D229–D232 (2009).
CAS PubMed Google Scholar
Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nature protocols 8, 1551–1566 (2013).
PubMed PubMed Central Google Scholar
Attwood, T. K. et al. PRINTS and its automatic supplement, prePRINTS. Nucleic acids research 31, 400–402 (2003).
CAS PubMed PubMed Central Google Scholar
Corpet, F., Servant, F., Gouzy, J. & Kahn, D. ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic acids research 28, 267–269 (2000).
CAS PubMed PubMed Central Google Scholar
Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic acids research 44, W54–W57 (2016).
CAS PubMed PubMed Central Google Scholar
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome biology 16, 157 (2015).
PubMed PubMed Central Google Scholar
Katoh, K., Kuma, K.-i, Toh, H. & Miyata, T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic acids research 33, 511–518 (2005).
CAS PubMed PubMed Central Google Scholar
Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37, 1530–1534 (2020).
CAS PubMed PubMed Central Google Scholar
Qiao, X. et al. Gene duplication and evolution in recurring polyploidization–diploidization cycles in plants. Genome biology 20, 1–23 (2019).
Google Scholar
Zwaenepoel, A. & de Peer, V. Y. wgd—simple command line tools for the analysis of ancient whole-genome duplications. Bioinformatics 35, 2153–2155 (2019).
CAS PubMed Google Scholar
Soltis, P. S. & Soltis, D. E. Plant genomes: Markers of evolutionary history and drivers of evolutionary change. Plants, People, Planet (2020).
Liu, Y. et al. The Cycas genome and the early evolution of seed plants. Nat Plants 8, 389–401 (2022).
CAS PubMed PubMed Central Google Scholar
Liu, P.-L. et al. The Tetracentron genome provides insight into the early evolution of eudicots and the formation of vessel elements. Genome Biology 21 (2020).
Huff, M. et al. A high‐quality reference genome for Fraxinus pennsylvanica for ash species restoration and research. Molecular ecology resources 22, 1284–1302 (2022).
CAS PubMed Google Scholar
Jin, J. et al. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic acids research, gkw982 (2016).
Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic acids research 39, W29–W37 (2011).
CAS PubMed PubMed Central Google Scholar
Lin, S. et al. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science 350, 691–694 (2015).
ADS CAS PubMed Google Scholar
Reiter, W.-D. Biosynthesis and properties of the plant cell wall. Current opinion in plant biology 5, 536–542 (2002).
CAS PubMed Google Scholar
Xi, W., Song, D., Sun, J., Shen, J. & Li, L. Formation of wood secondary cell wall may involve two type cellulose synthase complexes in Populus. Plant Mol Biol 93, 419–429 (2017).
CAS PubMed Google Scholar
Zhong, R., Lee, C., Haghighat, M. & Ye, Z.-H. Xylem vessel-specific SND5 and its homologs regulate secondary wall biosynthesis through activating secondary wall NAC binding elements. New Phytologist 231, 1496–1509 (2021).
CAS PubMed Google Scholar
Biswal, A. K. et al. Downregulation of GAUT12 in Populus deltoides by RNA silencing results in reduced recalcitrance, increased growth and reduced xylan and pectin in a woody biofuel feedstock. Biotechnology for biofuels 8, 1–26 (2015).
CAS Google Scholar
Ohtani, M. & Demura, T. The quest for transcriptional hubs of lignin biosynthesis: Beyond the NAC-MYB-gene regulatory network model. Current opinion in biotechnology 56, 82–87 (2019).
CAS PubMed Google Scholar
Tobimatsu, Y. & Schuetz, M. Lignin polymerization: how do plants manage the chemistry so well? Current Opinion in Biotechnology 56, 75–81 (2019).
CAS PubMed Google Scholar
Chen, T. et al. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics, Proteomics & Bioinformatics 19, 578–583 (2021).
Google Scholar
Database resources of the national genomics data center, china national center for bioinformation in 2022. Nucleic Acids Research 50, D27-D38 (2022).
NGDC Genome Sequence Archive, https://bigd.big.ac.cn/gsa/browse/CRA011294 (2023).
NGDC Genome Warehouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCN00000000 (2023).
NGDC Genome WareHouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCP00000000 (2023).
NGDC Genome WareHouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCQ00000000 (2023).
NGDC Genome WareHouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCO00000000 (2023).
NGDC Genome WareHouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCR00000000 (2023).
NGDC Genome WareHouse, https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=GWHDOCS00000000 (2023).
Liu, M. Genome annotation files of O. pyramidale, M. ferrea and T. grandis., Figshare, https://doi.org/10.6084/m9.figshare.22344934.v1 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR7121481 (2018).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR7121482 (2018).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
PubMed Google Scholar
Chang, Z. et al. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome biology 16, 30 (2015).
PubMed PubMed Central Google Scholar
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome research 12, 656–664 (2002).
CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by Major Science and Technology Projects of Yunnan Province (Digitalization, development and application of biotic resource, 202002AA100007), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB27020104). This work is part of the 10KP project (https://db.cngb.org/10kp/)⁹⁶ and is also supported by China National GeneBank (CNGB; https://www.cngb.org/).

Author information

These authors contributed equally: Sunil Kumar Sahu, Min Liu, Yewen Chen.

Authors and Affiliations

State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
Sunil Kumar Sahu, Min Liu, Yewen Chen, Dongming Fang, Xiaoli Chen, Ting Yang, Durgesh Nandini Sahu, Linzhou Li, Hongli Wang, Weixue Mu, Jinpu Wei, Xin Liu, Xun Xu, Sibo Wang & Huan Liu
BGI Life Science Joint Research Center, Northeast Forestry University, Harbin, 150400, China
Min Liu & Huan Liu
State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 311300, Hangzhou, China
Jinshan Gui
Southwest Forestry University, Kunming, Yunnan, 650224, China
Chengzhong He
BGI Research, Kunming, Yunnan, 650106, China
Le Cheng & Jinlong Yang
College of Forensic Science, Xi’an Jiaotong University, Xi’an, China
Jinlong Yang
Forestry Bureau of Ruili, Yunnan Dehong, Ruili, 678600, China
Jie Liu
Chinese Academy of Forestry, Beijing, China
Yuxian Zhao
Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen, Chinese Academy of Sciences, Shenzhen, 518004, China
Shouzhou Zhang
Department of Biology, University of Copenhagen, Copenhagen, Denmark
Michael Lisby
Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
Xun Xu
National Key Laboratory of Plant Molecular Genetics and CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, 200032, China
Laigeng Li

Authors

Sunil Kumar Sahu
View author publications
You can also search for this author in PubMed Google Scholar
Min Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yewen Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jinshan Gui
View author publications
You can also search for this author in PubMed Google Scholar
Dongming Fang
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoli Chen
View author publications
You can also search for this author in PubMed Google Scholar
Ting Yang
View author publications
You can also search for this author in PubMed Google Scholar
Chengzhong He
View author publications
You can also search for this author in PubMed Google Scholar
Le Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Jinlong Yang
View author publications
You can also search for this author in PubMed Google Scholar
Durgesh Nandini Sahu
View author publications
You can also search for this author in PubMed Google Scholar
Linzhou Li
View author publications
You can also search for this author in PubMed Google Scholar
Hongli Wang
View author publications
You can also search for this author in PubMed Google Scholar
Weixue Mu
View author publications
You can also search for this author in PubMed Google Scholar
Jinpu Wei
View author publications
You can also search for this author in PubMed Google Scholar
Jie Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yuxian Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Shouzhou Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Michael Lisby
View author publications
You can also search for this author in PubMed Google Scholar
Xin Liu
View author publications
You can also search for this author in PubMed Google Scholar
Xun Xu
View author publications
You can also search for this author in PubMed Google Scholar
Laigeng Li
View author publications
You can also search for this author in PubMed Google Scholar
Sibo Wang
View author publications
You can also search for this author in PubMed Google Scholar
Huan Liu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.L., S.K.S. and L.L. led and designed this project. H.L., S.K.S. and M.L. conceived the study. S.K.S., J.G., W.M., J.W., S.Z. and J.L. collected the leaf and tissue samples. S.K.S., M.L. and Y.C. contributed to the sample preparation and performed the genome and chromosome-scale assembly. S.K.S., M.L., S.W., Y.C., D.F., X.C., T.Y., D.N.S., W.H., L.L. and S.W. performed annotation and comparative genomic analyses. S.K.S. and M.L. wrote the original draft manuscript. S.W., M.L., S.Z., X.X., S.W., J.G., C.H., D.N.S., L.C., J.Y., Y.Z., X.L., L.L., and H.L., revised and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Laigeng Li, Sibo Wang or Huan Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table and Figure captions

Supplementary Figures

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sahu, S.K., Liu, M., Chen, Y. et al. Chromosome-scale genomes of commercial timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis). Sci Data 10, 512 (2023). https://doi.org/10.1038/s41597-023-02420-8

Download citation

Received: 29 March 2023
Accepted: 26 July 2023
Published: 03 August 2023
DOI: https://doi.org/10.1038/s41597-023-02420-8

This article is cited by

Chromosome-scale genomes of commercially important mahoganies, Swietenia macrophylla and Khaya senegalensis
- Sunil Kumar Sahu
- Min Liu
- Chengzhong He
Scientific Data (2023)