Abstract
Reaumuria soongarica is a xerophytic shrub belonging to the Tamaricaceae family. The species is widely distributed in the deserts of Central Asia and is characterized by its remarkable adaptability to saline and barren desert environments. Using PacBio long-read sequencing and Hi-C technologies, we assembled a chromosome-level genome of R. soongarica. The genome assembly has a size of 1.28 Gb with a scaffold N50 of 116.15 Mb, and approximately 1.25 Gb sequences were anchored in 11 pseudo-chromosomes. A completeness assessment of the assembled genome revealed a BUSCO score of 97.5% and an LTR Assembly Index of 12.37. R. soongarica genome had approximately 60.07% repeat sequences. In total, 21,791 protein-coding genes were predicted, of which 95.64% were functionally annotated. This high-quality genome will serve as a foundation for studying the genomic evolution and adaptive mechanisms to arid-saline environments in R. soongarica, facilitating the exploration and utilization of its unique genetic resources.
Similar content being viewed by others
Background & Summary
The drought induced by global warming is intensifying, significantly impacting plant survival and reproduction. This occurrence has led to a cascade of ecological and productivity challenges1,2. In arid regions, the coexistence of soil salinity exacerbates the predicament, further complicating agricultural and livelihood practices reliant on plant-based production2. In response to stressors like drought and salt, plants have developed sophisticated adaptive strategies at molecular, physiological, and morphological levels. This involves the coordinated interaction among the genome, transcriptome, proteome, and metabolome3. Exploring the molecular mechanisms underlying these adaptive responses to adverse conditions has been a longstanding focus of academic inquiry.
So far, significant progress has been achieved in the study of plant stress adaptation4. Under these investigations, numerous stress response factors have been identified, and a relatively comprehensive theoretical framework has been integrated. For instance, in drought adaptation, the most representative discoveries include abscisic acid (ABA)-dependent and ABA-independent pathways, DREB2A, and ubiquitination-related mechanisms5. In the ABA-dependent pathway, key regulatory genes such as NFYA5, OCP3, PLDa1, SAL1, and MYB96 have been identified for their crucial roles in stomatal regulation, osmotic substance modulation, and lateral root growth under drought stress. Research on salty environment adaptation has highlighted pathways such as the salt overly sensitive (SOS) pathway involved in reconstructing ion homeostasis, reactive oxygen species (ROS) scavenging pathways, and physiologically drought-responsive pathways caused by ion and osmotic substance imbalances6,7. Studies on ion transport and homeostasis regulation have revealed that plants primarily utilize Na+/H+ antiport proteins on the plasma membrane and vacuolar membrane (SOS1, NHX1) to extrude or compartmentalize the influx of Na+ into cells, achieve substantial K+ uptake through K+ channels (e.g., AKT), and reconstruct proton motive force via plasma membrane and vacuolar membrane H+-ATPase8.
Reports indicate that species thriving in relatively harsh environments have developed unique adaptive strategies through prolonged natural selection and inherent adaptation9. It has been proposed that research directions for plant stress adaptation in the “omics” era. On one hand, through multi-omics integrated analysis, further post-genomic research on model plants is being conducted to comprehensively unveil the molecular mechanisms of their stress responses. On the other hand, research is expanding to encompass widely occurring non-model species in nature to obtain additional information on plant stress adaptation, supplementing and refining existing theories10,11.
Reaumuria soongarica (Fig. 1 a, b, and c) is a shrub belonging to the family Tamaricaceae12. This plant exhibits strong resistance to drought, cold, salinity-alkalinity, and barrenness, making it a crucial component of desert ecosystems. The R. soongarica community represents the most widespread and extensive zonal community type in arid regions such as Central Asian dunes and deserts13. It plays a vital role as an ecological barrier in sustaining and restoring fragile desert ecosystems in Northwest China14. R. soongarica also serves as a significant forage shrub in desert areas, providing the main source of food for camels throughout the year and for sheep during the winter and spring seasons. Due to its high salt content, livestock can obtain sufficient salt intake from it, stimulating their appetite and promoting weight gain. Additionally, the tender branches and leaves of R. soongarica can be used for treating eczema and dermatitis, exhibiting febrifuge and diaphoretic effects.
Here, we assembled a high-quality chromosome-level genome of R. soongarica using PacBio HiFi and Hi-C data. The genome has a length of 1.28 Gb, a contig N50 of 116.15 Mb, and a complete BUSCO score of 97.5%. A total of 1.25 Gb of sequences were anchored onto 11 pseudo-chromosomes. Of the genome, 21,791 protein-coding genes, and 60.07% (769.66 Mb) repetitive sequences were identified. This high-quality genome will facilitate the study of adaptive evolution mechanisms in R. soongarica, laying the foundation for exploring its unique stress-resistant genetic resources and related molecular mechanisms.
Methods
Sample collection
In August 2022, sample collection was conducted in Yihewusu Town (E107°26′02′, N40°11′02′), Ordos City, Inner Mongolia, China. Approximately 20 g of tender leaves were collected from a single plant for genomic sequencing. Following sample collection, the samples were rapidly frozen in liquid nitrogen and transported back to the laboratory for storage in a −80 °C freezer. In the same year, during October, seeds were collected from the same plant in the same sampling site.
Karyotype analysis
R. soongarica seeds were germinated at room temperature. When the roots reached a length of 1.5–2 cm, root tips were treated with nitrous oxide for 2.5 hours. Subsequently, they were immersed in acetic acid for 5 minutes and stored in 75% ethanol. During chromosome preparation, ethanol was removed by rinsing with deionized water. The root apical meristem tissues were dispersed using a mixture of cellulase and pectinase (2:1 ratio). After a 45-minute incubation at 37 °C, the mixture was washed away with deionized water. Once the meristematic tissues were completely air-dried, a 20 µL acetic acid suspension solution was added. After drying the slides, they were examined using an Olympus CX23 microscope (Olympus Corporation, Tokyo, Japan).
For karyotype analysis, well-dispersed intermediate chromosomes were selected. The terminal 21-bp repeat sequence (AG3T3)3 5′-AGGGTTTAGGGTTTAGGGTTT-3′ was used as a probe15. This oligo-probe, synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), was simultaneously tested in a single round of Fluorescence In Situ Hybridization (FISH). The hybridization solution, totaling 10 µL, consisted of 1.5 µL of each probe, 8.5 µL of a mixture of 2 × SSC, and 1 × TE, which was dropped onto the chromosomes on a cover glass (24 cm × 50 cm). The slides were then incubated at 37 °C for 2 hours. Using an Olympus BX63 fluorescence microscope combined with a Photometric SenSys Olympus DP70 CCD camera (Olympus Corporation, Tokyo, Japan), the slides were recorded and analyzed. The results revealed a total of 22 detected chromosomes, indicating a diploid plant with a karyotype of 2n = 2x = 22 (Fig. 1d).
Flow cytometry-based genome size estimation
A quantity of 0.5 g of tender leaves from R. soongarica was placed in a culture dish, chopped, and disrupted. Subsequently, 1600 µL of PI solution (staining buffer + PI + RNase storage solution) was added. The mixture was incubated in the dark for 45 minutes and then analyzed using the Sysmex CyFlow® Cube6. The reference species used was Setaria viridis, with a genome size of 0.51 Gb16. The results estimated the genome size of R. soongarica to be approximately 1.26 Gb (Fig. 1e).
Nucleic acid extraction and quality assessment
For the determination of R. soongarica genome size through k-mer analysis and genomic sequencing, genomic DNA was extracted from the leaves using a modified CTAB method17. For transcriptome sequencing, R. soongarica seeds were sterilized with a 10% sodium hypochlorite solution, rinsed several times with sterile water, and then sown in seedling trays filled with sterile nutrient soil. The cultivation conditions were maintained at 26 °C during the day and 16 °C at night, under a 16-hour light/8-hour dark cycle. The seedlings were watered with 1/2 Hoagland nutrient solution every 3 days. After 3 weeks of cultivation, a healthy plant with intact leaves, stems, and roots was harvested for full-length transcriptome sequencing. Subsequently, seedlings with similar growth were chosen for Na2SO4 treatment, with three concentration gradients: 0 mmol/L (CK), 200 mmol/L (S200), and 400 mmol/L (S400). After the treatment, tender leaves were collected. Each treatment had three biological replicates, for paired-end transcriptome sequencing. The total RNA of the above-collected samples was extracted using TRIzol® reagent18 (Invitrogen, Carlsbad, CA, USA). The isolated DNA and RNA samples underwent quality assessment using NanoDrop-2000 (Thermo Fisher Scientific, Wilmington, DE, USA) and Qubit v3.0 fluorometer (Life Technologies).
K-mer based genome size assessment
The genome of R. soongarica was sequenced using the DNBseq platform. The samples were randomly fragmented using the Covaris ultrasonic high-performance sample processing system, resulting in fragments of approximately 350 bp. Subsequently, DNA fragment end repair was performed, and after passing quality control, the samples were sequenced using high-throughput sequencing. For each qualified library, the raw image data obtained from sequencing were converted into raw sequence data (raw reads) in FASTQ19 file format. The SOAPnuke v2.1.020 software was employed for data filtering with the following parameters: (-n 0.02 -l 20 -q 0.4 -i -G 2–polyX 50 -Q 2–seqType 0). A total of 135.09 Gb of clean reads were obtained. The frequency of 21-bp K-mers was calculated using Jellyfish v2.2.621 with default parameters. GenomeScope v2.022 was then utilized to estimate the genome size, heterozygosity, repeat content, and sequencing depth. The results indicated that the estimated genome size of R. soongarica is 1.26 Gb, with a heterozygosity rate of 2.24%, and a repeat rate of 50.68% (Fig. 1f).
HiFi library construction and sequencing
A 20 Kb PacBio library was constructed using the SMRTbell Template Prep Kit-SP v3, following the manufacturer’s instructions (Pacific Biosciences, Menlo Park, CA, USA). Subsequently, Circular Consensus Sequencing (CCS) mode on the PacBio Sequel II platform was employed for sequencing. The raw sequencing data were filtered using smrtlink software v11.0, resulting in a total of 74.62 Gb of clean HiFi data. The filtered CCS data were then converted to a fasta file using samtools v1.1823. According to survey-based estimations, the sequencing depth was approximately 58.7×.
Hi-C library construction, sequencing, and quality assessment
For DNA samples passing quality checks, a sequential process was carried out, including polyformaldehyde cross-linking, MboI enzyme digestion, end repair, biotin labeling, DNA purification, and capture treatment. The Hi-C library was constructed, and paired-end sequencing was performed on the DNBSEQ platform24.
The sequencing generated 128.23 Gb of raw data. After filtering out data with adapters and lower quality using Soapnuke v2.1.0, 127.19 Gb of clean data was obtained. The sequencing depth was estimated to be approximately 101×.
Genome assembly
HiFi reads were assembled into contigs using Hifiasm v0.16.125. Then, purge_dups was performed to remove redundant and erroneous assemblies obtained from the HiFi reads. The contig-level genome was indexed using WBA26, and Hi-C data were aligned and merged with the contig-based genome. The generate_site_positions.py program in Juicer v1.527 was used to obtain potential enzyme cut sites in the genome, extracting Hi-C data that uniquely mapped to the genome (Hi-C Contacts) and performing clustering and redundancy removal. Next, 3D-DNA v190716 and Juicer v1.5 were used to scaffold Hi-C reads, constructing a chromosome-level genome.
The total length of the contig-level genome was 1.28 Gb, approximately matching the K-mer estimate (Table 1). The N50 was 116.15 Mb, and the complete BUSCO score was 97.5%. After Hi-C scaffolding, a total of 1.25 Gb of sequences were anchored onto 11 pseudo-chromosomes, with a scaffolding rate of 97.96% (Fig. 2a and b). The N50 of the chromosome-level genome was 116.15 Mb, and the complete BUSCOs reached 97.8%, indicating a high genome completeness. As shown in Fig. 2a, the Hi-C data signal was strongest along the diagonal, demonstrating effective genome assembly.
Transcriptome sequencing
For assisting genome annotation, both full-length transcriptome sequencing and short-read transcriptome sequencing were performed. Using PacBio’s Single-Molecule Real-Time (SMRT) technology, full-length cDNA was synthesized, PCR amplified, and libraries were constructed. The sequencing was conducted using PacBio’s SMRT technology, resulting in 63.53 Gb of raw data from a R. soongarica seedling. A total of 339,291 non-redundant full-length transcripts were obtained, with an N50 of 1,396 base pairs. For short-read RNA-seq, libraries were constructed using a paired-end model and sequenced on the DNBSEQ platform. SOAPnuke v2.1.0 was used for filtering, and HISAT v2.2.128 was employed to align the filtered clean reads to the reference genome sequence. In total, 74.31 Gb of clean data was obtained, with an average output of 8.32 Gb per sample, and the reference genome alignment rate ranged from 94.12% to 95.92%.
Genome annotation
In this study, a homology-based annotation approach was employed for R. soongarica genome annotation. RNA-seq data and full-length transcriptome data were aligned to the genome using HISAT v2.2.1 and Gmap software, respectively. GeMoMa v1.929 was then used to extract intron information from all samples for subsequent auxiliary annotation. Transdecoder was utilized for ORF (Open Reading Frame) prediction on the transcriptome data, resulting in gene annotation based on the transcriptome. Gene structure prediction and information integration were performed using GeMoMa v1.9 for five reference species (Arabidopsis thaliana, Eutrema salsugineum, Populus trichocarpa, Vitis vinifera, Zea mays). Transcriptome-specific gene information was incorporated, yielding the final gene set. In total, 21,791 genes were predicted in the R. soongarica genome (Table 2).
Predicted genes were functionally annotated by comparing them to known protein databases using Diamond v0.8.3630. The protein databases included SwissProt31 (http://www.uniprot.org/), TrEMBL32 (http://www.uniprot.org/), KEGG33 (http://www.genome.jp/kegg/), InterPro34 (https://www.ebi.ac.uk/interpro/), NR35, KOG36, and GO37. The results revealed that 95.64% of genes in the R. soongarica genome obtained functional annotation information. Among them, the majority of genes were annotated in the Nr database (20,772, 95.32%), followed by TrEMBL (20,768, 95.31%), and InterPro (18,130, 83.20%) (Table 3).
Repetitive elements annotation
Two methods were employed for the annotation of transposable elements (TEs), namely the homologous alignment method and de novo prediction38. The former one was based on the RepBase v21.1239 database. It utilizes RepeatMasker v1.33240 and RepeatProteinMask v4.0.741 to identify sequences similar to known repeat sequences. De novo prediction was based on the assembled R. soongarica genome. A de novo TE library was constructed using RepeatModeler242 and LTRharvest software43. Subsequently, Repeatmasker software was employed for prediction. The results from both methods were then redundantly processed to obtain the annotation of repeat sequences. Tandem repeats were identified by Tandem Repeats Finder v4.09.
In the genome of R. soongarica, a total of 769.66 Mb of repeat sequences were identified, constituting 60.07% of the genome. The most abundant repeat elements were the long terminal repeat (LTR) sequences, accounting for 47.47%, followed by DNA transposons at 4.76% and tandem repeats at 3.98% (Table 4). LTR_retriever44 was used to identify LTR retrotransposons (LTR-RTs) in the R. soongarica genome, and the insertion time of LTR-RTs was estimated. The mutation rate for this estimation was set as 1.52 × 10−8, which is twice the mutation rate of Tamaricaceae plants45,46,47. TEsorter v1.4.648 was used for the subfamily classification of LTR-RTs, with Ty1/copia and Ty3/gypsy being the two largest subfamilies in LTRs. Subsequently, iTOL v649 was used to visualize the subfamilies separately in evolutionary trees. The results indicated that LTR-RTs in the R. soongarica genome recently burst approximately 0.3 million years ago (Mya) (Fig. 3a). Among them, the most abundant members in the Ty1/copia and Ty3/gypsy families were Tork and Tekay, respectively (Fig. 3b).
Synteny gene identification
Synteny genes were identified within and between the genomes of R. soongarica and other species (V. vinifera, Z. mays, and T. chinensis). Initially, Python scripts from the Whole-Genome Duplication Integrated analysis (WGDI, v 0.6.2) software50 were utilized to generate a modified GFF file for the genome, with the exclusion of alternatively spliced transcripts. Subsequently, Diamond v2.1.6-1 was employed to execute protein-protein BLAST (E-value ≤ 1e−5), and the results were formatted in fmt6.blast. Following this, the commands -d, -icl, -ks, -bi, -bk, -c, and -kp of WGDI were successively executed with the default parameters. As a result, 559 synteny blocks were identified in the R. soongarica genome, with the gene number ranged from 5 to 103, and the majority of these gene blocks exhibit a 1: 1 relationship in the R. soongarica genome (Fig. 4a). The synonymous substitution rate (Ks) distribution of the syntenic blocks showed a peak at 0.56 (Fig. 4b).
Identification of gene families
The genomes of R. soongarica and other 9 species (A. thaliana, E. salsugineum, P. trichocarpa, V. vinifera, Z. mays, H. ammodendron, P. euphratica, T. chinensis, Ammopiptanthus mongolicus) were used to cluster paralogous and orthologous groups using Orthofinder v2.5.551. Among 313,500 analyzed genes, 287,806 were successfully assigned to 27,313 (91.8%) orthogroups. For R. soongarica, 20,418 genes were identified in 19,517 orthogroups, and 508 were species-specific genes in 160 orthogroups (Fig. 5a). Across the ten genomes, 793 single-copy orthologous were identified and aligned using MAFFT v7.50552, trimmed using trimAl v1.4.rev1553. The optimal model was used with IQ-TREE v1.6.1154 to analyze the tree and infer the divergence dates. Trees of the 10 species were constructed with the ML model using RAxML with 1000 bootstrap replicates v8.2.1255, and the best-obtained model was GTR + F + R4. The divergence time was calculated using the MCMCtree program in PAML v4.9 h56. The divergence time between Z. mays vs V. vinifera (142.1–163.5 Mya) and E. salsugineum vs A. thaliana (19.7–34.2 Mya) acquired from TimeTree (http://www.timetree.org/) was used as the calibration points. The phylogenetic analyses verified that R. soongarica was a sister to T. chinensis, with strong bootstrap support (>50%), and the two species diverged nearly 34.17 Mya (95% HPD = 30.04, 113.1) (Fig. 5b).
CAFE v5.057 was performed to identify expansions and contractions of gene families in R. soongarica genome. 218 and 196 gene families were found to have significantly expanded and contracted, comprising 1,303 and 207 genes, respectively.
Identification of salt-stress response gene
Differential expression genes (DEGs) were identified using DESeq258, with the criteria of |Log2FC| > 1 and false discovery rate (FDR) < 0.05 for filtering. A total of 1,449 DEGs were identified through pairwise comparisons (Fig. 6a). Cluster analysis demonstrated distinct expression patterns of these DEGs under different salt concentrations. Functional enrichment analysis showed that these DEGs were enriched in various GO terms and KEGG pathways that may relate to salt-stress response in R. soongarica (Fig. 6b).
Data Records
All sequencing data described in the study have been deposited in the NCBI database. PacBio HiFi long reads, Hi-C reads, and full-length transcriptome Iso-Seq reads were deposited in the Sequence Read Archive (SRA) under accessions SRR2754088559, SRR2754088660, and SRR2754088261, respectively. The RNA-seq data of salt-stressed samples were deposited in the SRA under accessions SRR27540875-SRR2754088162,63,64,65,66,67,68, SRR2754088369, and SRR2754088470, and the genome survey data with SRA accession number SRR2849591771. The genome assembly and annotation of R. soongarica has been deposited on the Figshare platform72, and GenBank with accession number JBEBFM000000000 (2024)73, respectively.
Technical Validation
For the genome assembly, we assessed the quality using BUSCO v5.2.2, embryophyta_odb10). The assembly achieved completeness of 97.5% at the contig level and 97.8% at the chromosome level, indicating a highly complete genome (Table 5). For the full-length transcriptome sequencing data, we employed the single-copy ortholog database BUSCO to assess the quality of the assembled transcripts (Fig. 7a). Additionally, 97.96% of the Hi-C data were successfully anchored to the 11 pseudo-chromosomes, confirming the accuracy of the chromosome assembly (Fig. 2a). The organization of interaction contacts within and around the chromosome region was observed through the Hi-C heatmap, further supporting the quality of the chromosome assembly. We also calculated the LTR Assembly Index (LAI) using LTR_retriever v2.9.0, obtaining a value of 12.37, indicative of a genome of reference quality (Fig. 7b, Table 1).
Code availability
All data processing commands and pipelines were carried out in accordance with the instructions and guidelines provided by the relevant bioinformatic sofware. There were no custom scripts or code utilized in this study.
References
Dandotiya, B., Sharma, H. K. Impacts of Climate Change on Agriculture and Aquaculture. (2021).
Hanin, M., Ebel, C., Ngom, M., Laplaze, L., Masmoudi, K. New Insights on Plant Salt Tolerance Mechanisms and Their Potential Use for Breeding. Front Plant Sci. 7 (2016).
Ahuja, I., de Vos, R. C. H., Bones, A. M. & Hall, R. D. Plant molecular stress responses face climate change. Trends Plant Sci. 15, 664–674 (2010).
He, M., He, C.-Q., Ding, N.-Z. Abiotic Stresses: General defenses of land plants and chances for engineering multistress tolerance. Front Plant Sci. 9 (2018).
Golldack, D., Li, C., Mohan, H., Probst, N. Tolerance to drought and salt stress in plants: unraveling the signaling networks. Front Plant Sci. 5 (2014).
Deinlein, U. et al. Plant salt-tolerance mechanisms. Trends Plant Sci. 19, 371–379 (2014).
Flowers, T. J. & Colmer, T. D. Plant salt tolerance: adaptations in halophytes. Ann Bot-London. 115, 327–331 (2015).
Keisham, M., Mukherjee, S., Bhatla, S. Mechanisms of sodium transport in plants—progresses and challenges. Int J Mo Sci. 19 (2018).
Ekblom, R. & Galindo, J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 107, 1–15 (2010).
Savolainen, O., Lascoux, M. & Merilä, J. Ecological genomics of local adaptation. Nat Rev Genetic. 14, 807–820 (2013).
Stapley, J. et al. Adaptation genomics: the next generation. Trends Ecol Evol. 25, 705–712 (2010).
Zhang, Y. et al. Water use strategies of dominant species (Caragana korshinskii and Reaumuria soongorica) in natural shrubs based on stable Isotopes in the Loess Hill, China. Water. 12 (2020).
Li, L. H., Chen, J. Q., Han, X. G., Zhang, W. H., Shao, C. L. Grassland Ecosystems of China: A Synthesis and Resume. (2020).
Li, E., Huang, Y., Chen, H. & Zhang, J. Floristic diversity analysis of the Ordos Plateau, a biodiversity hotspot in arid and semi-arid areas of China. Folia Geobot. 53, 405–416 (2019).
Luo, X. M. et al. Karyotype analysis of Piptanthus concolor based on FISH with a oligonucleotides for rDNA 5S. Sci Hortic-Amsterdam. 226, 361–365 (2017).
Mamidi, S. et al. A genome resource for green millet Setaria viridis enables discovery of agronomically valuable loci. Nat Biotechnol. 38, 1203–1210 (2020).
Li, J. L., Wang, S., Yu, J., Wang, L. & Zhou, S. L. A modified CTAB protocol for plant DNA extraction. Chinese Bulletin of Botany. 48, 72–78 (2013).
Rio, D. C., Ares, M., Jr., Hannon, G. J., Nilsen, T. W. Purification of RNA using TRIzol (TRI reagent). Cold Spring Harb Protoc. (2010).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 7, 1–6 (2018).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. J. Bioinformatics. 33, 2202–2204 (2017).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 170–175 (2021).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 18, 366–368 (2021).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Junker, V. et al. The role SWISS-PROT and TrEMBL play in the genome research environment. J. J Biotechnol. 78, 221–234 (2000).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. J. Nucleic Acids Res. 28, 27–30 (2000).
Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. J. Bioinformatics. 17, 847–848 (2001).
Koonin, E. V. et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. J. Genome Biol. 5, 1–28 (2004).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucl Acids Res. 44, D733–D745 (2016).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. J. Nat Genet. 25, 25–29 (2000).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics. 21(Suppl 1), i351–358 (2005).
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 6, 11 (2015).
Tarailo‐Graovac, M., Chen, N. Using repeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 25 (2009).
Shapiro, J. Mobile Genetic Elements. Elsevier. (2012).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. PNAS. 17, 9451–9457 (2020).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics. 9, 18 (2008).
Ou, S. & Kwok, K. C. Ferulic acid: pharmaceutical functions, preparation and applications in foods. J Sci Food Agr. 84, 1261–1269 (2004).
Cossu, R. M., Buti, M., Giordani, T., Natali, L. & Cavallini, A. A computational study of the dynamics of LTR retrotransposons in the Populus trichocarpa genome. Tree Genet Genomes. 8, 61–75 (2012).
Ma, J. X. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. PNAS. 101, 12404–12410 (2004).
Liu, J. N. et al. Genomic analyses provide insights into the evolution and salinity adaptation of halophyte Tamarix chinensis. GigaScience. 12 (2023).
Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic Res. 9 (2022).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an. online tool for phylogenetic tree display and annotation. Bioinformatics. 23, 127–128 (2007).
Sun, P. et al. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 15, 1841–1851 (2022).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Katoh, K. & Toh, H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9, 286–298 (2008).
Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–1313 (2014).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 22, 1269–1271 (2006).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540885 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540886 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540882 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540875 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540876 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540877 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540878 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540879 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540880 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540881 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540883 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27540884 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28495917 (2024).
Dang, Z. H. et al. Chromosome-level genome assembly and annotation of Reaumuria soongarica. figshare, https://doi.org/10.6084/m9.figshare.c.7021974.v1 (2024).
Dang, Z. H. et al. Reaumuria songarica ZD-2024a, whole genome shotgun sequencing project. NCBI GenBank https://identifiers.org/ncbi/insdc:JBEBFM000000000 (2024).
Acknowledgements
This work was supported by grants from the Natural Science Foundation of Inner Mongolia of China (2024JQ11), the National Natural Science Foundation of China (31860078 and 32160088), the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region of China (NJYT22093). Thanks to Professor Liqing Zhao and Associate Professor Zhiyong Li from the School of Ecology and Environment at Inner Mongolia University for providing the photographs of R. soongarica.
Author information
Authors and Affiliations
Contributions
Z.D. conceived and designed the study. Z.D., Y.T., M.S., W.G., Y.M., T.H., Y.L. and Y.Z. prepared the materials. Z.D., M.S., W.G. and Y.T. analyzed the data and wrote the manuscript. Z.D., M.S. and Y.T. edited and improved the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Song, M., Gong, W., Tian, Y. et al. Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica. Sci Data 11, 812 (2024). https://doi.org/10.1038/s41597-024-03644-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03644-y