A high-quality chromosome-level genome assembly of Ficus hirta

Ficus species (Moraceae) play pivotal roles in tropical and subtropical ecosystems. Thriving across diverse habitats, from rainforests to deserts, they harbor a multitude of mutualistic and antagonistic interactions with insects, nematodes, and pathogens. Despite their ecological significance, knowledge about the genomic background of Ficus remains limited. In this study, we report a chromosome-level reference genome of F. hirta, with a total size of 297.27 Mb, containing 28,625 protein-coding genes and 44.67% repeat sequences. These findings illuminate the genetic basis of Ficus responses to environmental challenges, offering valuable genomic resources for understanding genome size, adaptive evolution, and co-evolution with natural enemies and mutualists within the genus.


Background & Summary
Ficus is a highly species-rich genus of mainly pantropical woody plants with a diverse range of growth forms. Fig trees occupy a broad range of habitats 1,2 and are among the most ecologically important plant groups in tropical forests 3,4. The genus is characterized by its enclosed inflorescences (figs, also called syconia), which vary in size and location but have remained unchanged in fundamental structure since the genus first appeared around 45 mya [5][6][7]. The evolutionary history of the genus has therefore combined extensive radiation and ecological diversification with a reproductive conservatism that is linked to its unique interaction with the trees' only pollinators (fig wasps, Hymenoptera: Agaonidae). Perhaps the most significant innovation in fig anatomy has been the modification of breeding systems: some Ficus species are monoecious and others gynodioecious (but functionally dioecious), with associated changes in floral anatomy 8. Ficus belongs to the eudicot family Moraceae, placed by recent phylogenies within the 'urticalean' clade of Rosales. Dioecy is believed to be the ancestral state within Moraceae as a whole 5, but the ancestral breeding system in Ficus remains uncertain 8. Most Ficus species are diploid with 2n = 26, irrespective of their phylogenetic relations within the genus 9, but tetraploid species are known from Africa 10. The significance of hybridization in Ficus diversification has been debated, but Gardner et al. have shown that, while introgression has taken place, it has not had a major impact on evolution in the genus 7.
In addition to pollinating fig wasps, Ficus is also associated with symbiotic non-pollinating fig wasps, beetles, flies, moths, nematodes and pathogens that are likely to have a negative impact on the host. More than 300 leaf-chewing and more than 400 sap-sucking insect species were recorded from just 15 Ficus species in Papua New Guinea [11][12][13][14]. Ficus species possess diversified direct defense strategies, including physical structures and differing chemical defenses 15,16. They are known to contain hundreds of different secondary metabolites 17,18, but little is known about the underlying genetics.
Here, we assembled a high-quality chromosome-level genome of F. hirta using a combination of PacBio HiFi sequencing and Hi-C techniques and compared it with previously published genomes of four congeners. The assembled F. hirta genome had a total length of 297.27 Mb, with a contig N50 of 19.71 Mb and a complete BUSCO score of 98.50%. Of this, 282.12 Mb (94.90%) of the sequences were successfully anchored to the 13 pseudochromosomes. The genome annotation predicted 28,625 protein-coding genes. This high-quality F. hirta genome provides novel genomic resources for future research on genome and adaptive evolution within fig trees, as well as on Ficus-natural enemy and mutualist co-evolution.
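
The contig N50 and the anchoring rate quoted above are simple summary statistics. As a minimal sketch (not the software used in this study), they can be computed as follows. The function names and the toy contig lengths are hypothetical and serve only as illustration; the two values passed to `anchored_percent` are the figures reported in the text.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def anchored_percent(anchored: float, total: float) -> float:
    """Percentage of the assembly placed on pseudochromosomes."""
    return 100.0 * anchored / total


# Hypothetical toy contig lengths (not data from this study).
toy_contigs = [8, 5, 4, 3, 2]
print(n50(toy_contigs))

# Values reported in the text: 282.12 Mb anchored out of 297.27 Mb.
print(f"{anchored_percent(282.12, 297.27):.2f}%")
```

For the toy input, the total is 22, so the N50 is the length at which the cumulative sum of the sorted contigs first reaches 11.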

Methods
Sample collection and sequencing. F. hirta material came from a natural population growing in the South China Botanical Garden (23.18°N, 113.36°E), Guangzhou, China. Fresh young leaves of F. hirta were collected for genome sequencing. For transcriptome sequencing, organs (leaves, stems, inflorescences and roots) were collected from three individual trees to provide biological replicates. All samples were immediately flash-frozen in liquid nitrogen and stored at −80 °C for subsequent nucleic acid extraction. High-quality genomic DNA was isolated from young leaves of F. hirta using the CTAB method 19. The genomic DNA was then randomly fragmented, short-read libraries were constructed according to Illumina's standard protocol, and paired-end reads (150 bp) were sequenced on an Illumina NovaSeq platform. Additionally, a 15 kb HiFi library was constructed following the protocol for the PacBio Sequel2 platform, and circular consensus sequencing (CCS) was performed. A Hi-C library 20 was also sequenced on an Illumina NovaSeq platform with paired-end reads of 150 bp. Total RNA was extracted using the CTAB method, and RNA-seq libraries were constructed and sequenced on an Illumina NovaSeq platform with paired-end reads of 150 bp. All Illumina sequencing data were filtered with fastp v0.23.1 21 to obtain clean data for subsequent analysis. All analyses were performed on a Linux laboratory server with 60 TB of storage and 100 threads.

Genome assembly.
Before assembly, we estimated the genome size and heterozygosity of F. hirta from the 17-mer frequency distribution using Jellyfish v2.3.0 and GenomeScope v2.0 22,23. Next, PacBio HiFi reads were assembled into contigs using hifiasm v0.15.4 with default parameters 24. To obtain clean Hi-C data, we filtered the raw Hi-C data with HiC-Pro v3.1.0 25. The clean Hi-C data were then aligned to the final assembled contigs with the Juicer pipeline v1.6 to obtain the interaction matrix 26. The contigs were ordered and anchored using 3D de novo assembly (3D-DNA) v180419 27. Finally, the Hi-C contact maps of the final assembly were reviewed manually with Juicebox v1.11.08 26.
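
The 17-mer frequency distribution used for the genome survey is a histogram of how many distinct k-mers occur once, twice, and so on in the reads. The sketch below illustrates the idea in plain Python. It is not how Jellyfish is implemented, and it would be far too slow for real data. Counting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumption made here because it is a common choice for unstranded reads; the source does not state which options were used. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads: Iterable[str], k: int = 17) -> Counter:
    """Map each multiplicity to the number of distinct canonical k-mers
    observed with that multiplicity."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if _DNA.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Hypothetical toy read, with k = 3 so the result is easy to check by hand.
print(sorted(kmer_histogram(["ACGTA"], k=3).items()))
```

In the toy read, ACG and CGT are reverse complements of each other and collapse into one canonical 3-mer seen twice, while GTA is seen once. A histogram of this kind is the input that a tool such as GenomeScope fits a model to.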

Genome annotation.
For repeat element identification and masking, we used both homology-based and de novo approaches. Briefly, a de novo repeat library was constructed using RepeatModeler v2.0.2 31. The resulting library was then combined with the Repbase database v21.12 32 to identify repetitive sequences in the F. hirta genome using RepeatMasker v4.1.2 33. For noncoding RNA prediction, tRNA genes were predicted using tRNAscan-SE v2.0.6 34. Other noncoding RNAs, including miRNA, rRNA and snRNA genes, were detected by comparison with the Rfam database using CMsearch v1.1.3 with default parameters 35,36. Protein-coding gene annotation was conducted using homology-based, transcriptome-based, and ab initio prediction methods. First, we used homologous proteins from 11 species (Table S3) as protein-based evidence for predicting gene sets with GeneWise v2.4.1 37. Transcriptome data, comprising leaf, stem, inflorescence, and root RNA-seq reads, were mapped using HISAT2 v2.1.0 38. Ab initio prediction was performed using AUGUSTUS v3.4.0 39, trained on the transcriptome data. To generate a comprehensive protein-coding gene set, we used the GETA pipeline (https://github.com/chenlianfu/geta) to integrate annotations from the homology-based, transcriptome-based, and ab initio predictions. To functionally annotate the predicted gene models, we searched several databases, including NCBI nr 40, Swiss-Prot 41, KOG 42, eggNOG 43, Pfam 44, GO 45, and KEGG 46.
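
A genome-wide repeat percentage is usually derived from the annotated repeat intervals, and overlapping annotations must be merged first so that no base is counted twice. The sketch below shows that bookkeeping under stated assumptions: intervals are half-open `(start, end)` coordinates on a single sequence, and the function names and toy coordinates are hypothetical. It does not reproduce the summary produced by RepeatMasker itself.

```python
from typing import Iterable, Tuple


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Total number of bases covered by half-open (start, end) intervals,
    counting overlapping regions only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the open block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def repeat_percent(intervals: Iterable[Tuple[int, int]], sequence_length: int) -> float:
    """Percentage of a sequence covered by repeat annotations."""
    return 100.0 * merged_length(intervals) / sequence_length


# Hypothetical toy annotations on a 100 bp sequence (not data from this study).
toy_repeats = [(0, 10), (5, 15), (20, 30)]
print(repeat_percent(toy_repeats, 100))
```

In the toy example the first two annotations overlap and merge into a 15 bp block, so 25 of the 100 bases are counted as repeats rather than 30.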

Data records
The National Genomics Data Center (NGDC) database BioProject accession number for the sequence data reported in this paper is PRJCA019243. The raw sequencing data for HiFi, Hi-C, and RNA-seq were submitted to the NGDC GSA under accession numbers CRR857341-CRR857356 47. The chromosome-level genome assembly was deposited in NCBI GenBank under accession number GCA_038430175.1 48. Moreover, the gene structure annotation, gene function annotation and TE annotation files have been deposited in the Figshare database 49.

Technical Validation
To assess genome assembly quality, the Illumina genomic and RNA-seq reads were mapped to the genome using BWA v0.7.17 50 and HISAT2 v2.1.0 38, respectively. To evaluate the completeness and accuracy of the genome, we used the LTR assembly index (LAI) 51 and BUSCO v4.1.2 52 with the embryophyta_odb10 database. The mapping rates of Illumina and HiFi reads to the genome were 98.52% and 99.13%, respectively (Table S7). The LAI score was 19.98 (Table 1), which is similar to the scores for Oryza sativa and Arabidopsis thaliana 51. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the assembled genome contained 1,590 (98.50% of 1,614) complete core orthologous genes in the embryophyta_odb10 database, which is higher than the values for the seven previously reported Ficus genomes (89.7%-96.4%; Table S5). Together, these values indicate that the F. hirta genome sequence is of high quality.

Table 1. Statistics for published Ficus genomes.

Table 2. Statistics of repeat sequences in the Ficus hirta genome.