Chromosome-level genome assembly of colored calla lily (Zantedeschia elliottiana)

The colored calla lily is an ornamental flowering plant native to southern Africa, belonging to the genus Zantedeschia of the family Araceae. We generated a high-quality chromosome-level genome of the colored calla lily, with a size of 1,154 Mb and a contig N50 of 42 Mb. We anchored 98.5% of the contigs (1,137 Mb) into 16 pseudo-chromosomes and identified 60.18% of the sequence (694 Mb) as repetitive. Functional annotations were assigned to 95.1% of the 36,165 predicted protein-coding genes. Additionally, we annotated 469 miRNAs, 1,652 tRNAs, 10,033 rRNAs, and 1,677 snRNAs. Furthermore, insertions of Gypsy-type LTR retrotransposons are the primary factor causing significant genome size variation in Araceae species. This high-quality genome assembly provides a valuable resource for understanding genome size differences within the Araceae family and for advancing genomic research on colored calla lily.


Background & Summary
Zantedeschia spp., commonly known as calla lilies, are perennial herbaceous flowering plants belonging to the genus Zantedeschia of the family Araceae. They are typically found in the swamps and hilly regions of South Africa 1,2. With their unique spathes and decorative foliage, calla lilies have become popular tuberous flowering plants worldwide. They are usually divided into two groups: white calla lily and colored calla lily 3. Colored calla lily is an economically significant horticultural crop that has been among the top cut flower and tuber exports of New Zealand for the past three decades, while also contributing substantially to the horticultural export revenues of the Netherlands and the United States. Furthermore, the tubers of colored calla lilies have medicinal value and are effective in treating certain gastrointestinal and trauma-related illnesses.
K-mer and flow cytometry analyses estimated the genome size of Zantedeschia elliottiana cv. 'Jingcai Yangguang' at ~1.2 Gb, with a heterozygosity of 1.9% and a repeat content of 67.84% (Figs. 1, 2). The de novo assembly used 84.30× Illumina paired-end short reads (100.31 Gb), 36.92× HiFi reads (43.93 Gb) and 141.45× Hi-C reads (168.18 Gb). We first assembled the HiFi reads into 1,154 Mb of contig sequence with a contig N50 of 42 Mb (Table 1). Using Hi-C reads, 98.50% of the contigs were anchored into 16 pseudo-chromosomes (Fig. 3, Table 1). Transposable elements make up 60.18% of the genome in the final annotation, of which LTR retroelements account for the largest proportion (51.54%). In contrast, DNA transposons account for only 3.73% (Table 2). A total of 36,165 protein-coding genes were predicted, of which 95.1% could be functionally annotated through the InterPro 4, Pfam 5, Swiss-Prot 6, NCBI Non-redundant protein (NR) 7 and Kyoto Encyclopedia of Genes and Genomes (KEGG) 8 databases (Table 3). In addition, non-coding RNA annotation identified 10,033 rRNAs, 1,677 snRNAs, 469 miRNAs and 1,652 tRNAs in the Zantedeschia elliottiana cv. 'Jingcai Yangguang' genome (Table 4). BUSCO evaluation identified 98% of the core genes as complete, comprising 95.7% single-copy genes and 2.3% duplicated genes (Table 1). Between 93.83% and 95.23% of RNA-seq reads from eight Zantedeschia elliottiana cv. 'Jingcai Yangguang' tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style) could be mapped to the genome, as could 99.02% of Illumina reads and 98.42% of HiFi reads. The LTR Assembly Index (LAI) of the genome was 18.43, indicating high assembly continuity (Table 1).
LTR insertion time analysis showed that Araceae plants experienced different LTR bursts during genome evolution, and that different types of LTR retrotransposons have different burst patterns. For Copia-type LTR retrotransposons, Pistia stratiotes and Zantedeschia elliottiana cv. 'Jingcai Yangguang' had the same insertion time. Interestingly, Amorphophallus konjac and Colocasia esculenta each experienced two bursts of both Copia and Gypsy elements. The two bursts were clearly separated in time in Colocasia esculenta, whereas in Amorphophallus konjac they occurred close together. The analysis also showed that Gypsy elements in Pistia stratiotes had recently undergone a burst (Fig. 4a). As a branch of the Araceae family, Lemnaceae plants have a smaller genome size and fewer genes than True-Araceae plants. However, the genome size of True-Araceae plants is not related to the number of genes. Correlation analysis further showed that genome size is highly correlated with transposable element content, and Gypsy-type LTR retrotransposons had the highest correlation with genome size (Fig. 4b).
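The text above does not state how LTR insertion times were estimated. As a minimal sketch of one widely used approach, the divergence K between the two LTRs of an intact element is converted to a time with T = K / (2r), where r is the substitution rate per site per year. The Jukes-Cantor correction, the rate of 1.3e-8, and the example divergence of 2% below are all assumptions chosen for illustration, not values taken from this study.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Correct an observed proportion of differing sites (p) between the
    5' and 3' LTRs of one element for multiple hits (Jukes-Cantor model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def insertion_time_years(k: float, rate: float) -> float:
    """T = K / (2r): the two LTRs were identical at insertion and have
    each accumulated substitutions independently since then."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# ASSUMPTION: substitution rate (per site per year) often used for plant
# LTR retrotransposons; the rate used in this study is not stated.
ASSUMED_RATE = 1.3e-8

# ASSUMPTION: hypothetical element whose two LTRs differ at 2% of sites.
example_p = 0.02
example_k = jukes_cantor_distance(example_p)
example_t = insertion_time_years(example_k, ASSUMED_RATE)
print(f"K = {example_k:.5f}, T = {example_t / 1e6:.2f} million years")
```

Because T scales inversely with r, burst ages are only comparable across species when the same rate and distance model are applied to all of them.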
Here, we generated a high-quality chromosome-level assembly of Zantedeschia elliottiana cv. 'Jingcai Yangguang' and identified the primary cause of genome size variation in the Araceae family.

Methods
Sample collection and sequencing. 'Jingcai Yangguang' is a variant of Zantedeschia elliottiana cv. 'Black Magic' with a chromosome number of 2n = 2x = 32. It was initially cultivated in 2015 by Di Zhou, a former associate researcher in our team. Young leaves were collected for genome sequencing, and all sequencing material was taken from the same plant to ensure consistency. Eight tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style) were sampled for transcriptome sequencing, and the sequencing results were used for gene structure annotation.
The FastPure Plant DNA Isolation Mini Kit (Vazyme, CHN) was used to extract DNA from leaf tissue. Fresh leaves were ground to a fine powder in liquid nitrogen, and genomic DNA was isolated according to the manufacturer's guidelines. A NanoDrop 2000 (Thermo Scientific, USA) and gel electrophoresis were used to evaluate the concentration and purity of the isolated DNA.
The high-quality DNA was used to construct a genomic library, and library construction and sequencing were carried out at Novogene Co., Ltd. in Beijing. The library was size-selected using BluePippin (Sage Science, USA) to obtain fragments of the desired size range, typically ~15 kb for HiFi sequencing. The purified and size-selected library was then sequenced on the PacBio Sequel II system (Pacific Biosciences, USA). For Illumina sequencing, a short-read library was constructed with an insert size of ~250 bp and sequenced on an Illumina NovaSeq 6000 platform (Illumina, USA). The Hi-C library was constructed using the same leaf sample as described above. Briefly, nuclear DNA was fixed with formaldehyde and digested with the restriction enzyme DpnII (NEB, UK). Biotinylated nucleotides were added to the termini of the fragmented DNA, followed by enrichment and size selection to obtain fragments of approximately 500 bp. The library was sequenced on the Illumina NovaSeq 6000 platform (Illumina, USA).
The RNAprep Pure Plant Kit (TIANGEN, CHN) was used to extract RNA from eight tissues (tuber, leaf, pistil, root, spathe, stamen, stem and style). The tissue samples were ground in liquid nitrogen, lysis buffer was added, and RNA was isolated according to the manufacturer's guidelines. RNA-seq libraries were generated and sequenced on a NovaSeq 6000 platform (Illumina, USA).
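The sequencing depths quoted in the Background & Summary (84.30× Illumina, 36.92× HiFi, 141.45× Hi-C) are consistent with dividing each data yield by the estimated genome size. The short sketch below reproduces that arithmetic. The genome size of 1.19 Gb is an assumption: the text gives only "~1.2 Gb", and 1.19 Gb is the value that approximately recovers the reported depths.

```python
def sequencing_depth(yield_gb: float, genome_size_gb: float) -> float:
    """Average depth = total sequenced bases / genome size (same units)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


# ASSUMPTION: the text reports the estimated genome size only as ~1.2 Gb;
# 1.19 Gb is used here because it approximately reproduces the stated depths.
GENOME_SIZE_GB = 1.19

# Data yields in Gb as reported in the Background & Summary.
yields_gb = {"Illumina": 100.31, "HiFi": 43.93, "Hi-C": 168.18}

depths = {name: sequencing_depth(gb, GENOME_SIZE_GB) for name, gb in yields_gb.items()}
for name, depth in depths.items():
    print(f"{name}: {depth:.1f}x")
```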

Technical Validation
First, the Hi-C heatmap supports the accuracy of the genome assembly, with largely independent Hi-C signals observed among the 16 pseudo-chromosomes (Fig. 2a). Moreover, we aligned RNA and DNA reads to the final genome to assess the accuracy of the assembly. For DNA reads, Illumina reads were aligned using BWA (v0.7.17) 54 with default parameters, while HiFi reads were aligned using minimap2 (v2.24-r1122) 55 with default parameters. The mapping rate was 99.02% for Illumina reads and 98.42% for HiFi reads. For RNA reads, transcriptomic data from each tissue were mapped separately to the final genome using HISAT2 (v2.2.1) 56 with default parameters. The mapping rates for the tissue-specific transcriptomic data ranged from 93.83% to 95.23%. Furthermore, we evaluated the completeness and continuity of the genome using BUSCO (v5.4.5, embryophyta_odb10) 13 and LAI (LTR_retriever, v2.9.0) 14 (Table 1). Overall, these assessments confirmed the accuracy and completeness of the genome assembly.
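Table 1 also reports the contig N50 (42 Mb) as a contiguity measure. As a minimal sketch of the conventional definition, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The contig lengths below are toy values for illustration, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total assembly length;
    the length of the contig that crosses that threshold is the N50.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths in Mb): total = 150, half = 75.
# 50 alone is below 75; 50 + 40 = 90 crosses it, so N50 = 40.
toy_contigs_mb = [50, 40, 30, 20, 10]
print(n50(toy_contigs_mb))
```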

Fig. 1 Genome size estimation of Zantedeschia elliottiana cv. 'Jingcai Yangguang' by flow cytometry. Tomato and maize were used as internal references for genome size estimation.

Table 1. Summary of the Z. elliottiana genome.

Table 3. Statistics of gene functional annotation.