Complete chloroplast genome of Lilium ledebourii (Baker) Boiss and its comparative analysis: lights into selective pressure and adaptive evolution

Lilium ledebourii (Baker) Boiss is a rare species, which exhibits valuable traits. However, before its genetic diversity and evolutionary were uncovered, its wild resources were jeopardized. Moreover, some ambiguities in phylogenetic relationships of this genus remain unresolved. Therefore, obtaining the whole chloroplast sequences of L. ledebourii and its comparative analysis along with other Lilium species is crucial and pivotal to understanding the evolution of this genus as well as the genetic populations. A multi-scale genome-level analysis, especially selection pressure, was conducted. Detailed third‑generation sequencing and analysis revealed a whole chloroplast genome of 151,884 bp, with an ordinary quadripartite and protected structure comprising 37.0% GC. Overall, 113 different genes were recognized in the chloroplast genome, consisting of 30 distinct tRNA genes, four distinct ribosomal RNAs genes, and 79 unique protein-encoding genes. Here, 3234 SSRs and 2053 complex repeats were identified, and a comprehensive analysis was performed for IR expansion and contraction, and codon usage bias. Moreover, genome-wide sliding window analysis revealed the variability of rpl32-trnL-ccsA, petD-rpoA, ycf1, psbI-trnS-trnG, rps15-ycf1, trnR, trnT-trnL, and trnP-psaJ-rpl33 were higher among the 48 Lilium cp genomes, displaying higher variability of nucleotide in SC regions. Following 1128 pairwise comparisons, ndhB, psbJ, psbZ, and ycf2 exhibit zero synonymous substitution, revealing divergence or genetic restriction. Furthermore, out of 78 protein-coding genes, we found that accD and rpl36 under positive selection: however, at the entire-chloroplast protein scale, the Lilium species have gone through a purifying selection. Also, a new phylogenetic tree for Lilium was rebuilt, and we believe that the Lilium classification is clearer than before. The genetic resources provided here will aid future studies in species identification, population genetics, and Lilium conservation.

www.nature.com/scientificreports/ chloroplast genomes of Lilium species demonstrated slightly visible junction variation in the IRa/LSC and IRb/ SSC boundaries, despite the gene number and gene content being conserved (Fig. 2). In all Lilium cp genomes, IR regions were found to have nearly the same size (26,382 bp to 26,990 bp). L. fargesii had the largest IR region expansion, which ended at the rps19 gene. In all Lilium cp genomes, the ycf1 gene was placed in the IR/SSC boundary regions, leading to an incomplete gene duplication within IRs, and the IR/LSC junction site was placed in the rps19 gene. In most Lilium cp genomes, the rps19 gene was observed in the IRb region around 137 bp to 142 bp far from the JLB boundary, except for L. distichum (6 bp); L. gongshanense and L. nepalense (22); L. henricii, L. meleagrinum, and L. rosthornii (23 bp); and L. amoenum and L. primulinum (31 bp): it was found significantly lower than the other species. The ndhF gene in most cp genomes (39/48) 45 bp, and L. washingtonianum 45 bp, the ndhF gene was discovered somewhat extended in the IRb (Fig. 2).
In the Lilium species, the IR region was more conserved than LSC and SSC regions. Synteny results revealed that the Lilium species have a high degree of sequence identity and collinearity at the cp genome-wide scale, especially in the IR region (Fig. S1). We also conducted genome-wide analysis via sliding window assessment to detect hotspot regions in the Lilium cp genomes. Nucleotide diversity (pi) was averaged at 0.00504, ranging from 0 to 0.01913. The variability of rpl32-trnL-ccsA, petD-rpoA, ycf1, psbI-trnS-trnG, rps15-ycf1, trnR, trnT-trnL, and trnP-psaJ-rpl33 were higher among the 48 Lilium cp genomes. The divergence was more prominent in the SC regions than in the IRs regions, which displayed a higher nucleotide variability compared to IR regions (Fig. 3).  (AAA TTC /AAT TTG ). Overall, varied SSR motifs were found in Lilium cp genomes at different frequencies. This research distinguished the presence and SSR types of Lilium species, which might be fruitful for molecular marker investigations and population genetics of Lilium, especially for L. ledebourii (Table S1, Fig. S2). Complicated repeat sequences play a role in the recombination and variation of chloroplast genomes 45 . The SCh chloroplast genome contains 32 complex repeats, including five tandem, ten dispersed, and 17 palindromic repeats (Table 2). These repeats were at least 24 bp in length, with the longest being 53 bp. Furthermore, it was discovered that the final quantity of complex repeats in the SCh genome was around 25% lower than the average number of repeats in Lilium genomes, with a decrease of 18%, 32%, and 22% for tandem, dispersed, and palindromic repeats, respectively (Fig. 4D). In total, 29-55 long repeat sequences were discovered in each Lilium cp genome, including 9-23 dispersed repeats, 14-30 palindromic repeats, and 3-10 tandem repeats (Fig. 4D). With the exception of L. longiflorum, which contained a repetition of 162 bp, repeat sizes varied from 30 to 87 bp in dispersed, 30 to 61 bp in palindromic, and from 15 to 85 bp in tandem. In other words, out of the 48 species, palindromic repeat was the most common type, and the total number of repeats ranged from 29 to 55, with 60.86% of these repeats being between 30 and 40 in length (Fig. 4E).
Codon usage bias analysis. Due to the widespread occurrence of synonymous codon bias in organisms, recognizing codon preference might play a significant role in the evolution by clarifying the selection pressure and improving translation efficiency by utilizing major codons 46,47 . Totally, 21,989 codons were detected in the SCh protein-coding genes. A-and U-ending are seen to be more prevalent than G and C-ending ones. Among SCh amino acids, the highest and lowest frequencies were related to leucine (Leu = 2268) and cysteine (Cys = 255), respectively. In SCh, 30 codons showed more bias (RSCU > 1), and 31 codons displayed bias: RSCU < 1. In addition, there was no bias (RSCU = 1) in the frequency of start codons AUG (methionine), UGG (tryptophan), and AUA (isoleucine) ( Table 3). Table 1. Gene content and functional classification of L. ledebourii chloroplast genome. *Gene with one intron; **gene with two introns; (×2) duplicated gene; a trans-spliced gene; ψ: pseudogene.   (Table S2, Fig. 5). Among the 20 amino acids, the lowest and highest RSCU values were recorded for Tyr-UAC, encoding the tyrosine, and Leu-CUU, encoding the leucine, respectively. Codon usage in the Lilium cp genomes was biased towards A-and U-ended codons, according to RSCU values (RSCU > 1). In addition, the pattern of codon usage bias in the Lilium species was investigated. Figure S3 shows the values of the Codon adaptation index (CAI), Codon bias index (CBI), Frequency of optimal codons (FOP), Effective number of codons (NC), and GC3s for 48 Lilium chloroplast genomes. We observed that five parameters associated with codon usage bias are very similar across Lilium species (Fig. S3).
Also, another analysis to compare Lilium species based on "concatenate all protein-coding genes" was per-  Phylogenetic analysis. Both analyses, which used the complete chloroplast genome (CCGs) and protein coding genes (CDSs), divided 47 Lilium species into two main groups. Although the members of both main groups were the same in both topologies, there were slight differences between them (Figs. 7, S4). Because of the high synteny among Lilium cp genomes, this study concentrated on phylogenetic analysis employing whole cp genome sequences to inspect relationships across the 47 Lilium species. Maximum likelihood was employed with two species serving as outgroups. According to the topology, the majority of nodes were highly supported. In all, 34 of the 45 nodes acquired a maximally supported (value ≥ 99%) value bootstrap. According to the CCG topology, the 47 Lilium species were divided into two main groups consisting of 11 clades (Fig. 7).

Discussion
The conserved characteristics of gene content and organization, as well as the GC content of the SCh cp genome, were found to be similar to the variability in other species 17,18 . Furthermore, trans-splicing was observed in the rps12 gene, which is also seen in other species 48 . The Lilium cp genomes length differed between 151,655 bp in L. bakerianum and 153,235 bp in L. fargesii. It was suggested that one of the primary causes for the change in the cp genomes size is that the IR region shrinks, expands, or losing 49 . We surveyed the expansion and contraction variability in IR/SC junction regions. The boundary of LSC/ IRb is stable, while slightly visible junction variation can be seen in the IRa/LSC and IRb/SSC boundaries. The occurrence of contraction and expansion of IR regions during evolution is a relatively common happening, which has been employed as evolutionary loci for phylogenetic studies 50,51 . Expansion of IR/LSC junctions to rps19 has been observed in other Liliaceae species such as Amana, and this event appears to be a Liliaceae ancestral symplesiomorphy 52 . The contraction/expansion of IR regions in Liliaceae has resulted in the formation of ycf1 and rps19 at the boundaries across SC and IR regions, with varying lengths, as demonstrated in Fritillaria 53 and Cardiocrinum 54 . Given the unison of our findings with those about other plants [52][53][54] , expansion and contraction of IR regions may be a significant mechanism for different lengths of 48 Lilium cp genomes.
Analysis of nucleotide diversity and cp repeats can be used to recognize molecular markers, rebuild evolutionary connections, and delve into population genetics 55,56 . In this study, a total of 3234 microsatellites were discovered in 48 cp genomes of Lilium. A/T repeats were the most common type of SSR found. The abundance of this SSR type is consistent with the majority of other cp genomes explored thus far 57 . Additionally, complex repeats in the 48 Lilium were found, which could be substantial genomic reconfiguration hotspots 58,59 . The number and size of tandem, dispersed, and palindromic repeats were nearly identical in the cp genomes of relevant species such as Fritillaria 60 . Especially, the incidence of large repeats in the chloroplast, such as the 162 bp tandem repeat of L. longiflorum from our results, has probably been linked to an unstable genomic structure because of improper rearrangements 61 . www.nature.com/scientificreports/ Here, a higher divergence of the SC regions and a lower divergence of the IR were discovered, suggesting that the IR is more conservative than in other regions, with the same characteristics of most angiosperms 62 . This occurrence is caused by copy correction of IRs and the removal of harmful mutations via gene conversion 63 . rpl32-trnL-ccsA 64 , trnP-psaJ-rpl33, petD-rpoA, ycf1 18 , psbI-trnS-trnG 65 , and rps15-ycf1, trnT-trnL 66 have previously been identified as high variability regions in various species. The phylogeny of genus Lilium will be clarified with the help of these regions, which are expected to be very helpful in the future.
The rate of Ka (non-synonymous) to Ks (synonymous) nucleotide substitution is commonly employed as a powerful tool for the clarification of the evolution of protein coding genes and also species adaptive developments 67,68 . The Ka/Ks ratio determines the gene divergence grade and whether selection pressure is positive (Ka/Ks > 1), purifying (Ka/Ks < 1, particularly if it is less than 0.5), or neutral (Ka/Ks = 1) 67 .
In the present study, the Ka/Ks > 1 was recorded for accD, rpl16, and rpl36, implying that these genes could be important in the adaptive evolution 51 . Positive selection of accD was signed by the important role of the gene in stress tolerance and resistance, insect predation, and pathogens 69 . Positive selection in accD has been observed in Ipomoea 41 and Stauntonia 43 . Other positively selected genes were rpl16 and rpl36, which are responsible for encoding the ribosomal protein, which has been evidenced to be necessary for the development of chloroplast ribosomes in plants 70 . Previous studies have reported positive selection for rpl16 in Lonicera 71 and rpl36 in Aquilegia 42 .
However, we discovered that some genes were positively selected in at least one pairwise comparison, suggesting these genes were potentially subject to positive pressure for selection among Lilium species. ycf1 possesses 248 positive selective pairwise comparisons, followed by matK (144), ccsA (52), rbcL (37), ndhI (31), clpP (29), atpF (20), rpoC2 (20), cemA (16), ndhD (14), ndhF (10), petB (10), ndhA (9), ndhG (9), petD (9), ycf4 (8), rpoA (7), ndhJ (4), rpl33 (3), rps14 (3), ndhH (2), petG (2), rps4 (2), ndhC (1), rpoB (1), and rps2 (1). Of these, MatK has previously been found under positive selection in over 30 different taxonomic groups 72 . NADH-dehydrogenase gens group (ndh) were fundamental in the use of light energy and the electron transfer chain to produce ATP, significant components for photosynthesis 73,74 . As a consequence, these genes, as important components involved www.nature.com/scientificreports/ in plant growth, may have evolved as a result of more common substitutions in order to adapt to various environmental conditions 43 . We discovered positive selection on the ycf1 gene in 248 pairwise comparisons. The ycf1 is huge open reading frame, which encodes protein products for many amino acids in higher plant. Moreover, the necessity of the ycf1 gene for cell survival has been proven by knockout studies 75 . clpP, which encodes the (ATPdependent) clp protease, is thought to play a role in chloroplast protein transformation and may be necessary for shoot development in the presence of the degradation of clpP-mediated protein 76,77 . Another positively selected gene in our study is the rubisco large-chain gene (rbcL). In many higher plants, rbcL positive selection has been made 78 . atpF is involved in the encoding of the H + -ATP subunits, which is necessary for some photosynthetic processes 79 . Positive selection has been noticed to evolve rpo genes, which encode proteins role-playing in the modification of transcription and post-transcriptional modification 80 . Due to the cooperation of cemA with nuclear genes 81 , cemA might evolve relatively quickly in species 82 . Substitution of amino-acid, indel presence and prematurity in stop codon could lead to a positive selection of ccsA 83 . Selected positive genes may have had crucial roles in the adaptation of Lilium to different environments. Gao et al. 84 documented the adaptation of chloroplast genes to various ecological environments of solar preferences. Moreover, more undiscovered selective compulsions may be involved in the increasing of the Ka/Ks ratio, leading to species divergence 85 . The Ka/Ks ratios in the majority of genes shared by Lilium chloroplast genomes and among pairwise comparisons of species employing all protein-coding genes were less than 1, proposing purifying selection. Similar findings were detailed for Gentiana species 86 . This lower rate of Ka/Ks can be the result of the fact that most of the species are probably to undergo disadvantageous nonsynonymous substitutions and purification selections, and the selective restriction on nonsynonymous substitutions is stronger than synonymous substitutions 87,88 . In short, positive selection in some genes likely enriches the Lilium variety and adaptability. www.nature.com/scientificreports/ At the whole cp protein scale, Lilium species were subjected to a purifying selection. Sunlight UV radiation damages and rearranges DNA 89,90 , and higher temperatures speed up metabolism 91 , all contributing to an increase the mutation rates. Consequently, purifying selection, as one of the most common types of natural selection, constantly helps in the elimination of disadvantageous mutations in populations. Purifying selection would thus be an evolutionary outcome of the preservation of Lilium species adaptive habits.
The richer the taxon sampling, the more accurate it is to comment on the Comber's classification. The present study clustered 47 species of Lilium. To date, some of them have not been evaluated at the whole cp genome level. Lilium species were distributed into 11 clades divided into two main groups. The position of the species in the present topology is consistent with the classification by Kim et al. 12 .
Our samples did not support the monophyly of Comber's sections 11 . According to our results, L. martagon was placed farther away from the Martagon species and was closer to L. leichtlinii, L. bulbiferum, and L. davidii, which agrees with Gong et al. 16 based on nrITS. We observed L. brownii of Comber's Archelirion far from the www.nature.com/scientificreports/ other two species of Archelirion) L. japonicum and L. speciosum( and close to the two species of Leucolirion (L. formosanum and L. longiflorum). Similarly, Li et al. 9 classified L. brownii alongside L. formosanum and L. longiflorum in a genus-level study of the Liliaceae family's evolution. As our results warn, and with the help of ITS-dataset classification 92,93 , now with the approval of cp genome-based classification, L. brownii can be moved from the Archelirion to the Leucolirion. Based on the morphology, L. duchartrei, L. lankongense, L. nanum, and L. rosthornii belong to the Sinomartagon section 11 . However, based on our cp genome-scale topology, these species are further away from the Sinomartagons and are closer to the Lophophorum species. Studies have shown that L. nanum and Lophophorum have similar karyotypes 94 . Additionally, according to the ITS regions, Du et al. 92 reported that L. duchartrei and L. lankongense are in the same clade and are closer to the Lophophorum. We accommodated L. xanthellum on Clade X, away from Leucolirion. According to ultrametric chronograms, L. xanthellum is closer to L. lophophorum and L. matangense 95 . Totally, the composition of clade X in our phylogeny (three species of Lophophorum including L. fargesii, L. lophophorum, and L. matangense, along with L. nanum of Sinomartagon and L. xanthellum of Leucolirion) is in agreement with Du et al. 92 based on ITS regions.
Although, the monophyletic of the three Martagon species (L. hansonii, L. tsingtauense, and L. distichum) were rejected 12,92 prior to our study, so far, the position of L. distichum has not been clear enough due to restriction of the sampling of Lophophorum and Nomocharis. Based on our topology tree and with the help of a richer sampling, L. distichum is further away from its companions in previous studies. Our phylogenetic tree shows L. distichum close to two species of Archelirion (L. japonicum and L. speciosum). Moreover, in terms of flower morphology, Comber's Martagon members have differences from each other. The flowers of L. distichum are outfacing, whereas those of L. tsingtauense and L. martagon are upright and nodding, respectively 19 .
As shown in the results (Fig. 7), L. henricii and L. souliei, two species of Comber's Sinomartagon, are placed next to Nomocharis-Lilium (L. gongshanense, L. meleagrinum, and L. pardanthinum). Gao et al. 96 based on biogeographic results, showed that L. henricii is associated with Nomocharis species. What is more, Gao et al. 97 by examining 38 Lilium species and 7 Nomocharis species using the ITS dataset, showed L. souliei inside the Nomocharis clade.
Based on our CCG topology, L. amoenum of Lophophorum was sister to L. bakerianum, a member of clade V. Zhou et al. 98 , based on fluorescence in situ hybridization, showed the signal pattern of 35S rDNA in L. amoenum was the same as in L. bakerianum. Moreover, these researchers, by mapping the chromosome pattern for 35S rDNA based on ITS data, showed that these two species are monophyletic.
The phylogenetic position of L. ledebourii was very ambiguous due to the scarcity of molecular information. To date, two studies have attempted this. Kim and Kim 8 involved L. ledebourii (with 15 other species) in building the phylogenetic tree. However, this research does not provide a clear picture of the position of this species, due to sampling restriction and the use of only four chloroplast genes (rbcL, matK, ndhF, and atpB), which according to the genome-scale results, are not among the most divergent hotspots. Ghanbari et al. 99 , based on the ITS marker, have examined the position of this species and shown that L. ledebourii (Damash sample) is far from the L. candidum.
Following the resolution of L. ledebourii, as one of the study's objectives, and according to Kim et al. 12 , who reported that L. candidum position remained uncertain, interestingly here, according to cp genome-scale comparisons, L. ledebourii and L. candidum were monophyletic. L. ledebourii is a rare species that has only been seen in Iran and Azerbaijan 20 . Furthermore, L. candidum is thought to have originated in Persia and Syria 100 . In total, our findings indicate that a whole cp genome phylogenomic comparison would resolve much controversy and pave the way for Lilium phylogeny, especially for L. ledebourii.

Conclusions
The whole chloroplast genome of L. ledebourii is reported for the first time. The current study using whole chloroplast genomes of Lilium revealed structural characteristics, sequence diversity, and enhanced links between species. Meanwhile, certain variation hotspots identified as high variability regions could function as particular DNA barcodes. We provide a comprehensive analysis of selection pressure, and in the whole cp protein scale, Lilium species were subjected to a purifying selection. This study covered the restriction of sampling of Lophophorum and Nomocharis as much as possible. For the first time, L. ledebourii participated in the classification of genus Lilium, and its position was determined. The position of some species, e.g., L. distichum, became clearer than before. It is suggested that L. brownii can migrate from the Archelirion to the Leucolirion. We believe that the Lilium species have been classified with more excellent resolution than in earlier studies, which will be helpful in the understanding of the evolution of Lilium species. The genetic resources provided here will aid future studies in species identification, population genetics, and Lilium conservation.

Materials and methods
Sample collection and DNA extraction. Fresh leaves of L. ledebourii were sampled from Damash village and frozen in liquid nitrogen. The leaf samples were gathered in compliance with national and international legislation and guidelines. It was certified by the herbarium of the Faculty of Agricultural Science and Engineering, University of Tehran, and a validated voucher specimen was deposited at the Department of Horticultural Science, with voucher specimen number 6594. The total genomic DNA was isolated from leaves utilizing a DNeasy plant DNA extraction kit (Qiagen, USA) and the manufacturer's guidelines. DNA integrity was evaluated applying 1% agarose gel, and DNA quantification was assessed employing a NanoDrop spectrophotometer. Extracted DNA was stored at − 80 °C. To extract potential chloroplast sequences, the PacBio data were mapped to the reference L. hansonii cp genome (KM103364) data using BLASR 101 . Error correction was performed on SMRT reads by sprai pipeline 102 . The corrected reads were assembled employing Perl-based pipeline 103 . Furthermore, overlapping ends were checked by "check_circularity.pl" script.
Chloroplast genome annotation. GeSeq 104 , with NCBI RefSeq for Lilium as the reference dataset, was utilized for the annotation of protein-coding, ribosome RNA (rRNA), and transfer RNA (tRNA) genes. Using BLAST against cp genes, sequence coordinates of all annotated genes were checked and manually edited. The tRNAscan-SE version 2.0 was used to double-check the tRNA genes 105 . A circular physical map of the chloroplast genome was illustrated using OrganellarGenomeDRAW (OGDRAW) toolkit 106 and Chloe (https:// github. com/ ian-small/ chloe).
Chloroplast genome comparison. In order to discover the Lilium divergence regions, the distance among adjoining genes and junctions of L. ledebourii SSC, LSC, and IRs regions, were compared to the other Lily cp genomes species. For each particularized codon, the ratio of usage frequency was obtained as the Relative Synonymous Codon Usage (RSCU) value using DAMBE V6 for 48 Lilium species 110 . We mapped the results of codon preference via the R program. To evaluate the degree of codons bias, the CodonW V1.4.4 (http:// codonw. sourc eforge. net) calculated the values of the codon bias index (CBI), the codon adaptation index (CAI), the frequency of optimal codons (Fop), the effective number of codons (NC), and the GC content of synonymous third codon positions (GC3s) 111 . To discover mutation hotspot sites, the nucleotide diversity in the Lilium chloroplast genomes was quantified employing sliding window analysis via DnaSP 6 software 112 . The window length and step size were fixed at 600 bp and 200 bp, respectively.
Selection pressure on Lilium cp genomes. We extracted the CDS sequences of the protein-coding genes from all 48 species, and flushed out those with lacking data in at least one species, which resulted in 78 CDS matrices. MAFFT v7 113 was used to generate CDS alignments. DnaSP 6 112 was used to compute the rates of nonsynonymous (Ka) and synonymous (Ks) nucleotide substitution. The selective pressure was measured using the Ka/Ks ratio, with Ka/Ks < 1, Ka/Ks = 1, and Ka/Ks > 1, indicating purifying, neutral, and positive selection, respectively 67 .
Phylogenetic analysis. The MAFFT v7 program was employed to align the complete cp genome sequence of 47 Lilium species 113 . Two complete sequences of Fritillaria hupehensis and Fritillaria cirrhosa were applied as outgroups. Utilizing the effective nucleotide substitution model (GTR + G), the maximum likelihood was carried out with RAxML v8.2.11 114 with 1000 bootstrap repetitions to construct the phylogenetic tree. Finally, the iTOL tool was employed for visualizing the coming about phylogenetic tree 115 . Furthermore, phylogenetic analyses were also carried out for the single-copy protein coding genes (CDSs) shared by all 47 species.