Complete chloroplast genome of Lens lamottei reveals intraspecies variation among with Lens culinaris

Kurt, Selda; Kaymaz, Yasin; Ateş, Duygu; Tanyolaç, Muhammed Bahattin

doi:10.1038/s41598-023-41287-y

Download PDF

Article
Open access
Published: 11 September 2023

Complete chloroplast genome of Lens lamottei reveals intraspecies variation among with Lens culinaris

Selda Kurt¹,
Yasin Kaymaz¹,
Duygu Ateş¹ &
…
Muhammed Bahattin Tanyolaç¹

Scientific Reports volume 13, Article number: 14959 (2023) Cite this article

603 Accesses
Metrics details

Subjects

Abstract

Lens lamottei is a member of the Fabaceae family and the second gene pool of the genus Lens. The environmental factors that drove the divergence among wild and cultivated species have been studied extensively. Recent research has focused on genomic signatures associated with various phenotypes with the acceleration of next-generation techniques in molecular profiling. Therefore, in this study, we provide the complete sequence of the chloroplast genome sequence in the wild Lens species L. lamottei with a deep coverage of 713 × next-generation sequencing (NGS) data for the first time. Compared to the cultivated species, Lens culinaris, we identified synonymous, and nonsynonymous changes in the protein-coding regions of the genes ndhB, ndhF, ndhH, petA, rpoA, rpoC2, rps3, and ycf2 in L. lamottei. Phylogenetic analysis of chloroplast genomes of various plants under Leguminosae revealed that L. lamottei and L. culinaris are closest to one another than to other species. The complete chloroplast genome of L. lamottei also allowed us to reanalyze previously published transcriptomic data, which showed high levels of gene expression for ATP-synthase, rubisco, and photosystem genes. Overall, this study provides a deeper insight into the diversity of Lens species and the agricultural importance of these plants through their chloroplast genomes.

Complete chloroplast genome sequence of Lens ervoides and comparison to Lens culinaris

Article Open access 05 September 2022

Characterization of the chloroplast genome of Gleditsia species and comparative analysis

Article Open access 21 February 2024

Chloroplast genomes in Populus (Salicaceae): comparisons from an intensively sampled genus reveal dynamic patterns of evolution

Article Open access 04 May 2021

Introduction

Lentils are nutritionally important protein-rich crops that provide essential and nonessential amino acids¹, carbohydrates, lipids, dietary fiber, minerals, and water-soluble vitamins^2,3. Lentils are self-pollinating legumes belonging to Fabaceae (Leguminosae), the third-largest plant family⁴. This family includes plants with economic value⁵. The genus Lens consists of seven species belonging to four gene pools. Lens culinaris is the cultivated form, while Lens lamottei, Lens orientalis, Lens tomentosus, Lens ervoides, Lens odemensis, and Lens nigricans are wild species⁶. Their natural habitats are distributed in the Mediterranean region and Central Asia⁷.

Lens lamottei belongs to the second gene pool of Lens and is a member of the Papillionodeae subfamily³. Czefronova and colleagues were the first to describe L. lamottei, which originated from Spain, Italy, and Turkey⁸. L. lamottei has a small seed, at approximately 1.92 mm, with a round shape⁹. Its stipules are horizontal, and the leaves are less dentate⁹. L. lamottei has one of the highest average harvest indices, the highest number of seeds per plant, the highest seed yield, and the highest biomass¹⁰. It is highly resistant to various biotic stressors, such as Stremphylium blight, Fusarium wilt, and Bruchid weevils. It ranks second among wild species in terms of resistance to anthracnose and Ascochyta blight¹¹. It also has enhanced tolerance to drought, salinity, and alkalinity⁹.

The chloroplast (cp) is a crucial organelle that plays a major role in photosynthesis, starch production, and metabolism of amino acids, fatty acids, pigments, vitamins, sulfur, and nitrogen^12,13. It harnesses energy that can be supplied to the plant cell¹⁴. Cp-based studies are vital in plant biotechnology, especially in plant breeding studies¹⁵. Cp genomes can be used to construct phylogenetic trees to elucidate distinctive plant features, such as maternal inheritance, high copy numbers, and self-replication ability^16,17. It provides valuable information about the structural variations and exposes the molecular divergence of related species¹⁸.

The cp has an independent genome that is around 120–170 kb in size and consists of 120–130 genes mainly involved in photosynthesis¹⁹. The recently published cp genome of L. ervoides, one of the close relatives of L. lamottei, is also approximately 120 kb²⁰. Cp genome size is more conversed across species as opposed to the mitochondrial genome²¹. It is stable and highly conserved, with the mutation rate of the cp genome lower than the nuclear genome^13,22. The cp genome has a quadripartite structure with a small single copy region (SSC), a large single copy region (LSC), and two inverted repeat regions (IRs)²². Species carrying a single copy IR region in their cp genome structure are classified under the Inverted Repeat Lacking Clade (IRLC)²³, which includes Cicer arietinum, Pisum sativum, Vicia faba, L. culinaris, and L. ervoides^20,24. The IRLC is a useful model system for examining how the evolution of the plastome is impacted by the loss of a single IR region²⁵.

Divergence patterns, substitution rates, and phylogenetic relations across land plants depend heavily on the comparative analyses of their genomic sequences^26,27. Expansion or contraction of simple sequence repeats (SSRs) can be potential DNA markers for identification of species¹⁸. Genomic data is also utilized in studying gene expression and regulation. Recently, comparisons of transcriptomic profiles have been reported to determine differential gene expression²⁸. Likewise, RNA editing sites and codon usage bias have been studied to identify the role of cp genes and these gene’s regulatory mechanisms^29,30.

The lack of a cp genome sequence for L. lamottei limits its phylogenetic examination across related species and its potential to associate with wild-type features that can be agriculturally beneficial. In the current study, we report the cp sequence of L. lamottei and compare it with L. culinaris.

Results

Genomic features and organization of L. lamottei cp DNA

The BGISEQ-500 platform was used to sequence the cp genome of L. lamottei. The de novo cp genome of L. lamottei was assembled using the GetOrganelle pipeline via two different assembly paths. To examine genome rearrangement, the Mauve aligner was used to align L. culinaris and L. lamottei’s cp genome. The alignment calculated the locally collinear blocks (LCB), which represent highly similar, or conserved regions. When these assemblies were compared, certain regions of assembly path 1 were detected to be inverted without any missing parts in reference to L. culinaris (Fig. 1a). On the other hand, assembly path 2 was performed by carrying the swapped blocks on the same strand. After manually reorienting the major collinear blocks for both of these assembly options, we found that the L. lamottei genome has the same structures and gene content as the cp genome of L. culinaris (Fig. 1b).

As a result of our de novo genome construction analysis, the cp genome of L. lamottei was assembled with a size of 122,855 bp and includes the LSC, SSC, and one IR region. Therefore, L. lamottei can be classified under the IRLC. Cross-species plastome comparison was done to identify genic regions, including protein-coding, tRNA, and rRNA loci. Figure 2 shows the structural organization of both Lens cp genomes with the same orientation of the genic regions. The L. lamottei and L. culinaris cp genomes were similar in genic content and structural organization. However, due to ambiguities in the annotation approaches, a few genes detected in L. lamottei were not found in L. culinaris. These genes are pafII(ycf4), rps18, and rpl22. Another difference is that the pafI gene was annotated as a protein-coding gene in L. lamottei but was defined as a pseudogene in the annotation of L. culinaris. This is because the annotation of the L. culinaris genome was performed using a different tool (DOGMA). When the cp genome of L. culinaris was reannotated using the GeSeq tool, the two cp genomes contained the same set of genes. L. lamottei and L. culinaris have 108 and 105 genes, respectively, that make up the cp genome, including 77, and 73 protein-coding genes, respectively. They have the same number of tRNA (27) and rRNA (4) genes (Supplementary Table S1). These genes have various functions in the cp, including photosynthesis, transcription, and translation (Supplementary Table S2).

Introns play a crucial role in regulating gene expression. Overall, there are 17 intron-containing genes. Among these, six are tRNA genes (trnK-UAA, trnF-AAA, trnL-UAA, trnI-GAU, trnE-UUC, and trnA-UGC), and ten are protein-coding genes (rpoC1, atpF, clpP1, petB, petD, rpl16, rpl2, ndhB, rps12, and ndhA). PafI (ycf3) is the only gene containing two introns. The longest intron (2423 bp) was found in the trnK-UUU gene, which also contained the matK gene. The smallest intron (39 bp) was found in the trnA-UGC gene.

Comparative genome analysis

To study the degree of sequence divergence between the cp genomes of L. lamottei and L. culinaris mVISTA was used to visualize the overall sequence identities of the two cp genomes (Fig. 3). Comparative analyses showed a high sequence similarity with no discernible differences between the wild and cultivated species. We observed similar gene orders and organizations between the two species and highly conserved protein-coding genes with limited substitutions.

The hotspot regions in the cp genomes of L. lamottei and L. culinaris were determined by comparing the nucleotide variation in the coding and noncoding regions. Compared to protein-coding regions, noncoding regions showed higher levels of divergence. Genomic hotspots were detected. High nucleotide variation was found in the intergenic region between the trnQ-accD genes with a dissimilarity of more than 1.0%. We identified a genomic hotspot in an intergenic region between 50,800 and 52,000 bp (Fig. 4).

Simple sequence repeats analysis

Repetitive region were summarized for both lens species in Fig. 5. SSRs, commonly referred to as microsatellite repeats, are shorter tandem repeats ranging from 1 to 6 bp in length and are found throughout the cp genome³¹. A total of 67 and 68 SSRs, respectively, were found in L. lamottei and L. culinaris with MISA (MIcroSAtellite Identification Tool). Of these, there were 49 and 47 mononucleotides, 6 dinucleotides in both, 2 and 4 trinucleotides, 6 and 5 tetranucleotides, and 4, and 5 pentanucleotides, respectively (Fig. 5a). One hexanucleotide repeat motif was detected in L. culinaris but not in L. lamottei. The most common type of SSR was the A/T mononucleotide repeat motif (Fig. 5b). There were differences between the two species in terms of trinucleotide, pentanucleotide, and hexanucleotide motifs. ATC/ATG, AAAAT/ATTTT, and AGATAT/ATATCT 2,1,1 repeat motifs were detected in L. culinaris, respectively. However, these motifs were not found in L. lamottei (Fig. 5b). Repetitions can be classified under four types: forward, reverse, palindromic, and complementary. Using REPuter, we found 50 forward repeats in both L. lamottei and L. culinaris cp genomes. There were 15 and 16 palindromic repeats detected in L. lamottei and L. culinaris, respectively (Fig. 5c). One reverse repeat was found in L. lamottei but none in L. culinaris. No complementary repeats were found in either species (Fig. 5c). In addition, 31 tandem repeats were found in L. lamottei and 36 in L. culinaris.

Using L. culinaris (NC_027152) as a reference sequence, synonymous and non synonymous variations were compared and evaluated to further investigate the divergence of nucleotides. Variations in protein-coding and non-coding regions in the cp genome were examined. Synonymous and nonsynonymous nucleotide changes in protein-coding regions were detected in the L. lamottei genome (Table 1). Out of 73 protein-coding genes, at least one variation was detected in 18 protein-coding genes. Of these, 8 protein-coding genes (ndhB, ndhF, ndhH, petA, rpoA, rpoC2, rps3, and ycf2) were found to have nonsynonymous changes, i.e., variation resulting in an amino acid replacement. In particular, more variations (L429F, M1290R, H1324D, F1436L, and I1751M) were identified in the ycf2 gene region compared to other genes. In addition, the only synonymous changes occurred in the other 10 genes in the cp genome. In addition, 32 noncoding regions were found (Supplementary Table S3). These genes consist of rRNA and tRNA genes. Only two single nucleotide variations were detected in the rrn23 gene.

Table 1 Synonymous and nonsynonymous variations in protein-coding genes of L. lamottei cp genome.

Full size table

RNA editing is a post-transcriptional modification that results in base conversion. The PREP-Cp (Predictive RNA Editor for plants) program was used to identify putative RNA editing sites in L. lamottei. We found 41 predicted RNA editing sites in 17 genes (Supplementary Table S4). The most frequent editing site predicted the change of serine to leucine (%29.3). From nucleotide base C to T, there were all replacement sites. The ndhB (7) gene has the highest number of editing sites. At the second codon position, the editing event was most prevalent. In addition, RNA editing sites of L. culinaris which were cultivated species of L. lamottei were also predicted (Supplementary Table S5), with 41 RNA editing sites in 17 genes and the ndhB (7) gene with the highest abundance of sites. The most frequent editing site in L. culinaris was also the change of serine to leucine (29.3%).

Codon preference and expression levels of protein-coding genes

Synonymous codons are multiple codons that encode one amino acid, and synonymous codons typically exhibit varied usage preferences as a result of selection pressure during plant evolution. All 20 amino acid and stop codons in the protein-coding genes of the L. lamottei’s cp genome were evaluated for codon content and relative synonymous codon usage (RSCU) values (Supplementary Fig. S1; Supplementary Table S6). A total of 19,509 codons was detected in the cp genome of L. lamottei. Looking at the distribution of these codons in amino acids, leucine was the most abundant, while cysteine was the least abundant. Among those amino acids prevalent ATT, which encodes isoleucine (Ile), was the most abundant codon in L. lamottei, with 910 TGC, which encodes cysteine (Cys), was the least frequently used codon with 48. Most amino acids show codon biases, while methionine (ATG) and tryptophan (TGG) were expressed by only one codon and have no codon bias. TTA (leucine) had the highest RSCU value, whereas CTC (leucine) had the lowest.

Prediction of gene expression based on codon usage requires defining a set of reference genes, which includes all genes encoding ribosomal proteins. Higher MELP (MILC-based Expression Level Predictor) values indicate higher gene expression levels and codon usage bias. The log₁₀(FPKM) (Fragments per Kilobase Million) value is expected to increase with the MELP value. Genes with MELP values higher than 1.0 (Fig. 6a) were psbN, psbI, psbE, psbK, psaI, and ndhC, which play roles in photosynthesis, and accD.

Using RNA-seq data from L. lamottei, we analyzed the expression of the 76 genes that code for proteins in the cp³². Reads were mapped to the L. lamottei cp genome we assembled in this study. Taking gene lengths into account, the numbers of reads corresponding to coding genes were computed and normalized. Gene expression values in FPKM were calculated for 76 protein-coding genes (Supplementary Table S7). We found that genes encoding ATP synthase showed the highest expression among all genes (Fig. 6b). This was followed by rubisco and photosystem genes.

Phylogenetic analysis

In this study, 14 species were included in the phylogenetic analysis to determine the relationships of L. lamottei with the other members of the Papilionoideae subfamily (Fig. 7). Arabidopsis thaliana was selected as an outgroup. Five species were chosen from Papilionoideae, which belongs to IRLC (L. culinaris, L. ervoides, C. ariethinum, Cicer echinospermum, Cicer bijigum), and two species from Papilionoideae (Cajanus cajan, Glycine max), two species from the Detarionoideae subfamily (Intsia bijuga, Crudia hamsiana), and two species from Cercidoideae subfamily (Cercis glabra, Tylosema esculentum). Phylogenetic analysis of the 13 cp genomes was performed to provide a better resolution with 1000 bootstraps. In the phylogenetic tree, L. lamottei formed a sister clade with L. culinaris, indicating that these two species are monophyletic and closely related.

Discussion

Lentils are richer in protein, carbohydrate, and dietary fiber content compared to other legumes. Therefore, they are consumed at higher rates worldwide due to their importance for human health^1,2,3. The cp is a highly conserved organelle, both structurally and genetically, and is involved in many biological functions, particularly in photosynthesis^16,17. Therefore, the cp genomes of various have been widely sequenced with the development of technologies^22,33. In comparison to the nuclear genome, the cp genome is inherited maternally and can conveniently be used for phylogenetic investigations due to its short size and conserved structure³⁴. We presented the first assembly and annotation of the cp genome of L. lamottei and compared it to the cultivated lentil species L. culinaris. By examining the genome rearrangements, we found highly similar regions between L. lamottei and L. culinaris. Two different haplotypes in L. lamottei have emerged. Two structural haplotypes of cp genomes that differ in the direction of single-copy regions have been identified in previously published studies³⁵. Flip-flop recombination is a plausible theory to explain the existence of structural heteroplasmy and has been reported previously^35,36. Gene loss, changes in the intergenic region, and expansion, or contraction of the IR region are all variables that affect genome size³⁷. The cp genome of L. lamottei (122.8 kb) of the Fabaceae family has a similar size to that of the close wild relative L. ervoides²⁰, the cultivated form L. culinaris, and other Fabaceae species such as O. arctobia³⁸. The configuration of cp genomes in the species of this family is similar, with sizes ranging from 107 to 218 kb³⁹.

The chloroplast genome of L. lamottei has an LSC, SSC region and a single copy IR region. In other plant cp genomes, a pair of IRs are separated by one LSC and one SSC²². Some species of the Fabaceae family, including L. culinaris contain a single copy of the IR region and belong to the IRLC^26,38. IRs stabilize the plastome structure and affect its size⁴⁰. However, expansion or contraction, gene loss, and genome rearrangement occurred in the IR regions of some species⁴¹. Therefore, the cp genomes of L. lamottei and L. culinaris are highly conserved and similar in genome organization.

The cp gene content was also highly conserved between L. lamottei and L. culinaris, except for some genes. Some genes annotated in L. lamottei could not be identified in the annotation of L. culinaris. This could be due to the limitations of DOGMA, the annotation tool used for the reference genome L. culinaris. When the cp genome was reannotated using the GeSeq tool, the genes on the cp genome matched with each other. The L. lamottei cp genome contains 107 genes, but some genes such as rps16 and infA were not identified. Many IRLC species plastomes contain regions with notable changes and rearrangements, such as the loss of introns from the rps12 and clpP genes and the lack of the rps16 gene⁴². The absence of these genes has been reported in other species^20,23. Introns serve a critical function in gene expression regulation⁴³. L. lamottei contains 17 genes that have an intron. The absence of one of the clpP1 introns in L. lamottei is consistent with the loss of the clpP1 intron in the genomes of highly conserved papilionoid IRLC members^20,44.

The sliding window and mVISTA analysis demonstrated high sequence similarity between L. lamottei and L. culinaris, implying a highly conserved evolutionary model. The intergenic region between the trnQ and accD genes has several nucleotide variations in the cp genome of L. lamottei. Furthermore, the highly divergent areas were mostly found in the noncoding rather than coding regions, as previously shown in the Fabaceae family^20,45. These hypervariable intergenic regions can serve as candidate regions for creating genetic markers⁴⁶. These hotspots as markers can be used in the discrimination of species in the Lens genus for phylogenetic and identification studies.

Repetitive regions in the genome are essential for genome rearrangements⁴⁷. SSRs can also serve as useful molecular markers for studying genetic diversity and evolutionary relationships in L. lamottei and similar species. Nucleotide and tandem repeats in cp genomes are associated with gene duplication, rearrangement, and expansion and can be particularly helpful markers for classifying populations or species^46,48. SSRs are another type of molecular marker that consists of 1–6 nucleotide repeat units and are commonly employed in population genetics³¹. For both species, mononucleotide SSR is the most prominent motif. Although the numbers of repeat motifs are similar, they differ from each other in some motifs. While L. culinaris has one hexanucleotide repeat motif, L. lamottei has none. Of these repeat motifs, mononucleotide repeats are the most common SSR motif discovered in the cp genome⁴⁹. Among these mononucleotides, the most common SSR motif found in the L. lamottei cp genome is the A/T motif. The common consensus is that plastid SSRs are mostly composed of A/T repeats, with G/C repeats considered as rare³⁷. Similar results were found in the wild chickpea species belonging to Fabaceae family and in the L. ervoides cp genome^20,50,51. SSRs have been identified in various organelle genomes and have great application potential for studies on the molecular evolution of plant population genetics and crop breeding⁵².

Understanding the mechanisms underlying molecular evolution critically depends on the estimation of synonymous and nonsynonymous nucleotide mutations⁵³. Eight protein-coding genes (ndhB, ndhF, ndhH, petA, rpoA, rpoC2, rps3, and ycf2) in the cp genome of L. lamottei had nonsynonymous changes. Nonsynonymous mutations are those in which base mutation results in an alteration of the amino acid in the encoded protein²¹. Ycf1 and matK genes have been determined as molecular markers due to the detection of nonsynonymous substitutions in these genes in other studies^53,54. However, in this study, while no substitution occurred in matK and ycf1 genes, several mutations were detected for the ycf2 gene (in the position of L429F, M1290R, H1324D, F1436L, I1751M). Numerous nonsynonymous changes were detected in the ycf2 gene for L. ervoides²⁰, a close relative of L. lamottei. Among noncoding genes, only two single nucleotide substitutions were found in the rrn23 gene. Therefore, markers that can be developed using the ycf2 and rrn23 genes may shed light on future studies.

RNA editing is a fundamental post-transcriptional process that commonly occurs in the cp protein-coding regions to restore the evolutionarily conserved amino acid sequence^24,55. The ndhB gene, which had seven RNA editing sites, contained the highest number of editing sites. The NDH protein complex serves an important function in photosynthesis. RNA editing may enhance photosynthetic efficiency and exhibit positive selection during evolution⁴¹. In 29.3% of the editing sites, serine aminoacids were converted to leucine. Many RNA editing sites can convert the encoded amino acid from polar to nonpolar in both species. In previous studies, unlike hydrophilic mutations, which can affect the secondary structure and function of proteins and improve their genetic information, hydrophobic mutations produce a more stable structure in proteins^24,55,66. Therefore, the presence of RNA editing sites in the cp genome may elucidate evolutionary mechanisms.

Codon usage significantly influences the evolution of the cp genome and can also be used to determine gene functions, gene expression, and levels of mRNA and proteins^21,56. The L. lamottei cp genome has 19,509 codons, with cysteine having the lowest codon usage and leucine having the highest. In Coleanthus subtilis, which also belongs to the legume family, leucine amino acid has the highest codon usage, while cysteine amino acid has the lowest codon usage⁵⁶. In L. lamottei, one codon each encodes the amino acids methionine and tryptophan. On the other hand, the remaining amino acids are encoded by two to six codons. Similar codon usage results are observed in wild lentil and chickpea species^20,50,51. Codon usage could be helpful for phylogenetic association research as codon use can influence the manner of gene mutation³¹.

For cp growth and photosynthesis, appropriate expression of cp genes is essential⁵⁷. Therefore, the expression levels of cp genes in L. lamottei were investigated in this study. MELP values of protein-coding genes were obtained using ribosomal genes as reference. In particular, the MELP values of genes involved in photosynthesis were high. In addition, FPKM values were calculated using RNA-seq data to examine the gene expression of cp genes. We found high gene expression levels, especially for the ATP-synthase, rubisco, and photosynthesis-related genes were classified according to their biological functions. Plants synthesize ATP during electron and proton transport. These processes include photosystem I, photosystem II, cytochrome b6/f complex, and ATP synthase⁵⁸. Also, the rbcL gene is another gene that is related to the photosystem⁵⁹. Therefore, the expression levels of these genes are expected to be higher. We found that the expression levels of these genes were not positively correlated with the MELP value, which could be due to the highly increased expression of ATP synthase.

To assess the phylogenetic relationships of L. lamottei, the complete cp genomes of 14 plant species were used to construct a phylogenetic tree using the Jukes-Cantor model. We used A. thaliana as an outgroup to confirm the phylogeny study. For the phylogenetic analysis, the representative species were selected from three main subfamilies of the Fabaceae family: the Papilionoideae, Detarionoideae, and Cercidoideae subfamilies.In this study, results show that L. lamottei was closely related to L. culinaris and L. ervoides. In a previously published study, L. ervoides provided the same branching as L. culinaris²⁰. Thus, these three Lens species are expected to be closer to each other than to other species. Therefore, it should present close branching with L. lamottei, and cicer species belonging to the IRLC family^24,50,51. Therefore, as a result of the phylogenetic tree, these two species are expected to be closer to each other than the other species. C. cajan and G. max are members of the Papilionoideae family^12,24. The other species are included in the Detarionoideae and Cercidoideae subfamilies^60,61,62,63. Therefore, these species are expected to be more phylogenetically distant from L. culinaris in Fig. 7. The strong bootstrap and posterior probability scores in this study suggested that the whole cp genomes could be beneficial in determining the phylogenetic locations and relations of the Papilionoideae tribe for L. lamottei.

Methods

All the methods were carried out in accordance with relevant local/national/international guidelines and regulations.

Organelle isolation and whole chloroplast genome sequencing of L. lamottei

In collaboration with Akdeniz University (Antalya), L. lamottei seeds were obtained and genotyped for use in this study. It was registered by the Department of Field Crops herbarium at Akdeniz University, and identified by Prof. Dr. Cengiz Toker. It was deposited with the voucher ID of L. lamottei-01. Rows of seeds were planted in February 2020 and collected from Antalya (Turkey)(36.89722° N, 30.71414° E). Plants were grown at an average temperature of 12 °C. Plants were harvested when their leaves reached the full green stage. For 72 h, leave samples were stored at 4 °C to prevent starch accumulation. 20 g of fresh leaves were harvested and extracted from the cp genome according to the cp DNA isolation method published by Shi et al.⁶⁴. DNA isolation purity was assessed using NanoDrop (Thermo Scientific) and agarose gel electrophoresis.

The cp genomic DNA of the plant was sequenced using the DNA nanoball technique at Beijing Genome Institute (BGI, Hong Kong, China). Briefly, the Agencourt AMPure XP-Medium kit was used. DNA fragments in the 200–400 bp range were chosen after random cpDNA fragmentation and purified the PCR products. With the help of a splint oligosequence, heat-denatured DNAs were converted into single-stranded circular DNA (ssCir DNA) for library preparation. These libraries were sequenced on the BGISEQ-500 platform using whole-genome sequencing. The ssCir DNA molecules were transformed into DNA nanoball (DNB). These DNBs were loaded into nanoarrays with a certain pattern using DNA nanochip technology. Paired-end 100-bp reads were obtained using combinatorial Probe-Anchor Synthesis (cPAS)⁶⁵.

The sequencing qualities were assessed using FastQC⁶⁶ utility. The de novo genome assembly tool GetOrganelle⁶⁷, which was specifically designed for plastid genomes, was used. This pipeline uses the assembly algorithm Spades⁶⁸ to construct contigs and scaffolds from raw sequencing reads. During the assembly process, the algorithm was fed with the L. culinaris (NC_027152.1) cp genome as a reference seed and the database of known plastid genomes ('embplant pt'). The constructed scaffolds were visually inspected using Bandage⁶⁹ software to consider possible structural rearrangements. To inspect the assembly quality and read support, raw sequencing reads were aligned back to scaffolds using bowtie2⁷⁰ short read alignment software.

Gene annotations

The cp genome of L. culinaris, which was retrieved from NCBI with accession NC_027152, was used as a reference. MAUVE alignment was performed to analyze gene homology between the two species (L. lamottei and L. culinaris). Gene order, rearrangements, and structural changes were also identified⁷¹. The GeSeq⁷² annotation tool was used to determine the locations of protein-coding, tRNA, and rRNA genes. The annotation of tRNA genes was confirmed using tRNAscan-SE v.2.0.7 and ARAGORN v1.2.38⁷². To visualize the cp genome sequence map of L. lamottei, Organelle Genomes DRAW (OGDRAW)⁷³ was used. The complete cp genome of L. lamottei was deposited in the European Nucleotide Archive (ENA) databases with the accession number ERS7635406.