The complete mitochondrial genome of Leucoptera coffeella (Lepidoptera: Lyonetiidae) and phylogenetic relationships within the Yponomeutoidea superfamily

The coffee leaf miner (Leucoptera coffeella) is one of the major pests of coffee crops in the Neotropical region and causes major economic losses. Few molecular data are available to identify this pest, and advances in the knowledge of the L. coffeella genome will help improve pest identification and clarify the taxonomy of this microlepidopteran. L. coffeella DNA was extracted and sequenced using PacBio HiFi technology. Here we report the complete circular mitochondrial genome of L. coffeella (16,407 bp), assembled using Aladin software. We found a total of 37 genes, including 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs) and 2 ribosomal RNA genes (rRNAs), together with an A + T-rich region and a D-loop. The mitochondrial gene organization of L. coffeella is highly conserved and shares the trnM-trnI-trnQ gene rearrangement typical of lepidopteran mitogenomes. We concatenated the 13 PCGs to construct a phylogenetic tree and inferred the relationships between L. coffeella and other lepidopteran species. L. coffeella is found in the Lyonetiidae clade together with L. malifoliella and Lyonetia clerkella, both leaf miners. Interestingly, this clade is assigned to the superfamily Yponomeutoidea, which groups with Gracillariidae (Gracillarioidea), and both superfamilies include species with leaf-mining feeding habits.

Leucoptera coffeella (Guèrin-Meneville & Perrotet 1842) (Lepidoptera: Lyonetiidae) is a monophagous pest of coffee crops in Neotropical America, where it causes important economic losses 1. In Brazil, the world's largest coffee producer, the negative impact corresponds to more than 50% of the production costs. However, in cases of severe infestations, the damage can compromise up to 70% of the costs 2,3.
The coffee leaf miner is a microlepidopteran that consumes the palisade parenchyma during the larval stages. The mines reduce the photosynthetic area and induce premature leaf senescence, leading to leaf abscission and, consequently, decreasing coffee grain yields 4,5. Despite the extensive occurrence of this pest and its agronomic importance in coffee-growing areas, only a few DNA markers are currently available to monitor the presence of L. coffeella in coffee plantations and to characterize its phylogeographic and phylogenetic origins 6.
Mitochondrial genomes (mitogenomes) are extensively used in differentiation studies to infer phylogenetic relationships 7 and to develop species-specific molecular markers. However, data are more abundant for families of large-bodied species than for small-bodied ones, and species-rich tropical ecosystems are usually poorly

Mitochondrial genome organization and base composition
We assembled the complete mitochondrial genome of L. coffeella, which consists of a 16,407 bp circular DNA molecule. The genome contains 13 PCGs, two rRNA genes, 22 tRNA genes, and an A + T-rich region. Four of the 13 PCGs (NADH1, NADH4L, NADH4, and NADH5), 8 tRNAs (trnY, trnC, trnQ, trnV, trnL1, trnP, trnH and trnF) and the two rRNAs (rrnS and rrnL) are encoded by the minority strand, while the remaining 23 genes are encoded by the majority strand (Fig. 1 and Table 1). This strand-specific gene organization of the L. coffeella mitogenome is highly conserved with respect to both the evolutionarily close L. malifoliella and the more distant Bombycoidea 10,22.
The L. coffeella mitogenome (16,407 bp) is longer than that of L. malifoliella (15,646 bp) and falls within the range of the lepidopteran mitogenomes chosen for our study (15,027-17,050 bp) (Table 2). The nucleotide composition of the L. coffeella mitogenome is presented in Table 3, with 41.4%, 40.5%, 7.5% and 10.6% for A, T, G and C, respectively; it is A + T-rich (81.9%), as often described for lepidopteran mitogenomes. The bias is highest in the control region (95.6% A + T), followed by the ribosomal RNAs (84.7%), transfer RNAs (80.9%) and PCGs (79.2%).
Lepidopterans have the second most biased nucleotide composition among insect orders, after Hymenoptera 9. The AT skew is close to zero (0.011) and the GC skew is moderate (−0.17), which indicates a bias towards the use of As and Cs.
motif was found at the same position in another Yponomeutidae mitogenome 23 but also in Danaidae 24,25 and Coleoptera genomes 25,26. It is also found in our mitogenome at the beginning of the NADH4 and NADH4L genes as well as in the coding regions of 4 other genes (rrnL, rrnS, NADH2 and trnN).
Transfer RNA genes (tRNA) and Ribosomal RNA genes
The L. coffeella mitogenome contains 22 tRNAs, one for each of the 20 amino acids, with an additional isotype for each of the two sixfold-degenerate amino acids, leucine and serine. Seven of these tRNAs (trnF, trnH, trnP, trnV, trnQ, trnC, trnY) are encoded by the minority strand (Table 1), while the other fifteen tRNA genes are encoded by the majority strand. The total length of the 22 tRNAs is 1467 bp, with individual lengths ranging from 61 to 71 bp, and their A + T content is 80.9%. The AT skew is slightly positive (0.016) and the GC skew is positive (0.131) (Table 3).
The cloverleaf structure is found for all 22 tRNAs of L. coffeella except trnS1 and trnS2 (Fig. 3). trnS1 lacks the dihydrouridine (DHU) arm, which is replaced by an unstable loop, while trnS2 has an additional loop in the anticodon stem-loop. These two tRNAs are found on the minority strand. The lack of the DHU stem-loop in trnS1 is nearly ubiquitous in insect mitochondrial genomes 9.
The 16S rRNA gene (rrnL) is located between trnL1 and trnV and is 1345 bp long, whereas the 12S rRNA gene (rrnS), located between trnV and the control region, is 761 bp long. The A + T content of the two rRNA genes is 84.7% (Table 3).

The A + T-rich region (control region)
The mitochondrial genome of L. coffeella contains a 1363 bp A + T-rich region, or control region, located between the rrnS and trnM genes (Table 1). It is one of the largest A + T-rich regions found in the order Lepidoptera, and with the highest A + T content, 95.6% (Table 3). This control region includes initiation sites for transcription and replication.
The A + T-rich region of L. coffeella is composed of five tandem repeat elements and a motif containing the origin of replication, 'ATAGT' (Fig. 4). We found five repeats composed of a 57 bp (in blue) and a 159 bp (in yellow) sequence, and four (TA)n microsatellite regions (in red) (Fig. 4). Five poly(T)7 and poly(A)5 stretches were also found, but these repetitions are shorter than those of other lepidopteran control regions 22,23.
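As an illustration only, simple repeats of the kind described above, such as (TA)n microsatellites, poly-T stretches and the 'ATAGT' motif, can be located with regular expressions. The sketch below is a minimal Python example run on a made-up toy sequence; the function names and thresholds are assumptions chosen for illustration, and it does not reproduce the MEME analysis used in this study.

```python
import re


def find_unit_runs(seq, unit="TA", min_units=4):
    """Return (start, end, n_units) for each run of `unit` repeated
    at least `min_units` times (0-based, end-exclusive coordinates)."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_units))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


def find_homopolymers(seq, base="T", min_len=7):
    """Return (start, end) for each stretch of `base` of length >= min_len."""
    pattern = re.compile(r"%s{%d,}" % (re.escape(base), min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


# Toy sequence (not real L. coffeella data): a (TA)5 run, a poly(T)8
# stretch and one copy of the ATAGT motif.
toy = "GGC" + "TA" * 5 + "CC" + "T" * 8 + "GATAGTA"

ta_runs = find_unit_runs(toy, unit="TA", min_units=4)
poly_t = find_homopolymers(toy, base="T", min_len=7)
motif_pos = toy.find("ATAGT")

print(ta_runs)    # [(3, 13, 5)]
print(poly_t)     # [(15, 23)]
print(motif_pos)  # 24
```

Exact-match searches of this kind only recover perfect repeats; longer, degenerate elements such as the 57 bp and 159 bp repeats require a motif-discovery tool.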
The A + T-rich region of the L. coffeella mitochondrial genome differs from that of other Lyonetiidae, such as the L. malifoliella mitogenome, which has a shorter control region of 733 bp and the ATAGA motif (Fig. S1) 10. We did not find the poly-T stretch downstream of the rrnS gene that is widely conserved in lepidopteran mitogenomes. The L. coffeella control region also lacks the poly-A stretch immediately upstream of the trnM gene, a feature commonly observed in other lepidopteran mitogenomes, including that of L. malifoliella. Both L. malifoliella and L. coffeella have a stem-loop structure in the control region (Fig. S2). Such a feature seems to be intrinsic to the control region of Leucoptera species 10. The presence of a stable stem-loop structure in the A + T-rich region of Leucoptera appears to be as important as the presence of a poly-T stretch in other insects, which, unless associated with a recognition of the light stretch origin of replication, has not yet been fully explored 27.
The lack of molecular data for other Leucoptera species limits this interpretation and reinforces the need to expand the sampling sizes and deepen our understanding of the replication and transcription origin of the mt genome of Leucoptera species.

Gene rearrangements
We compared the L. coffeella gene order with the ancestral insect and Lepidoptera gene arrangements 9 in order to identify possible reorganizations such as duplication, deletion, or inversion-translocation. We analyzed gene rearrangements using the qMGR program 28.
L. coffeella has exactly the same gene order as the two other Lyonetiidae mitogenomes available (Fig. 5). It is also identical to the gene order model proposed for the ancestral Lepidoptera mitogenome, which exhibits the common trnM, trnI, trnQ (MIQ) rearrangement with trnM on the minority strand. MIQ is found in most ditrysians between the A + T-rich region and NADH2 9. L. coffeella also shares the A-R-N-S1-E-F gene arrangement of the insect ancestor between NADH3 and NADH5. The gene arrangement of the L. coffeella mitogenome is identical to the ancestral Lepidoptera mitogenome organization proposed by Cameron 9 and Moreno-Carmona et al. 29.
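The comparison of circular gene orders can be sketched as follows. This is a minimal Python illustration and not the qMGR algorithm: the gene orders are truncated toy lists, 'CR' is a hypothetical label for the control region, and strand information is ignored.

```python
def rotate_to(order, anchor):
    """Rotate a circular gene order so that it starts at `anchor`."""
    i = order.index(anchor)
    return order[i:] + order[:i]


def same_circular_order(a, b, anchor):
    """True if two circular gene orders are identical up to rotation."""
    return rotate_to(a, anchor) == rotate_to(b, anchor)


def cluster_after(order, anchor="CR", size=3):
    """Return the `size` genes that follow `anchor` on the circle."""
    rotated = rotate_to(order, anchor)
    return rotated[1:1 + size]


# Truncated toy orders (assumption: only the genes around the control
# region are listed; a real comparison would use all 37 genes).
miq_order = ["CR", "trnM", "trnI", "trnQ", "NADH2"]
miq_rotated = ["trnI", "trnQ", "NADH2", "CR", "trnM"]  # same circle, other start
iqm_order = ["CR", "trnI", "trnQ", "trnM", "NADH2"]

print(same_circular_order(miq_order, miq_rotated, "CR"))  # True
print(same_circular_order(miq_order, iqm_order, "CR"))    # False
print(cluster_after(miq_order))                           # ['trnM', 'trnI', 'trnQ']
```

Rotating both orders to a shared anchor gene makes the comparison independent of where the linearized annotation happens to start.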
All families of Yponomeutoidea form a monophyletic clade, which groups with the Gracillariidae from Gracillarioidea. The Gracillarioidea, with Gracillariidae, Psychidae and Tineidae, form a polyphyletic group, as previously described in a phylogenetic study based on 794 lepidopteran mitogenomes 30. Within Yponomeutoidea, Praydidae and Attevidae form a sister group (PrAt) with a high bootstrap support (BS) of 96. Plutellidae and Scythropiidae form a sister group (PlSc) with a BS of 80. We obtained better BS values for these four relationships than another recently published phylogenetic study 31. The alignment of our sequences after concatenation using MAFFT might explain the improvement in nodal values (see Fig. S3). However, we did not improve the nodal support between PrAt and PlSc (54), and in this case more data are needed. Gracillariidae and Yponomeutoidea are present in the same clade with a bootstrap value of 88. Leaf-mining feeding behavior was characterized in Gracillariidae as the most phylogenetically conserved trait 32.
A more recent study including 130 gracillariid species identified mining as an ancestral larval behavior of Gracillariidae that has evolved several times 33. Spending all or part of the larval period as a leaf miner might confer ecological advantages such as protection from natural enemies (predators, parasitoids, pathogens) and from variation in the environment (UV radiation, hygrothermy) 34. Another hypothesis is that leaf miners feed selectively on the most nutritious layers of foliage tissue, avoiding plant defenses. L. coffeella is found in the same clade as L. malifoliella and L. clerkella, two other leaf miners of the family Lyonetiidae, with bootstrap values of 100 and 97 for the nucleotide and amino acid phylogenies, respectively. The family Lyonetiidae includes leaf miners considered agronomic pests; however, insufficient molecular data limit phylogenetic inferences about this family.
Only the three Lyonetiidae species included in our phylogeny have been sequenced. All three are leaf miners that have diverged in their host-plant preferences. L. coffeella feeds exclusively on coffee plants 35. L. malifoliella is polyphagous 36 and L. clerkella feeds on Malus sp. and Prunus sp. 37. In Yponomeutoidea, a large proportion of representatives are oligophagous 38. Further molecular studies of lepidopteran leaf miners are needed to better understand how this feeding innovation arose in Lyonetiidae and Gracillariidae. The host-plant range in these two families should also be studied further to confirm the preference for shrubs and trees 34,38,39.

Conclusions
Mitochondrial genome sequences are increasingly used as informative molecular markers for systematics, phylogenetics, population genetics and evolutionary studies because of their conserved gene content, small size, fast rate of evolution, minimal or absent recombination, maternal inheritance and abundant marker types. Here we report the first complete mitochondrial genome of L. coffeella. It consists of a circular double-stranded DNA of 16,407 bp containing the conserved trnM-trnI-trnQ gene rearrangement found in Lepidoptera ancestors 9.
We found 22 tRNAs showing the conserved cloverleaf structure, except for trnS1 and trnS2, which code for serine tRNAs. We also observed a codon usage bias, with high variability detected in the third codon position. Compared with the most closely related mitogenome, that of L. malifoliella, the main difference lies in the size of the control region: 1363 bp in L. coffeella and 733 bp in L. malifoliella.
Our phylogenetic study based on maximum-likelihood estimation confirms the placement of L. coffeella and L. malifoliella in the Lyonetiidae clade and in the superfamily Yponomeutoidea, with insects from Plutellidae and Praydidae. Our phylogeny suggests that the leaf-mining habit was acquired several times during the evolution of Lepidoptera, as leaf miners are found, for instance, in both the Lyonetiidae and Gracillariidae clades. The acquisition of this innovation was followed by host-plant specialization, with L. coffeella on coffee trees, L. clerkella on Malus and Prunus, and L. malifoliella being polyphagous.
The acquisition of more genomic data in this part of the tree is needed to confirm this hypothesis. We have acquired molecular data that can now be used to learn more about the history of the introduction of L. coffeella and its invasion routes in the Neotropical Americas. Understanding how this insect has adapted to the conditions of coffee crops in these regions might help the development of Integrated Pest Management programs and the use of, for instance, parasitoids.

Ethics statement
Our study did not involve any endangered or protected species. No specific permits were required for the insect or plant specimen collection in this study. The collection and use of plant and insect materials in the study comply with relevant institutional, national, and international guidelines and legislation.

Genome assembly and annotation
We used two mitochondrial gene fragments from L. coffeella available in GenBank, COI (MF987402) and Cytb (MF987470), and the orthologous genes from L. malifoliella, COI (GU929715) and Cytb (NC_018547), to search for mitochondrial sequences in the genome assembly (PRJNA832598). From the blastn search, we retrieved one contig with two matches for each of the COI and Cytb queries. To assemble the whole mitochondrial genome of L. coffeella, we submitted the raw reads used for the genome assembly to Aladin v.3.0 software (https://github.com/GDKO/aladin), using the L. malifoliella mitochondrial genome as seed 10.

Bioinformatics analysis
The nucleotide base composition was determined using the 'wordcount' program of the EMBOSS toolkit v. 6.6.0.0 42, and the AT/GC skewness was calculated using the formulas AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) 43. The tRNA genes, their secondary structures, the gene overlaps, and the intergenic spacers were predicted using MITOS2.
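The base composition and skew calculations can be sketched in a few lines of Python. The example below assumes the conventional definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the function names are illustrative, and the toy sequence is not real data but is built to match the whole-genome percentages reported in Table 3 (41.4% A, 40.5% T, 7.5% G, 10.6% C).

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, G and C among the unambiguous bases of `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def skews(seq):
    """Return (AT skew, GC skew) using (A-T)/(A+T) and (G-C)/(G+C)."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


# Toy sequence of 1000 bases with the reported composition (assumption:
# only the proportions matter here, not the base order).
toy = "A" * 414 + "T" * 405 + "G" * 75 + "C" * 106

comp = base_composition(toy)
at_skew, gc_skew = skews(toy)

print(round(comp["A"] + comp["T"], 1))  # 81.9
print(round(at_skew, 3))                # 0.011
print(round(gc_skew, 2))                # -0.17
```

With these proportions the sketch returns the A + T content, AT skew and GC skew reported for the whole mitogenome, which is a quick consistency check on the definitions assumed here.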
The tandem repeats in the control region were located using the MEME suite (https://meme-suite.org/meme/) 44. MEME version 5.5.3 was used to search for repeat motifs of 6 to 300 nt within the control region of L. coffeella or L. malifoliella. The comparison of the L. coffeella mitochondrial gene order with the Lepidoptera ancestor's rearrangement (trnI-trnQ-trnM) 9 was performed using the qMGR program 28. The Relative Synonymous Codon Usage (RSCU) of PCGs was determined using MEGA11 45. The circular map of the mitogenome was drawn with the web tool OGDRAW v.1.3.1 46 from MPI-MP CHLOROBOX.
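For readers unfamiliar with RSCU, the calculation can be sketched as follows. This is a minimal Python illustration and not the MEGA11 implementation: it assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), defines RSCU as the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally, and treats all serine (and all leucine) codons as a single family. The input is a toy sequence.

```python
from collections import Counter, defaultdict
from itertools import product

# Invertebrate mitochondrial code (NCBI table 5), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Stop codons and codons with ambiguous bases are ignored; amino acids
    absent from the sequence are omitted from the result.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(
        c for c in codons if CODON_TABLE.get(c, "*") != "*"
    )

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy example: lysine encoded twice by AAA and once by AAG.
toy_rscu = rscu("AAAAAAAAG")
print(round(toy_rscu["AAA"], 3))  # 1.333
print(round(toy_rscu["AAG"], 3))  # 0.667
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which is how the A- and T-ending codon preference of an A + T-rich mitogenome becomes visible.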

Phylogenetic analysis
We aligned 18 lepidopteran mitochondrial genomes of Yponomeutoidea, Tineoidea, Gracillarioidea, Gelechioidea, Geometroidea and Ephydroidea species (Table 2) to reconstruct the phylogenetic trees, with Drosophila melanogaster as the outgroup. The nucleotide sequences of the selected species were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov, June 2023). In taxon sampling, we emphasized lepidopteran leaf-miner species with assembled and annotated complete mitochondrial genomes publicly available in GenBank.
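Building a concatenated matrix of the 13 PCGs can be sketched as below. This minimal Python example assumes that each gene has already been aligned separately, which may differ from the order of operations used in this study (the Results mention alignment after concatenation with MAFFT). The function name, taxon labels and sequences are toy values chosen for illustration.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts {taxon: aligned_sequence}, one per gene.
    Taxa missing from a gene are padded with `missing` characters.
    Returns (supermatrix, partitions), where partitions holds 1-based
    inclusive (start, end) column ranges, one per gene.
    """
    taxa = sorted(set().union(*[aln.keys() for aln in gene_alignments]))
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, aln in enumerate(gene_alignments):
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {index} is not aligned (unequal lengths)")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(p) for taxon, p in parts.items()}
    return supermatrix, partitions


# Toy alignments for two genes; the second gene lacks one taxon.
gene1 = {"L_coffeella": "ATGA", "L_malifoliella": "ATGC"}
gene2 = {"L_coffeella": "TTA"}

matrix, partitions = concatenate_alignments([gene1, gene2])
print(matrix["L_coffeella"])     # ATGATTA
print(matrix["L_malifoliella"])  # ATGC---
print(partitions)                # [(1, 4), (5, 7)]
```

Keeping the partition boundaries makes it possible to assign a separate substitution model to each gene in a maximum-likelihood analysis.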
Figure S3 shows the bioinformatic pipeline used to obtain the Leucoptera coffeella mitochondrial genome. The amino acid and nucleotide alignments used for the phylogeny of Fig. 6 can be found at https://doi.org/10.57745/NA3OZ2.

Figure 1. The circular mitochondrial genome of Leucoptera coffeella. The J-strand (+) is visualized on the outer circle, and the N-strand (−) on the inner circle.

Figure 4. Control region of the Leucoptera coffeella mitochondrial genome. In red: 15 nt repeat; in blue: 57 nt repeat; in yellow: 159 nt repeat; in black: origin of replication. Poly-T and poly-A stretches are in bold and underlined. rrnS and trnM are the genes flanking the control region.

Table 2. The mitochondrial genomes of Lepidoptera selected to reconstruct the phylogenetic trees.

Table 3. Base composition of genes and control region of the mitochondrial genome of Leucoptera coffeella.

Table 4. Codon frequency and relative synonymous codon usage (RSCU) values in the 13 protein-coding genes of the Leucoptera coffeella mitochondrial genome. A total of 3724 codons were analyzed, excluding the initiation and termination codons.