Structural and phylogenetic implications of the complete mitochondrial genome of Ledra auditura

We sequenced and annotated the first complete mitochondrial genome (mitogenome) of Ledra auditura (Hemiptera: Cicadellidae: Ledrinae) and reconstructed phylogenetic relationships among 47 species (including 2 outgroup species) on the basis of 3 datasets using maximum likelihood (ML) and Bayesian inference (BI) analyses. The complete L. auditura mitogenome (length, 16,094 bp) comprises 37 genes [13 protein-coding genes (PCGs), 22 tRNAs, and 2 rRNAs], 1 control region, and 2 long non-coding regions. The first long non-coding region (length, 211 bp) is located between tRNA-I and tRNA-Q and the second region (length, 993 bp) between tRNA-S2 and ND1. All PCGs show ATN (Met/Ile) as their start codon and TAR as their stop codon. Except tRNA-S1 (AGN), which lacks the dihydrouridine arm, all tRNAs can fold into the typical cloverleaf secondary structure. The complete L. auditura mitogenome shows a base composition bias of 76.3% A + T (A = 29.9%, T = 46.4%, G = 13.3%, and C = 10.5%), negative AT skew of −0.22, and positive GC skew of 0.12. In ML and BI analyses, L. auditura was clustered with Evacanthus heimianus (Hemiptera: Cicadellidae: Evacanthinae) with strong branch support.

PCGs and codon usage. The 13 PCGs have a total length of 11,064 bp and encode 3,688 amino acids, accounting for 68.7% of the complete L. auditura mitogenome. All PCGs start with ATN (Met/Ile). Four genes (ND2, COX2, COX3, and ND6) start with ATT, 4 genes (ATP8, ND3, ND4, and ND1) with ATA, 1 gene (ND5) with ATC, and the remaining 4 genes (COX1, ATP6, ND4L, and Cytb) with ATG. Nine PCGs terminate with the typical stop codon TAA and 4 PCGs (COX1, ATP6, ND3, and Cytb) with TAG (Table 3).
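The codon tallies and length figures above can be cross-checked with a short script. This is a minimal sketch rather than part of the original analysis: the variable names are hypothetical, and the gene-to-codon assignments are simply copied from the text.

```python
from collections import Counter

# Start codons as reported in the text (dictionary name is illustrative).
START_CODONS = {
    "ND2": "ATT", "COX2": "ATT", "COX3": "ATT", "ND6": "ATT",
    "ATP8": "ATA", "ND3": "ATA", "ND4": "ATA", "ND1": "ATA",
    "ND5": "ATC",
    "COX1": "ATG", "ATP6": "ATG", "ND4L": "ATG", "Cytb": "ATG",
}

# The text lists 4 genes ending in TAG; the remaining 9 end in TAA.
TAG_STOP_GENES = {"COX1", "ATP6", "ND3", "Cytb"}
STOP_CODONS = {
    gene: ("TAG" if gene in TAG_STOP_GENES else "TAA") for gene in START_CODONS
}

start_counts = Counter(START_CODONS.values())
stop_counts = Counter(STOP_CODONS.values())

# Lengths reported in the text.
PCG_TOTAL_BP = 11_064
MITOGENOME_BP = 16_094

codons = PCG_TOTAL_BP // 3                                    # number of codons
pcg_percent = round(100 * PCG_TOTAL_BP / MITOGENOME_BP, 1)    # share of the mitogenome

print(dict(start_counts), dict(stop_counts), codons, pcg_percent)
```

The tallies sum to 13 genes for both start and stop codons, and the reported lengths reproduce the 3,688 codons and the 68.7% share given in the text.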
The base composition of the 13 PCGs is 74.6% A + T (A = 29.3%, T = 45.3%, G = 13.5%, and C = 11.8%), with a negative AT skew (−0.21) and weakly positive GC skew (0.07). The relative synonymous codon usage and codon usage of the 13 PCGs of the L. auditura mitogenome are presented in Fig. 2 (stop codons TAA and TAG excluded). Codons ending in A/T at the third position are more frequent than those ending in G/C, which results in the highest A + T content at the third codon position. The 4 most frequently used codons are Phe (TTT), Leu (TTA), Ile (ATT), and Met (ATA). This A + T-biased codon usage plays a key role in the A + T bias of the entire mitogenome. The codon usage pattern of L. auditura is highly consistent with that of previously reported Cicadellidae species 8-17.
tRNAs and rRNAs. The L. auditura mitogenome comprises the 22 typical tRNAs, with lengths ranging from 61 bp (Ala, Arg, and Ser1) to 71 bp (Lys) (Table 1). The total length of the 22 tRNAs is 1,408 bp, with 77.3% A + T content. All tRNAs can fold into the typical cloverleaf secondary structure except tRNA-S1, which lacks the dihydrouridine arm, as documented for other Hemiptera species 9,28,29. The secondary structures of the 22 tRNAs are presented in Fig. 3.
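Relative synonymous codon usage (RSCU), reported for the 13 PCGs in Fig. 2, is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The following is a minimal sketch of that calculation, not the MEGA6 procedure used in this study. The function names, the toy sequence, and the two-family codon table are hypothetical; a real analysis would supply all codon families of the invertebrate mitochondrial code.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, families):
    """Return RSCU per codon: observed count / mean count within its synonymous family."""
    values = {}
    for amino_acid, codons in families.items():
        total = sum(counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid not observed, so RSCU is undefined
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts.get(codon, 0) / expected
    return values


# Two codon families only, for illustration (hypothetical subset of the full table).
FAMILIES = {"Phe": ["TTT", "TTC"], "Leu2": ["TTA", "TTG"]}

# Toy coding sequence: 3 x TTT, 1 x TTC, 2 x TTA, 1 x TTG.
toy_counts = count_codons("TTT" * 3 + "TTC" + "TTA" * 2 + "TTG")
toy_rscu = rscu(toy_counts, FAMILIES)
print(toy_rscu)
```

An RSCU above 1 marks a codon used more often than expected under equal usage of synonymous codons, which is the pattern described above for A/T-ending codons.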
The 16S and 12S rRNA genes in the Cicadellidae mitogenome are highly conserved in terms of their length and secondary structures 22-25. In the L. auditura mitogenome, the 16S rRNA gene is located between tRNA-L2 and tRNA-V and is 1,160 bp long. The 12S rRNA gene, identified based on alignments with those of Evacanthus heimianus and Idioscopus clypealis 13, is located between tRNA-V and the control region and is 721 bp long. In the present study, the hypothetical secondary structures of the 2 rRNA genes were drawn using RNA Structure version 5.2 30 and predicted against known rRNA secondary structures 25,31,32. The secondary structure of 16S rRNA in the L. auditura mitogenome comprises 5 domains (domains I, II, IV, V, and VI; domain III is absent, as in other insects) and 43 helices (Fig. 4), and that of 12S rRNA comprises 3 domains (domains I, II, and III) and 24 helices (Fig. 5).
Non-coding regions. Although large intergenic regions have been identified in some species, the mitogenomes of most insects are compact 33. The long non-coding region is usually located between 12S rRNA and tRNA-I and is known as the control region. In the present study, 3 long non-coding regions (>50 bp) were detected in the L. auditura mitogenome. The first non-coding region (length, 211 bp) is located between tRNA-I and tRNA-Q. The second non-coding region (length, 993 bp) is a repeat region located between tRNA-S2 and ND1. It comprises 2 tandem repeats (Figs 1 and 6): the first repeat sequence is 105 bp long and is repeated 5 times, and the second is 117 bp long and is repeated 4 times (Fig. 6). Finally, the third non-coding region, commonly referred to as the control region, is located between 12S rRNA and tRNA-I; it is 721 bp long, which is comparable to that reported in other sequenced leafhoppers, ranging from 399 bp in N. cincticeps to 2,477 bp in Parocerus laurifoliae. The region shows 91.1% A + T content and is the most variable region in the whole mitogenome, with a relatively low pairwise identity. The control region is usually much longer in species with repetitive sequences than in those without repeats. However, no association was found among the repeat units, the regularity with which repetitive sequences occur, and their significance in the control region; further research using different methods is needed to resolve this pattern.
suggested that Deltocephalinae leafhoppers constitute 1 clade, which has been recovered as the sister group to the other members of Cicadellidae 22-27.
In the present study, the relationships among the 3 clades were consistent, with high support, in all the trees [clade 1: Membracidae + Megophthalminae; clade 2: Coelidiinae + Iassinae; clade 3: Cicadellinae + (Typhlocybinae + {Evacanthinae + Ledrinae})]; this result is consistent with previously reported phylogenies based on partial gene sequences and morphological features 34-37, suggesting that Cicadellidae is paraphyletic with respect to treehoppers, but that Cicadellidae subfamilies, including Deltocephalinae, Megophthalminae, Idiocerinae, Typhlocybinae, Cicadellinae, and Coelidiinae, are monophyletic, with strong branch support. Within Cicadellidae, the inferred relationship (Iassinae + Coelidiinae) + [Deltocephalinae + (Megophthalminae + Idiocerinae)] + [Cicadellinae + (Typhlocybinae + {Evacanthinae + Ledrinae})] was supported with high to moderate branch support in 4 phylogenetic trees (BI-PCGRNA, BI-PCG12RNA, ML-PCGRNA, and ML-PCG12RNA) (Figs 7, 8, and S3-S5), but Idiocerinae was recovered as the sister clade to Cicadellinae + (Typhlocybinae + (Evacanthinae + Ledrinae)) in BI-AA and to (Membracidae + Megophthalminae) + (Coelidiinae + Iassinae) in ML-AA, with low branch support (Figs S1 and S2). Further sampling from different taxonomic units and additional mitogenomic data will provide a better understanding of the phylogenetic and evolutionary relationships among leafhoppers.

Conclusions
In the present study, we sequenced the complete mitogenome of L. auditura. To the best of our knowledge, this is the first available mitogenome for a species within the subfamily Ledrinae. The mitogenome is 16,094 bp long, which lies within the range from 15,131 bp in Trocnadella arisana to 16,811 bp in Parocerus laurifoliae. Such variations in mitogenome length can be mainly attributed to differences in control region length 25. Consistent with previous observations in Cicadellidae, the L. auditura mitogenome is highly conserved in terms of gene content, gene size, gene order, base composition, codon usage of PCGs, and RNA secondary structures. Furthermore, there is a 993-bp-long repeat region between tRNA-S2 and ND1, which contains 2 tandem repeats (Figs 1 and 6); the first repeat sequence is 105 bp long and repeated 5 times, and the second is 117 bp long and repeated 4 times (Fig. 6). Interestingly, the repeat sequences are located within the control region, similar to those reported in previous studies 9,25,28. Moreover, we analyzed the mitogenomic features, base composition, codon usage, and phylogenetic relationships of L. auditura. In ML and BI analyses, 40 obvious clusters of leafhoppers were identified, consistent with previous phylogenetic findings based on mitogenome data. While Ledrinae was recovered as a paraphyletic group, it emerged as a sister clade to Tartessinae and Iassinae or Aphrodinae, although with low branch support, and its relationship with other clades remained poorly resolved, as revealed by the ML bootstrap analysis of the concatenated anchored hybrid enrichment nucleotide sequence dataset in a previous study 35.
There were also large variations in the results obtained using different datasets; according to transcriptome analyses, Ledrinae was recovered as a monophyletic group with maximum bootstrap support using ML analyses, with relatively low support among Cicadellidae, and the placements of subfamilies relative to one another were not consistent 38. Recently, partial mitogenome sequence data have been generated for leafhoppers, particularly for small groups with few species. Thus, adding taxa to our small mitogenome dataset may help improve the resolution of the still poorly understood relationships among leafhopper lineages. Therefore, the complete mitogenome reported in the present study may provide a basis for further genomic studies of Ledrinae and may be useful for future phylogenetic analyses of Cicadellidae.
Mitogenome sequencing and assembly. The L. auditura mitogenome was sequenced using next-generation sequencing (Illumina HiSeq 2500; 2 GB raw data; Berry Genomics, Beijing, China), and 2 sequence fragments were reconfirmed via polymerase chain reaction (PCR) amplification using primers #2 and #3 (Table 3). We used 40 μL genomic DNA for next-generation sequencing and diluted the remaining genomic DNA with ddH2O to a final volume of 100 μL for PCR amplification. Primers were designed from the sequencing results using Primer Premier 6.0 (Premier Biosoft, Palo Alto, CA, USA). PCR was performed using PCR MasterMix (Tiangen Biotech Co., Ltd., Beijing, China) according to the manufacturer's specification manual. The PCR cycling conditions included pre-denaturation at 94 °C for 3 min followed by 30 cycles of denaturation at 94 °C for 30 s, annealing at a suitable temperature for 30 s, and elongation at 70 °C for 1 min, with an additional elongation at 70 °C for 10 min at the end of all cycles. The annealing temperatures were adjusted according to the melting temperatures of the different primers. Table 3 lists the primers used in this study.
Clean next-generation sequencing results were assembled using Geneious R9 27 based on the COX1 fragment (sequenced using primer #1; Table 3) of mitochondrial
Sequence analysis and gene annotation. The assembled mitogenome was initially annotated using the MITOS web server with invertebrate genetic codes 39 and then analyzed using Geneious R9 27 and NCBI BLAST (https://blast.ncbi.nlm.nih.gov). The locations and secondary structures of the 22 tRNAs were identified and predicted using tRNAscan-SE version 1.21 40 and ARWEN version 1.2 41. The 2 rRNA genes were identified based on the locations of adjacent tRNA genes and then compared with the rRNA genes of other Cicadellidae species. Next, the secondary structures of these rRNAs were predicted based on previously reported models 16,17,25. DNASIS version 2.5 (Hitachi Engineering, Tokyo, Japan) and RNA Structure version 5.2 30 were used to predict helical elements present in variable regions. Strand asymmetry was calculated using the following formulas: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) 42. Furthermore, base composition and codon usage patterns of PCGs were analyzed using MEGA6 43. Repeated sequences in the L. auditura mitogenome were identified using the Tandem Repeats Finder tool (http://tandem.bu.edu/trf/trf.html) 44. The complete L. auditura mitogenome is deposited in GenBank under the accession number MK387845.
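The strand asymmetry formulas given above translate directly into code. The following is a minimal sketch with hypothetical function names; it is not the software used in this study. As a check, it recomputes the skews from the base compositions reported in the text for the whole mitogenome and for the 13 PCGs.

```python
def at_skew(a, t):
    """AT skew = (A - T) / (A + T); inputs may be counts or percentages."""
    return (a - t) / (a + t)


def gc_skew(g, c):
    """GC skew = (G - C) / (G + C); inputs may be counts or percentages."""
    return (g - c) / (g + c)


def skews_from_sequence(seq):
    """Compute (AT skew, GC skew) from a nucleotide string.

    Assumes the sequence contains at least one A or T and one G or C;
    ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    return at_skew(a, t), gc_skew(g, c)


# Base compositions (%) as reported in the text.
whole_mitogenome = (round(at_skew(29.9, 46.4), 2), round(gc_skew(13.3, 10.5), 2))
pcgs = (round(at_skew(29.3, 45.3), 2), round(gc_skew(13.5, 11.8), 2))

print(whole_mitogenome)  # (-0.22, 0.12)
print(pcgs)              # (-0.21, 0.07)
```

Both pairs match the skew values reported for the complete mitogenome and for the PCGs.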
The alignments of individual genes were concatenated to generate 3 datasets from the 13 PCGs and 2 rRNAs: (1) amino acid sequences of the 13 PCGs (3,366 amino acids); (2) nucleotide sequences of the 13 PCGs and 2 rRNAs (11,918 bp); and (3) the first and second codon positions of the 13 PCGs plus the 2 rRNAs (8,552 bp). ML phylogenetic trees were constructed using IQ-TREE v1.6.3 48, with the best model for each partition selected under the corrected Akaike Information Criterion using PartitionFinder2 (Table S2) 49, and evaluated using the ultrafast bootstrap approximation approach with 10,000 replicates. Furthermore, BI analysis was conducted using MrBayes 3.2.6 50; following the partition schemes suggested by PartitionFinder, all model parameters were set as unlinked across partitions. Two simultaneous runs with 4 independent Markov chains were performed for 50 million generations, sampling every 100 generations. After the average standard deviation of split frequencies fell below 0.01, the first 25% of samples were discarded as burn-in, and the remaining trees were used to generate a consensus tree and calculate the posterior probabilities.
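The third dataset keeps only the first and second codon positions of the PCGs. The following is a minimal sketch of that filtering step, assuming a codon-aligned, in-frame nucleotide row; the function name is hypothetical, and this is not the tool used in the study. The arithmetic below it checks that the three reported dataset sizes are mutually consistent, under the assumption (inferred, not stated in the text) that the PCG nucleotide alignment is exactly 3 times the length of the amino acid alignment.

```python
def first_second_positions(cds_row):
    """Keep codon positions 1 and 2 of an in-frame row, dropping every third site."""
    return "".join(cds_row[i:i + 2] for i in range(0, len(cds_row), 3))


# Dataset sizes as reported in the text.
AA_SITES = 3_366       # dataset 1: amino acids of the 13 PCGs
PCG_RNA_BP = 11_918    # dataset 2: 13 PCGs + 2 rRNAs
PCG12_RNA_BP = 8_552   # dataset 3: codon positions 1 and 2 + 2 rRNAs

pcg_bp = 3 * AA_SITES            # assumed PCG nucleotide alignment length
rrna_bp = PCG_RNA_BP - pcg_bp    # implied rRNA alignment length
pcg12_bp = 2 * AA_SITES          # PCG sites left after dropping third positions

print(first_second_positions("ATGAAATTT"))   # ATAATT
print(pcg_bp, rrna_bp, pcg12_bp + rrna_bp)
```

Under that assumption, the implied rRNA alignment length plus the reduced PCG alignment reproduces the 8,552 bp reported for the third dataset.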