The complete mitochondrial genomes of two sibling species of camellia weevils (Coleoptera: Curculionidae) and patterns of Curculionini speciation

Complete mitochondrial genomes contain large and diverse datasets for species delineation. To better understand the divergence of the two morphologically indistinguishable weevil species in Curculionini, we first sequenced and compared their complete mitochondrial genomes. The complete mitochondrial genomes of Curculio chinensis and Curculio sp. were 19,713 bp with an A + T content of 76.61% and 19,216 bp with an A + T content of 76.85%, respectively. All 37 of the typical mitochondrial genes were determined in both species. The 13 protein sequences of the two species shared high homology (about 90%) except for ATP8 (73.08%). The differences in secondary structure of ATP8 were the number of possible proteins and nucleic acid binding sites. There were 22 and 15 mismatched base-pairs in the tRNA secondary structures from C. chinensis and Curculio sp., respectively. Maximum Likelihood and Bayesian analyses indicated that Curculio sp. is a novel species closely related to C. chinensis. The divergence time estimation suggests that Cryptorhynchinae and Curculionini lines diverged in the Cenozoic Period, while C. chinensis and Curculio sp. diverged at 6.7079 (95% CI 5–13) Mya. This study demonstrates the utility of using complete mitochondrial gene sets for phylogenetic analysis and enhances our understanding of the genetic basis for the evolution of the Curculionini.


Results and Discussion
Features of the sequenced mitochondrial genomes. Coleoptera mitochondrial genomes have relatively simple structures and lack spacers and introns. With this tight gene arrangement, genetic rearrangement, inversions, and translocations occur infrequently during the mutation process 6,37 . The complete mitochondrial genome sequences of C. chinensis and the Curculio sp. were 19,713 bp and 19,216 long, respectively (GenBank accessions MG728094 and MG728095) ( Table S1). The genomes of both species contained all 37 typical animal mitochondrial genes, including 13 protein-coding genes, 22 tRNA genes, and two rRNA genes (Fig. S1). The A + T-rich region generated reliable sequence data in both species which was high (>75%) compared to that found in other mitochondrial genomes sequenced using NGS 15 . No gene rearrangement was observed in either species compared with the putative ancestral and sibling superfamily arrangement (Fig. S1). This is consistent with the lack of rearrangement found in all sequenced Curculionini species 8,11 .
The mitochondrial genome of C. chinensis has intergenic spacers with lengths ranging from 1 to 103 bp in 24 different locations. Seven pairs of genes overlap with each other, with overlap lengths ranging from 1 to 17 bp. Eight pairs of genes are directly adjacent to one another, which include the pairs rrnL-trnV and trnV-rrnS ( Table 1). The mitochondrial genome of Curculio sp. has intergenic spacer lengths ranging from 1 to 148 bp in 26 different locations. Eight pairs of genes overlap with each other, with overlap lengths ranging from 1 to 7 bp. Nine pairs of genes were directly adjacent one another, which included the pairs rrnL-trnV and trnV-rrnS (Table 2). In both species, the longest intergenic spacer was located between trnS2 and NAD1. The longest overlapping regions were located between trnF and NAD5 (Tables 1 and 2). The intergenic and overlapping regions of these two species were similar to the mitochondrial genomes of most other insects. Similarly, no gene rearrangement was found in either species compared to genomes from Coleoptera species that have experienced frequent gene rearrangement 23,28 . Base composition. AT-skew, GC-skew, A + T content, and AT and GC asymmetries, are often used to assess the nucleotide-compositional differences of mitochondrial genomes 38 . The mitochondrial genomes of C. chinensis and the Curculio sp. were biased in nucleotide composition ((A + T)% > (G + C)%) across the whole genome, although the numbers of PCGs (n = 13) and rRNA genes (n = 22) were consistent with the genomes from other insects 5,24 . The A + T content of the whole genome was 76.61% for C. chinensis (40.16% A, 36.45% T, 9.89% G and 13.50% C) (Tables 3), and 77.08% for Curculio sp. (40.84% A, 36.24% T, 9.14% G and 13.79% C) ( Table 4). The A + T content of all PCGs in C. chinensis ranged from 69.58% (COX1) to 83.38% (NAD6) ( Table 3), and in Curculio sp. ranged from 69.84% (COX1) to 84.33% (NAD6) ( Table 4). Most of the AT-skews of the two Curculio species were negative except COX2 and ATP8. Most of the GC-skews of both species were negative as well, indicating that the PCGs contained a higher percentage of T and C than A and G, as reported for most other insects 22,38 . Protein-coding genes, codon usage, and protein conformance rates. In the mitochondrial genomes of both C. chinensis and Curculio sp., nine of the 13 protein-coding genes were located on the majority strand (N-strand), while the other four protein-coding genes were located on the minority strand (J-strand) ( Tables 1 and 2). In the C. chinensis mitochondrial genome, the total length of protein-coding genes was 11,148 bp, accounting for 56.55% of the whole genome. The total length of the protein-coding genes of Curculio sp. was 11,121 bp, accounting for 57.87% of the whole genome (Tables 1 and 2). PCGs contained Leu residues in the highest abundance, followed by Ile, Phe and Met. The four amino acids had the highest use frequency (Fig. 1), similar to other insect mitochondrial genomes 36,39 . In the mitochondrial genomes of both species, all PCGs start with the conventional initiation codons (ATN) as seen in other insects 40 . In C. chinensis PCGs, only one gene (NAD1) used ATA, seven used ATT, and five used www.nature.com/scientificreports www.nature.com/scientificreports/ ATG. In contrast, in the PCGs of Curculio sp., only one gene (NAD1) used ATA, while five and seven PCGs started with ATT and ATG, respectively (Tables 1 and 2). In both the C. chinensis and Curculio sp. mitochondrial genome, nine PCGS used TAA as the stop codon, and the NAD1 and NAD3 genes used TAG, while the COX3 and NAD4 genes used an incomplete stop codon T (Tables 1 and 2). The usage of incomplete stop codons in PCGs is common in invertebrate mitochondrial genomes 5,41 .
We calculated the homologous consistency of the 13 protein sequences of the two species as one group. Except for the ATP8 sequences that exhibited a value of 73.08%, the rest of the sequences had values of about 90% ( Fig. 2A). Ratios of K a /K s values for each PCG in the two species showed that ATP8 had the largest ratio (0.3194) among all proteins ( Fig. 2A). Two genes, ATP6 and ATP8 are the core subunits of Complex V, which consists of F 0 and F 1 , and the two genes are directly involved in ATP synthesis [42][43][44] .
We further characterized the ATP8 proteins from both genomes and predicted their structures (Fig. 2B), since this was the most variable gene of the 13 PCGs. The ATP8 protein sequences in both species contained 52 amino acids. The C. chinensis ATP8 protein structure contained three possible protein binding sites and five possible nucleic acid binding sites (Fig. 2B). Across sites 9-29, there was a region that might produce a spiral structure (Fig. 2B). In the entire chain of the C. chinensis ATP8 protein, there were three disordered areas, two exposed regions, two buried regions, and one transmembrane helix region (Fig. 2B). The Curculio sp. ATP8 protein structure has four possible protein binding sites and four possible nucleic acid binding sites (Fig. 2B). Like the C. chinensis ATP8 protein structure, there was also a region that might produce a spiral structure across the sites comprising 9-29 (Fig. 2B). In the whole chain of the Curculio sp. ATP8 protein, there were two disordered www.nature.com/scientificreports www.nature.com/scientificreports/ areas, two exposed regions, three buried regions, and one transmembrane helix region (Fig. 2B). The SOPMA analysis of the ATP8 secondary structure revealed clear structural differentiation between the two species. The alpha helix represented 38.46% and 59.62% of the structures of C. chinensis and Curculio sp., respectively, while the extended strand regions were 13.08% and 11.54%, the beta turn regions were 9.62% and 1.92%, and the random coil accounted for 28.85% and 26.92%, respectively. Adaptive evolution of ATP synthase can occur among species living in different ecological niches 39,44,45 . Thus, we speculated that modifications in the sequence and conformation of ATP8 structures could affect the assembly and function of Complex V, and consequently modulate its ability to produce ATP in Curculio weevils. phylogenetic relationships and comparison of divergence times. In previous studies, mitochondrial sequence length variation was low, resulting in minimal alignment ambiguity that was not investigated further 23 . We calculated saturation plots for COX1, complete mtDNA genomes, and the PCGs before we used these to build a phylogenetic tree. The plots showed uncorrected pairwise divergences in transitions (s) and transversions (v) against divergences calculated with the GTR model, and none of the three genes had reached saturation (Fig. 3).
Transfer RNA and ribosomal RNA genes. Consistent with the results of the phylogenetic relationships and comparison of estimated divergence times, all tRNA anticodons of the sequenced mitochondrial genomes of C. chinensis and Curculio sp. were identical to other Curculionini species (Tables 1 and 2). Of the 22 total tRNA genes, 14 are located on the N-strand and eight are located on the J-strand. Individual tRNAs of C. chinensis (MG728094) and Curculio sp. (MG728095) ranged from 63 bp (trnH) to 71 bp (trnK) in length. Secondary structure models of the tRNA genes from the two mitochondrial genomes were predicted using the Mitos WebServer (http://mitos.bioinf.uni-leipzig.de/). All tRNA genes from C. chinensis and Curculio sp. mitochondrial genomes fold into a canonical clover-leaf structure (Fig. 5).   www.nature.com/scientificreports www.nature.com/scientificreports/ The dihydrouridine (DHU) arm of all the tRNAs was a large loop, instead of the conserved stem-and-loop structure, which is consistent with typical metazoan mitochondrial genomes 40 . While the amino acid acceptor stem was conserved across 7 bp in all tRNA genes, the anticodon loops exhibited differences. trnH and trnR were conserved across 8 bp, while the rest of the 20 tRNAs were conserved across 7 bp (Fig. 5). The DHU arms in the tRNAs from C. chinensis and Curculio sp. were 0 to 4 bp long. The AC arms were 4 to 5 bp long, and the TΨC arms varied in length from 3 to 5 bp. The variable loops ranged from 4 to 8 bp. We also compared the variation in stem regions of tRNA genes among five other Curculionini species (Fig. 5). Among the 22 tRNA genes, trnI was the most conserved, and lacked nucleotide variation in stem regions between C. chinensis and Curculio sp. The rest of the tRNAs exhibited between 1-10 site mutations. The trnF had the highest number of site mutations on stem regions (10 sites), followed by trnV (7 sites) (Fig. 5). Among the 22 tRNA genes, there was no nucleotide variation in the stem regions between Curculio davidi (KY053741) and Curculio davidi (NC034931) (Fig. 5). Curculio elephas (KY0872691) had the most nucleotide variation in stem regions compared with C. chinensis (Fig. 5). Base pairs other than canonical A-Us and C-Gs were occasionally used in C. chinensis and Curculio sp. tRNAs, which is based on predicted tRNA secondary structures (Fig. 5). We found 22 and 15 mismatched base pairs in the tRNAs from C. chinensis and Curculio sp., respectively. Among the 22 mismatched base pairs in C. chinensis, three were U-U pairs, located in the amino acid acceptor stems and anticodon arm stems. The others were A-C pairs located in the amino acid acceptor stem. Curculio sp. had four U-U pairs t located in the TΨC stems (Fig. 5).

Methods
Sample collection and dna extraction. Camellia weevil samples were collected from Tengchong County in Yunnan Province, China. The field collected samples were initially placed in 100% ethyl alcohol and stored at −80 °C prior to DNA extraction. Total genomic DNA was extracted separately from the whole body of individual samples using a DNeasy tissue kit (Qiagen, Hilden, Germany). Voucher DNA was deposited in the entomological collections of the Research Institute of Subtropical Forestry, Chinese Academy of Forestry.
Mitochondrial genome sequencing and assembly. The mitochondrial genome sequences were obtained by next-generation sequencing. Prior to library construction, the DNA was quantified by Qubit 3.0 (Invitrogen, Life technologies, Carlsbad, CA, USA) 5 . The library (Lib. Type: PE400; Lib. Insert Size: 400 bp) with two indexes was constructed using the Illumina TruSeq@ DNA PCR-Free HT Kit and sequenced by Shanghai Personal Biotechnology CO., Ltd (Shanghai, China) using Illumina Miseq with the strategy of 251 bp paired-ends by paired sequencing mode. Raw sequence reads were generated on the Illumina Miseq sequencing platform in FASTQ format, and read quality was evaluated using the FastQC software package (http://www.bioinformatics. babraham.ac.uk/projects/fastqc) 52 . Reads containing ambiguous nucleotides and reads with an average quality value lower than Q30 were excluded from further analyses. The high-quality second-generation sequencing data were assembled de novo to generate contig and scaffold sequences using the A5-miseq v.20150522 53 and SPAdes v.3.9.0 54 assembly pipelines. According to the sequencing depth extraction sequence of the splicing sequence, the high sequencing depth was blastn with the NT library in NCBI (BLAST v2.2.31+) and compared with the www.nature.com/scientificreports www.nature.com/scientificreports/ mitochondrial sequence of each splicing result. The mitochondrial splicing results were combined using the Mummer v.3.1 software to integrate splicing results. Linear analysis was used to determine the positions between contigs and fill gaps between contigs using the Pilon v.1.18 software package 55 . The results were then corrected to obtain the final mitochondrial genome sequences.
Mitochondrial genome annotation. The complete set of linear contigs was uploaded to the MITOS web page server (http://mitos2.bioinf.uni-leipzig.de/) for functional annotation 56 . The optional setting for 'Genetic Code' was selected as 05-verterbrate, and the remaining settings were set according to the default parameters. The circular mitogenomes of both samples were visualized using the Organellar Genome Draw web server tool   Comparative analysis of mitochondrial genomes. The mitochondrial genomes of five Curculionini species, including the two newly sequenced Curculio genomes, were compared. Gene arrangement, base composition, and PCG codon usage features were analyzed. We also analyzed base compositional differences based on the secondary structures of tRNA genes among the mitochondrial genomes of the five species. The AT-and GC-skew were calculated using the following formulas: AT-skew = (A% − T%)/(A% + T%) and GC-skew = (G% − C%)/ (G% + C%) 38 . Intergenic spacers and overlapping regions between genes were manually counted. The rate of protein conformance among the 13 PCGs was analyzed using DNAMAN. K a and K s substitution values were calculated using the DNaSP V5.10 58 . Multiple protein structure prediction web servers (https://www.predictprotein.org/ and http://www.prabi.fr/) 59 were used to predict the secondary structure of the ATP8 protein: The amino acid composition and coding sequence composition of the protein, combined with regional, screw, spiral transmembrane regions, and other irregular regions were analyzed. The protein loci of potential exposure areas and hidden areas were also predicted 60,61 . phylogenetic analyses. Substitution saturation of different genes was tested in DAMBE5 using the GTR substitution model as a reference 62,63 . The best model of evolution for all genes and protein sequences was the GTR + I + G model, as determined by the jModelTest software package 64 . For the phylogenetic analysis, 55 www.nature.com/scientificreports www.nature.com/scientificreports/ published mitochondrial genomes were downloaded from NCBI as references and used along with the two Curculio sequenced mitochondrial genomes (Table S1).
Phylogenetic analyses incorporated both the Bayesian inference method (BI) using the program MRBAYES version 3.152 65 . Maximum Likelihood (ML) methods used the CIPRES server RAxML online (www.phylo.org). We used a Bayesian framework based on the PCG data to estimate the divergence times of clades using the BEAST v.1.6.1 software package. The substitution model (GTR + I + G) was also used for these analyses, as determined to