Complete mitochondrial genome sequencing of Oxycarenus laetus (Hemiptera: Lygaeidae) from two geographically distinct regions of India

Oxycarenus laetus is a seed-sap sucking pest affecting a variety of crops, including cotton plants. Rising incidence and pesticide resistance by O. laetus have been reported from India and neighbouring countries. In this study, O. laetus samples were collected from Bhatinda and Coimbatore (India). Pure mtDNA was isolated and sequenced using Illumina MiSeq. Both the samples were found to be identical species (99.9%), and the complete genome was circular (15,672 bp), consisting of 13 PCGs, 2 rRNA, 23 tRNA genes, and a 962 bp control region. The mitogenome is 74.1% AT-rich, 0.11 AT, and − 0.19 GC skewed. All the genes had ATN as the start codon except cox1 (TTG), and an additional trnT was predicted. Nearly all tRNAs folded into the clover-leaf structure, except trnS1 and trnV. The intergenic space between trnH and nad4, considered as a synapomorphy of Lygaeoidea, was displaced. Two 5 bp motifs AATGA and ACCTA, two tandem repeats, and a few microsatellite sequences, were also found. The phylogenetic tree was constructed using 36 mitogenomes from 7 super-families of Hemiptera by employing rigorous bootstrapping and ML. Ours is the first study to sequence the complete mitogenome of O. laetus or any Oxycarenus species. The findings from this study would further help in the evolutionary studies of Lygaeidae.


Materials and methods
Sample collection. Samples of O. laetus were collected from the post harvested cotton fields located in Punjab Agriculture University, BTI (Punjab), and Tamilnadu Agriculture University, CBE (Tamilnadu) in India, with the help of scientists (species confirmation) and staff. Cotton bolls carrying O. laetus were plucked and transferred to transparent plastic boxes with their mouths covered with muslin clothe and transported to the laboratory. They were reared until an ample population was raised to extract mtDNA. Insects were reared on cotton bolls and maintained at 16:8 h of light:dark photoperiod, 27 °C, and 55% humidity. All the experiments were performed on the F1 generation population. All the chemicals and reagents used in this study were of finite grade.
Extraction and validation of mitochondrial DNA. A mixed population of adult O. laetus weighing 200 mg from both locations was macerated in 10 ml of homogenization buffer (0.4 M sucrose, 50 mM Tris HCl pH 7.5, 50 mM EDTA, 1% BSA, 10 mM β-mercaptoethanol) 19 . The mixture was centrifuged at 3500 rpm for 5 min at 4 °C. The supernatant was collected and centrifuged twice to remove cell debris and nuclear DNA. The supernatant was then centrifuged at 14,000 rpm for 15 min to pellet the mitochondria and resuspended in DNase-RNase free water. Traces of nuclear DNA were removed by incubating the mixture in 2.5 U of DNase I and DNase buffer (Qiagen) for 30 min. The action of DNase was terminated by adding 30 μl of 0.5 M EDTA. The mixture was again centrifuged at 12,000 rpm for 5 min, and the pellet was resuspended in 500 µl lysis buffer (100 mM Tris pH 8, 10 mM EDTA, 2% SDS, 15 μl of 20 mg/ml proteinase K), and the mixture was incubated at 55 °C for 1 h.
Then, 140 μl of 5 M NaCl and 250 μl of 2% CTAB were added to the mixture and incubated at 62 °C for 25 min. RNA contamination was removed by adding 10 μl of 20 mg/mL RNase A and incubation at 37 °C for one hour. An equal volume of chloroform:isoamyl alcohol (24:1) was added to the mixture and centrifuged at 14,000 rpm for 15 min. The top aqueous phase was collected, and the step was repeated. The final aqueous phase was collected, and two volumes of isopropanol was added and incubated at 20 °C for one hour. The mtDNA was precipitated by centrifugation at 14,000 rpm for 15 min and washed with 70% ethanol. The air-dried pellet was then dissolved in 20-30 μl of TE buffer and quantitated using NanoDrop.
The purity of the isolated mitochondrial DNA was assessed by amplifying COI (mitochondrial genome-specific) and Histone 4 gene (nuclear genome-specific). Table 1 shows the primers used in this study. The following conditions were used to amplify the COI gene: initial denaturation at 94 °C for 5 min; 30 cycles of final denaturation at 94 °C for 1 min, annealing at 52 °C for 1 min, elongation at 72 °C for 2 min 30 s; and final elongation at 72 °C for 5 min. Conditions used for amplifying Histone 4 gene are as follows: initial denaturation at 94 °C for 3 min; followed by 30 cycles of final denaturation at 94 °C for 1 min, annealing at 52 °C for 1 min, elongation at 72 °C for 1 min; and final elongation for 5 min. PCR products were visualized on 1% agarose gel.  COI-forward  GGT CAA CAA ATC ATA AAG ATA TTG G   COI-reverse  TAA ACT TCA GGG TGA CCA AAA AAT CA   Histone4-forward  ATT TCC ACT CTG GTG GAT AAGC   Histone4-reverse  ACA CTT GGG CCT TTT AAC TTTG   tRNH-NADH5-gap-forward  TTT CCC AAT CAA CAA AAT AAAAA   tRNH-NADH5-gap-reverse  TTG GGT CAT TCT TTT TCA GG Sequence analysis, annotation, and visualization. MITOS-1 and MITOS-2 24 were used to annotate the assembled mitogenome. The PCGs were confirmed by using ORF Finder, and BlastN confirmed rRNA and tRNAs. The control region was validated by aligning it with the transcriptome of data generated in our previous study (SRX5329347) using standalone BLAST. MEGA 7.0 25 was used to determine nucleotide composition, codon usage, and relative synonymous codon usage (RSCU). Skewness in the mitogenome was calculated using the formula: AT-skew = (A − T)/(A + T); GC-skew = (G − C)/(G + C) 26 . Tandem repeats in the mitogenome were identified using Tandem Repeats Finder 27 . Secondary structures of tRNA were predicted using Mfold 28 . The CGView Server 29 was used to generate the circular map of the mitogenome.
Cross verification of the rearrangement of intergenic region. The rearrangement of the conserved intergenic region of Lygaeidae was verified by designing primers to target and amplify the specific spacer. The amplified product was sequenced using the Sanger method and aligned with the assembled mitogenome. Table 1 lists the primer pair used for verification. PCR conditions used were as follows: initial denaturation at 94 °C for 3 min; 30 cycles of denaturation at 94 °C for 1 min, annealing at 57 °C for 1 min, elongation at 72 °C for 1 min; and final elongation at 72 °C for 5 min (Fig. S1).
Phylogenetic analysis. The phylogenetic study was done using 35 other species from hemipteran superfamilies. These included Aphidoidea, Cimicoidea, Reduvioidea, Pentatomoidea, Coreoidea, Pyrrhocoroidea, and Lygaeoidea (Table 4). Similar methodology as was followed by [30][31][32] was employed with few changes as was needed for this study. The significant difference here was that we used whole mitogenomes instead of PCGs, intending to study the relationship among the selected data as a function of complete mitogenome. Both PhyML online server 33 and raxMLGUI 34 were used to construct the phylogenetic tree. The mitogenomes were aligned using MUSCLE, conserved regions were identified, and the unreliable regions in the data were removed using the Gblocks program. In the raxmlGUI, ML + bootstrapping analysis was done involving 1000 replications invoking the GTRGAMMAI substitution model. The resulting tree was visualized using TreeDyn. We also used the Bayesian Inference method to generate a phylogenetic tree in MrBayes v3.2.7 in order to validate the tree 35 . Markov chain Monte Carlo (MCMC) simulations were run involving 1,000,000 generations with sampling every 1000 generations and the process was continued until the average standard deviation of split frequencies < 0.01. The first 25% sample trees were discarded in the burn-in process and the remaining trees were used to form the consensus tree and posterior probability (PP) values.

Results
Genome organization and structure of O. laetus. The MiSeq data of BTI and CBE samples of O.
laetus was submitted to the NCBI SRA database under the accession numbers SRR11024516 and SRR11024517, respectively. The study was done to understand whether there exists a different species of the genus Oxycarenus similar to O. laetus, which might be the reason for higher incidence and resistance. The study results are such that the mitogenomes of both the samples were highly identical (99.9%), confirming both to be identical species. Therefore, to maintain clarity of reading, findings of only one sample will be discussed hereafter, unless mentioned elsewhere in the manuscript. The mitogenome of O. laetus had all the general characteristics of other similar insects. The 15,672 bp long mitogenome is double-stranded and circular, consisting of typical 13 PCGs, 2 rRNA, and 23 tRNA genes. A unique feature witnessed was 38 genes instead of the typical 37 genes and a control region spanning 962 bp. The majority J-strand had 24, and the minority N-strand harbours 14 of these 38 genes. Gene overlaps were observed at 16 locations, and the longest overlap involved 25 bp between trnL1 and rRNAL 36 . Two overlaps found between atp8-atp6 and nad4-nad4l had a 7 bp ATG ATA A pattern. Overall, ten intergenic/non-coding regions are present, ranging in length from 1 to 200 bp (Table 2, Fig. 1).
The gene arrangement was similar to the ancestral arrangement with a significant change in the intergenic space between nad4 and tRNA-H, which is reduced to 11 bp. Another striking feature was the duplication of the gene tRNA-threonine, which could have occurred due to anticodon mutation or replication slipping 37 . The displacement of tRNA-H, closer to nad4 and shortening of the 41 bp non-coding region to 11 bp was validated by aligning the concerned region of the assembled data with the Sanger sequencing, and 100% identity was obtained, confirming the displacement as shown in Fig. S2. Lygaeidae have a unique synapomorphy by having an unusual spacer between tRNA-His and nad4 in their mitogenomes 38 . Though this was witnessed in O. laetus mitogenome, perhaps on the other end, a feature that is generally expected to be in Lygaeidae.
Nucleotide composition and codon usage. Nucleotide composition is highly skewed towards A/T, a common feature in mitogenomes 30,31 , and the AT content was 74.22%, and the rest being GC content (25.6%) ( Table 3). The standard range for Hemiptera ranges between 69. 53 Table 4). The AT skew observed was 0.11, indicating adenines more in number than thymines, and the GC skew of − 0.19 indicates cytosines outnumbering guanines by a whisker. A similar trend was observed with PCGs as well. The PCGs in the J strand have AT skew of − 0.025 and a GC skew of − 0.130. At the same time, PCGs in the N strand have AT skew of − 0.35 and a GC skew of 0.244. They are indicating that J-strand PCGs are much less AT and GC skewed than N-strand genes. The rRNA genes have a GC content of 25.7% and 22.6% for rrnS and rrnL, respectively. G. pallidipennis is found to closely match the statistics obtained for the O. laetus when the comparison is made among the Lygaeoidae (Table 3). After analyzing 100 + genomes, Wei et al., reported GC skewness as an indication of oriC inversion associated with replication orientation and AT skew dictating gene direction and codon usage, both related to the mitogenome strand asymmetry 39 .
A slight variation in the average codons used among our samples was seen and found 14 bp less in BTI. However, a more extensive analysis revealed that the codon AUU (isoleucine) = 311.0 and GCG (Alanine) = 4.0 were having the highest and lowest frequencies in both the samples (Table 7). Predominant codons like Ile, Met, Phe, and Leu were entirely composed of A and T nucleotides. The nucleotide composition of each codon observed for the entire genome shows that the possibility of A occurring at the first position is 42.4, the second position  (Table 4), and a graph was manually generated. It is the number of times a codon is repeated in relation to the uniform synonymous codon usage, i.e., all codons of amino acid have the same chances of appearing in the sequence. It can be observed from Fig. 2 that all the codons of all amino acids have been strictly used, but the distribution is not equal. This distribution can act as markers for identification at the species level and intra-ordinal level.
Protein coding genes. All the identified 13 PCGs were confirmed for length and sequence by comparing them with the Illumina HiSeq RNA-Seq data generated by our lab for another study (SRR8526511 and SRR8526512). The results obtained from the MITOS2 server agreed with the cross-comparison and verification with the transcriptomic data (results not shown). As shown in Table 2 above, twelve genes have ATN as a start codon, only cox1 has TTG, which can be seen as a start codon for cox1 in many other species 40,41 . Four PCGs (nad2, cox2, nad3, nad1) have ATA as their start codon, four other genes (atp8, nad5, nad4l, and nad6) have ATT as their start codon. The genes atp6, cox3, nad4, cob start with codon ATG. Almost all the genes end with the typical non-sense codon, but few genes have incomplete/different termination codons like cox2 ending with AAA, cox3 ending with an incomplete codon: T-. The majority of the genes are found on the J-strand with much lower AT and GC Skew. Each PCG was aligned and compared with 35 other species to confirm consensus sequences and significant motifs.
Further, from Table 5, we can infer that there was not a vast difference between AT and CG richness between rRNA, tRNA, and most PCGs. However, the cytochrome family of PCGs cox1, cox2, cox3, and cob had a higher representation of GC than the other components of the mitogenome. These genes had an average of 30% GC content which was much lower, 17-26% for other PCGs. The pattern followed by the cytochrome family of genes was also evident for the control region. It has been known that the cytochrome family of genes, primarily, cox1 and cob, are used as markers for eukaryotic species identification and evolutionary studies. The higher incidence of GC in these PCGs may well be attributed to being less prone to environmental mutations, codon usage, and therefore, species identity is maintained 42 . The spacer between trnS and nad1 is common in most insects. The conserved sequence block (CSB) region has two consensus regions, indicating that tandem duplication and random loss (TDRL) process could be a common intermediate step 37,43 . Two consensus motifs are 5 bp long AATGA and ACCTA sequences.
The control region is the major non-coding spacer, considered the box of origin sites for replication and transcription 44 . The largest crucial non-coding region is also helpful in species identification. Due to its high divergence, the evolution of tandem repeats, and variability, it can be used as markers, making it a potential species-specific marker 11 . Studies about the control region in Echinoides have shown that due to the region's unique features, it has high compatibility across the entire class, outperforming conventional mitochondrial markers 45 . Of all the arthropods sequenced, a few prevalent motifs were noted: (1) Tandem/microsatellite repeats (Tables 6, 7), (2) Poly-thymine sequence, (3) a AT-rich region, (4) A stem-loop structure 15 . We identified all these motifs in our mitogenomes. Some additional striking features were also seen in other insects. Two types of tandem repeats were identified with a consensus size of 22 copy number 2 and another with size 20 and copy number 3 spread across the genome except in the control region. www.nature.com/scientificreports/ tRNAs and rRNAs. As mentioned earlier, we identified 23 tRNAs in the O. laetus mitogenome rather than the regular 22, lengths ranging from 60 to 72 bp. The structures predicted were clover-leaf or isoforms with a few exceptions: trnS1(AGN), in which the DHU arm (dihydrouridine) was reduced to just a loop of 6 bp, which is the case for most metazoans (Fig. S3) 46 . The trnV had formed two loops in both the DHU and TѱC arms. A duplicate of trnT_0 was identified as trnT_1, which has a similar structure to trnS1 and resembled its isoform due to the reduction of the TѱC arm to a loop, though the sequence was highly similar to trnT_0. These features proved that this novel gene order could be explained by the tandem duplication/random loss (TDRL) model 37 .
The wobble and mismatch pairs common in insects' tRNA were corrected by comparing data from two software MITOS2 24 and tRNAScan-SE 47 .
Keeping with the trend of mitogenome patterns, two rRNA genes were identified and further aligned with 35 other species to confirm the lengths and motifs. The large subunit gene of rRNA was 1244 bp in length and positioned between trnV and trnL-1. The GC content was about 22.6%, and the gene is AT-rich. The smaller subunit gene of rRNA was 767 bp in length with a slightly higher GC content of 25.8%. Phylogenetic analysis. We used only one O. laetus (CBE) mitogenome to maintain equality with the 35 other species' mitogenomes and avoid any skewness/influence of our mitogenome on the final tree. Seven dif-  www.nature.com/scientificreports/ ferent super-families chosen in this study belonged to Hemiptera (Aphidoidea, Cimicoidea, Reduvioidea, Pentatomoidea, Coreoidea, Pyrrhocoroidea, and Lygaeoidea). Hemiptera is one of the largest and diverse groups of insects which has more than 110 genera. GAMMA + P-Invar model parameter estimates were based on the 13,883 alignment patterns with the help of the GTR substitution matrix, and the overall tree length is 16.016.
In the phylogenetic tree shown in Fig. 3 and Fig. S4 (Bayesian inference method), it can be seen that all the species are well placed under their respective super-families. All the 11 Lygaeoidea super-family species have found themselves accommodated in a polyphyletic clade, including species from the Lygaeidae to which O. laetus belongs. Studies have confirmed that Lygaeidae is polyphyletic 48 , meaning evolution from several common ancestors. Incidentally, scientists have differed over the years on the lineage of Lygaeidae. They have concluded the need for further studies to reach a consensus on this, as the super-family Lygaeoidea consists of several subfamilies within itself.
O. laetus remains a separate member among the selected Lygaeoidea species closely related to Y. parallelus (Berytidae), resulting in Lygaeus Sp., and N. plebeius in one clade and M. inconspicuous and K. resedae resedae in the other. The rigorous phylogenetic analysis supported by higher branch confirmation values only confirms the clustering and relationships already documented and established. In our earlier study 49 , we had identified the species and studied its phylogenetic relationship with other species using COI sequences. Also, the clustering of O. laetus in the Lygaeoidea only confirms this study's sequencing, assembly, and annotation accuracy. However,   www.nature.com/scientificreports/ there always will be opportunities for betterment. Since there are no similar prior mitogenomic studies in Lygaeoidea/Lygaeidae, we could not compare our results. Nevertheless, results obtained from this study will make a significant contribution to future studies involving the Lygaeidae family under the Lygaeoidea super-family.

Discussion
The species O. laetus, although a secondary pest of the cotton plant, is a severe threat to the overall quality and quantity of cotton produce. The insect is widely distributed in various parts of Asia irrespective of climate and soil conditions. The emphasis of this study was to sequence and annotate the complete mitogenome to understand if the two species from these locations are different from each other that might be rendering higher pesticide resistance in BTI (N. India). As both the BTI and CBE samples essentially have identical mitogenomes, the species are the same. Hence, the probability of mutation in the detox system could be assumed as the reason for its survival under such relatively high pesticide and harsh conditions, which the gene expression studies can establish. The mtDNA is responsible for various essential physiological functions, primarily oxidative phosphorylation 50 . Genes such as NAD and Cytochrome oxidase play vital roles in the ETC pathway 51 . Although the gut microbes are primarily responsible for digestion of harmful pesticides, it works in association with the gut cells for the ultimate goal of respiration 52 . Therefore, the mitochondrial genes are indirectly involved in the digestion of pesticides and insecticides. Targeting these genes and other pathways, such as P450, GST, ABC, and others, can help eradicate pests without reliance on harmful chemicals 53,54 . There has been massive research on mtDNA of insects due to its smaller size, comparative and phylogenetic analysis being the majority 55 . The order Hemiptera also has such comparative and phylogenetic studies 56 . This provides data on intra-ordinal and inter-ordinal relations amongst different insect species. The Lygaeidae family under this order comprises many important pests and species, yet www.nature.com/scientificreports/ only four have completely sequenced and analysed mitogenomes 57 . This type of analysis provides a perspective for other research and even helps understand evolution to some extent. In this study, the mitochondrial genome of O. laetus has been sequenced, thoroughly analysed, annotated, and compared with 35 other species from Hemiptera. The gene order is similar to the ancestral arrangement except for a few rearrangements. The genome organization was similar to almost all Hemipterans. Unlike the typical 37 genes, 38 genes were determined; this included 13 PCGs, 23 tRNAs, and 2 rRNAs. The gene order was compared with other species from other genera also. There were 16 overlaps observed between the genes which were more or less similar to other species. The highest was 25 bp obtained between trnL1 and rrnL. Another conserved overlap of 7 bp was observed between atp8 and atp6, and nad4 and nad4l were also observed in Melon thrips 58 . The control region is the largest non-coding region obtained.
Almost all the PCGs started with the start codon ATN; only the cox1 gene has TTG as its start codon, again observed in other species as well 41 . The genes end with complete stop codons except for cox2 and cox3. All the features obtained concerning the PCGs are ancestral and do not vary much from what is already known. The codon usage, AT, and GC skews were all similar to the range obtained for Hemipterans. Some striking differences have been found in both samples, but deflect from the common are: one of them is the duplication of tRNA-Threonine, which increased the number of tRNAs to 23. The duplicates are labeled as trnT_0 and trnT_1. The duplicate has a similar structure to trnS1 and resembled its isoform due to the reduction of the TѱC arm to a loop 37 . Similar observations were seen in Reduvius tenebrosus 59 . This novel rearrangement has been explained by the tandem duplication/random loss (TDRL) model. All the other tRNA structures predicted were clover-leaf, with the above two exceptions 60 . This feature has not been observed for previously sequenced mitogenomes but is seen for newly annotated ones, hinting at mutation and evolution.
The second striking difference observed was the gene rearrangement of the intergenic space between trnH and nad4, which is considered a synapomorphy of Lygaeoidea, which was reduced to 11 bp, as opposed to 41 bp in the same super-family.
The control region is relatively smaller in length with 962 bp. All the standard features of the control region, similar to the study of C. fallax, were observed except for the tandem repeats 61 . Microsatellite repeats replacing the tandem repeats were also observed in various species such as Panaorus albomaculatus 36 .

Conclusion
The taxonomy of Hemiptera has always put up several challenging questions, and researchers across the globe have done extensive studies. The emphasis of this study was to find if there exist another cryptic sister species in the two extreme regions of India-Coimbatore, and Bhatinda, where cotton crop scientists witnessed different levels of bug infestation. As the study concludes, the mitogenomes were 99.9% identical and gave us an understanding of their presence in these two places. The mitogenome sequenced provides many genetic markers from PCGs to control regions for identification purposes.
Our study is the first detailed mitochondrial genomic study in the genus Oxycarenus. We also have found some novel rearrangements in the mitogenome, which will help understand the genus's evolution and ecology. The results obtained from this study, along with the whole genome of this species when available, would provide possibly new and exciting prospects that could help in biocontrol and mitigating resistance of this economically important species. From the resistance to pesticide perspective, further studies involving whole transcriptomic approaches may lead to vital discoveries. Therefore, the data and the findings from this study would further help in the evolutionary studies of Oxycarenidae, Lygaeidae, and Hemiptera in the future.