Novel insights into mitochondrial gene rearrangement in thrips (Insecta: Thysanoptera) from the grass thrips, Anaphothrips obscurus

We sequenced the mitochondrial (mt) genome of the grass thrips, Anaphothrips obscurus, which is highly rearranged and differs from the four thrips species reported previously in the arrangement of both tRNA genes and a protein-coding gene, nad3, and in the copy number of the control region (CR). We reconstructed the phylogeny of the thrips with mt genome sequences, and used it as a framework to gain insights into mt genome evolution in thrips. It is evident that A. obscurus is less rearranged in mt genome organization than the other four known thrips. nad3 is in its ancestral location in A. obscurus but was translocated in other four thrips. Also, A. obscurus has one CR, which is ancestral to hexapods whereas other thrips have two or three CRs. All of the five thrips whose mt genomes have been sequenced to date are from the subfamily Thripinae, which represents about a quarter of the species richness in the order Thysanoptera. The high variation in mt genome organization observed in a subfamily challenges our knowledge about animal mt genomes. It remains to be investigated why mt genomes evolved so fast in the subfamily Thripinae and how mt genomes evolved in other lineages of thrips.

pollen, and arthropod prey 27 . Four species of thrips have been sequenced for mt genomes so far; all of them are highly rearranged and differ from each other but share the same arrangement of the protein-coding and rRNA genes except in one strain of the yellow tea thrips, Scirtothrips dorsalis 19,20,28,29 . The mt genomes of two strains of S. dorsalis have been sequenced; the typical single circular chromosome was found in the East Asia strain (EA1) while the South Asia strain (SA1) has the mt genome consisting of a large circle with 35 genes and a small circle with only three genes 20 . In addition, duplication or triplication of control region was found in the mt genomes of these thrips reported previously 19,20,28,29 .
In this study, we sequenced the complete mt genome of the grass thrips, Anaphothrips obscurus, which is a cosmopolitan pest of cereal crops in many parts of the world 30 . Like the four other thrips species reported previously, A. obscurus also has a highly rearranged mt genome and furthermore, a different arrangement of protein-coding genes from the four thrips species reported previously. We reconstructed the phylogeny of A. obscurus with other four thrips species using mt genome sequences, and used the phylogeny as a framework to gain insights into mt genome evolution in thrips.

Results and Discussion
The mitochondrial genome of the grass thrips, A. obscurus. The full-length mt genome of A. obscurus was obtained by assembling the sequence reads from the two long PCR fragments, which were 7,188 bp and 7,409 bp in size respectively, together with the sequenced short PCR fragments, cox1 (759 bp) and rrnS (450 bp). Like in most animals, the mt genome of A. obscurus is circular, 14,890 bp in size and contains 37 genes (for 13 proteins, two rRNAs, 22 tRNAs) and a CR (Fig. 1). The mt genome of A. obscurus is A and T rich with 38.4% A, 39.8% T, 10.6% C and 11.3% G. Thirty-one genes are on one strand (majority strand) and six genes are on the other strand (minority strand). Ten pairs of genes appear to overlap by 1 to 10 bp: atp8-atp6, atp6-trnS 1 , cox1-trnL 2 , trnH-nad5, trnD-trnR, trnR-trnG, trnN-trnE, trnQ-trnI, trnA-trnF, and trnL 1 -trnC. Most of the tRNA genes can be folded into the conventional cloverleaf structure except for trnS 1 , trnV and trnN, in which the DHU arms form a loop, instead of a stem-loop. The CR is 174 bp long between trnS 1 and nad5, and is highly A and T rich (83.3%). Several features commonly found in the mt CRs of insects are present in A. obscurus, including a stem-loop, an AT-rich segment, a T-stretch, and three G[A]nT motifs 31 (Fig. 2). Like other four species of thrips reported previously 19,20,28,29 , A. obscurus also has a highly rearranged mt genome. Only three ancestral mt gene clusters of insects, containing nine genes in total, are retained in A. obscurus: cox1-trnL 2 -cox2, atp8-atp6, and Figure 1. The mitochondrial genome of the grass thrips, Anaphothrips obscurus. Direction of gene transcription is indicated by arrows. Protein-coding genes are shown as dark blue arrows, rRNA genes as orchid arrows, tRNA genes as coral arrows and large NC regions (>100 bp) as grey rectangles. tRNAs are labeled according to their single-letter abbreviations. The GC content is plotted using a black sliding window, as the deviation from the average GC content of the entire sequence. GC-skew is plotted using a colored sliding window (green and orchid color), as the deviation from the average GC-skew of the entire sequence. In the very inner cycle, the nucleotide positions in the mitochondrial genome start from the 5′-end of cox1.
Phylogenetic relationships among five species of thrips inferred from mt genome sequences. To help understand the evolution of mt genome organization in thrips, we inferred the phylogenetic relationships among the five thrips species whose mt genomes have been sequenced. The maximum likelihood (ML) and Bayesian inference (BI) trees have an identical topology with strong support for each of the  tRNA genes are abbreviated using their corresponding one-letter amino acid names. The CRs were marked as red boxes. The tRNA genes that are conserved among the five species of thrips in their location relative to their upstream and/or downstream genes were in yellow. Genes are transcribed from left to right except those underlined, which are transcribed from right to left. A grey dot at the node indicates the MRCA of the four thrips (F. intonsa, F. ocidentalis, A. obscurus and S. dorsalis) and the origin of the translocated nad3. nodes (Fig. 3). As expected, the two species of the genus Frankliniella, F. intonsa and F. ocidentalis, were grouped together. The two strains of S. dorsalis (EA1 and SA1) formed a group and were most closely related to Thrips imagini; together, they were more closely related to the two Frankliniella species than to A. obscurus.

Evolution of mt genome organization in thrips.
Thysanoptera is one of the least studied orders among the 32 orders of insects for mt genomes. Prior to the current study, only four species of thrips have been sequenced for mt genomes 19,20,28,29 . All of these thrips have highly rearranged mt genomes relative to the inferred ancestral mt genome organization of hexapods 1,32 . Furthermore, the four species of thrips also differ from each other in mt genome organization, even between the two strains of the yellow tea thrips, S. dorsalis 19,20,28,29 . The mt genome of the East Asia strain (EA1) of S. dorsalis has a single circular chromosome whereas the South Asia strain (SA1) has two circular chromosomes: the small one is 921 bp in length and contains three genes, nad6, trnC and trnS 1 , and a CR; the larger one is 14,283 bp in length and contains all other mt genes, trnS 1 , and two CRs 20 . The variation among the EA1 strain of S. dorsalis and other three species of thrips is limited only to tRNA genes and CRs; the arrangement of protein-coding and rRNA genes are conserved among the four species of thrips (Fig. 4). A. obscurus, however, differs from the four species of thrips reported previously in the arrangement of both tRNA genes and one protein-coding gene, nad3. In the four species of thrips reported previously, nad3 was translocated to between cox1 and trnL 2 -cox2. In A. obscurus, however, nad3, cox1 and trnL 2 -cox2 retained their ancestral positions: nad3 is after cox3; cox1 and trnL 2 -cox2 are in one cluster with no genes in between (Fig. 3). As the four thrips species reported previously are more closely related to each other than either of them is to A. obscurus, the most plausible explanation is that the translocation of nad3 occurred in the most recent common ancestor (MRCA) of these four thrips species (Fig. 3).
A number of studies have shown that in insects mt tRNA genes are much more mobile than protein-coding genes and rRNA genes [33][34][35][36][37][38][39][40][41][42][43][44] . Thrips are no exceptions and appear to go much further than other insects in tRNA-gene rearrangement. A. obscurus and the four thrips species reported previously all differ from each other in the arrangement of tRNA genes; only 10 of the 22 tRNA genes, trnL 2 , trnG-trnK, trnY, trnW, trnA-trnF, trnH, trnC and trnV, are conserved among the five thrips species in their locations relative to a protein-coding or rRNA gene upstream or downstream (Fig. 3). Even the two species of the same genus Frankliniella differ in the arrangement of three tRNA genes, trnI, trnT and trnQ (Fig. 3).
In addition to the genes, the CR (also called large non-coding region) also varies among the five thrips species. First, A. obscurus has one CR, T. imaginis and S. dorsalis EA1 have two CRs, whereas the two Frankliniella species and S. dorsalis SA1 have three CRs respectively (Fig. 3). Second, the two Frankliniella species and S. dorsalis SA1 differ in the location of one of their three CRs. T. imaginis and S. dorsalis EA1 also differ in the location of one of their two CRs. Third, the two CRs of T. imaginis have similar length and nearly identical sequence, so do the three CRs of each Frankliniella species. The two CRs of S. dorsalis (EA1), however, differ substantially in length (623 and 197 bp, respectively) and have no detectable sequence similarity to each other. In S. dorsalis SA1, the two CRs in the large mt chromosome differ in length (181 bp and 242 bp respectively) and there is no detectable sequence similarity between them. The CR3 in the small mt chromosome has high sequence similarity (98.35%) to part of CR1 on the large mt chromosome but differ in overall length (181 bp and 344 bp respectively). The CR upstream nad5 is present in all of the five thrips species and thus can be inferred to be ancestral to these thrips. A duplication of CR likely occurred in the MRCA of Frankliniella species, T. imaginis and S. dorsalis, which produced CR2 between cox3 and cob; the different location of CR2 in S. dorsalis EA1 and S. dorsalis SA1 can be accounted for by a translocation event of CR2. Another duplication of CR occurred likely in the MRCA of the two Frankliniella species, which produced CR3 between cob and nad2; in S. dorsalis SA1 strain, CR3 is likely due to the duplication of CR1 when the small chromosome was generated (Fig. 3).
It should be pointed out that the five thrips species whose mt genomes have been sequenced in previous studies and the current study are all from the same subfamily Thripinae of the family Thripidae. The order Thysanoptera contains nearly 6,000 described species in nine families of two suborders. The suborder Terebrantia contains eight families whereas the suborder Tubulifera contains only a single family [45][46][47] . Therefore, the unusual high variation in mt genome organization revealed so far from the five thrips species is essentially only from a small section of the order Thysanoptera. It remains to be investigated how mt genomes evolved in the thrips outside the subfamily Thripinae.
In summary, we sequenced the mt genome of the grass thrips, A. obscurus. The mt genome is highly rearranged and differs from the four thrips reported previously in the arrangement of both tRNA genes and a protein-coding gene, nad3, and in the copy number of CR. We reconstructed the phylogeny of the five thrips species with mt genome sequences, and used the phylogeny as a framework to gain insights into mt genome evolution and gene rearrangement in thrips. It is clear that A. obscurus is less rearranged in mt genome organization than the other four thrips species reported previously. nad3 is in its ancestral location in A. obscurus but was translocated in other four thrips. Also, A. obscurus has one CR, which is ancestral to hexapods whereas other thrips species have two or three CRs. Finally, all of the five thrips species whose mt genomes have been sequenced to date are from the subfamily Thripinae, which represents about a quarter of the order Thysanoptera in terms of species richness. The high variation in mt genome organization observed so far in the subfamily Thripinae is very unusual and goes against what we know in most other animals. It remains to be investigated why mt genomes evolved so fast in the subfamily Thripinae and how mt genomes evolved in other thrips.

Methods
Sample collection and amplification of mt genome. Specimens of A. obscurus were collected at Northwest A&F University, Yangling, China. Thrips specimens were stored at −80 °C in 95% ethanol. Genomic DNA was extracted from individual thrips using DNeasy Tissue kit (QIAGEN). Fragments of mt cox1 (759 bp) and rrnS (450 bp) were amplified by PCR with conserved primer pairs mtd6-mtd11 and 12SF-12SR (see Supplementary Table S1). These amplicons were sequenced at the Australian Genome Research Facility (AGRF). Two pairs of A. obscurus specific primers, RS45C1F-RS45rrnSF and RS45C1R-RS45rrnSR (see Supplementary  Table S1), were then designed from the sequences of cox1 and rrnS fragments. Long PCR with these specific primers amplified the entire mt genome of A. obscurus with two gaps filled by the cox1 and rrnS fragments ( Next-generation sequencing, mt genome assembly and gene identification. The two long PCR amplicons, 7,409 bp and 7,188 bp in size, were obtained at different time and were sequenced separately. The 7,409-bp amplicon was sequenced with Illumina Hiseq 2000 platform at the BGI, Hong Kong. The 7,188-bp amplicon was sequenced with Illumina Hiseq 2500 platform at the Berry Genomics, Beijing. Illumina sequence-reads obtained from the long PCR amplicons were checked for quality and then assembled into contigs with Geneious 6.0.6 48 ; the assembly parameters were: minimum overlap identity 98%; no gaps; maximum mismatches per read 2%; maximum ambiguity 2; and minimum overlap 100 bp.
We identified tRNA genes using tRNAscan-SE 1.2.1 49 and ARWEN 50 . A few tRNA genes that could not be identified by these programs were found by manual inspection for predicted anti-codon sequences and secondary structure found in other thrips. Protein-coding and rRNA genes were identified by BLAST searches of GenBank 51 . The annotated mt genome sequence of A. obscurus has been deposited in GenBank under accession numbers KY498001.
Phylogenetic analysis. We inferred the phylogenetic relationship of A. obscurus with four other species of thrips whose mt genomes have been reported (see Supplementary Table S2). The damsel bug, Alloeorhychus bakeri 52 , which retained the ancestral mt genome organization of insects, was used as the outgroup. Each protein-coding gene was aligned individually by codons using MAFFT algorithm implemented in TranslatorX with L-INS-i strategy and default settings 53 . Poorly aligned sites were removed from the amino acid alignment before translating back to nucleotides using GBlocks in TranslatorX with default settings. The rRNA genes were individually aligned using MAFFT 7.0 online server with G-INS-i strategy 54 . Ambiguous positions in the rRNA gene alignments were filtered using GBlocks v0.91 55 with default settings. Alignments of individual genes were concatenated as two datasets: 1) PCGR dataset, containing all three codon positions of 13 protein-coding genes, and two rRNA genes (11,767 bp in total); and 2) PCG12R dataset, which is the same as the PCGR dataset except the third codon positions of protein-coding genes are excluded (8,646 bp in total).
The two concatenated datasets were analyzed using maximum likelihood (ML) method implemented in RAxML-HPC2 8.1.11 56 , and Bayesian inference (BI) method implemented in MrBayes 3.2.3 57 . Multiparametric bootstrapping analysis of 1,000 replicates was performed in RAxML based on the optimal tree with the best likelihood score and the GTRGAMMA model. For MrBayes analyses, two simultaneous runs of 10 million generations were conducted for the dataset using GTR + I + G model and trees were sampled every 1,000 generations, with the first 25% discarded as burn-in. Stationarity was considered to be reached when the average standard deviation of split frequencies was below 0.01 58 .