The complete mitochondrial genome of Wellcomia compar (Spirurina: Oxyuridae) and its genome characterization and phylogenetic analysis

Wellcomia compar (Spirurina: Oxyuridae) is a pinworm that infects wild and captive porcupines. Despite clear records of its morphological structure, its genetics, systematics, and biology are poorly understood. This study aimed to determine the complete mitochondrial (mt) genome of W. compar and reconstruct its phylogenetic relationship with other nematodes. We sequenced the complete mt genome of W. comparand conducted phylogenetic analyses using concatenated coding sequences of 12 protein-coding genes (PCGs) by maximum likelihood and Bayesian inference. The complete mt genome is 14,373 bp in size and comprises 36 genes, including 12 protein-coding, two rRNA and 22 tRNA genes. Apart from 28 intergenic regions, one non-coding region and one overlapping region also occur. A comparison of the gene arrangements of Oxyuridomorpha revealed relatively similar features in W. compar and Wellcomia siamensis. Phylogenetic analysis also showed that W. compar and W. siamensis formed a sister group. In Oxyuridomorpha the genetic distance between W. compar and W. siamensis was 0.0805. This study reports, for the first time, the complete W. compar mt genome sequence obtained from Chinese porcupines. It provides genetic markers for investigating the taxonomy, population genetics, and phylogenetics of pinworms from different hosts and has implications for the diagnosis, prevention, and control of parasitic diseases in porcupines and other animals.

Mitochondrial genome sequencing and assembly.The whole genome shotgun strategy was used to construct libraries containing DNA fragments of 400-500 bp that were randomly interrupted by a covaris ultrasonic crusher.Paired-end genomic library (approximately 400 bp inserts) was constructed according to the manufacturer's instructions (Illumina), and were sequenced by next-generation sequencing technology based on the Illumina HiSeq sequencing platform.A5-miseq v20150522 27 and SPAdesv3.9.0 (http:// cab.spbu.ru/ softw are/ spades/) 28 were used to assemble high-quality second-generation sequencing data to construct contig and scaffold sequences.The contig was compared with the nt library on NCBI using BLAST v2.2.31+, and the contig from mt was screened.Then, collinearity analysis was carried out using the mummer v3.1 29 software to determine the positional relationship and to fill the gaps between contigs.Finally, A5-miseq software was selected for splicing results, and Pilon v1.18 30 software was used to correct the results to obtain the final mt sequence.
Gene annotation and analysis.The complete mt genome sequence was uploaded to the MITOS web server (http:// mitos.bioinf.uni-leipz ig.de/) for functional annotation 31 .The genetic code was set to 05 invertebrate, and the remaining were set according to the default parameters set by MITOS.For an undetectable tRNA gene, the approximate position was determined by comparison with the corresponding tRNA sequence in the reference genome 32 .The secondary structure was manually drawn to determine the final sequence and structure.The base compositions (%) of the complete mt genome, PCGs, and rRNA genes were calculated using Mega X 33 .The CG View visualisation software was used to map the entire mt genome 34 .The figure of Gene rearrangement was drawn using PhyloSuite v1.2.2 (http:// phylo suite.jushe ngwu.com/ insta llati on_ packa ges/ Phylo Suite_ v1.2.2_ Win64_ with_ plugi ns.rar) 35 .

Phylogenetic analysis and genetic distance.
The 33 mt genomes of nematode parasites were downloaded from NCBI (supplementary material Table S1), including 32 of Spirurina nematodes (mostly published papers) as ingroups and one of an Enoplea nematode, Trichuris suis, as an outgroup.Phylogenetic relationships among the 33 representative nematode species sequences and that of the W. compar mt DNA obtained in this study were reconstructed based on the CDS of 12 PCGs excluding the stop codon, and were concatenated.12 PCGs were aligned in batches with MAFFT 36 using the '-auto' strategy and codon alignment mode.Each gene was translated into amino acid sequences using the invertebrate mt genetic code in MAFFT and aligned based on its amino acid sequence using the default settings.The alignments were back translated into www.nature.com/scientificreports/ the corresponding nucleotide sequences, and refined using the codon-aware program MACSE v. 2.03 37 with the invertebrate mt genetic code, which preserves the reading frame and allows the incorporation of sequencing errors or sequences with frameshifts.Ambiguously aligned fragments of 12 alignments were removed in batches using Gblocks 38 with the following parameter settings: minimum number of sequences for a conserved/ flank position (16/16), maximum number of contiguous non-conserved positions (8), minimum block length (10), and allowed gap positions (with half).The final sequences of the 12 PCGs were concatenated into single alignments for phylogenetic analyses using phylosuite 35 .ModelFinder 39 was used to select the best-fit partition model (edge-linked) using Bayesian information criterion (BIC).Best-fit partition model for BI and ML trees are listed (supplementary material Tables S2, S3) according to BIC.Bayesian inference phylogenies were inferred using MrBayes v3.2.7a 40 under the partition model (2 parallel runs, 4 independent Markov chains run for 3,000,000 metropolis-coupled MCMC generations, sampling a tree every 100 generations), in which the initial 25% of sampled data were discarded as burn-in.ML phylogeny was inferred using IQ-TREE v1.6.12 using an edge-linked partition model for 50,000 ultrafast bootstraps 41,42 .Phylograms were drawn using Interactive Tree Of Life (iTOL) v6 (https:// itol.embl.de/ itol.cgi) 43 .The calculation of pairwise genetic distances between phylogenetic trees was conducted using MEGA X 33 .Variance estimation was performed utilising concatenated PCGs sequences, employing the Bootstrao method with 1000 generations, and modeling the distances using the P-distance.
Ethics approval and consent to participate.No animal was harmed during the course of this research.
We confirm that all experiments were performed in accordance with relevant guidelines and regulations.All experimental procedures involving animals were reviewed, approved and supervised by the Animal Ethics Committee of Zunyi Medical University (approval no.LVRIAEC [2016] 2-059).

Results and discussion
Genome structure, organisation and composition.The complete mt genome of W. compar (Gen-Bank accession No. MW059037) was a 14,373 bp closed-circular molecule (Fig. 1), encoding an entire set of 36 genes, which included 12 PCGs (cox1-3, nad1-6, nad4l, atp6, and cytb), 22 tRNA genes, two rRNA genes (s-rRNA and l-rRNA), and a non-coding region (NCR)) (Table 1).However, this was different from the Enoplea nematode (such as Trichuris_suis) gene set, which has an atp8 gene 44 .All genes were transcribed in the same direction on the N-strand.The distribution of genes in the mt genome was identical to those of Wellcomia siamensis 17 and Syphacia obvelata 18 .The overall base composition of the mt genome of W. compar was as follows A: 25.7%; T: 53.0%; G: 16.9%; C: 4.4%; G + C: 21.3%; and A + T: 78.7%.The overall A-T and G-C skews values were -0.346 and 0.586, respectively (Table 2).The intergenic spacer sequence was 613 bp in total and included 28 regions ranging in size from 1 to 147 bp.The overlapping nucleotide fragments were scattered in one place, located between trn H and rrn S (-2 bp).
Gene annotation and analysis.The mt genome of W. compar encodes 12 PCGs, which contain 3420 codons with a total length of 10,260 bp.The content of A + T was 77.5%, which was far higher than G + C. The nucleotides in metazoan mt genomes are not randomly distributed, and this nucleotide bias is often linked to the unequal usage of codons 45 .The nucleotide usage of the 12 PCGs in the mt genome of W. compar is shown in Table 3.The codons TTT (phenylalanine, 19.2%), TTA (leucine-2, 8.2%), ATT (isoleucine, 7.5%) usage were most frequent.Therefore, the nucleotides in PCGs were biased towards A and T. Most of the PCGs had the start codon TTG, except for cox 1 that uses ATA, and the other three genes (nad 1, nad 4 and cox 2) that use ATT (Table 1).Three types of stop codons were used: TAG (nad 1, cob and nad 5), TAA (nad 6, nad 4 and nad 4l) and an abbreviated stop codon T (cox 1, atp 6, nad 2, cox 3, nad 3 and cox 2), which is an incomplete TAA stop codon and is completed by the addition of 3` A residues to the mRNA (Table 1).
W. compar mt DNA contains 22 tRNA genes, which range from 53 bp (trnS1) to 64 bp (trnI and trnT).The two rRNA genes, rrnL and rrnS, were 959 bp and 732 bp in size, respectively (Table 1).rrnL is located between trnC and trnM, and rrnS is situated between trnH and trnA.The A + T content of rrnL and rrnS was 80.9% and 76.5%, respectively (Table 2).
The NCR, located between trnS2 and trnN, was 530 bp in length with a higher A + T content (97.2%) than of any other region of the mt genome.
Gene rearrangement in Oxyuridomorpha.The Oxyuridomorpha infraorder comprises seven families: Thelastomatoidae, Travassosinematidae, Hystrignathidae, Protrelloididae, Oxyuridae, Pharyngodonidae, and Heteroxynematidae 5 .To date, the mt genomes of many Oxyuridomorpha nematode infraorder lineages are still underrepresented or not represented, except for those of Oxyuridae and Heteroxynematidae.Oxyuridae includes seven species (including W. compar) and Heteroxynematidae, and we thus compared the gene rearrangement between the seven species of Oxyuridomorpha.
Gene rearrangement is mainly caused by mutations in the mt 46 , and for which the TDRL model is possibly the most widely accepted explanation hypothesis 47,48 The mt gene arrangements in the seven species were not the same (Fig. 2), with W. compar being consistent with W. siamensis and S. obvelata but different from Oxyuris equi, Enterobius vermicularis, Aspiculuris tetraptera, and Passalurus ambiguus.The main difference was the occurrence of a transposition event (a position change of trnI for O. equi and E. vermicularis; a position change of trnY for A. tetraptera) or the number of NCR.trnI was inserted between NCR and trnN, and trnY was inserted between trnQ and trnC.P. ambiguus has two NCRs, consistent with other nematodes 44,[49][50][51] , and an extra NCR in P. ambiguus was found between trnA and trnS2.Based on the order of gene arrangement, it can be inferred that W. compare has a closer evolutionary relationship with W. siamensis and S. obvelata.displayed in Data S1 of the Supplementary Material, where we find that the genetic distances between species are smaller within the same superfamily than the distances with species from other families.In Oxyuridomorpha, the genetic distance between W. compar and W. siamensis was 0.0805, which was lower than that between W. compar and other pinworms.Phylogenetic analyses of W. compar and the selected Spirurina nematodes were performed using maximum likelihood (ML) and Bayesian inference (BI) methods based on concatenated mitochondrial CDS of 12 PCGs (BI/ML [Fig.3]).The mt genome sequences may provide reliable genetic markers for examining the taxonomic status of nematodes, particularly when PCG sequences are used as markers for comparative analyses [15][16][17][18][19][20][21] .Because some superfamilies were represented by a single species, this topology should be interpreted with caution.Phylogenetic analysis showed that the BI and ML trees both divided Spirurina into two clades: Spiruromorpha formed one separate clade, and Oxyuridomorpha + Rhigonematomorpha + Gnathostomatomorpha + Dracunculoidea + Ascaridomorpha formed another clade.The second clade was further sub-divided into two clades, Rhigonematomorpha + Gnathostomatomorpha + Dracunculoidea + Ascaridomorpha and Oxyuridomorpha.These results are consistent with a recent study in Spirurina 47 .Within Spiruromorpha, many nodes were well-supported.Of note, our results indicate that Tetrameres grusi is a sister to Spirocerca lupi.Although S. lupi and Thelazia callipaeda belong to the superfamily Thelazioidea, T. callipaeda did not cluster together with S. lupi, but instead showed an early diverging position to S. lupi.These results are identical to those previously reported 50,52 .Within Ascaridomorpha, the superfamily Ascaridoidea forms a well-supported clade, whereas Rhigonematoidea and Heterakoidea form a sister branch 53,54 , and they are the sister group of Seuratoidea.Notably, Ascaridoidea, Heterakoidea, and Seuratoidea belong to the common www.nature.com/scientificreports/infraorder: Ascaridomorpha.The topology of this clade indicated that Rhigonematoidea is closely related to the superfamily Ascaridomorpha.In many early morphology-based classification systems, Rhigonematomorpha was considered to be related to Oxyuridomorpha (pinworms) 55 .Subsequent research based on 18S rRNA phylogeny separated Rhigonematomorpha from Oxyuridomorpha as a distinct group 6 .Our results are inconsistent with those of previous research, and we suggest that it may belong to Ascaridomorpha.Our results show that Dracunculoidea is more closely related to Ascaridomorpha than Spiruromorpha, which is in agreement with recent studies 42 .In our study, Gnathostomatomorpha forms a sister branch with Rhigonematoidea + Ascaridomorpha, which is consistent with previous studies 47,51,56 .However, previous studies have also suggested that Gnathostomatomorpha is nested within Ascaridomorpha 57 , or that Gnathostomatomorpha forms a sister branch with Ascaridomorpha + Oxyuridomorpha + Dracunculoidea + Spiruromorpha 58 .
The phylogenetic trees derived from two analytical methods exhibited a high degree of similarity in their topological structure.Notably, the branching patterns of the trees within the Gnathostomatomorpha clade displayed confuseing characteristics.Furthermore, dissimilar topologies were observed in the Oxyuridomorpha clade.Our results revealed that W. compar was sister to W. siamensis, with high statistical support (Bayesian posterior probability = 1.00; bootstrap support = 100).In Oxyuridomorpha, W. compar and W. siamensis are clustered together and have closer relationships with S. obvelata than with E. vermicularis in both trees, which is further supported by the comparison of gene orders and the genetic distance (PCGs) in this study.In previous research, it was observed that A. tetraptera and S. obvelata exhibited clustering, while P. ambiguus was identified as the sister species to O. equi 18 .Another study confirmed that W. siamensis was the sister species to P. ambiguus, as www.nature.com/scientificreports/mentioned in a prior investigation 19 .Additionally, a recent study 47 found P. ambiuguus to be sister to A. tetraptera.However, our results differ from these findings, possibly due to the limited availability of the complete mt genome of pinworms.In the present study, preference was given to BI tree classification owing to its higher support values.Despite pinworms are nematodes of human and animal health significance, the sequencing and publication of mitochondrial genomes have been limited to only seven species.Consequently, there is a need to increase taxon sampling in order to facilitate future phylogenetic investigations of this infraorder using mt genomic datasets.The systematics of the suborder Spiruromorpha (Spiruromorpha, Oxyuridomorpha, Rhigonematoidea, Gnathostomatomorpha and Ascaridomorpha) has been controversial for decades.However, sequence analysis based on mt genomes presents roughly similar phylogenetic trees, which differ from those constructed based on 18S rRNA and single mitochondrial genes.To date, phylogenetic analysis based on 18S rRNA genes has accumulated more than 2700 sequences 59 , covering many but not all nematode taxa.However, it lacks the resolution needed to elucidate the phylogenetic relationships of the entire nematode tree, as different taxa exhibit different gene mutation rates, meaning that any single marker is unlikely to perform equally well in inferring the phylogeny of all nematode taxa.A better strategy for analysing nematode phylogeny involves targeting multiple genes rather than one gene.A large number of genes can help offset the adverse effects of recalcitrant genes in the analysis 60,61 .An analysis based on mt genomes fits this trend better, perhaps representing a better future for systematics.www.nature.com/scientificreports/ https://doi.org/10.1038/s41598-023-41638-9www.nature.com/scientificreports/Phylogenetic analyses and genetic distance.Pairwise genetic distances in the phylogenetic tree are

Figure 1 .
Figure 1.Circular map of the mt genome of W. compar.Scale is approximate.12PCGs and 2 rRNA have standard nomenclature.The 22 tRNA genes are designated by a single-letter code for the corresponding amino acid, with numbers used to distinguish the two leucine and serine designated tRNAs.Grey region rerensents NCR.From the inside to the outside of the circle diagram, the first circle represents the scale, the second circle represents G-C skew, the third circle represents GC content, and the fourth circle represents the arrangement of PCGs, tRNA genes, and rRNA genes on the genome.

Figure 2 .
Figure 2. Gene orders in the seven species in Oxyuridomorpha.All genes are transcribed in the same direction on the N-strand.

Figure 3 .
Figure 3.A phylogenetic tree of Spirurina inferred by BI and ML using Trichuris suis as the outgroup.Bayesian posterior probability (left) and bootstrap support (right) values of each clade are displayed.Wellcomia compar has been bolded.Inconsistent branches in ML were listed separately(right).

Table 1 .
Characterisation of the mt genome of W. compar.

Table 2 .
Base composition of the complete mt genome, PCGs, and rRNA genes of W. compar.