Introduction

The genus Photinia are evergreen plants that belongs to Rosaceae family, comprising approximately 60 species1. Photinia species are widely cultivated throughout the world and many are cultivated for gardening due to resistance to air pollution and other environmental stressors2,3. P. serratifolia (syn, Photinia serrulata), commonly called as Taiwanese photinia or Chinese hawthorn, is widely cultivated in Southeast Asia, native to China, India, Japan, Indonesia, and the Philippines. P. serratifolia not only has high ornamental value, but also has medicinal value, which is a well-known herb in traditional Chinese medicine (TCM) for the treatment of rheumatism, nephropathy, and spermatorrhea4. Its tender leaves are used as edible vegetables in the south of China, and the matured leaves are used for the treatment of the above diseases4. It has been reported that the leaves of P. serratifolia contain essential oils, triterpenoids, flavonoids, polyphenols, and other bioactive compounds, which have been demonstrated to have antioxidant and anticancer activities in vitro4,5.

The boundaries of some species have not been clearly defined in Rosaceae family due to similar morphological features exhibiting among Rosaceae species1. Compared with morphological identification, DNA sequences can produce more accurate phylogenetic relationships1,6. The phylogenetic study has greatly advanced the studies of taxonomic reclassification with the rapid advances in DNA technologies7,8. Both chloroplast genomes and mitogenomes have been widely used in phylogenetic studies among species6,9. To date, more than 5000 plant chloroplast genomes have been sequenced, but only about 424 plant mitogenome sequences are available (https://www.ncbi.nlm.nih.gov/genome/organelle/, 2/12/2022). Mitochondria are the main organelles involved in energy metabolism in eukaryotic cells10,11. In plant, mitochondria play an important role in plant productivity, development, and various biochemical processes12,13,14. According to endosymbiotic theory, plant mitochondria are believed to have descended from free-living bacteria-independent microorganisms, which explains the presence of their genomes15,16.

Plants have about 100–10,000 times larger and more structurally complex mitogenomes than animals17,18,19,20,21. During evolution, the plant mitogenome underwent dramatic changes in, for example, the gene order, genome structure, and migration of sequences from other organelles17,18,19. The mitogenomes of plants demonstrate significant variations in size and structure organization20. For example, the genome size can vary from 66 kb of Viscum scurruloideum21 to 11.3 Mb of Silene conia22. The number of protein-coding genes varies from 33 of Arabidopsis thaliana23 to 74 of Vitis vinifera24. The number of tRNA genes varies from 3 of Rosa chinensis25 to 31 of V. vinifera24. In addition, the plant mitogenome has numerous repetitive sequences and multiple RNA editing modifications26. In contrast to the conserved structure of plant chloroplast genomes, the variations in mitogenomes are not only between plant species but also can be within the same species12,17,22. For these reasons, mitogenomes have been used as a valuable source of genetic information and for the investigation of essential cellular processes in many phylogenetic studies. However, the characteristics of plant mitogenomes (bigger size, more structural complexity, and low conservation across species) make plant mitogenome assembly difficult13,14. Fortunately, advancements in long-read sequencing, such as PacBio and Oxford Nanopore, have made organelle genome sequencing easier and faster.

Recently, the complete chloroplast genome sequences of Photinia × fraseri, Photinia davidsoniae, and Photinia glabra have been sequenced and published1,27,28. However, at present, no mitogenome of any species in Photinia has been reported. Therefore, in this study, we constructed the complete mitogenome of P. serratifolia based on Oxford Nanopore and Illumina data, performed a phylogenetic analysis, and compared the complete mitogenomes of P. serratifolia and related family. Our results will help better understand the features of the P. serratifolia mitogenome and lay the foundation for identifying further evolutionary relationships within Rosaceae.

Results and discussion

Sequencing and genome structure of the complete mitogenome of P. serratifolia

The total DNA of P. Serratifolia was sequenced, and the raw data had been prepared for assembly, resulting in 115.88 G Nanopore PromethION sequencing data with an average read length of 23,654 bp (61–26,706 bp) and 34.3 G Illumina sequencing data (Supplementary Table S1). We then assembled the complete mitogenome of P. serratifolia in a circular contig of 473,579 bp (Fig. 1), which has been deposited in the NCBI Genome Database (GenBank accession number: MZ153172). The mitogenomes of 19 species were selected for analysis in this study (Supplementary Table S2). It is well known that the plant mitogenome greatly varies in size, from 66 kb in V. scurruloideum21 to 11.3 Mb in S. conica22. As shown in Supplementary Table S2, the relatively medium size of the P. serratifolia mitogenome was smaller than that of Zea mays (680,603 bp) and Oryza sativa (490,520 bp). However, the mitogenome of P. serratifolia was slightly larger than Pyrus betulifolia (469,928 bp), Rhaphiolepis bibas (434,980 bp), and Malus hupehensis (422, 555 bp), and significantly larger than that of Sorbus aucuparia (384,977 bp) and Sorbus torminalis (386,758 bp). These results suggest that P. serratifolia may be identified as a species with a larger mitogenome in the Rosaceae family.

Figure 1
figure 1

The circular map of P. serratifolia mitogenome. Gene map showing 68 annotated genes of different functional groups.

The nucleotide composition of the whole mitogenome is A: 27.6%, T: 27.2%, C: 22.7%, and G: 22.5%, and the overall GC content was 45.2% (Supplementary Table S2), which is consistent with that of most of the species of Rosaceae family we compared (M. hupehensis: 45.21%; Malus domestica: 45.4%; Prunus avium: 45.62%; P. betulifolia: 45.28%; S. aucuparia: 45.39%; S. torminalis: 45.31%) and other angiosperms (Ziziphus jujuba: 45.27%; A. thaliana: 44.79%; Glycine max: 45.03%), but smaller than some gymnosperm, such as Ginkgo biloba: 50.36%.

Gene contents of the mitogenome of P. serratifolia

Although the genome size of plant mitochondrial greatly varied, the number of mitochondrial genes is relatively conserved in the land plant lineage, with 60–80 known genes found in different terrestrial plant species29. In the P. serratifolia mitogenome, 67 genes (38 protein-coding genes, 23 tRNA genes, and 6 rRNA genes) were annotated (Supplementary Table S2). The functional categorization and physical locations of the annotated genes were shown in Fig. 1. The 38 encoded proteins (nad6 and atp1 have two copies) could be divided into 11 classes: ATP synthase (6), cytochrome C biogenesis (4), ubiquinol cytochrome c reductase (1), cytochrome C oxidase (3), maturases (1), transport membrane protein (1), NADH dehydrogenase (10), ribosomal proteins (large subunit (LSU); 3), ribosomal proteins (small subunit (SSU); 6), succinate dehydrogenase (2), and ribonuclease (1) (Supplementary Table S3).

Although comparative analyses of mitogenomes have shown that the sequences of protein-coding genes are highly conserved in plants, variations among plant mitogenomes characterized so far have mainly been reported in the ribosomal proteins30,31. In addition, the gene components cytochrome c biogenesis gene has also been reported to be different among the plant mitogenomes32. Interestingly, consistent with previous mitogenome studies of Rosaceae33, most rps genes (rps2, rps7, rps10, rps11, rps19) were missing in the mitogenome of P. serratifolia (Fig. 2). The functions of missing ribosomal genes may be replaced by nuclear genes, which may be related to the rapid radiation evolution of Rosaceae plants34. Although there was no significant variation of the composition of cytochrome C synthase gene among other species of the Rosaceae family in our study, the length of ccmFc, ccmFn, cob, cox1, cox2, and cox3, in the mitogenome of P. serratifolia, R. bibas, and M. hupehensis, were 797–2271 bp, which was significantly higher than that of other species (212–587 bp) of the family.

Figure 2
figure 2

Distribution of protein-coding genes in plant mitogenomes. Yellow, green, and purple boxes indicate that one, two, and three copies exist in the plant mitogenome, respectively. White boxes indicate that the gene is missing in the plant mitogenome. The circles, squares, and triangles represent dicots, monocots, and gymnosperms, respectively. Besides, the red-colored plant names are species from the Rosaceae family.

Other than ribosomal proteins, the major variations characterized among plant mitogenomes, even in the same genus, are in the tRNA gene contents30. The P. serratifolia mitochondria had 23 tRNAs (Supplementary Table S3). The average length of these tRNAs was 71–87 bp, with a total length of 1725 bp (Supplementary Table S3). The number of tRNAs in the P. serratifolia mitogenome was more than that in other species of the Rosaceae family, such as R. bibas (22), M. domestica (20), P. avium (16), and S. torminalis (18) (Supplementary Table S2). This may be because some tRNAs in the P. serratifolia mitogenome have multiple copies. For example, trnfM-CAT and trnF-GAA have two copies. The function of the missing mitochondrial tRNAs may be replaced by chloroplast-derived tRNAs in species with less mitochondrial tRNAs34. Moreover, consistent with the previous report35, we found that protein-coding genes of the P. serratifolia were not increased along with the increase of tRNAs.

Furthermore, we found that 61 out of the 67 mitochondrial genes have no introns, accounting for 92.54% of the total. Our result is consistent with the general consensus that 63.2% to 100% of mitochondrial genes in most plants have no introns17,18. However, six mitochondrial genes (ccmFC, nad5, nad1, nad2, nad4, and nad7) are found to contain one or more introns of the P. serratifolia (Supplementary Table S3).

Repeat sequences analysis

SSRs, or microsatellites, are DNA stretches consisting of short, tandem units of sequence repetitions of 1–6 base pairs in length36. In the current study, we identified 59 SSRs in the P. serratifolia mitogenome. The proportions of different repeat units were shown in Fig. 3. Consistent with all observed species, mononucleotide repeats were the most abundant SSR type in P. serratifolia, constituting 79.67% (47 repeats) of all identified SSRs. In addition, there were 7 SSRs (11.86%) and 5 SSRs (8.47%) in di-, trinucleotide repeats, respectively. However, there were no tetra-, penta-, and hexa-repeats identified in P. serratifolia mitogenome. The mononucleotide repeats of A/T motifs (a total of 41 repeats) were the most recurrent motifs, representing 69.49% of all identified SSRs (Supplementary Table S4). According to the trend that the distribution pattern of microsatellites is consistent with their phylogenetic status in plants37, the SSR composition of P. serratifolia was similar to its most closely related species, such as R. bibas and P. betulifolia (Fig. 3).

Figure 3
figure 3

The SSRs composition in plant mitogenomes.

In addition, 72 non-tandem repeats, with 50 bp or more in length, were detected in the P. serratifolia mitogenome (Supplementary Table S5). The repetitive sequence in the P. serratifolia mitogenome was 51.05 kb, accounting for 10.78% of the mitogenome. The proportion of repeats is higher than that in Garcinia mangostana (5.8%)38 and Prunus salicina (7.22%)39, but lower than that in Nicotiana tabacum (13%)40 and Daucus carota (16%)41. The different proportions of repeats may be because the mitochondria of G. mangostana and P. salicina are mainly short repeating units, whereas those of P. serratifolia and D. carota are mainly longer repeating units41.

For example, we found one pair long repeat (16,660 bp), one copy at the starting and ending positions of the genome (463990-473579-1-7070), another at 61,999–78,658 bp (Fig. 4a), and 16 pair medium sized repeats between 120 and 920 bp in the P. serratifolia mitogenome (Supplementary Table S5). The distribution of repeat is consistent with many plant mitogenomes that have one or more pairs of large repeats38,42,43. Some reports showed that larger and medium-sized repeats can act as sites for inter- or intramolecular recombination, leading to multiple alternative arrangements or isoforms42,43. Although the frequency of recombination events was low, all these sequencing reads were aligned to the P. serratifolia mitogenome for the detection of potential alternative isoforms. As a benefit of Nanopore PromethION sequencing, these ultra-long reads of P. serratifolia, with an average read length of 23,654 bp, is longer than these identified repeats. Therefore, the long reads can cover identified repeats with high probability. As shown in Fig. 5, the sequencing reads coverage of these repeats is similar to those of other non-repetitive sequences, which implies no branching nodes in each repeat. Therefore, P. serratifolia mitochondrial master genome assembly can be represented in the circular form, as previously reported in plant mitogenomes20,38,44. However, there is a total length of 88,247 bp between the two copies of the long repeats (Fig. 4a), which may give rise to an alternative configuration of mitogenomes via inversions of these long repeats in master conformation (Fig. 4b,c).

Figure 4
figure 4

The distribution of the pair of long repeats and the possible configurations generated from inversions of these long repeats. a: The distribution of the pair of long repeats (16660 bp) (red bars) in the P. serratifolia mitogenome. b and c: Two possible configurations generated from inversions of these long repeats. b is the master conformations, which is same as Fig. 1 shown. c is an alternative configuration of mitogenomes of P. serratifolia.

Figure 5
figure 5

Depth and coverage of the assembled mitogenome using sequencing long-reads. The abscissa shows the genomic positions, and the ordinate shows the depth of mapped raw reads.

The prediction of RNA editing in the P. serratifolia mitogenome

The number of RNA-editing sites varies in different species and is usually frequent in angiosperm and gymnosperm mitochondria45. We predicted 488 RNA-editing sites within the 33 protein-coding genes (Fig. 6) in the P. serratifolia mitogenome, which was similar to those in A. thaliana (441 sites)15, Eucalyptus grandis (470 sites)46, and Citrullus lanatus (463 sites)47 and less than those in gymnosperms that have larger mitogenomes, such as Taxus cuspidata (974 sites), Pinus taeda (1179 sites), Cycas revoluta (1206 sites), and G. biloba (1306 sites)48. However, whether the number of RNA-editing sites is positively correlated with the size of the mitogenome requires further research.

Figure 6
figure 6

Prediction of RNA editing sites in the P. serratifolia mitogenome.

The selection of mitochondrial RNA-editing sites in P. serratifolia shows a high degree of compositional bias. As shown in Fig. 6, all RNA-editing sites are the C-T editing type, which is consistent with the fact that C-T is the most common editing type found in plant mitogenomes49,50. Inconsistent with previous studies50, more than half (313 sites, 64.14%) of the mitochondrial RNA editing occurred at the second codon position in P. serratifolia (Fig. 6), followed by that at the first codon position (161 sites; 32.99%) (Fig. 6). However, no editing site was found at the third position of triplet codons, consistent with the fact that RNA-editing sites at this position were rare in plant mitogenomes48,49.

Although the P. serratifolia mitogenome has more RNA-editing sites, and the vast majority of RNA editing occurs at the first or second position of codons, there were only 30 codon transfer types, corresponding to 14 amino acid transfer types, suggesting a consolidated biological function. The types of transfer are comparable to those of most gymnosperms (30–40 codons; around 20 amino acids)48,50 but less than those of monocotyledonous and dicotyledonous plants (50–60 codons; around 30 amino acids)46,47,49. Among the 30 codon transfer types, TCA =  > TTA was the most common type, with 68 sites. A leucine tendency after RNA editing, supported by the fact that 44.88% (219 sites) of the edits were converted to leucine, was found in the amino acids of predicted editing codons. After RNA editing, 32.0% of the amino acids remained hydrophobic. However, 46.3% of the amino acids were predicted to change from hydrophilic to hydrophobic, while 8.6% were predicted to change from hydrophobic to hydrophilic. Overall, our study suggests that the P. serratifolia mitogenome has more RNA-editing sites but fewer editing types.

It has been well established that RNA editing is an epitranscriptomic mechanism that modifies primary RNAs, and is widespread in plants organelles51, Fig. 7 shows the total number of editing sites of all of the 33 protein-coding genes. Although the pattern changes of RNA editing extent varies between different plant species52, similar to most angiosperms50, ribosomal proteins (except rps4) and ATPase subunits (except atp6) had a relatively small number of RNA-editing-derived substitutions (2–11 sites), while the transcripts of NADH dehydrogenase subunits and cytochrome c biogenesis genes were significantly edited (13–39 sites; Fig. 7) in the P. serratifolia mitogenome. Consistent with the previous report, such as Phaseolus vulgaris26 and Suaeda glauca53, nad4 (36 sites), ccmFn (39 sites), and ccmB (31 sites) had the highest total number of RNA-editing sites predicted in the P. serratifolia mitogenome (Fig. 7). This supports the essential role of editing sites in the proper functioning of mitochondrially encoded proteins.

Figure 7
figure 7

The distribution of RNA-editing sites in the P. serratifolia mitochondrial protein-coding genes.

Codon usage and Ka/Ks analysis

As shown in Supplementary Table S3, in the P. serratifolia mitogenome, ATG was used as the starting codon by almost all the protein-coding genes, while mttB starts with TTG, rpl16 and rps4 start with GTA as the start codon. Three types of stop codons, TAA, TGA, and TAG, were found in the P. serratifolia mitogenome which had utilization rates of 44.7%, 31.6%, and 23.7%, respectively (Supplementary Table S3). The relative synonymous codon usage (RSCU) value for P. serratifolia for the third codon position is shown in Fig. 8. Consistent with most of the currently studied mitogenomes10,53,54, the use of both two- and four-fold degenerate codons was biased toward the use of codons abundant in A or T. In P. serratifolia, 14,333 amino acids were encoded. The most frequently used amino acids were Leu (7.1%), Arg (6.3%), and Ser (6.1%), and the least common amino acids were Trp (1.4%) and Met (1%) (Fig. 8).

Figure 8
figure 8

Relative synonymous codon usage in the P. serratifolia mitogenome.

In genetics, the Ka and Ks substitution ratio (Ka/Ks) is useful for inferring the direction and magnitude of natural selection across diverged species55. A Ka/Ks ratio < 1 implies negative selection, while a ratio of > 1 implies positive selection (driving change) and a ratio of exactly 1 indicates neutral selection. To evaluate selective pressures during the evolutionary dynamics of protein-coding genes among closely related species, the Ka/Ks ratio of 17 single copy PCGs among P. serratifolia and 7 Rosaceae species mitogenomes was calculated. As shown in Fig. 9, there was no substitution in most mitochondrial genes, such as rpl5, rps13, rps14, nad3, nad4L, atp9, ccmB, and cox1, among P. serratifolia and other seven species in Rosaceae. More frequency changes were found in atp genes among species.

Figure 9
figure 9

The Ka/Ks values of 17 protein-coding genes of P. serratifolia versus 7 species. The color in each box represents the Ka/Ks value.

In 21 cases (Fig. 9), Ka/Ks values of P. serratifolia gene-specific substitution rates were higher than 1. This result suggests a positive selection during the evolution of P. serratifolia as compared with 7 other species55,56 Among these cases, the Ka/Ks values of the nad gene-specific substitution rates of P. serratifolia were higher, with Ka/Ks values of 7 nad7 genes and 4 nad3 > 1, suggesting large variation and positive selection during nad gene evolution among Rosaceae55. However, most genes had undergone negative selection pressures during evolution, supported by the fact that the Ka/Ks values of 86 proteins-coding genes, accounting for 72.69% of the proteins-coding genes, were less than 1 compared to the other plant species. Taken together, these results suggest that mitochondrial genes are highly conserved during the evolutionary process in Rosaceae plants.

Phylogenetic analyses

To detect the evolutionary status of the P. serratifolia mitogenome, a phylogenetic analysis was performed on P. serratifolia, together with 8 other species. Phylogenetic relationships (Fig. 10) were analyzed using the concatenated dataset by 17 PCGs through ML phylogenetic analysis. The abbreviations and accession numbers of the mitogenomes investigated in this study are listed in Supplementary Table S2. As shown in Fig. 10, as outgroups, the G. biloba, which belongs to gymnosperm, was distinct from the other angiosperms. Moreover, the taxa of the 7 Rosaceae species were well clustered. Among the Rosaceae cluster, P. avium, which belongs to Amygdaleae subfamily, was distinct from the other 7 species of Maleae subfamily, which also supports the classification of Amygdaleae and Maleae subfamily57,58. Meanwhile, these species in the same genus were clustered together, such as S. aucuparia and S. torminalis, M. hupehensis and M. domestica, which is consistent with previous reports based on morphological and genetic data57,58,59.

Figure 10
figure 10

The phylogenetic relationships of P. serratifolia with other 8 plant species using the ML analysis. The bootstrapping values are listed in each node. The number after the species name is the GenBank accession number. Colors indicate the groups that the specific species belongs.

In addition, we also found that the clade united P. serratifolia with P. betulifolia (Fig. 10). The present phylogenetic analysis shows that R. bibas is sister to P. serratifolia + P. betulifolia, which is consistent with the previous report60. Our results also support the groupings (Sorbus + (Malus + (Rhaphiolepis + (Photinia + Pyrus)))), which have been partly supported in the previous study61. However, more accurate sequence and increased taxa sampling are necessary to further research the monophyly of these genus at the mitogenomes level. In general, the phylogenetic tree topology was in line with the evolutionary relationships among those species, indicating the consistency of traditional taxonomy with the molecular classification.

Conclusions

In conclusion, the current study presented the first mitogenome assembly and annotation of P. serratifolia as well as the mitogenome in the genus Photinia. The mitogenome was 473,579 bp in length, containing 38 protein-coding genes, 23 transfer RNA genes, and 6 ribosomal RNA genes. Comparative analysis of gene structure, codon usage, repeat regions, and RNA-editing sites showed that rps2 and rps11 genes were missing, and a clear bias of RNA-editing sites is existing in the P. serratifolia mitogenome. Furthermore, the Ka/Ks analysis based on code substitution revealed that most of the coding genes had undergone negative selections, indicating the conservation of mitochondrial genes during the evolution. Moreover, Phylogenetic analysis based on the mitogenomes of P. serratifolia and 8 other taxa indicates consistency in molecular and taxonomic classification. These results will help in better understanding the features of the P. serratifolia mitogenome and lay the foundation for identifying further evolutionary relationships within Rosaceae.

Materials and methods

Mitochondrial DNA isolation and genome sequencing

The fresh young leaves of P. serratifolia were collected from City Forest Park in Fuyang of Hangzhou, Zhejiang province of China (119°58′8.4″ E, 30°4′20.28″ N) by Ying Wang and Xingya Wang, which were identified by Dr. Liang Xu of Zhejiang Academy of Forestry, Hangzhou, China. The voucher specimens were stored in the Herbarium of College of pharmaceutical sciences, Zhejiang Chinese Medical University, voucher No. ZCMU4C507. The collection of P. serratifolia was permitted by the City Forest Park. The use of plant leaves in this study complies with all local, national or international guidelines and legislation concerning research involving plants. Leaves were quickly frozen in liquid nitrogen and then stored at − 80 °C refrigerator prior to DNA isolation. High-quality genomic DNA was extracted using a modified cationic detergent cetyltrimethylammonium bromide (CTAB) method62. Sequencing was performed following the protocol for the BGISEQ-500 platform (BGI, Wuhan, China) and the library protocol for Nanopore PromethION sequencing (Pacific Biosciences, Menlo Park, CA, USA).

Genome assembly

In this study, Raw data of second-generation sequencing were filtered using fastp version 0.20.0 software (https://github.com/OpenGene/fastp)63. The three-generation sequencing data of mitochondrial reads were error-corrected, trimmed, and de-novo-assembled using a Canu assembler (version 1.5) with default parameters64. Then, the contig sequence was obtained. The gene databases of plant mitochondria that published on the NCBI were compared using blast v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and contigs that matched with the mitochondrial genes as the seed sequence were selected. The original data were used to extend and circularize the contigs to obtain the ring-dominant structure (or secondary ring), and then, the assembly was polished using NextPolish 1.3.1 (https://github.com/Nextomics/NextPolish)65. The assembly results were calibrated using second- and third-generation data, and the parameters were set as rerun = 3 and -max_depth = 100. Then, the final assembly results were obtained.

Genome annotation

The assembled P. serratifolia mitogenome was annotated using the GeSeq tool66 and MITOFY47. To confirm the annotated results, the assembled P. serratifolia mitogenome was also BLAST-searched against protein-coding genes and ribosomal RNA (rRNA) genes of available plant mitogenomes at the NCBI. Then, the sequence coordinates of the identified protein-coding genes (PCGs) were manually verified for start and stop codons. The annotations of transfer RNA (tRNA) genes were also confirmed by tRNAscan-SE 2.067. Vi-ennarNA-2.4.14 was used to visualize the secondary structure of tRNA68. The possible RNA-editing sites in the PCGs of P. serratifolia were predicted using the online predictive RNA editor for plant mitochondrial genes (PREP-Mt) suite of servers (http://prep.unl.edu/)69. The codon frequencies were calculated using the Codon Usage tool in the Sequence Manipulation Suite (bioinformatics.org/sms2/codon_usage.html)70. The relative synonymous codon usage (RSCU) was calculated using the CAI Python package of Lee71. The physical circular map was drawn using the Organellar Genome DRAW (OGDraw) v1.2 program72. The final annotated mitogenome sequences of P. serratifolia have been deposited in the NCBI GenBank (accession no. MZ153172).

Analysis of repeated sequence

The simple sequence and tandem repeats were detected in the P. serratifolia mitogenome. The MIcroSAtellite (MISA) identification tool Perl script was used to detect simple sequence repeats73. The repeats of mono-, di-, tri-, tetra-, penta-, and hexanucleotide bases with 10, 6, 5, 5, 5, and 5 repeat numbers, respectively, were identified. ROUSfinder was used for the identification of repeat elements74. Subsequently, in order to explore whether these identified repeats lead to the formation of multiple mitogenome isoforms, all these sequencing reads were aligned to the P. serratifolia mitogenome using Geneious Basic75. If there are a branching node in repeats, the coverage of reads at the branch will be halved76. In this way, the evidence of recombination that was mediated by repeat sequences could be observed directly.

Phylogenetic tree construction and Ka/Ks analysis

The mitogenomes of previously reported mitochondrial assemblies were down-loaded from the NCBI Organelle Genome Resources Database for R. bibas, P. betulifolia, P. avium, S. torminalis, S. aucuparia, M. hupehensis, M. domestica, and G. biloba (designated as outgroups). These mitogenome sequences were selected due to they are clearly taxonomically classified. The conserved protein-coding genes from the mitogenomes of P. serratifolia and the above 8 species were identified and evaluated. Phylogenetic analyses were conducted using concatenated exon sequences from 17 conserved protein-coding genes (atp4, atp6, atp9, ccmB, ccmC, ccmFn, cob, cox1, nad2, nad3, nad4L, nad5, nad7, nad9, rpl5, rps13, and rps14), which were extracted and aligned using MAFFT v7.402 with default parameters77. ModelTest-NG v0.1.3 was used to determine the best-fit model78, and a maximum likelihood (ML) phylogenetic tree was generated using RAxMLv8.2.12 with the best-fit substitution model (GTR-GAMMA) at 1000 bootstrap replicates79. The synonymous (Ks) and nonsynonymous (Ka) substitution rates of the protein-coding genes in the P. serratifolia mitogenome were analyzed using the 7 species (R. bibas, P. betulifolia, P. avium, S. torminalis, S. aucuparia, M. hupehensis, and M. domesticaand). In this analysis, KaKs_Calculator (v2.0) with the MLWL model was used to calculate Ka/Ks80.