The complete Chloroplast genome of Stachys geobombycis and comparative analysis with related Stachys species

Wang, Ru; Lan, Zheng; Luo, Yongjian; Deng, Zhijun

doi:10.1038/s41598-024-59132-1

Download PDF

Article
Open access
Published: 12 April 2024

The complete Chloroplast genome of Stachys geobombycis and comparative analysis with related Stachys species

Ru Wang¹,
Zheng Lan²,
Yongjian Luo^1,3 &
…
Zhijun Deng¹

Scientific Reports volume 14, Article number: 8523 (2024) Cite this article

315 Accesses
Metrics details

Subjects

Abstract

Herb genomics, at the forefront of traditional Chinese medicine research, combines genomics with traditional practices, facilitating the scientific validation of ancient remedies. This integration enhances public understanding of traditional Chinese medicine’s efficacy and broadens its scope in modern healthcare. Stachys species encompass annual or perennial herbs or small shrubs, exhibiting simple petiolate or sessile leaves. Despite their wide-ranging applications across various fields, molecular data have been lacking, hindering the precise identification and taxonomic elucidation of Stachys species. To address this gap, we assembled the complete chloroplast (CP) genome of Stachys geobombycis and conducted reannotation and comparative analysis of seven additional species within the Stachys genus. The findings demonstrate that the CP genomes of these species exhibit quadripartite structures, with lengths ranging from 14,523 to 150,599 bp. Overall, the genome structure remains relatively conserved, hosting 131 annotated genes, including 87 protein coding genes, 36 tRNA genes, and 8 rRNA genes. Additionally, 78 to 98 SSRs and long repeat sequences were detected , and notably, 6 highly variable regions were identified as potential molecular markers in the CP genome through sequence alignment. Phylogenetic analysis based on Bayesian inference and maximum likelihood methods strongly supported the phylogenetic position of the genus Stachys as a member of Stachydeae tribe. Overall, this comprehensive bioinformatics study of Stachys CP genomes lays the groundwork for phylogenetic classification, plant identification, genetic engineering, evolutionary studies, and breeding research concerning medicinal plants within the Stachys genus.

Phylogenomics and the rise of the angiosperms

Article Open access 24 April 2024

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Article Open access 15 April 2024

Complexity of avian evolution revealed by family-level genomes

Article 01 April 2024

Introduction

The genus Stachys encompasses a diverse collection of herbaceous and shrubby plants, involving around 300 species distributed across temperate and tropical regions worldwide, except for Australia and New Zealand¹. Stachys species have been found to hold extensive medicinal value and have a long history of use, rendering them highly valuable for medicinal research and development¹. Presently, research on Stachys species primarily focuses on examining their chemical composition and pharmacological effects². For instance, the chemical composition, extraction of active compounds, and pharmacological effects of Stachys plants have been frequently analyzed, with flavonoids, diterpenes, fatty acids, and phenolic acids identified as primary secondary metabolites^3,4,5.

Correctly identifying species is fundamental to biological research. However, Stachys species present challenges due to frequent geological changes, climate variations, and interspecific hybridization. They exhibit extensive variation in morphological and cytological characteristics⁶. However, some species are also highly polymorphic and vaguely delimited, making them challenging taxonomic units in plant classification and phylogenetics⁷. Research on the phylogeny of Stachys mainly focuses on the classification of Stachydeae. Although Stachys species exhibit extensive variations in morphology and cytological characteristics⁸, they typically have tubular to campanulate or nasal shaped calyx with equally or nearly equally short teeth at the calyx apex, and there may be a ring or hair ring inside the corolla tube⁹. Pollen characteristics play an important role in species identification in cytology. However, a study by Salmaki et al.⁸ on the pollen of 30 taxa of the Stachys genus and a closely related genus Sideritis montana distributed in Iran found that while some Stachys plants exhibit distinctive pollen morphological features, they cannot be completely differentiated based on pollen morphology alone. Hence, relying solely on morphological and cytological analysis is insufficient for such a complex genus¹⁰. The development of sequencing technologies and the expansion of molecular databases have rendered them powerful tools for exploring the differentiation and interspecific relationships of Stachys species. In previous studies, molecular evidence such as ISSR^10,11, RAPD¹⁰, and DNA fragments including the nrITS region^7,12,13 and cpDNA fragments^9,14 has been utilized to reconstruct the phylogeny of Stachydeae. For instance, Salmaki et al.⁷ conducted nuclear (ribosomal ITS) and plastid (trnL intron, trnL-trnF spacer, rps16 intron) DNA sequence analysis of 143 species in the Stachydeae tribe, and found that both nuclear and plastid DNA data supported the monophyly of the Stachydeae tribe. Phylogenetic studies of Stachys plants based on ribosomal and plastid DNA data^15,16 demonstrated it as an incomplete clade divided into two distinct lineages. The center of diversity for the first lineage is located in the eastern Mediterranean region and has migrated over time to West Asia, Western Europe, Macaronesia, and sub-Saharan Africa. Meanwhile, the second lineage includes Hawaiian mints, Suzukia, all New World Stachys species, and some Old World species¹⁷. Berumen et al. used cpDNA regions for plant phylogenetic reconstruction and suggested reducing the number of members in the Stachys coccinea complex to three species, i.e., S. coccinea, Stachys lindenii, and Stachys albotomentosa. Meanwhile, their original ranges, including S. pacifica, Stachys manantlanensis, Stachys torresii, and Stachys jaimehintonii should be retained as varieties of S. coccinea⁹. In addition, using nrITS DNA region sequences, Özal et al. found the phylogenetic relationship between newly discovered Stachys species and their close relatives, and the newly discovered Stachys istanbulensis and its relatives Stachys recta and Stachys atherocalyx formed a branch¹⁷.

Chloroplasts are semiautonomous organelles unique to higher plants and some algae, which are also present in a few protists¹⁸. The CP genome, known for its sequences short and relative independence from the nuclear genome, holds a crucial position as the second largest genome in the plant kingdom¹⁹. Within angiosperms, chloroplasts harbor a wealth of genetic information and present distinctive characteristics such as small relative molecular weight, simple structure, moderate evolutionary rate, low mutation rate, genetic stability, low cost, and ease of development of microsatellite sequences²⁰. partially compensating for the limitations of mitochondrial and nuclear genomes²¹. Moreover, CP genome research considerably triggers studies on single nucleotide polymorphisms (SNPs), phylogenetics²², and DNA barcoding²³, while facilitating investigations into the geographical origins of domesticated crops^24,25,26. In recent years, CP genomes have found extensive application in classification studies at the genus and even family levels for various plants^27,28. Chloroplast genomes have been widely applied in taxonomic studies at the genus and even family levels of various plants. For instance, based on the sequences of 79 chloroplast protein-coding genes, Zhao et al.²⁹ selected 175 species from 79 genera in the Lamiaceae family, proposing a new classification system with 12 subfamilies and 22 tribes within Lamiaceae. Li et al.³⁰ conducted the first examination of the structural patterns of Pholidota plastomes, providing novel insights into the phylogenetic relationships within Pholidota and its related genera through comprehensive genome data analysis. These studies have demonstrated that utilizing chloroplast genomes can effectively enhance phylogenetic resolution. However, only brief reports on the chloroplast genomes of Stachys species, specifically Stachys sieboldii³¹and Stachys japonica³² have been published, leaving room for in depth analysis of the chloroplast genomes.

Stachys geobombycis is a perennial herbaceous plant of the Lamiaceae family in the Stachys genus, mainly found in various provinces in southern China. The main edible part of S. geobombycis is its underground tuber, which has beneficial effects such as clearing heat and detoxifying, promoting blood circulation and removing stasis, dispelling wind and dampness, nourishing Qi and blood, as well as promoting health and beauty. It is often used as a medicinal resource and cooking ingredient.³³. Currently, studies on S. geobombycis have been rarely reported, with the focus placed on chemical composition and pharmacological properties^34,35,36. It should be noted that some morphological characteristics of the Stachys genus exhibit minimal differentiation, and phenotypic traits show instability¹. For example, based on phenotypic traits alone, it is difficult to distinguish S. geobombycis from Stachys sieboldii and Stachys affinis, which also rely on underground tubers for food. This poses significant challenges to the agricultural production of S. geobombycis. Besides, this characteristic hinders rapid and precise classification based solely on morphological attributes, posing challenges to species identification within the Stachys genus³⁷. The rapid development of molecular biology and genomics provides valuable genetic information for systematic evolution and species identification in the study of plant chloroplast genomes³⁸. However, the chloroplast genomes of the Stachys genus have been relatively underrepresented, and there is a lack of comprehensive and collaborative research on chloroplast genome datasets^31,32,39. To this end, the complete sequence of the chloroplast genome of S. geobombycis was hereby collected, and compared with the chloroplast genomes of 7 closely related species through comparative genomics. The present study was carried out for the following purposes: (1) to compare the characteristics of the chloroplast genome of Stachys genus and detect differences among 7 species; (2) to identify repeat sequences, simple sequence repeats and genetically variable regions, and select divergence hotspots as candidate DNA markers; (3) to explore their IR expansion and contraction, and estimate genes selective pressure and codon usage; (4) to reconstruct phylogenetic relationships of Stachys species based on the cp genome alignments, and verify their phylogenetic position within Lamioideae. Overall, this study is expected to provide theoretical basis for the genetic breeding and phylogenetic research of Stachys plants.

Results

Characterization of the CP genome structure of Stachys species

The CP genome of S. geobombycis was submitted to the GenBank database with the accession number OR327475 maintained by the National Center for Biotechnology Information (NCBI). The total length of the chloroplast (CP) genome for S. geobombycis is 150,567 base pairs (bp), and it has been sequenced with an average coverage depth of 1612.19x (Supplementary Fig. 1). It possessed a unique quadripartite structure comprising an LSC (large single copy), an SSC (small single copy), and a pair of IRs (inverted repeats) measuring 81,692 bp, 17,567 bp, and 25,654 bp, respectively (Fig. 1). In the CP genome of S. geobombycis, there were a total of 131 predicted functional genes. Among these, there were 110 unique genes, which could be further classified into different groups, including 8 rRNA genes, 36 tRNA genes, and 87 protein coding genes. Besides, the protein coding genes were divided into 4 groups based on their functions. The first group consisted of 45 photosynthetic genes, the second group included 63 self replication expression related genes, the third group contained 6 other genes, and the fourth group comprised 7 genes of unknown function (Supplementary Table 1). Additionally, 13 in the genome contained intron sequences. Among these, the genes clpP and ycf3 contained two introns each, while the remaining genes had only one intron each (rps16, atpF, rpoC1, petB, petD, rpl2, ndhB, ndhA, ycf1, ndhB, and rpl2).

The CP genomes of the 8 studied Stachys species exhibited a characteristic circular double chain structure, varying from 149,523 to 150,599 bp in size (Table 1). The Stachys plastomes displayed the conventional quadripartite architecture, consisting of a LSC region (81,156–81,743 bp) and a SSC region (17,057–17,977 bp) separated by two IR regions (25,250–25,666 bp). The total GC content of all 8 CP genomes was similar, ranging from 38.36 to 38.53%. Variation in the number of genes among individuals was observed in some species. For instance, S. japonica and S. coccinea were reported to possess 123 and 133 genes, respectively³². To mitigate the influence of refer-ence genomes and annotation software, the plastid genomes of 7 Stachys species obtained from NCBI were re-annotated utilizing the PGA (Plastid Genome Annotator) program⁴⁰ and Geneious v11.0.336⁴¹, with S. geobombycis taken as the reference. The analysis revealed that all plant genomes were annotated with a total of 131 genes (Except S. coccinea), comprising 87 protein coding genes, 8 ribosomal RNA genes, and 36 transfer RNA genes. The gene count and types were consistent with those of S. geobombycis, indicating strong conservation of the Stachys CP genome in genetic evolution.

Table 1 Comparison of CP genome features of eight Stachys species.

Full size table

Codon usage analysis

In order to investigate codon usage patterns and nucleotide composition in the 8 Stachys plastomes, amino acid frequency, codon usage number, and the relative synonymous codon usage (RSCU) were analyzed and summarized⁴². The results showed that all 58 homologous protein-coding genes in these species consisted of 64 codons, and encoded 20 amino acids, including three stop codons (UAG (*), UAA (*), UGA (*)) (Supplementary Table 2). The number of encoded codons varied between 21,330 and 22,935 across the species. While the overall count of codons exhibited minimal variation, the types of codons and amino acids remained consistent. Besides, the RSCU value was used to measure the association between the observed frequency and the anticipated frequency of a particular codon. Out of the 64 codons, excluding the three stop codons and the unbiased methionine (Met) and threonine (Thr) (RSCU = 1), 31 codons displayed a preference with RSCU values exceeding 1, indicating a higher priority for these codons. Among them, the AUU codon for Leucine (Leu) had the highest frequency as indicated by an average RSCU value of 1.89. The remaining 31 analyzed codons showed relatively low bias, with RSCU values less than 1 (Fig. 2). The codons in the eight CP genomes of Stachys species exhibited a preference for A/T bases and A/T-ending codons, as evidenced by the GC and GC3s content being below 0.5. Besides, analysis of codon adaptation index values and an effective number of codon values revealed a minor tendency toward biased codon usage in the Stachys species. The frequency of optimal codons was relatively low. Furthermore, the hydrophobicity and aromaticity of the protein, as measured by Gravy and Aromo respectively, had a minimal effect on the observed bias in codon usage.

Identification of repeat elements

Among the 8 analyzed CP genomes, a total of 512 long repeats were detected, comprising 135 forward repeats, 198 tandem repeats, 17 reverse repeats, and 162 palindromic repeats. The analysis revealed a varying number of repeated sequences in the 11Stachys CP genomes, ranging from 58 in Stachys chamissonis to 75 in Stachys byzantine. Tandem repeats were the most common type among these repeats, accounting for 33.3–45.43% of the repeats and varying from 22 (Stachys affinis, S. chamissonis, and Stachys palustris) to 34 (S. byzantine, S. palustris), followed by palindromic repeats (31.34–32.78%), ranging from 19 (S. chamissonis) to 22 (Stachys sylvatica), and then by for-ward repeats (22.67–28.98%)ranging from 15 (S. chamissonis) to 20 (S. sylvatica) (Fig. 3A). Meanwhile, the length of long repeats differed across the 8 sequenced CP genomes, with the majority falling within the 30–49 bp range (Fig. 3B–D).

The highest number of SSRs was identified in S. chamissonis (51), followed by S. coccinea (48) (Fig. 4A), while the smallest number of SSRs, 152, was identified in S. palustris. The most frequently observed SSR type was mononucleotide repeats, ranging from 20 to 31. All species exhibited Mono-, di-, tri-, and tetra-nucleotide repeats, while penta-nucleotide repeats were only observed in S. coccinea, and hexa-nucleotide repeats were present in S. chamissonis, S. palustris, and S. sylvatica species. Furthermore, as shown in Fig. 4B, the SSR distribution was mostly located in the LSC region (51.28–68.75%), followed by the SSC (13.36–28.21%) and IR regions (15.68–20.51%).

IR contraction and expansion in the Stachys CP genomes

The genotypes of the IR-LSC and IR-SSC boundaries were almost identical, and the lengths of the IRs in the 8 Stachys cp genomes were relatively conserved (25,250–25,655 bp) (Fig. 5), involving no obvious amplification or contraction events. At the LSC-IRB boundary, a fragment of rps19 gene was detected, with no difference in the 8 CP genomes. At the IRB/SSC boundary, the pseudogene ycf1 was situated 5 bp from the left side of LSC-IRB, with the ndhF gene extending across the LSC region for 28–29 bp, overlapping with the ycf1 pseudogene. At the SSC/IRA boundary, the ycf1 gene was consistently found in all 8 cp genomes and spanned 1092–1093 bp across the IRA boundary. The trnH gene was located 1417–1418 bp from the SSC/IRA boundary within the IRA region. The rpl2 and trnH genes were positioned between the IRa-LSC boundary, with rpl2 located on the left side of the boundary for approximately 93–94 bp, and trnH on the right side, spanning 0–1 bp.

Comparative genomic analysis

The multiple genome alignment method of chloroplast genomes detected only one locally collinear block (LCB) among the 8 sequences (Supplementary Fig. 2). The types, quantities, and arrangement of all genes remained highly consistent within the genus. The chloroplast genomes of this genus were completely collinear, without any occurrence of rearrangement or recombination events, further indicating a high level of conservation. In order to elucidate the sequence differences among Stachys plants, the analysis was conducted with the S. affinis CP genome as reference in the mVista software (Fig. 6), and the results indicated no significant alterations such as large fragment inversions, duplications, or other structural changes of the 8 plastomes. The genomes exhibited a high level of collinearity, indicating a consistent evolutionary conservation at the genomic level. Besides, sequence differences were higher in the LSC and SSC regions compared to the IR region. Consistent with previous studies on angiosperm CP genomes, the coding regions exhibited a higher degree of conservation compared to the non-coding regions.

Subsequently, multiple sequence alignments of the complete CP genomes were conducted, and the nucleotide diversity within a 600bp window was then calculated for all 8 CP genomes, The observed values ranged from 0 to 0.07746 (Fig. 7), and the analysis showed the highest difference level of the SSC region compared to other regions (Average pi = 0.007222681). The LSC region had a range of PI values from 0 to 0.04554, with an average of 0.005641219, while the IR region had the lowest Pi values, ranging from 0 to 0.02134, with an average of 0.001354375, indicating the IR region as the most conserved region, which was consistent with the results shown in Fig. 6. In addition, 5 hotspot regions (Pi > 0.02, average = 0.037), including trnH-GUG-psbA, trnK-UUU-rps16, trnG-UCC-trnR-UCU, trnN-GUU-trnR-ACG, and rps12-trnV-GAC, were identified. These sequences could serve as potential markers for further phylogenetic reconstruction and species identification in Stachys species.

Phylogenetic analysis

In order to ascertain the evolutionary position of Stachys within the Lamioideae subfamily, the complete CP genome sequences of 32 other sequenced CP genomes were aligned by multiple sequence alignments. Phylogenetic reconstruction was carried out using the ML (maximum likelihood) method and BI (Bayesian inference) analysis based on 40 complete CP genomes of Lamiaceae (Fig. 8 and Supplementary Fig. 3). Both methods produced nearly identical tree topologies with high support values of Bayesian posterior probability (PP) and maximum likelihood bootstrap support (BS) in each branch. Almost all nodes on the phylogenetic tree received strong support (PP/BS = 1/100), even though some clades represented a limited number of species. In the phylogenetic tree, Pogostemoneae was identified as the earliest diverging branch, followed by Gomphostemmatae, Colquhounieae, Synandreae, Betoniceae, Galeopseae, Stachydae, Aparlomideae, Phlomideae, Leonureae, Marrubieae, Leucadeae, and Lamieae. All Stachys samples were nested within Stachydeae, with S. geobombycis being closely related to S. japonica and S. affinis. S. byzantina from Western Asia was located in the basal clade of Stachydeae, while species S. chamissonis and S. coccinea from North America were in the basal clade of other Stachydeae species. Besides, the remaining Stachys species from East Asia were grouped together, forming a well supported branch, which was consistent with previous reports^7,29.

Discussion

Structure characteristics of the CP genome in the Stachys species

The chloroplast (CP) genome of plants is a valuable resource for studying intra and inter species evolution and developing molecular markers¹⁸. In this particular investigation, CP genome sequencing and annotation of S. geobombycis officinale were conducted, and its features were compared with those of 7 other Stachys species. The findings demonstrate that the Stachys CP genome, like other angiosperms, consisted of a circular double stranded DNA molecule with a conserved quadripartite structure^38,43. This structure included a large single-copy region (LSC), a small single copy region (SSC), and two inverted repeat regions (IRs). Notably, the Stachys CP genome exhibited significant conservation and similarity to previously reported plastomes within the Lamiaceae family^44,45.In terms of size, the Stachys CP genome ranged from 14,523 to 150,599 bp, displaying a difference of 1077 bp across different genomes, which indicated the relatively stable and conserved nature of the Stachys CP genome. The major variation in genome size could be attributed to the varying length of the LSC (558 bp), suggesting that changes in LSC length primarily drive the variation in genome length. Overall, the GC content of the 8 Stachys samples ranged from 38.36 to 38.53%. Generally, higher GC content contributed to the stability and complexity of genome sequences⁴⁶. Interestingly, the present study revealed a lower GC content in the LSC and SSC regions of the CP genome than in the IR region, which was a characteristic feature observed in angiosperms. This discrepancy was primarily attributed to the presence of four high GC content rRNA genes in the IR region⁴⁷. Furthermore, the gene composition, protein coding genes, tRNA, and rRNA in the Stachys CP genome exhibited high similarity. This conservation of plastomes was consistent with previous reports in various angiosperms, such as Malvaceae and Araceae, where identical gene content and order were observed^48,49,50. Similarly, a high degree of conservation in the CP genomes of Stachys species was hereby. Various molecular mechanisms, including maternal inheritance, rarity of plastid fusion, and active repair mechanisms, were found to contribute to the maintenance of CP genome conservation in Stachys⁵¹, resulting in the typically conservative nature of Stachys plastomes.

Analysis of repetitive sequences and codon bias

In terms of evolutionary rate and pattern, the synonymous codon usage bias (SCUB) in plant CP genomes differs from that of mitochondrial and nuclear genomes. In addition to being influenced by DNA sequence mutation pressure and natural selection affecting gene translation⁵², SCUB in plant chloroplast and mitochondrial genomes is also associated with other factors such as tRNA abundance, strand specific mutation bias, gene expression levels, and gene length^{53,54,55,56,57}. These influencing factors have been widely used to explain variation in codon usage among species and within genomes⁵⁸. In this study, specific codons were found to be more frequently used than other synonymous codons in the nucleotide sequences of protein coding genes in the Stachys species CP genomes, consistent with previous reports. Besides, no distinct species specific features were observed in codon usage levels among the 8 Stachys species. Relative Synonymous Codon Usage (RSCU) is often used to reflect codon bias. The GC content of the CP genomes is a result of the balance between mutation pressure and adaptation, which is one of the most common influences in the formation of codon usage bias⁵⁹. Although synonymous mutations of the third base of a codon do not change the amino acid type, it is still considered an important feature in determining the amino acid type. Therefore, GC3 is often used as a significant indicator of codon preference⁶⁰. Herein, the Stachys species CP genomes, all optimal synonymous codons (RSCU>1), except UUG and UCC, had A or U at the end, leading to a preference for A/T bases throughout the genome. Chloroplast transformation has made significant strides in various domains, including crop salt tolerance, drought resistance, and herbicide resistance in recent years. In conducting chloroplast gene breeding research, it is crucial to consider the stability of chloroplast genes and their genetic diversity⁵⁴. RSCU can affect gene expression by regulating the accuracy and efficiency of translation, with stronger RSCU values associated with higher gene^61,62,63. In chloroplast gene expression vector design, optimizing codons according to their bias can boost the expression levels of inserted genes in the CP genome. Additionally, known codon usage patterns can help predict the expression and function of unknown genes⁶⁴. The identified codons with RSCU>1 can serve as efficient indicators for detecting the expression levels of hypothetical genes or open reading frames. They are also valuable for designing primers and introducing point mutations in agricultural breeding research.

SSRs can be found in various regions of both prokaryotic and eukaryotic organisms, including both coding and noncoding regions⁶⁵. Microsatellite sequences are favored markers in plant genetics and breeding research due to their variability, simplicity in utilization, detectability, and repeatability. They have been extensively employed in studies related to biodiversity and population genetic⁶⁶. In this study, a total of 358 SSRs were detected in the CP genomes of 8 Stachys species, and the majority of SSRs were found in the LSC region, which might be correlated with the length of the LSC region. Additionally, they were predominantly composed of A/T bases, consistent with the AT richness observed in the entire CP genome¹⁹. Moreover, repeat sequences are essential for studying insertions, deletions, and replacements, and they are highly abundant in the chloroplasts of Stachys members⁶⁷. In this study, a total of 512 long repeat sequences, encompassing 4 types, were identified. Overall, the SSRs and long repeat sequences identified in this study provided useful information for further research on molecular marker development, population genetics, evolution, breeding, species identification, and conservation studies in the Stachys species⁶⁸.

Comparative genomics and highly variable regions analysis

In angiosperms, it is common for the Inverted Repeat (IR) regions in the CP genome to undergo expansion and contraction. This phenomenon often results in size variations, gene duplications or deletions, and the generation of pseudogenes⁶⁹. Abnormal expansions of the IR region, transferring a large number of genes from the Small Single Copy (SSC) region to the IR region, have been observed in some taxa, such as Paphiopedilum ⁷⁰, Bidens⁷¹, and Pilea⁵⁶. In this study, significant similarities were found in the expansion and contraction of the IR regions in Stachys, with highly consistent distribution and positioning of genotypes in these regions. The IR boundaries were relatively stable, consistent with previous reports in Lamiaceae⁴⁹. The movement of the IR/SSC boundary in Stachys always leads to an increase in the length of the IR region. Overall, the conservation of the IR region in Stachys may contribute to its overall length and structural stability.

DHighly variable regions with informative sites served as DNA barcodes, enabling the construction of phylogenetic trees and identification of closely related species, and expediting the discovery of previously unidentified organisms in nature⁷². Due to the insufficiency of classical DNA barcodes (rbcL, matK, psbA-trnH, and ITS2) for species identification and phylogenetics in Stachys, additional highly variable regions at the genus level as potential markers of Stachys should be explored for future identification studies. Based on mVISTA and nucleotide diversity analysis, 5 highly variable regions, including trnH-GUG-psbA, trnK-UUU-rps16, trnG-UCC-trnR-UCU, trnN-GUU-trnR-ACG, and rps12-trnV-GAC, were hereby identified, had been validated as reliable markers in previous studies. For example, Fan et al.²⁰ found that the amplified fragment of trnH-GUG-psbA could effectively distinguish plants within the Papaver genus. Yang et al.⁷³ used three pairs of primers amplifying variable DNA sequences located in the psbA-trnK, psbB-psbH, and trnR-trnN regions, and their analysis using Maximum Parsimony showed consistent classification and phylogenetic results, making them useful tools for plant species identification and phylogenetic research. Overall, these candidate barcode regions could provide rich molecular marker development information.

Phylogenetic analysis

Powerful molecular phylogenetics is the basis for establishing stable classifications and provides a solid framework for understanding diverse patterns, historical Biogeography, and trait evolution⁷⁴. The Lamiaceae family, ranked as the sixth largest among angiosperms, serves as a significant reservoir of essential oils, timber, ornamental plants, culinary herbs, and medicinal herbs. This diversity makes it a crucial subject of study in fields such as ecology, ethnobotany, and floristics⁷⁵. In recent studies, Stachys has been placed in the tribe Stachydeae within the subfamily Lamioideae, and within Stachydeae, 12 genera and approximately 470 species have been recognized⁷. Stachydeae is the largest and most challenging tribe in the subfamily Lamioideae in terms of classification¹⁶, which has also been the focus of some previous molecular phylogenetic studies^16,76,77. These phylogenetic studies are mostly constructed based on gene fragments of Stachydeae or multiple gene fragments of a species to build phylogenetic trees. However, due to the limited number of informative sites, they fail to fully explain the phylogenetic relationships and systematic position of Stachydeae plants⁷⁸. Mounting evidence has indicated the suitability of CP genome sequences for inferring phylogenetic relationships across various taxonomic levels⁷⁹. Using complete CP genomes, many deep level phylogenetic questions have been resolved, such as the determination of the earliest diverging lineages of angiosperms^68,80,81 or the phylogenetic relationships among Ferula species⁸². This approach can better elucidate the complex evolutionary relationships among angiosperms. At the same time, CP genome datasets can also address shallow level phylogenetic questions. In our study, we constructed a phylogenetic tree of 40 Lamioideae plants based on sequence data. The results showed well supported nodes in the phylogenetic tree, and Stachy species were not a monophyletic group. This outcome aligned with the overall findings of chloroplast genome-based phylogenetic studies in Lamiaceae²⁹. However, previous phylogenetic studies based on chloroplast genomes suggested S. sylvatica as a basal branch of the Stachydea system, while S. byzantina from West Asia was hereby found to be located at the basal branch of Stachydea. This result was consistent with the study by Xue et al., who utilized ITS + ETS + 5S-NTS for phylogenetic research on Stachys species³⁷. Furthermore, S. chamissonis and S. coccinea from North America clustered together with other Stenogyne, Phyllostegia, and Haplostachys plants, while the remaining Stachys plants from East Asia formed a clade. This suggested that geographical isolation might have a greater impact on the interspecific relationships within Stachys. Overall, the research results offer important implications for the assessment of genetic diversity and systematic phylogenetic studies of Stachys in the future. However, this study still failed to fully elucidate the relationships between genera. Additionally, the phylogenetic research was solely based on the chloroplast genome. In this case, the nuclear genome of plants should be further analyzed to comprehensively understand the phylogeny of Stachydeae and even Lamiaceae species, and future studies should also have more genera included. Nevertheless, the phylogenetic research still provides valuable resources for the classification, systematic phylogeny, and evolutionary history of Stachys.

Conclusions

The present research was primarily conducted to assemble the S. geobombycis genome and to reannotate the entire CP genome of Stachys species. Efforts were made to investigate the characteristics of the cp genome and explore the phylogenetic relationships among Stachys plants. The comparative analysis revealed conserved genome size, gene structure, and organization across the Stachys species. Furthermore, long repetitive sequences, SSRs, and regions with high variability were identified in the Stachys species. Overall, findings serve as a foundation for analyzing genetic diversity, developing mo-lecular markers, and addressing classification and identification challenges within this genus. Additionally, by examining the genetic relationships within Stachys species, this research offers comprehensive insights into phylogenetic connections, sheds light on its evolutionary history, and facilitates further research in this field.

Materials and methods

Sample collection and DNA extraction

Fresh leaves of S. geobombycis were collected from Guangning County, Guangdong Province, China (\(21^\circ 38^\prime 49.6^{\prime \prime }\) N, \(39^\circ 41^\prime 49.3^{\prime \prime }\) E). This plant is currently preserved at the Guangdong Crop Germplasm Resources Nursery (http://gdseedbank.cn/catalog/guild/). The germplasm resource number is 20224412265. Please contact yongjianluo@foxmail.com for free access. The corresponding GenBank accession number in NCBI is OR327475. Related CP genome sequences were retrieved from the NCBI database. Detailed information about the experimental materials is presented in Table 1. Total DNA was extracted from dried fresh leaves of the samples using a plant DNA extraction kit manufactured by Tiangen Biotech Co., Ltd, and the integrity of the extracted DNA was assessed through 1% agarose gel electrophoresis. Subsequently, the samples were sent to BGI Genomics for further analysis, where the purity and concentration of the total DNA were determined using the NanoDrop 2000 spectrophotometer by Thermo Scientific, USA.

Library construction and De novo Genome sequencing

MGISEQ-2000 sequencing platform was used to construct a library with an insertion fragment of 500 bp, and paired end sequencing was performed to obtain 150 bp sequences at both ends of each read. Following sequencing, the filtering software SOAPnuke v2.0 33⁸³ (https://github.com/ The Beijing Genomics Institute (BGI)-flexlab/SOAPnuke), developed by BGI, was employed for filtering with specific parameters: (1) Adapter trimming: Reads exhibit a 25% or higher match to an adapter sequence were entirely discarded; (2) Low quality filtering: Reads with bases having a quality value below 20 that account for 30% or more of the total read were eliminated; (3) N removal: Reads containing 1% or more N bases with respect to the entire read were removed; (4) Acquisition of clean reads. The resulting data were stored in FASTQ format for subsequent assembly and annotation

Assembly and annotation of the CP genome

The assembly of the CP genome was conducted using NOVOPlasty v2.7.2 software; the size of k-mers was 39⁸⁴. The gene annotation for the CP genome of S. geobombycis, and the downloaded complete CP genomes from NCBI were performed using the default parameters of the Plastid Genome Annotator (PGA) program⁴⁰. Additional manual refinements were carried out using Geneious v11.0.3⁴¹. Upon the completion of the annotation process, the data were submitted to the NCBI database (https://www.ncbi.nlm.nih.gov/genbank/), and the online tool OGDRAW-DRAW Organelle Genome Maps (https://chlorobox.mpimpgolm.mpg.de/OGDraw.html) was employed visualization of the chloroplast structure.

Analysis of contraction and expansion of IR boundaries in CP genomes

The synonymous codon usage in the CP genomes of the three mentioned plants was hereby compared using CodonW v1.42 software (http://codonw.sourceforge.net). The encoded genes were filtered based on the following criteria: (1) The gene’s start codon was ATG. (2) The gene length was \(\ge \) 300 bp. (3) Only one gene was selected for genes located in repetitive regions, and pseudogenes were excluded. Meanwhile, the Shuffle-LAGAN mode of the online tool mVISTA (https://genome.lbl.gov/vista/mvista/submit.shtml)⁸⁵ was utilized to perform a comparative analysis of the CP genomes of the 8 plants. MISA software⁸⁶ was used for SSR analysis of the CP genomes, involving parameters including mononucleotide SSRs (repeat unit of 10), dinucleotide SSRs (repeat unit of 6), trinucleotide SSRs (repeat unit of 5), and tetra, penta, and hexanucleotide SSRs (repeat unit of 4). Furthermore, REPuter (https://bibiserv.cebitec.unibielefeld.de/sessionTimeout.jsf)⁸⁷, an online software, was employed for long repetitive sequence analysis of the CP genomes of the 8 plants, and parameters including Hamming Distance of 3 and a minimum repeat unit of 30 base pairs.

Codon preference and repetitive sequence analysis of the CP genome

In this study, Geneious v11.0.3 software⁴¹ was used to determine the lengths of the IRa/IRb, LSC, and SSC regions, as well as the boundary genes, in the CP genomes of the Stachys species. To visualize and compare the IR boundaries in the CP genomes of 8 Stachys species, Adobe Illustrator software was employed for creating comparison maps. For the detection of intra species variations, the mVISTA software was utilized to compare the CP genomes within the Stachys species. Meanwhile, Mauve software 110 was applied for the analysis of the homology and collinearity of the CP genome sequences. For calculating nucleotide diversity values, DnaSP v6.0⁸⁸ specifically for the CP genomes of Stachys species was used. Finally, the MAFFT v7.487 software⁸⁹ was chosen to perform sequence alignments for all CP genome sequences.

Phylogenetic analysis

To investigate the placement of Stachys within the Lamioideae subfamily and explore relationships between different Stachys species, multiple alignments were conducted using complete CP genome sequences from 40 samples representing the Lamioideae subfamily. This comprehensive dataset included representatives from 18 different genera. Besides, the analysis involved the use of several outgroups, including Lancea hirsuta, Lancea tibetica, Rehmannia solanifolia, Triaenophora shennongjiaensis, Wightia speciosissima, and Mazus pumilus. The Phylogenetic trees of complete chloroplast genomes were constructed using the maximum likelihood (ML) method and the Bayesian inference (BI) method. ML analysis was conducted by IQ-TREE (version 2.1.3) in Phylosuite^90,91 software with a GTR + F + I substitution model and 1000 bootstrap replicates. The Bayesian inference (BI) tree was implemented in MrBayes in Phylosuite⁹¹ and ran for two million generations in total. Based on the Markov chain Monte Carlo (MCMC) algorithm, the best fitting GTR + F + I substitution model was determined with sampling after every 1000 generations, and the running was stopped once the value of the average standard deviation of split frequencies was less than 0.01 Finally, less than 25% of the aging samples were discarded and a consistent tree was constructed based on the remaining samples.

Statement of plant collection

We hereby declare that Stachys is not a plant species covered by the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. The botanical collection work involved in this research has obtained the necessary permits and approvals from relevant local institutions, and strict compliance with applicable laws and guidelines has been ensured. Moreover, we have minimized the impact on the environment and ecosystems during the collection process and made every effort to maintain the survival and reproductive capacity of the Stachys plants.

Data availability

The complete chloroplast genome of S. geobombycis generated in this study was submitted to the NCBI database (https://www.ncbi.nlm.nih.gov/) with GeneBank accession number OR327475. The assembled genome sequences and raw sequencing data are accessible in the NCBI database under the research accession PRJNA1076470 with the sample identification number SRR27966668.

References

Tomou, E.-M., Barda, C. & Skaltsa, H. Genus stachys: A review of traditional uses, phytochemistry and bioactivity. Medicines 7, 63. https://doi.org/10.3390/medicines7100063 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sadeghi, H. et al. Stachys pilifera benth: A review of its botany, phytochemistry, therapeutic potential, and toxicology. Evid. Based Complement. Altern. Med. 1–9, 2022. https://doi.org/10.1155/2022/7621599 (2022).
Article Google Scholar
Marin, P. D., Grayer, R. J., Grujic-Jovanovic, S., Kite, G. C. & Veitch, N. C. Glycosides of tricetin methyl ethers as chemosystematic markers in stachys subgenus betonica. Phytochemistry 65, 1247–1253. https://doi.org/10.1016/j.phytochem.2004.04.014 (2004).
Article CAS PubMed Google Scholar
Meremeti, A., Karioti, A., Skaltsa, H., Heilmann, J. & Sticher, O. Secondary metabolites from Stachys ionica. Biochem. Syst. Ecol. 32, 139–151. https://doi.org/10.1016/S0305-1978(03)00161-3 (2004).
Article CAS Google Scholar
Karioti, A., Bolognesi, L., Vincieri, F. F. & Bilia, A. R. Analysis of the constituents of aqueous preparations of stachys recta by hplc-dad and hplc-esi-ms. J. Pharm. Biomed. Anal. 53, 15–23. https://doi.org/10.1016/j.jpba.2010.03.002 (2010).
Article CAS PubMed Google Scholar
Mulligan, G. A. & Munro, D. B. Taxonomy of species of North American Stachys (Labiatae) found north of Mexico. J. Pharm. Biomed. Anal. 116, 35–51 (1989).
Google Scholar
Salmaki, Y. et al. Molecular phylogeny of tribe Stachydeae (Lamiaceae subfamily Lamioideae). Mol. Phylogenet. Evol. 69, 535–551. https://doi.org/10.1016/j.ympev.2013.07.024 (2013).
Article PubMed Google Scholar
Salmaki, Y., Jamzad, Z., Zarre, S. & Bräuchler, C. Pollen morphology of Stachys (Lamiaceae) in Iran and its systematic implication. Flora Morphol. Distrib. Funct. Ecol. Plants 203, 627–639. https://doi.org/10.1016/j.flora.2007.10.005 (2008).
Article Google Scholar
Berumen Cornejo, A. M., Lindqvist, C., Perez Molphe Balch, E. M. & Siqueiros Delgado, M. E. Phylogeny of the Stachys coccinea (Lamiaceae) complex based on molecular and morphological data. Syst. Bot. 42, 484–493. https://doi.org/10.1600/036364417X696113 (2017).
Article Google Scholar
Kochieva, E. Z., Ryzhova, N. N., Legkobit, M. P. & Khadeeva, N. V. Rapd and ISSR analyses of species and populations of the genus Stachys. Russ. J. Genet. 42, 723–727. https://doi.org/10.1134/S1022795406070039 (2006).
Article CAS Google Scholar
Kharazian, N., Rahimi, S. & Shiran, B. Genetic diversity and morphological variability of fifteen Stachys (Lamiaceae) species from Iran using morphological and issr molecular markers. Biologia 70, 438–452. https://doi.org/10.1515/biolog-2015-0051 (2015).
Article CAS Google Scholar
Dündar, Ekrem, Akçiçek, Ekrem, Dirmenci, Tuncay & Akgün, Şakir. Phylogenetic analysis of the genus Stachys sect. eriostomum (Lamiaceae) in turkey based on nuclear ribosomal its sequences. Turk. J. Bot. 37, 14–23. https://doi.org/10.3906/bot-1203-26 (2013).
Article CAS Google Scholar
Salmaki, Y. Investigation of the evolutionary trend of morphological characters of Stachys (Lamiaceae) in Iran based on nrITS sequences data. Nova Biologica Reperta 3, 327–340. https://doi.org/10.21859/acadpub.nbr.3.4.327 (2017).
Article Google Scholar
Roy, T., Cole, L. W., Chang, T.-H. & Lindqvist, C. Untangling reticulate evolutionary relationships among new world and Hawaiian mints (Stachydeae, Lamiaceae). Mol. Phylogenet. Evol. 89, 46–62. https://doi.org/10.1016/j.ympev.2015.03.023 (2015).
Article PubMed Google Scholar
Scheen, A.-C. et al. Molecular phylogenetics, character evolution, and suprageneric classification of Lamioideae (Lamiaceae)1. Ann. Mo. Bot. Gard. 97, 191–217. https://doi.org/10.3417/2007174 (2010).
Article Google Scholar
Roy, T., Chang, T.-H., Lan, T. & Lindqvist, C. Phylogeny and biogeography of new world Stachydeae (Lamiaceae) with emphasis on the origin and diversification of Hawaiian and south American taxa. Mol. Phylogenet. Evol. 69, 218–238. https://doi.org/10.1016/j.ympev.2013.05.023 (2013).
Article CAS PubMed Google Scholar
Güner, Ö. Stachys Istanbulensis (Lamiaceae) a new species from Turkey: Evidence from morphological, micromorphological and molecular analysis. Turk. J. Bot. 46, 624–635. https://doi.org/10.55730/1300-008X.2737 (2022).
Article Google Scholar
Dobrogojski, J., Adamiec, M. & Luciński, R. The chloroplast genome: A review. Acta Physiol. Plant. 42, 98. https://doi.org/10.1007/s11738-020-03089-x (2020).
Article CAS Google Scholar
Wu, L. et al. Comparative and phylogenetic analyses of the chloroplast genomes of species of Paeoniaceae. Sci. Rep. 11, 14643. https://doi.org/10.1038/s41598-021-94137-0 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Fan, Z.-F. & Ma, C.-L. Comparative chloroplast genome and phylogenetic analyses of Chinese polyspora. Sci. Rep. 12, 15984. https://doi.org/10.1038/s41598-022-16290-4 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Ravi, V., Khurana, J. P., Tyagi, A. K. & Khurana, P. An update on chloroplast genomes. Plant Syst. Evol. 271, 101–122. https://doi.org/10.1007/s00606-007-0608-0 (2007).
Article CAS Google Scholar
De Las Rivas, J., Lozano, J. J. & Ortiz, A. R. Comparative analysis of chloroplast genomes: Functional annotation, genome-based phylogeny, and deduced evolutionary patterns. Genome Res. 12, 567–583. https://doi.org/10.1101/gr.209402 (2002).
Article CAS Google Scholar
Hollingsworth, P. M., Graham, S. W. & Little, D. P. Choosing and using a plant DNA barcode. PLoS ONE 6, e19254. https://doi.org/10.1371/journal.pone.0019254 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Arroyo-Garcia, R. et al. Multiple origins of cultivated grapevine (Vitis vinifera L. ssp. sativa) based on chloroplast DNA polymorphisms. Mol. Ecol. 15, 3707–3714. https://doi.org/10.1111/j.1365-294X.2006.03049.x (2006).
Article CAS PubMed Google Scholar
Dong, W., Xu, C., Cheng, T., Lin, K. & Zhou, S. Sequencing angiosperm plastid genomes made easy: A complete set of universal primers and a case study on the phylogeny of Saxifragales. Genome Biol. Evol. 5, 989–997. https://doi.org/10.1093/gbe/evt063 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lu, R.-S., Li, P. & Qiu, Y.-X. The complete chloroplast genomes of three Cardiocrinum (Liliaceae) species: Comparative genomic and phylogenetic analyses. Front. Plant Sci. 7, 2054. https://doi.org/10.3389/fpls.2016.02054 (2016).
Article PubMed Google Scholar
Greiner, S. et al. The complete nucleotide sequences of the five genetically distinct plastid genomes of oenothera, subsection oenothera: I. Sequence evaluation and plastome evolution. Nucleic Acids Res. 36, 2366–2378. https://doi.org/10.1093/nar/gkn081 (2008).
Article CAS PubMed PubMed Central Google Scholar
Young, H. A., Lanzatella, C. L., Sarath, G. & Tobias, C. M. Chloroplast genome variation in upland and lowland switchgrass. PLoS ONE 6, e23980. https://doi.org/10.1371/journal.pone.0023980 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhao, F. et al. An updated tribal classification of Lamiaceae based on plastome phylogenomics. BMC Biol. 19, 1–27. https://doi.org/10.1186/s12915-020-00931-z (2021).
Article CAS Google Scholar
Freudenstein, J. V. & Chase, M. W. Phylogenetic relationships in Epidendroideae (Orchidaceae), one of the great flowering plant radiations: Progressive specialization and diversification. Ann. Bot. 115, 665–681. https://doi.org/10.1093/aob/mcu253 (2015).
Article CAS PubMed PubMed Central Google Scholar
Huang, W., Gao, X., Zhang, Y., Jin, C. & Wang, X. The complete chloroplast genome sequence of Stachys sieboldii Miquel. (Labiatae), a kind of vegetable crop and Chinese medicinal material plant. Mitochondrial DNA Part B 5, 1832–1833. https://doi.org/10.1080/23802359.2020.1751006 (2020).
Article Google Scholar
Wang, M., Zhao, Q., Jiang, D. & Wang, Z. Complete chloroplast genome sequence of Stachys japonica (Labiatae). Mitochondrial DNA Part B 5, 2675–2676. https://doi.org/10.1080/23802359.2020.1787263 (2020).
Article PubMed PubMed Central Google Scholar
Li, S. P., Yang, F. Q. & Tsim, K. W. K. Quality control of Cordyceps sinensis, a valued traditional Chinese medicine. J. Pharm. Biomed. Anal. 41, 1571–1584. https://doi.org/10.1016/j.jpba.2006.01.046 (2006).
Article CAS PubMed Google Scholar
Steiner, H., Hillemanns, H. G., Krüger, H. J., Rasenack, R. & Deichsel, W. [is there an optimal induction method for programmed labor?]. Arch. Gynecol. 228, 120–129. https://doi.org/10.1007/bf02427496 (1979).
Article CAS PubMed Google Scholar
Zhang, Z., Yang, Z., Tang, D. & Sun, M. Isolation and structure identification of chemical constituents from Stachys geobombyci. Chin. Tradit. Patent Med. 26, 1051–1053 (2004).
Google Scholar
Yan, M. et al. Two new glycosides from Stachys geobombycis CYWu. Nat. Prod. Res. 38, 78–84. https://doi.org/10.1080/14786419.2022.2103810 (2024).
Article CAS PubMed Google Scholar
Xue, L. et al. Molecular and morphological evidence for a new species of Stachys (Lamiaceae) from Hunan, China. PhytoKeys 236, 121–134. https://doi.org/10.3897/phytokeys.236.112741 (2023).
Article PubMed PubMed Central Google Scholar
Daniell, H., Lin, C.-S., Yu, M. & Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 17, 134. https://doi.org/10.1186/s13059-016-1004-2 (2016).
Article CAS PubMed PubMed Central Google Scholar
Khan, A. et al. Complete chloroplast genomes of medicinally important teucrium species and comparative analyses with related species from Lamiaceae. PeerJ 7, e7260. https://doi.org/10.7717/peerj.7260 (2019).
Article MathSciNet PubMed PubMed Central Google Scholar
Qu, X.-J., Moore, M. J., Li, D.-Z. & Yi, T.-S. Pga: A software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods 15, 1–12. https://doi.org/10.1186/s13007-019-0435-7 (2019).
Article Google Scholar
Ripma, L. A., Simpson, M. G. & Hasenstab-Lehman, K. Geneious! Simplified genome skimming methods for phylogenetic systematic studies: A case study in Oreocarya (Boraginaceae). Appl. Plant Sci. 2, apps.1400062. https://doi.org/10.3732/apps.1400062 (2014).
Article PubMed PubMed Central Google Scholar
Sharp, P. M. & Li, W.-H. The codon adaptation index—A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295. https://doi.org/10.1093/nar/15.3.1281 (1987).
Article ADS CAS PubMed PubMed Central Google Scholar
Isobe, S., Shirasawa, K. & Hirakawa, H. Challenges to genome sequence dissection in sweetpotato. Breed. Sci. 67, 35–40. https://doi.org/10.1270/jsbbs.16186 (2017).
Article CAS PubMed PubMed Central Google Scholar
Choi, H., Kang, W. S., Kim, J. S., Na, C.-S. & Kim, S. . De. novo assembly and species-specific marker development as a useful tool for the identification of Scutellaria L. species. Curr. Issues Mol. Biol. 43, 2177–2188. https://doi.org/10.3390/cimb43030152 (2021).
Article CAS PubMed PubMed Central Google Scholar
Zhao, W. et al. Complete chloroplast genome sequences of Phlomis fruticosa and Phlomoides strigosa and comparative analysis of the genus Phlomis sensu lato (Lamiaceae). Front. Plant Sci. 13, 1022273. https://doi.org/10.3389/fpls.2022.1022273 (2022).
Article PubMed PubMed Central Google Scholar
Luo, C. et al. Comparative chloroplast genome analysis of Impatiens species (Balsaminaceae) in the karst area of China: Insights into genome evolution and phylogenomic implications. BMC Genomics 22, 571. https://doi.org/10.1186/s12864-021-07807-8 (2021).
Article PubMed PubMed Central Google Scholar
Zhu, A., Guo, W., Gupta, S., Fan, W. & Mower, J. P. Evolutionary dynamics of the plastid inverted repeat: The effects of expansion, contraction, and loss on substitution rates. New Phytol. 209, 1747–1756. https://doi.org/10.1111/nph.13743 (2016).
Article CAS PubMed Google Scholar
Li, J., Ye, G. .-Y. ., Liu, H. .-L. . & Wang, Z. .-H. . Complete chloroplast genomes of three important species, Abelmoschus moschatus, A. manihot and A. sagittifolius: Genome structures, mutational hotspots, comparative and phylogenetic analysis in malvaceae. PLoS ONE 15, e0242591. https://doi.org/10.1371/journal.pone.0242591 (2020).
Article CAS PubMed PubMed Central Google Scholar
Li, B. et al. Complete chloroplast genome sequences of three aroideae species (Araceae): Lights into selective pressure, marker development and phylogenetic relationships. BMC Genomics 23, 218. https://doi.org/10.1186/s12864-022-08400-3 (2022).
Article CAS PubMed PubMed Central Google Scholar
Talkah, N. S. M., Wongso, S. & Othman, A. S. Complete chloroplast genome data for Cryptocoryne elliptica (Araceae) from Peninsular Malaysia. Data Brief 42, 108075. https://doi.org/10.1016/j.dib.2022.108075 (2022).
Article CAS PubMed PubMed Central Google Scholar
Li, W. et al. Interspecific chloroplast genome sequence diversity and genomic resources in diospyros. BMC Plant Biol. 18, 210. https://doi.org/10.1186/s12870-018-1421-3 (2018).
Article CAS PubMed PubMed Central Google Scholar
Liu, H. et al. Analysis of synonymous codon usage in Zea mays. Mol. Biol. Rep. 37, 677–684. https://doi.org/10.1007/s11033-009-9521-7 (2010).
Article CAS PubMed Google Scholar
Srivastava, S., Chanyal, S., Dubey, A., Tewari, A. & Taj, G. Patterns of codon usage bias in WRKY genes of Brassica rapa and Arabidopsis thaliana. J. Agric. Sci. 11, 76. https://doi.org/10.5539/JAS.V11N4P76 (2019).
Article Google Scholar
Li, G., Zhang, L. & Xue, P. Codon usage pattern and genetic diversity in chloroplast genomes of panicum species. Gene 15, 145866. https://doi.org/10.1016/j.gene.2021.145866 (2021).
Article CAS Google Scholar
Wang, P., Mao, Y., Su, Y. & Wang, J. Comparative analysis of transcriptomic data shows the effects of multiple evolutionary selection processes on codon usage in Marsupenaeus japonicus and Marsupenaeus pulchricaudatus. BMC Genomics 22, 781. https://doi.org/10.1186/s12864-021-08106-y (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, J. et al. Comparative plastid genomics of four Pilea (Urticaceae) species: Insight into interspecific plastid genome diversity in Pilea. BMC Plant Biol. 21, 25. https://doi.org/10.1186/s12870-020-02793-7 (2021).
Article CAS PubMed PubMed Central Google Scholar
Parvathy, S. T., Udayasuriyan, V. & Bhadana, V. Codon usage bias. Mol. Biol. Rep. 49, 539–565. https://doi.org/10.1007/s11033-021-06749-4 (2022).
Article CAS PubMed Google Scholar
Nie, X. et al. Comparative analysis of codon usage patterns in chloroplast genomes of the Asteraceae family. Plant Mol. Biol. Rep. 32, 828–840. https://doi.org/10.1007/s11105-013-0691-z (2014).
Article CAS Google Scholar
Duan, H. et al. Analysis of codon usage patterns of the chloroplast genome in Delphinium grandiflorum L. reveals a preference for at-ending codons as a result of major selection constraints. PeerJ 9, e10787. https://doi.org/10.7717/peerj.10787 (2021).
Article PubMed PubMed Central Google Scholar
Liu, Q. P., Feng, Y. & Xue, Q. Z. Analysis of factors shaping codon usage in the mitochondrion genome of Oryza sativa. Mitochondrion 4, 313–320. https://doi.org/10.1016/j.mito.2004.06.003 (2004).
Article CAS PubMed Google Scholar
Hershberg, R. & Petrov, D. A. Selection on codon bias. Annu. Rev. Genet. 42, 287–299. https://doi.org/10.1146/annurev.genet.42.110807.091442 (2008).
Article CAS PubMed Google Scholar
Xing, Z.-B., Cao, L., Zhou, M. & Xiu, L.-S. Analysis on codon usage of chloroplast genome of Eleutherococcus senticosus. China J. Chin. Mater. Med. 38, 661–665 (2013).
CAS Google Scholar
Zhou, Z. et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proceedings of the National Academy of Sciences vol. 113 E6117–E6125. https://doi.org/10.1073/pnas.1606724113 (2016).
Cui, G. et al. Complete chloroplast genome of Hordeum brevisubulatum: Genome organization, synonymous codon usage, phylogenetic relationships, and comparative structure analysis. PLoS ONE 16, e0261196. https://doi.org/10.1371/journal.pone.0261196 (2021).
Article CAS PubMed PubMed Central Google Scholar
Liu, S.-R., Li, W.-Y., Long, D., Hu, C.-G. & Zhang, J.-Z. Development and characterization of genomic and expressed ssrs in citrus by genome-wide analysis. PLoS ONE 8, e75149. https://doi.org/10.1371/journal.pone.0075149 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Zane, L., Bargelloni, L. & Patarnello, T. Strategies for microsatellite isolation: A review. Mol. Ecol. 11, 1–16. https://doi.org/10.1046/j.0962-1083.2001.01418.x (2002).
Article CAS PubMed Google Scholar
Cavalier-Smith, T. Chloroplast evolution: Secondary symbiogenesis and multiple losses. Curr. Biol. 12, R62-64. https://doi.org/10.1016/s0960-9822(01)00675-3 (2002).
Article CAS PubMed Google Scholar
Li, B. & Zheng, Y. Dynamic evolution and phylogenomic analysis of the chloroplast genome in Schisandraceae. Sci. Rep. 8, 9285. https://doi.org/10.1038/s41598-018-27453-7 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Plunkett, G. M. & Downie, S. R. Expansion and contraction of the chloroplast inverted repeat in Apiaceae subfamily Apioideae. Syst. Bot. 25, 648–667. https://doi.org/10.2307/2666726 (2000).
Article Google Scholar
Guo, Y.-Y., Yang, J.-X., Bai, M.-Z., Zhang, G.-Q. & Liu, Z.-J. The chloroplast genome evolution of Venus slipper (Paphiopedilum): IR expansion, SSC contraction, and highly rearranged SSC regions. BMC Plant Biol. 21, 248. https://doi.org/10.1186/s12870-021-03053-y (2021).
Article CAS PubMed PubMed Central Google Scholar
Zhang, D. et al. Analysis of the chloroplast genome and phylogenetic evolution of Bidens pilosa. BMC Genomics 24, 113. https://doi.org/10.1186/s12864-023-09195-7 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wu, X. et al. Comparative genomic and phylogenetic analysis of chloroplast genomes of hawthorn (Crataegus spp.) in southwest China. Front. Genet. 13, 900357. https://doi.org/10.3389/fgene.2022.900357 (2022).
Article CAS PubMed PubMed Central Google Scholar
Yang, Y. C., Kung, T. L., Hu, C. Y. & Lin, S. F. Development of primer pairs from diverse chloroplast genomes for use in plant phylogenetic research. Genet. Mol. Res. 14, 14857–14870. https://doi.org/10.4238/2015.November.18.51 (2015).
Article CAS PubMed Google Scholar
Oyston, J. W., Wilkinson, M., Ruta, M. & Wills, M. A. Molecular phylogenies map to biogeography better than morphological ones. Commun. Biol. 5, 521. https://doi.org/10.1038/s42003-022-03482-x (2022).
Article PubMed PubMed Central Google Scholar
Li, B. et al. A large-scale chloroplast phylogeny of the Lamiaceae sheds new light on its subfamilial classification. Sci. Rep. 6, 34343. https://doi.org/10.1038/srep34343 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Lindqvist, C. & Albert, V. A. Origin of the Hawaiian endemic mints within North American Stachys (Lamiaceae). Am. J. Bot. 89, 1709–1724. https://doi.org/10.3732/ajb.89.10.1709 (2002).
Article PubMed Google Scholar
Bendiksby, M., Thorbek, L., Scheen, A.-C., Lindqvist, C. & Ryding, O. An updated phylogeny and classification of Lamiaceae subfamily Lamioideae. Taxon 60, 471–484. https://doi.org/10.1002/tax.602015 (2011).
Article Google Scholar
Serrote, C. M. L., Reiniger, F. R. S., Silva, K. B., Rabaiolli, AMd. S. & Stefanel, C. M. Determining the polymorphism information content of a molecular marker. Gene 726, 144175. https://doi.org/10.1016/j.gene.2019.144175 (2020).
Article CAS PubMed Google Scholar
Androsiuk, P. et al. Evolutionary dynamics of the chloroplast genome sequences of six Colobanthus species. Sci. Rep. 10, 11522. https://doi.org/10.1038/s41598-020-68563-5 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Hansen, D. R. et al. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol. Phylogenet. Evol. 45, 547–563. https://doi.org/10.1016/j.ympev.2007.06.004 (2007).
Article CAS PubMed Google Scholar
Henriquez, C. L. et al. Molecular evolution of chloroplast genomes in Monsteroideae (Araceae). Planta 251, 72. https://doi.org/10.1007/s00425-020-03365-7 (2020).
Article CAS PubMed Google Scholar
Yang, L. et al. Analysis of complete chloroplast genome sequences and insight into the phylogenetic relationships of Ferula L. BMC Genomics 23, 643. https://doi.org/10.1186/s12864-022-08868-z (2022).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. et al. Soapnuke: A mapreduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7, 1–6. https://doi.org/10.1093/gigascience/gix120 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Dierckxsens, N., Mardulyn, P. & Smits, G. Novoplasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45, e18. https://doi.org/10.1093/nar/gkw955 (2017).
Article CAS PubMed Google Scholar
Walkty, A., Keynan, Y., Karlowsky, J., Dhaliwal, P. & Embil, J. Central nervous system blastomycosis diagnosed using the MVista® Blastomyces quantitative antigen enzyme immunoassay test on cerebrospinal fluid: A case report and review of the literature. Diagn. Microbiol. Infect. Dis. 90, 102–104. https://doi.org/10.1016/j.diagmicrobio.2017.10.015 (2018).
Article PubMed Google Scholar
Wang, X. et al. Comparative analysis of chloroplast genomes of 29 tomato germplasms: Genome structures, phylogenetic relationships, and adaptive evolution. Front. Plant Sci. 14, 1179009. https://doi.org/10.3389/fpls.2023.1179009 (2023).
Article PubMed PubMed Central Google Scholar
Kurtz, S. et al. Reputer: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642. https://doi.org/10.1093/nar/29.22.4633 (2001).
Article CAS PubMed PubMed Central Google Scholar
Rozas, J. et al. Dnasp 6: Dna sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302. https://doi.org/10.1093/molbev/msx248 (2017).
Article CAS PubMed Google Scholar
Katoh, K., Rozewicki, J. & Yamada, K. D. Mafft online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinform. 20, 1160–1166. https://doi.org/10.1093/bib/bbx108 (2019).
Article CAS PubMed Google Scholar
Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. Iq-tree: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. https://doi.org/10.1093/molbev/msu300 (2015).
Article CAS PubMed Google Scholar
Zhang, D. et al. Phylosuite: An integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol. Ecol. Resour. 20, 348–355. https://doi.org/10.1111/1755-0998.13096 (2020).
Article PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31860073, 81303169) and the Project of opening Fund for the Year 2022 of Hubei Key Laboratory for the Protection and Utilization of Biological Resources (PT012212).

Author information

Authors and Affiliations

Hubei Key Laboratory of Biologic Resources Protection and Utilization (Hubei Minzu University), Enshi, 445000, China
Ru Wang, Yongjian Luo & Zhijun Deng
Heilongjiang Bayi Agricultural University, Daqing, 163319, China
Zheng Lan
Central South University of Forestry and Technology, Key Laboratory of Forestry Biotechnology of Hunan Province, Changsha, 410000, China
Yongjian Luo

Authors

Ru Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zheng Lan
View author publications
You can also search for this author in PubMed Google Scholar
Yongjian Luo
View author publications
You can also search for this author in PubMed Google Scholar
Zhijun Deng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

R.W., and Z.D., designed the study, defined sampling, obtained samples, and obtained funding. R.W. and Y.L., annotated plastomes and performed comparative and phylogenetic analyses. R.W. assembled Illumina sequences. Z.L., and R.W., participated in the technical guidance and language revision of the paper. R.W., Z.L., and Z.D. interpreted the results and co-wrote the manuscript.

Corresponding author

Correspondence to Zhijun Deng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Figures.

Supplementary Table S1.

Supplementary Table S2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, R., Lan, Z., Luo, Y. et al. The complete Chloroplast genome of Stachys geobombycis and comparative analysis with related Stachys species. Sci Rep 14, 8523 (2024). https://doi.org/10.1038/s41598-024-59132-1

Download citation

Received: 11 October 2023
Accepted: 08 April 2024
Published: 12 April 2024
DOI: https://doi.org/10.1038/s41598-024-59132-1

Keywords

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Phylogenomics and the rise of the angiosperms

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Complexity of avian evolution revealed by family-level genomes

Introduction

Results

Characterization of the CP genome structure of Stachys species

Codon usage analysis

Identification of repeat elements

IR contraction and expansion in the Stachys CP genomes

Comparative genomic analysis

Phylogenetic analysis

Discussion

Structure characteristics of the CP genome in the Stachys species

Analysis of repetitive sequences and codon bias

Comparative genomics and highly variable regions analysis

Phylogenetic analysis

Conclusions

Materials and methods

Sample collection and DNA extraction

Library construction and De novo Genome sequencing

Assembly and annotation of the CP genome

Analysis of contraction and expansion of IR boundaries in CP genomes

Codon preference and repetitive sequence analysis of the CP genome

Phylogenetic analysis

Statement of plant collection

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Figures.

Supplementary Table S1.

Supplementary Table S2.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Comments

Search

Quick links