Introduction

Polygonatum Miller belongs to the tribe Polygonateae Benth. & Hook. f. of the family Asparagaceae1. The species in this genus are perennial herbs with horizontal, creeping, fleshy rhizomes and unbranched stems2. The genus comprises approximately 80 species worldwide (https://wcsp.science.kew.org/, accessed 30 March 2022). According to Chen and Tamura2, 39 species have been recorded in China, 20 of which are endemic. Polygonatum is widely distributed in the Northern Hemisphere, with its center of diversity in East Asia, especially in the Hengduan Mountains of southwest China and the eastern Himalayas3,4. The genus is highly valued for its medicinal properties: species such as Polygonatum kingianum and P. sibiricum are used in traditional Chinese medicine for tonifying Qi, nourishing Yin, strengthening the spleen, moistening the lung and benefiting the kidney5.

Phylogenetic relationships reconstructed using ribosomal ITS and plastid DNA sequences supported the monophyly of Polygonatum and its sister relationship to Heteropolygonatum M.N. Tamura & Ogisu6,7,8,9. The infrageneric classification of the genus has long attracted attention because of the wide phenotypic variation within and among species. Baker subdivided Polygonatum into three sections according to leaf arrangement: sect. Alternifolia with alternate leaves, sect. Oppositifolia with opposite leaves and sect. Verticillata with whorled leaves10. However, subsequent studies considered phyllotaxy in this genus to be unstable7. On the basis of morphological traits such as leaf arrangement, bract size and texture, length of the perianth tube, perianth shape, anther length and ovary shape, Tang et al. proposed eight series for the Polygonatum species distributed in China11. Based on karyological and micromorphological characters, Tamura subdivided Polygonatum into sect. Polygonatum and sect. Verticillata12. More recently, Meng and Nie reconstructed the phylogenetic relationships within the genus using four chloroplast (cp) regions, rbcL, trnK, trnC-petN and psbA-trnH, and proposed a new section, sect. Sibirica, on the basis of Tamura’s work7. As a result, Polygonatum was divided into sect. Polygonatum, sect. Verticillata and sect. Sibirica. This infrageneric classification is the most widely accepted and was supported by Floden’s research based on complete cp genomes of Polygonatum13,14.

The chloroplast is an organelle found in green plants that is responsible for photosynthesis. Its genome is separate from the nuclear and mitochondrial genomes and is mostly maternally inherited in angiosperms. Compared with the nuclear and mitochondrial genomes, plastomes are small, less prone to recombination, have low nucleotide substitution rates and are generally more conserved in gene structure and organization, and can therefore provide unique genetic information15,16. In most higher plants, the cp genome has a typical quadripartite structure comprising a small single-copy (SSC) region, a large single-copy (LSC) region and two inverted repeats (IRs)17. Most plant cp genomes examined range in size from 120 to 160 kb, and this variation is mainly related to expansion, contraction or even loss of the IRs15,18,19. The cp genome encodes about 120–130 genes20, which can be classified into three groups: genes involved in chloroplast gene expression, genes related to photosynthesis, and genes of unclear function21. The rate of molecular evolution differs noticeably between coding and non-coding regions of chloroplast genomes, which makes them suitable for systematic studies at different taxonomic levels22.

Benefiting from advances in next-generation sequencing technologies, cp genomes can now be obtained more efficiently and economically. The National Center for Biotechnology Information (NCBI) organelle genome database currently holds about 40,000 published plant cp genomes (accessed: 2023/1/21). Among angiosperms, many cp genomes have been successfully used to address phylogenetic relationships and species identification at different taxonomic levels23,24,25,26,27. About 156 complete cp genome sequences (ca. 40 species) of Polygonatum have been deposited in NCBI (accessed: 2022/11/07). However, previous studies have mainly focused on the size and gene content of the plastid genome, and comparative genomic analyses remain insufficient13,28. Although chloroplast gene fragments and complete cp genomes of Polygonatum species have recently been used for phylogenetic analysis7,13,28, complete chloroplast genome data have not yet been published for some species, so their phylogenetic placement is not well understood.

In this study, we report the first complete chloroplast genome of Polygonatum campanulatum, together with the complete plastome sequences of P. franchetii, P. cyrtonema1, P. filipes1, P. zanlanscianense1 and P. sibiricum1, and compare them with those of three related species, i.e., P. kingianum (MW373517), Heteropolygonatum alternicirrhosum (MZ150832) and H. ginfushanicum (MW363694). P. campanulatum is a critically endangered species discovered by Professor Guangwan Hu in Yunnan Province in 2011. No molecular information about this species has been reported before; such data can provide essential information for its conservation strategies and restoration practices. In the present study, plastome sequences of nine species were selected for comparative analysis, including two species of Heteropolygonatum and seven species of Polygonatum, covering the three sections of Polygonatum as well as its major branches. Eleven plastomes of Polygonatum species have not been verified by NCBI (Table S1). Manual checking of these 11 unverified plastomes showed that the two IR regions differed in length, and that this discrepancy occurred mainly in non-coding regions. Therefore, the unverified plastomes were used only to reconstruct phylogenetic relationships and collect general information, and not for in-depth comparative analysis of the cp genome. A total of 56 published cp genome sequences obtained from the NCBI database (51 from Polygonatum, 4 from Heteropolygonatum, and Maianthemum henryi as the outgroup) were used to reconstruct the phylogenetic tree. The aims of this study were to (1) conduct a comprehensive analysis of the chloroplast genomes of the six Polygonatum species and their relatives; (2) identify divergence hotspot regions in the cp genomes of Polygonatum; and (3) infer the phylogenetic relationships of Polygonatum species and determine the taxonomic status of P. campanulatum, P. franchetii, P. cyrtonema, P. filipes, P. zanlanscianense and P. sibiricum based on the cp genome.

Materials and methods

Sample collection, total DNA extraction and sequencing

The six newly sequenced Polygonatum species (Polygonatum campanulatum, P. filipes, P. franchetii, P. zanlanscianense, P. cyrtonema and P. sibiricum) were collected by Guangwan Hu in China between 2019 and 2021. Detailed field collection information is given in Table 1. The collected specimens were identified and verified by Professor Guangwan Hu of Wuhan Botanical Garden, Chinese Academy of Sciences. Voucher specimens were deposited at the Herbarium of Wuhan Botanical Garden, CAS (HIB) (China), with voucher specimen numbers listed in Table 1. Total genomic DNA was extracted from silica gel-dried leaves using a modified cetyltrimethylammonium bromide (CTAB) method and then sequenced on the Illumina HiSeq X Ten platform with 150 bp paired-end reads (PE150) at Novogene Co., Ltd. (Beijing, China).

Table 1 Specimen collection information of the six Polygonatum samples.

Assembly and annotation of chloroplast genome

Chloroplast genomes were assembled using GetOrganelle v1.7.5 (ref. 29) with default parameters. Genes were annotated with PGA (Plastid Genome Annotator)30, using Amborella trichopoda as the reference31,32. To ensure the reliability of the data used for subsequent analyses, all chloroplast genomes downloaded from NCBI were re-annotated with PGA. The annotation results, including the positions of start and stop codons and the boundaries of the IR regions, were manually checked and adjusted in Geneious v10.2.3 (ref. 33). Annotated chloroplast genome sequences of the six species were submitted to GenBank at NCBI (Table S1). The circular chloroplast genome map was drawn online with OGDRAW34.

Comparative analysis of the whole chloroplast genome

Geneious v10.2.3 (ref. 33) was used to determine the length and guanine-cytosine (GC) content of the whole chloroplast genome and of the LSC, SSC and IR regions, together with the number of genes and gene categories. Multiple genome alignment was performed with MAFFT35. Sequence divergence among the chloroplast genomes was analyzed and visualized with mVISTA36 in Shuffle-LAGAN mode, with the annotation of Polygonatum campanulatum as the reference. To detect contraction or expansion at the boundaries, the SC/IR junctions of the chloroplast genomes were compared using IRscope37. Mauve was used with default settings to detect cp genome rearrangements38, after one of the IR regions had been removed from every sequence.
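
To make the GC-content summary concrete, the following minimal Python sketch computes the GC percentage of a whole sequence and of named regions. It is an illustration only and not the Geneious procedure used in this study; the function names, the toy sequence and the region coordinates are all hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return GC content (%) of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


def region_gc(seq: str, regions: dict) -> dict:
    """GC content per region; `regions` maps name -> (start, end), 0-based, end-exclusive."""
    return {name: gc_content(seq[start:end]) for name, (start, end) in regions.items()}


# Toy example with hypothetical coordinates (not real plastome data).
toy = "ATATATATGCGCGCGCATATGCGC"
toy_regions = {"LSC": (0, 8), "IRb": (8, 16), "SSC": (16, 20), "IRa": (20, 24)}
toy_gc = region_gc(toy, toy_regions)
```

For a real plastome, the region coordinates would come from the annotated LSC/IR/SSC boundaries rather than from hand-written tuples.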

Codon usage and repeat sequence analysis

Relative synonymous codon usage (RSCU) values were calculated using MEGA v7.0 (ref. 39). RSCU is defined as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally. Values greater than 1.0 indicate that a codon is used more frequently than expected, whereas values below 1.0 indicate that it is used less frequently40.
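
The RSCU definition above can be sketched in a few lines of Python. This is an illustrative implementation under the standard codon assignments, not the MEGA routine used in this study; the function name `rscu` and the toy sequence are hypothetical, and codons are handled in DNA notation (T rather than U).

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """RSCU per codon: observed count divided by the mean count of its synonymous family."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Read complete in-frame codons only; skip codons with ambiguous bases.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy CDS (hypothetical): ATG TTA TTA TTG TGG
toy_rscu = rscu(["ATGTTATTATTGTGG"])
```

Because methionine and tryptophan each have a single codon, their RSCU is always 1 under this definition, whichever sequences are supplied.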

Long dispersed repeats were identified using REPuter41 with a Hamming distance of 3 and a minimum repeat size of 30 bp. Simple sequence repeats (SSRs) were identified using the MicroSatellite identification tool (MISA)42, with the minimum number of repeat units set to 10, 5, 4, 3, 3 and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs, respectively.
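
The SSR thresholds listed above can be illustrated with a simplified regular-expression scan. This sketch is not MISA and may differ from it in edge cases (for example, compound or overlapping repeats); the function name `find_ssrs` and the toy sequence are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, n_units) tuples, 0-based and end-exclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            n_units = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, n_units))
    return sorted(hits)


# Toy sequence (hypothetical): an A run, an AT run and an AAG run.
toy_seq = "G" + "A" * 12 + "C" + "AT" * 6 + "T" + "AAG" * 4 + "C"
toy_ssrs = find_ssrs(toy_seq)
```

Each hit reports the motif and the number of repeat units, which is sufficient to tabulate SSR classes per genome in the manner of Tables S8 and S9.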

Nucleotide diversity analysis and selective pressure

DnaSP43 was used to calculate nucleotide diversity (Pi) with a window length of 600 bp and a step size of 200 bp. Because DnaSP v6 cannot recognize degenerate bases such as M, K and Y, these letters were replaced with dashes. The figure was generated in Excel and refined in Adobe Illustrator.
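
As a rough illustration of the sliding-window calculation, the sketch below computes nucleotide diversity as the mean proportion of pairwise differences within each window of an alignment. It is not DnaSP: as an assumption of this sketch, sites with gaps or ambiguous bases are skipped per sequence pair, which may differ from how DnaSP treats such sites. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def pairwise_pi(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    valid = set("ACGT")
    per_pair = []
    for a, b in combinations(seqs, 2):
        compared = mismatched = 0
        for x, y in zip(a, b):
            # Only sites with an unambiguous base in both sequences are compared.
            if x in valid and y in valid:
                compared += 1
                mismatched += x != y
        if compared:
            per_pair.append(mismatched / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) for each window along the alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window].upper() for s in alignment]
        results.append((start + window // 2, pairwise_pi(chunk)))
    return results


# Toy alignment (hypothetical), scanned with a small window for readability.
toy_alignment = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
toy_pi = sliding_pi(toy_alignment, window=4, step=2)
```

With the defaults of `window=600` and `step=200`, the function mirrors the window settings stated above; plotting Pi against the midpoints gives a profile of the kind shown in Fig. 8.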

To identify positively selected sites in the coding sequences (CDS) of the cp genome, dN/dS values were calculated using EasyCodeML v1.12 (ref. 44). Each single-copy CDS was extracted from the complete chloroplast genomes using Geneious v10.2.3 (ref. 33) and aligned under the codon model, and the alignments were then concatenated into one matrix. The input tree was an ML tree reconstructed with IQ-TREE45. Four site-model comparisons (M0 vs. M3, M1a vs. M2a, M7 vs. M8, and M8a vs. M8) were evaluated with likelihood ratio tests (LRTs). Naive Empirical Bayes (NEB) and Bayes Empirical Bayes (BEB)46 analyses were conducted under the M8 model to identify positively selected sites and the genes containing them.
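
The likelihood ratio test used to compare nested site models can be sketched as follows. The test statistic is twice the difference in log-likelihood between the alternative and null models, compared against a chi-square distribution. The degrees of freedom shown in the comments are the conventional choices for these comparisons and are an assumption here, since they are not stated in the text; the log-likelihood values are invented for illustration, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Chi-square survival function for df = 1 or even df (closed forms only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    raise ValueError("this sketch supports only df = 1 or even df")


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Conventional degrees of freedom (assumed, not taken from the text):
# M0 vs. M3 (three site classes): 4; M1a vs. M2a: 2; M7 vs. M8: 2; M8a vs. M8: 1.
# Hypothetical log-likelihoods for an M7 vs. M8 comparison.
stat, p_value = lrt(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
```

A small p-value indicates that the model allowing a class of sites with dN/dS > 1 fits significantly better than the null model, after which the BEB posterior probabilities are inspected to locate the candidate sites.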

Phylogenetic analysis

The phylogenetic analysis was based on the complete chloroplast genomes of 57 Polygonatum and 4 Heteropolygonatum samples, with Maianthemum henryi as the outgroup. All chloroplast genomes were obtained from GenBank (Table S1), except for those of Polygonatum campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1 and P. sibiricum1. The full matrix was aligned using MAFFT35. ModelFinder47 was used to select the best-fit model according to the Bayesian information criterion (BIC). The maximum likelihood (ML) tree was reconstructed using IQ-TREE45 under the GTR+I+G model with 5000 ultrafast bootstrap replicates48. Bayesian inference (BI) was conducted using MrBayes v3.2.6 (ref. 49) under the GTR+F+I+G4 model. Two independent Markov chain Monte Carlo (MCMC) runs were performed for 1,000,000 generations, with trees sampled every 100 generations and the initial 25% of samples discarded as burn-in. The two output trees were visualized and edited in FigTree v1.4 (http://github.com/rambaut/figtree/).

Ethical approval and consent to participate

The authors have complied with the relevant institutional, national and international guidelines in collecting biological materials for the study. The study contributes to facilitating future studies in population genetics and species identification.

Results

Chloroplast genome structure and characteristics analyses

The complete chloroplast genomes of the six newly sequenced Polygonatum species were closed circular molecules with the typical quadripartite structure (Fig. 1). The length of the 57 Polygonatum cp genomes ranged from 154,564 bp (P. multiflorum) to 156,028 bp (P. stenophyllum), while that of the 4 Heteropolygonatum cp genomes ranged from 155,436 bp (H. pendulum) to 155,944 bp (H. alternicirrhosum) (Table 2). Each plastome consisted of a large single-copy (LSC) region, a small single-copy (SSC) region and a pair of inverted repeats (IRa and IRb) that separated the LSC and SSC regions (Fig. 1). The LSC regions of the Polygonatum species ranged from 83,486 bp (P. odoratum) to 94,843 bp (P. sibiricum3), while the SSC regions varied from 18,210 bp (P. cyrtonema6) to 18,570 bp (P. kingianum2). The sizes of the IR regions ranged from 42,290 bp (P. sibiricum3) to 52,830 bp (P. cirrhifolium, P. curvistylum, P. hookeri, P. prattii, P. verticillatum3, P. zanlanscianense1, P. zanlanscianense2, P. zanlanscianense3) (Table 2). The total guanine-cytosine (GC) content of the plastomes ranged from 37.6 to 37.8%. GC content was unevenly distributed among the regions of the cp genomes in both Polygonatum and Heteropolygonatum: the SSC regions had the lowest GC content (31.4–31.7%), followed by the LSC regions (35.6–36.1%), whereas the IRs had the highest (42.9–43%) (Table 2).

Figure 1

Gene map of the chloroplast genomes of the Polygonatum species. Genes inside and outside the circle are transcribed counter-clockwise and clockwise, respectively. The dark gray and light gray areas inside the inner circle indicate GC content and AT content, respectively. The LSC (large single-copy), SSC (small single-copy) and inverted repeat (IRa, IRb) regions are indicated inside the circle.

Table 2 General information and comparison of the 57 cp genomes of Polygonatum and 4 cp genomes of Heteropolygonatum.

A total of 131–132 genes (113 unique genes) were detected, in the same order, in the complete cp genomes of the 57 Polygonatum samples. One copy of rps19 was pseudogenized in P. stewartianum, P. sibiricum1 and P. sibiricum2, and both ycf1 genes were pseudogenized in H. ogisui. The genomes included 87 protein-coding genes (PCGs), 38 transfer RNA (tRNA) genes and 8 ribosomal RNA (rRNA) genes (Table 2). A total of 19 genes, comprising 7 PCGs (rps19, rpl2, rpl23, ycf2, ndhB, rps7, rps12), 8 tRNA genes (trnN-GUU, trnR-ACG, trnA-UGC, trnI-GAU, trnV-GAC, trnL-CAA, trnI-CAU, trnH-GUG) and 4 rRNA genes (rrn5, rrn4.5, rrn23, rrn16), were duplicated in the inverted repeats. In addition, 18 genes (trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, rps12, rps16, rpl2, rpl16, rpoC1, petB, petD, atpF, ndhA, ndhB, clpP, ycf3), including six tRNA genes, contained at least one intron, of which clpP and ycf3 contained two introns. Notably, rps12 was a trans-spliced gene, with the 5' exon situated in the LSC region and the two copies of the 3' exon and intron located in the IRs. The longest intron, 2,568–2,586 bp in length, was found in trnK-UUU and contained the matK gene (Table S2). All functional genes can be divided into three categories, i.e., self-replication genes, photosynthesis genes and other genes (Table 3).

Table 3 The annotated genes in the chloroplast genomes of Polygonatum.

Relative synonymous codon usage analysis

Codon usage is closely related to genome-wide protein and mRNA levels and is therefore an essential feature of gene expression. The same codon is used at different frequencies in different organisms. The codon usage frequencies of Polygonatum campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1, P. kingianum2, Heteropolygonatum alternicirrhosum and H. ginfushanicum were computed from the protein-coding genes of the complete chloroplast genomes. The total number of codons in these nine species varied from 26,453 (P. kingianum2) to 26,651 (P. zanlanscianense1). The most abundant amino acid (AA) was leucine (Leu), with proportions of 10.2–10.3%, followed by serine (Ser) at 7.8–7.9% (Table S3). In contrast, cysteine (Cys) had the fewest codons (306–309) in all nine species when stop codons were not considered. The AGA codon, encoding arginine (Arg), had the highest RSCU (relative synonymous codon usage) value (1.92–1.96), while CGC, also encoding arginine, and AGC, encoding serine (Ser), had the lowest RSCU values (0.31–0.32 and 0.31–0.33, respectively) (Table S3). Figure 2 summarizes amino acid frequency and relative synonymous codon usage. Among the 64 codons, 31 had RSCU values less than 1 (RSCU < 1), indicating lower usage than expected. Meanwhile, 30 codons in P. campanulatum and P. filipes1, and 31 codons in the other seven species, had RSCU values greater than 1 (RSCU > 1) and were thus used more frequently than expected. The RSCU values of AUG and UGG equaled 1 in all nine species, indicating no usage preference, because methionine (AUG) and tryptophan (UGG) are each encoded by only one codon; UCC showed RSCU = 1 only in P. campanulatum and P. filipes1. All codons with RSCU > 1 ended in A or U, apart from UUG and, in P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1, P. kingianum2, H. alternicirrhosum and H. ginfushanicum, UCC. Conversely, 28 of the 31 codons with RSCU < 1 ended in G or C in each species. RSCU values were nearly identical across the nine species compared, indicating that codon usage bias in Polygonatum is rather stable (Fig. 2).

Figure 2

Relative synonymous codon usage (RSCU) values of the 20 amino acids and stop codons in seven Polygonatum and two Heteropolygonatum species, based on protein-coding sequences in the chloroplast genomes. The colors of the bars correspond to the colors of the codons. Each amino acid corresponds to nine histograms, and the y-axis represents the RSCU value. The order of the nine columns from left to right is P. campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1, P. kingianum2, H. alternicirrhosum and H. ginfushanicum.

Long dispersed repeats and microsatellites analysis

A total of 378 long dispersed repeats were observed in the seven Polygonatum and two Heteropolygonatum species, consisting of 191 palindromic repeats, 177 forward repeats, nine reverse repeats and one complementary repeat (the IR regions themselves were excluded as palindromic repeats in all nine species) (Table S4). Palindromic repeats were the dominant type (from 47.2% in P. filipes1 to 53.5% in P. zanlanscianense1), while complementary repeats were the rarest, being detected only in P. campanulatum (2.7%). P. franchetii and H. ginfushanicum did not possess any reverse repeats. The species with the highest number of long repeats was P. zanlanscianense1 (49), and that with the lowest number was P. kingianum2 (35) (Fig. 3A). The longest repeat was 66 bp in H. ginfushanicum and 71 bp in the other eight species, and all of these were forward repeats. Among all repeats detected in the nine species, those of 30–34 bp accounted for the majority (260, 68.1%) (Fig. 3B, Table S5). Most repeats were located in CDS, followed by IGS regions, and some repeats were also identified between CDS, IGS, tRNA genes and introns (Fig. 3C, Table S6). Most of the repeat sequences were located in the IR regions, except in P. campanulatum and P. filipes1, which harbored the highest number of repeats in the LSC region (Fig. 3D, Table S7).

Figure 3

Analysis of long dispersed repeats in the cp genomes of seven Polygonatum and two Heteropolygonatum species. (A) The number of the four types of long repeats. (B) Distribution ratio of repeats in regions of the cp genome. (C) Distribution ratio of repetitive sequences in functional regions. (D) Proportion of repeats in different length intervals of the chloroplast genome.

In total, we observed 507 SSRs among the nine species, comprising 303 mono-, 91 di-, 27 tri-, 63 tetra-, 20 penta- and two hexa-nucleotide repeats (Table S8). These represented two mono-, three di-, four tri-, eight tetra-, four penta- and two hexa-nucleotide motif types, of which one tri-, two tetra-, three penta- and two hexa-nucleotide types were each observed only once, in a single species (Table S9). Most SSRs were mono- and dinucleotide repeats, and the remaining types occurred at lower frequencies. As shown in Fig. 4a, mononucleotide repeats were the most frequent type, ranging from 55.9% (Polygonatum kingianum2) to 61.8% (Heteropolygonatum ginfushanicum). H. alternicirrhosum had the most SSRs (64) among the nine species, whereas P. sibiricum1 had the fewest (50) (Fig. 4a, Table S9). The dominant SSRs were A/T polymers (Fig. 4b–j), suggesting a marked base preference, and the majority of the microsatellites were located in the LSC region (Table S10). These results indicate that there were no distinctive differences in SSRs between Polygonatum and Heteropolygonatum. The identified SSRs will provide valuable genetic information for future phylogenetic and population genetic studies of Polygonatum.

Figure 4

Simple sequence repeat (SSR) analysis of the complete chloroplast genomes of the seven Polygonatum and two Heteropolygonatum species. (a) Numbers of mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeats. (b–j) Frequencies of SSR motifs in different repeat class types.

Comparative genome analysis and sequence variation

To identify highly variable regions among the seven species of Polygonatum and two species of Heteropolygonatum, a multiple sequence alignment of the cp genomes was carried out, with the annotation of Polygonatum campanulatum as the reference. As shown in Fig. 5, coding regions were much more conserved than non-coding regions, with almost no notable variation except in ycf1. Some intergenic spacers and introns also showed considerable variation, including rps16-trnQ, trnS-trnG, atpF-atpH, atpH-atpI, petA-psbJ, ndhF-rpl32, rpl32-trnL and rpl16. In addition, the LSC and SSC regions showed higher variation than the IR regions, consistent with the results of the nucleotide diversity analysis (Fig. 8). Apart from ycf1, all the highly divergent regions mentioned above were in single-copy regions. The tRNA and rRNA genes were strongly conserved, without evident variation. Additionally, collinearity analysis found no interspecific or intraspecific rearrangements in the nine species (Fig. 6).

Figure 5

Alignment of the chloroplast genomes of Heteropolygonatum alternicirrhosum, H. ginfushanicum, Polygonatum campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1 and P. kingianum2. The grey arrows at the top represent the direction of gene transcription, and the y-axis indicates percentage identity between 50 and 100%. (Exon: protein-coding regions; UTR: tRNAs and rRNAs; CNS: conserved non-coding sequences.)

Figure 6

Genomic rearrangement analysis of the seven Polygonatum and two Heteropolygonatum species. Blocks in different colors correspond to different gene types. Black: transfer RNA (tRNA); green: intron-containing tRNA; red: ribosomal RNA (rRNA); white: protein-coding genes (PCGs).

Expansion and contraction of IRs

A comprehensive comparison of the boundaries between the single-copy and IR regions was carried out. The cp genome structure of the nine species varied only slightly. Apart from Polygonatum sibiricum, the LSC/IRb junction was located between the rpl22 and rps19 genes in the other eight species. The rpl22 gene was located entirely in the LSC region, 26–34 bp away from the LSC/IRb border, while the rps19 genes within the IR regions were close to the two IR/LSC boundaries. In P. sibiricum, the two rps19 genes extended into the LSC region owing to the contraction of the IRs (Fig. 7), so that the copy located at the IRa/LSC junction was a pseudogene. Apart from this special case, rps19 was highly conserved in the other species, with the same length of 279 bp. Likewise, rpl22 was very conserved, with the same length of 366 bp in all nine species. The ndhF gene was located at the IRb/SSC boundary and extended into the IRb region by 22, 29 or 34 bp, and trnN was close to the IR/SSC boundaries, lying entirely within the IR regions. The ycf1 gene ranged from 4454 to 4573 bp and straddled the SSC/IRa boundary, with 883–895 bp in the IRa region and the rest in the SSC region (Fig. 7). At the IRa/LSC boundary, rps19 was located on the left side and psbA on the right; psbA was highly conserved, with a constant length of 1062 bp. The distance between psbA and the IRa/LSC junction varied from 87 to 94 bp.

Figure 7

Comparative analysis of the LSC, IR and SSC boundary regions in the nine chloroplast genomes.

Together, these results provide insights into the contraction and expansion of the IR borders in Polygonatum and Heteropolygonatum. The structure and gene order of the two genera were relatively conserved, except in P. sibiricum, in which a slight expansion and contraction occurred between the IRs and the LSC.

Nucleotide diversity and selective pressure analysis

Nucleotide diversity was calculated for the nine chloroplast genomes of Polygonatum and Heteropolygonatum to detect divergence hotspots. The inverted repeats were relatively conserved, with an average Pi value of 0.00113, whereas the LSC and SSC regions showed higher nucleotide diversity, with mean Pi values of 0.00492 and 0.00674, respectively. High variation (Pi > 0.014) was found in five regions: trnK-UUU-rps16, trnC-GCA-petN, trnT-UGU-trnL-UAA, ccsA-ndhD and ycf1 (Fig. 8), of which the most divergent was trnK-UUU-rps16, with a Pi value of 0.01565. Four of these five regions (80%) were intergenic spacers and one (20%) was a protein-coding region, indicating that non-coding regions harbored more variation and that coding regions were more conserved. All five divergence hotspots are potential molecular markers for DNA barcoding in future species identification and phylogenetic studies.

Figure 8

Nucleotide diversity analysis of the complete chloroplast genomes of the seven Polygonatum and two Heteropolygonatum (window length: 600 bp; step size: 200 bp).

Synonymous nucleotide substitutions preserve the encoded amino acid, whereas non-synonymous substitutions change it. The rates of non-synonymous (dN) and synonymous (dS) substitution have been widely used to quantify adaptive molecular evolution in the chloroplast genome50. In the current study, according to the BEB method, a total of 14 genes containing 65 sites were detected under positive selection. Among them, four genes (rpoC2, rpoB, psaA, ndhK) were identified as being under significant positive selection, and ten genes (psbA, psbK, atpA, rpoC1, psbD, psbC, psbZ, psaB, rps4, ndhJ) as being under positive selection (Table S11). All the selected genes were located in the LSC region, and 10 were related to photosynthesis. The rpoC2 gene harbored the highest number of positively selected sites (13), followed by psaA (12) and rpoB (11).

Phylogenetic analysis of Polygonatum

A total of 62 cp sequences of Polygonatum and its relatives were used to reconstruct phylogenetic relationships within the genus. Maianthemum henryi was chosen as the outgroup owing to its close relationship with, and more basal position relative to, Polygonatum and Heteropolygonatum. The 62 cp sequences comprised six newly sequenced genomes (i.e., Polygonatum campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1) and 56 cp genomes published in NCBI (Table S1). The topologies of the maximum likelihood (ML) and Bayesian inference (BI) trees were highly consistent in both tree structure and species positions, with generally strong support (Fig. 9). The only difference was that the BI analysis could not resolve the branching order of some samples belonging to the same species (Fig. 9). Both Polygonatum and Heteropolygonatum were monophyletic and shared a most recent common ancestor. Polygonatum was divided into two main lineages: sect. Verticillata, and a clade consisting of sect. Polygonatum and sect. Sibirica. The phylogenetic analysis suggested that sect. Sibirica comprises only one species, i.e., P. sibiricum. Moreover, P. verticillatum and P. cyrtonema were paraphyletic. P. verticillatum1 was sister to P. zanlanscianense (BS = 100, PP = 1.00), while P. verticillatum2 was sister to P. curvistylum + P. prattii + P. stewartianum (BS = 100, PP = 1.00), and P. verticillatum3 was located at the base of the branch composed of P. curvistylum + P. prattii + P. stewartianum + P. verticillatum2 + P. hookeri + P. cirrhifolium + P. verticillatum3 (BS = 100, PP = 1.00). Four samples of P. cyrtonema, including the newly sequenced one, were sister to P. hunanense (BS = 100, PP = 1.00), and this clade was located at the base of sect. Polygonatum. The other two samples were sister to P. hirtum, with high bootstrap support and Bayesian posterior probability (BS = 100, PP = 1.00). P. franchetii was sister to P. stenophyllum (BS = 100, PP = 1.00). Furthermore, P. filipes was strongly supported as a member of sect. Polygonatum and as sister to P. yunnanense plus P. nodosum (BS = 99, PP = 1.00). Surprisingly, P. campanulatum, which has alternate leaves, was placed in sect. Verticillata, a group characterized by whorled leaves, and was sister to Polygonatum tessellatum plus Polygonatum oppositifolium (BS = 100, PP = 1.00), which suggests that leaf arrangement is not a suitable basis for delimiting infrageneric groups in Polygonatum.

Figure 9

Phylogenetic relationships of the 57 cp sequences of Polygonatum and 4 of Heteropolygonatum, with Maianthemum henryi as the outgroup. Maximum likelihood (ML) and Bayesian inference (BI) methods were used to reconstruct the tree. Only the ML tree is shown because the topologies of the ML and BI trees were highly consistent. ML bootstrap support values and Bayesian posterior probabilities are shown above the branches. The cp genomes newly sequenced in this study are marked with red triangles.

Discussion

Features of complete chloroplast genome and comparative analyses

In the current study, we report the first complete cp genome of a critically endangered Polygonatum species, Polygonatum campanulatum. In addition, the complete cp genomes of five other species (P. cyrtonema1, P. franchetii, P. filipes1, P. zanlanscianense1, P. sibiricum1) were newly sequenced using Illumina sequencing technology. Comparative analyses of the plastomes were carried out among these six species plus three related species (P. kingianum2, Heteropolygonatum alternicirrhosum, H. ginfushanicum) to explore the genetic information of Polygonatum. The cp genomes showed a typical quadripartite structure, with lengths between 154,564 and 156,028 bp in Polygonatum and between 155,436 and 155,944 bp in Heteropolygonatum. The range of cp genome length variation in these two genera was similar to that reported previously for other Asparagaceae and higher plants51,52,53,54,55. The size changes are partially caused by expansion or contraction of the inverted repeat (IR) regions.

Our study revealed that gene content and gene order in the cp genomes of Polygonatum and Heteropolygonatum were highly conserved, with only slight variations in gene size, gene position and gene number. This result is similar to that reported for other species of Asparagaceae56. All plastomes contained 131–132 genes, comprising 85–86 protein-coding genes, 38 tRNA genes and eight rRNA genes. Among these genes, 18 contained introns and 19 were duplicated in the IR regions. The difference in gene number is due to pseudogenization of rps19 and ycf1 in some sequences. Specifically, one of the rps19 copies in P. stewartianum, P. sibiricum1 and P. sibiricum2 was a pseudogene. In P. stewartianum this is attributed to mutation, whereas in the two P. sibiricum samples it is attributed to the location of the gene at the IR/LSC boundary, which prevents it from being fully duplicated. In addition, both ycf1 copies were found to be pseudogenized in H. ogisui owing to a sequence insertion. The rps19 gene is relatively unstable among species of Asparagaceae: pseudogenization of rps19 has also been reported in Behnia reticulata, Hesperaloe parviflora and Hosta ventricosa, while Camassia scilloides and Chlorophytum rhizopendulum lack this gene completely57. The rps2, infA and other pseudogenes reported previously in Asparagaceae were not detected in this study57,58. Although GC content varied little among species, its distribution across the genome was uneven. The higher GC content in the IRs implies a more stable structure, because GC pairs form three hydrogen bonds whereas AT pairs form two59. Moreover, the elevated GC content of the IRs may be attributed to the four rRNA genes, which have high GC percentages. Similar results have been found in the chloroplast genomes of other angiosperms60,61,62.
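
The uneven distribution of GC content across the quadripartite structure can be illustrated with a minimal Python sketch. The sequence and the region coordinates below are toy values chosen for illustration, and the function names gc_content and regional_gc are hypothetical; they are not the tools or data used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / total if total else 0.0


def regional_gc(genome: str, regions: dict) -> dict:
    """Compute GC content per named region from 0-based, end-exclusive coordinates."""
    return {name: gc_content(genome[start:end]) for name, (start, end) in regions.items()}


# Toy sequence and hypothetical region boundaries, for illustration only.
toy_genome = "ATATATATAT" + "GCGCATGCGC" + "ATTTAAATTA" + "GCGCATGCGC"
toy_regions = {"LSC": (0, 10), "IRb": (10, 20), "SSC": (20, 30), "IRa": (30, 40)}

for name, value in regional_gc(toy_genome, toy_regions).items():
    print(f"{name}: {value:.1f}% GC")
```

With real data, the region boundaries would come from the genome annotation, and the same comparison would show whether the IRs are GC-richer than the single-copy regions.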

The pattern of codon usage is a vital genetic characteristic of an organism and is related to mutation, selection and other molecular evolutionary phenomena63. Our results demonstrated that leucine (Leu) was the most frequent amino acid in Polygonatum campanulatum, P. filipes1, P. franchetii, P. zanlanscianense1, P. cyrtonema1, P. sibiricum1, P. kingianum2, Heteropolygonatum alternicirrhosum and H. ginfushanicum. In contrast, cysteine (Cys) was the least abundant amino acid, apart from stop codons, a pattern that has also been found in other angiosperm taxa24,64. Furthermore, RSCU analysis showed that most codons with an RSCU value greater than one ended with A or U, whereas most codons with an RSCU value less than one ended with C or G. This indicates that codon usage is biased towards A and U at the third codon position in Polygonatum, which agrees with previous studies56,61,65.
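
As a rough illustration of the RSCU statistic discussed above, the following Python sketch computes RSCU as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It assumes the standard codon assignments and uses a toy coding sequence; the function name rscu is hypothetical, and this is not the software used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments, listed in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon whose amino acid occurs at least once."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Count complete in-frame codons only; a trailing partial codon is ignored.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count expected under equal codon usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy coding sequence, for illustration only.
toy_values = rscu(["ATGTTATTATTACTGGCAGCTTAA"])
preferred = sorted(c for c, v in toy_values.items() if v > 1)
print("Codons with RSCU > 1:", preferred)
```

An RSCU value above one means the codon is used more often than expected under equal usage of synonymous codons, so tallying the third base of such codons gives the A/U bias described in the text.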

Long dispersed repeats are essential for the rearrangement and stability of the chloroplast genome and are relevant to copy number differences among species66. Identifying their number and distribution plays a key role in genomic studies67. The current study found that palindromic repeats were the most common repeat type, followed by forward repeats. Complementary repeats were identified only in P. campanulatum, whereas P. franchetii and H. ginfushanicum did not harbor any reverse repeats. In the plastomes of the nine species reported here, repeats of 30 to 39 bp were dominant, as is commonly observed in other angiosperm lineages31,52,68. Our study also revealed that the repetitive sequences were not randomly distributed in the seven cp genomes of Polygonatum and two cp genomes of Heteropolygonatum; they were mainly located in the LSC region (48.7%) and in CDS (51.9%).

Simple sequence repeats (SSRs) are important codominant DNA molecular markers with the advantages of high abundance, random distribution throughout the genome and ample polymorphism information69,70. They therefore provide essential insights in many fields, such as species identification, phylogeography and population genetics71,72. A total of 507 SSRs were detected in the current study, with H. alternicirrhosum containing the most. Among the seven cp genomes of Polygonatum and two cp genomes of Heteropolygonatum, six categories of SSRs were observed in total. Mononucleotide SSRs showed the highest frequency in each genome, with A/T as the predominant motif type. Similar results have been reported in numerous taxa53,61,73. By contrast, hexanucleotide SSRs were the rarest type, with only one element observed in P. cyrtonema1 and P. filipes1. In addition, SSRs located within the LSC region accounted for the majority (72.4%), in agreement with previous studies65,68. In summary, the microsatellites identified in this study can be developed as markers for Polygonatum and can contribute to species identification and evolutionary studies of this genus in the future.
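
To make the six SSR categories concrete, the sketch below scans a sequence for perfect tandem repeats with motifs of 1 to 6 bp using regular expressions. The minimum repeat thresholds in MIN_REPEATS are assumed values for illustration and may differ from those applied in this study; the function names and the toy sequence are likewise hypothetical.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (1 to 6 bp); adjust as needed.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit ('ATAT' is not primitive)."""
    k = len(motif)
    return not any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples with 0-based start positions."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing one mononucleotide and one dinucleotide SSR.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
ssrs = find_ssrs(toy)
by_type = Counter(len(motif) for _, motif, _ in ssrs)
print(ssrs)
print(dict(by_type))
```

The primitive-motif check prevents a run such as AAAAAAAAAAAA from being counted again as a di-, tri- or tetranucleotide repeat. Compound and imperfect SSRs are not handled in this simplified version.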

Multiple sequence alignment revealed that the cp genomes of Polygonatum and its related species are similar in structure, gene content and gene order. Consistent with previous reports74,75,76, non-coding regions harbored more variation than coding regions in this study, and the two single-copy regions exhibited higher sequence divergence than the IRs. Seven intergenic regions (rps16-trnQ, trnS-trnG, atpF-atpH, atpH-atpI, petA-psbJ, ndhF-rpl32, rpl32-trnL) and two genes (ycf1 and rpl16) were detected as the most divergent. Comparative analysis of Polygonatum and its related species showed that the cp genomes were highly conserved, and no interspecific or intraspecific rearrangement was detected.

Contraction and expansion of the IR regions lead to variations in cp genome size and are commonly observed in the evolutionary history of land plants62. The size of the IR regions was relatively similar in Polygonatum and Heteropolygonatum, ranging from 26,214 bp in H. ginfushanicum to 26,415 bp in P. zanlanscianense1. Although all the cp genomes were similar in overall gene order and structure, several variations were identified at the IR/SC junctions. The current study demonstrated that the boundary genes in Polygonatum were mainly rpl22, rps19, trnN, ndhF, ycf1 and psbA, as is also the case in Heteropolygonatum and Hosta56. This further confirms that boundary features are relatively stable across closely related species77. The LSC/IRb boundary fell within the rps19 gene in P. sibiricum1, whereas the junction was located between rpl22 and rps19 in the other species. Incomplete duplication of the normal copy resulted in pseudogenization of the rps19 copy located at the IRa/LSC boundary; this phenomenon has also been reported in Polygonatum cyrtonema (MZ029094)14 and other taxa of Asparagaceae, such as Behnia reticulata, Hesperaloe parviflora and Hosta ventricosa57. Apart from rps19, the other genes situated at the SC/IR boundaries were relatively stable across the six Polygonatum and two Heteropolygonatum species studied in this work; only ndhF and ycf1 varied slightly in size. The high similarity of the SC/IR boundaries also demonstrates that all the species share the same genes. Besides, the total number of genes does not change due to IR contraction and expansion78.

trnK-UUU-rps16, trnC-GCA-petN, trnT-UGU-trnL-UAA, ccsA-ndhD and ycf1 were detected as the most divergent regions, with nucleotide diversity greater than 0.014. Three of these loci (matK-rps16, trnC-GCA-petN and ccsA) are consistent with a previous study14. The results indicated that most divergent regions were located in the LSC, whereas the IR regions displayed relatively low diversity, which agrees with the results of the multiple sequence alignment conducted with mVISTA. The same phenomenon has been observed in many taxa24,31. The regions detected in the nucleotide diversity analysis might also provide additional genetic information for DNA barcodes in Polygonatum, but this requires the support of further experiments.
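
For readers unfamiliar with the statistic, nucleotide diversity can be sketched as the average number of pairwise differences per site, computed in sliding windows along an alignment. The Python example below is a simplified illustration under stated assumptions: columns containing gaps or ambiguous bases are skipped, the window and step sizes are arbitrary toy values, and the function names are hypothetical. It does not reproduce the exact procedure or settings used in this study.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site, using only columns with A/C/G/T in all rows."""
    aligned = [s.upper() for s in aligned]
    if len(aligned) < 2:
        return 0.0
    length = len(aligned[0])
    valid = [i for i in range(length) if all(s[i] in "ACGT" for s in aligned)]
    if not valid:
        return 0.0
    diffs = 0
    pairs = 0
    for a, b in combinations(aligned, 2):
        diffs += sum(a[i] != b[i] for i in valid)
        pairs += 1
    return diffs / (pairs * len(valid))


def sliding_pi(aligned, window, step):
    """Return (window midpoint, pi) tuples along an alignment of equal-length rows."""
    length = len(aligned[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment of three sequences, for illustration only.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
threshold = 0.014  # cut-off mentioned in the text for highly divergent regions
for midpoint, pi in sliding_pi(toy_alignment, window=4, step=4):
    flag = "divergent" if pi > threshold else "conserved"
    print(midpoint, round(pi, 4), flag)
```

Windows whose value exceeds the chosen cut-off would then be mapped back to the annotation to name the corresponding genes or intergenic spacers.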

The non-synonymous (dN) and synonymous (dS) substitution rates are useful for inferring the adaptive evolution of genes25,79. The dN/dS analysis was carried out owing to its popularity and reliability in quantifying selective pressure80,81. In this study, a total of 14 positively selected sites (comprising 4 significant positive and 10 positive sites) were detected under the BEB method, distributed among atpA, ndhJ, ndhK, psaA, psaB, psbA, psbC, psbD, psbK, psbZ, rpoB, rpoC1, rpoC2 and rps4. The results indicated that 10 of the 14 positively selected genes are related to photosynthesis (Table S11). Plants of Polygonatum are mainly distributed in shady places in forests, scrub or on mountain slopes11. The weak sunlight may exert selective pressure on chloroplast genes involved in adaptation to the environment, leaving a trace of natural selection in them. It can be speculated that photosynthesis-related genes have driven the successful adaptation of Polygonatum to diverse environmental conditions, considering the extensive distribution range of the genus in the Northern Hemisphere. Photosynthesis-related genes were also found to have undergone positive selection in other taxa that are widely distributed or live in shady environments82,83,84,85,86.

Phylogenetic analysis

Phylogenetic analysis based on complete cp genomes demonstrated that both Polygonatum and Heteropolygonatum were monophyletic. In agreement with previous studies7,13,28, Polygonatum was composed of three major clades: sect. Verticillata, sect. Sibirica and its sister clade sect. Polygonatum. In the current study, we observed that sect. Sibirica contained only one species, P. sibiricum, which is consistent with the findings of Xia, Meng and Wang7,14,28. However, data from Floden13 suggest that one sample of P. verticillatum was sister to P. sibiricum within sect. Sibirica. Moreover, previous studies indicated that P. verticillatum was paraphyletic, potentially as a result of its wide geographic distribution and diverse morphological variation13,28. A similar result was obtained in this study. P. verticillatum1 was sister to P. zanlanscianense, while P. verticillatum2 was sister to P. curvistylum + P. prattii + P. stewartianum, and P. verticillatum3 occupied the basal position in the clade composed of P. curvistylum + P. prattii + P. stewartianum + P. verticillatum2 + P. hookeri + P. cirrhifolium + P. verticillatum3. Similar to previous findings28, P. cyrtonema was also recovered as paraphyletic in this study, given that four samples, including the newly sequenced one, were sister to P. hunanense, while the other two samples were sister to P. hirtum. All of these clades were highly supported. This suggests that the circumscription of these two broadly distributed species, P. cyrtonema and P. verticillatum, requires further study.

Few studies have addressed the systematic position of P. franchetii, and even fewer its cp genome. Meng's team7 was the first to report phylogenetic relationships that included P. franchetii, using four chloroplast fragments (rbcL, psbA-trnH, trnK and trnC-petN). Regrettably, the branch to which P. franchetii belonged was poorly resolved, making it difficult to recognize the relationship between P. franchetii and its close relatives. Wang-Jing14 reported the cp genome of P. franchetii for the first time. However, that sample clustered with P. hirtum + P. multiflorum and was located in sect. Polygonatum, which differs from the present study. Our study suggests that P. franchetii is strongly supported as sister to P. stenophyllum and is placed in sect. Verticillata. Furthermore, P. filipes was sister to P. yunnanense plus P. nodosum within sect. Polygonatum in this study. Xia et al.28 found that P. filipes was sister to the clade consisting of P. inflatum + P. multiflorum + P. odoratum + P. macropodum + P. involucratum + P. acuminatifolium + P. arisanense + P. orientale + P. yunnanense + P. nodosum with high support. However, within that clade, P. yunnanense + P. nodosum was only weakly supported as sister to the remaining species. In contrast, in Wang-Jing's study14, P. filipes was sister to P. cyrtonema, and this branch clustered with P. jinzhaiense and P. hunanense. This suggests that the voucher specimens of P. filipes and P. franchetii in Wang's study14 should be checked further.

One unanticipated finding was that the phylogenetic tree strongly supported the placement of Polygonatum campanulatum in sect. Verticillata, even though P. campanulatum has alternate leaves whereas sect. Verticillata is characterized by whorled or opposite leaves. P. campanulatum was compared to P. gongshanense and P. franchetii when it was first described, but material of P. gongshanense was not available for this work. Furthermore, phylogenetic analysis indicated that P. franchetii and P. campanulatum were placed on separate branches, whereas P. tessellatum + P. oppositifolium were highly supported as sister to P. campanulatum (BS = 100, PP = 1.00). Although P. campanulatum, P. tessellatum and P. oppositifolium share similar lustrous and lanceolate leaves2,87, they differ in leaf arrangement, filament structure, flowering time and other traits. Specifically, P. campanulatum is characterized by alternate leaves, has a retrorse spur at the filament apex and flowers in October, while P. tessellatum and P. oppositifolium have whorled or opposite leaves, lack a retrorse spur at the filament apex and flower in May2,87. Moreover, previous studies found that leaf arrangement is labile and that whorled leaves have arisen from the alternate state at least twice7,88. We therefore infer that using phyllotaxy to define subgeneric groups within Polygonatum is inappropriate. Additionally, flower color and pollen exine sculpture were also used as features to subdivide Polygonatum in previous studies7,12,89. Whereas sect. Verticillata typically displays reticulate pollen exines and purple or pink perianths, sect. Polygonatum is distinguished by perforate pollen exines and greenish-white or yellow perianths7,89. In contrast, P. campanulatum, which is placed in sect. Verticillata, has perforate-reticulate exine ornamentation and perianths that are either yellowish green or greenish white87. The controversy over flower color has also been reported by Xia and her team28. Taken together, flower color and pollen exine sculpture may be unrelated to phylogeny and are not an ideal basis for the subgeneric classification of Polygonatum either. Moreover, further research is required on the base chromosome number and karyotype of P. campanulatum. This work will contribute to a deeper understanding of the infrageneric classification of Polygonatum and demonstrates that the cp genome is an efficient tool for resolving species-level phylogeny.

Conclusion

In the current study, we sequenced and annotated the cp genomes of Polygonatum campanulatum, P. franchetii, P. filipes1, P. zanlanscianense1, P. cyrtonema1 and P. sibiricum1. Comparative analyses of the chloroplast genomes of the six taxa and three related species were conducted. Genome size, gene content, gene order and GC content were highly similar among the cp genomes of Polygonatum and Heteropolygonatum. No interspecific or intraspecific rearrangements were detected. Five highly variable regions were identified as potential specific DNA barcodes. Fourteen genes were found to be under positive selection, and a large variety of repetitive sequences were identified. Sixty-two cp sequences of Polygonatum and its related species were used for phylogenetic analyses. The phylogenetic results showed that Polygonatum can be divided into two major clades, sect. Verticillata and sect. Sibirica plus sect. Polygonatum. Further, P. campanulatum and P. tessellatum + P. oppositifolium were strongly supported as sister groups and placed in sect. Verticillata, suggesting that leaf arrangement is not a suitable basis for delimiting subgeneric groups in Polygonatum. Additionally, P. franchetii was sister to P. stenophyllum, also within sect. Verticillata. With its high morphological and karyological diversity, Polygonatum has attracted much attention in phylogenetic and taxonomic research. Our analysis provides additional chloroplast genomic information for Polygonatum and contributes to improving species identification and phylogenetic studies in future work.