The complete chloroplast genome of critically endangered Chimonobambusa hirtinoda (Poaceae: Chimonobambusa) and phylogenetic analysis

Chimonobambusa hirtinoda, a threatened species, is naturally distributed only on Doupeng Mountain, Duyun, Guizhou, China. We used next-generation sequencing (NGS) to obtain the complete chloroplast (cp) genome sequence of C. hirtinoda, which was assembled and analyzed for phylogenetic and evolutionary studies. We also compared the cp genome of C. hirtinoda with those of previously published Chimonobambusa species. The cp genome of C. hirtinoda is 139,561 bp long, with a GC content of 38.90%. It comprises a large single-copy (LSC) region of 83,166 bp, a small single-copy (SSC) region of 20,811 bp, and a pair of inverted repeats (IRs) of 21,792 bp each. We identified 130 genes in the cp genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. A total of 48 simple sequence repeats (SSRs) were detected. Codon usage analysis revealed a preference for A/U at the third codon position in the cp genome of C. hirtinoda. Furthermore, phylogenetic analysis based on complete cp sequences and the matK gene clarified the relationships of C. hirtinoda within the genus Chimonobambusa. This study reports the chloroplast genome of C. hirtinoda.

www.nature.com/scientificreports/

Several chloroplast genomes of Arundinariatae in the Bambusoideae have been reported 17, but data on the genus Chimonobambusa remain limited. This study therefore reports the chloroplast genome of C. hirtinoda, including its gene content and codon usage, and compares it with closely related species. A phylogenetic tree was constructed from previously published Bambusoideae cp genomes to clarify the taxonomic position of C. hirtinoda. These findings provide genetic resources for further research on the phylogenetic position of C. hirtinoda and on evolutionary relationships within the subfamily Bambusoideae.

Results
Assembly and annotation of the chloroplast genomes. Assembly yielded a complete cp genome sequence of C. hirtinoda of 139,561 bp (Fig. 1), consisting of an 83,166 bp large single-copy region, a 20,811 bp small single-copy region, and two 21,792 bp IR regions, forming the typical quadripartite structure of land plants. The cp genome of C. hirtinoda was annotated with 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes (Table 1). Fifteen genes in the C. hirtinoda cp genome contain introns. Of these, 13 genes contain one intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rps16, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC), only ycf3 contains two introns, and the clpP intron has been lost (Supplementary Table S1). The rps12 gene is present in two copies, and its three exons are joined by trans-splicing 18.
The accD, ycf1, and ycf2 genes were absent from the cp genome of C. hirtinoda, and the introns of clpP and rpoC1 have been lost. This is consistent with previous evolutionary studies of chloroplast genome structure in the Poaceae 19. Gene loss has also been reported in other plants [20][21][22][23].
The total GC content of the C. hirtinoda cp genome was 38.90%, and the A, T, G, and C contents were 30.63%, 30.46%, 19.57%, and 19.33%, respectively (Table 2). The GC contents of the LSC (36.98%) and SSC (33.21%) regions were much lower than that of the IR region (44.23%), indicating a non-uniform distribution of base composition across the cp genome, probably because the four rRNA genes are located in the IR region.

Figure 1. Genes outside the circle are transcribed counterclockwise, and genes inside the circle are transcribed clockwise. The thick black lines on the outer circle represent the two IR regions. The dark gray area within the ring represents the GC content.

Repeat sequences and codon analysis. SSRs are tandem repeats of short (1–6 bp) motifs and are widely used in phylogenetic and genetic diversity analyses [26][27][28][29].
In total, 48 SSRs were detected in C. hirtinoda, including 27 mononucleotide repeats (56.25% of all SSRs) composed primarily of A or T. There were also four dinucleotide repeats (AT/TA and TC/CT), three trinucleotide repeats, 13 tetranucleotide repeats, and one pentanucleotide repeat (Fig. 2A). Most SSRs (38; 79%) were located in the LSC region, with six (13%) in the IR region and four (8%) in the SSC region (Fig. 2B). Previous research suggests that the distribution of SSRs among regions and regional differences in GC content are related to the expansion or contraction of the IR boundaries 30.
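The tool and thresholds behind these SSR counts are not stated in this section, so the following is only a minimal sketch of how perfect SSRs can be located and tallied by motif length. It scans a sequence for tandem repeats of 1–6 bp motifs; the minimum copy numbers in `MIN_COPIES` are hypothetical MISA-style thresholds rather than the study's parameters, and compound or imperfect SSRs are not handled.

```python
# Minimal perfect-SSR finder (MISA-like sketch). MIN_COPIES holds hypothetical
# thresholds (motif length -> minimum tandem copies), not the study's settings.
from collections import Counter

MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
VALID = set("ACGT")


def is_primitive(motif):
    """True unless the motif is itself a tandem copy of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Return (0-based start, motif, copy number) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k, min_copies in MIN_COPIES.items():
            motif = seq[i:i + k]
            # skip truncated motifs, ambiguous bases, and non-primitive motifs
            if len(motif) < k or not set(motif) <= VALID or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                hit = (i, motif, copies)
                break  # the shortest qualifying motif wins
        if hit:
            hits.append(hit)
            i += len(hit[1]) * hit[2]  # jump past the whole repeat
        else:
            i += 1
    return hits


def summarize(hits):
    """Count SSRs by motif length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)
```

Dividing each class count by the total number of SSRs gives proportions comparable to the 56.25% mononucleotide share reported above.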
REPuter identified 61 repeats in the cp genome of C. hirtinoda, including 15 palindromic and 19 forward repeats, with no reverse or complement repeats (Fig. 3). The three Chimonobambusa species each contained 61–65 repeats, and only C. hejiangensis had a reverse repeat (one). Most repeats were 30–100 bp long and were located in either the IR or the LSC region 31 (Supplementary Table S2).
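REPuter reports maximal repeats and can tolerate mismatches. The simplified Python sketch below only illustrates the four orientations it distinguishes: it counts pairs of exact, fixed-length k-mers that occur as forward (identical), reverse, complement, or palindromic (reverse-complement) copies. The default `k = 30` echoes the 30 bp lower bound on repeat length mentioned above; it is an assumption, not a REPuter parameter, and a long repeat contributes one pair per overlapping k-mer.

```python
# Simplified repeat classifier: counts pairs of exact k-mers in the four
# REPuter orientations. Not REPuter's algorithm (no maximal repeats, no mismatches).
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

TRANSFORMS = {
    "forward": lambda s: s,                                  # direct copy
    "reverse": lambda s: s[::-1],                            # read backwards
    "complement": lambda s: s.translate(COMPLEMENT),         # base-paired, same direction
    "palindromic": lambda s: s.translate(COMPLEMENT)[::-1],  # reverse complement
}


def kmer_positions(seq, k):
    """Map each k-mer to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def count_repeat_pairs(seq, k=30):
    """Number of unordered position pairs (i != j) related by each transform."""
    index = kmer_positions(seq.upper(), k)
    totals = {}
    for name, transform in TRANSFORMS.items():
        ordered = 0
        for kmer, positions in index.items():
            partner = transform(kmer)
            if partner == kmer:
                # same k-mer on both sides: never pair a position with itself
                ordered += len(positions) * (len(positions) - 1)
            else:
                ordered += len(positions) * len(index.get(partner, []))
        # each transform is its own inverse, so every pair was counted twice
        totals[name] = ordered // 2
    return totals
```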
We identified 20,180 codons in the coding regions of C. hirtinoda (Fig. 4, Supplementary Table S3). The Ile codon AUU was the most frequently used codon (817 occurrences), and the stop codon UAG was the least used (19). Leu was the most frequently encoded amino acid (2,170 codons), and stop codons (TER) were the least frequent (85). A relative synonymous codon usage (RSCU) value greater than 1.0 indicates that a codon is used more often than expected if all synonymous codons were used equally 32. In the C. hirtinoda cp genome, 31 codons had RSCU values above 1, and 29 of these (93.55%) ended in A or U. The codons AUG (Met) and UGG (Trp), each the only codon for its amino acid, showed no usage bias (RSCU = 1).
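The RSCU of a codon is its observed count divided by the mean count of all codons encoding the same amino acid, so a value of 1 means no bias. The minimal Python sketch below assumes the coding sequences are supplied as plain DNA strings (a hypothetical input format) and treats the three stop codons as one synonymous family. Plastid genes use the standard codon assignments, which NCBI translation table 11 shares with table 1.

```python
# Sketch of RSCU computation from a list of coding sequences.
from collections import Counter, defaultdict

BASES = "UCAG"
# Amino acids for codons in UUU, UUC, UUA, ..., GGG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def codon_counts(cds_list):
    """Count in-frame codons across all coding sequences (DNA or RNA letters)."""
    counts = Counter()
    for cds in cds_list:
        rna = cds.upper().replace("T", "U")
        for i in range(0, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if codon in CODON_TABLE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count over the codon's synonymous family."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

Filtering the result for values above 1 and checking the last base of each codon gives the kind of third-position A/U tally reported above.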
Comparative analysis of genome structure. Nucleotide diversity (Pi) among the three Chimonobambusa cp genomes, calculated with DnaSP 5.10, ranged from 0 to 0.021, with a mean of 0.000544. Five peaks were observed in the two single-copy regions, and the highest peak was located in the trnT-trnE-trnY region of the LSC (Fig. 5). Pi values in the LSC and SSC regions were markedly higher than in the IR region, where no highly divergent sequences were observed, indicating that the IR is highly conserved. Such highly variable regions have been used in other plants for species identification, phylogenetic analysis, and population genetics research [33][34][35].
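For reference, the Pi values above measure nucleotide diversity in Nei's sense: for n aligned sequences compared over L sites, the average number of pairwise nucleotide differences per site, where d_ij is the number of differing sites between sequences i and j. The second line uses hypothetical counts for a three-genome window, not data from this study, purely to show the arithmetic.

```latex
\pi = \binom{n}{2}^{-1} \sum_{i<j} \frac{d_{ij}}{L}

n = 3,\; L = 600,\; d_{12} = 1,\; d_{13} = 1,\; d_{23} = 0
\;\Rightarrow\;
\pi = \frac{1}{3} \cdot \frac{1 + 1 + 0}{600} = \frac{1}{900} \approx 0.00111
```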
The comparison of complete cp genome structure among the three Chimonobambusa species showed that most regions were conserved (Fig. 6). The LSC and SSC regions were considerably more variable than the IR region, and non-coding regions were more variable than coding regions. Among non-coding regions, loci at approximately 7–9 kb, 28–30 kb, and 36 kb, among others, differed markedly. Within protein-coding regions, rpoC2, rps19, ndhJ, and several other genes differed, whereas the tRNA and rRNA genes were 100% identical. A similar pattern has also been reported by others 36.

Table 1 (excerpt). No. of tRNA genes: 37; No. of rRNA genes: 8.

Table 2. Base composition in the C. hirtinoda chloroplast genome.
Region  Length (bp)  A (%)  T (%)  G (%)  C (%)  GC (%)
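Base-composition percentages like those in Table 2 can be computed directly from the sequence. In this sketch, region boundaries are passed as hypothetical 1-based inclusive coordinates (the actual LSC, IR, and SSC coordinates are not listed here), and only unambiguous A/T/G/C bases are counted.

```python
# Sketch: base composition of a sequence or of named regions within it.
def base_composition(seq):
    """Percent A, T, G, C and GC over unambiguous bases (N etc. are ignored)."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    pct = {b: 100.0 * n / total for b, n in counts.items()}
    pct["GC"] = pct["G"] + pct["C"]
    return pct


def region_composition(genome, regions):
    """regions maps a name to hypothetical 1-based inclusive (start, end) coordinates."""
    return {name: base_composition(genome[start - 1:end])
            for name, (start, end) in regions.items()}
```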
IR contraction and expansion in the chloroplast genome. Because the cp genome is circular and quadripartite, it has four junctions between the LSC, IRb, SSC, and IRa regions. During evolution, the IRs of the chloroplast genome expand and contract to some degree, which maintains the stability of the two IR sequences, and such boundary shifts are the primary cause of length variation among chloroplast genomes 37,38. The IR/SC boundary regions of the three Chimonobambusa chloroplast genomes were highly similar in organization, gene content, and gene order. IR length ranged from 21,797 bp (C. tumidissinoda) to 21,835 bp (C. hejiangensis). In all three species, the ndhH gene spanned the SSC/IRa boundary and extended 181–224 bp into the IRa region. The rps19 gene extended from the IRb into the LSC region, with a 31–35 bp gap. The rpl12 gene was located in the LSC region of all genomes, 35–36 bp from the LSC/IRb boundary (Fig. 7).
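Junction measurements such as the ndhH extension into IRa or the rpl12 spacing from LSC/IRb reduce to simple coordinate arithmetic. The helpers below are a hypothetical sketch using 1-based inclusive genomic coordinates, where `boundary` is the last position of the upstream region; any coordinates passed to them in an example are invented.

```python
# Hypothetical helpers for IR/SC junction measurements (1-based, inclusive).
def split_at_boundary(start, end, boundary):
    """Return (bp upstream of the junction, bp downstream) for a gene span."""
    if end <= boundary:
        return end - start + 1, 0
    if start > boundary:
        return 0, end - start + 1
    return boundary - start + 1, end - boundary


def gap_to_boundary(start, end, boundary):
    """Bases separating a gene from the junction (0 if the gene spans it)."""
    if end <= boundary:
        return boundary - end
    if start > boundary:
        return start - boundary - 1
    return 0
```

For instance, `split_at_boundary(90, 120, 100)` returns `(11, 20)`: 11 bp of the gene lie upstream of the junction and 20 bp extend past it. Strand does not matter here, because lengths are computed from genomic coordinates.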
Mauve alignment of the three Chimonobambusa chloroplast genomes showed complete synteny, with no inversions or rearrangements (Fig. 8).
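Mauve detects locally collinear blocks directly from the sequences. A much coarser check, sketched below under the assumption that each genome's annotation is available as an ordered list of (gene, strand) pairs, is whether one circular gene order is simply a rotation of another; an inversion or rearrangement would break this test. The gene names in any example are placeholders.

```python
# Coarse synteny check on annotations (not Mauve's alignment): two circular
# genomes count as collinear if one (gene, strand) list is a rotation of the other.
def is_rotation(a, b):
    """True if list b equals list a read from some other starting gene."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of a + a
    return any(doubled[i:i + len(a)] == b for i in range(len(a)))


def flip_strands(genes):
    """Swap '+' and '-' strands, e.g. to model a hypothetical inverted segment."""
    return [(name, "+" if strand == "-" else "-") for name, strand in genes]
```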

Phylogenetic analysis.
We performed phylogenetic analyses using complete chloroplast genomes and the matK gene to determine the phylogenetic position of C. hirtinoda. The maximum likelihood (ML) tree based on complete chloroplast genomes contained seven nodes with full support (100% bootstrap). However, relationships among the three Chimonobambusa species received only moderate support, possibly because few samples were included: C. hirtinoda was more closely related to C. tumidissinoda than to C. hejiangensis, with 62% bootstrap support. In the matK tree, the Chimonobambusa species also clustered in a single branch, consistent with the complete cp genome tree (Fig. 9). These results indicate that complete chloroplast genomes resolve related species better than the matK gene, consistent with a previous study 39.

Discussion
In the current study, the Chimonobambusa cp genomes exhibited the typical circular quadripartite structure seen in most plants. Bamboos have long and irregular flowering cycles, and their reproductive and vegetative morphological traits are often difficult to use for identification because they vary with environmental conditions. Classification based on morphology is therefore controversial and has led to unreliable systematic results 40. Molecular and sequencing-based methods are more practical for bamboo classification, and they are also used to study plant genetic diversity, speciation, and genetic differentiation 41. Mutation sites across the whole chloroplast genome can therefore serve as super-barcodes for studying the phylogeny and molecular taxonomy of bamboo species. Genetic diversity is a key indicator of the degree of variation within a population: higher genetic diversity indicates richer genetic resources. Genetic diversity improves a species' capacity to adapt to climate change and historical events, and knowledge of it supports scientific and effective strategies for protecting and managing the germplasm of endangered species. For example, Huang et al. used 16 SSR primers to examine the population genetic diversity and genetic differentiation of the endangered plant Camellia chekiangoleosa and formulated a conservation strategy based on the results 42.

Conclusions
The current study explored the chloroplast genome of C. hirtinoda and compared it with related species in the genus Chimonobambusa. These data provide valuable genetic information that advances genetic research on Chimonobambusa. Assembly, annotation, and analysis of the complete chloroplast genome sequence of C. hirtinoda revealed gene loss. This loss is associated with the rapid evolution of Poaceae species and the extensive rearrangements of chloroplast structure during their evolution. These data, particularly the SSRs, will support further study of phylogenetic relationships, cp genome variation, and gene function in Chimonobambusa.
The IR region is the most conserved region of the chloroplast genome, and its expansion and contraction inform our understanding of plant evolution. In the current study, the IR region showed no significant difference among the three Chimonobambusa species. In addition, comparative analysis of the Chimonobambusa species revealed that coding regions of the cp genome are more conserved than non-coding regions. Such structural variation may reflect divergence among species, but the mechanisms that generate it and its consequences require further study.
The Poaceae family is generally divided into two major evolutionary lineages, the BEP and PACCMAD clades. Bambusoideae, Pooideae, and Oryzoideae belong to the BEP clade, whereas Panicoideae, Arundinoideae, Chloridoideae, Aristidoideae, and Micrairoideae belong to the PACCMAD clade. Here, the phylogenetic tree based on complete chloroplast genomes (Fig. 9A) had high bootstrap support (only three values below 80%), and the species grouped into two clades plus the outgroup. The genus Bambusa forms a separate, monophyletic lineage, consistent with previous reports 43. Both phylogenetic trees showed that C. hirtinoda, C. hejiangensis, and C. tumidissinoda form a closely related group. Within a subclade of the second clade, the genus Ampelocalamus and C. longiusculus were very closely related.

Methods

DNA extraction and chloroplast genome sequencing. Total genomic DNA was extracted from the sample using a TIANGEN DNA extraction kit (TIANGEN BIOTECH CO., Beijing, China), and DNA concentration was measured by spectrophotometry. DNA quality was assessed by 1% agarose gel electrophoresis. The DNA of C. hirtinoda was sent to BGI (Wuhan, China; https://www.genomics.cn) and sequenced on an Illumina HiSeq 2500 system, using a De Novo Sequencing conventional library (≤ 800 bp).

Sequence divergence. Determining the nucleotide diversity of whole cp genomes can make the identification of related species more accurate and help resolve similar problems in phylogenetic research 49,50. To compare differences among species, three Chimonobambusa species were selected, with C. hejiangensis as the reference sequence. The whole cp genomes of the three species were aligned with MAFFT 51, and both ends of the alignment were trimmed manually.
DnaSP 5.10 was then used to calculate Pi values among the species sequences 51, with a sliding window length of 600 bp and a step size of 200 bp. The online program mVISTA 52 was used to compare the three species, with the C. hejiangensis annotation as the reference, and Mauve 53 was used to detect gene rearrangements.
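A minimal Python sketch of a sliding-window Pi scan, using the 600 bp window and 200 bp step given above. It assumes an already-trimmed alignment of equal-length strings and skips columns containing gaps or ambiguous bases; that gap handling is an assumption and may differ from DnaSP's own options.

```python
# Sketch: sliding-window nucleotide diversity (mean pairwise differences per site).
from itertools import combinations
from typing import Optional

VALID = set("ACGT")


def window_pi(columns) -> Optional[float]:
    """Pi over the gap-free, unambiguous columns of one window (None if none remain)."""
    usable = [col for col in columns if set(col) <= VALID]
    if not usable:
        return None
    n_seqs = len(usable[0])
    n_pairs = n_seqs * (n_seqs - 1) // 2
    diffs = sum(x != y for col in usable for x, y in combinations(col, 2))
    return diffs / n_pairs / len(usable)


def sliding_window_pi(alignment, window=600, step=200):
    """Yield (1-based start, end, pi) for each window along the alignment."""
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = list(zip(*seqs))  # one tuple of bases per alignment column
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        yield start + 1, end, window_pi(columns[start:end])
```

Plotting the third value against the window midpoint gives a Pi profile of the kind shown in Fig. 5.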
Phylogenetic analyses. Fourteen complete chloroplast genome sequences and matK genes from Bambusoideae species, with Hypolytrum nemorum (Cyperaceae: Hypolytrum) as the outgroup, were used to construct phylogenetic trees and identify the taxonomic position of C. hirtinoda. All sequences were aligned with MAFFT, and maximum likelihood (ML) trees were constructed in MEGA-X 54 with 1,000 bootstrap replicates.
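MEGA-X performs bootstrapping internally. The sketch below only illustrates the resampling idea behind the 1,000 replicates: each pseudo-replicate alignment is built by drawing alignment columns with replacement, and a clade's support is the percentage of replicate trees that recover it. Inferring an ML tree for each replicate is not shown, and representing each replicate tree as a set of clades for `clade_support` is a hypothetical input format.

```python
# Sketch of nonparametric bootstrap resampling for a sequence alignment.
import random


def bootstrap_alignments(alignment, n_reps=1000, seed=None):
    """Yield pseudo-replicate alignments built by resampling columns with replacement."""
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        # the same column indices are applied to every sequence, keeping sites aligned
        yield ["".join(seq[i] for i in cols) for seq in alignment]


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees (each a set of frozenset clades) containing `clade`."""
    target = frozenset(clade)
    hits = sum(target in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)
```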
Specimen collection statement. Fresh leaves were collected with permission from the nature reserve.

Data availability
The complete chloroplast genome sequence generated and analyzed during the current study is available in GenBank (https://www.ncbi.nlm.nih.gov; accession numbers are described in the text).