Structural variation of the complete chloroplast genome and plastid phylogenomics of the genus Asteropyrum (Ranunculaceae)

Two complete chloroplast genome sequences of Asteropyrum, as well as those of 25 other species from Ranunculaceae, were assembled using both Illumina and Sanger sequencing methods to address the structural variation of the cp genome and the controversial systematic position of the genus. Synteny and plastome structure were compared across the family. The cp genomes of the only two subspecies of Asteropyrum were found to be differentiated with marked sequence variation and different inverted repeat-single copy (IR-SC) borders. The plastomes of both subspecies contains 112 genes. However, the IR region of subspecies peltatum carries 27 genes, whereas that of subspecies cavaleriei has only 25 genes. Gene inversions, transpositions, and IR expansion-contraction were very commonly detected in Ranunculaceae. The plastome of Asteropyrum has the longest IR regions in the family, but has no gene inversions or transpositions. Non-coding regions of the cp genome were not ideal markers for inferring the generic relationships of the family, but they may be applied to interpret species relationship within the genus. Plastid phylogenomic analysis using complete cp genome with Bayesian method and partitioned modeling obtained a fully resolved phylogenetic framework for Ranunculaceae. Asteropyrum was detected to be sister to Caltha, and diverged early from subfamily Ranunculoideae.

expansion-contraction, may provide important systematic information about the family 3,11,12,15,20 . Zhai et al. carried out a plastid phylogenomic study focusing on Ranunculaceae based on a well sampling scheme 15 . They also discussed structural variation of plastid genome sequences of the family. However, their study focused on tribal relationship of Ranunculaceae and only included one sample of Asteropyrum. In their study, they detected and reported five types of plastome sequences of the family based on gene inversions and transpositions but left IR expansion-contraction (widely distributed and highly diverged in the family) issue unaddressed. For phylogenomic analyses, they used two matrices (coding region and complete plastome sequences) and three phylogenetic reconstruction methods (Parsimony, Maximum Likelihood, and Bayes) with no partitioned modeling applied the Bayesian analysis. The results gave a better resolution of the family. However, position of Asteropyrum was still not well supported.
In this study, we reported the complete cp genome sequences of both subspecies of Asteropyrum, as well as 25 plastome sequences from other genera in Ranunculaceae using genome skimming data. We then compared the synteny and plastome structure across the family and conducted a plastid phylogenomic study using partitioned modeling method to resolve deep-level relationships and explore plastome structural evolution across Ranunculaceae. The aims of these analyses were to clarify variation in the cp genome across the genus Asteropyrum, to detect structural variation in the Asteropyrum cp genome in comparison with other genera in the family, to infer the phylogenetic position of Asteropyrum within the family, and to try to compare and reconstruct the deep-level relationships of Ranunculaceae using different plastome partitions.  (17,655,994 reads), respectively. Blat analysis was used to select 133,486 putative plastid reads for ssp. peltatum and 144,722 reads for ssp. cavaleriei. The reads from ssp. peltatum were used to obtain two large contigs from the de novo assembly (51,204 bp, and 81,421 bp). One gap was bridged using Sanger sequencing (with primers: LSCF: TGCGATGCTCTAACCTCTGAG; LSCR: AGAGCAATCCTAACCAGAATCATCT). For the ssp. cavaleriei data, only one large contig (131,836 bp), which included complete LSC, IR, and SSC regions was derived. Information regarding the cp genome assembly for all of the other newly sequenced samples from Ranunculaceae is presented in Supplementary Table S1.
The complete cp genome sequences of ssp. peltatum and ssp. cavaleriei are 164,455 bp and 164,274 bp, respectively (Fig. 2, Table 1), and the rate of identical sites between the two plastome sequences was 98.1%. The two complete cp genome sequences had similar GC content (38.0% for ssp. cavaleriei and 37.9% for ssp. peltatum). Structural variation of the chloroplast genome in Ranunculaceae. When the LAGAN alignment program was applied, the mVISTA results showed a large proportion of un-matched area in tribe Anemoneae and the genus Adonis (Fig. 3), which indicated inversions and gene transpositions in the cp genomes of these plants. The shuffle-LAGAN method was used to obtain a well-matched global alignment for all plastome sequences of Ranunculales ( Supplementary Fig. S2). Inversions and transpositions in cp genomes of tribe Anemoneae have been reported previously 11,12,15,20 , and the cp genome of Adonis was found to have a large inversion in the LSC region between rps16 and trnT-UGU, which contains 35 genes (43,411 bp in total length), including 21 protein-coding genes and 14 tRNA genes ( Supplementary Fig. S1-Adonis). The inversion within Adonis showed no phylogenetic relationship with that in tribe Anemoneae.
IR expansion-contraction was found to be very common in Ranunculaceae and its relatives ( Table 2, Supplementary Fig. S3). After comparing a wide range of angiosperm cp genomes, we identified the cp genome of Amborella as the standard IR composition (with 17 genes) that is commonly shared by most of angiosperms. In Ranunculaceae, the cp genomes of Asteropyrum, Dichocarpum W. T. Wang & P. K. Hsiao, Hydrastis L., and tribe Anemoneae (Anemone L., Pulsatilla Mill., Hepatica Mill., Anemoclema (Franch.) W. T. Wang, Clematis L., Naravelia DC., etc.) showed IR expansion, whereas IR contraction was detected in Helleborus L. and Ceratocephala Moen. (detailed IR information for all tested Ranunculaceae samples is presented in Supplementary Table S3). The longest IR in Ranunculales was found to be that of Berberis amurensis Rupr., which carries 32 genes. Asteropyrum showed the longest IR regions within the family Ranunculaceae (Fig. 2, Table 2). plastid phylogenomic analyses. Seven datasets and three methods were used to obtain 21 phylogenetic frameworks for the family Ranunculaceae. All of these frameworks were largely similar to each other (presented in Supplementary Fig. S4). Topology conflicts among different datasets or methods were usually not well supported statistically.
The parameters of the parsimony analyses are presented in Table 3. This analysis yielded less resolved phylogenies in comparison with the ML and Bayesian methods. The phylogenetic inferences using the parsimony method showed largely congruent results among the seven datasets. The complete cp genome dataset obtained the best resolved and supported phylogenetic tree, in which Asteropyrum was sister to Nigella with weak bootstrap  Supplementary Fig. S4). This result was different from ML and Bayesian analyses (Asteropyrum sister to Caltha). The topologies from the Bayesian and ML analyses were almost the same for each dataset, but the ML analyses also yielded less resolved and supported trees than the Bayesian method. Among all the 21 phylogenetic trees, the one obtained from the complete plastome sequence dataset with Bayesian method showed a fully resolved phylogeny with all the branches supported by PP value of 1 (Fig. 4). Therefore, the phylogenetic relationships discussed below were mainly based on this result. www.nature.com/scientificreports www.nature.com/scientificreports/

Discussions
After comparing the first diverged Amborella 36 and other angiosperm species, we determined that a total number of 113 genes (with 17 genes in IR region) in the cp genome can be set as a primitive cp genome structure based on its prevalence in angiosperms. Then, we summarized the structural variation of the cp genomes (including gene composition, gene inversion and transposition, and IR expansion-contraction) across Ranunculaceae based on all the available data.

Species
Gene number in IR length of IR Additional (or absent) genes when IR expansion-contraction occurs (Amborella as the standard)  www.nature.com/scientificreports www.nature.com/scientificreports/ Structural variations of the cp genome within Ranunculaceae have been reported 11,12,15,20,37,38 . Using restriction site mapping, Johansson 38 reported large inversions and gene loss in plastomes of Adonis species. In this study, one inversion in A. coerulea, which was identical with the A. vernalis inversion described by Johansson 38 , was confirmed and located in detail. Zhai et al. reported the same inversion in A. sutchuenensis 15 . We also detected one gene loss (rpl32) in A. coerulea (Supplementary Table S3). The inversion within Adonis plastomes could be a synapomorphy within the genus. Structural variations of cp genome have been reported within tribe Anemoneae by previous studies 11,12,15,20,37 . In comparison with Adonis, those cp genome structural variations of tribe Anemoneae differed in size and number. No evidence indicates that these inversions in the cp genomes of Adonis and tribe Anemoneae are phylogenetically related.
IR expansion-contraction is also an important process involved in cp genome variation within Ranunculaceae. Based on a broad comparison with other angiosperm species, ten genera (Asteropyrum, Clematis, Archiclematis, Naravelia, Pulsatilla, Anemone, Anemoclema, Hepatica, Dichocarpum, and Hydrastis) were found to exhibit IR expansion, whereas two genera (Helleborus and Ceratocephala) showed IR contraction. Some of these IR expansion events may be phylogenetically informative. For example, within the well supported tribe Anemoneae, a single IR-expansion, which may be a synapomorphy of the plastome variation in the tribe, was detected 12 . However, in other case, IR expansion-contraction may occur independently in some genera. For example, the cp genomes from two species of Dichocarpum showed different IR regions. One species, D. dalzielii, had a normal 17-gene IR, whereas the other, D. sutchuenense, carried expanded IR regions (19 genes) ( Table 2). On the other hand, IR contraction and gene loss in Helleborus and Ceratocephala (Table 2) also evolved independently within Ranunculaceae because these two genera were separated phylogenetically (Fig. 4).
The genus Asteropyrum did not have gene inversions or transpositions in their cp genomes, but they showed significant IR expansion and carried the longest IR regions in the family ( Table 2). Different gene compositions in IR regions of the two subspecies were detected (Fig. 2, Table 2). Two more genes (rpoA and rps11) were present in the IR region of ssp. peltatum, and this variation is stable within the subspecies. These phenomena suggested that plastome structural variation can occur not only among genera but also within a single species.
In the present study, an intermediate sample showed the same IR-SC boundary with A. peltatum ssp. cavaleriei. Morphologically, ssp. peltatum and ssp. cavaleriei can be easily distinguishable by their size and shape of leaves. Intermediate forms between the two subspecies in morphology and palynology are sometimes present at their overlapping zones 24 and can be either more similar to ssp. peltatum or to ssp. cavaleriei. The intermediate individual sampled in this study was small in its leaf and scape size but with slightly five-angled leaves (Fig. 1E,F). According to Yuan & Yang 24 , it can be recognized as an intermediate form but more similar to ssp. cavaleriei. The IR-SC boundary indicated that this intermediate individual could be a hybrid one between the two subspecies with ssp. cavaleriei as its maternal parent.
A large number of molecular phylogenetic as well as several phylogenomic analyses have been conducted for Ranunculaceae [33][34][35][39][40][41][42][43][44][45] . However, these studies have suffered from poor resolution largely due to an insufficient phylogenetic signal or insufficient sampling. Zhai et al. carried out a well sampled plastid phylogenomic study on Ranunculaceae 15 , and they obtained a better phylogenetic framework of the family. However, this study did not separate the plastid sequence in detail and the substitution model was tested without partitioning. Thus, we still do not know the diversification and phylogenetic resolving ability of different partition of the plastid genome sequences.
In this study, phylogenetic relationships within Ranunculaceae were inferred from six separate datasets, as well as a complete plastome sequence data. Among all the six separate datasets, intergenic spacer data showed the highest rate of informative sites (45.37%). Whereas, the IR region showed the lowest rate of informative sites (8.50%), indicating that it is the most conserved region in the entire plastid genome ( Table 3). The phylogenetic resolution of IR region was also shown to be the worst among the seven datasets ( Supplementary Fig. S4) because of its small number of phylogenetic signals.
All the phylogenetic trees inferred by different datasets were largely congruent ( Supplementary Fig. S4). Hydrastis and Coptis consistently appeared as the earliest diverged lineages in the family. Major clades, including subfamily Thalictroideae, tribe Adonideae, tribe Ranunculeae, tribe Anemoneae, tribe Cimicifugeae, and tribe Delphinieae were resolved in all the analyses. However, the systematic positions of Asteropyrum, Caltha, Helleborus, Callianthemum and Nigella were found to be unstable and often had low statistical support values www.nature.com/scientificreports www.nature.com/scientificreports/ using the six separate datasets. These genera may have undergone unusual evolutionary processes, such as ancient hybridization or rapid radiation, coupled with their origin and evolution process.
Because of the different rates and patterns of nucleotide substitutions among the cp genome sequences 46 , data partitioning methods are required for phylogenetic reconstruction to ensure the accuracy of the analysis 2,47-50 . Our results showed that Bayesian analyses with partitioned models always obtained the best resolved Ranunculaceae phylogeny for each dataset. Using Bayesian analysis, the complete plastome dataset obtained a fully resolved phylogeny for Ranunculaceae (Fig. 4), which was better resolved than all the previous molecular and phylogenomic studies 15,[33][34][35][39][40][41][42][43][44][45] .
In general, the Bayesian phylogeny inferred from the complete cp genome sequences is largely congruent with the phylogeny obtained by Cossard et al. 35 Using eight DNA fragments from the chloroplast, mitochondrial, and nuclear genomes, Cossard et al. identified the sister relationship of subfamilies Thalictroideae and Adonideae with insignificant statistical support (PP = 0.80) 35 . This relationship was not resolved by most previous molecular phylogenetic studies 32,33,[40][41][42]44,51,52 . In the present study, our plastid phylogenomic analysis resolved the sister relationship between subfamilies Thalictroideae and Adonideae (Fig. 4), and supported the hypothesis of Cossard et al.
From the Bayesian analysis, we detected the sister relationship of Callianthemum and Helleborus (Fig. 4) which was not resolved by all the previous studies of Ranunculaceae phylogeny 32,33,[40][41][42]44,51,52 . The clade of Callianthemum and Helleborus was found to be related to a well-supported clade of tribe Ranunculeae + tribe Anemoneae. The sister relationship of tribe Cimicifugaea and the tribe Delphinieae + tribe Nigellaea clade www.nature.com/scientificreports www.nature.com/scientificreports/ (Fig. 4) was resolved by our analysis. This relationship was previously reported by Hoot but without sufficient statistical support 41 .
The phylogenetic position of Asteropyrum has been disputed for a century and still remains to be resolved by now 20,24,32,35,53 . In this study, the sister relationship of Asteropyrum and Caltha was discovered using the complete cp genome datasets with ML and Bayesian methods. This clade was further found to be first diverged from  www.nature.com/scientificreports www.nature.com/scientificreports/ subfamily Ranunuloideae (Fig. 4). Wang et al. also proposed sister relationship of Asteropyrum and Caltha by combining molecular and morphological data, but their results was not statistically supported 32 . It is noteworthy that Cossard et al. resolved a sister relationship of Asteropyrum and Callianthemum with a strong support value using a combination of chloroplast, mitochondrial, and nuclear genes 35 . This unexpected result was solely contributed by the nuclear gene RanaCYL1 dataset. The contradictory results by our cp genome analysis and nuclear RanaCYL1 data may be caused by either an ancient hybridization event or incomplete lineage sorting of nuclear RanaCYL1. Further studies focused on this issue may be conducted using additional markers from nuclear genome.
In the Bayesian phylogram, many major clades of Ranunculaceae had a very short branch length, including the Asteropyrum + Caltha clade, Cimicifugeae + Delphinieae + Nigelleae clade, and Ranunculeae to Nigelleae clade (Fig. 4). This indicated ancient lineage radiation of the subfamily Ranunculoideae, which was also proposed by Zhai et al. 15 Thus, it is not surprising that previous molecular phylogenetic analyses using limited molecular markers failed to resolve the phylogenetic framework of Ranunculaceae. Further phylogenomic studies using additional evidence, such as mitochondrial and nuclear genomic data, may help to deeper our understanding of the evolution of Ranunculaceae.

conclusion
The two subspecies of Asteropyrum carried quite different plastid genomes with different IR-SC borders and much sequence variation. The plastome sequence of Asteropyrum has the longest IR regions in the family. Unlike Adonis and species from tribe Anemoneae, no gene inversions and transpositions were detected in the Asteropyrum cp genome. The complete cp genome showed excellent suitability for drawing phylogenetic inferences within Ranunculaceae. The complete cp genome sequence, as well as its structural variation (gene inversions-transpositions and IR expansion-contraction), can provide abundant phylogenetic information for the family. In contrast, non-coding regions have excessive variations and a high level of noise, so they are not suitable for resolving the generic relationships within the family. As mentioned by Ma et al., analyses in plastid genomic studies should always be conducted using partitioned datasets, and parsimony analysis often obtains unsatisfactory results 2 . Our plastid phylogenomic inferences, which were obtained using the complete cp genome sequence and Bayesian analysis, provided a better resolved phylogenetic framework for Ranunculaceae in comparison with all the previous studies. Asteropyrum was detected to be closely related to Caltha, and unusual ancient evolutionary processes for Asteropyrum was also suggested by our findings.

Methods
Taxon sampling and sequencing. Fresh leaves were collected from plants from field by the authors (Table 4) and the samples of Nigella and Helleborus were from cultivated plants (seeds were obtained from Mr. Fothergill's Seeds Limited Company). The leaf samples were dried with tell-tale silica-gel. Samples from both subspecies of Asteropyrum were collected from different populations. We also sequenced 25 other species from Ranunculaceae and mined 21 accessions of complete cp genomes (including five outgroups from Ranunculales) from GenBank for comparative analysis. The samples covered 31 genera and represented most tribes of Ranunculaceae, but did not include the basal clades of Trib. Glaucidieae and Trib. Xanthorhizeae 33 (Table 4). For all leaf tissue samples, total genomic DNA was isolated using the CTAB method 54 and assessed by agarose gel electrophoresis.
The total DNA samples of the 27 newly sequenced species were sent to Novogene (http://www.novogene. com, China) for library construction and next-generation sequencing. Short-insert (350 bp) paired-end read library preparation and 2 × 150 bp sequencing were performed on an Illumina (HiSeq4000) genome analyzer platform. Approximately 2-4 Gb of raw data for each species were first filtered using the FASTX-Toolkit to obtain high-quality clean data by removing adaptors and low-quality reads (http://hannonlab.cshl.edu/fastx_toolkit/ download.html). The remaining clean reads (high-quality reads) were sent to the authors for further analysis.
Chloroplast genome assembly and annotation. For the clean reads, BLAT analysis was used to exclude nuclear and mitochondrial reads using published plastid genome sequences from Ranunculaceae as references 55 . Next, de novo assembly was performed using Geneious R11 with a medium-low sensitivity setting 56 to assemble plastid genome sequences. For most of the tested samples, only one contig (approximately 130 kb) was obtained by de novo assembly. If more than one smaller contig was obtained, the whole-genome reads were mapped to those contigs using the Fine Tuning program in Geneious R11 (iterating up to 100 times) to fill gaps. Contigs were connected by overlapping their terminal sequences using the Repeat Finder program implemented in Geneious R11. Sanger sequencing was also used to bridge gaps when necessary. When a 130 kb contig (including SSC, IR, and LSC) was built for each sample, the IR region was determined using the Repeat Finder program, after which the IR region was inverted and copied manually to construct the complete cp genome sequence. Because great variation was found at the IR-SC boundaries in the two Asteropyrum plastomes, we subjected these regions to Sanger sequencing. Broader population sampling of the two Asteropyrum subspecies, as well as an intermediate individual, was applied to assess the stability of the variation at IR-SC boundaries using Sanger sequencing (Supplementary Tables S4 and S5).
Complete plastid genomes were annotated using the Unix program Plann 1.1.2 57 and manually verified using Geneious Annotate R11 and the online program Blast 58 . The cp genome sequences and annotations were uploaded to GenBank using Bankit (https://www.ncbi.nlm.nih.gov/books/NBK63590/). Accession numbers are shown in Table 4. Illustrations of all the newly sequenced plastomes were obtained using the Organellar Genome DRAW tool 59 . (2019) 9:15285 | https://doi.org/10.1038/s41598-019-51601-2 www.nature.com/scientificreports www.nature.com/scientificreports/ Comparative chloroplast genomic analyses for two subspecies of Asteropyrum and their relatives. The IR/SC boundaries of Asteropyrum and the other tested species were illustrated and compared with other outgroups to address IR expansion-contraction. MAFFT v7.309 60 was used to align the plastome sequences, whereas mVISTA 61 was used to export visual results to allow evaluation of the structural similarity of plastomes. The alignment programs applied in mVISTA were LAGAN, which produces true multiple alignments regardless of whether they contain inversions or not, and Shuffle-LAGAN, which can detect rearrangements and inversions 62,63 . The sequence diversification of the two subspecies of Asteropyrum was assessed by comparing the two aligned sequences. The rate of identical sites was calculated using Geneious R11. plastid phylogenomic analysis. In this study, 48 cp genome sequences, including those of five outgroups from Ranunculales, were aligned for phylogenomic analysis. The gene orders for the cp genome sequences of Tribe Anemoneae and Adonis (in which inversions and/or transposition regions were present) were shuffled in the same order with other Ranunculaceae and outgroup species. The following data sets were applied for phylogenetic reconstruction: complete cp genome sequence (with only one IR region), LSC, SSC, IR, coding regions (CDs), intron, and intergenic spacer regions. All datasets were aligned using MAFFT v7.309 60 . Ambiguous alignments and sites with more than 80% missing data 4 were deleted automatically using a Python script (https:// github.com/HeJian151004/get_homology).
Maximum Parsimony (MP) analysis was conducted for all the seven datasets using PAUP v4.0b10 64 . Characters were treated as unordered and equally weighted, whereas gaps were treated as missing data. Branch-and-Bound or a 1000-replicate heuristic search was applied with simple addition, and tree bisection reconnection branch swapping with MUL-trees was utilized to search the MP tree(s). Statistical support was assessed by 1000 bootstrap replicates with 1000 random taxon addition replicates and 10 trees held at each step.
Maximum likelihood (ML) analyses were carried out with RAxML-HPC2 v8.2.10 65 performed on the online server (https://www.phylo.org/). The GTR + G model was applied for all datasets as suggested by the software instructions (see RAxML manual). The statistical value was inferred using the combined rapid bootstrap method (1000 replicates).
Bayesian inference (BI) was performed with MrBayes v3.2.3 66 using partitioned substitution models tested by PartitionFinder v2.1.1 67 with a minimum subset size of 5000, because excessively parameter-rich models often cause calculation problems in Bayesian analysis and fail to converge 2,50 . The best substitution models and data partition schemes were selected by Akaike information criterion (AIC) 68 . Two parallel independent Markov chain Monte Carlo (MCMC) chains were run, each of which consisted of three hot chains and one cold chain for 5,000,000 generations. The trees were sampled and saved every 100 generations. The MCMC convergence was tested by calculating the standard deviation value of split frequencies (less than 0.01) and by assessing the convergence of the parameter values of the two runs. The first 25% of trees were discarded as burn-in, and the remaining trees were used to generate the consensus tree.

Data availability
The new sequenced plastome sequenced are all deposited at NCBI and seen in Table 4.