Plastome evolution and organisation in the Hoya group (Apocynaceae)

The genus Hoya is highly diverse and many of its species are popular ornamental plants. However, the relationships between Hoya and related genera (the Hoya group) are not fully resolved. In this study, we report 20 newly sequenced plastomes of species in the Hoya group. The complete plastomes vary in length from 175,405 to 178,525 bp while the LSCs vary from 90,248 to 92,364 bp and the complete SSCs vary from 2,285 to 2,304 bp, making the SSC in the Hoya group one of the shortest known in the angiosperms. The plastome structure in the Hoya group is characterised by a massive increase in the size of the inverted repeats as compared to the outgroups. In all ingroup species, the IR/SSC boundary moved from ycf1 to ndhF while this was not observed in outgroup taxa, making it a synapomorphy for the Hoya group. We have also assembled the mitogenome of Hoya lithophytica, which, at 718,734 bp, is the longest reported in the family. The phylogenetic analysis using exons from 42 taxa in the Hoya group and three outgoups confirms that the earliest divergent genus in the Hoya group is Papuahoya, followed by Dischidia. The relationship between Dischidia and the clade which includes all Hoya and Oreosparte taxa, is not fully supported. Oreosparte is nested in Hoya making it paraphyletic unless Clemensiella is recognised as a separate genus.


Results
Plastome structure in the Hoya group. We acquired complete plastomes for ten species in the ingroup and two in the outgroup. For a further ten species in the ingroup, we acquired near complete plastomes, with 1-6 gaps in mononucleotide regions with low coverage (Table 1). For the remaining 19 species (18 ingroup, one outgroup), all targeted exons were acquired.
We observed that species in the Hoya group have a very large copy number of plastomes per cell: 3.55-21.70% of all sequencing reads mapped to the plastome ( Table 1) 29 .
The plastome structure in all species in the Hoya group ( Fig. 1) is characterised by a massive increase in the size of inverted repeats as compared to the outgroups; the IR/SSC boundary moved from ycf1 to ndhF (Fig. 2). The outgroup species that is most closely related to the Hoya group, Marsdenia flavescens A.Cunn. ex Hook., lacks this boundary change, but it is characterised by a smaller change in the IR/LSC boundary (loss of rps19 from IR to LSC).
The nucleotide diversity (Pi) in the ingroup varies from 0 to 0.0433 (Fig. 3). The IRs were consistently less variable than the other parts of the plastomes, except for one highly variable gene (ycf1). One gene in the LSC (accD) and the region at IR-SSC boundaries (near ndhF) were similarly variable. ycf1 and accD were characterised by long aminoacid repeats.
Mitogenome structure. The mitogenome structure of Hoya lithophytica Kidyoo (Fig. 4) shows massive restructuring in relation to the other complete mitogenomes available in Apocynaceae (Fig. 5). At 718,734 bp, it is the longest mitogenome reported in the family. Movement of plastome DNA to mitogenome explains at least 56,698 bp (7.889%) of the mitogenome.
Phylogenetic analysis. The model choice in MrBayes had no effect on the tree topology, and only a minor effect on the node values. The two model options tested resulted in the same topology and highly similar BPP values (Bayesian Posterior Probability), differing at most by 0.05 for any node; both runs passed our quality control. The values indicated in the next paragraph and shown in the molecular phylogeny presented in Fig. 6 were acquired using GTR + Gamma.
To facilitate comparison, for the Hoya clades we refer to the clade names of Wanntorp et al. 10     The inverted repeat closest to psbA was removed, and the small single copy is displayed in a direction that best illustrates the shift in inverted repeat boundaries. The alignment colours refer to locally collinear blocks shared between plastomes. The extent of the inverted repeat is shown with a bar.

Discussion
With the exception of Wei et al. 26 , previously published work on plastomes of the Hoya group only resulted in incomplete plastomes 16,30 or incorrectly assembled plastomes 25 . This is not surprising, as assembling plastomes in this group is very challenging: our attempts at automated sequence assembly only resulted in small fragments, which often incorporated mitogenome sequences and required extensive manual corrections. Likely causes of the difficulty in assembling the genomes are the extensive expansion of IRs, and the near-loss of SSC, the large amount of sequences shared by the plastome and the mitogenome and the frequent repetitive elements found in the plastomes.
In our experience, the fastest method of genome assembly is assembly of reads to a reference, followed by manual correction. A highly time-consuming process of manual checking of the entire alignment followed all initial assemblies. As the order of the genes in the plastomes and the placement of IRs was highly conserved in all assembled plastomes, we did not assemble the sequences of the remaining 20 species we sequenced.
The large copy number of plastomes per cell that we observed is not uncommon in Apocynaceae. Similarly high proportions of plastome reads (11.8%) has been reported in Asclepias 11 . This is much higher than in most angiosperms, where c. 1% of sequencing reads mapping to plastomes is common (pers. obs.). The high copy number helps in part to reduce assembly issues derived from the highly repetitive intergenic parts of the Hoya plastome; however, routine assembly of Hoya plastomes from short sequencing reads is likely to remain challenging. Even with the very high coverage that we attained, some low GC content intergenic regions had very low or even zero sequence coverage, leading to gaps in some of our assemblies. We think this was likely due to heavy degradation of the plastome, which may have occurred as leaves age. Use of younger leaf tissue might help to avoid this issue.
The assembly of the mitogenome was even more time consuming, as iterative extension of sequences to bridge gaps was soon interrupted by presence of plastome sequences. Assembling reads from other species to the already assembled mitogenome of H. lithophytica did not help much, suggesting that there is a high level of instability of mitogenomes within Hoya.
The gene order and overall architecture of the ingroup samples is highly similar to that reported by Wei et al. 26 in Hoya carnosa, but all our assemblies differed significantly from those reported by Tan et al. 25 in H. verticillata and H. diversifolia. We have reported CDSs and/or exons omitted by Wei et al. 26 , specifically accD, ndhD, ndhH, ycf2 and ycf15, as a corresponding open reading frame was present. Tan et al. 25 also omitted the CDSs of accD, ndhH, ycf1, and ycf2.
The dramatic IR-LSC boundary shift reported by Wei et al. 26 is shared by the entire Hoya group, including Papuahoya, whose plastome is strongly supported to have diverged before all other ingroup taxa. Since all  www.nature.com/scientificreports/ ingroup taxa included this boundary change which is not seen in any of the outgroups used, it can be considered a synapomorphy for the Hoya group. The boundary shift was not reported by Tan et al. 25 , but we believe that this was in error. Our study includes conspecific sequences to those they reported, and in our analyses, they clearly shared the structure with the other Hoya group species. The two species are deeply nested in the phylogeny of the Hoya group.
The genome structure of Hoya has some parallels to other Asclepiads. As reported for Asclepias 13 the intergenic regions of Hoya have long repeated regions of very low GC content. These regions make it difficult to map reads of Hoya even to a closely related species, and undoubtedly offer a challenge in use of intergenic reads. However, the boundary-shift observed is unique to the Hoya group.
Outside of Apocynaceae, there are clear parallels between the plastome restructuring in the Hoya group and that of Lamprocapnos spectabilis (L.) Fukuhara (Papaveraceae). Park et al. 29 reported the extension of the IR/SSC boundary from ycf1 (outgroup) to ndhF, and Lamprocapnos also has AARs in accD and ycf1 (however, this was not mentioned in ycf2). Unlike in Lamprocapnos, no additional inversions or other changes to gene order were     , ycf1 and ycf2). Numbers at the nodes indicate bootstrap percentages followed by Bayesian Posterior Probability (only indicated when not fully supported). www.nature.com/scientificreports/ seen in our study in relation to the outgroup used. The SSC observed in Lamprocapnos is the smallest known in any plant, but only slightly smaller than that observed in species in the Hoya group. The phylogenetic tree obtained is congruent with the most recent phylogeny of the Hoya group 7 , with the recognition of the monophyletic genera Papuahoya, Oreosparte and Dischidia that are fully supported.
Based on Rodda et al. 7 the relationships between Oreosparte, Dischidia and Hoya as well as Group 1 and numerous smaller clades within Hoya are not fully supported (their Fig. 4). Our analysis (Fig. 6) confirms that the earliest divergent genus in the Hoya group is Papuahoya, followed by Dischidia. Dischidia is sister to a clade which is not fully supported (77% BS, 1 BPP) including all species currently attributed to Hoya and Oreosparte. Species of Hoya were segregated in two clades in Rodda et al. 7 , one (their Clade I) including four species from continental Asia (100% BS) the other containing the rest of the species (80% BS). Basal clades in Group 1 of Hoya were unsupported (53-69% BS, their Fig. 4). In our analysis instead their Clade I is deeply nested in Hoya s.s., while Hoya is still separated into two clades, the first (Clade II) including species formerly included in Eriostemma + Clemensiella (100% BS, 1 BPP) is sister to Oreosparte + Hoya s.s., the second (Hoya s.s., 99% BS, 1 BPP) is sister to Oreosparte. Clade II can therefore be tentatively classified under the already available genus name Clemensiella, here represented by C. omlorii. Hoya coronaria and H. gigas have also been alternatively classified in the genus Eriostemma (type species: Hoya coronaria), which could now be considered a synonym of Clemensiella. This clade also includes H. lithophytica, a rock dwelling species from NW Thailand. The four species in this clade are characterised by terrestrial (or hemi epiphytic) climbing habit and by having pollinia without pellucid margins. These characters are unique to this clade among the species we sampled here, but not unique in the genus as other species can be terrestrial and lack pellucid margins of the pollinium (e.g. Hoya surisana Rodda & S.Rahayu). The second Hoya clade that includes the type of the genus H. carnosa is to be considered as Hoya s.s.
Based on our results either Hoya needs to be separated in two genera, Hoya and Clemensiella, or Hoya needs to be more broadly circumscribed to also include species currently in Oreosparte. In this latter scenario Clemensiella, Oreosparte and Clade I (Group 1 and 2) of Hoya may be allocated to subgeneric rank.
Our sampling of the Clemensiella clade is limited and samples of more taxa are needed to verify whether Clemensiella and Eriostemma should be kept separated (either at generic or subgeneric level).
Before making any nomenclatural changes, a more extensive phylogeny should be generated including extensive nuclear data to verify that the topology is congruent and that the observed clades are supported.

Materials and methods
We sequenced 38 species in the Hoya group, and three outgroups (in Marsdenia R.Br. s.l., and Jasminanthes Blume). Outgroups were selected due to their known position as outgroups of Hoya group (Rodda et al. 7 ). The ingroups were selected to represent all the genera of the Hoya group where material is available (Hoya, Dischidia, Oreosparte, Papuahoya). Within Hoya we included at least one sample for each of the main Hoya clades (clades I to VI) of Wanntorp et al. 10 and Rodda et al. 7 . For Dischidia we included eight taxa that represent the morphological variation of the genus, including Dischidia parasita (Blanco) Arshed, Agoo & Rodda, the type of the synonymous genus Dischidiopsis Schltr. Oreosparte and Papuahoya are represented by two species each.
Plant materials and DNA extraction. The leaves were collected from plants cultivated at the Singapore Botanic Gardens or obtained from herbarium specimens. All plant specimens used for this study were collected to the best of our knowledge in compliance with local, institutional, national, or international regulations at the time of collection. All newly prepared voucher specimens were deposited in the Singapore Botanic Gardens Herbarium (SING). Their information is summarised in Table 2. The herbaria acronyms follow Thiers 31 .
Fresh, silica-dried or herbarium leaf samples were extracted using DNeasy Plant Mini Kit (Qiagen Inc., Valencia, California, U.S.A.). A minimum of 400 ng of total genomic DNA was sent for library preparation and genome skimming sequencing using Illumina HiSeq (AITbiosciences, Singapore). A minimum of 1 Gbp of sequence with a read length of 100 bp were acquired per sample. Sequence quality filtering was done with Geneious 11.1.2 (Biomatters Ltd, New Zealand) trim and filter function, using error probability limit of 0.05, a minimum read length of 70 and removing adapters with a minimum blast alignment score of 16.  www.nature.com/scientificreports/ Sequencing, assembly and annotation. The plastome of Hoya lithophytica was assembled first, using a combination of GetOrganelle 32 , and assemblies to several reference genomes in Geneious 11.1.2 and ORG.asm 33 without a reference or using a variety of Apocynaceae plastomes as reference. The automated assemblies had a large number of artefacts, mostly due to frequent gene movement between the plastome and the mitogenome. The assemblies were checked by assembling sequencing reads to the initial assembled genomes in Geneious 11.1.2, followed by visual correction of alignment, and extension of gaps using iterative assemblies. The approximate length of the inverted repeat was estimated by observing the part of the genome with high sequencing coverage, and areas of exceptionally low coverage were identified as mitogenome sequences; the position of the inverted repeats was approximately corrected, the mitogenome reads were removed from the assemblies, and further iterative gap filling was carried out, resulting in a full circular plastome. Twenty one plastomes (19 in the Hoya group and two outgroup taxa) were assembled to Hoya lithophytica in Geneious 11.1.2, with a manual correction of alignment and gaps (with iterative extension of gaps when required), followed by further assemblies to the resulting genome to detect and correct errors. In a few cases, one difficult to sequence region (intergenic region psbA-trnH) was acquired through Sanger Sequencing (AITbiosciences, Singapore). Other gaps were not corrected if present.
For 20 further species (19 in the Hoya group and one outgroup taxon), the assembly of the entire plastome was not attempted, and only exons were assembled by aligning them to the reference.
Final circular plastomes were checked by re-mapping the filtered reads to the plastome using Geneious 11.1.2 (Biomatters Ltd, New Zealand) read mapper using low sensitivity, adjusted to not allowing gaps, and alignments were visually inspected for errors and gaps.
Sequences were annotated by transferring annotation from the published plastome of Hoya liangii (a synonym of Hoya diversifolia) (GenBank accession number NC_042245 25 ), followed by correction of position of CDSs. Sliding window analysis was conducted to generate the nucleotide diversity (Pi) of complete ingroup plastomes. The plastomes were aligned using MAFFT v.7.309 34 , using scoring matrix 200PAM / k = 2, gap open penalty of 1.53 and offset value of 0.123. The resulting alignment was analysed using DnaSP v. 6.12.03 35 to compare levels of nucleotide variation across the plastomes.
The mitogenome of Hoya lithophytica was constructed by filtering sequence reads that completely matched the plastome, and mapping the remaining reads to the mitogenome of Asclepias syriaca (KF541337). While parts of the mitogenome were identical to the plastome, enough reads with a single read error were present to cover all parts of the mitogenome for unambiguous assembly. Only small fragments initially matched the reference genome. The other parts of the mitogenome were assembled by iterative mapping and by identifying the boundaries of mitogenome/plastome overlap by mapping reads to the plastome. Attempts to construct further mitogenomes were abandoned once we identified massive restructuring of the mitogenome even within the ingroup.
Gene movement from plastome to the mitogenome was estimated by cutting the plastome of Hoya lithophytica into 30 bp fragments, and measuring the percentage of resulting fragments that mapped to the mitogenome, using the Geneious mapper in Geneious 11.1.2 (Biomatters Ltd, New Zealand).
Baysian support for the nodes was tested with MrBayes 3.2.5 40 . We used 30,000,000 Markov chain Monte Carlo iterations, keeping one tree every 100 generations, with a burn-in of 25% (mcmc ngen = 30,000,000 samplefreq = 100 burnin = 75,000). We used the exon-partitioned sequence alignments generated for the IQ-TREE 2.0.6 38 , and applied the same model, GTR + Gamma (lset nst = 6 rates = gamma) for all partitions, with rates unlinked between datasets (unlink statefreq = (all) revmat = (all) shape = (all) pinvar = (all), prset applyto = (all) ratepr = variable). We accepted the results if the likelihoods had converged and the minimum estimated sample www.nature.com/scientificreports/ size was over 100 for all parameters by the end of the run. To test the effect of the model used, we also ran the same analysis with all substitution rates set to equal and equal rate variation (lset nst = 1 rates = equal).
Data archiving statement. Raw, demultiplexed sequence reads are available at the Sequence Read Archive (https:// www. ncbi. nlm. nih. gov/ sra) and can be accessed with the BioProject IDs listed in Table 2. The complete or incomplete plastome sequence data of the 38 species sequenced as well as the complete mitochondrial genome of H. lithophytica obtained for this study have been deposited to the GenBank of NCBI (see Table 2 for accession numbers). The sequence alignment is available in Figshare at https:// doi. org/ 10. 6084/ m9. figsh are. 14189 021.