The evolutionary fate of rpl32 and rps16 losses in the Euphorbia schimperi (Euphorbiaceae) plastome

Gene transfers from mitochondria and plastids to the nucleus are an important process in the evolution of the eukaryotic cell. Plastid (pt) gene losses have been documented in multiple angiosperm lineages and are often associated with functional transfers to the nucleus or substitutions by duplicated nuclear genes targeted to both the plastid and mitochondrion. The plastid genome sequence of Euphorbia schimperi was assembled and three major genomic changes were detected, the complete loss of rpl32 and pseudogenization of rps16 and infA. The nuclear transcriptome of E. schimperi was sequenced to investigate the transfer/substitution of the rpl32 and rps16 genes to the nucleus. Transfer of plastid-encoded rpl32 to the nucleus was identified previously in three families of Malpighiales, Rhizophoraceae, Salicaceae and Passifloraceae. An E. schimperi transcript of pt SOD-1-RPL32 confirmed that the transfer in Euphorbiaceae is similar to other Malpighiales indicating that it occurred early in the divergence of the order. Ribosomal protein S16 (rps16) is encoded in the plastome in most angiosperms but not in Salicaceae and Passifloraceae. Substitution of the E. schimperi pt rps16 was likely due to a duplication of nuclear-encoded mitochondrial-targeted rps16 resulting in copies dually targeted to the mitochondrion and plastid. Sequences of RPS16-1 and RPS16-2 in the three families of Malpighiales (Salicaceae, Passifloraceae and Euphorbiaceae) have high sequence identity suggesting that the substitution event dates to the early divergence within Malpighiales.


Results
General features of Euphorbia schimperi plastome. The Euphorbia schimperi plastome had a length of 159,462 base pairs (bp) with a pair of inverted repeats (IR) of 26,629 bp, which separate the large single copy (LSC, 88,904 bp) and small single copy (SSC, 17,300 bp) regions (Fig. S1, accession number MT900567). Mapping raw reads to the plastome indicated that the average coverage was 1157×. The genome included a total of 128 genes (17 in IR) including 4 rRNAs (all in IR), 30 tRNAs (7 in IR) and 77 protein-coding genes (6 in IR). The plastome of Euphorbia schimperi had three putative gene losses, translation initiation factor 1 (infA), ribosomal protein L32 (rpl32) and ribosomal protein S16 (rps16).
Alignment of the pseudogene of rps16 of E. schimperi with intact rps16 of Manihot esculenta revealed a 5-bp deletion, 10 bp insertion and 27 nucleotide substitutions within exon 2 causing a frameshift (Fig. S2A). A 250 bp deletion, 11 bp insertion and 338 nucleotide substitutions in the intron of E. schimperi caused nearly complete loss of the intron and entire loss of exon 1 (Fig. S2B). In rosid plastomes, infA is usually located between rpl36 and rps8 with length of about 234 bp. The alignment of the infA pseudogene of E. schimperi with intact infA of Brexia madagascariensis revealed a 3-bp deletion, 10 bp insertion and 69 nucleotide substitutions causing a frameshift (Fig. S3). Plastid rpl32, which is usually located between ndhF and trnL-UAG and ranges between 150 and 171 bp, was completely missing from the plastome of E. schimperi.
Assembly of Euphorbia schimperi transcriptome and quality assessment. The sequenced Illumina libraries yielded 80,916,952 reads. The total reads used, number of assembled contigs and N50 statistics are in Table 1. Mapped read coverage to the assembly using Bowtie2 65 was 90.26% (73,033,718 reads). BUSCO indicated that the transcriptome assembly covered 87% and 72.3% of conserved single-orthologs of 100 species of eukaryotes (BUSCOs: 303) and 30 species of embryophytes (BUSCOs: 1440), respectively. Amino acid sequences of the candidate ORFs from the E. schimperi transcriptome were used to identify pt rpl32 and rps16 genes in the nucleus. Statistics of Trinity translated transcriptome assembly are provided in Supplementary Table S1 2). Two transcripts of nuclear-encoded RPS16 were detected in the E. schimperi transcriptome. Pairwise aa sequence identity of the two E. schimperi transcripts (RPS16-1 and RPS16-2) with other Euphorbiaceae, Arabidopsis thaliana and Nicotiana tabacum was 77.2% and 77.5% (Fig. 3A,B). The lengths of nuclear-encoded RPS16-1 and RPS16-2 in E. schimperi were 134 aa and 111 aa, respectively, and both were longer than the plastid-encoded RPS16 in M. esculenta (88 aa), R. communis (50 aa), H. brasiliensis (88 aa), Arabidopsis thaliana (79 aa) and Nicotiana tabacum (85 aa) (Fig. 3A,B). Alignment of the upstream sequences of RPS16-1 and RPS16-2 of E. schimperi to the transit peptides of RPS16-1 and RPS16-2 of Passiflora pittieri, P. tenuiloba and Populus alba resulted in aa identities of 98% and 81.4%, respectively (Fig. 4A,B). A BLAST search (BLASTp) of RPS16-1 against NCBI resulted in sequence identity matches with chloroplastic/mitochondrial 30S ribosomal protein S16-1 for multiple species of Malpighiales including Euphorbiaceae [Manihot esculenta (80%), Hevea brasiliensis (79%), Jatropha curcas (63%)], Passifloraceae [Passiflora tenuiloba (78.12%), P. oerstedii (78%), P. pittieri (77.44%)] and other angiosperm lineages, whereas RPS16-2 matched with chloroplastic/mitochondrial 30S ribosomal protein S16-2  Pairwise aa sequence identity of nuclear-encoded RPL32 from E. schimperi, Populus alba, Passiflora (P. biflora, P. contracta, P. oerstedii and P. pittieri) and nuclear-encoded SOD-1 of Populus alba was 59.4% and 88.9% for transit peptide and the ribosomal protein L32 conserved domain, respectively. Gaps are indicated by dashes.  Table S2). The results indicated that five genera of Euphorbiaceae have representative species that lost rps16, Deutzianthus, Euphorbia, Jatropha, Mallotus and Vernicia, and one genus (Euphorbia) also lost rpl32 (Fig. 5). Since all Euphorbia plastomes do not have intact rpl32 or rps16 these losses likely occurred during the early divergence of the genus. However, the presence of these genes in the nucleus by either a transfer or substitution event has only been documented in E. schimperi. Some members of other Malpighiales families have experienced the loss of rpl32 and/or rps16 from their plastomes, including Passifloraceae, Salicaceae, Violaceae, Erythoxylaceae and Rhizophoraceae, whereas Chrysobalanaceae, Irvingiaceae, Malpighiaceae and Linaceae are missing only rps16 and Clusiaceae is missing only rpl32. The fate of these gene losses has only been determined in Passiflora and Populus with rpl32 transferred to the nucleus and rps16 substituted in selected species in both genera ( Fig. 5) 11,14,22 . Phylogenetic analysis of the second data set included sequences of the rpl32 gene for 71 species, 65 encoded in the plastid and six in the nucleus. In the resulting phylogram ( Fig. 6) the nuclear copy of E. schimperi grouped with nuclear copies from the three families Salicaceae (Populus alba), Passifloraceae (Passiflora tenuiloba) and Rhizophoraceae (Bruguiera gymnorhiza). The four species with genes encoded in the nucleus were in a clade   www.nature.com/scientificreports/ of plastid copies from three species in different families of Malpighiales (Euphorbiaceae, Irvingiaceae, Phyllanthaceae) and Cucurbita in the Cucurbitales. All other plastid encoded rpl32 sequences of Malpighiales occurred in a separate clade that was sister to the clade the included the nuclear-encoded copies. Branch lengths of the nuclear-encoded rpl32 genes from E. schimperi, Populus, Passiflora and Bruguiera were much longer than in the plastid-encoded copies. Phylogenetic analysis of the third data set included sequences of the rps16 gene for 63 species, 55 encoded in the plastid and 8 copies of 4 nuclear genes. In the resulting phylogram ( Fig. 7) the nuclear copies of E. schimperi grouped with nuclear copies from the two families Salicaceae (Populus alba), Passifloraceae (P. tenuiloba and P. pittieri) in a distant position from plastid encoded copies of Euphorbiaceae. The nuclear-encoded copies resolved as a clade at the base of the eudicots, far removed from plastid-encoded copies of Malpighiales. The long branches of the nuclear-encoded copies of rpl32 and rps16 are indicative of the faster substitution rates observed in the nuclear than the plastid genomes 66 . There is some uncertainty in the resolution of the clades due to poor support (bootstrap percentages mostly < 60) among deeper nodes and the artifacts due to long branch attraction.

Discussion
Pseudogenization or loss of plastid genes is often accompanied by the transfer of the gene to the nuclear genome or substitution by a nuclear gene that is already targeted to the plastid 27,67 . Plastid gene loss and transfer to the nucleus or substitution by a dual targeted nuclear gene targeted to organelles has been recorded for several genes across multiple angiosperm lineages [4][5][6][7][8][9][10][11][12][13][14][15][16][17] . The focus of this study was to use transcriptome data to bioinformatically identify the fate of two plastid genes, rpl32 and rps16, in Euphorbia schimperi that have either been lost or pseudogenized. Plastome sequences of Malpighiales have documented the loss of rpl32 and rps16 but the fate of these two genes in most families has not been examined. In this study, the phylogenetic distribution of the loss/ transfer/substitution of these genes across the Malpighiales was examined.
The transfer of plastid-encoded rpl32 to the nucleus has been identified previously in three families of Malpighiales, namely Rhizophoraceae, Salicaceae and Passifloraceae 10,11,14 . Cusack and Wolfe 10 identified the duplication of a nuclear chimeric gene (pt SOD-1-RPL32 fusion protein and pt SOD-1 protein) that is associated with the transfer of rpl32 in Populus. Ueda et al. 11 experimentally confirmed the functional transfer of the rpl32 gene from the plastid to the nucleus and showed that the pt SOD-1-RPL32 fusion protein is targeted to the plastid of Populus by using green fluorescent protein (GFP). Shrestha et al. 14 identified high sequence identity of the pt SOD-1-RPL32 fusion protein to pt SOD-1-RPL32 transcript of Populus by mapping to the transcriptome of Passiflora. In the current study, Euphorbia schimperi and other Euphorbia species available at NCBI have experienced loss of the rpl32 gene but no previous studies have been performed to determine the fate of this gene. www.nature.com/scientificreports/ An E. schimperi transcript that represents pt SOD-1-RPL32 has been identified confirming that the transfer in Euphorbiaceae is similar to three other families (Rhizophoraceae, Salicaceae and Passifloraceae) of Malpighiales. Since these four families share a high sequence similarity of the transit peptide derived from pt sod-1 and the loss from the plastome is widespread in order, the timing of the transfer event may date to the early divergence of this clade (Fig. 5). The other families of Malpighiales have not been examined but they may also have nuclear encoded copies of rpl32. If this is the case, the plastid-encoded copies in some Malpighiales may not have been pseudogenized or lost yet in these families. Additional sampling of transcriptomes of other families of Malpighiales is needed to more accurately determine the timing of the rpl32 transfer to the nucleus. Ranunculaceae (Thalictrum coreanum and Aquilegia caerulea) experienced an independent transfer of plastid-encoded rpl32 to the nucleus because its transit peptide sequence is substantially different from the distantly related families of Malpighiales (Fig. 6) 12 .
The substitution of the plastid-encoded rps16 by a dual targeted nuclear-encoded mitochondrial gene was identified previously in two families of Malpighiales, Salicaceae and Passifloraceae 14,22 . In Populus alba (Salicaceae), Ueda et al. 22 experimentally localized the two nuclear-encoded rps16 genes that are dually targeted to the plastid and mitochondrion using GFP. A similar substitution occurs in the monocot Oryza sativa and eudicot Arabidopsis thaliana 22 . In addition, bioinformatic comparisons in Passifloraceae identified RPS16-1 and RPS16-2 in the transcriptome, with one targeted to the plastid and the other to the mitochondrion. Euphorbia schimperi plastome sequences and other Euphorbia species available at NCBI have experienced pseudogenization of rps16 gene and no previous studies have elucidated the fate of plastid-encoded rps16 loss in the genus. In Euphorbiaceae three species of subfamily Crotonoideae, Jatropha carcus 23 , Deutzianthus tonkinensis 25 and Vernicia fordii 24 , are missing plastid-encoded rps16. In most cases the loss of rps16 is associated with a deletion in the intergenic spacer between trnK-UUU and trnQ-UUG or an inversion in the same region 23 . The loss of rps16 in the gymnosperm Keteleeria davidiana, the monocot Dioscorea elephantipes and eudicots (Aethionema cordifolium, Aethionema grandiflorum, Arabis hirsuta, Draba nemorosa, Lobularia maritima, Populus alba, Populus trichocarpa, Cuscuta gronovii, Cuscuta exaltata and Epifagus virginana) is also the result of deletion in the same region 23 . In Jatropha curcas the loss of rps16 gene is associated with a 1.3 kb deletion in the trnK-trnQ intergenic region 23 . Likewise, Euphorbia schimperi has a deletion of 0.5 kb in the same intergenic region (Fig. S2A,B).
The situation in Euphorbia schimperi is similar to Salicaceae and Passifloraceae with two copies, RPS16-1 and RPS16-2, in the nucleus (Fig. 7). Since the three families (Salicaceae, Passifloraceae and Euphorbiaceae) of www.nature.com/scientificreports/ Malpighiales share a high sequence identity of the RPS16-1 and RPS16-2, the gene substitution likely occurred early in the divergence of Malpighiales (Fig. 5), although comparisons of plastomes and transcriptomes of other families in the order are needed to confirm the timing. Some taxa in these families still retain an intact copy of rps16 in the plastome but it is not known if these are functional or if they simply have not been lost or pseudogenized yet.

Conclusion
The sequence of Euphorbia schimperi expands the understanding of the evolution of plastomes within Malpighiales. Gene order of E. schimperi is highly conserved with the typical structure of the angiosperm plastomes. The only unusual feature of the E. schimperi are gene-content changes with the loss of rpl32, rps16 and infA. Screening the nuclear transcriptome of E. schimperi shows that two of these genes have been either transferred to the nucleus or substituted by a duplicated nuclear-encoded mitochondrially-targeted gene. The fate of infA was not determined because a nuclear-encoded copy was not found in the transcriptome. Comparisons of the nuclear copies of rpl32 and rps16 genes of E. schimperi to members of other families of Malpighiales (Salicaceae and Passifloraceae) suggest that the transfer or substitution events in Euphorbiaceae may have occurred early in the divergence of this order.  70 . De novo assembly with default settings was run to generate long contigs that represented the entire plastome. Putative plastid contigs were identified using the Euphorbia esula plastome as a reference in Geneious. A gap between ycf1 and ndhF was filled by overlapping contigs. The second gap between atpH and atpF and ambiguous nucleotides were filled by mapping contigs against the raw reads using Bowtie 2 version 2.3.4 65 .

Materials and methods
Annotation of the plastome was conducted using multiple software platforms. Geneious was used to check for start and stop codons for every gene compared to Nicotiana tabacum (NC_001879.2) and species of Euphorbiaceae available publicly, including Jatropha curcas (NC_012224.1), Euphorbia esula (NC_033910.1), Manihot esculenta (NC_010433.1), Ricinus communis (NC_016736.1) and Hevea brasiliensis (NC_015308.1). Dual Organellar Genome Annotator (DOGMA) was utilized to identify coding sequences with default settings 71 . The tRNAscan-SE online search server was used to confirm tRNA genes 72,73 . Based on the loss of portions of the sequence or presence of internal stop codons, pseudogenes were identified. A genome map was drawn using OGDRAW 74 .
Transcriptome de novo assembly. Standard RNA-Seq with ribosomal RNA removal library preparation and sequencing via Illumina HiSeq 4000 were carried out at GSAF. The quality of raw FastQC reads was examined using the FastQC tool v.0.11.5 (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/) 75 . Raw RNA-seq data was not subjected to quality trimming. A de novo assembly of RNA-seq reads into transcripts was performed using Trinity 76 with 25 k-mer size 77 . Trinity sequentially integrates Inchworm, Chrysalis and Butterfly modules to process a large number of RNA-Seq reads. This has been used to partition the sequence data into different individual de Bruijn graphs, which represent the transcriptional complexity at a given gene or locus 77  Putative transit peptides and a mitochondrial targeting peptide of nuclear transferred genes were identified using TargetP-1.1 80 (http:// www. cbs. dtu. dk/ servi ces/ Targe tP-1. 1/ index. php) and LOCALIZER 81 (http:// local izer. csiro. au/). To detect the source of the transit peptide for the nuclear-encoded RPL32 and RPS16, BLAST searches (BLASTp) were conducted against the NCBI database.