Reconstructing the complex evolutionary history of mobile plasmids in red algal genomes

The integration of foreign DNA into algal and plant plastid genomes is a rare event, with only a few known examples of horizontal gene transfer (HGT). Plasmids, which are well-studied drivers of HGT in prokaryotes, have been reported previously in red algae (Rhodophyta). However, the distribution of these mobile DNA elements and their sites of integration into the plastid (ptDNA), mitochondrial (mtDNA), and nuclear genomes of Rhodophyta remain unknown. Here we reconstructed the complex evolutionary history of plasmid-derived DNAs in red algae. Comparative analysis of 21 rhodophyte ptDNAs, including new genome data for 5 species, turned up 22 plasmid-derived open reading frames (ORFs) that showed syntenic and copy number variation among species, but were conserved within different individuals in three lineages. Several plasmid-derived homologs were found not only in ptDNA but also in mtDNA and in the nuclear genome of green plants, stramenopiles, and rhizarians. Phylogenetic and plasmid-derived ORF analyses showed that the majority of plasmid DNAs originated within red algae, whereas others were derived from cyanobacteria, other bacteria, and viruses. Our results elucidate the evolution of plasmid DNAs in red algae and suggest that they spread as parasitic genetic elements. This hypothesis is consistent with their sporadic distribution within Rhodophyta.

Scientific Reports | 6:23744 | DOI: 10.1038/srep23744
The ML tree inferred from the concatenated dataset of 193 plastid protein-coding genes (Supplementary Table S3; Table S4) resolved phylogenetic relationships among red algae (Fig. 1A, Supplementary Fig. S6). The early diverging Cyanidiophyceae was chosen as the outgroup for this phylogeny 62,63. The Bangiophyceae and the Florideophyceae grouped together with maximum ML bootstrap support (MLB, 100%), and each class formed a strongly supported monophyletic clade, as previously reported [62][63][64][65]. Within the Bangiophyceae, Porphyra pulchra grouped within the Pyropia clade (100% MLB) rather than the Porphyra clade, suggesting a taxonomic revision of Porphyra pulchra as Pyropia pulchra. Relationships within the Florideophyceae were consistent with previous work [64][65][66]. For example, two Corallinophycidae species, Sporolithon durum (Sporolithales) and Calliarthron tuberculosum (Corallinales), grouped together (100% MLB) and were sister to the rest of the florideophycean clades. Within the subclass Rhodymeniophycidae, Chondrus crispus (Gigartinales) diverged first, followed by Gelidium (Gelidiales), Grateloupia taiwanensis (Halymeniales) and Gracilaria (Gracilariales). Although internal relationships within the Rhodymeniophycidae were not resolved with the concatenated plastid dataset, we used this ML tree (Fig. 1A) as a reference for inferring the evolution of red algal plasmid DNAs.
Distribution of plasmid-derived genes in red algal ptDNA. We identified 22 plasmid-derived (PD) sequences in nine red algal species when 56 red algal plasmid-encoded proteins were used to query the available 21 red algal plastid genomes (using BLASTx, e-value ≤ 1.0e−05) (GI numbers of the 56 proteins are listed in Table 1). The putative origin, copy number, and distribution in the ptDNAs were different for each ORF (Fig. 1B; Supplementary Table S5).
In addition to the previously reported bacterial operon leuC and leuD gene 35,36 (two black blocks in Fig. 1B), out of the 22 PD orfs (including pseudogenized regions) identified here, six were homologous to orf4 and orf5 of the Porphyra pulchra plasmid Pp6859 (GI: 11466614; green region in Fig. 1B), five were homologous to the P. pulchra plasmid Pp6427 (GI: 11466608) orf3 (dark green region in Fig. 1B), and two were homologous to the P. pulchra plasmid Pp6859 orf6 (bright orange region in Fig. 1B). The rest of the PD orfs were unique to plasmids in their species of origin. Interestingly, six homologous PD sequences from Pp6859 orf4 and orf5 (green box in Fig. 1B) were found in four red algal plastid genomes but their copy number and position were not consistent with their phylogenetic relationships. For instance, two copies of the Pp6859 orf4-orf5 homolog were found in Pyropia haitanensis among eight Porphyra/Pyropia species, whereas a single copy was found in each Gelidium species, but at different locations. Sporolithon durum contained two homologous copies but one was pseudogenized. The sequences homologous to plasmid Pp6427 orf3 of P. pulchra 39 were found in the plastid genomes of three Gracilaria species and Grateloupia taiwanensis (dark-green in Fig. 1B) in addition to that of P. pulchra, and were located near ribosomal RNAs and ycf27 genes. We note that half of the PD orfs were positioned near rRNA (rps6-rRNA-ycf27-psbD, see Fig. 1B), in particular in Gelidium, Grateloupia, and Gracilaria.
We tested whether these PD orfs were conserved in populations within a species and in different individuals within a population. To this end, PCR was used to test three populations of G. elegans (SKKU18, SKKU22, SKKU28), two individuals of P. pulchra selected from a single population (UC1879714 and UC1454976), and three individuals of S. durum from a single population (SKKU_SD01, SKKU_SD02, and SKKU_SD03; Supplementary Table S6). All tested PD orfs were found in the same position with the same flanking region sequences. Therefore, these PD orfs are conserved across different individuals within one species.
Origin of the plasmid-derived Pp6859 orf4-orf5 homologs in ptDNA. The origin of plasmid-derived orfs was difficult to determine because most plastid-encoded PD orfs matched only plasmid orf data, except for the following five cases (see Figs 2-4, S8, S9). A BLAST search against the NCBI database using the six plastid homologs of the P. pulchra plasmid Pp6859 orf4-orf5 returned 26 hypothetical proteins encoded in a bacterial genome, cyanobacterial genomes, cyanobacterial plasmids, and the mitochondrial genome of a liverwort. All homologous sequences of Pp6859 orf4-orf5 were used to reconstruct the ML phylogeny using RAxML (Fig. 2). In the best tree, red algal plastid PD orfs grouped together with plasmid Pp6859 (98% MLB). Notably, plasmid genes of Pp6859 (P. pulchra) grouped with pseudogenized plastid genes from P. haitanensis (100% MLB), suggesting a possible plasmid-mediated transfer of the ORF to the plastid genome (see discussion in previous study 40).
The red algal clade was positioned within cyanobacterial clade Group I (92% MLB) that included hypothetical proteins encoded in the cyanobacterial genome as well as cyanobacterial plasmids (Fig. 2). Group II (72% MLB) contained cyanobacterial species and mitochondrial sequences from the liverwort Marchantia polymorpha (combined with two fragmented genes with flanking region data). Moon and Goff 39 reported the putative homologous relationship between Pp6859 and the liverwort mitochondrial region. Two cyanobacterial plasmid genes and a hypothetical gene from the Planctomycetes Zavarzinella formosa were grouped together (Group III, 100% MLB).
Because only 12 species (16 strains; Fig. 2) out of the 100 cyanobacterial genomes available in NCBI contain a homolog of Pp6859 orf4-orf5, it is unlikely to be a core cyanobacterial gene. If this orf was inherited from the primary endosymbiosis event, it should be retained in most red algal plastid genomes as well as those of other primary endosymbiotic lineages (i.e., green and glaucophyte algae). However, it is sporadically distributed in only a few species (e.g., Pyropia, Gelidium and Sporolithon) (Fig. 1). We postulate that this orf originated from an unknown cyanobacterial species, then spread independently to other cyanobacteria, to a bacterium (Z. formosa), to a liverwort (M. polymorpha), and to a few red algae.
The cyanobacterium Crocosphaera watsonii WH8501 contains three copies of this orf as a result of gene duplications 72,73 . However, it is likely that these red algal PD orfs originated independently, as a result of plasmid mobility. Alternatively, a red algal species inherited this orf from a cyanobacterial genome through the plasmid, after which it was transferred into the plastid genome in random genomic positions (e.g., see Fig. 1B), followed by pseudogenization or complete loss. This plasmid-mediated HGT may have occurred after speciation. For example, two Gelidium species both retain PD orfs, but they differ in size and genomic position. Similar cases were found in three Gracilaria species. If indeed the PD orfs were introduced during speciation, the presence and position of PD orfs could be used as species-specific markers.
Because the evolutionary trajectories of plasmid and plastid copies are very different (the former presumably functional and therefore subject to purifying selection, but the latter pseudogenized and under relaxed selective constraint), it is difficult to infer evolutionary relationships, since both rates and types of mutation (synonymous versus nonsynonymous) may be very different depending on the genetic background. We think it is likely that the plasmid orfs are ancestral because they contain complete orfs (405-485 aa), whereas plastids contain pseudogenized genes (up to 190 aa). On the other hand, plastid sequences occur in all the Gracilaria clades; the difference may be due to relaxed purifying selection on the shorter, non-functional (pseudogenized) plastid copies. Pp6427 orf3 homologs were found in the closely related genera Grateloupia, Gracilaria, and Gracilariopsis (multigene phylogeny using mitochondrial genes 43 ), suggesting that an ancestral Pp6427 orf3 of P. pulchra was transferred into the ancestral plastid of these genera and the mitochondrial genome of Palmaria palmata. Some plasmid orfs were duplicated (e.g., Gch7220 orfs and Gch3937 orf1) and fragmented (orf1, orf6, and orf7) within a plasmid (Gch7220). Although the origin of the plasmid-derived sequences is unknown, they may have spread into red algal organelle genomes and subsequently undergone relaxed selective constraint.
Two other plasmid orfs of Pp6427, orf2 and orf4, showed exclusive homology to cyanobacterial and green plant species, respectively. Pp6427 orf2 was homologous to a putative transcriptional regulator protein (GI: 495464247) from the cyanobacterium Moorea producens (e-value: 4e−08) and to other cyanobacterial genes. The red algal plasmid orf2 was likely transferred from cyanobacteria (Fig. 4; MLB 90% in basal clade). The combined region from Pp6427 orf4 (Supplementary Table S7 . Therefore, orfs encoded in the plasmid Pp6427 originated from various sources, and some orfs were subsequently transferred to the red algal plastid and mitochondrial genomes. Both plasmid Pp6859 orf4-orf5 and Pp6427 orf2 were homologous to cyanobacterial orfs, including those from several common species, Calothrix sp. 336/3, Moorea producens 3L, and Rivularia sp. PCC7116. Thus, these two plasmids may have served as reservoirs for orfs from different sources that eventually were delivered to organelles.
Bacterial and viral origins of red algal plasmid ORFs. Bacterial or viral sequences were detected by a BLASTp search of the NCBI (nr) database using 22 PD red algal plastid orf queries (Table S5). The homologous sequence of Gracilariopsis lemaneiformis plasmid GL3.5 orf2 in the Grateloupia taiwanensis plastid genome showed a close phylogenetic relationship with bacterial and viral sequences (Supplementary Fig. S8). This red algal clade was positioned within the bacterial clades (100% MLB), suggesting the bacterial origin of the GL3.5 orf2 homologs. It was, however, unclear whether this plasmid-related sequence was transferred from bacteria directly or by a virus-mediated process, because the clade showed a sister relationship to the viral clade but with weak statistical support (48% MLB).
Virus-derived plasmid genes (i.e., GL3.5 orf2, three replicase genes in P. pulchra plasmids, and two replicase genes in Py. tenera plasmids) were detected in both eukaryotic nuclear and organellar genomes. These were different from non-viral-derived red algal plasmid homolog sequences that were found only in organelle genomes (Table 1). It is likely that virus-derived plasmid genes could be transferred to the eukaryotic nuclear genome more easily than could non-viral plasmid genes.

Figure 2. Maximum likelihood (ML) tree based on aligned amino acid sequences of homologous regions of Porphyra pulchra plasmid Pp6859 orf4 and orf5 with 2,000 ML bootstrap replications. Species names are followed by GI, amino acid (aa) length, and location. Colored names indicate cyanobacteria (cyan), bacteria (black), liverwort (bright green) and red algae (red). Locations of the sequences are genome (black), plastid (green), mitochondria (orange) and plasmid (yellowish brown). Some orfs and pseudogenized or non-coding regions were combined and aligned with sampled taxon sequences (Supplementary Table S3; Table S4; Table S7). The clades of the ML tree are divided into three groups based on species composition. Group I includes cyanobacterial plasmids and genomes with red algal plasmid and plastid regions. Group II includes cyanobacterial genomes and mitochondrial regions of the liverwort Marchantia polymorpha. Group III includes cyanobacterial plasmids and a bacterial (Zavarzinella formosa) genome.
Remnant DNA replication domain in plasmid-derived plastid genes. Plasmids are composed of three essential domains for replication, segregation and conjugation with additional accessory genes 76,77. From the alignment of the Pp6859 orf4-orf5 homologs (104 to 1,242 amino acids in length), a functional domain was detected by a conserved domain database search 79. One distinct domain is the DNA polymerase type-B family catalytic domain (POLBc) superfamily. Nine amino acid sequences were identical in this domain (aligned 142 aa), including highly conserved active sites (R-K-ND motif) and metal binding sites (DG motif) (see Fig. 5). The DNA polymerase type-B family consists of an editing active site and excision region for DNA replication (562 to 3,425 aa in size) that has been reported in a wide range of organisms, including Archaea, Bacteria, eukaryotes, bacteriophages and viruses [80][81][82][83][84][85][86][87][88][89][90][91].
Although the POLBc motif was generally conserved in nine major subfamilies 85 , we found differences in the catalytic domain of the Pp6859 orf4-orf5 homologs. These unique domains were represented in the ML tree that was reconstructed using homolog regions of the domain (aligned 222 aa) from the public POLBc superfamily database (Supplementary Table S8; Fig. 6). The ML tree showed that all POLBc domains in the Pp6859 orf4-orf5 homolog were grouped into a clade (100% MLB), but the clade did not belong to any other known POLBc subfamilies. This novel POLBc domain might contribute to the insertion of plasmid orfs into the red algal plastid genome.
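The search for invariant positions in an aligned domain, as described for the POLBc region, can be illustrated with a minimal Python sketch. It assumes that the reported identities refer to alignment columns in which every sequence carries the same residue; the function name `identical_columns` and the toy sequences are hypothetical and are not taken from the study's alignment.

```python
def identical_columns(aligned_seqs, gap="-"):
    """Return 0-based indices of columns where all sequences share one residue.

    Assumption: sequences are already aligned (equal length); columns that
    contain a gap character are never counted as conserved.
    """
    if not aligned_seqs:
        return []
    width = len(aligned_seqs[0])
    if any(len(s) != width for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    conserved = []
    for i in range(width):
        column = {s[i] for s in aligned_seqs}  # distinct residues in column i
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


# Toy alignment (invented for illustration only).
toy = ["RAKND-G", "RTKNDDG", "RSKNDEG"]
print(identical_columns(toy))
```

Mapping the returned indices back onto the alignment would show which motif residues are invariant across homologs; a real analysis would operate on the MAFFT output rather than on hand-typed strings.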

Conclusions
Plasmids have long been recognized as mobile elements, but their origins in red algae remained unclear. Using a comprehensive database of 21 plastid genomes that included five novel red algal ptDNAs, we found evidence for the spread of plasmid DNA into plastid and mitochondrial genomes. There is currently insufficient nuclear genome data from species that contain plasmid-derived DNA to determine whether this compartment is also a major target for integration (Fig. 7). The distribution of plasmid-derived orfs showed a species-specific pattern, consistent with the evolution of a mobile genetic element. Because organelles are inherited maternally, foreign DNA can be rapidly fixed in a population. Consistent with this idea, individual members of three lineages (i.e., Porphyra pulchra, Sporolithon durum, and Gelidium elegans) all showed plasmid DNA retention, although these orfs were absent or located in different genomic positions in closely related sister species (e.g., eight Porphyra/Pyropia species, Sporolithon-Calliarthron, two Gelidium species, see Fig. 1). It is known that the distribution of transposable elements can show variation within a single cyanobacterial species 72,73,92. Therefore, plasmids may be regarded as analogous to transposable elements 76,77,[93][94][95][96], with mobility and loss contributing to variation in gain/loss among closely related genomes. For instance, Halary et al. 97 demonstrated that plasmids are key vectors of genetic exchange between bacterial chromosomes on the basis of network analysis using sequences including phage, plasmid and environmental viral genomes.
It should be noted that we were originally interested in testing whether plasmids may have facilitated endosymbiotic gene transfer (EGT) in algae and thereby played a key role in their genome evolution. Analysis of the available data, however, suggests that plasmids are better thought of as parasitic elements (e.g., group II introns in red algal ptDNA 98) that spread plasmid-derived DNA regions. As "mobile gene cassettes" [75][76][77][78], it nonetheless remains possible that these selfish elements can mediate gene transfer between foreign DNA and organelles. As the databases of available organelle and nuclear genome data increase, plasmid involvement in recent instances of EGT may become apparent.
… Table S7). Location of sequences is indicated by color: plasmid (underlined black), plastid (green) and mitochondria (orange). Synteny is shown with the schematic alignment on the right of the tree based on major regions of homology.
In summary, one of the major challenges in the field of microbial eukaryote genome evolution is to understand how genes move across the tree of life. Species such as Galdieria sulphuraria encode at least 5% foreign genes, many of which are clearly of adaptive value 69. The halotolerant green alga Picochlorum SE3 has acquired at least 24 genes of bacterial provenance, putatively to deal with abiotic stress 99. Plasmids, viruses, symbionts, and pathogens likely all play a role in the HGT process in protists. Therefore, the search for "smoking guns" of recent transfer will continue to fascinate biologists who seek to show that highways of gene sharing 100, common in prokaryotes, are drivers of evolution in eukaryotic microbes.
The raw NGS reads were assembled using the CLC Genomics Workbench 5.5.1 (CLC bio, Aarhus, Denmark) and the MIRA assembler that was incorporated in the Ion Server. Contigs of plastid genes were sorted by customized Python scripts with local BLAST searches. Sorted contigs were re-assembled to construct consensus plastid genomes. A draft plastid genome was confirmed by the read-mapping method using CLC Genomics Workbench 5.5.1. Gaps were filled by PCR to generate intact genomes.
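The customized scripts used for contig sorting are not described in detail. As an illustration only, the following Python sketch shows one way contigs could be grouped by their best plastid-gene hit from BLAST tabular output. It assumes the standard 12-column tabular layout (query ID, subject ID, ..., e-value in column 11, bit score in column 12) and reuses the e-value cutoff of 1.0e−05 quoted for the BLASTx searches; the function name `sort_contigs` and the example hits are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1.0e-05  # cutoff quoted in the text for BLASTx searches


def sort_contigs(blast_lines, cutoff=EVALUE_CUTOFF):
    """Group contig IDs by the plastid gene of their best-scoring hit.

    Assumption: each line is one tab-separated BLAST hit with the query
    (contig) in column 1, the subject (gene) in column 2, the e-value in
    column 11 and the bit score in column 12.
    """
    best = {}  # contig -> (bit score, gene) of the best hit seen so far
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        contig, gene = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # hit is not significant at the chosen cutoff
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, gene)
    by_gene = defaultdict(list)
    for contig, (_, gene) in sorted(best.items()):
        by_gene[gene].append(contig)
    return dict(by_gene)


# Invented hits for illustration; contig_3 fails the e-value cutoff.
example = [
    "contig_1\trbcL\t98.0\t300\t6\t0\t1\t900\t1\t300\t1e-150\t540",
    "contig_2\tpsbA\t95.0\t200\t10\t0\t1\t600\t1\t200\t3e-90\t320",
    "contig_3\trbcL\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.5\t25",
    "contig_1\tpsbA\t50.0\t60\t30\t0\t1\t180\t1\t60\t1e-06\t45",
]
print(sort_contigs(example))
```

Keeping only the best-scoring hit per contig is a simplifying design choice; a contig spanning several plastid genes would need all significant hits retained before re-assembly.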

Methods
Gene annotation and plasmid-derived ORFs search. Putative ORFs in the five novel genomes were predicted using ORF Finder in Geneious 6.1.6 101 and annotated based on BLASTx searches (e-value ≤ 1.0e−05) with codon table 11 (Bacterial, Archaeal and Plant Plastid Code). Ribosomal RNAs and transfer RNAs were predicted using the RNAmmer 1.2 Server 102 and ARAGORN programs 103. Group II intron and RNase P were searched using the program RNAweasel (http://megasun.bch.umontreal.ca/cgi-bin/RNAweasel/RNAweaselInterface.pl). Plasmid-derived sequences were searched by BLASTx (e-value ≤ 1.0e−05) using 56 proteins encoded in 14 red algal plasmids (Supplementary Table S9) derived from all available red algal ptDNAs. We also searched for plasmid-derived sequences in nuclear genome data. Here 56 plasmid-encoded genes were searched in the complete nuclear genomes of Cyanidioschyzon merolae 67 Table S10).
Phylogenetic analysis of red algal plasmid-derived genes in plastid genome. Plastid-coding genes from 21 taxa (16 reference genomes and our five new genomes) were extracted and sorted by customized Python scripts with local BLAST searches. To identify the independent loss of plastid genes, each gene set was manually analyzed. A selection of 193 plastid-coding genes (e.g., homologous genes present in at least 16 different taxa) and plasmid-derived sequences were aligned using MAFFT 7.110 104. All aligned plastid genes were concatenated for multigene phylogenetic analysis. Based on the alignment, fragmented plasmid-derived orfs were combined (Supplementary Table S7). To reconstruct the phylogenetic tree, an evolutionary model was selected using Modeltest implemented in MEGA 6.0 105 Table S8) were used to find the inter-subfamily relationship based on the RAxML phylogeny.
Plasmid-mediated HGT in plastid genomes is divided into two types: with plasmid and without plasmid (Figs 2, 3 and S8). Organisms with and without plasmid DNA are listed below as red algae (red taxa), green lineage (green taxa), stramenopiles (brown taxa), and rhizarians (violet taxa). Plastid genomes of Porphyra pulchra and Gracilaria chilensis include plasmid-derived homologs in both their plastid and plasmid genomes. The other red algae include plasmid-derived homologs only in the plastid genome. Mitochondrial HGT is found in red algae, the green lineage and stramenopiles (Figs 2, 3 and S9). Plasmid-mediated transfer to the nuclear genome is found only in Nicotiana tomentosiformis (plants) and Reticulomyxa filosa (rhizarian), with both regions related to viruses (Supplementary Fig. S9).
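The gene selection and concatenation steps described in the Methods (genes present in at least 16 taxa, aligned and then joined for multigene analysis) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts used in the study: alignments are represented as a hypothetical mapping of gene name to {taxon: aligned sequence}, taxa missing from a gene are padded with gaps, and the function names and toy data are invented.

```python
def select_shared_genes(alignments, min_taxa=16):
    """Keep genes whose alignment contains at least `min_taxa` taxa."""
    return {g: aln for g, aln in alignments.items() if len(aln) >= min_taxa}


def concatenate(alignments, gap="-"):
    """Concatenate per-gene alignments into one supermatrix.

    Assumption: `alignments` maps gene -> {taxon: aligned sequence}. Genes are
    joined in alphabetical order; a taxon absent from a gene receives a run of
    gap characters so that all rows end up with the same length.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    matrix = {t: [] for t in taxa}
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, gap * width))
    return {t: "".join(parts) for t, parts in matrix.items()}


# Toy data (invented); min_taxa is lowered to 2 so the example stays small.
toy = {
    "rbcL": {"A": "MSP", "B": "MSP", "C": "M-P"},
    "psbA": {"A": "MT", "B": "MT"},
    "geneX": {"A": "K"},
}
shared = select_shared_genes(toy, min_taxa=2)
print(concatenate(shared))
```

With the default `min_taxa=16`, the filter corresponds to the stated criterion for the 193-gene dataset; the resulting supermatrix would then be written out in a format accepted by the tree-building program.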