The horizontal gene transfer of Agrobacterium T-DNAs into the series Batatas (Genus Ipomoea) genome is not confined to hexaploid sweetpotato

The discovery of the insertion of IbT-DNA1 and IbT-DNA2 into the cultivated (hexaploid) sweetpotato [Ipomoea batatas (L.) Lam.] genome constitutes a clear example of an ancient event of Horizontal Gene Transfer (HGT). However, it remains unknown whether the acquisition of both IbT-DNAs by the cultivated sweetpotato occurred before or after its speciation. Therefore, this study aims to evaluate the presence of IbT-DNAs in the genomes of sweetpotato’s wild relatives belonging to the taxonomic group series Batatas. Both IbT-DNA1 and IbT-DNA2 were found in tetraploid I. batatas (L.) Lam., with highly similar sequences and at the same locus as those found in the cultivated sweetpotato. Moreover, IbT-DNA1 was also found in I. cordatotriloba and I. tenuissima, while IbT-DNA2 was detected in I. trifida. This demonstrates that genome-integrated IbT-DNAs are not restricted to the cultivated sweetpotato but are also present in tetraploid I. batatas and other related species.

Several hypotheses have been put forward to explain the sweetpotato's botanical origin. Nishiyama 12 proposed, based on cytogenetical studies, that the hexaploid sweetpotato (Ib6x) could have originated from the diploid species I. leucantha, from which the tetraploid I. littoralis was derived through polyploidization. The hybridization between these two species could have produced I. trifida, which is suggested to have different ploidies. Further cross-pollinations between these wild species, followed by selection and domestication of interesting genotypes, could have produced the Ib6x. Based on morphological and cytogenetical data, two additional hypotheses were subsequently suggested. Shiotani 13 suggested that I. trifida forms an autopolyploid complex, and that the cultivated Ib6x is derived from this group. Austin 8 suggested that the cultivated sweetpotato was derived from a hybridization event between I. trifida and I. triloba. Other studies carried out using molecular markers (RFLP, RAPD and SSR) [14][15][16], beta-amylase gene sequences 17 and cytogenetic analysis 18 supported a contribution of I. trifida to the cultivated sweetpotato genome.
Advances in DNA sequencing technologies have allowed the assembly of complex polyploid genomes, including that of the cultivated sweetpotato. Yang et al. 19 identified six haplotypes based on the assembly of a monoploid genome (15 pseudochromosomes). The phylogenetic analysis of these haplotypes permitted the authors to trace back the hexaploidization process of Ib6x, giving rise to a new hypothesis on its origin. These authors 19 suggested that the cultivated sweetpotato could have arisen from a cross between a tetraploid and a diploid progenitor. The most likely diploid progenitor is I. trifida, while the tetraploid progenitor is currently unknown. It is not unreasonable to suspect that the tetraploid I. batatas (Ib4x) accessions described by Bohac et al. 20, Jarret et al. 16 and Roullier et al. 21, which are known to share haplotypes with Ib6x 22, might be the tetraploid progenitor.
A more recent, but related, hypothesis about the origin of the cultivated sweetpotato has been proposed by Muñoz-Rodríguez et al. 23 . These authors, based on the phylogenetic analyses of nuclear and chloroplast DNA regions, have proposed that Ib6x has a monophyletic origin (by autopolyploidization) and suggested that I. trifida is its most probable progenitor. This hypothesis also indicated a second role for I. trifida in the origin of the sweetpotato. Once Ib6x arose from I. trifida, it expanded its distribution range further than I. trifida's natural distribution. Over time, both species became reciprocally monophyletic and then hybridized, giving rise to two cultivated sweetpotato lineages.
These previous investigations suggest that a further study of Ib4x and their wild relatives in series Batatas is required since they are key in efforts to elucidate the botanical origin of the cultivated sweetpotato.
The discovery of Agrobacterium IbT-DNA1 and IbT-DNA2 inserted into the Ib6x genome constitutes a noteworthy example of an ancient HGT event in a domesticated crop 24. IbT-DNA1 contains genes for auxin biosynthesis (TR-T-DNA-like), while IbT-DNA2 contains RolB/C genes (TL-T-DNA-like). The acquisition of these genes by the cultivated sweetpotato and other Ipomoea species opens the possibility that these sequences have played a role in the evolution of this crop and its related species 25. However, whether the acquisition of one or both IbT-DNAs by the Ib6x genome occurred before or after its speciation remains unknown. To address this issue, it is necessary to evaluate the presence or absence of IbT-DNA1 and IbT-DNA2 insertions in members of the sweetpotato group and/or other members of the series Batatas. The resulting knowledge might be expected to shed light on the botanical origin of the cultivated sweetpotato and also provide critical clues related to the time of the ancestral Agrobacterium infection(s). Hence, the current study proposes to evaluate (i) the presence of IbT-DNA1 and IbT-DNA2 in the sweetpotato group and other Ipomoea (series Batatas) species and (ii) the use of IbT-DNA1 and IbT-DNA2 genes as markers to reconstruct the evolutionary history of the sweetpotato.

Distribution of IbT-DNA1 and IbT-DNA2 in Ipomoea spp. series Batatas. The presence of Agrobacterium T-DNAs (IbT-DNA1 and IbT-DNA2) in the genome of Ib6x was demonstrated by Kyndt et al. 24. Likewise, a limited number of wild relatives, including Ib4x and member species of the series Batatas, were evaluated in that work. Nine Ib4x and four representatives of the species I. triloba, I. tabascana and I. trifida were tested for the presence of IbT-DNA genes [Acs, C-prot, iaaH, iaaM and ORF13 (Open Reading Frame 13)] by PCR, using sequence-specific primers. None of the IbT-DNA genes were detected in these samples except for the ORF13 gene (on IbT-DNA2) in I. trifida.
The current analysis was extended to include a total of 14 species representative of Ipomoea series Batatas, 2 species corresponding to other Ipomoea members (not in series Batatas) and 5 from related genera (Supplementary Data; Tables 1-4) using newly designed degenerate primers. IbT-DNA1 genes were detected in Ib4x (3 out of 15) and 3 other species in the series Batatas, including; I. cordatotriloba (1 out of 5), I. tenuissima (1 out of 1) and one ambiguous Ipomoea sp. (2 out of 2). The IbT-DNA2 gene was detected in 8 out of 15 Ib4x and 9 out of 28 I. trifida (Fig. 1). No other Ipomoea species outside of the series Batatas (0 out of 2) and no species from related genera (0 out of 5) examined in this study tested positive for the presence of IbT-DNA genes by PCR using the degenerate primers.
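The PCR counts reported above can be turned into detection frequencies with a few lines of Python. The sketch below only restates the positives and totals given in the text; the dictionary layout and the function name `detection_rates` are illustrative choices, not part of the original analysis.

```python
# PCR detection counts as reported in the text: (positives, accessions tested).
pcr_counts = {
    "IbT-DNA1": {
        "Ib4x": (3, 15),
        "I. cordatotriloba": (1, 5),
        "I. tenuissima": (1, 1),
        "Ipomoea sp.": (2, 2),
    },
    "IbT-DNA2": {
        "Ib4x": (8, 15),
        "I. trifida": (9, 28),
    },
}


def detection_rates(counts):
    """Return {region: {taxon: percent positive}}, rounded to one decimal."""
    return {
        region: {
            taxon: round(100.0 * positives / tested, 1)
            for taxon, (positives, tested) in taxa.items()
        }
        for region, taxa in counts.items()
    }


rates = detection_rates(pcr_counts)
for region, taxa in rates.items():
    for taxon, pct in taxa.items():
        print(f"{region}\t{taxon}\t{pct}%")
```

With so few accessions per taxon (for example, a single I. tenuissima accession), these percentages are descriptive only and should not be read as population frequencies.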
The presence of IbT-DNA1 was analyzed and confirmed by DNA blot analysis in two PCR-positive Ib4x accessions (PI 518474 and CIP 403270) and the three PCR-positive wild relatives (Ipomoea sp. and I. cordatotriloba). Ipomoea batatas (L.) Lam. var. apiculata (PI 518474) (Fig. 2A3) showed four bands, like Ib6x (Fig. 2B1), while CIP 403270 (Ib4x) showed only one (Fig. 2A2). Ipomoea sp. CIP 460250 (2x) displayed at least one band (Fig. 2B2), whereas Ipomoea cordatotriloba PI 518494 (2x) (Fig. 2C2) and Ipomoea sp. CIP 460814 (2x) (Fig. 2C1) appear to have at least four bands. The presence of IbT-DNA2 was only tested and confirmed in Ib4x PI 518474 (1 band; Fig. 2D).
collected as I. grandifolia, was morphologically similar to I. cordatotriloba. Conversely, CIP 460814 and CIP 460815 were collected as I. cordatotriloba, but had the characteristics of I. grandifolia (I. grandifolia and I. cordatotriloba are very similar, differing only in the size of the corolla, and some authors consider them varieties of the same species). CIP 460002 was collected as I. leucantha, which is a hybrid species between I. trichocarpa and I. lacunosa and which has highly variable characteristics. CIP 460811 was collected as I. cordatotriloba; however, its flower color is white rather than violet, as is typical for I. cordatotriloba.
Both groups, Ib6x and Ib4x and their wild relatives, form a monophyletic group as compared to homologous genes from other sequenced T-DNAs, suggesting that they belong to the same lineage with a common origin.

Characterization of wild
In the case of the IbT-DNA2 ORF13 gene (492 nt; Fig. 7), the analysis indicates that Ib6x and Ib4x accessions grouped together in a well-supported clade (bootstrap value 99%) that includes one I. trifida accession PI 561544. The rest of the I. trifida samples formed a basal group and together with the sweetpotato group, they form a well-supported lineage (bootstrap value = 100). Nucleotide sequences from two species of the genus Nicotiana were included in the analysis of IbT-DNA2. The results show that those are phylogenetically closer to A. rhizogenes strains pRi2659 (AJ271050.1), K599 (EF433766.1) and MAFF03-01724 (AP002086.1) in comparison with the Ipomoea sequences.
IbT-DNA1 and IbT-DNA2 gene similarities among Ib6x and its wild relatives. Pairwise identities were estimated for partial nucleotide sequences of the IbT-DNA1 genes (C-prot, Acs, iaaH, iaaM) and the IbT-DNA2 gene (ORF13). Nucleotide sequence identity values are above 99% for all genes analyzed within the sweetpotato group, which includes both Ib6x and Ib4x. Of note, ORF13 from Ipomoea trifida PI 561544 shows higher identity values (~99.9%) with the sweetpotato group than the rest of the Ipomoea trifida accessions (Supplementary Data, Tables 6 and 7). Between the sweetpotato group and its wild relatives, the identity values of all genes analyzed ranged from 96 to 98.8%. Previously, IbT-DNA1 was found to be inserted in two copies, in the form of a partial inverted repeat, in the genome of the Ib6x cv. Xu781 24. In the present study, the nucleotide sequence identity between the two copies of IbT-DNA1 (Fig. 8) was calculated in Xu781 and corresponded to 98.8% (1.2% divergence).
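To make the identity and divergence figures concrete, the following is a minimal sketch of how a pairwise percent identity can be computed from two already-aligned sequences. The text does not say which software or gap treatment was used, so the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other tools may count them as
    mismatches, which would give lower values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real ORF13 data).
toy_a = "ATGGCC-TTAGC"
toy_b = "ATGGCTATTCGC"
identity = percent_identity(toy_a, toy_b)
divergence = 100.0 - identity  # divergence is the complement of identity
print(f"identity = {identity:.1f}%, divergence = {divergence:.1f}%")
```

Under this definition, the 98.8% identity between the two IbT-DNA1 copies in Xu781 corresponds directly to the 1.2% divergence quoted in the text.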
Ib6x and Ib4x share the same insertion site of IbT-DNA1. A phylogenetic analysis of the region flanking IbT-DNA1 (687 nt; F-box third intron) was performed in order to elucidate the evolutionary relationship among all accessions in the sweetpotato group carrying IbT-DNA1 (Fig. 9). The alignment included: F-box-IbT-DNA1 sequences of six Ib6x accessions and three Ib4x accessions; the F-box gene (without IbT-DNA1) of two Ib6x and three Ib4x; and the F-box gene of the wild relatives I. trifida, I. triloba, I. cordatotriloba and Ipomoea sp. CIP 460250. An F-box gene sequence from I. nil, cv. Tokyo-kokei, was included as an outgroup. The resulting tree shows that the Ib6x and Ib4x F-box genes carrying IbT-DNA1 group together in a well-supported clade (bootstrap value = 99%). Likewise, sequences corresponding to the F-box gene uninterrupted by IbT-DNA1 appear in a sister clade. This suggests that the F-box gene carrying IbT-DNA1 might have diverged from the original F-box gene (either before or after the T-DNA insertion or both) and that the Ib6x and the Ib4x belong to the same lineage with a common origin. The nucleotide sequence identity calculated between the intact F-box gene and F-box-IbT-DNA1 was 96.9% (3.1% divergence). The regions flanking IbT-DNA1 from I. tenuissima, I. cordatotriloba and Ipomoea sp. could not be included in the analysis since we were unable to amplify them with the primers designed.
Analysis of IbT-DNA2 in cultivated sweetpotato Taizhong 6. The region flanking IbT-DNA2 in the Ib6x genome has not been described previously. It was predicted based on whole-genome sequencing data from cv. Taizhong 6. This analysis indicated that IbT-DNA2 (cv. Taizhong 6) is inserted in chromosome 7 and has an estimated size of 11,187 bp (Fig. 10). It comprises seven open reading frames (ORFs) homologous to ORF18/ORF17n, ORF13, RolB/RolC family, ORF17n, ORF14 and a hypothetical protein with a "NADB Rossman" domain of Agrobacterium rhizogenes. Compared to IbT-DNA2 in cv. Huachano (KM052617), there is an insertion of 369 bp within ORF13 of cv. Taizhong 6. The region flanking IbT-DNA2 was confirmed using PCR and, on the basis of significant homology (via tblastx), was identified as the mitochondrial substrate carrier family protein UcpB, with the highest score associated with Ipomoea nil (e-value = 6e-108; score = 1494). There is also an uninterrupted copy of the UcpB gene (without IbT-DNA2) on chromosome 7 of cv. Taizhong 6 that is 4,004 bp in size and has nine exons. The insertion site of IbT-DNA2 was determined by comparing UcpB and UcpB-IbT-DNA2. On one side, the T-DNA is flanked by an intronic region with high A/T content after exon 7, while the other side is located in an intronic region 24 bp upstream of exon 9. Linked to the T-DNA insertion, there is a deletion of 893 bp in the UcpB gene that includes exon 8 (Fig. 10).

Discussion
Our data demonstrate that the HGT event of Agrobacterium into series Batatas taxa is not confined to the hexaploid sweetpotato. It is also present in its wild relatives, which include its tetraploid form as well as other members of the series Batatas. We report here the detection of sequences homologous to IbT-DNA1 and IbT-DNA2 genes in at least ten accessions corresponding to Ib4x and fourteen accessions belonging to I. trifida, I. cordatotriloba, I. tenuissima, and a currently unidentified Ipomoea sp. from the series Batatas. Accessions belonging to the genus Ipomoea, but not members of series Batatas, and other related genera, were also analyzed. These included members of the Quamoclit group and species from the genera Calystegia, Xenostegia, Operculina and Merremia. The presence of IbT-DNA1 and IbT-DNA2 could not be confirmed in any of these samples. However, it should be noted that we cannot exclude the possibility of false negatives in our analyses, and our findings likely represent an underestimation of the HGT events across the target species. This is because, despite using degenerate primers and Southern blots, only regions corresponding to a few genes were tested, and remnants of (re-arranged) T-DNAs may exist that do not contain these complete regions. Also, we generally tested only one or two seedlings from each wild Ipomoea sp. accession (which are maintained as seeds), and if the accession was segregating for T-DNAs, their presence could have been missed by chance.
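The sampling caveat above can be quantified with a simple probability sketch. Assuming, purely for illustration, that a fraction of the seedlings in a segregating accession carry the T-DNA and that seedlings are sampled independently, the chance of missing the T-DNA entirely is (1 - fraction) raised to the number of seedlings tested. The function name `miss_probability` and the example fractions are hypothetical and do not come from the study.

```python
def miss_probability(carrier_fraction, seedlings_tested):
    """Probability that none of the tested seedlings carries the T-DNA.

    Assumes independent sampling of seedlings from an accession in which
    `carrier_fraction` of the seed carries the insertion.
    """
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must be between 0 and 1")
    if seedlings_tested < 0:
        raise ValueError("seedlings_tested must be non-negative")
    return (1.0 - carrier_fraction) ** seedlings_tested


# Hypothetical carrier fractions, tested with one or two seedlings.
for fraction in (0.25, 0.5, 0.75):
    for n in (1, 2):
        p = miss_probability(fraction, n)
        print(f"carrier fraction {fraction}, {n} seedling(s): miss probability {p:.4f}")
```

Under these assumptions, an accession in which half of the seed carries a T-DNA would still be scored negative a quarter of the time when two seedlings are tested, which is consistent with the authors' view that the reported frequencies are likely underestimates.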
The tetraploid form of I. batatas has been poorly characterized and its taxonomic status remains unclear. This taxon, collected from Ecuador, Colombia, Guatemala and Mexico, has been a subject of interest for over 50 years. The fact that these samples form thickened "pencil-shaped" storage roots has been considered as evidence that the tetraploids are primitive sweetpotatoes 28. Some accessions were initially tentatively identified as I. trifida but later they were classified as wild I. batatas 20. Subsequently, it was observed that the tetraploid form shared haplotypes (based on chloroplast and nuclear DNA markers) with the cultivated hexaploid 21. These findings reinforced the hypothesis proposed by several authors, who suggested that tetraploid I. batatas are the closest wild relative of the cultivated sweetpotato 21,29.
In the current study, nucleotide sequence analyses (pairwise comparisons) of IbT-DNA1 and IbT-DNA2 genes reveal high identity values (above 99%) among accessions from the sweetpotato group (Ib6x and Ib4x). These results were supported by the phylogenetic analyses of the regions flanking IbT-DNA1 and IbT-DNA2, which showed that Ib6x and Ib4x share the same insertion site (Figs 9 and 11). These findings reinforce previous taxonomic and molecular studies 20,21 and suggest that I. batatas includes both hexaploid and tetraploid forms. However, there is also a possibility that the tetraploid form represents an interspecific hybrid between I. batatas and a close wild relative (I. trifida). We suggest the use of IbT-DNA1 and IbT-DNA2 genes as markers to further
The series Batatas contains the sweetpotato group and 13 other species considered to be its closest wild relatives 5,6. Within this group, the species I. trifida has been identified as a potential wild ancestor in several studies based on morphological data, molecular markers and cytogenetic analyses [14][15][16][17][18]. Recently, two studies have reopened the debate about the role of I. trifida in the origin of the sweetpotato. Yang et al. 19 analyzed a complete 6x I. batatas genome and proposed that the crop species could have resulted from a cross between a tetraploid and a diploid (most likely I. trifida) progenitor. Such a hybridization would have resulted in triploid progeny that, subsequently undergoing genome duplication, would result in 6x forms. In contrast, Muñoz-Rodríguez et al. 23, based on genomic analyses of whole chloroplast and single-copy nuclear DNA regions, proposed that I. trifida played a dual role in the origin of the cultivated sweetpotato: first, as the most likely progenitor of the first I. batatas lineage, by autopolyploidization, and second, as the species with which this autopolyploid (6x) later hybridized to produce another independent sweetpotato lineage. Most recently, Wu et al. 31 found, through sequence comparison of the genome of hexaploid I. batatas with the genomes of I. trifida and I. triloba, that approximately one third of the hexaploid I. batatas genome shows higher similarity to I. triloba than to I. trifida. In relation to the data in the present study, the detection of IbT-DNA2 (ORF13 gene) only in the I. trifida accessions (9 out of 28) examined, and not in the other series Batatas species examined, provides additional evidence supporting the close relationship of this species (I. trifida) with the hexaploid and tetraploid forms of I. batatas. Furthermore, the phylogenetic analysis of IbT-DNA2 and its flanking region indicated that I. batatas (6x and 4x) and I. trifida originated from a common ancestor.
IbT-DNA1 (Xu 781); IbT-DNA1 and its inverted repeat are presented as interrupted black arrows. The region flanking IbT-DNA1, to be analyzed in the next section (Fig. 7), is indicated as red arrows and its size (687 bp) is placed between brackets.
Similar to the cT-DNAs in Nicotiana species 30, it is possible that IbT-DNA2 was acquired initially by I. trifida (or a common ancestor of I. trifida and I. batatas) and later transmitted across speciation events to the sweetpotato. This hypothesis is reinforced by the fact that I. trifida, together with Ib6x and Ib4x, shares the same insertion site of IbT-DNA2. An alternative explanation for the presence of IbT-DNA2 in the sweetpotato involves its transfer by interspecific hybridization, which is known to occur between I. batatas and I. trifida 21. Ipomoea trifida accessions carrying the ORF13 gene do not form a monophyletic group, as PI 561544 appears in the clade of the sweetpotato group. This accession was collected in Venezuela and could represent the closest sweetpotato wild relative, in addition to the tetraploid form of I. batatas.
Species from the series Batatas other than I. trifida have also been proposed as potential contributors to the origin of the sweetpotato, although these hypotheses are less generally accepted within the community. Jarret et al. 16 considered I. tabascana (4x), I. trifida and K233 (4x, suggested to be a hybrid between I. batatas and I. trifida) to be the closest relatives of the cultivated sweetpotato based on RFLPs, among the taxa examined (which did not include Ib4x). Recently, Eserman 32 concluded, based on hybridization analysis, that Ib6x could have hybrid ancestry, with parentage from I. ramosissima and either I. triloba or I. cordatotriloba. The present study indicates the presence of IbT-DNA1 genes in accessions belonging to the species I. cordatotriloba, I. tenuissima and two as yet unclassified Ipomoea accessions (CIP 460250 and CIP 460814). Our phylogenetic trees of IbT-DNA1 genes indicate that the sweetpotato group, I. cordatotriloba, I. tenuissima and Ipomoea sp. form a strongly supported (~99% bootstrap) monophyletic clade as compared to their homologues in Agrobacterium spp., suggesting a common ancestry. The identity of the two Ipomoea sp. accessions containing IbT-DNA1 has not been elucidated. These accessions were initially classified as I. trifida (CIP 460250) and I. cordatotriloba (CIP 460814). However, upon morphological re-evaluation, it became clear that they were not consistent with the recorded classification. The latter shows phenotypic characteristics consistent with I. grandifolia, whereas the former's characteristics are not consistent with any of the established species. This was also confirmed by molecular markers, which showed that CIP 460250 formed a sister clade compared to other Ipomoea series Batatas 33. It is not clear to what extent, if any, mis-identification of plant materials may have clouded efforts to resolve relationships within this group of taxa.
The presence of IbT-DNA1 in Ib4x, I. cordatotriloba, and other Ipomoea spp. from the series Batatas was confirmed by Southern blot analyses. Tetraploid I. batatas (CIP 403270 and PI 518474) and wild relatives (Ipomoea sp. and I. cordatotriloba) show dissimilar banding patterns when compared to Ib6x. Additionally, the identity values of IbT-DNA1 genes between the sweetpotato group members and the wild relatives range from 96 to 98.8%, which is lower than within the sweetpotato group (above 99%). Thus, if the T-DNAs found in the series Batatas spp. represent a single ancestral event, this indicates that IbT-DNA1 sequences have evolved and diverged since their acquisition by the sweetpotato's ancestors. Recently, Ipomoea evolutionary trees have been calibrated, with an estimated mutation rate of 0.7% of base pairs per million years 19. The divergence between the repeats of IbT-DNA1 is 1.2%, which, divided by this rate, leads to an estimated age of IbT-DNA1 of 1.7 million years. Muñoz-Rodríguez et al. 23 pointed out that the clade including the sweetpotato and I. trifida diverged from its sister clade at least 1.5 million years ago. Considering that IbT-DNA1 is estimated to be older than the clade containing the sweetpotato and its potential ancestor (I. trifida), it is possible that IbT-DNA1 might have been acquired early in the evolution of these species. Consequently, IbT-DNA1 was fixed in the course of the evolution of the sweetpotato, while in other wild relatives it became less common, and in I. trifida this region could have been lost completely. The fact that the I. trifida samples analyzed in this study do not contain IbT-DNA1 supports this possible course of events.
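The age estimate above follows from dividing the observed divergence by the calibrated rate. The short sketch below reproduces that arithmetic; the function name `insertion_age_my` and the optional `lineages` parameter are illustrative additions and not part of the authors' method. With `lineages=1` the calculation matches the figure quoted in the text.

```python
def insertion_age_my(divergence_pct, rate_pct_per_my, lineages=1):
    """Estimate age in million years as divergence / (lineages * rate).

    lineages=1 reproduces the arithmetic used in the text. lineages=2 is a
    hypothetical alternative in which both copies are assumed to accumulate
    substitutions independently at the same rate.
    """
    if rate_pct_per_my <= 0:
        raise ValueError("rate must be positive")
    if lineages < 1:
        raise ValueError("lineages must be at least 1")
    return divergence_pct / (lineages * rate_pct_per_my)


# Values from the text: 1.2% divergence between the IbT-DNA1 repeats,
# 0.7% of base pairs per million years.
age = insertion_age_my(1.2, 0.7)
print(f"estimated age: {age:.1f} million years")
```

The result is sensitive to how the rate is applied: if the two repeats were treated as independently evolving lineages (`lineages=2`), the same inputs would give roughly half this age, so the 1.7 million year figure is best read as an estimate under the stated rate and assumptions.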
Based on the current data, at least two hypotheses arise to explain the combined origin of IbT-DNA1 and IbT-DNA2 in the hexaploid I. batatas. Hypothesis I suggests that the HGT from A. rhizogenes (or an ancestral related species) may have occurred in a single event, transferring both IbT-DNAs into a common ancestor of the species I. trifida, I. tenuissima, I. cordatotriloba and I. triloba. Subsequently, both regions were passed (independently or in combination) to I. trifida, I. tenuissima, I. cordatotriloba and I. triloba (or primitive forms). Later, one of these potential progenitors passed IbT-DNAs to the tetraploid I. batatas (L.) Lam. by speciation, which later became I. batatas (L.) Lam. (6x). Hypothesis II proposes that the HGT from Agrobacterium spp. into the cultivated sweetpotato's ancestor might have occurred via two or more independent events. It is possible that at least two species independently acquired IbT-DNA1 and/or IbT-DNA2 and then two of them combined in the common ancestor of I. batatas (L.) Lam. (4x) and (6x). This hypothesis could explain the fact that the flanking region of IbT-DNA1 in I. tenuissima and I. cordatotriloba could not be amplified, despite using various sets of primers. Future efforts to determine the flanking sequences in these accessions should be able to confirm or discard this hypothesis. Nevertheless, based on our current data, because HGT events that enter the host germline are relatively rare in nature, and because of the clear correspondence between the phylogeny of the T-DNA genes and the species taxonomy, hypothesis I seems the most likely.
For taxonomic verification, tetraploid I. batatas accessions from CIP's genebank (3 siblings per accession) were germinated in a petri dish and then transferred to planting trays (Jiffy 7) for 15 days, after which they were transferred into screenhouses for characterization using 30-60 descriptors (Rossel et al., unpublished).
To determine the ploidy levels, samples from young leaves were analyzed in an Accuri C6 flow cytometer (BD Biosciences) with propidium iodide and data were analyzed with BD Accuri C6 Software. This was supplemented by chromosome counting in squashed root-tips stained with aceto-orcein as required.