Genome-wide identification of Argonautes in Solanaceae with emphasis on potato

Regulatory small RNAs (sRNAs) play important roles in many fundamental processes in plant biology such as development, fertilization and stress responses. The AGO protein family has here a central importance in gene regulation based on their capacity to associate with sRNAs followed by mRNA targeting in a sequence-complementary manner. The present study explored Argonautes (AGOs) in the Solanaceae family, with emphasis on potato, Solanum tuberosum (St). A genome-wide monitoring was performed to provide a deeper insight into gene families, genomic localization, gene structure and expression profile against the potato late blight pathogen Phytophthora infestans. Among 15 species in the Solanaceae family we found a variation from ten AGOs in Nicotiana obtusifolia to 17 in N. tabacum. Comprehensive analyses of AGO phylogeny revealed duplication of AGO1, AGO10 and AGO4 paralogs during early radiation of Solanaceae. Fourteen AGOs were identified in potato. Orthologs of AGO8 and AGO9 were missing in the potato genome. However, AGO15 earlier annotated in tomato was identified. StAGO15 differs from the other paralogs having residues of different physico-chemical properties at functionally important amino acid positions. Upon pathogen challenge StAGO15 was significantly activated and hence may play a prominent role in sRNA-based regulation of potato defense.

Scientific Reports | (2020) 10:20577 | https://doi.org/10.1038/s41598-020-77593-y www.nature.com/scientificreports/ High-throughput sequencing followed by comparative genomics has revealed several gains and losses of AGO encoded genes. In addition, genes not previously annotated are still a source of new information such as reported for AGOs in Nicotiana attenuata 21 . Species in the Solanaceae family have experienced a specific ploidy event after the split with Asterids about 49 million years ago followed by further species divergence 22 . The large genus Solanum diverged from peppers (Capsicum) c. 19 Million years ago (Mya) whereas the Solanum crops, potato and tomato split rather recently c. 8 Mya 23 . Overall, the Solanaceae gene families vary in size due to duplications and different gene evolutionary events 24 .
In this study we used all public genomic and transcriptomic resources to clarify the number of AGO encoding genes, and their divergence in Solanaceae followed by tests of expression upon filamentous pathogen infection. Extensive gains and losses of AGOs have occurred in Nicotiana species compared to Solanum species. The AGO4 clade which harbors a specific Solanaceae sub-clade, AGO15, with novel sequence and structural features received our attention. StAGO15 was activated by filamentous pathogen challenge suggesting an important role in the immune responses of potato. Details of its sRNA binding and function under biotic stress remains to be elucidated.

Results
Extensive gains and losses of AGOs in Solanaceous genomes. We searched full length AGO sequences in potato and related genomes of Solanaceae species to generate an overview picture of the number of AGOs in the Solanaceae family and their evolutionary history. To generate confidence over number of gene gain and loss events, the reconciliation of the gene trees was repeated three times using three different species as out-groups Arabidopsis, Vitis vinifera and Erythranthe guttata. We found six AGO duplication events prior the split between Petunia and the other Solanaceae linages ( Fig. 1; Supplementary Fig. S1). After the divergence of Petunia, the ancestral line experienced four duplication and two loss events followed by speciation processes leading to Nicotiana and Solanum lineages. The AGO family in the ancestral Nicotiana lineage has experienced extensive changes. Based on the six Nicotiana species analyzed, 33 duplication and 59 loss events had occurred prior further speciation. Processes involving gene losses have continued within each species. Maximum losses have occurred in N. benthamiana (20) followed by N. tabacum (15). Compared to Nicotiana, the expansion of the AGO family is less variable in the Solanum genus. Overall, the number of AGOs in the Solanaceae species analyzed varies from ten in N. obtusifolia to 17 in N. tabacum. Potato has 14 Argonaute encoding genes. The search for homologous AGO sequences in the Solanaceae family generated fourteen full length AGO genes recovered from the S. tuberosum (potato) genome (StA-GOs). PAZ and PIWI domains were found in all 14 AGO sequences whereas presence of other conserved parts such as MID, N terminal and linker domains were not predicted in all candidates when applying default settings ( Supplementary Fig. S2a). Based on sequence identity and phylogenetic clustering, the orthologs of Arabidopsis AGO1, AGO2, AGO3, AGO4, AGO5, AGO6, AGO7 and AGO10 were all discovered in the potato genome ( Supplementary Fig. S2). In addition, two orthologs each for AGO1, AGO2, AGO10 and three for AGO4 were identified. AGO6 was not found in the reference annotation of the potato genome (PGSC) but later identified   Fig. S2b). Two candidates, StAGO15 and StAGO7 were the most divergent AGOs compared to the other members. Next, the StAGOs were mapped on the S. tuberosum chromosomes ( Supplementary Fig. S3). The close positions of StAGO2a, StAGO2b and StAGO3 on chromosome 2, and StAGO1a, StAGO4a and StAGO10c on chromosome 6, together with sequence similarities, suggest that they have experienced gene duplications. Similar tandem gene duplications are observed on chromosome 2 and 6 in tomato 25 . In tomato and potato, no AGOs are found on chromosome 4, 5, 8, 10 and 11.
A maximum likelihood phylogeny was reconstructed by using a total of 203 AGO homologs from the sampled Solanaceae lineages. To get confidence over the topology and partitions, 99 AGO homologs were added from nine sequenced Brassicaceae species (Supplementary Fig. S4). In line with earlier clustering ( Supplementary  Fig. S2b) three major clades (AGO1, AGO2, AGO4) were formed (Fig. 2). Homologs from almost all species were observed in all three clades including duplications in Solanaceae. The clustering pattern and topology indicates that a duplication of AGO10 has occurred in an early ancestor prior the divergence of Solanaceae and   www.nature.com/scientificreports/ Brassicaceae, but the duplicated ortholog has been lost in Brassicaceae. The duplication of the AGO1 gene, on the other hand, has most likely occurred early at the base of Solanaceae after the split with Asterids. Two gene copies of AGO1 and AGO10 were found in Nicotiana benthamiana, and two copies for AGO5 in the four species Petunia inflata, N. tabacum, N. tomentosiformis and N. benthamiana. Orthologs in the AGO5 sub-clade are more dissimilar than those in the AGO1 and AGO10 sub-clades, a situation which also is reflected in variations among the branch lengths. Orthologs of AGO7 were observed in almost all Solanaceae lineages and a separate partition with the Brassicaceae orthologs formed the AGO7 sub-clade. Almost the same branch length indicates low variation among the orthologs. Further, AGO2 and AGO3 are sister orthologs that grouped together forming a separate sub-clade in which Solanaceae lineages showed varied number of paralogs (Fig. 2). Only one ortholog for AGO2 and AGO3 was detected from Nicotiana, Petunia and Capsicum lineages, while the Solanum species analyzed had three orthologs, except S. chacoense that had two gene copies. The AGO4 clade was most likely formed by orthologs of AGO4 exhibiting clear partition with AGO8 and AGO9 sequences from Solanaceae (Fig. 2). Two groups of AGO4 orthologs were found in Solanaceae. Based on identity with the corresponding members in Arabidopsis, three paralogs in for example potato were found (StAGO4, 4a and 4d). Orthologs of AGO6 from the two plant families were also identified.
StAGO15 has novel domain architecture. A well supported sub-clade containing a single member from each Solanaceae species, except from N. attenuata and N. benthamiana, was found at the base of the AGO4 clade, earlier annotated as AGO15 in tomato 25 ( Fig. 2; Supplementary Fig. S4). The relative position of this node in the phylogeny and branch lengths suggests that the evolution of AGO15 occurred early in Solanaceae. We searched databases and compared AGO15 sequences from Solanaceae and Poaceae in order to detect potential similarities in sequence and function. The phylogenetic tree generated two monophyletic clades with representatives from each plant family ( Supplementary Fig. S5). The topology of the tree coincides with estimated divergence time between Monocots and Eudicots 26 . In rice, the AGO4 clade comprise of four members, OsAGO4a, OsAGO4b, OsAGO15 and OsAGO16 (Fig. 3). StAGO15 does not cluster with any of the potato or rice AGOs in this clade, suggesting independent evolution.
We aligned the AGO15 and the AGO1 clade sequences and found clear divergence in the PIWI domain among the Solanaceae species, particularly among the amino acids corresponding to the catalytic "slicing" residues D-E-D-H/D ( Supplementary Fig. S6). In potato, StAGO15 has replaced the catalytic tetrad 'D-E-D-H/D' with a G-E-Q-R motif with unknown slicer function. Likewise, tomato has a G-Q-R/P motif at the catalytic site of SlAGO15 25 .
We closer examined the MID domains that did not fulfill the default sequences in comparison with AtAGO1 ( Supplementary Fig. S7a,b). Of particular interest was the nucleotide specificity loop (NSL) which in Arabidopsis is known to regulate 5′ specificity (C, U or A) 9,10,27 . The Solanaceae AGO15 protein sequences deviate substantially from the AGO1 clade in the NSL positions, where for example StAGO15 has AFY as 5′ end recognition sequence ( Supplementary Fig. S7b).
A three-dimensional protein structure comparison was performed by first model the human AGO2 protein to visualize the different main domains together with their interaction with miR20a 28 (Fig. 4a). Next, models of AtAGO1 and StAGO15 were constructed to facilitate identification of divergent units (Fig. 4b,c). Merged protein structures of AtAGO1 and StAGO15 showed large similarities (Fig. 4d). However, the StAGO15 protein appeared somewhat "bulky". This feature is explained by three single coils, located either at the N-terminal, in the L1 domain or at the opening of the central pocket. In comparison with AtAGO1, the NSL sequence of StAGO15 has a hydrophobic residue (Phe583) replacing Asn 687 in AtAGO1 (Fig. 4e). This residue is of importance for the 5′ nucleotide selection in Arabidopsis 9 . Further, the D-E-D-H catalytic pocket observed in AtAGO1 is replaced by a G-E-Q-R motif in StAGO15 (Fig. 4f). The D-E-D-H and G-E-Q-R motifs resemble each other, sharing the glutamic acid (Glu 708 vs. Glu 803) as 2nd motif residue with negative charge and a 4th positive residue (Arg 882 vs. His 988). The 3rd motif residue, being negative in AtAGO1 (Asp 848) and polar in StAGO15 (Gln 750) is in both cases capable of binding positive residues, hence this substitute may not affect the function of StAGO15. The charge of Gly 667 in G-E-Q-R is pH dependent, a feature whose impact is unknown particularly under stress condition. The divergent recognition and binding motifs compared to AGO1 clade may suggest specific function(s).

StAGO15 is elevated upon pathogen infection.
Based on generated RNAseq data on potato challenged by the Phytophthora infestans virulent strain 11388 StAGO4c, StAGO10b, and StAGO15 were found upregulated ( Supplementary Fig. S8). Quantitative real-time PCR supported the activation of StAGO15 (Fig. 5a). In a time-course experiment the gene activation was clearly observed 4 to 5 days post inoculation (Fig. 5b,  Supplementary Fig. S9) when P. infestans has switched from biotrophic to necrotrophic stage 29 . To clarify if this elevated AGO15 activity was specific for P. infestans, the early blight fungus Alternaria solani was used for potato infection in parallel experiments. Again, StAGO15 was up-regulated but not as much as seen in the P. infestans response (Fig. 5a).

Discussion
The Solanaceae plant family comprises many important crop species with variable genome and gene family sizes, reflecting their history of genome duplications and variable selective constrains 22,30 . In an analysis of twelve Solanaceae species, gene duplication rate, strength of selection, and gene function was shown to vary extensively together impacting the gene family sizes 24 . Genes were detected enriched in the genomes either by whole genome duplication or by tandem duplication. Members in gene families with low domain variability displayed a tendency of housekeeping functions. Aforesaid genes appeared to have duplicated by whole genome duplication, in contrast to the tandem duplicated category that showed higher variability. In our analysis of AGO genes in 15 Solanaceae species a rather extensive variation in gene numbers were detected, particularly when comparing Scientific Reports | (2020) 10:20577 | https://doi.org/10.1038/s41598-020-77593-y www.nature.com/scientificreports/ species in the Nicotiana genus with numbers in the genera Solanum and Petunia. In potato, remnants of AGO gene duplications can be observed on chromosome 2 and 6 and in tomato on chromosome 1, 2, 3, and 6 31 . In potato, we discovered three StAGO4 (4, 4a, 4d) genes whereas four was found in tomato (SlAGO4a, 4b, 4c, 4d). When considering branch lengths, AGO4 is closer to AGO4b compared to AGO4c (Supplementary Fig. S3). In this case it is still not clear whether an incomplete gene duplication or a gene loss has occurred. The split between Nicotiana and Solanum species is estimated to be rather recent c. 24 Myr 22 , hence gene family expansion and gene turnover rates should not deviate much between the two genera as found in our analysis. We can only speculate that human selection and clonal propagation could have had a major impact on gene content beside different duplication mechanisms as earlier suggested 31 .
It is believed that a functional RNAi pathway was present in the last common ancestor for eukaryotes as a defense system against viruses and transposons, a system that has expended to comprise regulation of endogenous RNAs 32 . Members of AGO proteins can be found in a majority of eukaryotic super-groups, where AGOs act as partners in a multi-protein regulatory system impacting an array of processes 15 . This multi-function feature also applies on plants, including defense. For example, AGO1, AGO2, AGO4, AGO5, AGO7, and AGO10 in Arabidopsis are known to participate in mechanisms involving defense responses towards different types of viruses 33,34 . More precisely antiviral AGOs associate with virus-derived small RNAs to repress complementary viral RNAs or DNAs, or with endogenous small RNAs to regulate host gene expression and promote antiviral defense. In infected N. benthamiana plants, 21 and 22 nt sRNA from the potato spindle tuber viroid associate with AGO1, AGO2 and AGO3, while 24 nt viroid sRNA bind to AGO4, AGO5, and AGO6 35 . Similar events are also reported from rice where OsAGO1 and OsAGO18 act against Rice stripe tenuivirus and Rice dwarf phytoreovirus 36 . Not much is known about function of OsAGO15. It is believed to have evolved by duplication events followed by differentiation within the AGO4 clade 37 . OsAGO15 is expressed in reproductive tissue and harbor a D-D-H/P catalytic motif.  www.nature.com/scientificreports/ We checked for the presence of nuclear export signal (NES) and the nuclear localization signal (NLS) domains in the 14 potato AGO sequences. StAGO1a and 1b were the only AGOs containing both NES and NLS domains, known to be of importance for nuclear-cytoplasmic shuttling 38 . High scores of NLS were only detected for StAGO15, suggesting nuclear localization. For nuclear to cytoplasm transport, potato has five members in the exportin family. However, details on translocation from the nucleus, including loading partners and associated processes reported in Arabidopsis, are not known in potato. The protein sequence of StAGO15 differs at the NSL and the catalytic tetrad sequences compared to AtAGO1. The G-E-Q-R motif is so far only observed in the Solanaceae AGO15 proteins. Uracil is the most preferred 5′ nucleotide of AtAGO1 bound sRNA 10 , however adenine is the most hydrophobic nucleotide. The change from the polar Asn 687 in AtAGO1 to the hydrophobic Phe 583 in StAGO15, could indicate a preference for adenine as the 5′-nucleotide of sRNA binding. In Arabidopsis 5′ A is a signature for a loading preference of 24 nt sRNAs 39 . These features open up several functional possibilities, including induction of 24 nt phasiRNAs upon pathogen infection. Overlapping functions cannot be excluded at this stage. Resistance genes can become negatively regulated by host miRNAs upon pathogen attack as a self-defense response. In tomato, particularly the miR482/2118 family are active and R gene mRNA can be targeted both by these miRNAs and by self-generate secondary sRNAs 40 . There are many R genes in individual plant species, not least in potato, and it is thought that self-suppression by RNA silencing is a strategy to balance costs and benefits under pathogen attack. However, there is a complex co-evolutionary relationship between sequence diversity in R genes and interactions of evolving miRNA where much remains to be clarified 41 . In this context adds the potential impact of miR8788 from P. infestans on susceptibility in potato during infection another level of complexity 42 .  Phylogenetic analysis. The AGO homologs were aligned using 'MAFFT v7.123b 49 with ' ensi' option.
Poorly aligned regions were cleaned using 'trimAl' 50 and option ' Automated1' . Phylogenetic trees were reconstructed using Maximum Likelihood (ML) method as implemented in RAxML v 8.2.11 51 . The best substitution model JTT + Γ was applied for all trees. Robustness of the topologies and branches were assessed with 100 or 150 bootstrap replicates. The AGO homolog from Physcomitrella patens was used as outgroup for the rooting of the analysis and the R package ggtree for drawing. To infer evolutionary events, the AGO gene family tree was reconciled with the species tree, generated by the NCBI taxonomy browser, using NOTUNG 52 . Erythranthe guttata was used as outgroup in the gain and loss gene predictions. Pairwise identities, genetic distances and corresponding Neighbor-Joining tree were computed using MEGA v.7 53 .   P. infestans DNA quantification. To evaluate P. infestans infection, its genomic DNA (gDNA) was quantified by qPCR essentially as described earlier 42,58 . Genomic DNA was isolated from potato leaves inoculated with P. infestans. Concentration of obtained gDNA was determined using Qubit dsDNA BR Assay Kit (Thermo Scientific). qPCR analyses with four biological replicates were carried out using iTaq Universal SYBR Green Supermix (Bio-Rad). 20 ng gDNA was used as template in each qPCR reaction together with primers for PiO8 or StACT101. Primers are listed in Supplementary Table S1. All statistics were calculated as detailed as earlier described 42 .
Transcriptome sequencing and bioinformatic analysis. Leaves were collected from potato plants (cv. Sarpo Mira) 5 days post inoculation (dpi) using P. infestans isolates 88069 and 11388. Uninfected and water inoculated leaves were used as controls. For each sample at least 3 leaves were pooled. Total RNA was extracted using the RNeasy mini kit (QIAGEN). Thirteen transcript libraries followed by Illumina HiSeq 2500 sequencing were performed at the National Genomics Infrastructure, Illumina sequencing platform (Stockholm). The Illumina adaptor sequences and low-quality reads were removed using Trimmomatic v0.36 59 . The filtered datasets were mapped to S. tuberosum v4.04 and the P. infestans reference genomes 60,61 using kallisto v0.43.0 62 . Differential expression analysis was performed using the DESeq2 package 63 . All calculations were performed in R v3.2.0 (www.R-proje ct.org).