An unusual type I ribosome-inactivating protein from Agrostemma githago L.

Agrostemma githago L. (corn cockle) is an herbaceous plant mainly growing in Europe. The seeds of the corn cockle are toxic, and poisoning through the consumption of contaminated flour was widespread in the past. The toxic principle of Agrostemma seeds was attributed to triterpenoid secondary metabolites. Indeed, this is in part true. However, Agrostemma githago L. is also a producer of ribosome-inactivating proteins (RIPs). RIPs are N-glycosylases that inactivate the ribosomal RNA, a process leading to an irreversible inhibition of protein synthesis and subsequent cell death. A widely known RIP is ricin from Ricinus communis L., which was used as a bioweapon in the past. In this study we isolated agrostin, a 27 kDa RIP from the seeds of Agrostemma githago L., and determined its full sequence. The toxicity of native agrostin was investigated by impedance-based live cell imaging. By RNAseq we identified 7 additional RIPs (agrostins) in the transcriptome of the corn cockle. Agrostin was recombinantly expressed in E. coli and characterized by MALDI-TOF-MS and an adenine-releasing assay. This study provides for the first time a comprehensive analysis of ribosome-inactivating proteins in the corn cockle and complements the current knowledge about the toxic principles of the plant.

In 2003 Hebestreit et al. showed that triterpene saponins from the seeds of Agrostemma githago L. increased the cytotoxicity of agrostin in a synergistic manner 12 . The reason for this increase is that triterpene saponins enhance the endosomal escape of type I ribosome-inactivating proteins within the cell 13 . Endosomal escape is thus a prerequisite for RIP-related toxicity. This synergistic toxicity of agrostin with triterpene saponins contributes significantly to the toxicity of the seed material.
Recently we have identified Gypsophila elegans M. Bieb (Caryophyllaceae) as yet another plant able to cosynthesize triterpene saponins and type I RIPs in seeds 14 .
Astonishingly, although a commercial product called "agrostin from Agrostemma githago seeds" (Sigma A7928) was available for several years (its distribution was discontinued around 2005), no molecular data pertaining to agrostin have ever been published. This is in contrast to other type I RIPs, for which such data were reported at an early stage: the amino-acid composition of two RIPs from Saponaria officinalis L. and of a RIP from the latex of the sandbox tree Hura crepitans (Euphorbiaceae) was published in the very same paper in which the purification of the three forms of agrostin was originally reported 11 . The N-terminal sequence of saporin-6, a RIP from Saponaria officinalis L., was available as early as 1985 15 .
To fill this gap in knowledge, we aimed to isolate, characterize and identify type I RIPs from Agrostemma githago L.

Results and discussion
Isolation of agrostin from seeds of Agrostemma githago L. Agrostin was isolated by affinity chromatography using an anti-agrostin antibody raised against commercially available agrostin. This approach allowed a direct one-step purification from the aqueous extract of Agrostemma seeds, by which agrostin was obtained in high purity, as shown in Fig. 1a. In comparison with the commercial agrostin from Sigma-Aldrich, a small mass shift was observed in the SDS gel. This might be due to glycosylation of the agrostin from Sigma-Aldrich. Glycosylation of agrostin had been reported previously 1 .
Mass-spectrometric analysis of the intact protein yielded a main peak at 26,962 ± 7 Da, with two side peaks at a slightly higher total mass. Whether these are due to artificial or physiological protein modifications or represent very similar protein isoforms could not be established. The isolated agrostin henceforward is referred to as Agrostin_seed and agrostin obtained from Sigma-Aldrich as Agrostin_sigma.
MALDI-TOF-MS. The isolated Agrostin_seed was further subjected to in-gel digestion using trypsin and AspN protease (data not shown) and the resulting peptides were analysed by MALDI-TOF-MS (Fig. 2). A number of selected peptides (marked with an asterisk in Fig. 2) were fragmented and their sequences were determined de novo from the MS/MS spectra. Assuming from its total mass that agrostin consists of approximately 245 amino acids, these peptides represent roughly 40% of its total sequence. A comparison with the tryptic peptide map of Agrostin_sigma (Fig. 2) showed that the two proteins are essentially identical. Interestingly, one additional peptide in the Sigma protein at M + H = 1,279 was identified as an O-glycosylated form of a peptide. In support of this observation, the bioinformatic prediction using the NetOGlyc server over the whole length of the agrostin sequence yields the highest O-glycosylation probability for the two threonine residues T100 and T102 contained in precisely this peptide. A higher degree of glycosylation (there might be more glycosylation events that remained undetected) might explain the slightly different migration behaviour in SDS-PAGE seen in Fig. 1.
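The estimate of approximately 245 amino acids can be reproduced with a common rule of thumb. The following is a minimal sketch, assuming an average residue mass of about 110 Da; this value and the function names are illustrative assumptions and are not taken from the study.

```python
# Rule-of-thumb average mass of one amino-acid residue in a protein (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_residue_count(protein_mass_da, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Estimate the number of residues from the intact protein mass."""
    return round(protein_mass_da / residue_mass_da)


def residues_covered(total_residues, coverage_fraction):
    """Number of residues represented by peptides covering a given fraction."""
    return round(total_residues * coverage_fraction)


# Intact mass of Agrostin_seed as reported in the text (main peak).
n_residues = estimate_residue_count(26962)
# "Roughly 40%" sequence coverage by the de novo sequenced peptides.
covered = residues_covered(n_residues, 0.40)

print(f"estimated length: {n_residues} residues, sequenced: about {covered} residues")
```

Under these assumptions, a coverage of roughly 40% corresponds to just under 100 sequenced residues.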
For the determination of the full sequence and to identify further RIPs in the transcriptome of Agrostemma githago L., RNAseq was performed. Prior to RNAseq, an expression analysis of agrostin in different developmental stages of Agrostemma githago L. was conducted.
Expression analysis of agrostin in Agrostemma githago L. For the RNA extraction (RNAseq) we aimed to identify those developmental stages in which Agrostemma githago L. shows a high expression of agrostin. For this reason Agrostemma githago L. was seeded and grown to different developmental stages (stage a-g, Fig. 3a). The extracts of the plant material derived from the different developmental stages were analysed by western blot using the anti-agrostin antibody. As shown in Fig. 3b, agrostin is already expressed in young plants (stage a), but it could also be detected in stage g. The expression of agrostin apparently fluctuates during plant development.
For the extraction of RNA (RNAseq) the plant material from stage a was used.
Determination of amino-acid sequences. By transcriptome analysis we identified the sequence Agrostin_RNA3 shown in Fig. 4a, which is very similar to the peptide sequences shown above. However, we found a substantial number of discrepancies between this sequence and the peptide sequences obtained by MALDI-TOF MS-analysis, e.g. while a peptide with the sequence VAITVAFRK (M = 1,003.62) was identified by MS/MS-analysis, the corresponding sequence in Agrostin_RNA3 was VAITVALRK with a clearly different mass (M = 969.63). Similar small differences existed for most of the analysed peptides. We therefore concluded that the Agrostin_RNA3 sequence represents an agrostin isoform, which is present in stage a (Fig. 3a) of the development, but not exactly the protein purified from the seeds. By combining the results obtained on the peptide level by mass spectrometry and on the level of the nucleotide sequences by transcriptome sequencing, we succeeded in assembling the sequence of Agrostin_seed corresponding to the protein isolated from the seeds (Fig. 4a).
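The mass discrepancy between the two peptide variants can be checked by summing monoisotopic residue masses. The sketch below is an illustration only and not the software used in the study; the residue masses are standard monoisotopic values, and the function name `peptide_mass` is hypothetical.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass M of an unmodified linear peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in sequence) + WATER


ms_peptide = peptide_mass("VAITVAFRK")   # sequence found by MS/MS
rna_peptide = peptide_mass("VAITVALRK")  # sequence in Agrostin_RNA3

print(f"VAITVAFRK: {ms_peptide:.2f} Da")
print(f"VAITVALRK: {rna_peptide:.2f} Da")
print(f"difference (F vs. L): {ms_peptide - rna_peptide:.2f} Da")
```

The two computed values agree with the masses quoted in the text (1,003.62 and 969.63). A difference of about 34 Da is far larger than the mass accuracy of reflector-mode measurements, which is why the two variants can be told apart unambiguously.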
Agrostin_RNA3 and Agrostin_seed are very similar (92% sequence identity), representing agrostin isoforms. For Agrostin_seed, ambiguities remain in three positions between the protein data and the RNAseq data (V019M, N042S, S228T). In these cases we used the amino acids according to the peptide sequences for the final sequence presented in Fig. 4, since we assume that they represent more direct evidence for the protein purified from the plant. The discrepancies might be due to sequencing inaccuracies or might arise from the diversity of the biological material used in this study. Intriguingly, we found one peptide in two versions (R.ANFVANELTAQER, M = 1,461.7, and R.ANFVANELTPQER, M = 1,487.7), pointing to a certain degree of heterogeneity within the Agrostin_seed fraction.
Agrostin_seed shows the typical features of a type I RIP such as gypsophilin-S from Gypsophila elegans M.Bieb 14 . Its theoretical molecular mass calculated from the sequence is 26,966.0 Da (M + H, average) which is in good agreement (Δ = 148 ppm) with the experimental value (Fig. 1b). The theoretical isoelectric point as calculated with the tool ProtParam is 9.43, which is somewhat higher than the experimental values given by Stirpe for the different agrostin peaks observed in his work (7.7 for peak 2, 8.7 for peak 5 and 8.75 for peak 6) 11 .
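The agreement between experimental and theoretical mass can be expressed as a relative deviation in parts per million. The following minimal sketch uses the values given in the text; the helper name `ppm_deviation` is hypothetical, and the values are compared directly as reported.

```python
def ppm_deviation(measured_da, theoretical_da):
    """Relative deviation of a measured mass from the theoretical mass in ppm."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


# Native Agrostin_seed: measured main peak vs. mass calculated from the sequence.
native_ppm = ppm_deviation(26962.0, 26966.0)

# Recombinant his-tagged agrostin: measured vs. calculated mass.
recombinant_ppm = ppm_deviation(28117.0, 28119.0)

print(f"native:      {abs(native_ppm):.0f} ppm")
print(f"recombinant: {abs(recombinant_ppm):.0f} ppm")
```

For the native protein this reproduces the stated deviation of 148 ppm, which corresponds to an absolute difference of 4 Da.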
The hypothetical three-dimensional structure of agrostin generated using Phyre2 17 shows the typical composition of other type I RIPs, which consists of an N-terminal β-sheet-rich domain followed by an α-helix-rich succession (Fig. 4b).
Using the Agrostin_seed amino-acid sequence, a similarity search with the protein Basic Local Alignment Search Tool (BLASTp) yielded the highest percentage identity value of 36% to the type I RIP bouganin from Bougainvillea spectabilis Willd. This is surprising since Bougainvillea spectabilis Willd. belongs to another plant family (Nyctaginaceae). The value of 36% is remarkably low; sequence identity is even lower with other RIPs from plants of the same family (Caryophyllaceae): 30% with gypsophilin-S, 27% with saporin-6, and 26% with dianthin. This finding is even more striking as the similarity between these three proteins is much higher (in the range of 80% sequence identity).
It highlights the exceptional position that Agrostin_seed adopts among the type I RIPs from the carnation family.
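Percentage identity values such as those quoted above are derived from pairwise alignments. The sketch below shows one common definition (identical positions divided by the number of alignment columns); this definition and the function name are assumptions for illustration, and BLASTp or other tools may count gaps and alignment length differently. The example uses the two peptide variants discussed earlier, not the full-length proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned, i.e. of equal length with gap characters.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Peptide found by MS/MS vs. the corresponding stretch of Agrostin_RNA3.
identity = percent_identity("VAITVAFRK", "VAITVALRK")
print(f"{identity:.1f}% identity")  # 8 of 9 positions are identical
```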
Type I RIPs, especially saporin, are used for the construction of targeted anti-tumor toxins, consisting of monoclonal antibodies and type I RIPs as toxin portions 18 . Clinical studies have also been performed with such conjugates 19,20 and a large number of saporin-based antibody conjugates, addressing different targets, are commercially available 21 . In this context agrostin is an interesting new option for generating conjugates with potentially lower immunogenicity. Immunogenicity of the RIP portion is a major problem and differs considerably among RIPs 22 . The conjugation of toxins to monoclonal antibodies is achieved by chemical linkers. Due to their intrinsic nucleophilicity, thiols (cysteines) are well suited for chemical conjugations via disulfide formation or coupling via maleimides 23 . However, in order to take advantage of thiol coupling chemistry the cysteines must be accessible on the surface of the protein. Agrostin_seed contains the cysteines Cys 32 and Cys 216. Molecular analysis using Jmol 24 shows that the thiol of Cys 32 might be accessible for chemical modification, whereas that of Cys 216 is rather oriented towards the core of the protein (Fig. 4c). This offers the possibility of a site-specific modification with chemical linkers such as maleimide cross-linkers and coupling to monoclonal antibodies with defined coupling stoichiometry.
Hypothetical RIP sequences from the transcriptome of Agrostemma githago L. The analyses of the RNAseq data set revealed 7 different RIP sequences. It is likely that the translation of these transcripts depends on factors such as development, infections or abiotic stress 25 . The derived protein sequences were aligned using CLUSTALW 16 and signal sequences were determined by SignalP 5.0 26 (the alignment is depicted in the supplementary information, Fig. S1). The functionally relevant amino acids are present throughout all sequences.
There are only very few plants with such a variety of RIPs in their transcriptomes.
Cytotoxic activity of agrostin. The cytotoxicity of Agrostin_seed was investigated in ECV-304 cells by impedance-based real-time analysis. In previous studies we have shown that particular triterpene saponins augment the cytotoxicity of type I RIPs by improving the endosomal escape of internalized type I RIPs 13 .
Following endocytosis into the cell, type I RIPs need to escape from lysosomes into the cytosol. This is a very important step in the course of the toxin routing, since the target organelles (ribosomes) are located in the cytosol.
For this reason we combined agrostin with a non-toxic concentration of the triterpene saponin SO1861 33 (Fig. 5).

Recombinant expression of Agrostin_seed.
Based on the amino-acid sequence of Agrostin_seed, a codon-optimized nucleic acid sequence including the sequence for an N-terminal 8 × His affinity tag was generated by gene synthesis. The recombinant Agrostin_seed is henceforward referred to as his Agrostin. his Agrostin was expressed in E. coli, and following isolation by metal affinity chromatography one prominent band at around 29 kDa could be seen on SDS-PAGE (Fig. 6a). The exact mass of his Agrostin was determined by MALDI-TOF-MS as 28,117 Da (Fig. 6b). This value is in very good agreement with the theoretical mass calculated from the sequence (28,119 Da). The identity of his Agrostin was further verified by its peptide mass fingerprint (data not shown) and MALDI ISD sequencing (see supplementary information, Fig. S2).
The enzymatic activity of his Agrostin was determined in a densitometric TLC assay 34 , which is based on the RIP-catalysed release of adenine molecules from an artificial substrate.
As shown in Fig. 6c, his Agrostin showed enzymatic activity, even though its activity was not as high as that of native Agrostin_seed. This could be due to partially incorrect folding of his Agrostin during expression in E. coli. In future studies this issue might be solved by optimizing the expression conditions in E. coli. However, the recombinant type I RIP dianthin from Dianthus caryophyllus L., which was used as positive control, showed an even higher activity. This could also be due to a higher substrate specificity of his Agrostin and native agrostin compared to dianthin, DNA not being the natural substrate of RIPs.

Methods
Seed material. Seeds (Agrostemmae semen, AGRO 26/80) from Agrostemma githago L. were obtained from the Bundesanstalt für Züchtungsforschung und Kulturpflanzen (BAZ) in Gatersleben, Germany. Seeds (200 g) were ground and defatted by Soxhlet extraction using petroleum ether overnight. The material was air-dried and extracted at 4 °C with 500 ml PBS supplemented with protease inhibitor (cOmplete Protease Inhibitor Cocktail, Roche, Mannheim, Germany). After 12 h the extract was centrifuged at 6,000 g for 20 min and then subjected to ultracentrifugation (Optima L-90 K, Beckman Coulter GmbH; 30,000 rpm, 30 min, 4 °C). The clear supernatant was subjected to affinity chromatography (see below).

Isolation of agrostin. For the isolation of Agrostin_seed an anti-agrostin antibody was generated in rabbits (Pineda antibody service, Berlin, Germany). For the immunization, commercial agrostin (Sigma-Aldrich, Steinheim, Germany) was used. Following ammonium sulfate precipitation of the serum, the IgG fraction was isolated by protein A-based column chromatography (Pierce Protein A Agarose, Thermo Fisher Scientific). The antibodies were eluted by 0.1 M glycine, pH 2.5, 4 °C and neutralized by Tris buffer (1 M, pH 9.0, 4 °C). For the isolation of anti-agrostin antibodies, 100 µg of commercial agrostin was immobilized on NHS-Activated Agarose Spin Columns (Pierce, Thermo Fisher Scientific). After applying the IgG fraction and washing (PBS), anti-agrostin antibodies were eluted by 0.1 M glycine, pH 2.5, 4 °C and neutralized (Tris buffer 1 M, pH 9.0, 4 °C). Fractions were pooled, dialysed against PBS and analysed by SDS-PAGE (12%).
For the isolation of Agrostin_seed, anti-agrostin antibodies were immobilized on NHS-Activated Agarose Spin Columns (Pierce, Thermo Fisher Scientific). The Agrostemma seed extract (500 ml) was gradually applied to the column. After washing (5 ml PBS, 4 °C), bound Agrostin_seed was eluted by adding 5 ml 0.1 M glycine, pH 2.5, 4 °C. In total 13 fractions (each 0.5 ml) were collected, neutralized (see above) and dialysed against PBS. Two fractions contained agrostin. Protein concentration was determined by BCA assay and fractions were analysed by SDS-PAGE (12%) with Coomassie Brilliant Blue staining.
The mass spectrometer was operated in positive mode. Samples were spotted using the dried-droplet technique. Intact protein mass was determined in linear mode (LP_ProtMix) using sinapinic acid as the matrix (saturated solution in 33% acetonitrile/0.1% trifluoroacetic acid) and spectra were acquired over an m/z range of 3,000-40,000. The mass accuracy obtained in linear mode measurements in the higher mass range (> 10 kDa) was estimated as ± 1 ‰. Peptides generated by in-gel trypsin digestion (modified from Shevchenko et al. 35 ) were measured in reflector mode (RP_PepMix) using α-cyano-4-hydroxycinnamic acid (saturated solution in 33% acetonitrile/0.1% trifluoroacetic acid) as the matrix and spectra were typically acquired over an m/z range of 600-4,000. Data were analysed using FlexAnalysis 2.4 software. MS/MS spectra of selected tryptic peptides were acquired in the LIFT mode 36 and de novo interpretation of the fragment spectra was performed manually. In-source decay (ISD) was used to generate N-terminal c ions and C-terminal (z + 2) ions from the intact purified and acetone-precipitated recombinant protein using 1,5-diaminonaphthalene (1,5-DAN) as matrix. Spectra were recorded in the positive reflector mode (RP_PepMix) in the m/z range 800-4,000. Mass accuracy here was < 100 ppm.
Agrostin expression in different developmental stages. In order to identify the right time point for RNA isolation for the transcriptome sequencing, different developmental stages of growing Agrostemma githago L. plants were analysed for agrostin expression. For this purpose 7 developmental stages of the plant were selected: stage a: 3 months after seeding, appearance of the sepals; stage b: appearance of the petals; stage c: appearance of the seed capsule; stage d: petals fully developed and colored; stage e: petals parched, growing seed and seed capsule; stage f: maturation of seed and seed capsule, seeds white-yellow; stage g: loss of sepals, seeds black and fully developed, seed capsule open.
The fresh plant material was snap-frozen in liquid nitrogen, ground and defatted. The material was extracted with PBS (cOmplete Protease Inhibitor Cocktail; Roche, Mannheim, Germany) at a concentration of 100 mg/ml and analysed by western blot using the anti-agrostin antibody (1:7,500) as primary and a goat anti-rabbit antibody (IgG, H and L Chain Specific Peroxidase Conjugate, Merck, 1:2,000) as secondary antibody. Amersham Hybond ECL (GE Healthcare Lifesciences), ECL (enhanced chemiluminescence) reagent and an Optimax TR (M&S Laborgeräte, Heidelberg, Germany) were used for development.
Transcriptome sequencing (RNAseq). Total RNA was isolated from plants in stage a (see above). For this purpose, the frozen plant material was ground in liquid nitrogen. RNA was extracted from 102 mg plant material using TriSure and Direct-zol RNA Miniprep kits (Zymo Research, Freiburg, Germany). Extracted RNA was stored at − 80 °C. The sample was analysed by agarose gel (1%) electrophoresis. The concentration was determined as 2.50 µg/µl (NanoDrop 1000, Thermo Fisher Scientific). The RNAseq was performed using an Illumina MiSeq V3 (LGC Genomics GmbH, Berlin, Germany).
The raw results were demultiplexed with Illumina's data analysis software CASAVA and then cleaned of adapter sequences. Forward and reverse reads were combined using BBMerge 34.48 37 .
The resulting sequences were deconcatemerised and quality trimmed to include only reads with an average Phred quality score of at least 30. Based on these 12,115,767 reads, a de novo assembly was performed using Newbler v 2.9 in cDNA mode, and putative ORF identification was done by Transdecoder. Trinotate was used to annotate the resulting transcontigs and predicted peptides to identify those sequences with a high similarity to known RIPs. Besides using Newbler, we performed another assembly with Mira 38 . This assembly was based on all quality trimmed reads with a sequence that could be translated into any of the peptide fragments obtained by MALDI-TOF MS and all other reads similar to these originally filtered reads.
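The quality criterion (average Phred score of at least 30) can be illustrated with a short sketch. It is not the pipeline used in the study: the Phred+33 ASCII encoding, the use of an arithmetic mean over the per-base scores, and all function names are assumptions made for this example.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the per-base Phred scores of one FASTQ quality line."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_quality(quality_string, min_mean=30.0, offset=33):
    """True if the read meets the minimum average Phred score."""
    return mean_phred(quality_string, offset) >= min_mean


def error_probability(phred_score):
    """Base-calling error probability encoded by a Phred score Q: 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


# Hypothetical quality strings: 'I' encodes Q40, '?' Q30 and '5' Q20 in Phred+33.
reads = {"read_1": "IIIIIIII", "read_2": "II??55", "read_3": "55555555"}
kept = [name for name, qual in reads.items() if passes_quality(qual)]

print(kept)
print(error_probability(30))  # a score of 30 means one expected error in 1,000 bases
```

A threshold of 30 is a fairly strict choice; it keeps only reads whose bases are, on average, called with an expected accuracy of at least 99.9%.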
Recombinant protein expression. The codon-optimized coding sequence was established by gene synthesis (General Biosystems, Inc., Morrisville, USA) and cloned into the expression vector pET11d (Merck, Darmstadt, Germany). The coding sequence contained an N-terminal 8 × His tag for metal affinity chromatography. The construct his Agrostin_pET11d was transformed into competent Escherichia coli Rosetta 2 (DE3) cells (Merck, Darmstadt, Germany). The bacterial culture was expanded to 3.2 l using LB medium containing 50 µg/µl ampicillin and incubated until an optical density at 600 nm (OD600) between 0.9 and 1.2 was reached. Protein expression was induced using 1 mM isopropyl β-D-1-thiogalactopyranoside (AppliChem, Darmstadt, Germany) for 3 h at 37 °C and 200 rpm. The expression was stopped by centrifugation for 10 min at 5,000 g and 4 °C. Subsequently, the bacterial pellets were resuspended in 20 ml PBS and stored at − 20 °C. The bacterial suspensions were thawed and lysed by sonication (Branson Sonifier 250, G. Heinemann, Schwäbisch Gmünd, Germany). The lysates were centrifuged at 15,800 g and 4 °C for 10 min and imidazole was added to the supernatant to a final concentration of 20 mM. his Agrostin was purified using Ni-nitrilotriacetic acid agarose affinity chromatography (Protino Ni-NTA agarose, Macherey-Nagel, Düren, Germany). The bound protein was eluted using increasing imidazole concentrations (20, 50, 75, 125 and 250 mM, 5 ml for each concentration) and analysed by SDS-PAGE [12% acrylamide (w/v) gel]. The protein was dialysed against 2 l PBS and protein concentration was determined using the bicinchoninic acid assay (Pierce BCA Protein Assay, Thermo Scientific, Waltham, MA, USA).
N-glycosidase assay. The N-glycosidase activity was determined using an adenine-releasing assay with an artificial substrate. The assay is described in detail elsewhere 34 .
Briefly, the substrate consists of the DNA oligonucleotide 5′-A 30 -3′ (A30). Once the N-glycosidic bond is cleaved, released adenine is separated from the reaction mixture by Thin Layer Chromatography (TLC) on silica gel 60 glass plates. The glass plates are then scanned by a TLC-densitometer (TLC Scanner 4, CAMAG, Berlin, Germany) at 260 nm. The RIP-mediated release of adenine is determined by calculating the Area Under the Curve (AUC).
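The area-under-the-curve evaluation of a densitometric trace can be sketched as follows. This is only an illustration under stated assumptions: the integration routine of the scanner software is not described in the text, the trapezoidal rule with a constant baseline is chosen here for simplicity, and the function name and the example trace are hypothetical.

```python
def trapezoid_auc(positions, signal, baseline=0.0):
    """Area under a densitometric peak using the trapezoidal rule.

    positions: scan positions along the TLC lane (e.g. in mm), ascending
    signal:    absorbance readings at 260 nm at these positions
    baseline:  constant background subtracted before integration (assumption)
    """
    if len(positions) != len(signal) or len(positions) < 2:
        raise ValueError("need two equally long lists with at least two points")
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        # Readings below the baseline are treated as zero signal.
        left = max(signal[i - 1] - baseline, 0.0)
        right = max(signal[i] - baseline, 0.0)
        area += 0.5 * width * (left + right)
    return area


# Hypothetical adenine peak recorded on a constant background of 1.0.
scan_positions = [0.0, 1.0, 2.0, 3.0, 4.0]
absorbance = [1.0, 3.0, 5.0, 3.0, 1.0]

adenine_auc = trapezoid_auc(scan_positions, absorbance, baseline=1.0)
print(adenine_auc)
```

Comparing such areas between samples treated with different RIPs and an untreated control gives a relative measure of the adenine released.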
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.