Ginsentides: Cysteine and Glycine-rich Peptides from the Ginseng Family with Unusual Disulfide Connectivity

Ginseng, a popular and valuable traditional medicine, has been used for centuries to maintain health and treat disease. Here we report the discovery and characterization of ginsentides, a novel family of cysteine- and glycine-rich peptides derived from the three most widely used ginseng species: Panax ginseng, Panax quinquefolius, and Panax notoginseng. Using proteomic and transcriptomic methods, we identified 14 ginsentides, TP1-TP14, which consist of 31–33 amino acids and whose expression profiles are species- and tissue-dependent. Ginsentides have an eight-cysteine motif typical of the eight-cysteine-hevein-like peptides (8C-HLPs) commonly found in medicinal herbs, but lack a chitin-binding domain. Transcriptomic analysis showed that the three-domain biosynthetic precursors of ginsentides differ from known 8C-HLP precursors in architecture and in the absence of a C-terminal protein-cargo domain. A database search revealed an additional 50 ginsentide-like precursors from both gymnosperms and angiosperms. Disulfide mapping and structure determination of ginsentide TP1 revealed a novel disulfide connectivity that differs from that of the 8C-HLPs. The structure of ginsentide TP1 is highly compact, with the N- and C-termini topologically fixed by disulfide bonds to form a pseudocyclic structure that confers resistance to heat, proteolysis, acid, and serum-mediated degradation. Together, our results expand the chemical space of natural products found in ginseng and highlight the occurrence, distribution, disulfide connectivity, and precursor architectures of cysteine- and glycine-rich ginsentides as a class of novel non-chitin-binding, non-cargo-carrying 8C-HLPs.

shows the tissue distribution of ginsentides in Panax ginseng roots, seeds, leaves and flowers. Figure 3 shows the mass spectra of Panax notoginseng and Panax quinquefolius flowers. The peak at m/z 3054 was designated as ginsentide TP1, which was isolated by RP-HPLC and subjected to S-reduction and S-alkylation using dithiothreitol (DTT) and N-ethylmaleimide (NEM) to determine the disulfide content. All other peaks representing the putative ginsentides in the mass region of 3 to 3.5 kDa showed a mass shift of 1008 Da after S-reduction and S-alkylation, indicating the presence of eight cysteine residues (Supplementary Data S1).
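The link between the 1008 Da shift and eight cysteines can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' analysis script: it assumes each cysteine gains one hydrogen on reduction and one NEM adduct on alkylation, and uses standard monoisotopic masses. The function names are hypothetical.

```python
# Monoisotopic masses in Da (standard values, not taken from the paper).
NEM_ADDUCT = 125.0477  # N-ethylmaleimide (C6H7NO2) added to each free thiol
HYDROGEN = 1.00783     # one H gained per cysteine when a disulfide is reduced


def expected_shift(n_cysteines: int) -> float:
    """Mass gained after full S-reduction and NEM alkylation of n cysteines."""
    return n_cysteines * (NEM_ADDUCT + HYDROGEN)


def cysteines_from_shift(mass_shift_da: float) -> int:
    """Estimate the cysteine count from an observed mass shift."""
    return round(mass_shift_da / (NEM_ADDUCT + HYDROGEN))


print(cysteines_from_shift(1008))    # 8
print(round(expected_shift(8), 1))   # 1008.4
```

Under these assumptions, each cysteine contributes about 126 Da, so a shift of roughly 1008 Da corresponds to eight cysteines, or four disulfide bonds.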
Primary sequence and biosynthesis of ginsentides. MS/MS sequencing of the 3054-Da ginsentide TP1, which was found in both Panax ginseng and Panax notoginseng, serves as a representative example for the TP peptides (Fig. 4). Enzymatic digestion of S-reduced TP1 by chymotrypsin or trypsin produced one major fragment with m/z values of 2459 and 2831, respectively (Fig. 4). The b-ions and y-ions generated by MALDI-TOF MS/MS showed that the sequence of the 2459-fragment was CKSGGAWCGFDPHGCCGNCGCLVGF and that of the 2831-fragment was SGGAWCGFDPHGCCGNCGCLVGFCYGTGC. Combining these two overlapping fragments yielded the full sequence of the 3054-Da ginsentide TP1. De novo peptide sequencing was also performed to determine the primary sequence of the 3084-Da ginsentide TP2 (Supplementary Data S2). A Basic Local Alignment Search Tool (BLAST) search of the NCBI database and transcriptomic analyses revealed that ginsentides are the mature products of ginseng-specific abundant proteins (GSAPs). Our results revealed 14 putative ginsentide-encoding gene sequences (TP1-TP14) from Panax ginseng, Panax quinquefolius and Panax notoginseng (Fig. 5 and Table 1). Sequence analysis also showed that the eight cysteine residues in the C-terminal regions of ginsentide-encoding genes are conserved. All ginsentides have between 31 and 33 amino acids, including eight cysteine residues arranged in a cysteine motif of CXnCXnCCXnCXCXnCXnC that contains a tandemly connecting CC motif. In addition, ginsentides are exceptionally glycine-rich; TP1 has nine glycine residues. Sequence comparison showed that 66% of amino acid residues in TP1 are conserved among the TP family, and sequence conservation is highest for cysteine and glycine. Transcriptomic analysis further showed that ginsentides (TP1-TP14) are synthesized as precursors with three domains: an N-terminal signal peptide, a pro-domain and a C-terminal mature ginsentide (Fig. 5).
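The assembly of the full TP1 sequence from the two overlapping fragments can be illustrated with a short sketch. This is not the authors' sequencing workflow; the helper `merge_overlapping` is a hypothetical name, and the approach simply joins the fragments on the longest suffix of the first that is also a prefix of the second.

```python
def merge_overlapping(left: str, right: str) -> str:
    """Join two fragments on the longest suffix of `left` that prefixes `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("fragments do not overlap")


# Fragment sequences as reported in the text.
chymotryptic = "CKSGGAWCGFDPHGCCGNCGCLVGF"      # 2459-fragment
tryptic = "SGGAWCGFDPHGCCGNCGCLVGFCYGTGC"       # 2831-fragment

tp1 = merge_overlapping(chymotryptic, tryptic)
print(tp1)                              # CKSGGAWCGFDPHGCCGNCGCLVGFCYGTGC
print(len(tp1))                         # 31
print(tp1.count("C"), tp1.count("G"))   # 8 9
```

The merged sequence has 31 residues with eight cysteines and nine glycines, in agreement with the residue counts given in the text.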
Secondary structure and disulfide connectivity of ginsentide TP1. We next used a chemical mapping method involving sequential S-tagging to determine the disulfide connectivity of ginsentides [22][23][24][25][26]. The disulfide connectivity of ginsentide TP1 was determined stepwise, beginning with partial S-reduction with tris(2-carboxyethyl)phosphine followed by S-alkylation with excess NEM (Fig. 6). Three NEM-labeled intermediates with one (1SS), two (2SS), or three (3SS) intact disulfide bonds were then collected. These intermediate species were subsequently fully S-reduced and S-tagged with a second alkylation reagent, iodoacetamide (IAM). The mixed S-labeled peptides were digested with trypsin and sequenced by MS/MS (Supplementary Data S3). Combining the information from the 1SS- and 3SS-intermediates, we deduced the ginsentide TP1 disulfide connectivity as Cys I-IV, Cys II-VI, Cys III-VII and Cys V-VIII.
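To relate the Roman-numeral cysteine labels to residue positions, the following sketch maps the deduced connectivity onto the TP1 sequence obtained by joining the two reported fragments. It is an illustration only; the variable and function names are hypothetical.

```python
# TP1 sequence from the two overlapping MS/MS fragments reported in the text.
TP1 = "CKSGGAWCGFDPHGCCGNCGCLVGFCYGTGC"
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]
CONNECTIVITY = [("I", "IV"), ("II", "VI"), ("III", "VII"), ("V", "VIII")]


def cysteine_positions(seq: str) -> dict:
    """Map Cys I..VIII to 1-based residue numbers in sequence order."""
    positions = [i for i, aa in enumerate(seq, start=1) if aa == "C"]
    return dict(zip(ROMAN, positions))


pos = cysteine_positions(TP1)
bonds = [(pos[a], pos[b]) for a, b in CONNECTIVITY]
print(bonds)  # [(1, 16), (8, 21), (15, 26), (19, 31)]

# Both terminal residues take part in a disulfide bond (pseudocyclic topology).
bonded = {residue for bond in bonds for residue in bond}
print(1 in bonded and len(TP1) in bonded)  # True
```

Each of the eight cysteines appears in exactly one pair, and residues 1 and 31 are both bonded, which is the arrangement the text describes as pseudocyclic.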
Tertiary structure of ginsentide TP1. The three-dimensional (3-D) structure of ginsentide TP1 was determined using the distance, dihedral angle and hydrogen bond restraints derived from 1H NMR analysis (Table 2). The average RMSDs for secondary structural regions were 0.35 ± 0.05 Å and 0.68 ± 0.07 Å for all backbone and heavy atoms, respectively. Ginsentide TP1 (PDB code: 2ML7) adopts a β-sheet structure with two antiparallel β-strands consisting of residues Gly20-Leu22 and Phe25-Tyr27, and eight β-turns, as well as a β-hairpin that includes Gly20 to Tyr27 (Fig. 7A). The solution structure of ginsentide TP1 showed that it adopts an unusual disulfide connectivity wherein the three disulfide bonds Cys I-IV, II-VI and III-VII adopt a cystine-knot fold similar to that of knottin family peptides such as the cystine-knot α-amylase inhibitors (Fig. 7B) [12,13,15,17,20]. The additional disulfide bond at Cys V-VIII is a penetrating disulfide bond that is unique to ginsentides. The overall structure is tightly folded, with approximately 90% and 30% of the amide proton signals remaining in the 1D spectra after H/D exchange in D2O for 2 h and 18 h, respectively (Fig. 7C). NMR analysis showed that ginsentides possess a pseudocyclic structure in which both the N- and C-terminal Cys residues participate in disulfide linkages. This arrangement, together with a cystine knot, forms the ginsentide sulfur core. A search for conserved structures using the ginsentide TP1 coordinates in the Dali Server [27] yielded 11 similar structures with Z-scores ranging from 2.0 to 2.6.
All 11 structures belong to ion (sodium/potassium/calcium) channel blockers from spider toxins, such as hainantoxin-IV (PDB code: 1niy and 1ryv) [28], HS1A (PDB code: 2mt7), U1-TRTX-SP1A (PDB code: 2LL1) [29], jingzhaotoxin-XI (PDB code: 2a2v) [30], μ-TRTX-Tp1a (PDB code: 2mxm) [31], HD1A (PDB code: 2mpq) [32], psalmotoxin-1 (PDB code: 2kni) [33], psalmotoxin 1 (PDB code: 1lmm) [34], SGTX1 (PDB code: 1la4) [35], and VSTx1 (PDB code: 2n1n) [36] (Fig. 8A). The primary sequence similarities between ginsentide TP1 and these spider toxins are limited to the six cysteine residues involved in the disulfide bonds Cys I-IV, Cys II-VI, and Cys III-VII (Fig. 8B). These three disulfide bonds form a scaffold that is similar to the common cystine-knot disulfide connectivity [12,13,15]. Ginsentide TP1 is unique in the presence of an additional disulfide bond that links the C-terminal Cys VIII to Cys V in the middle of the peptide sequence. Analyses of the peptide surface properties revealed positively charged residues distributed around the hydrophobic patches on the structural surface of spider toxins, which are essential for their ion-channel blocking properties [28,33,37-39]. This positively charged surface property, however, is absent in ginsentide TP1, in which more than half of the sequence consists of Cys and Gly residues.

Stability of ginsentide TP1 against heat, proteolytic, acid and serum-mediated degradation.
To examine the stability of the unique pseudocyclic cystine-knot motif of ginsentides, heat, proteolytic, acid and serum stability assays were performed on ginsentide TP1 (Fig. 9). The percentages of remaining ginsentides were quantified based on their relative peak areas in RP-HPLC profiles before and after treatment. Ginsentide TP1 was relatively stable to heat, with less than 10% degradation after heating at 100 °C for 30 min and 29% degradation after 120 min. Ginsentide TP1 displayed high stability against enzymatic degradation by trypsin, chymotrypsin, and pepsin, with >80% of peptides remaining intact after 3 h of incubation. Similarly, in an acid stability assay, ginsentide TP1 was highly stable in 0.2 N HCl. Ginsentide TP1 was also stable in human serum, with <10% degradation over a 48 h incubation period at 37 °C.
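The quantification by relative peak area can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name is hypothetical, and the peak areas in the example are invented placeholder values chosen only to reproduce the reported 29% degradation.

```python
def percent_remaining(area_treated: float, area_control: float) -> float:
    """Percentage of intact peptide from RP-HPLC peak areas (treated vs. control)."""
    if area_control <= 0:
        raise ValueError("control peak area must be positive")
    return 100.0 * area_treated / area_control


# Hypothetical peak areas (arbitrary units), NOT data from the paper.
control_area = 1000.0
heated_area = 710.0

remaining = percent_remaining(heated_area, control_area)
print(remaining)          # 71.0
print(100.0 - remaining)  # 29.0
```

A value of 71% remaining is the same statement as 29% degradation, which is how the heat-stability result at 120 min is reported.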
Cytotoxicity, hemolyticity and immunogenicity assessment of ginsentide TP1. To examine the toxicity of ginsentides, we incubated ginsentide TP1 with Huh7 cells or red blood cells and observed no change in cell viability and no hemolysis at concentrations up to 100 µM. Ginsentide TP1 was non-immunogenic to THP-1 cells and induced no observable increase in IL-6, IL-8, IL-10 or TNF-α secretion (Fig. 10).

NOE constraints: 551
Intra-residue (|i-j| = 0): 32
Sequential (|i-j| = 1): 220
Long-range (|i-j| ≥ 5): 226
Dihedral angle restraints: 13

Transcriptomic database search for ginsentide-like 8C-HLPs. To explore the occurrence and distribution of ginsentide-like 8C-HLPs in other plant species, we performed a TBLASTN and BLASTP search of the NCBI and Onekp databases using the ginsentide TP1 precursor sequence. Based on our database search, we identified 50 other three-domain ginsentide-like precursor sequences containing four disulfide bonds and a cysteine motif of CXnCXnCCXnCXCXnCXnC from 31 plant species in 19 families (Fig. 11).

Discussion
In this study, we report the identification, isolation and characterization of 14 novel ginseng-derived cysteine-rich peptides, ginsentides TP1-TP14, from Panax ginseng, Panax quinquefolius, and Panax notoginseng. To the best of our knowledge, this is the first report on the discovery and characterization of ginseng-derived CRPs. The 14 ginsentides were identified using a combination of transcriptomic and proteomic approaches. Ginsentides are 3 to 3.5 kDa peptides with 31-33 amino acids that are rich in Cys and Gly residues. With cysteine occurring at approximately one in every four amino acids, ginsentides are highly disulfide-constrained and structurally compact. All ginsentides possess a CX(6)CX(6-7)CCX(2-4)CXCX(4-6)CX(1-4)C cysteine motif that is similar to that of 8C-HLPs. However, the 8C-cysteine motif of ginsentides differs from that of other 8C-HLPs in that it contains both a CC and a CXC motif. This cysteine motif results in a fold that contains one loop with a single amino acid and five loops whose sizes vary across the family. Additionally, all 14 ginsentides have high sequence similarity, with conservation of cysteine and glycine residues. In particular, intercysteine loop 4 is absolutely conserved in terms of loop size and the presence of a Gly residue. In contrast, loop 5 and loop 6 show greater variability in size, particularly for ginsentides TP6, TP12, TP13 and TP14.
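The cysteine motif can be expressed as a regular expression and checked against TP1. The sketch below is illustrative and assumes that X denotes any non-cysteine residue; the TP1 sequence is the one obtained by joining the two reported MS/MS fragments, and the names used are hypothetical.

```python
import re

# CX(6)CX(6-7)CCX(2-4)CXCX(4-6)CX(1-4)C, assuming X is any residue except Cys.
GINSENTIDE_MOTIF = re.compile(
    r"C[^C]{6}C[^C]{6,7}CC[^C]{2,4}C[^C]C[^C]{4,6}C[^C]{1,4}C"
)

TP1 = "CKSGGAWCGFDPHGCCGNCGCLVGFCYGTGC"


def loop_lengths(seq: str) -> list:
    """Residues between consecutive cysteines, skipping the adjacent CC pair."""
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    gaps = [b - a - 1 for a, b in zip(cys, cys[1:])]
    return [gap for gap in gaps if gap > 0]


print(bool(GINSENTIDE_MOTIF.fullmatch(TP1)))  # True
print(loop_lengths(TP1))                      # [6, 6, 2, 1, 4, 4]
```

For TP1 the six intercysteine loops have lengths 6, 6, 2, 1, 4 and 4, so loop 4 is the single-residue loop (Gly) of the CXC motif.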
Interestingly, although ginsentide sequences have high sequence similarity (>66%), the occurrence and distribution patterns of ginsentides are species-dependent. At the mRNA level, ginsentides TP4 and TP7 are unique to Panax ginseng, whereas only Panax quinquefolius expresses ginsentides TP8, TP9, TP10, TP11, TP12 and TP14. Ginsentide TP13 is unique to Panax notoginseng, and ginsentides TP2, TP3, TP5 and TP6 are common to both Panax ginseng and Panax quinquefolius. TP1 is produced by both Panax ginseng and Panax notoginseng. Mass spectrometry profile analyses revealed that ginsentide expression is also tissue-dependent. Aqueous extracts of roots and flowers from Panax ginseng displayed similar ginsentide expression patterns, with TP1 and TP2 as the dominant ginsentides. Panax quinquefolius and Panax notoginseng also had similar expression profiles in aqueous extracts of roots and flowers. In Panax ginseng, we saw a distinct tissue expression pattern wherein the dominant ginsentides in aqueous extracts of seeds and leaves were TP3 and TP4, respectively. Collectively, these results suggest that ginsentide expression profiles could be used as biological markers for identifying species and tissues of ginseng.
The 8C-HLPs belong to a family of CRPs that has an evolutionarily conserved CXnCXnCCXnCXnCXnCXnC cysteine motif. The tandemly connecting CC motif at Cys III and Cys IV found in both 6C-HLPs and 8C-HLPs produces the cystine-knot disulfide connectivity of Cys I-IV, Cys II-V and Cys III-VI. For 8C-HLPs, the cystine knot is followed by the small intercysteine loop Cys VII-VIII. The 8C-HLPs can be further divided into two subfamilies based on the presence or absence of a chitin-binding domain. Chitin-binding 8C-HLPs have a highly conserved SXΦXΦ domain (Φ, aromatic residues; X, any amino acid) in intercysteine loop 3 and a conserved aromatic residue in loop 4, which are essential for chitin-binding activity [18,19,21].
Because ginsentides lack the chitin-binding domain, we have classified them into a new subfamily described as non-chitin-binding 8C-HLPs. Transcriptome database searches of NCBI and Onekp revealed that 31 other plant species from 19 families in both gymnosperms and angiosperms express 50 other three-domain ginsentide-like 8C-HLP precursor sequences having the cysteine motif of CXnCXnCCXnCXCXnCXnC. Ginsentide-like peptides are found in some of our most important crops, including coffee (Coffea canephora), cacao (Theobroma cacao), cotton (Gossypium raimondii), rice (Oryza sativa) and wheat (Triticum aestivum).
Although ginsentides display a cysteine spacing pattern typical of 8C-HLPs with a tandemly connecting CC motif, they have a novel disulfide connectivity not found in 8C-HLPs. Using the stepwise S-reduction and S-alkylation method reported by Gray et al. [22], we unequivocally determined the connectivity of ginsentide TP1 as Cys I-IV, II-VI, III-VII and V-VIII. The disulfide bonds Cys I-IV, II-VI, and III-VII form a cystine knot that is similar to that of 6C-HLPs, whereas the fourth, penetrating disulfide bond Cys V-VIII is unique to ginsentides. By comparing differences in cysteine spacing patterns and disulfide connectivities between ginsentides and chitin-binding 8C-HLPs, we found that ginsentides and ginsentide-like sequences have a conserved and highly shortened one-amino-acid intercysteine loop 4, whereas the chitin-binding 8C-HLPs have the SXΦXΦ motif in loop 3 and a six-amino-acid loop 4 with a conserved aromatic residue that is essential for chitin binding. Owing to the absence of the chitin-binding domain, ginsentides do not bind chitin (Supplementary Data S4). The unique disulfide connectivity of ginsentides confers high stability against heat, proteolytic, acidic and human serum-mediated degradation. Chemical disulfide mapping and NMR analysis showed that three of the four disulfide bonds of ginsentide TP1, Cys II-VI, III-VII, and V-VIII, are buried in the core of the structure. Consequently, the side chains of the other residues are all solvent-exposed, resulting in hydrophobic patches on the structural surface of the peptide. Thus, ginsentide TP1 displays an overall amphipathic distribution of hydrophobic and hydrophilic side chains. The first residue, Cys I, forms a disulfide bond with Cys IV, and the last residue, Cys VIII, connects with Cys V. In this way, both the N- and C-termini of ginsentide TP1 are topologically fixed in the tertiary structure through disulfide bonds, which confers a pseudocyclic topology.
This feature, combined with a tightly folded structure fortified by four disulfide bonds and intramolecular hydrogen bonds, contributes to the high stability of ginsentides.
Ginsentides are the mature products of ginseng-specific abundant proteins (GSAPs), which were previously identified in random gene screening of Panax ginseng and Panax quinquefolius genomes [40]. The mRNA transcripts of GSAPs were reported to be highly expressed in rhizomes, ranking third among 17,605 ESTs in the ginseng cDNA library. Biosynthesis of mature ginsentides from precursors is similar to that of other 8C-HLPs, which are generally synthesized as a three-domain precursor consisting of an N-terminal signal peptide, a mature peptide, and a C-terminal tail or a C-terminal protein-cargo. In this study, transcriptomic analysis showed that ginsentides are also biosynthesized as a three-domain precursor but with a different arrangement. The precursor architecture of ginsentides and ginsentide-like sequences consists of an N-terminal signal peptide, a pro-domain and a C-terminal mature peptide, which differs from that of the protein-cargo family of chitin-binding 8C-HLPs. The processing of precursor proteins to mature ginsentides probably requires at least two proteolytic events. The first event is likely catalyzed by a signal peptidase that cleaves the ER signal
In conclusion, we identified 14 novel ginsentides from Panax ginseng, Panax quinquefolius, and Panax notoginseng of the Panax family that have an unusual disulfide connectivity and represent a new precursor architecture that differs distinctly from all known 8C-HLPs. The novel and highly compact structure of ginsentides confers resistance to heat, acid, and digestive enzymes. Ginsentides possess certain features of small chemical metabolites but have large footprints, which could be of interest for drug development. This study greatly expands the known occurrence, disulfide connectivity, and precursor architectures of non-chitin-binding 8C-HLPs.

Materials and Methods
Materials. All chemicals and solvents, unless otherwise stated, were purchased from Sigma Aldrich, US and Fisher Scientific, US.
Isolation and purification of ginsentides. Dried roots, seeds, and flowers from Panax ginseng, P. quinquefolius, or P. notoginseng (Yue Hwa Chinese Products Emporium Ltd., Singapore) were pulverized and 100 mg were extracted with 0.5 mL 50% ethanol to screen for CRPs with molecular masses of 2-6 kDa by mass spectrometry using an Applied Biosystems 4800 MALDI TOF/TOF Analyzer. To obtain sufficient ginsentides for characterization studies, ~2 kg of dried material were extracted with 10 L water. The extracts were filtered and subjected to flash chromatography using C18 powder (Grace Davison). The ginsentide-enriched fractions were subsequently eluted with 60% ethanol and concentrated using a rotary evaporator. The concentrated fractions were then purified by preparative RP-HPLC using a C18 Grace Vydac column (250 × 22 mm) at a flow rate of 8 mL/min on a Shimadzu system. A linear gradient of 1%/min of 10-80% buffer B was applied. Buffer A contained 0.05% (v/v) trifluoroacetic acid (TFA) in HPLC grade water, and buffer B contained 0.05% (v/v) TFA and 99.5% (v/v) acetonitrile (ACN). To obtain isolated ginsentides, the resulting fractions were further purified on a semi-preparative C18 Vydac column (250 × 10 mm), using the same gradient, at a flow rate of 3 mL/min.
Sequence determination. 20 µg of isolated and purified ginsentides were dissolved in 50 µL 100 mM ammonium bicarbonate buffer (pH 7.8) containing 50% ethanol. S-reduction was performed by adding 20 mM dithiothreitol (DTT) and incubating for 2 h at 37 °C. S-reduced ginsentides were S-alkylated with N-ethylmaleimide (NEM), followed by enzymatic digestion with trypsin or chymotrypsin at 37 °C. Peptide
Chitin binding assay. Ginsentide TP1 was incubated with chitin beads (New England Biolabs, Ipswich, MA, US) in chitin binding buffer (10 mM phosphate; pH 7.4) at room temperature for 1 h.
At each time point up to 1 h, the beads were centrifuged at 12,000 g for 1 min and the absorbance of the supernatant was read at 214 nm to assess binding. Samples were further analyzed by MALDI-TOF MS.
Stability assays. Heat Stability. 10 μg ginsentide TP1 was dissolved in 100 μL distilled water and incubated at 100 °C for 30, 60, 90, and 120 min. As a control, a replica was performed with incubation at room temperature. The RP-HPLC profiles of the heated and control samples were compared to evaluate their stability.
Enzymatic Stability. 10 μg ginsentide TP1 was dissolved in 100 μL 100 mM ammonium bicarbonate buffer (pH 7.8) with 1 μL 0.5 μg/μL trypsin or chymotrypsin and incubated at 37 °C for 3 h. The stability assay against pepsin was performed with ginsentide TP1 dissolved in 100 mM sodium citrate buffer (pH 2.5). A replica without enzymes served as the control. The RP-HPLC profiles of the treated and control samples were compared to evaluate their stability.
Acid Stability. 10 μg ginsentide TP1 was dissolved in 100 μL 0.2 M HCl and incubated at 37 °C for 2 h. A control replica was performed without the addition of acid. The RP-HPLC profiles of the treated and control samples were compared to evaluate their stability.
Human serum-mediated stability. 0.1 mM ginsentide TP1 was incubated in 25% human serum in Dulbecco's Modified Eagle Medium (DMEM) (GE Healthcare Life Sciences, UK) containing 1 mM sodium pyruvate and 4 mM L-glutamine, without phenol red, at 37 °C for 48 h. Synthetic peptide DALK (sequence: KRPPGFSPL) was used as a positive control. After incubation, precipitation was performed by adding 100% ethanol, followed by centrifugation at 18,000 g for 15 min at 4 °C. Supernatants were collected in a fresh tube and monitored using analytical RP-HPLC (Shimadzu Shim-pack XR-C8 column, Japan; 3.0 × 50 mm, 2.2 µm; flow rate 0.3 mL/min), with a 30 min linear gradient of 0-50% buffer B (0.05% TFA (v/v) in 99.5% ACN). Individual peaks were collected and identified by MALDI-TOF MS.
Cell culture. Huh7 (human liver carcinoma cells) and human-derived endothelial cells (HUVEC-CS) were kindly provided by Professor Kathy Qian Luo (Nanyang Technological University, Singapore). THP-1 cells were cultured in DMEM or RPMI medium (Thermo Scientific HyClone) supplemented with 10% fetal bovine serum, 100 U/mL of penicillin and streptomycin and grown in a 5% CO 2 humidified incubator at 37 °C.