Introduction

Orb-weaving spiders manufacture up to seven different silks, e.g. dragline silk derived from major ampullate silk proteins (spidroins, MaSp) and flagelliform silk derived from flagelliform spidroins (FlSp). The various spidroins share a common architecture - a large core repetitive region capped by globular N- and C-terminal domains (NT and CT)1. The divergent and large aggregation-prone repetitive regions of the spidroins determine the mechanical properties of the respective spider silks, while the terminal domains regulate silk fiber formation2,3. Despite their high aggregation propensity the spidroins can be stored at extremely high concentrations (30–50% w/v) in the spider silk gland, solubilized by the NT domain1,4.

The NT dimerizes upon a drop in pH, which is crucial for silk fiber formation1,5. To ensure solubility also at low pH and widen the applicability of NT as a solubility-enhancing fusion partner, a charge-reversed mutant has been designed (referred to as NT*MaSp)6. The previously reported NT*MaSp tag is derived from the NT domain of Euprosthenops australis MaSp1 and folds as a five-helix bundle6,7. NT*MaSp is a pH-insensitive constitutive monomer, highly stable and extremely soluble, and has been successfully applied for efficient production and purification of, among others, lung surfactant protein analogs, cholecystokinin-58, human antimicrobial cathelicidin and a designed β-sheet protein6,8.

Aggregation-prone proteins and peptides are associated with several neurodegenerative disorders, e.g. Alzheimer’s disease (AD), the most prevalent form of dementia9,10. These proteins/peptides often exhibit high β-sheet propensity, which makes them prone to aggregate and form insoluble amyloid fibrils11. These intrinsic properties of amyloid-forming proteins make high-yield biochemical production challenging, yet the availability of pure protein samples is crucial for studying protein self-assembly and its associated neurotoxicity in vitro and in vivo. This is probably one important reason why, despite immense efforts, the exact mechanisms of Aβ self-assembly are still unknown9,10,11. Recent advances have, however, revealed new insights into the nucleation mechanism of Aβ in vitro12,13, structural details of the fibril morphology14 and biological mechanisms implicated in the AD etiopathology15,16. These experiments typically require access to very pure and homogeneous Aβ peptides, as small impurities or preformed seeds have a great impact on the aggregation behavior17. In particular for structural studies of amyloid fibrils, but also for certain in vivo experiments, the availability of large quantities of isotope-labeled Aβ is essential.

Studies of Aβ aggregation in vitro have often been conducted with synthetically produced peptides18,19. Synthetic preparations have several drawbacks including batch-to-batch variations, intrinsic impurities and relatively high cost, especially for isotope labeling. As a consequence, several recombinant expression systems have been established. These production protocols either result in peptides with an initiating non-native methionine residue20,21 or are based on solubility tags that require proteolytic cleavage to obtain the native human Aβ sequence22,23,24. The main disadvantage of having methionine as the first residue is that it might affect processes such as posttranslational modifications, e.g. pyroGlu formation25,26,27, and metal ion binding since the metal ion-binding site is located in the N-terminus28,29,30. Here we describe a useful solubility tag for production of aggregation-prone proteins and peptides, and demonstrate that this tool enables very efficient production of native and isotope-labeled Aβ peptides.

Results and Discussion

Evolutionary relationships of NT and repetitive regions of different spidroins

A phylogenetic tree based on sequence alignment of 67 NTs found in GenBank (Supplementary Fig. S1) reveals evolutionary relationships between NTs and their respective repetitive regions (Fig. 1A). The NT domains cluster according to silk type, as previously reported7. Hence, the NTs of different spidroin types, which are defined by the nature of their respective repetitive regions, have been conserved through evolution of different spider species. Structural characteristics of the repetitive regions appear to co-vary with the evolution of NTs, e.g. for the tubuliform (TuSp) and aciniform (AcSp) NTs, which are evolutionarily close, the corresponding repetitive regions stand out by forming globular folded domains31,32 (Fig. 1A). NT from FlSp is linked to a unique repetitive region that contains several embedded spacers (each 27 residues), which are predicted to form β-hairpins33 (Fig. 1A). NTFlSp exhibits a distant evolutionary relationship (<35% sequence identity) to the previously reported NTMaSp (ref. 6) (Fig. 1B), and MaSps contain repetitive regions with predicted α-helical and random coil structures34,35,36,37,38. We speculate that different NTs may have evolved to facilitate optimal solubility of their respective repeat region in the silk gland during storage conditions, where pH is neutral and NT is monomeric4,5. Irrespective of any potential evolutionary co-variation between NTs and the repetitive regions, we aimed to explore whether NTFlSp could work in protein expression in an equivalent way to the previously investigated and distantly related NTMaSp (ref. 6).

Figure 1

(A) Evolutionary relationships of the NTs of different spidroins. The analysis involved 67 NT amino acid sequences (Supplementary Fig. S1), revealing that spidroins from different spider species cluster according to the silk type in the phylogenetic tree. The typical repetitive regions of the respective spidroins and their known structures or main secondary structure propensities are displayed. (B) Sequence alignment of NTFlSp and NTMaSp where strictly conserved residues are colored in blue. The residues marked in orange display the mutated sites in NT*. (C) Schematic representation and structure of the NT*-Aβ fusion protein where the arrows indicate the TEV protease cleavage site. The mutated D and K residues are marked in yellow in the NT structure (pdb 4FBS).

Design of the novel solubility tag NT*FlSp

To prevent dimerization of NTFlSp at low pH we introduced the D40K and K65D mutations6 in NTFlSp from Nephila clavipes (Nc) (Fig. 1B) (numbering as described previously6, so that the mutations correspond to positions 36 and 60 in Fig. 1B). NT*FlSp has a larger number of charged residues (25 vs. 11) and a stronger net charge (−7 vs. −5) compared to NT*MaSp, which potentially enhances its solubility properties. In contrast to NT*MaSp, NT*FlSp has no tryptophan, whose absorbance at 280 nm would mask the intrinsically low absorbance at 280 nm of the target peptide Aβ. Thus, for size exclusion chromatography (SEC), where detection relies on the protein absorbance at 280 nm, NT*FlSp enables clearly separated intensity peaks.

Efficient expression and purification of Aβ monomers using NT*FlSp

We designed the fusion protein NT*FlSp-Aβ by fusing the genes encoding the solubility tag NT*FlSp and Aβ with a TEV recognition site in between (Fig. 1C). An overview of the expression and purification scheme is given in Fig. 2. The fusion protein was expressed in BL21(DE3) E. coli cells grown in rich or minimal medium, dissolved in 8 M urea after cell lysis and purified using immobilized metal ion affinity chromatography (IMAC). Urea was added as denaturant to increase binding to the IMAC column. For optimal cleavage of the fusion protein by TEV, a buffer exchange was conducted, either by overnight dialysis or by column chromatography. TEV cleavage can alternatively be conducted during buffer dialysis to speed up the purification, yet a short dialysis step to decrease the urea concentration below 2 M is recommended before the addition of TEV protease. Finally, the solution was applied to SEC with a Superdex 30 column, whereby Aβ monomers were isolated.

Figure 2

(A) Schematic expression and purification protocol, including typical times for performance. Yields of NT*-Aβ42/40 are derived from 1 L expression cultures and extrapolated from purification from 100 and 500 mL, resulting in very similar values. (B) SDS-PAGE gel, with protein ladder (L), void (V), before (I) and after SEC yielding pure Aβ. An uncropped full-length gel is presented in Supplementary Fig. S2. (C) SEC diagram showing separation of TEV, NT* and monomeric Aβ.

The expression and purification protocol presented here results in highly pure Aβ40 and Aβ42 monomers within 2.5–4.5 days, depending on buffer exchange and cleavage method. The yields in rich and minimal medium are listed in Table 1. The example shown in Fig. 2 represents purifications from 100 and 500 mL culture medium, yielding very similar amounts of 37 ± 7 mg of pure Aβ42 monomers if extrapolated to one-liter culture. Notably, the present scheme gives by far the highest yields, both in rich and minimal medium, compared to other reported protocols (Table 2).

Table 1 Average yields of fusion proteins and monomeric Aβ peptides in rich (LB) and minimal (M9) medium in mg per liter culture.
Table 2 Yields of Aβ40 and Aβ42 variants reported in literature and herein.

The purified peptides were investigated using mass spectrometry, confirming the expected masses for Aβ, here shown for Aβ40 (Fig. 3A). Using 13C-15N-double-labeled Aβ40 and Aβ42 we performed nuclear magnetic resonance (NMR) experiments to confirm the purity and structural state of the purified peptides. We recorded 1H-15N-HSQC experiments (Fig. 3B and Supplementary Fig. S3) where the chemical shifts of the cross-peaks coincide with previous assignments reported in the literature, revealing a monomeric, predominantly unstructured conformation of the purified Aβ peptides39,40. To analyze the secondary structure of monomeric Aβ42 we applied circular dichroism (CD) spectroscopy. The initial CD spectra (Fig. 3C) indicated a predominantly unstructured conformation as previously reported20,21,41,42. Taken together, these experiments confirm that our method results in monomeric Aβ40 and Aβ42 peptides.

Figure 3

(A) Mass spectrum of Aβ40 showing a pure peptide with an average mass of 4331 Da. The inset shows a zoom of the 5+ charged ion. (B) 1H-15N-HSQC spectrum of 15 μM Aβ42 recorded at 5 °C, revealing monomeric peptide. (C) Aggregation kinetics of 10 μM Aβ42 at 37 °C under continuous stirring recorded by CD spectroscopy. The spectra exhibit a structural transition from a predominantly unstructured state to a β-structure. The lower panel shows the time dependence of the CD extremes at 198 nm (squares) and 218 nm (circles), with the same color code as used for the CD spectra. (D) Aggregation kinetics of 50 μM 4FF-Aβ42 at 25 °C monitored by 19F-NMR spectra of the signals around −116 ppm, exhibiting attenuation of 4FF-signals. The color code represents the same time points in both panels.

Production of 4-fluoro-Phe-labeled Aβ peptides

The present approach also opens new opportunities for NMR studies that require more complex isotope labeling approaches associated with reduced protein yields. For example, we have used the NT*FlSp tag to produce monomeric Aβ42 incorporating 4-fluoro-Phe (4FF-Aβ42) in milligram yields. The expression was performed similarly to that described above, but glyphosate and DL-tyrosine were supplemented to bacterial cultures at an OD600nm value of 0.6. Further, DL-4-fluorophenylalanine was added when the OD600nm value reached 0.8, and expression was then induced. The 1H-15N-HSQC spectrum of 15N-labeled 4FF-Aβ42 again revealed a monomeric peptide (Supplementary Fig. S3).

Aggregation kinetics of native and isotope-labeled Aβ42

To ensure that the isolated peptides behave as expected, we investigated the aggregation kinetics starting from monomeric Aβ peptides. Recording CD signals under continuous stirring at 37 °C of 10 μM Aβ42, a structural conversion from an unstructured to a β-structured conformation was observed (Fig. 3C), where the isodichroic point at 208 nm indicates a two-state transition. Furthermore, we used 50 μM 4FF-Aβ42 for real-time aggregation 19F-NMR studies at 25 °C, revealing a decrease of 4FF-signals over time (Fig. 3D). The signal loss can be fitted to a sigmoidal decline, with an aggregation half time of 258 ± 5 min under the conditions used.
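To illustrate how an aggregation half time can be extracted from a sigmoidal signal decline, the following minimal sketch linearizes a normalized sigmoid, S(t) = 1/(1 + exp(k(t − t_half))), with a logit transform and recovers t_half by ordinary least squares. The functional form, the rate constant k and the synthetic time points are assumptions made for illustration; only the half time of 258 min is taken from the text, and this is not necessarily the fitting procedure used for Fig. 3D.

```python
import math

def sigmoid_decline(t, t_half, k):
    """Normalized signal that falls from 1 to 0 with midpoint t_half."""
    return 1.0 / (1.0 + math.exp(k * (t - t_half)))

def fit_half_time(times, signal):
    """Estimate (t_half, k) from normalized data with 0 < signal < 1.

    Uses ln(1/S - 1) = k * (t - t_half), which is linear in t,
    so a straight-line least-squares fit is sufficient.
    """
    y = [math.log(1.0 / s - 1.0) for s in signal]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times, y))
    k = sxy / sxx
    intercept = mean_y - k * mean_t
    return -intercept / k, k

# Synthetic, noise-free data (hypothetical k and sampling times, in minutes).
times = [float(t) for t in range(60, 480, 30)]
signal = [sigmoid_decline(t, 258.0, 0.03) for t in times]
t_half, k = fit_half_time(times, signal)
print(f"t_half = {t_half:.1f} min, k = {k:.3f} per min")
```

With noisy experimental data, a nonlinear least-squares fit of the sigmoid itself is usually preferable, because the logit transform amplifies noise near the baselines.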

Alternatively, Aβ aggregation kinetics can be monitored using the fluorescent dye thioflavin T (ThT), for a detailed elucidation of the nucleation mechanism. Here, we conducted ThT experiments on Aβ42 in 20 mM sodium phosphate buffer, pH 8.0, at 37 °C under quiescent conditions at different initial Aβ42 monomer concentrations, [Aβ] (Fig. 4). The final fluorescence intensity exhibits a linear dependence on the initial monomer concentration (Fig. 4D), suggesting that the total amount of initially monomeric peptides forms ThT-active fibril material, as previously shown for Aβ (refs. 12,13,42). The aggregation half times of Aβ42 used here exhibit a simple power-law relation, $\tau_{1/2} \propto [\mathrm{A\beta}]^{\gamma}$, with γ = −1.0 ± 0.1, corresponding to the slope in a double logarithmic plot (Fig. 4C). This value is in the same range as found for AβM42 with an initial methionine, where γ = −1.3 was reported12. For γ = −1.0, a multi-step secondary nucleation model describes the observed aggregation traces better than a single-step secondary nucleation model (Fig. 4A,B). The multi-step model additionally includes saturation of secondary nucleation and was previously shown to be applicable for the shorter Aβ40 and AβM40 variants13,29,42 and for AβM42 at pH 7.4 (ref. 43), which all exhibit higher γ-values, but also describes well the kinetics of AβM42 at pH 8.0 (ref. 44). Hence, these results confirm that the native and isotope-labeled peptides obtained herein are highly pure and in a monomeric state, which is essential for accurate and reproducible aggregation kinetics experiments.
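The scaling exponent γ is the slope of a straight line in a double logarithmic plot of half time against initial monomer concentration. The sketch below shows this calculation with a plain least-squares fit. The concentrations and half times are hypothetical values constructed to follow a power law with exponent −1.0 exactly; they are not the measured data of Fig. 4C.

```python
import math

def scaling_exponent(concentrations, half_times):
    """Least-squares slope of log(half_time) versus log(concentration)."""
    x = [math.log(c) for c in concentrations]
    y = [math.log(t) for t in half_times]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical half times (hours) obeying tau = 6.0 * c**(-1.0).
conc_uM = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
tau_h = [6.0 / c for c in conc_uM]
gamma = scaling_exponent(conc_uM, tau_h)
print(f"gamma = {gamma:.2f}")
```

Because only the slope is used, the units chosen for concentration and time do not affect γ.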

Figure 4

(A,B) Aggregation kinetics of Aβ42 at different concentrations from 1.0 (violet) to 9.0 μM (red) recorded by ThT fluorescence experiments fitted with a multi-step (A) and simple secondary nucleation model (B). The kinetic traces fit best to the multi-step secondary nucleation model, reflected by a lower χ2 value. (C) Aggregation half times, 𝜏1/2, plotted against the initial peptide concentration, [Aβ], exhibit a γ coefficient of γ = −1.0 ± 0.1. (D) The final ThT intensity of the normalized aggregation traces in (A) exhibits a linear relation to [Aβ].

Conclusions

Taken together, we have developed a biomimetic tool that provides facile, fast and inexpensive production of pure and monomeric Aβ40 and Aβ42 peptides. The high yield obtained also in minimal medium enables efficient generation of isotope-labeled Aβ peptides. Peptides produced by our protocol recapitulate the behavior of Aβ peptides obtained by other means, which indicates the applicability of NT*FlSp for generating functional Aβ peptides. The NT*FlSp tag holds great potential, also when compared to NT*MaSp (ref. 6), for efficient production of medically relevant aggregation-prone peptides and proteins. This is important since the majority of new pharmaceuticals are biologics, and facile protocols for efficient production of proteins that are difficult to produce are needed.

Methods

Expression and purification protocol

The synthetic gene coding for NTFlSp from Nephila clavipes with the D40K and K65D mutations (NT*FlSp) was ordered from GenScript (GenScript Biotech, Netherlands). The NT*FlSp gene was ligated into a pT7 plasmid containing the TEV recognition site (TRS)-Aβ40/Aβ42 as described previously6. The plasmids were transformed into chemically competent E. coli BL21(DE3) cells and expressed as described previously45. In short, 1 mL overnight culture was inoculated into 100 mL LB medium (1/100), or 100 mL M9 overnight culture was inoculated into 1 L M9 minimal medium (10/100), with 70 mg/L kanamycin. Cells were grown at 30 °C and 120 r.p.m. to an OD600nm of around 0.8–0.9, at which point the temperature was lowered to 20 °C, 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) was added, and the cells were incubated overnight. To isolate the cells from the medium, the bacterial culture was centrifuged at 5,000 × g for 20 minutes at 4 °C, and the cell pellets were resuspended in 40 mL 20 mM Tris-HCl pH 8.0, split into two 50 mL Falcon tubes and stored at −20 °C. The frozen cells were thawed and urea was added to a concentration of 8 M. The cells were sonicated to obtain a clear solution. The lysate was then loaded on 2 × 5 mL Ni-NTA columns (GE Healthcare). Unbound proteins were washed away with 15 mM imidazole in 20 mM Tris-HCl, pH 8 and 8 M urea. The fusion protein was eluted with 300 mM imidazole in 20 mM Tris-HCl, pH 8 and 8 M urea. The fractions containing the fusion protein were pooled and then dialyzed overnight against 20 mM Tris-HCl pH 8.0 at 4 °C. To remove the His6-NT*FlSp part, the fusion protein was cleaved with TEV protease (1:20–1:30, enzyme to substrate, w/w) at 4 °C overnight in 20 mM Tris-HCl pH 8, 0.5 mM EDTA and 1 mM DTT. After TEV cleavage, the sample was dissolved in 15 mL 7 M guanidine-HCl and separated on a Superdex 30 26/600PG size exclusion column (Fig. 2C).
The correct size of Aβ, NT*FlSp and TEV was confirmed by SDS-PAGE in a 4–20% polyacrylamide gel, stained with Coomassie brilliant blue dye (Fig. 2B). For expression of 15N- and 13C-labelled NT*FlSp-Aβ, the same procedure was used except that M9 minimal medium containing 15NH4Cl (1 g/L M9) and 13C-glucose (4 g/L M9) was used. The plasmid pRK793 for TEV expression was obtained from Addgene (addgene.org, deposited by David S. Waugh), and TEV protease was expressed as described above and purified as described previously46.
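The enzyme-to-substrate ratio of 1:20–1:30 (w/w) translates directly into the mass of TEV protease required for a given amount of fusion protein. The following small sketch performs this arithmetic; the fusion protein mass of 150 mg is a hypothetical input chosen for illustration, not a value from this work.

```python
def tev_mass_range_mg(substrate_mg, low_ratio=20.0, high_ratio=30.0):
    """Return (min, max) TEV mass in mg for a 1:high_ratio to 1:low_ratio
    enzyme-to-substrate ratio (w/w)."""
    if substrate_mg <= 0:
        raise ValueError("substrate mass must be positive")
    return substrate_mg / high_ratio, substrate_mg / low_ratio

# Hypothetical amount of purified fusion protein after IMAC and dialysis.
fusion_protein_mg = 150.0
tev_min_mg, tev_max_mg = tev_mass_range_mg(fusion_protein_mg)
print(f"Add {tev_min_mg:.1f} to {tev_max_mg:.1f} mg TEV protease")
```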

Determination of yields

Both NT*FlSp-Aβ40 and NT*FlSp-Aβ42 were transformed into BL21(DE3) E. coli cells and spread onto an agar plate with kanamycin. Five starter cultures each of LB and M9 were inoculated with individual colonies and incubated at 31 °C overnight. The expression was performed as described above in 100 mL LB and M9 media. 100 μL of each culture was taken before and after induction, lyophilized, dissolved in SDS loading buffer and boiled for 10 minutes at 96 °C. 1 μL of each induced sample and 1 μL of uninduced sample from each condition were loaded on a 4–20% Mini-PROTEAN TGX gel (Bio-Rad) and blotted on a PVDF membrane (GE Healthcare). After blotting, the membrane was blocked with 5% w/v non-fat dry milk/PBS for 1 h, followed by incubation with 6E10 primary antibody in 5% w/v non-fat dry milk, 0.1% Tween/PBS overnight at 4 °C. The membranes were washed three times with 0.1% Tween/PBS, and ECL anti-mouse secondary antibodies in 5% w/v non-fat dry milk and 0.1% Tween/PBS were added for 1 h at room temperature. Enhanced chemiluminescence detection reagent (GE Healthcare) was added and images were acquired using an AI600 imaging system (GE Healthcare). The concentration of each sample was calculated by integration of the peaks from IMAC (fusion protein) and SEC (monomeric Aβ) with an extinction coefficient ε280 = 2,980 M−1 cm−1 for the fusion protein and 1,424 M−1 cm−1 for Aβ. Western blot intensities were analyzed with ImageJ software47, and the average and standard deviation from 5 replicates were calculated using yields from one full purification of each condition. Values are listed in Table 1.
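The conversion from an integrated A280 peak to a peptide yield follows from the Beer-Lambert law, c = A/(ε·l). The sketch below integrates a chromatogram peak with the trapezoid rule and converts the area to mass. The extinction coefficient of Aβ and the average mass of Aβ40 (4331 Da) are taken from the text; the chromatogram points, the 1 cm optical path length and the 100 mL culture volume are hypothetical inputs, and the actual path length depends on the detector flow cell.

```python
EPSILON_ABETA = 1424.0   # M^-1 cm^-1, extinction coefficient at 280 nm
MASS_ABETA40 = 4331.0    # Da (g/mol), average mass from mass spectrometry

def peak_amount_mol(volumes_ml, absorbance_au, epsilon, path_cm):
    """Integrate an A280 peak (trapezoid rule) and convert it to moles."""
    area = 0.0  # absorbance units x mL
    for i in range(1, len(volumes_ml)):
        dv = volumes_ml[i] - volumes_ml[i - 1]
        area += 0.5 * (absorbance_au[i] + absorbance_au[i - 1]) * dv
    # A / (epsilon * l) is mol/L; multiplying by mL gives 1e-3 mol.
    return area * 1e-3 / (epsilon * path_cm)

def yield_mg_per_liter(amount_mol, molar_mass, culture_ml):
    """Peptide mass in mg, extrapolated to one liter of culture."""
    mass_mg = amount_mol * molar_mass * 1e3
    return mass_mg * 1000.0 / culture_ml

# Hypothetical, baseline-corrected SEC peak (elution volume in mL, A280 in AU).
volumes = [200.0, 202.0, 204.0, 206.0, 208.0]
a280 = [0.0, 0.05, 0.10, 0.05, 0.0]

amount = peak_amount_mol(volumes, a280, EPSILON_ABETA, path_cm=1.0)
mass_mg = amount * MASS_ABETA40 * 1e3
per_liter = yield_mg_per_liter(amount, MASS_ABETA40, culture_ml=100.0)
print(f"{mass_mg:.2f} mg in peak, {per_liter:.1f} mg per liter culture")
```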

Expression protocol of 4FF-Aβ42

The plasmid pT7His6NT*FlSp-TEV recognition site-Aβ42 was transformed into chemically competent E. coli BL21(DE3) cells. Colonies were inoculated into 10 mL LB medium with 70 mg/L kanamycin and grown at 30 °C and 200 r.p.m. to OD600nm > 1.0. 0.5 mL of this day culture was inoculated into 25 mL M9 medium with 70 mg/L kanamycin and grown at 30 °C and 200 r.p.m. overnight. 10 mL overnight culture was inoculated into 1 L M9 medium and cells were further grown at 30 °C. Uniform labeling with 4-fluorophenylalanine was achieved by adding 1 g/L glyphosate and 75 mg/L DL-tyrosine to shaking bacterial cultures at 30 °C that had reached an OD600nm of 0.6. Once the cells reached an OD600nm of 0.8, 30 mg/L DL-4-fluorophenylalanine was added. The incubation temperature was lowered to 20 °C and expression was induced by the addition of IPTG to 0.1 mM. The cells were incubated overnight and harvested by centrifugation at 7,000 × g and 4 °C.
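For cultures other than 1 L, the supplement amounts scale linearly with culture volume. The following sketch tabulates the per-liter concentrations stated above and scales them; the 0.5 L culture volume is a hypothetical example.

```python
# Per-liter amounts (mg/L) taken from the protocol text.
SUPPLEMENTS_MG_PER_L = {
    "glyphosate": 1000.0,                # 1 g/L, added at OD600nm of 0.6
    "DL-tyrosine": 75.0,                 # added at OD600nm of 0.6
    "DL-4-fluorophenylalanine": 30.0,    # added at OD600nm of 0.8
}

def supplement_masses_mg(culture_volume_l):
    """Mass of each supplement (mg) for the given culture volume (L)."""
    if culture_volume_l <= 0:
        raise ValueError("culture volume must be positive")
    return {name: conc * culture_volume_l
            for name, conc in SUPPLEMENTS_MG_PER_L.items()}

# Hypothetical 0.5 L culture.
masses = supplement_masses_mg(0.5)
for name, mg in masses.items():
    print(f"{name}: {mg:.1f} mg")
```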

Evolutionary relationships of the NT domains of different spidroins

The evolutionary history of the NT domains from different spidroins was inferred by the Neighbor-Joining method with the Poisson correction. Evolutionary analyses were conducted in MEGA7 (ref. 48). The analysis involved 67 amino acid sequences. In the spider silk gland (liquid protein), the repetitive region of MaSp, consisting of GGX, polyA, GX and GPGQQ motifs, is disordered and partially helical34,35,36,37,38, and MiSp and FlSp share identical motifs33. However, MiSp contains 127-aa spacers, which adopt α-helical conformation, whereas the 27-aa spacers in FlSp are predicted to fold into β-hairpins33. The large repetitive domains of AcSp and TuSp adopt α-helical conformation31,32, and the repetitive domains of PySp are also predicted to adopt α-helical conformation by PSIPRED v3.3 (http://bioinf.cs.ucl.ac.uk/psipred/).
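The Poisson correction converts the observed fraction of differing residues between two aligned sequences, p, into an estimated number of substitutions per site, d = −ln(1 − p), which then serves as input for Neighbor-Joining. The sketch below computes this distance for one pair. The two short sequences are invented toy strings, not spidroin NT sequences, and the pairwise removal of gap columns is an assumption about gap handling rather than the setting used in this analysis.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined for completely different sequences")
    return -math.log(1.0 - p)

# Toy aligned sequences (hypothetical).
seq_1 = "MKTLLAGDSA"
seq_2 = "MKSLLA-DTA"
p = p_distance(seq_1, seq_2)
d = poisson_distance(seq_1, seq_2)
print(f"p = {p:.3f}, identity = {100 * (1 - p):.1f}%, d = {d:.3f}")
```

The quantity 1 − p is the pairwise sequence identity, so the same function also gives the identity values used to compare NT domains.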

Mass spectrometry

Purified Aβ40 was diluted 1:10 in H2O/acetonitrile/formic acid (70:30:0.2) and directly infused into a Waters LCT time-of-flight mass spectrometer (MS Vision, NL) equipped with an offline nanospray source using borosilicate capillaries (Thermo Scientific). The capillary voltage was 1.5 kV and the cone voltage was 200 V. Spectra were acquired between m/z 500 and 4000 and the mass scale was calibrated with cesium iodide. Data were analyzed using MassLynx 4.1 (Waters).
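For a peptide of mass M carrying z protons, the expected signal position in positive-ion mode is m/z = (M + z·m_p)/z, where m_p is the proton mass. The sketch below evaluates this for the average mass of 4331 Da reported for Aβ40 and lists which charge states would fall inside the acquired window of m/z 500 to 4000. Which of these charge states are actually populated depends on the experiment and is not predicted here.

```python
PROTON_MASS = 1.007276  # Da

def mz(mass_da, charge):
    """m/z of a species carrying `charge` additional protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass_da + charge * PROTON_MASS) / charge

AVERAGE_MASS_ABETA40 = 4331.0   # Da, from the mass spectrum
MZ_MIN, MZ_MAX = 500.0, 4000.0  # acquisition window

charges_in_window = [z for z in range(1, 15)
                     if MZ_MIN <= mz(AVERAGE_MASS_ABETA40, z) <= MZ_MAX]
for z in charges_in_window:
    print(f"{z}+ : m/z {mz(AVERAGE_MASS_ABETA40, z):.2f}")
```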

Nuclear magnetic resonance (NMR)

1H-15N HSQC spectra were recorded on a 500 MHz or 700 MHz Bruker Avance spectrometer equipped with cryogenic probes. The HSQC spectrum of Aβ40 was recorded at 500 MHz at 8 °C using 75 μM peptide concentration in 16 mM sodium phosphate buffer, pH 7.4, with 0.02% NaN3 and 0.2 mM EDTA. For Aβ42 the peptide concentration was 15 μM in 20 mM sodium phosphate buffer, pH 6.8, recorded at 5 °C and 700 MHz. The spectra were recorded using 2048 × 128 complex points and 32 scans per transient. For Aβ42 we recorded the HSQC at 15 μM directly after the SEC purification, ensuring the monomeric state of the peptide.

19F-NMR experiments were recorded using 50 μM 4FF-Aβ42 in 20 mM sodium phosphate buffer, pH 7.4, with 0.03% NaN3 and 1 mM EDTA at 25 °C and 565 MHz. Each 19F spectrum was acquired with 512 transients and a 1.0 s pulse delay between transients. A line broadening of 1.0 Hz was used to process the final spectrum. The 1H-15N HSQC spectrum of 15 μM 4FF-Aβ42 in 20 mM sodium phosphate buffer, pH 7.4, with 0.02% NaN3 and 0.2 mM EDTA, was recorded at 4 °C on a 600 MHz Bruker Avance Neo spectrometer equipped with a cryogenic probe.

Circular dichroism (CD)

CD measurements of 10 μM Aβ42 in 20 mM sodium phosphate buffer, pH 8.0, at 37 °C were performed in a quartz glass Suprasil 10 × 4 mm CD cuvette (Hellma Analytics) with an optical path length of 4 mm, using a Chirascan CD spectrometer (Applied Photophysics). A resolution of 1.0 nm and a bandwidth of 1 nm were chosen for the aggregation kinetics experiments42. During the entire measurement the sample was continuously stirred at around 1200 r.p.m., and a new CD spectrum was recorded every 3 min to follow the aggregation kinetics.

Thioflavin T (ThT) fluorescence kinetics experiments

For ThT aggregation kinetics experiments 1 to 9 μM monomeric Aβ42 was used, which was obtained after SEC purification45. ThT fluorescence was measured as described previously45 using 96-well microplates, where each well contained 80 μL sample solution with 10 μM ThT.

Analysis of ThT aggregation kinetics

Aggregation traces were first analyzed using a fit to a sigmoidal function, revealing the aggregation half time, τ1/2, as described previously29,42,45. Subsequently, the aggregation traces were normalized and averaged over six replicates. The averaged aggregation half times are related to the initial monomer concentration, [Aβ], by $\tau_{1/2} \propto [\mathrm{A\beta}]^{\gamma}$, where γ reflects the slope in a double-logarithmic plot (Fig. 4C). Further, we applied a nucleation model including primary and secondary nucleation in addition to fibril-end elongation12,49,50. In order to account for saturation of secondary nucleation, an equilibrium constant (Michaelis constant) KM can be introduced, resulting in a multi-step secondary nucleation model13. The kinetic equations for the time dependence of the fibril mass fractions for the two models can be found in refs. 12,13,29,45. The models were applied to describe the kinetic traces using a global fit analysis12,13. The kinetic fitting parameters are listed in Supplementary Table S1.
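To make the difference between the two models concrete, the sketch below integrates a pair of moment equations of the kind commonly used for such nucleation models: dP/dt = k_n·m^n_c + k_2·m^n_2·M/(1 + m^n_2/K_M) and dM/dt = 2·k_+·m·P, where m is the free monomer concentration, P the fibril number concentration and M the fibril mass concentration. Omitting the K_M term gives the simple secondary nucleation model. These equations are stated here as an assumption; the exact expressions used in this work are those in the cited references. All rate constants, reaction orders, the K_M value and the 3 μM starting concentration are hypothetical, and forward Euler integration stands in for the global fitting procedure.

```python
def simulate_fibril_mass(m0, k_plus, k_n, k_2, n_c=2, n_2=2, K_M=None,
                         t_end=30000.0, dt=1.0):
    """Forward-Euler integration of the assumed moment equations.

    Returns a list of (time_s, fibril_mass_fraction) tuples.
    K_M=None switches off saturation of secondary nucleation.
    """
    P = 0.0  # fibril number concentration (M)
    M = 0.0  # fibril mass concentration (M, in monomer units)
    trace = [(0.0, 0.0)]
    for step in range(1, int(t_end / dt) + 1):
        m = m0 - M  # free monomer from mass conservation
        secondary = k_2 * m ** n_2
        if K_M is not None:
            secondary /= 1.0 + m ** n_2 / K_M  # saturation term
        dP = k_n * m ** n_c + secondary * M
        dM = 2.0 * k_plus * m * P
        P += dP * dt
        M += dM * dt
        trace.append((step * dt, M / m0))
    return trace

def half_time(trace):
    """First time point at which half of the monomer is in fibrils."""
    for t, fraction in trace:
        if fraction >= 0.5:
            return t
    return None

# Hypothetical parameters (SI units based on mol/L and seconds).
params = dict(m0=3e-6, k_plus=3e6, k_n=3e-4, k_2=1e4)
trace_simple = simulate_fibril_mass(**params)
trace_multi = simulate_fibril_mass(**params, K_M=9e-12)
print("half time, simple model:    ", half_time(trace_simple), "s")
print("half time, multi-step model:", half_time(trace_multi), "s")
```

In this sketch the saturation term lowers the secondary nucleation rate at every monomer concentration, so the multi-step trace reaches its half time later than the simple one for identical rate constants. In a global fit, the parameters are instead optimized simultaneously against traces at all monomer concentrations.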