Production and Purification of Filovirus Glycoproteins in Insect and Mammalian Cell Lines

Filoviruses are highly virulent pathogens capable of causing severe disease. The glycoproteins of filoviruses are the only virally expressed proteins on the virion surface and are required for receptor binding. As such, they are the main candidate vaccine antigens. Despite their virulence, most filoviruses are not comprehensively characterized, and relatively few commercially produced reagents are available for their study. Here, we describe two methods for production and purification of filovirus glycoproteins in insect and mammalian cell lines. The choice of expression vector, modifications to the sequence, troubleshooting of the purification method, and differences in glycosylation are all important considerations for successful expression of filovirus glycoproteins in cell lines. Given the scarcity of commercially available filovirus glycoproteins, we hope that our experience with the difficulties that can arise during purification will help other researchers produce and purify filovirus glycoproteins rapidly.

(variant Yambuku, isolate Mayinga) GP1,2 has been used as part of commercially available ELISAs for quantitation of antibody responses [17], and a soluble, modified EBOV Yambuku-Mayinga GP1,2 ectodomain has been used to determine the crystal structure of GP1,2 bound to NPC1 [18]. Ectodomains of both EBOV [8] and MARV [19] GP1,2 with mucin-like domain deletions have previously been produced for crystallization studies. Additionally, vaccine development efforts use different cell types as platforms to produce filovirus GP1,2, including mammalian cells [20][21][22] and insect cells [23]. In particular, the Sf9-baculovirus system has been used successfully to produce full-length Ebola glycoproteins for use in VLPs [23] and nanoparticle vaccines [24,25]. Other groups have also used poly-histidine (6xHis) tags to purify full-length EBOV glycoproteins [26]. Some insect-derived filovirus (predominantly EBOV) GP1,2s are commercially available, but most mammalian-derived filovirus GP1,2s are not.
To close gaps in filovirus GP1,2 availability, we report two systems for production and purification of filovirus GP1,2s in insect (Sf9) and mammalian (human HEK 293T) cell lines. Using these systems, we have successfully expressed EBOV, BDBV, TAFV, SUDV, MARV, and LLOV GP1,2s and developed techniques for rapid production of soluble variants thereof. We recently used these techniques to produce ebolavirus GP1,2s for analysis of their glycans [27]. Our expression systems may be broadly applicable to the production and affinity purification of other soluble proteins from insect and mammalian cells.
We also demonstrate here that the ebolavirus GP1,2s obtained with the two systems differ in important ways in their glycosylation: in the number of glycans, the types of glycan species, and the distribution of glycans at specific sites. We consider these data to have implications for downstream uses of the produced proteins, such as binding assays, in which glycosylation may affect function.

Results
Modification of filovirus GP1,2 sequences. We made several changes to the sequences of the filovirus glycoproteins of interest to aid expression and purification: mutating the furin cleavage site to prevent dissociation of GP1 and GP2 during purification of GP1,2 complexes; truncating GP2 upstream of the transmembrane domain to produce soluble GP1,2 complexes; mutating the editing site of the GP genes to ensure exclusive expression of GP1,2 complexes; and adding 6xHis tags to the C-termini of GP2 for purification of GP1,2 complexes (Fig. 2).

Figure legend (partial): pre-sGP is proteolytically cleaved by furin into mature, homodimeric secreted glycoprotein (sGP) and secreted Δ-peptide. Stuttering of the EBOV RNA-dependent RNA polymerase (L) at a 7U editing site within the GP gene infrequently results in the addition or subtraction of cognate A residues in nascent mRNAs, thereby disrupting the sGP open reading frame (ORF) and joining the sGP ORF upstream of the editing site with overlapping ORFs downstream. mRNAs with an 8A editing site direct expression of preGP, which is proteolytically cleaved by furin into subunits GP1 and GP2; these remain connected through a disulfide bond as a heterodimer (GP1,2). mRNAs with a 6A or 9A editing site direct expression of pre-ssGP, which is proteolytically matured into homodimeric secondary secreted glycoprotein (ssGP). The GP expression strategies of other ebolaviruses and of cuevaviruses follow the same pattern as that of EBOV. Marburgvirus GP genes, on the other hand, contain only a single ORF encoding GP1,2. Orange Y symbols indicate glycosylation.
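The effect of an editing-site insertion on the reading frame can be illustrated with a toy model. The sketch below assumes a short, made-up sequence (not the real GP gene) in which an ORF runs through a 7A stretch; adding one extra A, as polymerase stuttering does and as the 8A mutation described in Methods fixes permanently, shifts the downstream reading frame. The names `CODON_TABLE`, `translate_orf`, `transcript`, and the sequences are hypothetical illustrations.

```python
# Toy model of co-transcriptional editing at an A-run (hypothetical sequence,
# not the actual filovirus GP gene).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code, "*" = stop


def translate_orf(seq):
    """Translate from the first base until the first stop codon (or the end)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


UPSTREAM = "ATGGCC"        # start of the ORF
DOWNSTREAM = "TATGAGCTAA"  # sequence read in different frames after the A-run


def transcript(a_run_length):
    """Build an mRNA-sense toy sequence with an editing site of the given A-run length."""
    return UPSTREAM + "A" * a_run_length + DOWNSTREAM


unedited = translate_orf(transcript(7))  # 7A: stops shortly after the site
edited = translate_orf(transcript(8))    # 8A: +1 frame reads into the downstream ORF
print(unedited, edited)  # MAKKI MAKKNMS
```

In the real ebolavirus and cuevavirus GP genes, the unedited frame encodes sGP and the +1 frame encodes GP1,2; the toy only shows why a single extra A switches which downstream ORF is translated.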
SCIeNtIFIC REPORTS | 7: 15091 | DOI:10.1038/s41598-017-15416-3

Generation of baculoviruses containing modified filovirus GP1,2 genes. GP genes were synthesized with the modifications described above and cloned into pFastBac1 plasmids to generate bacmids for transfection. After transfection of the bacmids into Sf9 cells, GP1,2 expression was verified by western blot, and the resulting baculoviruses were plaque purified to ensure that clonal baculovirus was used for infections.
Purification of Sf9-produced GP1,2s using nickel columns. Because the GP1,2s produced by baculovirus-infected Sf9 cells lack a transmembrane region, all GP1,2s were released into the Sf9 culture medium. The medium was collected on day 3 post-infection, replaced, collected again on day 4, and pooled for purification. Each purification used 2 to 4 T150 flasks of Sf9 cells and gave final yields of 400 µg to 800 µg of GP1,2.
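Because the number of flasks and the final yield are reported only as ranges, the yield per flask can only be bracketed. The small helper below (a hypothetical name, `per_flask_bounds`) computes outer bounds, assuming that any flask count in the range could pair with any yield in the range.

```python
def per_flask_bounds(flask_range, yield_range_ug):
    """Outer bounds on GP1,2 yield per flask (µg) from reported ranges.

    Assumes flask count and total yield are not paired in the report, so the
    lowest bound pairs the smallest yield with the most flasks and the highest
    bound pairs the largest yield with the fewest flasks.
    """
    min_flasks, max_flasks = flask_range
    min_yield, max_yield = yield_range_ug
    return min_yield / max_flasks, max_yield / min_flasks


# Sf9 purification ranges from the text: 2-4 T150 flasks, 400-800 µg GP1,2.
low, high = per_flask_bounds((2, 4), (400, 800))
print(f"{low:.0f}-{high:.0f} µg per T150 flask")  # 100-400 µg per T150 flask
```

These are bounds derived from the reported ranges, not measured per-flask yields.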
Purified proteins were verified by western blot using antibodies against either the specific GP1,2 or, in the case of LLOV GP1,2, the His-tag. Purity was confirmed by periodic acid-Schiff (PAS) staining, which also showed that glycosylation was preserved, and by Colloidal Blue staining for detection of total protein. With this production method, the nickel column yielded GP1,2 of high purity (Fig. 3b,c, Supplementary Fig. S1). We also attempted protein production in a different insect cell line, High Five cells (ATCC CRL-10859), which have been reported [28] to produce greater amounts of protein upon baculovirus infection. In our small-scale experiments, High Five cells did not yield more protein than Sf9 cells, although this outcome is likely to be protein specific (data not shown).

Plasmid vector backbone affects expression level of proteins.
In our experiments, MARV/Musoke GP1,2 could not be expressed using pcDNA3.1+. The MARV/Musoke GP gene used was truncated at the codon for amino acid 636 (preGP numbering), i.e., upstream of the predicted transmembrane region [29]. This sequence was identical to the MARV/Musoke GP gene sequence cloned into the baculovirus that successfully produced MARV GP1,2 in the insect cell system. Likewise, MARV/Musoke GP genes encoding GP1,2 truncated to amino acid residues 1-648 or 1-644 were not expressed from the pcDNA3.1+ backbone (Table 1). Unlike commercially available MARV GP1,2 expression plasmids, which use GP genes codon-optimized for mammalian cells, all our modified sequences were based on the wild-type GP sequences. Transferring the modified MARV/Musoke GP gene from pcDNA3.1+ into the pCAGGs expression vector did not lead to expression of MARV GP1,2, even though unmodified full-length MARV GP was expressed from pCAGGs but not from pcDNA3.1+ (Table 1, Supplementary Fig. S2). However, MARV/Angola (1-648) GP1,2 was successfully expressed from pCAGGs (Table 1, Supplementary Fig. S2).
Mammalian HEK 293T culture conditions are incompatible with nickel column purification. After verifying filovirus GP1,2 expression from the plasmids, we transfected the plasmids into HEK 293T cells for purification and harvested the media on day 3 post-transfection. We initially used the HisTrap Excel nickel column for purification again. However, eluates from the column contained an additional contaminating band of approximately 80 kDa (Fig. 4b). This protein was glycosylated, as determined by PAS staining, but did not react with antibodies against GP1,2.
We attempted to remove the contaminant by several methods. Modifying the HisTrap Excel protocol by doubling the wash buffer volume, increasing the imidazole concentration in the wash buffer, or increasing the imidazole concentration in the sample did not eliminate the contaminant. We used 8 M urea to dissociate a possible protein-protein interaction with the contaminant; however, analysis on a native gel revealed that the contaminating protein was not interacting with the GP1,2s. Because of the size difference between the GP1,2s and the contaminant, we also attempted dialysis of the eluate through membranes with molecular weight cut-offs of 100 kDa, 300 kDa, or 1,000 kDa, and then size exclusion chromatography, but neither removed the contaminant. We next altered the purification method, first by using a His-streptavidin tag-enrichment kit with a streptavidin column and then by replacing the nickel column with a cobalt column. Both approaches have been reported to reduce binding of contaminating proteins owing to their lower affinity for the His-tag, but neither was effective here. The contaminating protein was identified by mass spectrometry as bovine serotransferrin, a component of the fetal bovine serum (FBS) in the HEK 293T cell culture medium.

Purification of HEK 293T cell-produced GP1,2s with anti-His affinity resin columns. Anti-His Affinity Resin (GenScript) was used for purification of filovirus GP1,2s to avoid contamination with bovine serotransferrin. T150 flasks of HEK 293T cells at 50-80% confluency were transfected with plasmids, and on day 3 post-transfection the media were collected for purification. Each purification on the Anti-His Affinity Resin gave an overall protein yield of 150 µg to 500 µg, depending on the GP1,2. Different GP genes resulted in different GP1,2 expression levels, with EBOV/Makona and EBOV/Yambuku genes expressing more GP1,2 than those of other filoviruses in both insect and mammalian cells (data not shown). Purified GP1,2s were again verified by western blot, and purity was confirmed by PAS and Colloidal Blue staining (Fig. 4c-e). Glycan analysis of proteins produced with this approach confirmed the uniformity of the individual GP1,2s, demonstrating consistent GP1,2 glycosylation over multiple batches [27].

Enzymatic digest of glycans from HEK 293T and Sf9 cell-produced GP1,2. EBOV/Makona GP1,2 produced in HEK 293T cells has an apparent molecular weight by electrophoresis of approximately 160 kDa, compared with approximately 110 kDa for Sf9-produced EBOV/Makona GP1,2 (Fig. 5a,b). To determine the contribution of glycosylation to molecular weight, we removed N-linked glycans from both proteins using PNGase F and found that N-linked glycans account for ≈31% (≈50 kDa) of the 160 kDa HEK 293T cell-derived EBOV/Makona GP1,2, compared with ≈35% (≈38 kDa) of the 110 kDa Sf9 cell-derived EBOV/Makona GP1,2. We then used a deglycosylation enzyme mix that removes the majority of N- and O-linked glycans and found that O-linked glycans account for an additional ≈24% (≈38 kDa) of the HEK 293T cell-derived protein. In contrast, the deglycosylation mix did not further reduce the molecular weight of the Sf9-derived EBOV/Makona protein, suggesting little or no O-linked glycosylation on the insect cell-derived protein.
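The percentages above follow from the apparent molecular weight shift on deglycosylation divided by the apparent molecular weight of the intact protein. The sketch below reproduces that arithmetic with the values from the text; the function name is illustrative, and because apparent molecular weights of glycoproteins on SDS-PAGE are approximate, the results are estimates.

```python
def glycan_percent(intact_kda, shift_kda):
    """Percentage of apparent molecular weight removed by deglycosylation."""
    return 100.0 * shift_kda / intact_kda


# Apparent molecular weights (kDa) and shifts reported in the text.
hek_n = glycan_percent(160, 50)            # HEK 293T GP1,2, PNGase F (N-linked)
sf9_n = glycan_percent(110, 38)            # Sf9 GP1,2, PNGase F (N-linked)
hek_o = glycan_percent(160, 38)            # HEK 293T GP1,2, extra shift with N/O mix
hek_total = glycan_percent(160, 50 + 38)   # HEK 293T GP1,2, N- plus O-linked

for label, value in [("HEK N", hek_n), ("Sf9 N", sf9_n),
                     ("HEK O", hek_o), ("HEK total", hek_total)]:
    print(f"{label}: {value:.0f}%")
```

Rounded to whole percentages, this gives 31%, 35%, and 24%, matching the text, and 55% combined N- and O-linked glycan for the HEK 293T cell-derived protein.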
Analysis of N-linked glycans in HEK 293T and Sf9 cell-produced GP1,2s. Our group has previously published a detailed analysis of the composition of the N-glycans on ebolavirus GP1,2 [27]. The N-glycan profiles of ebolavirus GP1,2s produced in HEK 293T cells are summarized in Fig. 5c. Briefly, the majority of the N-linked glycans are of the complex type, with few high-mannose and hybrid glycans, and most of the glycan species on the mammalian GP1,2s are fucosylated. Di-antennary N-glycans, which are the dominant complex glycan structures on both EBOV GP1,2 samples, are significantly less represented on BDBV, TAFV, and SUDV GP1,2, which instead showed increased levels of tri-, tetra-, and penta-antennary N-glycans. In all HEK 293T-derived GP1,2 samples, sialylated N-glycans account for about 15% of total N-glycans. The Sf9 cell-derived ebolavirus GP1,2s have simpler glycan profiles, in which the fucosylated core N-glycan (Fuc)1(Man)3(GlcNAc)2 is by far the dominant structure, together with some high-mannose glycans (Fig. 5d). The N-glycans found on Sf9 cell-derived GP1,2 were consistent with the known glycosylation potential of this expression system [30], notably lacking complex galactosylated or sialylated glycans.

The theoretical N-glycosylation sites shown in Fig. 6 were obtained by sequence analysis using a database search on ExPASy GlycoMod. This prediction does not differentiate by cell or organism type; it is based only on the presence of the N-glycosylation sequon Asn-X-Ser/Thr in the protein sequence. In a preliminary study combining the analysis of glycopeptides before and after enzymatic deglycosylation, we observed that most of the 17 theoretical N-glycosylation sites were occupied by N-glycans. Sf9-derived EBOV/Makona GP1,2 had 15 of 17 sites detectably occupied, whereas HEK 293T-derived EBOV/Makona GP1,2 had 13 of 17.
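Sequon-based prediction of this kind amounts to a pattern search over the protein sequence. The sketch below scans for Asn-X-Ser/Thr and, following common practice, also excludes Pro at the X position (an assumption; the text above states only Asn-X-Ser/Thr). The example sequence is made up, and a predicted sequon says nothing about whether the site is actually occupied in a given cell type.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported.
# X != Pro is an assumed refinement of the plain Asn-X-Ser/Thr rule.
SEQUON = re.compile(r"(?=N[^P][ST])")


def predicted_n_sites(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


example = "MNGSANPTQNVTA"  # hypothetical sequence, not EBOV GP
print(predicted_n_sites(example))  # [2, 10]
```

The Asn at position 6 is skipped because it is followed by Pro.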
At the present level of analysis, we cannot exclude that the remaining theoretical glycosylation sites are also occupied, because the relatively large sizes of the corresponding tryptic glycopeptides may have prevented their analysis by MALDI-TOF mass spectrometry. The variety of glycan species found at individual occupied N-glycosylation sites differed markedly: HEK 293T-derived EBOV/Makona GP1,2 carried up to 13 different N-glycans at an individual site (N172, corresponding to amino acid 204 of the construct), whereas Sf9-derived GP1,2 carried at most 4 different N-glycans at a given site (N264, corresponding to amino acid 296 of the construct). This difference in site-specific N-glycan heterogeneity between mammalian and insect cell-derived GP1,2 is consistent with the differences in overall N-glycan profiles of proteins produced in these expression systems. Notably, the two predicted glycosylation sites shown to be necessary for VSV pseudotype entry and production of full-length EBOV/Mayinga GP1,2 [31] were also glycosylated in our EBOV/Makona GP1,2 produced in either cell type.
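The two sites above are given in two numbering schemes (N172 and construct residue 204; N264 and construct residue 296), and both pairs differ by 32 residues. Assuming that this offset is constant along the sequence, which holds only if the construct has no internal insertions or deletions relative to the reference numbering, converting between the schemes is a fixed shift. The helper names are hypothetical.

```python
# Pairs reported in the text: reference site numbering -> construct residue.
REPORTED_PAIRS = {172: 204, 264: 296}

offsets = {construct - reference for reference, construct in REPORTED_PAIRS.items()}
if len(offsets) != 1:
    raise ValueError(f"numbering offset is not constant: {offsets}")
OFFSET = offsets.pop()  # 32 for both reported pairs


def to_construct(reference_position):
    """Map a reference-numbered residue to construct numbering (constant offset assumed)."""
    return reference_position + OFFSET


def to_reference(construct_position):
    """Map a construct-numbered residue back to reference numbering."""
    return construct_position - OFFSET
```

Checking the offset against every known pair before using it guards against silently applying a shift that does not hold.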

Discussion
In the present study, we demonstrate two techniques for producing and purifying filovirus GP1,2s. We modified the GP gene sequences of ebolaviruses, MARV, and LLOV to express released GP1,2s by truncating the sequences to remove the transmembrane domains. To increase expression, we altered the RNA-editing sites of the ebolavirus and LLOV GP genes (Fig. 2). This approach prevented expression of sGP and ssGP, which could interfere with the growth of cells in culture and downregulate GP1,2 expression [32]. EBOV GP1,2 has been reported to be cytotoxic when expressed at high levels [33,34], which would be undesirable for protein production. However, this cytotoxicity has been attributed to the transmembrane region of the protein [35], so deleting the transmembrane domain may have circumvented this potential problem.
We mutated the furin cleavage sites of the filovirus GP1,2s to prevent dissociation of the GP1 and GP2 subunits during purification (Fig. 2). This step maintained equal levels of GP1 and GP2 and avoided purifying only GP2, which carries the His-tag. We did not codon-optimize the GP gene sequences, because codon optimization is known to increase protein expression levels and could therefore lead to cytotoxicity [33,34]. The modified sequences were cloned into either pcDNA3.1+, a commonly used expression plasmid for mammalian cell transfection, or the shuttle vector pFastBac1 for baculovirus (AcMNPV) production. Insect cells are a commonly used system for protein production, in part because of the success of baculovirus systems: baculoviruses are versatile vectors for protein production in insect cells and are easily scaled up for high output [28,36].
We successfully expressed modified ebolavirus and LLOV GP1,2s in HEK 293T cells using the pcDNA3.1+ expression plasmid. However, we were unable to produce MARV GP1,2 in HEK 293T cells using pcDNA3.1+ (Table 1). Non-modified full-length MARV GP1,2 was also not produced in HEK 293T cells from pcDNA3.1+, but was produced from pCAGGs, even though the encoded sequences were identical. Further, with the pCAGGs expression vector we could not express modified MARV/Musoke GP1,2 but could express modified MARV/Angola GP1,2 (Table 1, Supplementary Fig. S2). The amino acid sequences of the two glycoproteins differ by 7.2% [37]. Thus, in the mammalian system the suitability of a vector appears to be gene-specific, even among genes from related viruses, and expression with multiple vectors may need to be attempted for a given protein of interest.
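The 7.2% figure is a pairwise amino acid difference. For two sequences already aligned to the same length, that metric can be sketched as below; how alignment gaps are treated varies between tools, and excluding gap columns here is an assumption. The toy sequences are hypothetical, not MARV GP.

```python
def percent_difference(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, gap-free columns at which two sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    mismatches = sum(a != b for a, b in columns)
    return 100.0 * mismatches / len(columns)


print(percent_difference("MKTAYIAK", "MKTGYIAR"))  # 25.0
```

Because the choice of alignment method and gap handling changes the denominator, percent differences from different sources are comparable only when computed the same way.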
Attempts to purify mammalian-derived GP1,2 with the HisTrap Excel nickel column system that we used for Sf9 purification were unsuccessful because a component of the FBS, identified as bovine serotransferrin, binds to the nickel column (Fig. 4b). To our knowledge, this is the first report of an incompatibility between the nickel column system and mammalian tissue culture that uses FBS. The Anti-His Affinity Resin approach described here may therefore be applicable to purification of many mammalian cell-produced proteins. Another alternative may be the use of serum-free 293T cell culture media. In summary, we demonstrated that different purification techniques are required for filovirus GP1,2 production in mammalian and insect cells.
We demonstrate that there are key differences in the glycosylation of the ebolavirus GP1,2s produced with the two methods outlined for insect and mammalian cell culture. The complexity of the N-linked glycans differed between the two systems for all of the ebolavirus GP1,2s. The mammalian cell-derived proteins carried high percentages of complex N-linked glycans, whereas the Sf9 insect cell-derived proteins carried mostly simple N-linked glycans and high-mannose structures, as is typical for this system. The HEK 293T-derived GP1,2s showed some heterogeneity in antennarity, with the two EBOV samples containing predominantly di-antennary complex glycans and BDBV, TAFV, and SUDV carrying complex glycans with more antennae. The mammalian system also produced sialylated N-glycans. Whether some insect protein production systems can impart sialylation remains controversial, although some well-controlled studies suggest that they can at a minimal level [38]. With our Sf9 cell production system, we did not detect sialylation on any insect-derived N-linked glycans. Because sialylation plays an important role in the function of some glycoproteins, this difference could lead to significant functional differences [39]. The ability of insect cells to add O-linked glycans has also been controversial, with some studies demonstrating limited O-linked glycans on proteins derived from Sf9 cells [40]. Our monosaccharide quantification of EBOV/Makona GP1,2 suggests the possibility of limited GalNAc-only O-linked glycans. However, while there is some evidence of limited O-linked glycosylation on insect cell-produced GP1,2s, O-linked glycosylation by mammalian cells is extensive, a key difference between the two production systems.
For lesser-studied ebolaviruses such as Bundibugyo and Taï Forest viruses, few reagents are available, and the few commercially available proteins are often produced in insect cells. We demonstrate that the post-translational modifications of filovirus GP1,2s differ between mammalian and insect production systems. A comparison between the two systems may be useful when deciding how to produce large amounts of protein for purposes such as vaccination, enzyme-linked immunosorbent assays, or the study of receptor or antibody binding kinetics. Our approaches produced filovirus GP1,2s in both systems, allowing phenotypic comparisons between the expressed proteins.
We created GP1,2-expressing plasmids based on GP genes from one cuevavirus, four ebolaviruses, and two marburgviruses. All but two plasmids are based on reference GP sequences:

Glycoprotein sequence modifications. Several GP gene modifications were required to produce soluble filovirus GP1,2s. An additional adenosine residue was added to the cuevavirus and ebolavirus 7A co-transcriptional editing sites (yielding 8A-GP) to ensure expression of GP1,2 in the absence of sGP and ssGP production [41]. The coding regions for the GP2 transmembrane regions were removed to ensure secretion. Truncation occurred at amino acid 692 (LLOV preGP numbering), 650 (ebolavirus preGP numbering), and 638 (MARV preGP numbering). A region encoding a C-terminal polyhistidine (6xHis) tag was added to all genes to aid purification of the expressed proteins by affinity chromatography. Finally, furin cleavage of preGPs was prevented by site-directed mutagenesis introducing a single arginine-to-lysine change in the VYFRRKR furin cleavage site. This change is known not to affect the protein's function [42], which is why we continue to refer to the products as GP1,2s in this manuscript.
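Which single Arg-to-Lys substitutions in VYFRRKR would remove the minimal furin recognition motif R-X-X-R can be checked by a simple pattern search. The sketch below enumerates every single R-to-K variant; it does not identify which substitution was made in this study, which the text does not specify, and a motif check is only a proxy for actual cleavage. The function names are hypothetical.

```python
import re

# Minimal furin recognition motif R-X-X-R; lookahead allows overlapping hits.
FURIN_MOTIF = re.compile(r"(?=R..R)")


def has_furin_motif(peptide):
    """True if the peptide contains at least one R-X-X-R motif."""
    return FURIN_MOTIF.search(peptide) is not None


def single_r_to_k_variants(peptide):
    """Yield (1-based position, variant) for every single Arg -> Lys substitution."""
    for index, residue in enumerate(peptide):
        if residue == "R":
            yield index + 1, peptide[:index] + "K" + peptide[index + 1:]


site = "VYFRRKR"
print("wild type keeps motif:", has_furin_motif(site))
for position, variant in single_r_to_k_variants(site):
    print(position, variant, "motif retained" if has_furin_motif(variant) else "motif lost")
```

In this check, substituting the Arg at position 4 or 7 removes the motif, whereas substituting the Arg at position 5 leaves RKKR, which still matches R-X-X-R.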
AcMNPVs expressing GP1,2s were verified for correct orientation 72 h after transfection of Sf9 cells in 6-well plates by western blot of cell supernatants. Supernatants containing AcMNPVs were then used for plaque purification with 10-fold dilutions. Individual plaques were purified by adding an agarose plug to a T150 flask of Sf9 cells and incubating for 4 days. Supernatants from these flasks were titered by plaque assay on 95% confluent Sf9 cells with a 4% agarose overlay.
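The plaque-assay titer and the inoculum for a given multiplicity of infection follow standard arithmetic: titer (PFU/ml) = plaques / (dilution × volume plated), and inoculum volume = MOI × cell number / titer. The sketch below uses a hypothetical plaque count, plating volume, and cell number; only the MOI of 0.01 is taken from the Methods.

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Titer from a plaque count; dilution is a fraction, e.g. 1e-6 for 10^-6."""
    return plaques / (dilution * volume_plated_ml)


def inoculum_ml(moi, cell_count, titer):
    """Volume of virus stock that delivers `moi` PFU per cell."""
    return moi * cell_count / titer


# Hypothetical plaque assay: 50 plaques at the 10^-6 dilution, 0.5 ml plated.
titer = titer_pfu_per_ml(plaques=50, dilution=1e-6, volume_plated_ml=0.5)

# MOI 0.01 as in the Methods; the cell count per flask is a hypothetical value.
volume = inoculum_ml(moi=0.01, cell_count=5e7, titer=titer)
print(f"titer = {titer:.2e} PFU/ml, inoculum = {volume * 1000:.1f} µl")
```

At low MOI the required stock volume is small, so in practice a diluted working stock may be easier to pipette accurately.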
Purification of Sf9-produced glycoproteins. Sf9 cells were infected with AcMNPVs at a multiplicity of infection of 0.01. Media were harvested on days 3 and 4 post-inoculation and pooled. Modified GP1,2s were purified using the HisTrap excel system (GE Healthcare Bio-Sciences, Pittsburgh, PA) following the manufacturer's instructions. Briefly, Sf9 supernatants were passed through a Ni2+-chelated HisTrap excel column (GE Healthcare) using a variable-flow mini-pump (Fisher Scientific). The column was washed with wash buffer (20 mM sodium phosphate, 0.5 M NaCl, and 20 mM imidazole [GE Healthcare]), and bound protein was eluted in 5 ml of elution buffer (20 mM sodium phosphate, 0.5 M NaCl, and 500 mM imidazole).
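Preparing the wash and elution buffers from the stated concentrations is a molarity-times-volume calculation. The sketch below assumes solid NaCl and imidazole (molar masses 58.44 and 68.08 g/mol) and a hypothetical 1 M sodium phosphate stock already adjusted to the desired pH; the pH itself is not stated in the text above.

```python
MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "imidazole": 68.08}
PHOSPHATE_STOCK_M = 1.0  # hypothetical pre-made, pH-adjusted sodium phosphate stock

BUFFERS_M = {  # final concentrations in mol/l, from the Methods
    "wash": {"sodium phosphate": 0.020, "NaCl": 0.5, "imidazole": 0.020},
    "elution": {"sodium phosphate": 0.020, "NaCl": 0.5, "imidazole": 0.500},
}


def recipe(buffer_name, volume_l):
    """Grams of solids and ml of phosphate stock for `volume_l` litres of buffer."""
    components = BUFFERS_M[buffer_name]
    phosphate_ml = 1000.0 * components["sodium phosphate"] * volume_l / PHOSPHATE_STOCK_M
    steps = {"sodium phosphate stock (ml)": phosphate_ml}
    for solid, molar_mass in MOLAR_MASS_G_PER_MOL.items():
        steps[f"{solid} (g)"] = components[solid] * molar_mass * volume_l
    return steps


for name in BUFFERS_M:
    print(name, {item: round(amount, 2) for item, amount in recipe(name, 1.0).items()})
```

For 1 l, both buffers need 20 ml of the phosphate stock and 29.22 g NaCl; they differ only in imidazole (about 1.36 g for wash versus 34.04 g for elution).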

Purification of glycoproteins from HEK 293T cells.
Anti-His Affinity Resin (GenScript) was used for purification of GP1,2s following the manufacturer's instructions. Briefly, the resin was incubated with supernatant under rotation for 30 min, transferred into Econo-Pac disposable chromatography columns (Bio-Rad, Hercules, CA), and washed with Tris-buffered saline (pH 7.4). The resin was then incubated with hexa-His peptide (GenScript) at a concentration of 0.5 µg/ml to elute bound protein. Eluates containing the proteins of interest were concentrated using Pierce Protein Concentrators (Thermo Fisher Scientific, Waltham, MA) for downstream use.
Glycan composition analysis. Once permethylated, purified glycans were solubilized in 20 µL of 1:1 (v/v) methanol/deionized water. A mix of 2 µL non-dilute and 2 µL N-glycans with 2 µL of 2,5-dihydroxybenzoic acid (LaserBio Labs, Sophia-Antipolis, France) matrix solution (10 mg/ml in 1:1 methanol/deionized water). Positive-ion reflectron MALDI-TOF mass spectra were acquired using an Autoflex III mass spectrometer (Bruker Daltonics, Billerica, MA, USA). The acceleration and reflector voltage settings were 12 × 1977 V, the laser was set to 90%, and spectra were obtained by accumulating 2,000 shots. The spectra were calibrated with an external standard (PepMix 4, LaserBio Labs, Sophia-Antipolis, France). To establish the glycan profile of each sample, several spectra were acquired from different spots and the values averaged. Glycan structures corresponding to monoisotopic masses were interpreted after deisotoping of the spectra using the ExPASy GlycoMod tool together with GlycoWorkbench. Relative glycan intensities were calculated to establish the glycan profile of each spectrum, and mean relative intensities with standard deviations were determined.
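The profile calculation described above (relative intensity per spectrum, then mean and standard deviation across spectra) can be sketched with the Python standard library. Glycan labels and intensities below are made-up placeholders, and treating a glycan absent from one spectrum as 0% is an assumption; at least two spectra are needed for a sample standard deviation.

```python
from statistics import mean, stdev


def relative_intensities(spectrum):
    """Normalize one spectrum's peak intensities to percent of the total."""
    total = sum(spectrum.values())
    return {glycan: 100.0 * value / total for glycan, value in spectrum.items()}


def glycan_profile(spectra):
    """Mean and sample standard deviation of relative intensity per glycan."""
    normalized = [relative_intensities(s) for s in spectra]
    glycans = sorted(set().union(*normalized))
    profile = {}
    for glycan in glycans:
        values = [n.get(glycan, 0.0) for n in normalized]  # absent peak counted as 0%
        profile[glycan] = (mean(values), stdev(values))
    return profile


# Hypothetical spectra from two spots (peak intensities in arbitrary units).
spots = [
    {"Fuc1Man3GlcNAc2": 30.0, "Man5GlcNAc2": 70.0},
    {"Fuc1Man3GlcNAc2": 50.0, "Man5GlcNAc2": 50.0},
]
for glycan, (avg, sd) in glycan_profile(spots).items():
    print(f"{glycan}: {avg:.1f} ± {sd:.1f} %")
```

Normalizing each spectrum before averaging keeps spots with different absolute signal from dominating the profile.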
Deglycosylation of the enriched fraction. To remove glycans from the peptides, 5 µl of suspended glycopeptides were adjusted to 10 mM sodium acetate buffer, pH 5, and 1 µl of PNGase A and 0.7 µl of PNGase F (Promega, Madison, WI, USA) or PNGase A (for insect cell protein) were added, followed by deglycosylation for 15 h at 37 °C.