Introduction

Ebola virus is a member of the Filoviridae family1,2. Since its initial discovery in 1976, it has caused recurring outbreaks of disease in Central and West Africa upon spillover into the human population from an as-yet unidentified animal host reservoir, or recrudescence from convalescent humans3,4. Detection of viral RNA and isolation of a new ebolavirus species (Bombali) from bats have pointed to these animals as a likely reservoir3,4,5, similar to the related Marburg virus (MARV) for which the evidence is more established6,7,8. Outbreaks of ebolaviruses have typically been limited to the order of 10–1000 cases by contact tracing and isolation, but in 2013–2016 an outbreak with over 28000 confirmed cases and over 11000 deaths occurred in Sierra Leone, Liberia and Guinea2,9. This outbreak accelerated the development of an effective vaccine and improved therapies against Ebola virus disease10. Still, clinical manifestation of Ebola virus infection has historically been associated with mortality rates ranging from 30% to 90%, and even the most successful therapies to date provide only a modest improvement in mortality rates and do not offer a cure for advanced disease2,10. Six species of ebolavirus have been discovered to date, including Ebola (a.k.a. Zaire; EBOV), Sudan (SUDV), Bundibugyo (BDBV), Tai Forest (TAFV), Reston (RESTV) and Bombali (BOMV), of which all but the latter two are known to cause severe disease in humans2.

The ebolaviruses are enveloped and contain an 18 kb genome of non-segmented, negative sense, single-stranded RNA that encodes seven genes: NP, VP35, VP40, GP, VP30, VP24 and L2. The GP gene encodes the full-length envelope glycoprotein (GP) as well as two truncated secreted versions (sGP and ssGP) by transcriptional editing11,12,13. The full-length envelope GP is a trimeric class I viral fusion protein and plays an important role in host cell attachment and entry11. Following virus internalization by macropinocytosis14,15,16,17, GP binds the Niemann-Pick C1 (NPC1) receptor, triggering fusion of the viral envelope with the host membrane, thereby delivering the ribonucleoprotein complexes in the cytosol where replication will take place18,19,20. GP is also the primary target of antibodies produced upon natural infection or vaccination and of monoclonal antibodies developed as antiviral therapeutics21. Full-length GP is translated as a ~670 amino acid precursor and is cleaved by host furin into two disulfide-linked subunits: GP1 and GP222,23. GP1 is responsible for receptor binding and consists of 4 domains: base, head, glycan cap and mucin-like domain (MLD)11. The GP2 subunit contains the fusion peptide and has the strongest conservation between different members of the filoviruses24,25,26.

There are up to 17 N-linked glycosylation sites in ebolavirus GP, 15 of which are in the GP1 subunit, primarily in the glycan cap and MLD (see Fig. 1a). The N-linked glycans mediate host-cell attachment through C-type lectins DC-SIGN/L-SIGN27,28,29 and have been implicated in shielding GP from binding by neutralizing antibodies28,30,31,32,33,34,35,36. Whereas the overall sequences and especially the N-linked glycosylation sites in the base, head and glycan cap are relatively well conserved among ebolavirus species, those in the MLD are highly variable. Further, the MLD is also modified with numerous O-linked glycans. The GP2 subunit contains two N-linked glycosylation sites that are conserved in all known mammalian filoviruses and play important roles in GP expression, stability and cell entry36,37. Besides the numerous N- and O-linked glycans, there are also two predicted tryptophan C-mannosylation motifs in GP. These motifs consist of a WXXW sequence near the glycan cap, and a tandem WXXWXXW sequence in the membrane-proximal region of GP, where a mannose residue may be linked to the C2 atom of the first tryptophan’s indole group. The biological function of C-mannosylation is generally not well understood, but known to play a role in the folding, stability and trafficking of secreted glycoproteins, including components of the complement system and gel-forming mucins38,39. The C-mannosylation motifs in GP are conserved in all ebolavirus species and the related Lloviu virus (LLOV), but not Marburg virus (MARV). So far, C-mannosylation in the glycan cap has been confirmed in the secreted version sGP40, but its presence in full-length GP and role in the infection cycle remain unclear.

Fig. 1: Sequence and structure analysis of Filovirus GP glycosylation.

a Schematic of filovirus GP domain structure with annotated N-linked glycosylation and C-mannosylation. The cladogram on the left is based on the full GP sequences. Domain coloring as indicated below the diagram. Conserved N-linked glycans in ebolavirus species are annotated on top. b Pseudomodel of EBOV GP with core pentasaccharide of N-linked glycans shown as orange spheres. Model is built with GLYCAM based on PDB ID 5JQ3. The yellow lines indicate the approximate location of the MLD, connecting the respective termini of the glycan cap and GP2 subunits.

Glycomics studies have confirmed the presence of both complex N- and O-linked glycans in GP30,35, but little is known about the site-specific patterns of glycan processing. As glycans play a crucial role in host cell attachment and immune evasion, a better understanding of these patterns in the context of GP structure may help understand mechanisms of infectivity and epitope shielding. Here, we present an in-depth glycoproteomics study of the N-, O- and C-linked glycans of ebolavirus GP. We compare recombinant soluble GP ectodomains of EBOV and BDBV from both human HEK293 and insect S2 cells. Of the 29 ebolavirus outbreaks on record, 22 were caused by EBOV, including the West African outbreak of 2013–2016, while BDBV provides a comparison species also linked to severe human disease. The recombinant soluble ectodomains of GP derived from HEK293 and S2 cells are widely used in structural and biochemical studies, as well as for serological assays and for antibody discovery. These findings therefore relate to much of the available literature describing the structure of ebolavirus GP, as well as the GP-directed antibody response to natural infection and vaccination. We demonstrate that the conserved N-linked glycans at N257 and especially N563 are enriched in under-processed oligomannose and hybrid structures in both viral species and cellular expression platforms, suggesting a specific role in host cell attachment through binding of the cell surface lectins DC-SIGN/L-SIGN (which have a markedly higher affinity for oligomannose glycans). We observe that the MLD is modified by numerous O-glycans, comprising a mixture of truncated Tn-antigen and extended, sialylated core 1 and 2 structures, depending on the expression platform. Moreover, we find several O-linked glycosylation sites within the serine/threonine residues of N-linked glycosylation sequons, as well as evidence for O-linked glycans outside the MLD in both EBOV and BDBV GP. We also confirm C-mannosylation in the glycan cap of both ebolavirus species, which only occurs in the HEK293 expression platform. These key findings were confirmed in glycoproteomics experiments on virus-like particles (VLPs) formed by co-expression of full-length GP and VP40 in HEK293 cells. These VLPs consist of plasma membrane-derived vesicles, are similar in morphology to authentic virions and better mimic GP in its native, membrane-anchored state. We discuss the observed glycosylation profile in the context of known structures of GP in complex with neutralizing antibodies. Our findings provide a framework to understand the contributions and restrictions of GP glycosylation to the neutralization epitopes of antiviral antibodies.

Results

We compared the pattern of predicted N-linked glycosylation sites (NXS/T) and C-mannosylation sites (WXXW) of all known ebolavirus species and the related filoviruses MARV and LLOV (see Fig. 1a and the sequence alignment in Supplementary Fig. S1). The number of predicted N-linked glycosylation sites varies from 9 in BOMV to 17 in EBOV and RESTV. Two of these glycosylation sites are located in the GP2 subunit (N563 and N618) and they are conserved in all ebolavirus species, MARV and LLOV. All remaining sites are located within the GP1 subunit, especially the glycan cap and MLD. Only 4 sites in GP1 are fully conserved in all ebolavirus species: N40 in the base (also present in LLOV), N204 in a flexible loop between the base and head domains, and N257 and N268 in the glycan cap. All other N-linked glycosylation sites in the glycan cap are shared between a smaller set of ebolavirus species, but virtually all N-linked sites within the MLD are unique, in line with the disordered nature and high overall sequence variability of this region. There are 3 predicted C-mannosylation sites conserved in all ebolavirus species and LLOV, but conspicuously missing in MARV. The first WXXW motif is situated in the glycan cap, at W288 (in EBOV), close to the junction with the MLD. In addition, a tandem WXXWXXW motif is situated at W645/W648 in the membrane proximal region of the GP2 subunit.

To visualize the N-glycan shield, we built a pseudomodel of EBOV GP with the core pentasaccharide of each site linked to the corresponding residue of GP1/GP2 (see Fig. 1b). Note that almost all differences in N-linked glycosylation sites between EBOV and BDBV GP occur within the unmodelled MLD, with the exception of N238, which is a sequon encoded in EBOV, but missing in BDBV. The pseudomodel of BDBV GP would otherwise be fully equivalent to the EBOV model presented here. The GP trimer forms a chalice-shaped structure with GP2 as the stem, and GP1 as the bowl on top. The conserved sites N40, N204, N257, N268, N563 and N618 are distributed evenly across the structure, whereas the remaining sites are situated primarily at the rim of the bowl extending outwards from the glycan cap. The glycans occupy much of the available surface of GP. Moreover, the disordered MLD connects the tip of the glycan cap with the lower base of the cup and can be expected to further shield the surface of GP. This pseudomodel includes only the common core pentasaccharide of the N-linked glycans and it is not known how the glycans are processed in the context of folded GP, as predicted sites are not always glycosylated and the processing from oligomannose precursors to hybrid and mature complex glycans may depend on many unpredictable factors, including local structural constraints.

We investigated the patterns of site-specific glycosylation of ebolavirus GP with LC-MS/MS based glycoproteomics experiments, using recombinant soluble ectodomains (GPΔTM) of EBOV and BDBV, as well as the corresponding full-length GP from virus-like particles produced by co-expression with VP40. We compared GPΔTM from human HEK293 and insect S2 cells, both commonly used for structural biology studies, experimental immunizations, antibody selection and serological tests. Our results cover 14/17 and 17/17 predicted sites in EBOV GP from HEK293 and S2 cells, respectively, as well as 12/14 predicted sites of BDBV GP from both expression platforms (see Fig. 2 and Supplementary Data S1). As expected, the N-linked glycosylation patterns of GP from HEK293 and S2 cells are dominated by complex and paucimannose/hybrid glycans, respectively. The glycosylation of GP from especially HEK293 cells is extremely heterogeneous, with some sites carrying over 40 unique glycan compositions. Predicted sites N278 and N391 in BDBV were only detected as unglycosylated asparagines (no unglycosylated asparagine is detected for any of the other sites in EBOV or BDBV, which are then presumably completely occupied). Most detected glycan compositions are compatible with di-, tri- and tetra-antennary, galactosylated complex glycans with or without a single (core) fucose residue and a variable number of terminal sialic acids, as previously described in glycomics analyses30,35.

Fig. 2: N-linked glycosylation profiling of ebolavirus GP.

a Overview of site-specific N-linked glycan processing in ebolavirus GPΔTM from HEK293 and S2 cells as determined by LC-MS/MS. The glycans were classified by HexNAc content as truncated, paucimannose, oligomannose, hybrid or complex. Shown is the average of a duplicate experiment. b Pseudomodel of EBOV GP (as in Fig. 1), with glycans colored by main class (oligomannose, hybrid, complex) as observed in HEK293 cells.

While complex glycans dominate the overall picture, selected sites show clear and robust enrichment of unprocessed glycans (i.e. oligomannose and hybrid structures in the HEK293-derived samples). These include particularly the conserved N257 and N563 sites, in both EBOV and BDBV GP. In good agreement with these observations in the HEK293-derived samples, N257 and N563 are also enriched in unprocessed oligomannose glycans in the S2-derived samples of both EBOV and BDBV GP, indicating that processing of these sites is structurally restricted in some way. Our pseudomodel of EBOV GP indicates that N257 may be partially buried between the head domain and glycan cap (i.e. its first asparagine-linked GlcNAc residue), and N563 similarly between the head domain and GP2. Whereas sites N40/N268/N454 in EBOV GP, and N400/N454 in BDBV GP also show elevated levels of unprocessed glycans in selected samples, we refrain from any conclusions on these sites due to the relatively shallow coverage in the underlying mass spectrometry data and the lack of agreement between HEK293/S2 or EBOV/BDBV samples. Nevertheless, the data clearly indicate a lack of processing at the conserved N257 and N563 sites in both tested ebolavirus species and expression platforms. These findings are confirmed in LC-MS/MS experiments of EBOV and BDBV virus-like particles derived from HEK293 cells, where we also detected a large fraction of oligomannose and hybrid glycans at N257/N563 against a background of highly processed complex glycans at the remaining covered sites (see Supplementary Figs. S2 and S3). The abundance of unprocessed glycans is most prominent at site N563, where the vast majority of glycans consists of hybrid and oligomannose forms in both full-length GP and GPΔTM. At site N257, the abundance of unprocessed glycans is markedly lower in the full-length EBOV GP from VLPs.

Our experiments also cover the C-mannosylation site at W288 in the glycan cap (see Fig. 3). The GPΔTM constructs used here are truncated before the second C-mannosylation motif at W645/W648, which is therefore not covered in these experiments. The GP samples derived from HEK293 cells both contain a mixture of C-mannosylated and unmodified W288, with an estimated occupancy of 1–10%. In contrast, this modification is completely absent in both samples derived from S2 cells. The presence of C-mannosylated W288 was confirmed in the glycoproteomics experiments on full-length GP in the virus-like particles formed by co-expression with VP40 (see Supplementary Fig. S4). Unfortunately, we could not detect any peptides that cover the second C-mannosylation motif at W645/W648 in these samples.

Fig. 3: C-mannosylation in ebolavirus GP.

a Schematic domain structure of EBOV GP with highlighted C-mannosylation sites. The modelled structure of C-mannosylated W288 was based on average Fo-Fc density (shown in green) of twelve isomorphic GP crystal structures. b LC-MS/MS spectrum of C-mannosylated W288 from BDBV GPΔTM from HEK293 cells. Note that the prominent c11-c12 peaks provide direct evidence for the presence and localization of the Hex (+162 Da) modification of W288.

The W288 site is situated in a less ordered region of the glycan cap (the β17-β18 loop) just before the start of the MLD, which is deleted in the constructs of most structural studies. While most available GP structures do not model the corresponding region, we identified a set of 12 deposited isomorphic GP crystal structures of HEK293-derived material with electron density for W288 and its adjacent residues41,42,43,44,45. The individual crystal structures did not show a clear Fo-Fc density corresponding to the C2-linked mannose, but after averaging all available electron density maps, a clear ring structure did appear. The weak observed electron density is consistent with the low occupancy of the modification observed in our glycoproteomics experiments. The C2-linked mannose residue was modelled in the extra density, positioning it at the exposed surface of the glycan cap, pointing towards the center of the β17-β18 loop.

We also mapped out the patterns of O-linked glycosylation in ebolavirus GP (see Fig. 4). In contrast to N- and C-linked glycosylation, there is no clear sequence motif to predict O-linked glycosylation sites. The modification generally occurs in serine/threonine-rich disordered regions, such as the MLD of filovirus GPs. Whereas the presence of O-glycans in the MLD is well-known, the precise localization of these modifications remains unclear (the MLD contains more than 50 possible S/T residues). For these experiments, we first removed all N-linked glycans by PNGase F digestion. This reduces the complexity of the glycopeptide mixture to facilitate O-linked glycopeptide identification and site localization, while leaving a clear mark at the digested N-glycan site by deamidation of the asparagine residue (resulting in a +1 Da mass shift). Due to challenges with site localization and the presence of multiple O-glycans per peptide, we chose to report only the confident O-glycopeptide identifications as such, but not a quantitative profile of the glycosylation sites in the MLD.
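The nominal mass shifts referred to in this section (about +1 Da for PNGase F-induced deamidation, +162 Da for a hexose such as the C-linked mannose) can be reproduced from elemental compositions. The following is a minimal sketch for illustration only and is not part of the analysis pipeline used here; the helper name `formula_mass` is hypothetical, and the atomic masses are standard monoisotopic values.

```python
# Monoisotopic atomic masses in Da (standard reference values).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def formula_mass(formula):
    """Sum the monoisotopic mass of an elemental composition, e.g. {"C": 6, "H": 10, "O": 5}."""
    return sum(MASS[element] * count for element, count in formula.items())

# Asn -> Asp conversion: the side-chain amide (-NH2) becomes a hydroxyl (-OH),
# so the net change is +O, -N, -H.
deamidation = formula_mass({"O": 1}) - formula_mass({"N": 1, "H": 1})

# Residue masses added to a peptide by a single monosaccharide.
hexose = formula_mass({"C": 6, "H": 10, "O": 5})            # e.g. mannose
hexnac = formula_mass({"C": 8, "H": 13, "N": 1, "O": 5})    # e.g. GalNAc (Tn antigen)

print(f"deamidation: {deamidation:+.4f} Da")
print(f"Hex:         {hexose:+.4f} Da")
print(f"HexNAc:      {hexnac:+.4f} Da")
```

The exact deamidation shift is slightly below 1 Da, which is why high-resolution instruments can distinguish it from the naturally occurring 13C isotope peak of the unmodified peptide.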

Fig. 4: O-linked glycosylation in the MLD of ebolavirus GP.

a GPΔTM from EBOV b GPΔTM from BDBV. Glycans drawn within a box represent multiple detected compositions per site. Glycans connected to multiple indicated sites could not be unambiguously localized from the LC-MS/MS data.

In EBOV GP, we detected 12 unique O-linked glycosylation sites in the MLD of GPΔTM from HEK293 cells versus 12 in S2 cells, with 5 sites in common. In BDBV GP we detected 16 unique O-linked glycosylation sites in the MLD of GPΔTM from HEK293 cells versus 8 in S2 cells, with 7 sites in common. Whereas O-linked glycosylation was dominated by simple Tn antigen and core 1 structures in samples from S2 cells, samples from HEK293 cells also contained extended and sialylated core 1 and core 2 structures, especially in EBOV GP. Multiple unique glycan compositions were often detected for a given site, further adding to the extreme heterogeneity of GP due to its glycosylation.

We also detected several O-glycosylation sites outside the MLD of both EBOV and BDBV GP (see Fig. 5). In BDBV GP we detected O-linked glycosylation at T280, with 6 unique glycan compositions amounting to an estimated total occupancy of ~10%. This threonine residue is part of a putative NPT glycosylation sequon, but we only detect the unmodified asparagine, which remains unprocessed presumably because of the following proline residue. The modified threonine is shared only by BDBV and TAFV GP, but is absent in EBOV, SUDV, BOMV, RESTV, MARV and LLOV. We also detected O-linked glycosylation at T206 in EBOV GP, with 6 unique glycan compositions and an estimated occupancy of ~5%. This threonine is part of the glycosylation sequon of N204, which is fully occupied by N-glycans as evidenced by the deamidated asparagine and the N-linked glycoproteomics data discussed earlier. The N-linked glycosylation sequon including the modified threonine is conserved among all ebolavirus species, but does not exist in MARV and LLOV. Residues adjacent to this sequon show substantial variation between ebolavirus species and modified T206 was not detected in the BDBV GP samples. The close juxtaposition of N- and O-glycans is also observed in the MLD of both EBOV and BDBV GP, where T335 is part of the N-linked glycosylation sequon of N333 and detected in GPΔTM from both species as an O-linked glycosylation site. Similarly, S348/T388/S438 in EBOV GP and T402/T488 in BDBV GP are all part of N-glycosylation sequons. Finally, we also observed O-linked glycosylation within the strep-tag of the constructs (see Supplementary Fig. S5). The presence of the O-linked glycans at T206 (EBOV) and T280 (BDBV) outside the MLD could be confirmed in our glycoproteomics measurements of full-length GP from virus-like particles (see Supplementary Fig. S4).

Fig. 5: O-linked glycosylation outside the MLD in ebolavirus GP.

a Pie charts represent the occupancy of the O-linked modification. b Sequence conservation of the detected O-linked glycosylation sites. c Distribution of glycan types at the indicated sites (bars indicate the average of both replicate measurements, which are shown as solid circles).

This high extent of glycosylation must be accommodated by antibodies against ebolavirus GP. To understand the contribution of glycans to neutralization epitopes (and restrictions they impose) we screened the Protein Data Bank for structural models of ebolavirus GP in complex with neutralizing antibodies and also looked for linear B-cell epitopes reported in literature that span the glycosylation sites we detected in our experiments (see Fig. 6a)46,47,48,49,50,51,52,53,54,55,56. The glycan-rich epitopes we report in this overview include both cases of direct contacts between modelled GP glycans and Complementarity Determining Regions of the antibodies, as well as brushing interactions of adjacent glycans with the framework regions of the variable domains, which may sterically restrict binding. It should be noted that glycans are typically incompletely modelled in GP-antibody structures and that inference of these brushing interactions is not an exact determination. This overview indicates that neutralizing antibodies span a broad range of epitopes that cover essentially all N-linked glycosylation sites (N204 and the entire MLD have not yet been modelled in structural studies and are therefore not represented in this analysis). The conserved GP1 glycans (N40, N257, and N268) all contribute to the epitopes of neutralizing antibodies, with possible glycan-antibody interactions for N268 reported in the epitopes of as many as 7 unique monoclonal antibodies. This includes the therapeutic monoclonal antibody Mab114, which makes additional contacts with N238. The components of the therapeutic ZMapp mixture (c2G4, c4G7 and c13C6) also interact with N40, N238, N268 and N563. The epitopes of the three components in the therapeutic REGN-EB3 mixture are not defined to atomic detail (and therefore not included in the overview), but published negative stain EM reconstructions suggest possible interaction with N563 and the glycan cap57.

Fig. 6: Glycan-containing epitopes of monoclonal antibodies against ebolavirus GP.

a Overview of glycosylation sites associated with indicated epitopes. Direct contacts and clashes with CDRs are indicated in red, brushing interactions with framework regions in light orange. b Highlighted structure of 14G7 in complex with a linear epitope from the MLD (PDB ID:2Y6S), showing that the O-linked glycosylation site T485 is deeply buried in the cleft between heavy (dark red) and light chain (light pink).

Besides N-glycans, we also noted several putative interactions with C-mannosylated W288 and O-linked glycosylation sites. The monoclonal antibodies BDBV-329 and BDBV-43 are in close proximity to T280 with their CDRH3 and framework 3 regions, respectively. The monoclonal antibody 66-3-9c binds a linear epitope that spans the C-mannosylation site W288. Similarly, c6D8 binds a linear epitope that spans the O-linked glycosylation site S399. The monoclonal antibody 14G7 binds to a linear epitope in the MLD that spans the O-linked glycosylation site T485. A crystal structure of the 14G7 Fab in complex with its unglycosylated epitope reveals that T485 is buried deep within the cleft between heavy and light chains, where it is in direct contact with CDRH3 residues (see Fig. 6b). It is therefore unlikely to accommodate the bulky O-glycans detected in our experiments, further highlighting the potential epitope shielding effects of not just N-glycans, but also O-glycans in the MLD.

Discussion

Here we have presented a detailed overview of glycosylation in ebolavirus GP, using glycoproteomics to resolve the patterns of site-specific N-, O- and C-linked glycans. In the GP samples derived from HEK293 cells, we observed heterogeneous, complex N-glycosylation overall, but noted enrichment of unprocessed glycans at two conserved sites, N257 and especially N563. It is known that ebolavirus GP interacts with cell surface lectins DC-SIGN/L-SIGN in an oligomannose-dependent manner27,28,29. Our results indicate that the unprocessed glycans present at N257 and N563 may be primarily responsible for the interaction, thereby facilitating host cell attachment and infectivity. Indeed, Lasala and colleagues recently demonstrated that the enhanced infectivity of the A82V Makona-GP variant that arose during the 2013–2016 West African epidemic results from DC-SIGN-mediated host cell attachment through glycans at N257 and N56358. They showed that removal of these glycans by mutagenesis results in decreased pseudovirus binding and infection of Jurkat-DC-SIGN cells for the A82 and especially the V82 Makona GPs. These functional studies are consistent with our findings that N257 and N563 are enriched in unprocessed glycans and point to an important role for glycosylation at N257 and N563 in DC-SIGN-mediated host cell attachment. This also raises the possibility that monoclonal (therapeutic) antibodies in direct contact with N257 and N563 glycans (see Fig. 6) neutralize infection by blocking the DC-SIGN interaction to prevent host cell attachment, but this remains to be confirmed in future studies.

We also detected up to 16 unique site-specific O-glycans in the MLD of GP, revealing a heterogeneous mixture of not only simple Tn-antigen (i.e. a single GalNAc), but also extended sialylated core 1 and core 2 structures. The sites detected in our experiments are likely just the tip of the iceberg, as the dense decoration of the MLD with O-glycans may make proteolytic digestion of the MLD especially difficult and the presence of multiple glycans in the same peptide makes it exponentially more challenging to confidently make assignments from the raw LC-MS/MS data. We detected several O-linked glycosylation sites on the serine/threonine residues of NXS/T sequons of N-linked glycans, raising interesting questions about interplay between the two types of glycosylation. Moreover, we found two O-linked glycosylation sites outside the MLD. The close juxtaposition of N- and O-linked glycans, and O-linked glycosylations outside the MLD were recently also independently observed by Bagdonaite and colleagues59.

We further confirmed the presence of the C-mannosylation site in GP at W288, which is completely conserved in all ebolavirus species and LLOV, but not MARV. The second motif at W645/W648 is also missing in MARV, but whereas the MARV sequence at W288 completely diverges from the ebolavirus species, the MARV region corresponding to W645/W648 is similarly rich in tryptophan residues (see Fig. 3a). The MARV GP sequence in this region is thereby primed to acquire a C-mannosylation motif through a single deletion or a tryptophan substitution at any of four adjacent positions. Conversely, this could also indicate that the C-mannosylation motifs were present in a common ancestor with ebolaviruses and LLOV but lost in MARV. Whereas C-mannosylation is known to be important for the stability and folding of secreted human glycoproteins38,39, its role in ebolavirus replication and pathogenesis remains unclear.

The presence of heterogeneous N-, O- and C-linked glycosylation adds up to a staggering complexity of GP composition. In the case of EBOV GP, the 17 N-glycosylation sites alone, each linked to on average a dozen unique glycan compositions, already give rise to an enormous number of permutations. Add to this the heterogeneity of O-linked glycosylation and a picture emerges where no two copies of GP on a virion are strictly identical. Meanwhile, the glycans represent a major component of the overall GP structure. The 17 N-glycans already contribute approximately one quarter of the molecular weight of GP, all situated at its exposed surface (counting ~74 kDa of polypeptide and on average 1.5 kDa per N-glycan). Although antibodies evidently mount a neutralizing response to infection or vaccination, the high variability of the exposed GP surface due to heterogeneous glycosylation must frustrate the overall binding efficiency of antibodies that accommodate glycans in their epitope and restrict good binders to only the core elements of glycans that are common between the countless variations of GP present on the surface of mature virions. From this perspective, the glycans contribute to immune evasion not only by sterically shielding neutralization epitopes, but also by blurring the molecular identity of the envelope glycoprotein.
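The two back-of-the-envelope estimates in this paragraph can be worked through explicitly. The sketch below uses only the figures quoted in the text (17 sites, about a dozen compositions per site, ~74 kDa of polypeptide, 1.5 kDa per N-glycan) and makes the simplifying assumptions, introduced here purely for illustration, that sites vary independently and that each carries exactly 12 compositions. The function names are hypothetical.

```python
def n_glycoforms(n_sites, compositions_per_site):
    """Number of distinct N-glycoforms of one GP monomer, assuming independent sites."""
    return compositions_per_site ** n_sites

def glycan_mass_fraction(n_sites, glycan_kda, polypeptide_kda):
    """Fraction of the total molecular weight contributed by N-glycans."""
    glycan_total = n_sites * glycan_kda
    return glycan_total / (polypeptide_kda + glycan_total)

combos = n_glycoforms(n_sites=17, compositions_per_site=12)
fraction = glycan_mass_fraction(n_sites=17, glycan_kda=1.5, polypeptide_kda=74.0)

print(f"N-glycoforms per monomer: {combos:.2e}")
print(f"N-glycan share of mass:   {fraction:.1%}")
```

Under these assumptions the count exceeds 10^18 combinations per monomer, and the N-glycans account for roughly a quarter of the mass, in line with the estimate given above. O-linked glycans and the trimeric assembly would increase both the heterogeneity and the glycan share further.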

Glycoproteomics studies on other envelope glycoproteins from divergent virus species, such as HIV-1, Lassa virus, MERS-CoV, SARS-CoV-2 and the herpesviruses, all show a similar trend of heterogeneous complex glycosylation with unprocessed glycans at selected sites60,61,62,63,64,65,66. From studies on HIV-1 gp120 it has been shown that the lack of processing of certain N-glycans is caused by local crowding and reduced accessibility to processing enzymes, resulting in enrichment of oligomannose glycans67,68,69,70. Neither the glycan at N257 nor that at N563 in ebolavirus GP fits this description, but both have their first N-linked GlcNAc residue partially buried in interactions with surrounding side chains. An intriguing possibility is that these interactions at the base of the glycan limit its conformational degrees of freedom and thereby negatively impact processing at the antennae. Whereas N-linked glycosylation is universally known to play a role in the replication cycles of enveloped viruses, O-linked glycosylation is less well-studied and perhaps less common. Recent studies on a range of herpesvirus glycoproteins, SARS-CoV-2 Spike, the attachment proteins G of paramyxoviruses, and hepatitis C virus E2 point to a role in envelope glycoprotein processing, trafficking, host cell attachment, and immune evasion60,61,71,72,73,74. The observed glycosylation patterns of ebolavirus GP discussed here relate to human HEK293 and insect S2 cells. These cell types are widely used for the production of recombinant GP, VLPs, pseudoviruses, and authentic virus particles, so our results relate to much of the available literature and ongoing research on ebolavirus. How this relates to the glycosylation patterns of infected cells and tissues during ebolavirus disease in humans remains to be studied. Early during infection, ebolavirus infects primarily macrophages, Kupffer cells and dendritic cells, but subsequently spreads systemically and infects many different cell types75. The N-glycans presented on human macrophages and dendritic cells show broad overlap in composition with the N-linked glycans we detected in recombinant ebolavirus GP from HEK293 cells, where it should be noted that the N-glycans of macrophages are enriched in paucimannose and oligomannose structures, while an increased number of LacNAc repeats has been observed in mature dendritic cells76,77.

All three types of glycosylation observed in ebolavirus GP (i.e. N-, O- and C-linked) will alter its antigenic surface. The overview provided in Fig. 6 illustrates how the site-specific glycosylation observed in our experiments contributes to and restricts the epitopes of currently known neutralizing antibodies. Whereas bulkier types of glycans at specific sites may indeed modulate the binding affinity of the indicated antibodies, the presented overview could be read as showing quite the opposite of glycan shielding in ebolavirus GP. That reading would reflect a kind of survivorship bias, as these monoclonal antibodies were selected for binding and neutralization. The effective shielding would perhaps be better illustrated by the antibodies that do not bind or neutralize ebolavirus infection because of steric clashes with glycans. Several studies have illustrated this by showing greatly enhanced sensitivity of (pseudo) ebolavirus to serum neutralization after glycan removal by mutagenesis31,32. Lenneman and colleagues showed in two separate studies that removal of glycans from the GP1 subunit and at N563 results in a 5–10 fold increase in sensitivity to neutralization of pseudoviruses by whole IgG purified from the serum of immunized or convalescent cynomolgus macaques. Clark and colleagues reported that mice immunized with VLPs derived from insect cells yield higher antibody titers in serum, compared to immunization with VLPs derived from human cells, presumably due to the restricted shielding of the smaller paucimannose glycans of insect cells, compared to the extended complex structures in human cells78. In addition, Wec and colleagues showed how removal of N-linked glycosylation at N563 changes pseudovirus neutralization by a panel of monoclonal antibodies79. They demonstrated that removal of the N563 glycan improved the neutralization by some antibodies, providing direct evidence for an epitope shielding effect. However, other antibodies showed loss of neutralization after glycan removal at N563, stressing that glycans can also be integral parts of neutralization epitopes and that the term ‘shielding’ provides only a limited analogy to the full range of effects that glycans have on the antigenic landscape of viral glycoproteins.

The shielding effect of N-glycans in ebolavirus GP is relevant not just to antibody binding, but also to proteolytic processing in relation to cell entry and receptor binding. Ebolaviruses rely on Cathepsin L/B cleavage following uptake into the host cell by macropinocytosis80. Cleavage by cathepsins removes the glycan cap and MLD to expose the receptor-binding domain and prime GP for NPC1 binding to trigger membrane fusion and subsequent host cell entry. Lenneman and colleagues demonstrated that removal of the N-linked glycans from the GP1 subunit renders GP more sensitive to proteolysis by exogenous thermolysin and makes host cell entry independent from cathepsin cleavage34. By shielding GP from premature proteolysis, glycosylation steers processing of GP towards cathepsins at the appropriate phase of the ebolavirus replication cycle.

Future studies may shed light on how the heterogeneous glycan composition of GP may modulate antibody binding or proteolytic processing during host cell entry. Similarly, the close juxtaposition of N- and O-linked glycans raises the intriguing possibility of an interplay between the two types of modifications that could be explored in future studies. Furthermore, the exact roles of C-mannosylation and O-linked glycans outside the MLD also remain to be investigated. Nevertheless, we have presented a detailed overview of ebolavirus GP glycosylation that may provide a useful framework for future immunological, functional and structural studies.

Methods

Ebolavirus GP sequence analysis

The indicated full-length GP reference sequences were downloaded from UniProt. The LLOV-GP sequence had to be reconstructed from the two separate GP1 and GP2 entries in UniProt. Sequence alignment was performed with ClustalX 2.181. The sequence IDs and resulting alignment are provided in the Supplementary Information. The cladogram was generated with FigTree (version 1.4.4). N-linked glycosylation sites were predicted by identifying all NXS/T sequences with NetNGlyc-1.0. C-mannosylation sites were predicted by identifying all WXXW sequences by manual inspection.
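
The motif scans described above can be illustrated with a minimal sketch. It is a plain regular-expression stand-in, not the NetNGlyc-1.0 predictor or the manual inspection used here. Excluding proline at the X position of the N-X-S/T sequon is the conventional rule and is an assumption of this sketch; the function name and the toy sequence are hypothetical.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro (assumed), then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") be reported separately.
N_SEQUON = re.compile(r"(?=(N[^P][ST]))")
# W-X-X-W motif used to flag candidate C-mannosylation sites.
C_MANNOSYLATION = re.compile(r"(?=(W..W))")


def find_motifs(sequence, pattern):
    """Return (1-based position, matched motif) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence, not an ebolavirus GP sequence.
toy = "MANKTAWSPWGNPSANNTS"
print(find_motifs(toy, N_SEQUON))         # NPS at position 12 is skipped
print(find_motifs(toy, C_MANNOSYLATION))
```

Positions are reported relative to the start of the input string, so numbering matches the reference sequence only if the full-length precursor is supplied.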

Pseudomodel building of glycosylated EBOV GP

A homology model of EBOV GP (strain Mayinga ‘76) was generated with SWISS-MODEL to fill in missing loops using PDB ID 5JQ3 as a template45,82. The core pentasaccharides were added to GP1 and GP2 subunits separately with GLYCAM Glycoprotein Builder (GLYCAM Web, Woods group 2021). The full trimer was reconstructed by alignment with the biological assembly of 5JQ3. The loop containing N204 was manually removed because it produced clashes with neighboring subunits in the full trimer. The figures were generated with ChimeraX 1.2.583.

Glycan-containing epitopes of monoclonal antibodies

Structures of monoclonal antibodies in complex with ebola virus GP were retrieved from the PDB (with PDB IDs: 2Y6S, 3CSY, 3S88, 5FHC, 5KEL, 5KEM, 5KEN, 6EA7, 6N7J, 6PCI, 6QD7, 6QD8, 6S8D, 7KEJ, 7KEW, 7KEX, 7KFE, 7KF9 and 7KFB)46,47,48,49,51,52,53,54,55,56,84. The structures were aligned with the glycosylated EBOV GP pseudomodel described above, using the MatchMaker function of ChimeraX 1.2.5, and glycans within 6 Å of the CDRs or framework regions of the modelled antibodies were included in the overview.
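
The 6 Å distance criterion can be sketched as follows. This is an illustration only, not the ChimeraX workflow: it assumes that coordinates have already been superposed and extracted as plain tuples, and it counts a glycan as included if any of its atoms lies at or below the cutoff from any antibody atom. The function name and all coordinates are hypothetical.

```python
import math


def glycans_near_antibody(glycan_atoms, antibody_atoms, cutoff=6.0):
    """Return sorted glycan site labels with any atom within `cutoff` (in Å).

    glycan_atoms: iterable of (site_label, (x, y, z)) tuples.
    antibody_atoms: iterable of (x, y, z) tuples.
    """
    antibody_atoms = list(antibody_atoms)
    hits = set()
    for site, xyz in glycan_atoms:
        if site in hits:
            continue  # this site is already counted as contacting
        if any(math.dist(xyz, ab) <= cutoff for ab in antibody_atoms):
            hits.add(site)
    return sorted(hits)


# Hypothetical coordinates in Å, for illustration only.
glycans = [
    ("N563", (0.0, 0.0, 0.0)),
    ("N563", (1.5, 0.0, 0.0)),
    ("N257", (30.0, 0.0, 0.0)),
]
antibody = [(5.0, 0.0, 0.0), (12.0, 3.0, 0.0)]
print(glycans_near_antibody(glycans, antibody))
```

The all-against-all comparison is quadratic in the number of atoms, which is adequate for a single complex; a spatial index would be preferable for larger sets.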

Modelling of C-mannosylated W288

We identified 12 isomorphous published crystal structures of HEK293-derived EBOV GP samples in the PDB (PDB IDs: 6F6N, 6F6I, 6F54, 6NAE, 5JQB, 5JQ7, 5JQ3, 6G9B, 6G9I, 6G95, 6HRO and 6HS4)41,42,43,44,45. To obtain higher signal-to-noise ratios from these maps, the 2Fo-Fc and Fo-Fc difference maps were averaged using COOT85. This averaged map showed clear ring-shaped electron density next to W288 in the Fo-Fc difference map at a contour level of 2.7 root mean square deviation. A C-mannosyl group was modelled next to W288 using the EBOV GP structure 6HS4 as a template in COOT. To accommodate realistic geometry, the tryptophan had to be repositioned slightly, albeit still in agreement with the local electron density. Care was taken to model the mannose with a ring-flipped 1C4 chair conformation86,87,88.
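
The rationale for map averaging can be illustrated with a toy calculation. The sketch below is not the COOT implementation: it assumes maps sampled on an identical grid and stored as flat lists, and it defines the contour threshold as the stated level multiplied by the root mean square deviation of the map. All names and values are hypothetical.

```python
import math


def average_maps(maps):
    """Voxel-wise mean of several maps sampled on the same grid."""
    n_voxels = len(maps[0])
    if any(len(m) != n_voxels for m in maps):
        raise ValueError("all maps must share the same grid")
    return [sum(values) / len(maps) for values in zip(*maps)]


def rms_deviation(density):
    """Root mean square deviation of the density from its mean."""
    mean = sum(density) / len(density)
    return math.sqrt(sum((v - mean) ** 2 for v in density) / len(density))


def voxels_above(density, level):
    """Indices of voxels at or above `level` times the map RMSD (assumed)."""
    threshold = level * rms_deviation(density)
    return [i for i, v in enumerate(density) if v >= threshold]


# Hypothetical difference-map values: voxel 2 carries a consistent signal,
# while the other voxels carry noise of varying sign.
maps = [
    [0.5, -0.4, 2.0, 0.3, -0.6, 0.2, -0.3, 0.3],
    [-0.6, 0.5, 2.2, -0.2, 0.4, -0.3, 0.2, -0.2],
    [0.1, -0.1, 1.8, -0.1, 0.2, 0.1, 0.1, -0.1],
]
averaged = average_maps(maps)
print(voxels_above(averaged, 2.7))
```

Because uncorrelated noise partly cancels on averaging while a consistent feature does not, the feature stands out more clearly relative to the RMSD of the averaged map.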

Production and purification of ebola virus GP ectodomains

Constructs for recombinant, soluble EBOV (strain Mayinga 1976) and BDBV GP (strain Uganda 2007) include residues 33 to 637, preceded by a BiP leader sequence which is removed in processing. The transmembrane region is deleted and replaced with a C-terminal double-Strep tag. EBOV and BDBV GP were produced by both transient transfection of HEK293T cells and stable transfection of Drosophila melanogaster S2 cells. Lipofectamine 3000 (Invitrogen) was used to transfect HEK293T cells, and Effectene (Qiagen) was used to produce stable S2 cells with a modified pMT-puro vector plasmid containing the GP gene of interest and stable selection of transfected cells with 6 µg/mL puromycin as described89. HEK293T cells were grown at 37 °C with 5% CO2 in DMEM medium (Gibco) supplemented with 10% FBS in T75 flasks and expanded into 10-stack flasks (Corning) for transfection as described90. S2 cells were selected at 27 °C in complete Schneider’s medium and then transferred to Insect Xpress medium (Lonza) for large-scale expression in 2-liter Erlenmeyer flasks. Secreted GP ectodomain expression was induced with 500 µM CuSO4, and the supernatant was harvested after 4 days. The C-terminal double Strep-tag was used to purify ebola virus GP on a Strep-trap HP 5 mL column (GE), followed by Superdex 200 size exclusion chromatography (SEC) in 25 mM Tris-buffered saline (Tris-HCl, pH 7.5, 150 mM NaCl [TBS]).

Production and purification of ebolavirus-like particles

EBOV and BDBV virus-like particles were produced by transfecting HEK293T cells. Polyethylenimine (PEI) was used to transfect HEK293T cells with a modified phCMV plasmid containing the full-length GP gene of interest and a modified pTriEx plasmid containing the full-length EBOV VP40 gene at a 2:5 ratio (w:w), respectively. The VLP supernatant was clarified by centrifugation after 48 h. The clarified supernatant was further purified by ultracentrifugation through a 20% sucrose cushion at 106,800 x g for 3 h. The cushion and supernatant were carefully decanted and the pellet was washed twice with sterile PBS. Following the wash, the pellet was incubated overnight in 0.75 mL of PBS and resuspended.

Glycoproteomics sample preparation

For N-linked glycan analysis, the recombinant GP was denatured at 95 °C in a final concentration of 2% sodium deoxycholate (SDC), 200 mM Tris/HCl, 10 mM tris(2-carboxyethyl)phosphine, pH 8.0 for 10 min, followed by reduction at 37 °C for 30 min. Samples were next alkylated by adding 40 mM iodoacetamide and incubated in the dark at room temperature for 45 min. For each protease digestion, 3 μg of recombinant GP was used. Samples were split in three for parallel digestion with trypsin (Promega), alpha lytic protease (aLP; Sigma), and gluC (Sigma) followed by trypsin. For each protease digestion, 18 μL of the denatured, reduced, and alkylated samples was diluted in a total volume of 100 μL 50 mM ammonium bicarbonate, adding proteases in a 1:15 ratio (w:w) for incubation overnight at 37 °C. For the gluC-trypsin digestion, gluC was added first for two hours, followed by incubation with trypsin overnight. After overnight digestion, SDC was removed through precipitation by adding 2 μL formic acid (FA) and centrifugation at 14,000 x g for 20 min. Following centrifugation, the supernatant containing the peptides was collected for desalting on a 30 µm Oasis HLB 96-well plate (Waters). The Oasis HLB sorbent was activated with 100% acetonitrile and subsequently equilibrated with 10% formic acid in water. Next, peptides were bound to the sorbent, washed twice with 10% formic acid in water and eluted with 100 µL of 50% acetonitrile/5% formic acid in water (v/v). The eluted peptides were vacuum-dried and resuspended in 100 µL of 2% formic acid in water. For O-linked glycan analysis, the recombinant GP was first treated with PNGase F (Sigma) to remove N-glycans. 4 μL PNGase F was added to the sample in PBS and incubated at 37 °C overnight. Following N-glycan removal, GPs were digested following the same protocol as for N-linked glycan analysis, using parallel digestion with trypsin and aLP. Both N- and O-linked analyses were performed in duplicate.
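
As a worked example of the protease amounts implied above, the following sketch computes the protease mass for a given weight ratio. It assumes that the 1:15 ratio refers to protease:GP by weight; the function name is hypothetical.

```python
def protease_mass_ug(substrate_ug, ratio_protease=1.0, ratio_substrate=15.0):
    """Protease mass (µg) for a protease:substrate weight ratio (assumed order)."""
    if ratio_substrate <= 0:
        raise ValueError("ratio_substrate must be positive")
    return substrate_ug * ratio_protease / ratio_substrate


# 3 µg of GP per digestion at a 1:15 (w:w) ratio.
print(protease_mass_ug(3.0))
```

Under this assumption, each 3 μg digestion corresponds to 0.2 μg of protease.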

Glycoproteomics LC-MS/MS measurements

For each sample and protease digestion, approximately 0.15 μg of peptides were run by online reversed phase chromatography on an Agilent 1290 UHPLC or Dionex UltiMate 3000 (Thermo Fisher Scientific) coupled to a Thermo Scientific Orbitrap Fusion mass spectrometer. A Poroshell 120 EC C18 (50 cm × 75 µm, 2.7 µm, Agilent Technologies) analytical column and a ReproSil-Pur C18 (2 cm × 100 µm, 3 µm, Dr. Maisch) trap column were used for peptide separation. The duplicate samples were analyzed with two different mass spectrometry methods, using identical LC-MS parameters and distinct fragmentation schemes. In one method, peptides were subjected to Electron Transfer/Higher-Energy Collision Dissociation (EThcD) fragmentation. In the other method, all precursors were subjected to higher-energy collision dissociation (HCD) fragmentation, with additional EThcD fragmentation triggered by the presence of glycan reporter oxonium ions. A 90-min LC gradient from 0% to 44% acetonitrile was used to separate peptides at a flow rate of 300 nl/min. Data was acquired in data-dependent mode. Orbitrap Fusion parameters for the full scan MS spectra were as follows: a standard AGC target at 60,000 resolution, scan range 350–2000 m/z, Orbitrap maximum injection time 50 ms. The ten most intense ions (2+ to 8+ ions) were subjected to fragmentation. For the EThcD fragmentation scheme, the supplemental higher energy collision dissociation energy was set at 27%. MS2 spectra were acquired at a resolution of 30,000 with an AGC target of 800%, maximum injection time 250 ms, scan range 120–4000 m/z and dynamic exclusion of 16 s. For the triggered HCD-EThcD method, the LC gradient and MS1 scan parameters were identical. The ten most intense ions (2+ to 8+) were subjected to HCD fragmentation with 30% normalized collision energy from 120–4000 m/z at 30,000 resolution with an AGC target of 100% and a dynamic exclusion window of 16 s.
Scans containing any of the following oxonium ions within 20 ppm were followed up with additional EThcD fragmentation with 27% supplemental HCD fragmentation. The triggering reporter ions were: Hex(1) (129.039; 145.0495; 163.0601), PHex(1) (243.0264; 405.0793), HexNAc(1) (138.055; 168.0655; 186.0761), Neu5Ac(1) (274.0921; 292.1027), Hex(1)HexNAc(1) (366.1395), HexNAc(2) (407.166), dHex(1)Hex(1)HexNAc(1) (512.1974), and Hex(1)HexNAc(1)Neu5Ac(1) (657.2349). EThcD spectra were acquired at a resolution of 30,000 with a normalized AGC target of 400%, maximum injection time 250 ms, and scan range 120–4000 m/z.
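
The mass-tolerance test behind the oxonium ion trigger can be sketched as below. The m/z values are those listed above. The matching logic is a simplified illustration, not the instrument's acquisition software: it ignores peak intensity and any other trigger conditions, and the function names and example peak list are hypothetical.

```python
# Reporter ions (m/z) as listed in the text, keyed by glycan composition.
OXONIUM_IONS = {
    "Hex(1)": (129.039, 145.0495, 163.0601),
    "PHex(1)": (243.0264, 405.0793),
    "HexNAc(1)": (138.055, 168.0655, 186.0761),
    "Neu5Ac(1)": (274.0921, 292.1027),
    "Hex(1)HexNAc(1)": (366.1395,),
    "HexNAc(2)": (407.166,),
    "dHex(1)Hex(1)HexNAc(1)": (512.1974,),
    "Hex(1)HexNAc(1)Neu5Ac(1)": (657.2349,),
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def triggering_ions(peaks_mz, tolerance_ppm=20.0):
    """Return reporter labels matched by any peak within the ppm tolerance."""
    matched = []
    for label, masses in OXONIUM_IONS.items():
        if any(
            abs(ppm_error(mz, ref)) <= tolerance_ppm
            for mz in peaks_mz
            for ref in masses
        ):
            matched.append(label)
    return matched


# Hypothetical HCD peak list (m/z values only).
peaks = [204.0870, 366.1400, 500.2500]
print(triggering_ions(peaks))
```

In this sketch a scan would qualify for follow-up EThcD fragmentation whenever the returned list is non-empty.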

Glycoproteomics data analysis

The acquired data was analyzed using Byonic (v3.9.6 and v4.491) against a custom database of recombinant ebola virus GP protein sequences and the proteases used in the experiment, searching for glycan modifications with 12/24 ppm search windows for MS1/MS2, respectively. Up to ten missed cleavages were permitted using C-terminal cleavage at R/K for trypsin, R/K/E/D for gluC-trypsin, or T/A/S/V for alpha lytic protease. For N-linked analysis, carbamidomethylation of cysteine was set as fixed modification, oxidation of methionine/tryptophan as variable common 1, and hexose on tryptophan as variable rare 1. N-glycan modifications were set as variable common 2, allowing up to 2 variable common and 1 rare modification per peptide. All N-linked glycan databases from Byonic were merged into a single non-redundant list to be included in the database search. All reported glycopeptides in the Byonic result files (with scores ≥ 200) were manually inspected for quality of fragment assignments. All glycopeptide identifications from both EThcD and HCDpdEThcD runs were merged into a single non-redundant list per sequon. Glycans were classified based on HexNAc content as truncated (≤ 2 HexNAc; < 3 Hex), paucimannose (2 HexNAc, 3 Hex), oligomannose (2 HexNAc; > 3 Hex), hybrid (3 HexNAc) or complex (> 3 HexNAc). Byonic search results were exported to mzIdentML format to build a spectral library in Skyline (v20.1.0.3192) and extract peak areas for individual glycoforms from MS1 scans. The full database of variable N-linked glycan modifications from Byonic was manually added to the Skyline project file in XML format. Reported peak areas were pooled based on the number of HexNAc, Fuc or NeuAc residues to distinguish truncated, paucimannose, oligomannose, hybrid, and complex glycosylation, or the degree of fucosylation and sialylation, respectively.
For O-linked analysis, all the same protease digestion parameters and peptide modifications were used, with the addition of deamidation at asparagine/glutamine as variable rare 1. O-glycan modifications were set as variable common 6, allowing a maximum of 6 variable common and 2 rare modifications per peptide.
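
The N-glycan classification and peak-area pooling described above can be expressed as a short sketch. The class boundaries follow the HexNAc/Hex rules stated in the text. Returning "unclassified" for compositions outside those rules and normalizing pooled areas to fractions per site are assumptions of this sketch, and the function names and example values are hypothetical.

```python
from collections import defaultdict


def classify_n_glycan(hexnac, hexose):
    """Classify an N-glycan composition by HexNAc and Hex counts."""
    if hexnac > 3:
        return "complex"
    if hexnac == 3:
        return "hybrid"
    if hexnac == 2 and hexose > 3:
        return "oligomannose"
    if hexnac == 2 and hexose == 3:
        return "paucimannose"
    if hexnac <= 2 and hexose < 3:
        return "truncated"
    return "unclassified"  # not covered by the stated rules (assumption)


def pool_peak_areas(glycoforms):
    """Pool peak areas per glycan class and return the fraction of each class.

    glycoforms: iterable of (hexnac, hexose, peak_area) tuples for one site.
    """
    totals = defaultdict(float)
    for hexnac, hexose, area in glycoforms:
        totals[classify_n_glycan(hexnac, hexose)] += area
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cls: area / grand_total for cls, area in totals.items()}


# Hypothetical peak areas for a single glycosylation site.
site = [(2, 5, 40.0), (2, 9, 10.0), (4, 5, 30.0), (3, 6, 15.0), (2, 3, 5.0)]
print(pool_peak_areas(site))
```

The same pooling pattern would apply to fucosylation or sialylation by grouping on the number of Fuc or NeuAc residues instead of the glycan class.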

Statistics and reproducibility

All reported values represent the average of duplicate experiments.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.