Dynamic interplay between catalytic and lectin domains of GalNAc-transferases modulates protein O-glycosylation

Protein O-glycosylation is controlled by polypeptide GalNAc-transferases (GalNAc-Ts) that uniquely feature both a catalytic and lectin domain. The underlying molecular basis of how the lectin domains of GalNAc-Ts contribute to glycopeptide specificity and catalysis remains unclear. Here we present the first crystal structures of complexes of GalNAc-T2 with glycopeptides that together with enhanced sampling molecular dynamics simulations demonstrate a cooperative mechanism by which the lectin domain enables free acceptor sites binding of glycopeptides into the catalytic domain. Atomic force microscopy and small-angle X-ray scattering experiments further reveal a dynamic conformational landscape of GalNAc-T2 and a prominent role of compact structures that are both required for efficient catalysis. Our model indicates that the activity profile of GalNAc-T2 is dictated by conformational heterogeneity and relies on a flexible linker located between the catalytic and the lectin domains. Our results also shed light on how GalNAc-Ts generate dense decoration of proteins with O-glycans.

In this plot the flexible linker is fixed to its crystallographic structure. This modification leads to a ≈5,000-fold decrease in activity with respect to that reported in Fig. 5c and to changes in the glycosylation profile of the enzyme.
(d) SAXS curves at 1, 2.5 and 10 mg/ml were also measured and analysed. They are omitted here for simplicity.
(f) Maximum intramolecular distance, Dmax, determined from the pair-wise distance distribution, p(r), with the program GNOM.
(g) Estimated molecular weight of the particles in solution based on Porod's volume computed with PRIMUS divided by 1.6. Theoretical MW of GalNAc-T2 is 56.7 kDa.
(h) Radius of gyration, Rg, derived from the p(r) computed with the program GNOM.

Synthesis of peptides and glycopeptides
Our choice of glycopeptides in this work was inspired by earlier work in which the glycosylation profile of GalNAc-T2 differed depending on whether naked MUC5AC or glycopeptides such as MUC5AC-13, MUC5AC-3 and MUC5AC-3-13 were used as initial substrates (Supplementary Table 1).

Classical molecular dynamics simulations
MD simulations of the enzyme were performed with the Amber11 software package.
The protein was modeled with the FF99SB force field, whereas all carbohydrate molecules were modeled with the GLYCAM06 force field 3. The MD simulations were carried out in several steps. First, the system was minimized while keeping the protein, peptide and substrate molecules fixed. Second, the entire system was allowed to relax. Weak spatial constraints were initially applied to the protein and substrates while the system was gradually brought to the desired temperature of 300 K, and the rest of the system was allowed to move freely. The constraints were subsequently removed and the system was subjected to 100 ps of constant-pressure MD simulation to adjust the density of the water environment. Afterwards, 100 ns of constant-volume MD simulation were performed. During this time, the peptide remained solvated without approaching the active site of the enzyme. As none of the three possible glycosylation positions of the peptide approached the active site within the 100 ns window, this process was subsequently activated using the metadynamics algorithm 4.

Classical metadynamics simulations of substrate binding
A snapshot of the equilibrium MD simulation was taken for the metadynamics simulations, which were performed with the NAMD 2.9 software. As a first approach to [...]

[...] for radiation damage that were subsequently averaged (no systematic radiation effects were observed). Buffer scattering profiles were subtracted from the sample ones using standard procedures 8. The values of the forward scattering, I(0), and radii of gyration, Rg, were calculated from the experimental SAXS patterns using Guinier's approximation (Supplementary Fig. 4 and 5).
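The Guinier analysis mentioned above can be illustrated with a minimal sketch. It assumes the standard Guinier relation ln I(q) = ln I(0) - (Rg^2 / 3) q^2, which holds only at small q (a common rule of thumb is q·Rg below about 1.3). The function name `guinier_fit`, the cut-off argument `q_max` and the synthetic data are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def guinier_fit(q, intensity, q_max):
    """Estimate I(0) and Rg from a straight-line fit of ln I versus q^2.

    Only points with q <= q_max are used; the caller is responsible for
    choosing q_max inside the Guinier regime (hypothetical helper).
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    selected = q <= q_max
    # ln I(q) = ln I(0) - (Rg^2 / 3) * q^2  ->  slope = -Rg^2 / 3
    slope, intercept = np.polyfit(q[selected] ** 2, np.log(intensity[selected]), 1)
    return np.exp(intercept), np.sqrt(-3.0 * slope)


# Synthetic, noise-free example (values chosen only for illustration).
q_values = np.linspace(0.005, 0.05, 20)          # 1/Å
true_i0, true_rg = 100.0, 25.0                   # arbitrary units, Å
synthetic = true_i0 * np.exp(-(q_values * true_rg) ** 2 / 3.0)
i0_est, rg_est = guinier_fit(q_values, synthetic, q_max=0.05)
```

With real data the fit range matters: points at very low q can be affected by aggregation, and points beyond the Guinier regime bias Rg downwards.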

Theoretical modeling of the SAXS results
To reproduce the SAXS distribution of the radius of gyration, we consider the crystal structure of the GalNAc-T2-UDP-MUC5AC-13 complex and define the linker as residues 437-449, with end points A and B corresponding to the alpha-carbon atoms of residues 437 and 449, respectively (Fig. 5a).
To generate the model, we made the following assumptions: 1) The catalytic (residues 75-436) and lectin (residues 449-569) domains are rigid bodies. For the sake of simplicity, and to avoid introducing three more variables corresponding to the Euler angles of rotation, their relative orientation is fixed to that found in the crystal structure of the GalNAc-T2-UDP-MUC5AC-13 complex.
2) The linker residues are "phantoms", meaning that they neither contribute to the radius of gyration nor interact with each other or with the rest of the protein.
3) The linker is described as a Worm-Like Chain (WLC) 11, a classical model of polymer physics for semi-flexible polymers. In the WLC a polymer is represented as a continuous curve with only an elastic energy that opposes bending. At any temperature, its equilibrium configurations result from a trade-off between the elastic energy, which favours stretched conformations, and entropy, which favours loopy ones. Thus the behavior of the WLC can be characterized by specifying just the contour length $l_c$ and the persistence length $l_p$. The latter is a measure of the rigidity of the curve: roughly, two portions of the chain that are more than $2 l_p$ apart can have any relative orientation, while below that separation their orientations are correlated.

4) In the coarse-grained spirit of the model, we describe the interaction between the two domains by simply adding a sigmoidal potential $U(r)$ that depends on the distance between the ends of the linker, $r = |\mathbf{r}| = |\mathbf{B} - \mathbf{A}|$ (Fig. 5a). This potential is $-\varepsilon$ at $r = 0$; it halves at $r = r_0$ and then goes to 0 for increasing $r$; the smaller $\sigma_r$, the sharper the transition. Here, $r_0$ and $\sigma_r$ are regarded as adjustable parameters. We verified that the results, at the qualitative level, do not depend strongly on the choice of the potential.

5) The WLC is an isotropic model: setting one end of the polymer at the origin of the reference frame, there is no intrinsic angular preference for the position of the other end, which will be spherically distributed around the origin. We limit the angular freedom in our model by restricting the second end of the linker to a cone "in front of" the catalytic domain (in a sense specified below), thus preventing configurations in which the lectin domain is situated opposite to the catalytic one (Fig. 5a).
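Two ingredients of the assumptions above can be sketched numerically. The first function evaluates the standard Kratky-Porod result for the mean-square end-to-end distance of a WLC, which interpolates between a rigid rod ($l_p \gg l_c$) and a flexible coil ($l_c \gg l_p$). The second is one possible sigmoidal well, assuming a logistic form; the source does not give the exact expression here, so the functional form and the parameter names `depth`, `r_half` and `width` are assumptions of this sketch.

```python
import numpy as np


def wlc_mean_square_end_to_end(l_c, l_p):
    """Kratky-Porod result: <R^2> = 2 l_p l_c [1 - (l_p / l_c)(1 - exp(-l_c / l_p))]."""
    return 2.0 * l_p * l_c * (1.0 - (l_p / l_c) * (1.0 - np.exp(-l_c / l_p)))


def sigmoid_potential(r, depth, r_half, width):
    """Logistic well (assumed form, not the authors' expression).

    It is close to -depth at r = 0 when r_half >> width, equals -depth / 2
    at r = r_half, and tends to 0 for large r; a smaller width gives a
    sharper transition.
    """
    return -depth / (1.0 + np.exp((r - r_half) / width))


# Illustrative values only: lengths in Å, energy in arbitrary units.
distances = np.linspace(0.0, 40.0, 9)
well = sigmoid_potential(distances, depth=2.0, r_half=20.0, width=2.0)
rod_like = wlc_mean_square_end_to_end(l_c=10.0, l_p=1.0e4)    # approaches l_c^2
coil_like = wlc_mean_square_end_to_end(l_c=1000.0, l_p=1.0)   # approaches 2 l_p l_c
```

The two limits are a quick sanity check on any chosen pair of contour and persistence lengths before using them in a fit.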
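In this framework, the radius of gyration $R_g(X)$ of any protein conformation $X$ becomes a function of the distance vector $\mathbf{r}$. Indeed, denoting by $\mathbf{x}_i$ the position of atom $i$ of the protein in conformation $X$, we can write the following equation, which neglects the contributions from the linker residues:

$$R_g^2(X) = \frac{1}{N}\left[\sum_{i=i_1}^{i_A-1} |\mathbf{x}_i - \mathbf{x}_{cm}|^2 + \sum_{i=i_B+1}^{i_N} |\mathbf{x}_i - \mathbf{x}_{cm}|^2\right],$$

where $i_1$, $i_A$, $i_B$, $i_N$ are the indices of the first atom of the protein, first atom of the linker, last atom of the linker, and last atom of the protein, respectively, $N = N_c + N_l$ is the number of atoms in the catalytic ($N_c$) and lectin ($N_l$) domains, $\mathbf{x}_{cm}$ is the geometric centre of these atoms, and

$$\mathbf{x}'_i = \mathbf{x}_i - \mathbf{A} \quad (i < i_A), \qquad \mathbf{x}'_i = \mathbf{x}_i - \mathbf{B} \quad (i > i_B).$$

Substituting the above equations in the expression for the radius of gyration, we can relate the latter to the end-to-end vector of the linker, $\mathbf{r} = \mathbf{B} - \mathbf{A}$, as

$$R_g^2(\mathbf{r}) = R_0^2 + \frac{N_c N_l}{N^2}\, |\mathbf{r} + \mathbf{c}'_l - \mathbf{c}'_c|^2,$$

where $\mathbf{c}'_c$ and $\mathbf{c}'_l$ are the geometric centres of the primed coordinates of the catalytic and lectin domains, and $R_0^2 = (N_c R_{g,c}^2 + N_l R_{g,l}^2)/N$ collects the radii of gyration of the two isolated domains. Because the domains are rigid and their relative orientation is fixed, everything except $\mathbf{r}$ in this expression is independent of $\mathbf{r}$. The above expression for the radius of gyration allows us to relate the probability distribution of the latter, as revealed by SAXS, to the probability density of the end-to-end vector of the linker. The latter is given by

$$P(\mathbf{r}) \propto P_{\mathrm{WLC}}(r)\, e^{-U(r)/k_B T}\, f(\theta, \phi),$$

where $P_{\mathrm{WLC}}(r)$ is the distribution for the end-to-end distance in the WLC as reported in Eq. 21 by Becker et al. 12, the exponential represents the Boltzmann factor due to the interaction potential $U(r)$ of item 4, and $f(\theta, \phi)$ is the angular probability distribution that we add to the isotropic WLC as discussed in item 5 above.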
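The dependence of the radius of gyration on the linker end-to-end vector can be checked numerically. The sketch below assumes rigid domains with fixed relative orientation and no linker contribution (assumptions 1 and 2), plus equal weights for all atoms, which is an assumption of this example. Under those conditions the squared radius of gyration of two rigid point sets splits into an internal term and a term that depends only on the separation of their centres. The function names and the random coordinates are hypothetical.

```python
import numpy as np


def radius_of_gyration_sq(points):
    """Mean squared distance of the points from their geometric centre."""
    centred = points - points.mean(axis=0)
    return np.mean(np.sum(centred ** 2, axis=1))


def rg_sq_two_rigid_bodies(cat_rel, lec_rel, r):
    """Squared Rg of two rigid point sets joined by an end-to-end vector r.

    cat_rel: catalytic-domain coordinates relative to linker end A.
    lec_rel: lectin-domain coordinates relative to linker end B.
    r:       linker end-to-end vector B - A.
    """
    n_c, n_l = len(cat_rel), len(lec_rel)
    n = n_c + n_l
    internal = (n_c * radius_of_gyration_sq(cat_rel)
                + n_l * radius_of_gyration_sq(lec_rel)) / n
    separation = r + lec_rel.mean(axis=0) - cat_rel.mean(axis=0)
    return internal + n_c * n_l / n ** 2 * np.dot(separation, separation)


# Random stand-ins for the two domains (not real coordinates).
rng = np.random.default_rng(1)
catalytic = rng.normal(size=(50, 3))
lectin = 2.0 * rng.normal(size=(30, 3))
end_to_end = np.array([10.0, -3.0, 4.0])

# Direct evaluation with A placed at the origin, so B sits at end_to_end.
direct = radius_of_gyration_sq(np.vstack([catalytic, lectin + end_to_end]))
closed_form = rg_sq_two_rigid_bodies(catalytic, lectin, end_to_end)
```

The closed form makes the cost of evaluating the radius of gyration independent of the number of atoms once the two domain terms are precomputed.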
In the spirit of the coarse-grained model, and for the sake of simplifying the calculations, we use spherical coordinates, selecting [...] as the polar direction for the angle $\theta$ (we have verified that the angle between [...] and the [...] axis is reasonably small, around 18 degrees), and keep isotropy for rotations of an angle $\phi$ around the polar axis, so that [...], where $Z$ is the partition function, which ensures the proper normalization of the probability.
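The population of the radius of gyration observed in the experiments will be given by:

$$p(R_g) = \int d^3 r\; P(\mathbf{r})\, \delta\big(R_g - R_g(\mathbf{r})\big),$$

where $P(\mathbf{r})$ is the normalized probability density of the linker end-to-end vector and $R_g(\mathbf{r})$ is the radius of gyration of the corresponding conformation.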

Modeling the enzymatic activity
To the hypotheses above, we add another assumption: 6) To consider the enzymatic activity, we introduce a ligand peptide bound at $P_2$, again described as a WLC.
To this end, we consider points $P_1$ and $P_2$ (Fig. 5a), which are located in the active site and in the α subdomain of the lectin domain, respectively, and correspond to the positions of the alpha carbons of the acceptor and prior glycosylated sites.
We then let the protein domains and the ligand move freely, with the only constraint that the ligand is always bound to the lectin domain at $P_2$. The glycosylation reaction at $P_1$ takes place when the linker and ligand conformations are such that the acceptor site of the ligand is found at its "active site" position $P_1$ in the catalytic domain.
The rate at which the reaction takes place will depend on the dynamics of the ligand and protein, as well as on the distance (in residues) between the prior glycosylated and the acceptor sites of the ligand.
We thus estimate the enzymatic activity as a function of the distance between the prior glycosylated and the acceptor sites, using the product of the equilibrium probabilities for the linker and the ligand.
Upon introducing, as above, $\mathbf{p}'_1 = \mathbf{P}_1 - \mathbf{A}$, $\mathbf{p}'_2 = \mathbf{P}_2 - \mathbf{B}$, and remembering that the linker end-to-end vector is $\mathbf{r} = \mathbf{B} - \mathbf{A}$, we start by defining:

$$\mathbf{d}(\mathbf{r}) = \mathbf{P}_1 - \mathbf{P}_2 = \mathbf{p}'_1 - \mathbf{p}'_2 - \mathbf{r},$$

which is the "correct" distance vector between the prior glycosylated and the acceptor sites on the ligand for the reaction to take place; this vector depends on the protein conformation through the linker end-to-end vector $\mathbf{r}$. For a ligand peptide with the prior glycosylated and the acceptor sites separated by $l$ residues, the probability that the glycosylation residues are found in a volume $dV_s$ around a distance vector $\mathbf{s}$, when the protein linker end-to-end vector is found in a volume $dV_r$ around $\mathbf{r}$, is:

$$\rho(\mathbf{s}, \mathbf{r}, l)\, dV_s\, dV_r.$$

For the sake of simplicity, we assume that the density $\rho(\mathbf{s}, \mathbf{r}, l)$ is simply the product of the separate distributions for the linker and the ligand, as if they were statistically independent and the conformation of the former did not affect the variability of the latter. Moreover, we ignore any angular bias on $\mathbf{s}$, as if the ligand's free end were completely free to move in any direction, and assume a WLC behavior for the ligand, too.
This yields $\rho(\mathbf{s}, \mathbf{r}, l) = P_{\mathrm{WLC}}(\mathbf{s}; l, l'_p)\, P(\mathbf{r})$, where the latter factor is the equilibrium probability for the linker, and the former is the isotropic WLC expression for the ligand, assuming a persistence length $l'_p$ for the latter. Finally, the probability that, for a given contour length $l_c$ of the linker and a separation $l$ between the prior glycosylated and the acceptor sites in the ligand, the latter meet the correct sites on the protein, for any protein conformation, is found by imposing $\mathbf{s} = \mathbf{d}(\mathbf{r})$ and integrating over the linker conformations:

$$\mathcal{A}(l_c, l) = \int dV_r\; P_{\mathrm{WLC}}\big(\mathbf{d}(\mathbf{r}); l, l'_p\big)\, P(\mathbf{r}).$$

To dissect the role of the linker and the ligand flexibility in determining the activity, we consider the extreme case of the linker completely frozen in its crystallographic conformation. In this case, the above equation for $\mathcal{A}(l_c, l)$ is replaced by:

$$\mathcal{A}_{\mathrm{frozen}}(l) = P_{\mathrm{WLC}}\big(\mathbf{d}(\mathbf{r}_c); l, l'_p\big),$$

and depends on the separation $l$ of the prior glycosylated and the acceptor sites of the peptide (in the above equation $\mathbf{r}_c$ is the end-to-end vector of the linker in the crystal structure).
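The comparison between a flexible and a frozen linker can be sketched as a Monte Carlo average over linker conformations. This is not the authors' implementation. It makes several simplifying assumptions: the ligand is described by a Gaussian chain whose mean-square end-to-end distance is taken from the Kratky-Porod WLC formula, as a stand-in for the WLC distribution cited in the text. The linker conformations are supplied by the caller as samples. A contour length of 3.8 Å per residue is assumed. All function names, geometric offsets and numerical values are hypothetical.

```python
import numpy as np

RESIDUE_LENGTH = 3.8  # assumed contour length per residue, in Å


def wlc_mean_square(l_c, l_p):
    """Kratky-Porod mean-square end-to-end distance of a WLC."""
    return 2.0 * l_p * l_c * (1.0 - (l_p / l_c) * (1.0 - np.exp(-l_c / l_p)))


def gaussian_chain_density(vec, mean_sq):
    """Isotropic 3D Gaussian density with <d^2> = mean_sq (stand-in for the WLC)."""
    vec = np.asarray(vec, dtype=float)
    d2 = np.sum(vec ** 2, axis=-1)
    return (3.0 / (2.0 * np.pi * mean_sq)) ** 1.5 * np.exp(-1.5 * d2 / mean_sq)


def ligand_density(vec, n_res, l_p_lig):
    """Density for the vector joining two ligand sites n_res residues apart."""
    return gaussian_chain_density(vec, wlc_mean_square(n_res * RESIDUE_LENGTH, l_p_lig))


def activity_flexible(n_res, offset, linker_samples, l_p_lig):
    """Average of the ligand density at d(r) = offset - r over sampled linker vectors r."""
    targets = np.asarray(offset, dtype=float) - np.asarray(linker_samples, dtype=float)
    return ligand_density(targets, n_res, l_p_lig).mean()


def activity_frozen(n_res, offset, r_crystal, l_p_lig):
    """Same quantity with the linker fixed at a single end-to-end vector."""
    target = np.asarray(offset, dtype=float) - np.asarray(r_crystal, dtype=float)
    return ligand_density(target, n_res, l_p_lig)


# Hypothetical geometry in Å; offset plays the role of p1' - p2'.
rng = np.random.default_rng(0)
offset = np.array([30.0, 0.0, 0.0])
r_crystal = np.array([20.0, 5.0, 0.0])
samples = r_crystal + rng.normal(scale=5.0, size=(10000, 3))
profile = [(n, activity_flexible(n, offset, samples, 4.0),
            activity_frozen(n, offset, r_crystal, 4.0)) for n in (3, 10, 20)]
```

In a fuller treatment the linker samples would be drawn from the model density that includes the interaction potential and the cone restriction; here they are simple Gaussian perturbations around an arbitrary vector, used only to show the structure of the calculation.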