To the Editor:

The importance of carbohydrates both to fundamental cellular biology and as integral parts of therapeutics (including antibodies) continues to grow. The presence of the correct glycans is important for the beneficial effects of therapeutic glycoproteins and is likely to be increasingly required by regulatory agencies. However, carbohydrates (and other small molecules) are handled poorly in macromolecular structural biology. When such small molecules are present in macromolecule structures, they are often reported with stereo- and regiochemical errors and in unlikely conformations. Stereo- and regiochemistry should always be correct, and although conformational distortions may reflect interactions taking place in a complex1, most are likely to be erroneous, resulting from poor chemical understanding and a lack of appropriate stereochemical restraints in refinement, often against low-resolution data2.

Pyranoside sugars have clear conformational preferences dictated by a minimization of angle, torsional and trans-annular strains, resulting in a favored chair (C) conformation. While carbohydrate-active enzymes can force a distortion from this minimal-energy conformation to a higher-energy one in order to enable catalysis3, this is a specialized event; of all the sugars deposited in the PDB, 65% sit undisturbed on N-glycosylation trees.

Using the Privateer4 software, we computed the real-space correlation coefficient (RSCC) against positive omit electron density (as a bias-minimized fit to observed density) for all N-glycan–forming D-pyranosides from the PDB and calculated their conformation-determining Cremer-Pople puckering parameters5. This subgroup was chosen because all are expected to be in the preferred 4C1 chair conformation. A number of anomalies appeared from this analysis. First, 64% of all N-glycan D-pyranosides have RSCCs of <0.8, reflecting poor fit to the electron density. Indeed, 12% have an RSCC of <0.5. Reflecting comments by others6, it is clear that a number of models have been built incorrectly from the start: approximately 5% of the N-acetyl-D-glucosamine (GlcNAc) moieties attached to asparagine residues show an incorrect α-linkage, and some monosaccharides are then built 'upside down', resulting in a 1C4 conformation. In addition, there are more subtle errors reflecting incorrect refinement of the deposited sugar that could nevertheless have enormous implications when interpreted in a biological context. A plot of ring conformation against resolution of the X-ray analysis is shown in Figure 1. As expected, at atomic resolutions (<1.2 Å, model precision better than 0.01 Å), all sugars showing high RSCC are 4C1 chairs (yellow cluster). However, as the resolution gets lower and model precision poorer, unexpected higher-energy conformations start to appear. Most of these models also show low RSCC (<0.8; blue entries).
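
For readers unfamiliar with the puckering analysis, the Cremer-Pople parameters of a six-membered ring can be sketched in a few lines of Python. This is an illustrative sketch of the published general definition and not the Privateer implementation. The function name `cremer_pople_6`, the helper `ideal_chair` and its default radius are hypothetical, and the assumed ring-atom order for a pyranose is O5, C1, C2, C3, C4, C5.

```python
import math


def cremer_pople_6(ring):
    """Return (Q, theta_deg, phi_deg) for a six-membered ring.

    `ring` holds six (x, y, z) tuples in ring order; for a pyranose the
    order assumed here is O5, C1, C2, C3, C4, C5.
    """
    n = 6
    if len(ring) != n:
        raise ValueError("expected exactly six ring atoms")

    # Move the origin to the geometric centre of the ring.
    centre = [sum(atom[k] for atom in ring) / n for k in range(3)]
    pts = [[atom[k] - centre[k] for k in range(3)] for atom in ring]

    # Two auxiliary vectors; their cross product is the mean-plane normal.
    r1 = [sum(p[k] * math.sin(2 * math.pi * j / n) for j, p in enumerate(pts))
          for k in range(3)]
    r2 = [sum(p[k] * math.cos(2 * math.pi * j / n) for j, p in enumerate(pts))
          for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]

    # Out-of-plane displacement of each ring atom.
    z = [sum(p[k] * normal[k] for k in range(3)) for p in pts]

    # Puckering amplitudes q2 (with phase phi) and q3.
    a = math.sqrt(2 / n) * sum(zj * math.cos(4 * math.pi * j / n)
                               for j, zj in enumerate(z))
    b = -math.sqrt(2 / n) * sum(zj * math.sin(4 * math.pi * j / n)
                                for j, zj in enumerate(z))
    q2 = math.hypot(a, b)
    q3 = sum((-1) ** j * zj for j, zj in enumerate(z)) / math.sqrt(n)

    # Spherical form: total amplitude Q, polar angle theta, phase phi.
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(b, a)) % 360.0
    return total_q, theta, phi


def ideal_chair(h, radius=1.45):
    """Hypothetical test ring: atoms alternate +h / -h about a flat hexagon."""
    return [(radius * math.cos(math.pi * j / 3),
             radius * math.sin(math.pi * j / 3),
             h * (-1) ** j) for j in range(6)]


Q, theta, phi = cremer_pople_6(ideal_chair(0.25))
```

For an idealized chair such as the one built by `ideal_chair`, q2 vanishes and θ falls at one of the two poles (0° or 180°); which pole is reached depends on the atom ordering and on which set of atoms lies above the mean plane. Envelopes, half-chairs, boats and skew-boats appear at intermediate θ, as laid out in Figure 1.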

Figure 1: Distribution of D-pyranoside ring conformations as a function of resolution for all N-linked sugars (at distance <2.0 Å) in the PDB as of January 2015, identified by their Chemical Component Dictionary IDs: NAG, NDG, MAN, BMA, BGC, GLC, GAL and GLA.

E/H, envelopes and half-chairs; B/S, boats and skew-boats; wavy lines denote the main ring plane. For clarity, an envelope is depicted at θ = 45° and a half-chair at θ = 135°, and skew-boat is omitted from the equator.

Although energetically unfavorable models may reflect a poor knowledge of glycochemistry and 'optimistic' density interpretation (reflected in low RSCCs), it is nevertheless clear that in many cases macromolecular crystallographers are failing to apply appropriate conformational restraints to encourage chemically sensible models at lower resolutions (>1.6 Å). Although community re-refinement efforts such as PDB_REDO7 have led to substantial improvement in protein models, many sugars are still in high-energy conformations as a result of re-refinement without dihedral restraints. Torsion restraints, which approximate the eponymous energy barriers, can be used to penalize models with eclipsed conformations, encouraging a particular ring puckering for sugars. However, the perceived difficulty of modeling torsional preferences often results in these restraints being tacitly turned off, regardless of the resolution, in many refinement and model-building programs. This creates a vicious circle: publication and deposition of incorrect structures inform subsequent statistical analyses that suggest the deposited structures are 'normal'.
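
To make the idea of a torsion restraint concrete, the following Python sketch computes a dihedral angle from four atom positions and scores it with a simple periodic penalty. It is only an illustration under stated assumptions and does not reproduce the target function of any particular refinement program: the names `dihedral` and `torsion_penalty`, the 60° target, the 10° sigma and the threefold periodicity are all hypothetical choices.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


def torsion_penalty(angle, target=60.0, sigma=10.0, period=3):
    """Squared, sigma-scaled deviation from the nearest periodic target.

    With period=3 and target=60 the staggered angles (+60, 180, -60)
    score zero and the eclipsed angles score highest. With period=1 a
    single target value is enforced, which is one way to favor a
    specific ring pucker. All defaults are illustrative assumptions.
    """
    step = 360.0 / period
    delta = (angle - target + step / 2.0) % step - step / 2.0
    return (delta / sigma) ** 2


# Example: an eclipsed (0 degree) torsion is penalized, a staggered one is not.
eclipsed = torsion_penalty(0.0)
staggered = torsion_penalty(180.0)
```

In a refinement target, a term of this kind would be summed over the restrained torsions and weighted against the fit to the data, so that weak density at low resolution does not pull a ring into an eclipsed, high-energy arrangement.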

Problems with the refinement of protein structures in the 1980s led to the rise of standard dictionaries, consistent refinement strategies, better graphics programs and community-accepted best practices. Ligands generally, and carbohydrates especially, got left behind. The fundamental roles of carbohydrates in cell biology and medicine, the extraordinary experimental advances in carbohydrate synthesis and the large increase in eukaryotic expression systems now demand improved refinement protocols for these key biological species, too.