A 3D structure of a glycoprotein spike that helps coronaviruses attach to cells. The spike is decorated with glycans that stop antibodies binding. Credit: X. Xiong et al./J. Virol.

Just 22 amino acids are all that’s needed to make all the world’s proteins. Four nucleotide bases encode biology’s blueprints in DNA. But when it comes to another, equally crucial, class of biomolecules called glycans, scientists don’t even know if there is an equivalent alphabet that the cell uses to make them, says bioinformaticist Jaya Srivastava of the Indian Institute of Technology in Mumbai.

Glycans are sugar-based polymers that coat cells and decorate most proteins, forming glycoproteins. They are crucial for biological processes such as immune regulation and intercellular interactions. This makes the apparent lack of a glycan alphabet (ref. 1) surprising, and reflects an enduring issue: just how little scientists know about sugars.

More than 30 years ago, chemist Carolyn Bertozzi was astounded by the paucity of chemical information about glycoproteins. At least half of all mammalian proteins are glycosylated — meaning they have at least one glycan attached. Without the correct sugary suffixes, proteins misfold or become unstable or non-functional. “The biological importance of glycans was well established by the 1980s,” says Bertozzi, now at Stanford University in California. “But it was very hard for biologists to answer any questions in glycoscience, because they didn’t have the tools.”

Proteins and DNA could easily be manipulated in the lab, but that wasn’t true of glycans. As a result, studies of sugars have lagged behind research into other macromolecules. This is in part because glycans are not synthesized using any known template, and because they can change dynamically depending on a cell’s metabolic state. What’s more, sugar isomers — molecules with the same chemical formula but different structures — can be used to build varied glycans, but are almost impossible to tell apart on the basis of molecular weight alone.
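The isomer problem can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes standard monoisotopic masses for carbon, hydrogen and oxygen, and uses glucose, galactose and mannose as examples of sugars that share the formula C6H12O6. The function and variable names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461957}


def monoisotopic_mass(formula):
    """Sum isotope masses for a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Three hexose isomers: different 3D structures, identical formula C6H12O6.
HEXOSE = {"C": 6, "H": 12, "O": 6}
sugars = {"glucose": HEXOSE, "galactose": HEXOSE, "mannose": HEXOSE}

masses = {name: monoisotopic_mass(formula) for name, formula in sugars.items()}
for name, mass in masses.items():
    print(f"{name}: {mass:.4f} Da")

# A single distinct value means mass alone cannot separate these sugars.
distinct_masses = len(set(masses.values()))
print("distinct masses:", distinct_masses)
```

All three sugars come out at about 180.0634 Da, so an instrument that reports only mass sees them as one species. Telling them apart takes extra information, such as fragmentation patterns or chromatographic separation.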

In 2015, the US National Institutes of Health established the Common Fund Glycoscience programme to develop overarching technologies for studying glycans in biomedicine. At the time, researchers identified a lack of tools as the greatest hurdle in glycobiology. Now, they’re beginning to address it.

Bertozzi and others have pioneered methods to image glycans in living or fixed tissues. Thanks to improvements in mass spectrometry and Raman spectroscopy, researchers can more easily identify and characterize glycoproteins. Several scientists, including Srivastava, are developing open databases — such as UniCarbKB, GlyTouCan and the Glycan Mass Spectral Database — that can be used to identify sugars and common glycosylation sites on proteins. Others have focused on high-throughput techniques, including arrays that capture data from hundreds of glycans or glycoproteins at once.

“Things that used to take an entire PhD can now be done in a matter of weeks,” Bertozzi says. “To me, this feels like an inflection point for the field.”

Sugar spotting

When Bertozzi set up her first lab at the University of California, Berkeley, in 1996, she began to work on a fundamental tool: a way to visualize a sugar on a cell, in the same way that proteins can be tagged with a fluorescent marker and picked out under a microscope.

The technique she developed, now widely used, is known as bio-orthogonal chemistry. It relies on marking sugars with a small, biologically unreactive chemical group that can slip undetected past the enzymes that attach glycans to proteins. Once this tagged sugar has been incorporated into a complex glycan and draped over a protein, a fluorescent dye can be snapped onto that chemical group in the cell, allowing the glycan to be visualized under a microscope.

“The key was that we needed to find two functional groups that would react with each other, but neither would react with anything else in the body,” says Bertozzi. This ‘bio-orthogonality’ is what counts: “They need to be chemically invisible in the biological world.” She and her colleagues have applied bio-orthogonal tools to identify glycoproteins that are unusually abundant in, or unique to, prostate-cancer tissues; used them to track where cells with different surface glycoproteins migrate in the zebrafish jaw during development; and more.

Various N-glycans can be located in human breast-cancer tissue using mass spectrometry (right) and compared with the same section of tissue coloured using typical cell stains (left). Credit: Richard Drake

Now others are extending the concept. Instead of tagging a sugar with a chemically reactive group and then coupling it to a dye or fluorescently labelled antibody in separate steps, chemical glycobiologist Peng Wu at Scripps Research in La Jolla, California, and his colleagues devised a way to tether the sugar to the dye directly, without the chemical group linker. That works because many of the enzymes that synthesize glycans will function even if their sugar substrate is toting a bulky fluorescent dye or labelled antibody. “The molecular weight of the sugar intermediate in reactions is 400–500 daltons,” Wu says. “No one thought it’d be possible to introduce an antibody with a molecular weight of 150 kilodaltons on to the sugar and have the reaction still work.”

In a study (ref. 2) last year, Wu’s team injected zebrafish embryos at the one-cell stage with two dye-labelled sugars, and tracked the tagged molecules through development using confocal microscopy. When compared with the two-step bio-orthogonal reaction, these labelled sugars yielded stronger signals from deep tissues such as the zebrafish head (ref. 2).

Abundant arrays

Such tools can reveal facets of glycan metabolism, but to crack the glycome, which encompasses all of a cell’s glycans, glycobiologists require a different tool set. “High-throughput methods are essential for glycoscience to keep pace with discoveries in proteomics and other fields,” says Lara Mahal, a chemist at the University of Alberta in Edmonton, Canada.

In 2002, for instance, researchers adapted one of the original high-throughput tools of genomics, the DNA microarray, to glycoscience. The glycan array is a slide dotted with synthetic polymers that can help to identify proteins that bind to sugars, and researchers using it have identified, for example, differences between the cellular binding sites for human and avian influenza viruses (ref. 3). But glycan arrays present sugars at high densities and without their cellular protein and lipid partners. As a result, they might not reflect true biological interactions. So Mahal turned to nature’s original glycan binders, proteins called lectins. By putting these on an array, she created a tool that binds to all the glycans in a sample, whether they are isolated sugar fragments or are attached to proteins, lipids or other biomolecules (ref. 4).

To reveal the diversity and abundance of glycans on proteins, researchers today are blending these approaches with a tool of metabolomics and proteomics research called MALDI mass spectrometry imaging. Mass spectrometry identifies molecules on the basis of their mass-to-charge ratio. Proteomics researcher Anand Mehta at the Medical University of South Carolina in Charleston and his colleagues have combined mass-spectrometry imaging with arrays of glycoprotein-binding antibodies to measure the relative amounts of glycans bound to different proteins present in samples such as human blood serum, which can contain hundreds of glycosylated proteins (ref. 5). “You can quickly see which proteins’ glycosylation patterns are altered in cirrhosis, cancer or other diseases,” he says.
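What a mass spectrometer actually reports is a ratio of mass to charge, usually written m/z. The sketch below shows that arithmetic under stated assumptions: ions are taken to form by gaining protons (other adducts, such as sodium, would shift the values), and the 3,000-dalton neutral mass is a made-up example rather than a measured glycopeptide. The function name `mass_to_charge` is hypothetical.

```python
PROTON_MASS = 1.007276466  # mass of a proton in daltons


def mass_to_charge(neutral_mass, charge):
    """Return m/z for an ion formed by adding `charge` protons to a molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical molecule with a neutral mass of 3000.0 Da (illustrative only).
example_mass = 3000.0
for z in (1, 2, 3):
    print(f"charge {z}+: m/z = {mass_to_charge(example_mass, z):.4f}")
```

The same molecule appears at different m/z values depending on how many charges it carries, which is why charge has to be accounted for before a peak can be assigned to a mass.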

At the University of Copenhagen, glycoscientist Henrik Clausen and his team have designed a cell-based glycan array by pruning the sugars off a common cell line called HEK293, and then reintroducing the genes for 170 glycan-synthesis enzymes (ref. 6). Subsets of cells express different enzymes, and thus different surface glycoproteins, and serve essentially as spots on an array. But rather than imaging the results with a microarray reader, the researchers use flow cytometry, a method in which cells are scanned individually with a laser to identify bound molecules. Turning to the enzymes rather than the sugar structures alone places glycome research in its biological context, Clausen says. “Not only do you learn what structures they bind to, you find out what genes and enzymes are involved in making that structure.”

Dissecting bonds

Clausen is also working to address another vexing aspect of glycobiology. Despite significant advances in understanding sugars’ complex structures, Clausen says, “we are still quite far from being able to, in an unbiased analysis, understand which sugars are at what sites on what protein”.

Last year, the US National Institute of Standards and Technology in Gaithersburg, Maryland, provided 76 labs around the world with samples of a specific glycosylated antibody and asked them to identify the sugars present and their locations in the antibody protein. The teams reported three broad chemical groups of glycans containing sialic acid, fucose, galactose or their derivatives. But their detailed assessments varied widely (ref. 7).

Using mass spectrometry, researchers effectively identified glycan linkages, but in many instances failed to differentiate between sugar isomers. Labs also struggled with a class of sugars called O-linked glycans, which are connected to an oxygen atom in an amino acid. There’s no specific amino acid or sequence that marks the location of an O-linked glycan, and although many analytical tools require glycans to be separated from their protein backbones, no single enzyme can cleave all such groups. N-linked glycans, by contrast, are attached to asparagine residues in a conserved sequence of three amino acids on proteins, and can all be sheared off the protein using an enzyme that leaves a characteristic molecular ‘scar’, Clausen says. “That’s why our understanding of the N-glycoproteome, not in terms of structures but where the sugars are, is decades ahead of all types of O-glycans.”
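Because N-linked sites sit in a conserved motif, candidate positions can be found by simple pattern matching, something that has no counterpart for O-linked glycans. The sketch below assumes the consensus motif usually written Asn-X-Ser/Thr, where X is any residue except proline. The sequence is invented for illustration and `find_sequons` is a hypothetical helper. A match marks only a possible site: whether a sugar is attached there still has to be established experimentally.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping candidate motifs all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of the Asn, motif) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Invented one-letter protein sequence, for illustration only.
toy_sequence = "MKNGSAANPTLLNNTSQ"
candidates = find_sequons(toy_sequence)
for position, motif in candidates:
    print(f"Asn {position}: {motif}")
```

In this toy sequence the scan reports the asparagines at positions 3, 13 and 14, and skips the one at position 8 because it is followed by proline.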

Last year, Clausen’s team developed a method to try to close that gap. First, the team used its cell-based glycan arrays to create a library of mass spectra from O-linked sugars representing more than 2,000 glycoproteins. Using this, the group was able to detect and quantify 269 O-linked glycans without the need for a preset range of ions and spectra, which currently are available only for N-linked glycans (ref. 8).

Other groups have adapted Raman spectroscopy, a method that uses molecules’ vibrational spectra as signatures, to visualize glycans on cell surfaces. One study (ref. 9) applied the method to living tissues, and identified glycosylation patterns that were unique to breast- and brain-cancer cells. “Most Raman studies so far have focused on simple model proteins, so to see it used on an actual biological sample is really interesting,” says radiology researcher Sharon Pitteri at Stanford.

Raman spectroscopy is “a good match” for the relatively abundant sugars found in biological tissue, says Ewan Blanch, a physical chemist at RMIT University in Melbourne, Australia. But attempts to use it have been hampered by a lack of reference data. Technological advances are improving matters, Pitteri says. Historically, researchers had to cleave sugars from proteins and study glycans separately. Now, they can slice glycoproteins in different ways to study sugars in the context of protein fragments, then cleave the two apart to examine the sugar and protein individually. These tools are particularly helpful for O-glycans, she adds.

Mainstream merging

Researchers are also working to better integrate glycomics with wider biomedical research. Such connections can help to identify not just how glycans are altered in cancer, immune dysfunction or other diseases, but also why. “If you tell a cell biologist that his protein binds di-sialo-fucosyl-polyLacNAc, he knows nothing,” Clausen says. “But if you tell him that the protein glycosylation requires these four genes to be expressed, he can go back to genetics and manipulate that glycosylation.”

This step is also crucial for therapeutics, Mahal adds, because drug developers are “not likely to target the glycan, but the enzyme that makes it”.

Indeed, large-scale screens frequently implicate glycan-processing enzymes in various processes and diseases, making it possible — and even necessary — for biologists to reckon with glycoscience. “When we take the bias out of biological inquiry, it often sends us back to glycoscience,” Bertozzi says.