Review
Nature Methods 2, 817 - 824 (2005)
Published online: 21 October 2005 | doi:10.1038/nmeth807

Glycomics: an integrated systems approach to structure-function relationships of glycans

Rahul Raman1, S Raguram1, Ganesh Venkataraman1, 3, James C Paulson2 & Ram Sasisekharan1

1 Biological Engineering Division, Center for Biomedical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.

2 Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA.

3 Present address: Momenta Pharmaceuticals, 675 West Kendall Street, Cambridge, Massachusetts 02142, USA.

Correspondence should be addressed to Ram Sasisekharan rams@mit.edu

In comparison with genomics and proteomics, the advancement of glycomics has faced unique challenges in the pursuit of developing analytical and biochemical tools and biological readouts to investigate glycan structure-function relationships. Glycans are more diverse in terms of chemical structure and information density than are DNA and proteins. This diversity arises from glycans' complex nontemplate-based biosynthesis, which involves several enzymes and isoforms of these enzymes. Consequently, glycans are expressed as an 'ensemble' of structures that mediate function. Moreover, unlike protein-protein interactions, which can be generally viewed as 'digital' in regulating function, glycan-protein interactions impinge on biological functions in a more 'analog' fashion that can in turn 'fine-tune' a biological response. This fine-tuning by glycans is achieved through the graded affinity, avidity and multivalency of their interactions. Given the importance of glycomics, this review focuses on the enabling technologies and on the importance of developing a bioinformatics platform that integrates the diverse datasets generated by these technologies, allowing a systems approach to glycan structure-function relationships.

Given the limited number of genes in mammalian genomes, including the human genome, protein post-translational modifications (PTMs) that regulate protein function have a more important role in cell phenotype than was previously suspected. The central dogma of molecular biology has been revised to include protein PTMs such as glycosylation. Glycosylation, the attachment of glycans (carbohydrates) to proteins, is perhaps the most extensive and complex form of protein PTM, and it provides the functional diversity needed to generate extensive phenotypes from a limited genotype.

Based on their backbone chemical structure, glycans can be classified broadly as linear and branched sugars. Branched glycans are present as N-linked and O-linked glycosylation on glycoproteins or on glycolipids1, 2, 3. The majority of the linear sugars are glycosaminoglycans, which contain long polymers of sulfated disaccharide repeat units that are O-linked to a core protein, forming a proteoglycan aggregate4 (Fig. 1). There is accumulating evidence for the role of glycans in cell growth and development1, 5, 6, 7, 8, 9, tumor growth and metastasis10, 11, 12, 13, 14, 15, anticoagulation16, 17, 18, immune recognition/response19, 20, 21, 22, 23, cell-cell communication24, 25 and microbial pathogenesis20, 26, 27, 28, 29, 30. Owing to their ubiquitous presence at the cell-extracellular interface, glycans are located in an environment of many proteins such as growth factors, cytokines, immune receptors, enzymes and others. The numerous biological roles of glycans are attributed to their interactions with these proteins, and thus, glycans modulate protein activity at the cell-extracellular interface.

Figure 1. Chemical diversity of glycans.

(a) Different classes of glycans in the symbol nomenclature developed as a collaborative effort to homogenize glycan representation. Directionality is from the nonreducing end at the top to the reducing end at the bottom, with the arrows indicating extension at the nonreducing end. Linkages between monosaccharides give the anomeric configuration of the monosaccharide (a, alpha; b, beta) and the oxygen atom of the reducing-end monosaccharide to which it is linked. "/" represents an either-or case (b3/4 means b3 or b4). In the case of complex N-linked glycans, the common terminal motifs attached to Gal are shown in a dotted box. The abbreviations HS, CS and DS correspond to heparin or heparan sulfate, chondroitin sulfate and dermatan sulfate, respectively.



The study of glycans presents unique challenges that necessitate a systems approach involving multiple components as well as integration of information at molecular, cellular, tissue and higher levels31. Three fundamental and interrelated aspects of glycans make this field both intriguing and challenging. First, the biosynthesis of glycans is a non-template-driven process involving coordinated expression of several glycosyltransferases, some of which have additional tissue-specific isoforms1, 2, 3, 32, 33. The complex biosynthesis and the lack of proofreading machinery lead to inherent heterogeneity and large diversity of glycan structures. Furthermore, this complicates investigation by a functional genetics approach, in which specific structures are knocked in or knocked out and their effect on the whole-organism phenotype is evaluated directly. Second, the chemical heterogeneity and diversity of glycans have challenged the development of analytical techniques to accurately define their chemical structures. Also, owing to their mode of biosynthesis, ubiquitous subcellular distribution and glycoprotein diversity arising from one or more glycosylation sites, glycans always need to be considered as a heterogeneous mixture of different chemical structures when isolated from cells and tissues. Finally, understanding the biochemical basis of glycan-protein interaction is complicated by the multivalency and graded affinity involving an ensemble of glycans making multiple contacts with multivalent protein binding sites4, 24. Thus, glycomics, defined as a systems or integrated approach to glycan investigation, is necessary to truly delineate glycan structure-function relationships.

Recognizing the need to take an integrated approach to advance the understanding of glycan structure-function relationships, researchers have established several international collaborative efforts, namely the Consortium for Functional Glycomics (CFG; a multimillion dollar initiative funded by the US National Institute of General Medical Sciences), EuroCarb, the Japanese Consortium for Glycomics and many other resources (Table 1). Motivated by the need to address the challenges outlined above, these collaborative efforts are resulting in the development of novel resources and technologies for glycomics.

Table 1. Large-scale glycomics initiatives

Using CFG as a model system, this review aims to provide a perspective on the different technologies ranging from a functional genetics approach to structural characterization of glycans and biochemical aspects of glycan-protein interactions. The focus of this review is on the datasets provided by these technologies, how they are interrelated and how they need to be integrated to enable glycomics. The review also discusses the development of a bioinformatics platform that bridges multiple datasets collected using the different technologies to provide a systems framework for glycomics. Although emphasis is given to branched sugars in this review, the overall concepts behind the integrated systems approach are applicable to both linear and branched sugars.

Functional genetics approach to glycomics
Toward obtaining functional readouts on the various biological roles of glycans, there have been advances in transgenic technologies to evaluate the effect of knocking out glycan biosynthesis enzymes. The number of known glycan biosynthetic enzymes has increased dramatically since the identification of the first set of glycosyltransferase genes in the 1980s. Human and mouse glycosyltransferases have been used primarily to engineer the glycosylation of therapeutic proteins and antibodies and thereby improve their therapeutic parameters31. Glycosyltransferases have also been used in de novo synthesis of specific glycan structures34, 35.

There are 98 genes in humans corresponding to glycosyltransferases that have been annotated in the KEGG database (Table 2). As a part of the collaboration with the Japanese glycomics initiative, the CFG has expanded this list to around 200 genes and is now in the process of annotating them in terms of the reaction specificity of the enzymes. Characterization of the glycosyltransferases has permitted studies in which these genes are knocked out in somatic cell cultures to investigate the effects of alteration or complete inhibition of glycosylation on cellular phenotype1. Given that glycans are at the cell-extracellular interface, it was necessary to develop whole-organism genetics to understand how genotype influences the phenotype of the entire organism. Advances in mammalian transgenic technologies have led to engineering knockouts of these genes in vivo.

Table 2. Web-based resources for glycomics

Whole-organism functional genetic studies have provided valuable insights by directly linking the role of glycosylation of proteins and glycan diversification to the phenotype at the cellular and the whole-organism level. For example, knockouts of the GlcNAc and GalNAc transferases involved in the early stages of N-linked and O-linked glycan biosynthesis and diversification often result in severe phenotypes, specifically developmental defects, immune dysfunction, inflammation deficits and even embryonic lethality1. This is perhaps not surprising, as such mutations affect the majority of cells and their glycoproteins. But transgenic mice containing a knockout of later-stage enzymes such as fucosyl and sialyl transferases, which are involved in the cell type diversification and capping of N-linked and O-linked glycans, show more discrete and consequently more difficult-to-detect phenotypes. Recent phenotype analysis of knockout strains of sialyl and fucosyl transferases has revealed interesting phenotypes that provide evidence for the role of specific glycan sequences in mediating aspects of cell-surface biology36, 37, 38.

Large initiatives such as CFG now are generating transgenic mouse lines representing knockouts of later-stage fucosyl and sialyl transferases. These transgenic mice are subject to a battery of phenotype analysis studies, namely (i) hematology and coagulation chemistry, (ii) histological staining of tissues, (iii) immunology assays such as FACS, Ig level analysis, measurement of B- and T-cell proliferation upon induction with various agents and cytokines and (iv) various metabolism and behavioral tests. These studies have generated volumes of new data that provide many parameters to quantify distinct phenotypic abnormalities in these mice.

The findings above highlight the complexity and challenges in unraveling how glycans, given their chemical diversity, modulate whole-organism phenotype. Thus it is necessary to couple the functional genetics and whole-organism phenotyping studies with measurements of gene expression of glycan biosynthesis enzymes and their binding proteins, and to correlate these measurements with the repertoire of glycan structures present on specific tissue or cell types. The development of technologies for taking such an approach is discussed in the following sections.

Development of glyco-gene microarray for glycomics
Measurement of simultaneous expression of several thousand genes in different cells to construct genetic networks and pathways has been an important component of a systems approach to molecular and cell biology. There have been considerable advances in the development of commercially available genome-wide microarrays (such as Affymetrix chips) to improve gene expression measurements, for example by enhancing the signal-to-noise ratio. Investigating gene expression of enzymes involved in glycan biosynthesis and that of glycan binding proteins provides a new dimension to study processes at the cell-extracellular interface. There are challenges in using genome-wide arrays to investigate the dynamic nature of glycan-protein interactions. Some of these challenges arise from a limited representation of glycan biosynthesis enzymes on human and mouse genome microarrays and limited sensitivity in measuring expression of these genes relative to other downstream events39.

Glyco-gene–based DNA microarrays, which focus on glycan biosynthesis and binding protein genes, were designed to overcome the above challenges. After careful consideration of the DNA printing technology, the DNA probe format and the appropriate sequences for the probes, Affymetrix array-based glyco-gene microarrays were designed39. These customized microarrays have been valuable resources for advancing glycomics. For example, over the past few years, around 400 samples representing various tissues and cell types have been analyzed on the CFG glyco-gene microarrays as a part of focused experiments to study differences in expression of glycan biosynthesis genes. These experiments include analysis of different tumor cell lines, cell types or tissues from glycosyltransferase or glycan binding protein (GBP) knockout mouse strains, cells under mechanical stress such as chondrocytes and others. Thus these glyco-gene microarrays provide information on simultaneous expression of glycan biosynthetic enzymes that can then be correlated with the actual glycan structures characterized in a given sample.

Glycan analysis—from high-throughput to fine structure characterization
An important aspect of functional glycomics is the characterization of the primary chemical structure of glycans. Owing to heterogeneity of glycans arising from their complex nontemplate-driven biosynthesis, glycans isolated from cells and tissues comprise a heterogeneous repertoire of structures. Independent of the challenges in isolating glycans, there is a practical need to characterize the entire repertoire of glycan structures from the cell surface or on proteins as glycan-GBP interactions involve multivalent binding with several glycan structures on the cell surface. Several biochemical and analytical methodologies have been developed to address the above challenges.

There has always been a trade-off between the sensitivity of fine structure characterization of a glycan mixture (in terms of the exact sequence of each glycan) and the ability to perform a high-throughput analysis of a large number of glycans in the mixture. Mass spectrometric (MS) methods have been valuable in obtaining high-throughput mass profiles of glycans from entire cells and tissues40, 41, 42, 43, 44, 45. Matrix-assisted laser desorption/ionization (MALDI)-MS–based analysis of human and mouse cells and tissues (as done by CFG) has provided a snapshot of the mass profiles along with the most likely set of glycan structures annotated by an expert. More recently, automated algorithms that annotate MALDI-MS spectra by incorporating domain knowledge have been developed46. Whereas such a high-throughput analysis provides a good snapshot of the structures most likely to be present in a given tissue, the exact structures of the glycans in terms of the explicit monosaccharides and linkages are difficult to assign, particularly at higher molecular weights. A more detailed linkage analysis of each glycan in the mixture is necessary for rigorous assignment of structure and also to quantify its relative abundance in the mixture.
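As a rough illustration of what annotating a mass peak involves, the sketch below enumerates monosaccharide compositions whose calculated mass matches an observed m/z. It is not the algorithm of ref. 46. It assumes underivatized glycans detected as sodium adducts (permethylated glycans would need different residue masses), and the count limits in MAX_COUNT are hypothetical stand-ins for biosynthetic domain knowledge.

```python
from itertools import product

# Monoisotopic residue masses (Da) of underivatized monosaccharide residues.
RESIDUE_MASS = {
    "Hex": 162.0528,     # e.g. Man, Gal, Glc
    "HexNAc": 203.0794,  # e.g. GlcNAc, GalNAc
    "dHex": 146.0579,    # e.g. Fuc
    "NeuAc": 291.0954,   # N-acetylneuraminic acid
}
WATER = 18.0106   # a free glycan is the sum of its residues plus one H2O
SODIUM = 22.9892  # Na+ adduct (assumed ionization mode)

# Hypothetical upper bounds standing in for biosynthetic domain knowledge.
MAX_COUNT = {"Hex": 10, "HexNAc": 8, "dHex": 3, "NeuAc": 4}


def composition_mass(counts):
    """Return the [M+Na]+ m/z of a composition given as {residue: count}."""
    residues = sum(RESIDUE_MASS[r] * n for r, n in counts.items())
    return residues + WATER + SODIUM


def annotate_peak(mz, tol=0.05):
    """List every bounded composition whose [M+Na]+ lies within tol of mz."""
    names = list(RESIDUE_MASS)
    hits = []
    for combo in product(*(range(MAX_COUNT[n] + 1) for n in names)):
        counts = dict(zip(names, combo))
        if abs(composition_mass(counts) - mz) <= tol:
            hits.append(counts)
    return hits


# A peak near m/z 1257.42 is consistent with Hex5HexNAc2.
print(annotate_peak(1257.42))
```

A composition is only the starting point: as the text notes, it does not fix the monosaccharide identities or linkages, which is why several structures can share one mass peak.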

High-performance liquid chromatography (HPLC) is a well-developed technique to obtain a profile of glycans in a mixture based on their elution profile47, 48. The glycans are labeled using a radioactive or fluorescent tag and are quantified based on the intensity values. HPLC-based analyses also provide quantitative information on relative abundance of different glycans. One such approach involves the use of a library of exo-glycosidases that specifically cleave individual monosaccharides from the nonreducing end of the glycan. The shifts in the chromatographic elution profile of glycans upon treatment with each glycosidase are used to assign the structure based on the specificity of cleavage48.
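The logic of exoglycosidase sequencing can be sketched with a toy simulation. The example assumes a linear chain and a hypothetical three-enzyme panel in which each enzyme removes one kind of terminal residue; real exoglycosidases are also linkage specific, and real glycans are often branched. A chain that gets shorter corresponds to a peak that shifts in the elution profile.

```python
# Hypothetical enzyme panel: each exoglycosidase removes one kind of
# terminal (nonreducing-end) residue. Linkage specificity is ignored here.
ENZYME_TARGET = {
    "sialidase": "Neu5Ac",
    "beta-galactosidase": "Gal",
    "beta-N-acetylhexosaminidase": "GlcNAc",
}


def digest(chain, enzyme):
    """Trim the enzyme's target residue from the nonreducing end, if present.

    `chain` lists residues from the nonreducing end to the reducing end.
    An unchanged chain corresponds to a peak that does not shift.
    """
    target = ENZYME_TARGET[enzyme]
    trimmed = list(chain)
    while trimmed and trimmed[0] == target:
        trimmed.pop(0)
    return trimmed


def read_terminal_sequence(chain, enzymes):
    """Apply enzymes in order and record which residues were released."""
    released = []
    for enzyme in enzymes:
        shortened = digest(chain, enzyme)
        if len(shortened) < len(chain):  # the peak shifted
            released.append(ENZYME_TARGET[enzyme])
        chain = shortened
    return released


glycan = ["Neu5Ac", "Gal", "GlcNAc", "Man"]
order = ["sialidase", "beta-galactosidase", "beta-N-acetylhexosaminidase"]
print(read_terminal_sequence(glycan, order))
```

Applying the galactosidase before the sialidase releases nothing in the first step, because the terminal sialic acid blocks access to Gal; the order in which shifts appear is what encodes the sequence.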

Fine structure characterization of glycans involves selection of a mass or chromatographic peak followed by fragmentation or depolymerization and characterization of the fragments formed. Techniques involving different types of MALDI-MS and electrospray ionization (ESI)-MS analysis capture specific mass ions and fragment them for MS-MS or MS-MS-MS fragmentation pattern analysis. There have been considerable advances in the development of computational methods for automatic assignment of glycan structure based on MS fragmentation patterns49, 50, 51, 52. The fragment ions are compared to those obtained from theoretical fragmentation of known glycan structures, and the structure of the unknown glycan is assigned. More recently, a highly sensitive approach using Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) was used to characterize different glycans including glycolipids53, 54, 55. Further, nanoscale liquid delivery using a chip-based electrospray interface has been coupled with tandem MS as well as FT-ICR-MS for high-sensitivity characterization of glycans56, 57.
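The comparison of observed fragment ions with theoretical fragments can be illustrated with a deliberately simplified sketch. It is not one of the published methods cited above: it assumes linear chains, glycosidic cleavages only, and fragment masses computed as plain sums of residue masses without adduct or water corrections. The candidate sequences and peak values are made up for the example.

```python
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794,
                "dHex": 146.0579, "NeuAc": 291.0954}


def theoretical_fragments(sequence):
    """Cumulative residue masses read from the nonreducing end.

    Simplification: linear chains, glycosidic cleavages only, and no
    adduct or water corrections.
    """
    masses, total = [], 0.0
    for residue in sequence[:-1]:  # the intact chain is the precursor
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def score(sequence, observed, tol=0.02):
    """Count observed peaks explained by the candidate's fragments."""
    theory = theoretical_fragments(sequence)
    return sum(any(abs(o - t) <= tol for t in theory) for o in observed)


def assign(candidates, observed, tol=0.02):
    """Return the candidate sequence with the most matched peaks."""
    return max(candidates, key=lambda seq: score(seq, observed, tol))


# Two isomeric candidates (same composition, different order).
candidates = [["Hex", "NeuAc", "HexNAc"], ["NeuAc", "Hex", "HexNAc"]]
observed_peaks = [291.10, 453.15]  # illustrative values, not real data
print(assign(candidates, observed_peaks))
```

Both candidates have the same precursor mass, so only the fragment at about 291.10 distinguishes them; this is the reason a fragmentation step is needed on top of mass profiling.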

Nuclear magnetic resonance (NMR) spectroscopy is another powerful technique for obtaining important sequence information on glycans. The one-dimensional proton and carbon spectra (specifically anomeric nuclei) of a glycan mixture along with the coupling constants from homonuclear (gradient selected correlation spectroscopy, gCOSY and total correlation spectroscopy, TOCSY) and heteronuclear (heteronuclear multiple quantum correlation, HMQC and heteronuclear multiple bond correlation, HMBC) spectra provide quantitative information on distinct monosaccharides58, 59, 60. For example, the ratio of abundances of glucose to galactose to mannose can be obtained. The anomeric chemical shifts of the monosaccharides can be classified further based on the neighboring monosaccharide (at the reducing end), which would provide the abundance of the specific linkage between the two monosaccharides. This information is particularly important for terminal sialic acids, which can be alpha2-3– or alpha2-6–linked to the penultimate monosaccharide. The characteristic NMR chemical shifts and coupling constants of various glycans in the literature have been compiled into databases to facilitate assignment of glycan structures based on NMR data (Table 2).

Techniques for fine structure characterization of glycans have limitations when considering biological samples with a large mixture of glycans. The requirement of high sample amounts (in the case of NMR) and the multiple steps of enzymatic or other fragmentation needed for larger glycan mixtures complicate the use of these techniques for high-throughput analysis. To enhance the utility of the above-mentioned analytical methods, informatics-based sequencing methodologies have been developed that incorporate data from multiple complementary techniques as constraints to assign a unique glycan sequence in an unbiased manner61, 62 (Fig. 2). As these methods use the best set of attributes that each analytical method provides on glycans, they also facilitate rapid and improved characterization of an ensemble of glycan structures.

Figure 2. Informatics approach to characterize glycans.

Schematic of the technology developed to sequence HS-glycosaminoglycans in a rapid and unbiased fashion. The best set of attributes obtained using each methodology, namely disaccharide composition from capillary electrophoresis, monosaccharide composition and linkage information from NMR and chain length from MALDI-MS, are incorporated as constraints into a computational framework that iteratively reduces the large solution space to the final set of solutions that satisfy all the experimental constraints. U, uronic acid (alpha-L-iduronic or beta-D-glucuronic acid); H, alpha-D-glucosamine; linkage between U-H and H-U is 1–4. X represents sulfation sites (SO3-) and Y represents sites of sulfation or acetylation (COCH3).
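The idea of iteratively reducing a solution space with experimental constraints can be sketched as follows. This is an illustration of the principle only, not the published sequencing framework: the building blocks "A", "B" and "C" are hypothetical disaccharide units, and the two extra constraints are invented stand-ins for the kind of information NMR or enzymatic digests might provide.

```python
from collections import Counter
from itertools import product


def sequence_by_constraints(blocks, chain_length, composition, extra_constraints):
    """Filter all possible sequences through successive constraints.

    Returns the surviving sequences and the size of the solution space
    after each filtering step.
    """
    # Chain length (e.g. from MALDI-MS) fixes the number of positions.
    candidates = list(product(blocks, repeat=chain_length))
    sizes = [len(candidates)]

    # Composition (e.g. from capillary electrophoresis) fixes block counts.
    candidates = [c for c in candidates if Counter(c) == Counter(composition)]
    sizes.append(len(candidates))

    # Further constraints (e.g. from NMR), applied one at a time.
    for constraint in extra_constraints:
        candidates = [c for c in candidates if constraint(c)]
        sizes.append(len(candidates))
    return candidates, sizes


def starts_with_a(seq):
    """Hypothetical constraint: the nonreducing-end unit is A."""
    return seq[0] == "A"


def b_precedes_c(seq):
    """Hypothetical constraint: some B is immediately followed by C."""
    return any(x == "B" and y == "C" for x, y in zip(seq, seq[1:]))


solutions, sizes = sequence_by_constraints(
    blocks="ABC", chain_length=4,
    composition={"A": 2, "B": 1, "C": 1},
    extra_constraints=[starts_with_a, b_precedes_c],
)
print(sizes)      # [81, 12, 6, 2]
print(solutions)
```

Exhaustive enumeration is workable only for short chains and few building blocks; it is used here because it makes the shrinking solution space easy to see.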



Biochemical analysis of specificity of glycan-protein interactions
The term GBP is often used to describe proteins that bind to N-linked and O-linked glycans and mediate cell adhesion, trafficking and signalling events in inflammation and immune responses2, 3, 63. The main classes of GBPs include C-type lectins, galectins and siglecs. The GBPs also act as receptors for viruses, bacteria and other microbes that use the glycans attached to cell-surface glycoproteins as ligands for the host-cell GBPs. The glycan binding sites on GBPs typically accommodate mono- to tetrasaccharide glycan ligand motifs. The interaction between any single glycan binding site and the glycan motif is of low affinity, with values in the micro- to millimolar range. Most of the physiological glycan-GBP interactions, however, are multivalent, involving binding of an ensemble of glycan motifs to multimeric carbohydrate recognition domains (CRDs) formed by association of GBPs. GBPs are expressed either as soluble or as membrane-bound proteins, in monomeric or multimeric forms with multiple glycan binding sites. Also, the GBPs can either be dispersed on the cell surface or localized in a microenvironment such as microvilli or clathrin-coated pits24.

Based on the multivalency and distribution of GBPs on the cell surface, different synthesis strategies have been adopted to construct appropriate multivalent glycan ligand probes for high-affinity binding. Advances in chemical synthesis strategies such as solid-phase synthesis64 and the one-pot chemo-enzymatic approach34, 35 have led to synthesis of hundreds of glycan structures that capture the diversity of the glycans present at the cell surface. The primary utility of these large multivalent glycoconjugates has been in competitive assays to assess the relative binding affinities of GBPs24 and in designing inhibitors of physiological glycan-GBP interactions. On the other hand, lectin-based approaches such as lectin columns and lectin arrays have been used to fingerprint glycosylation on glycoproteins65. The proteins used in these approaches are typically plant lectins or antibodies with defined specificities toward specific glycan motifs.

To expand the current knowledge of the glycan ligand specificity of individual GBPs, arrays of synthetic and physiological glycans are being developed66, 67, 68. There are many approaches to development of glycan arrays. For example, the CFG has developed two types of glycan arrays: a well-based plate array on which a specific glycan ligand is prepared as a solution of fixed concentration, and a solid-phase, printed array comprising NHS-activated glass slides on which the glycans are printed68. Compared to the well-based array, the printed array better mimics the physiological distribution of glycans on a cell surface that will be presented to the multivalent GBPs. The GBP is introduced onto the array and detected with a primary antibody, and the signal is obtained by using a secondary antibody attached to a fluorophore, similar to the enzyme-linked immunosorbent assay (ELISA). Typically, to promote multivalent high-affinity binding, conditions such as protein concentration are optimized to ensure that the GBP is present in dimeric or other multimeric forms. The glycan arrays have been extensively used to screen for novel ligand specificities for GBPs and for development of antibodies to target specific glycan motifs (Fig. 3). For example, using the CFG glycan arrays, detailed insights into the distinct ligand specificities of dendritic cell–specific ICAM grabbing nonintegrin (DC-SIGN) and DC-SIGN–related protein (DC-SIGNR) were obtained. Both of these GBPs bound to high mannose–containing glycan structures, whereas only DC-SIGN bound to Lewis x (a motif in which Gal and Fuc are linked b4 and a3, respectively, to GlcNAc) and other fucosylated motifs20. This analysis was important in understanding the structural basis for differences in host cell and pathogen recognition by these proteins. The glycan array data are thus rapidly expanding the knowledge on the ligand specificity of different GBPs and providing a biochemical context for understanding how cellular phenotype is modulated by glyco-related gene expression.

Figure 3. Glycan arrays to identify novel glycan specificities of proteins.

The layout of a well-based glycan array; the principle of detecting the relative binding of GBP using primary and secondary antibodies is highlighted in the red box. The data obtained from the array in the form of a two-dimensional image with signal intensities corresponding to each well are transformed into a bar chart to facilitate ranking of the glycan ligands based on their relative affinities for the protein screened. The details of the glycan structure corresponding to one of the high-affinity binders are shown. Data collected on the glycan array with the ability to navigate through these data as shown in the figure are available via user-friendly interfaces at the CFG website (Table 2).
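The transformation from per-well signal intensities to a ranked bar chart can be sketched in a few lines. The function names, the background-subtraction step and the intensity values below are assumptions made for illustration; they are not CFG data or CFG software.

```python
from statistics import mean


def rank_ligands(raw_signals, background=0.0):
    """Rank glycans by mean background-subtracted fluorescence.

    raw_signals maps a glycan name to its replicate well intensities.
    Returns (glycan, mean_signal) pairs, strongest signal first.
    """
    averaged = {
        glycan: max(mean(values) - background, 0.0)
        for glycan, values in raw_signals.items()
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def bar_chart(ranked, width=40):
    """Render the ranking as a plain-text bar chart."""
    top = ranked[0][1] or 1.0  # avoid dividing by zero if all signals are 0
    return "\n".join(
        f"{glycan:<16}{'#' * round(width * signal / top)} {signal:.0f}"
        for glycan, signal in ranked
    )


# Made-up intensities for three glycans, two replicate wells each.
signals = {
    "high-mannose": [5200, 4800],
    "Lewis x": [3100, 2900],
    "sialyl-LacNAc": [150, 250],
}
ranked = rank_ligands(signals, background=200)
print(bar_chart(ranked))
```

The ranking reflects relative signal under one set of assay conditions; it orders ligands but does not by itself yield binding constants.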



Bioinformatics platform for glycomics
It is clear that there is a need to cut across multiple datasets to truly understand the structure-function relationships of glycans. A critical component that enables this process is a bioinformatics platform to store, integrate and process the information generated by the above methods and disseminate it in a meaningful fashion via the internet to the scientific community worldwide. The evolution of information generation for glycomics is different from that for genomics and proteomics. Representation of glycan chemical structure analogous to the primary sequence of proteins or DNA was challenging owing to the chemical complexity and branching patterns69. Issues with representation of glycans were augmented by challenges in characterizing glycan structures using analytical techniques in the past. Owing to these challenges, earlier efforts to develop databases for storing glycan structures such as Complex Carbohydrate Structures Database (CCSD) had to be discontinued. In recent years, with the recognition of the importance of glycobiology and with advances in technology for characterizing glycan structures, academic and commercial organizations including the CFG are making considerable efforts to build databases such as Glycosuite database70, KEGG Glycan database71 and tools46, 72, 73, 74, 75 for representation and analysis of glycan structures (Table 2).

An important aspect of the systems approach to glycan investigation is to define relationships between different entities that would facilitate the integration of information. As an analogy, gene ontologies developed by the Gene Ontology Consortium go beyond cataloging gene information. They capture relationships between the molecular function of the gene product and the biological process in which it acts. To capture complex relationships between diverse data, it is necessary to develop an object-based relational database. Datasets from the different glycomics technologies range from Microsoft Excel spreadsheets to ASCII-delimited data of field and field-value pairs in which each individual parameter is important for data analysis and integration. There are three primary objects in glycomics datasets: GBPs, glycan biosynthetic enzymes and the glycan structures. The different methodologies that provide datasets are further organized into secondary and other levels of objects with defined inter-relationships and relationships to the primary objects. The Supplementary Note online comprises an animated presentation showing the important components and features of large-scale glycomics databases using CFG as a model system.
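One way to picture the three primary objects and a secondary dataset object is the minimal sketch below. The class names, fields and the example records are assumptions chosen for illustration; they do not reproduce the CFG data model, which is described in the Supplementary Note.

```python
from dataclasses import dataclass, field


@dataclass
class GlycanStructure:
    glycan_id: str
    sequence: str  # free text here; a real schema would encode branching


@dataclass
class BiosyntheticEnzyme:
    gene_symbol: str
    products: list = field(default_factory=list)  # glycans it helps build


@dataclass
class GlycanBindingProtein:
    name: str
    ligands: list = field(default_factory=list)  # glycans it binds


@dataclass
class Dataset:
    """Secondary object: one experiment linked to primary objects."""
    method: str   # e.g. "glycan array", "glyco-gene microarray"
    sample: str
    about: list = field(default_factory=list)


def datasets_for(obj, datasets):
    """Return every dataset that references the given primary object."""
    return [d for d in datasets if any(o is obj for o in d.about)]


lewis_x = GlycanStructure("G1", "Gal b4 (Fuc a3) GlcNAc")
dc_sign = GlycanBindingProtein("DC-SIGN", ligands=[lewis_x])
array_run = Dataset("glycan array", "plate 1", about=[dc_sign, lewis_x])
chip_run = Dataset("glyco-gene microarray", "spleen", about=[])
print([d.method for d in datasets_for(dc_sign, [array_run, chip_run])])
```

Keeping datasets as separate objects that point to the primary objects is what allows a query to start from a molecule and collect every experiment that mentions it.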

The blueprint of the object-based relational database is the data model or ontology diagram that captures data definitions and inter-relationships, which is quite complex for glycomics databases (Supplementary Note). It is therefore important to develop a software architecture that keeps this complexity hidden from the user during data acquisition and dissemination. A three-tier software architecture is best suited for this purpose: a back-end relational database that stores the data and annotates their relationships, a middleware application layer that communicates between the database and the user interface, and a top layer comprising the user interfaces to the database. This architecture lets scientists easily deposit data into the database, where the middleware application layer automatically organizes them into the relational tables.
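The three tiers can be sketched in miniature with Python's built-in sqlite3 module. The table layout, class name and sample names are hypothetical; an in-memory SQLite database stands in for the back-end server and a plain-text report stands in for a web interface.

```python
import sqlite3


# Tier 1: back-end relational database (in-memory SQLite as a stand-in).
def open_backend():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE measurement (
            sample_id INTEGER REFERENCES sample(id),
            method TEXT, field TEXT, value REAL
        );
    """)
    return db


# Tier 2: middleware that hides the table layout from the user.
class Middleware:
    def __init__(self, db):
        self.db = db

    def deposit(self, sample, method, records):
        """Store field/value pairs, creating the sample row if needed."""
        self.db.execute(
            "INSERT OR IGNORE INTO sample (name) VALUES (?)", (sample,))
        (sample_id,) = self.db.execute(
            "SELECT id FROM sample WHERE name = ?", (sample,)).fetchone()
        self.db.executemany(
            "INSERT INTO measurement VALUES (?, ?, ?, ?)",
            [(sample_id, method, f, v) for f, v in records.items()])

    def fetch(self, sample):
        """Return all (method, field, value) rows for one sample."""
        return self.db.execute(
            "SELECT m.method, m.field, m.value FROM measurement m "
            "JOIN sample s ON s.id = m.sample_id WHERE s.name = ? "
            "ORDER BY m.method, m.field", (sample,)).fetchall()


# Tier 3: user interface (a text report stands in for a web page).
def report(middleware, sample):
    return "\n".join(
        f"{m}: {f} = {v}" for m, f, v in middleware.fetch(sample))


mw = Middleware(open_backend())
mw.deposit("spleen KO-1", "microarray", {"gene A": 2.5})
mw.deposit("spleen KO-1", "MALDI-MS", {"m/z 1257.42": 0.3})
print(report(mw, "spleen KO-1"))
```

The depositor supplies only a sample name, a method and field-value pairs; the join keys and table names never appear outside the middleware, which is the point of the layering.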

Central to the data integration is the ability to link orthogonal datasets derived from identical or similar samples. For example, the gene expression profile of a specific tissue or cell line isolated from a given strain of transgenic mice needs to be automatically associated in the database with orthogonal information such as glycan profile, histological staining and immunological profile from a similar or identical sample. Such integration would allow researchers to cut across multiple datasets and start asking questions such as "does the expression of a glycosyltransferase correlate with the glycan profile of that tissue?" or "can the pathological analysis of the tissue be explained on the basis of its gene expression profile?"
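A question such as whether glycosyltransferase expression tracks the glycan profile reduces, in its simplest form, to joining two datasets on shared sample identifiers and correlating the paired values. The sketch below assumes each dataset has already been reduced to one number per sample; the sample names and values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_across_datasets(expression, glycan_profile):
    """Join two datasets on shared sample IDs and correlate paired values."""
    shared = sorted(set(expression) & set(glycan_profile))
    xs = [expression[s] for s in shared]
    ys = [glycan_profile[s] for s in shared]
    return shared, pearson(xs, ys)


# Hypothetical per-tissue values: transferase expression level and
# relative abundance of the glycan that the enzyme produces.
expression = {"liver": 1.0, "spleen": 2.0, "kidney": 3.0, "brain": 0.5}
glycan_profile = {"liver": 10.0, "spleen": 20.0, "kidney": 30.0, "lung": 5.0}
samples, r = correlate_across_datasets(expression, glycan_profile)
print(samples, round(r, 3))
```

Samples present in only one dataset (here "brain" and "lung") drop out of the join, which is why consistent sample annotation at deposit time matters for integration.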

Another emerging concept in data integration is the molecule page interface, which provides a portal to information and data ranging from molecule to mouse. The molecule pages are evolving into standardized interfaces not only for glycomics but also for genomics and proteomics initiatives76. In the case of glycomics, the molecule page interface was introduced by CFG to capture information pertaining to different families of glycan binding proteins (Supplementary Note). The CFG molecule pages contain three main components: (i) automatic acquisition of information from other public databases on that molecule, (ii) automatic interface with CFG data pertaining to that molecule, (iii) contribution from a group of experts on that particular molecule.

Finally, the bioinformatics platform needs to support computational tools to perform data mining analysis on the large-scale glycomics datasets. For example, the prediction of glycan structures based on gene expression profiles of glycan biosynthetic enzymes and the identification of patterns in an ensemble of glycans that govern the multivalent high-affinity interactions with specific GBPs are now enabled by user-friendly access to diverse datasets via relational databases.

Conclusions and future directions
Technological advances in DNA microarrays and mass spectrometric methods, coupled with the availability of genome-wide sequence information, have steered postgenomics research toward a 'systems' approach to understanding cellular phenotype as a function of its gene and protein components. Glycomics has emerged as a fundamental field that adds an important dimension to this approach.

The challenges that lie ahead for advancing glycomics include: (i) explaining how glycan diversity is regulated as a function of its biosynthesis, (ii) understanding the basis for specificity of glycan-protein interactions, and (iii) elucidating how an ensemble of glycans displayed on the cell surface governs extracellular signal transduction and cell-cell communication via multivalent interactions with proteins. The availability of diverse data sets and their integration via object-oriented relational databases have motivated the development of computational tools to perform data mining and pattern analysis to begin addressing these questions. It is important to address these kinds of questions before modeling biochemical pathways and network interactions.

There has also been an increasing awareness of the need to develop data exchange formats such as XML for the consistent description of glycan structures74 and glycomics data sets across different large-scale glycomics initiatives. There is a practical need to set standards for incorporating glycan structures into a database so that the glycan database can develop into an international resource similar to GenBank and SwissProt. In summary, it is envisioned that large-scale glycomics initiatives will continue their focus on developing and applying technologies to advance this important field.
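
To make the idea of an XML exchange format for glycan structures concrete, the following Python sketch serializes a glycan as a set of residues and linkages and reads it back. The element and attribute names (`glycan`, `residue`, `linkage`, `monosaccharide`, `type`) are hypothetical and chosen only for this illustration; they do not reproduce any published glycan XML standard.

```python
import xml.etree.ElementTree as ET


def glycan_to_xml(name, residues, linkages):
    """Serialize a glycan to XML text.

    residues: dict mapping residue id -> monosaccharide name
    linkages: list of (child id, parent id, linkage type) tuples
    """
    root = ET.Element("glycan", {"name": name})
    for residue_id, monosaccharide in residues.items():
        ET.SubElement(
            root, "residue", {"id": residue_id, "monosaccharide": monosaccharide}
        )
    for child, parent, linkage_type in linkages:
        ET.SubElement(
            root, "linkage", {"child": child, "parent": parent, "type": linkage_type}
        )
    return ET.tostring(root, encoding="unicode")


def xml_to_glycan(text):
    """Parse the XML produced by glycan_to_xml back into Python structures."""
    root = ET.fromstring(text)
    residues = {
        r.get("id"): r.get("monosaccharide") for r in root.findall("residue")
    }
    linkages = [
        (l.get("child"), l.get("parent"), l.get("type"))
        for l in root.findall("linkage")
    ]
    return root.get("name"), residues, linkages
```

As a usage example, the disaccharide N-acetyllactosamine (Gal beta1-4 GlcNAc) could be written as `glycan_to_xml("LacNAc", {"1": "GlcNAc", "2": "Gal"}, [("2", "1", "beta1-4")])`. Because residues and linkages are stored explicitly rather than as a linear string, branched structures can be described with the same elements.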

Note: Supplementary information is available on the Nature Methods website.

Published online: 21 October 2005.

REFERENCES
  1. Lowe, J.B. & Marth, J.D. A genetic approach to mammalian glycan function. Annu. Rev. Biochem. 72, 643–691 (2003).
  2. Varki, A. et al. Essentials of Glycobiology (Cold Spring Harbor Laboratory Press, New York, 1999).
  3. Taylor, M.E. & Drickamer, K. Introduction to glycobiology (Oxford University Press, Oxford and New York, 2003).
  4. Raman, R., Sasisekharan, V. & Sasisekharan, R. Structural insights into biological roles of protein-glycosaminoglycan interactions. Chem. Biol. 12, 267–277 (2005).
  5. Hwang, H.Y., Olson, S.K., Esko, J.D. & Horvitz, H.R. Caenorhabditis elegans early embryogenesis and vulval morphogenesis require chondroitin biosynthesis. Nature 423, 439–443 (2003).
  6. Inatani, M., Irie, F., Plump, A.S., Tessier-Lavigne, M. & Yamaguchi, Y. Mammalian brain morphogenesis and midline axon guidance require heparan sulfate. Science 302, 1044–1046 (2003).
  7. Lin, X. Functions of heparan sulfate proteoglycans in cell signaling during development. Development 131, 6009–6021 (2004).
  8. Haltiwanger, R.S. & Lowe, J.B. Role of glycosylation in development. Annu. Rev. Biochem. 73, 491–537 (2004).
  9. Cipollo, J.F., Awad, A.M., Costello, C.E. & Hirschberg, C.B. N-glycans of Caenorhabditis elegans are specific to developmental stages. J. Biol. Chem. 280, 26063–26072 (2005).
  10. Sasisekharan, R., Shriver, Z., Venkataraman, G. & Narayanasami, U. Roles of heparan-sulphate glycosaminoglycans in cancer. Nat. Rev. Cancer 2, 521–528 (2002).
  11. Liu, D., Shriver, Z., Venkataraman, G., El Shabrawi, Y. & Sasisekharan, R. Tumor cell surface heparan sulfate as cryptic promoters or inhibitors of tumor growth and metastasis. Proc. Natl. Acad. Sci. USA 99, 568–573 (2002).
  12. Fuster, M.M., Brown, J.R., Wang, L. & Esko, J.D. A disaccharide precursor of sialyl Lewis X inhibits metastatic potential of tumor cells. Cancer Res. 63, 2775–2781 (2003).
  13. Ishida, H. et al. A novel beta1,3-N-acetylglucosaminyltransferase (beta3Gn-T8), which synthesizes poly(N-acetyllactosamine), is dramatically upregulated in colon cancer. FEBS Lett. 579, 71–78 (2005).
  14. Iwai, T. et al. Core 3 synthase is downregulated in colon carcinoma and profoundly suppresses the metastatic potential of carcinoma cells. Proc. Natl. Acad. Sci. USA 102, 4572–4577 (2005).
  15. Dube, D.H. & Bertozzi, C.R. Glycans in cancer and inflammation–potential for therapeutics and diagnostics. Nat. Rev. Drug Discov. 4, 477–488 (2005).
  16. Casu, B., Guerrini, M. & Torri, G. Structural and conformational aspects of the anticoagulant and anti-thrombotic activity of heparin and dermatan sulfate. Curr. Pharm. Des. 10, 939–949 (2004).
  17. Petitou, M. & van Boeckel, C.A. A synthetic antithrombin III binding pentasaccharide is now a drug! What comes next? Angew. Chem. Int. Edn Engl. 43, 3118–3133 (2004).
  18. Shriver, Z., Liu, D. & Sasisekharan, R. Emerging views of heparan sulfate glycosaminoglycan structure/activity relationships modulating dynamic biological functions. Trends Cardiovasc. Med. 12, 71–77 (2002).
  19. Kinjo, Y. et al. Recognition of bacterial glycosphingolipids by natural killer T cells. Nature 434, 520–525 (2005).
  20. Guo, Y. et al. Structural basis for distinct ligand-binding and targeting properties of the receptors DC-SIGN and DC-SIGNR. Nat. Struct. Mol. Biol. 11, 591–598 (2004).
  21. Crocker, P.R. Siglecs in innate immunity. Curr. Opin. Pharmacol. 5, 431–437 (2005).
  22. Rudd, P.M., Wormald, M.R. & Dwek, R.A. Sugar-mediated ligand-receptor interactions in the immune system. Trends Biotechnol. 22, 524–530 (2004).
  23. Rudd, P.M., Elliott, T., Cresswell, P., Wilson, I.A. & Dwek, R.A. Glycosylation and the immune system. Science 291, 2370–2376 (2001).
  24. Collins, B.E. & Paulson, J.C. Cell surface biology mediated by low affinity multivalent protein-glycan interactions. Curr. Opin. Chem. Biol. 8, 617–625 (2004).
  25. Crocker, P.R. Siglecs: sialic-acid-binding immunoglobulin-like lectins in cell-cell interactions and signalling. Curr. Opin. Struct. Biol. 12, 609–615 (2002).
  26. Fry, E.E. et al. The structure and function of a foot-and-mouth disease virus-oligosaccharide receptor complex. EMBO J. 18, 543–554 (1999).
  27. Ganesh, V.K., Smith, S.A., Kotwal, G.J. & Murthy, K.H. Structure of vaccinia complement protein in complex with heparin and potential implications for complement regulation. Proc. Natl. Acad. Sci. USA 101, 8924–8929 (2004).
  28. Liu, J. et al. Characterization of a heparan sulfate octasaccharide that binds to herpes simplex virus type 1 glycoprotein d. J. Biol. Chem. 277, 33456–33467 (2002).
  29. Mahdavi, J. et al. Helicobacter pylori SabA adhesin in persistent infection and chronic inflammation. Science 297, 573–578 (2002).
  30. Miller, S.I., Ernst, R.K. & Bader, M.W. LPS, TLR4 and infectious disease diversity. Nat. Rev. Microbiol. 3, 36–46 (2005).
  31. Shriver, Z., Raguram, S. & Sasisekharan, R. Glycomics: a pathway to a class of new and improved therapeutics. Nat. Rev. Drug Discov. 3, 863–873 (2004).
  32. Sasisekharan, R. & Venkataraman, G. Heparin and heparan sulfate: biosynthesis, structure and function. Curr. Opin. Chem. Biol. 4, 626–631 (2000).
  33. Sugahara, K. et al. Recent advances in the structural biology of chondroitin sulfate and dermatan sulfate. Curr. Opin. Struct. Biol. 13, 612–620 (2003).
  34. Blixt, O. et al. Chemoenzymatic synthesis of 2-azidoethyl-ganglio-oligosaccharides GD3, GT3, GM2, GD2, GT2, GM1, and GD1a. Carbohydr. Res. 340, 1963–1972 (2005).
  35. Hanson, S., Best, M., Bryan, M.C. & Wong, C.H. Chemoenzymatic synthesis of oligosaccharides and glycoproteins. Trends Biochem. Sci. 29, 656–663 (2004).
  36. Homeister, J.W., Daugherty, A. & Lowe, J.B. alpha(1,3)fucosyltransferases FucT-IV and FucT-VII control susceptibility to atherosclerosis in apolipoprotein E-/- mice. Arterioscler. Thromb. Vasc. Biol. 24, 1897–1903 (2004).
  37. Smithson, G. et al. Fuc-TVII is required for T helper 1 and T cytotoxic 1 lymphocyte selectin ligand expression and recruitment in inflammation, and together with Fuc-TIV regulates naive T cell trafficking to lymph nodes. J. Exp. Med. 194, 601–614 (2001).
  38. Martin, L.T., Marth, J.D., Varki, A. & Varki, N.M. Genetically altered mice with different sialyltransferase deficiencies show tissue-specific alterations in sialylation and sialic acid 9-O-acetylation. J. Biol. Chem. 277, 32930–32938 (2002).
  39. Comelli, E.M., Amado, M., Head, S.R. & Paulson, J.C. Custom microarray for glycobiologists: considerations for glycosyltransferase gene expression profiling. Biochem. Soc. Symp. 69, 135–142 (2002).
  40. An, H.J., Peavy, T.R., Hedrick, J.L. & Lebrilla, C.B. Determination of N-glycosylation sites and site heterogeneity in glycoproteins. Anal. Chem. 75, 5628–5637 (2003).
  41. Cipollo, J.F., Costello, C.E. & Hirschberg, C.B. The fine structure of Caenorhabditis elegans N-glycans. J. Biol. Chem. 277, 49143–49157 (2002).
  42. Dell, A. & Morris, H.R. Glycoprotein structure determination by mass spectrometry. Science 291, 2351–2356 (2001).
  43. Morelle, W., Page, A. & Michalski, J.C. Electrospray ionization ion trap mass spectrometry for structural characterization of oligosaccharides derivatized with 2-aminobenzamide. Rapid Commun. Mass Spectrom. 19, 1145–1158 (2005).