Main

Until now, annual influenza outbreaks and epidemics have been caused by the human-adapted H1N1 and H3N2 viruses that circulate in human populations. However, the ability of H5N1, H7N7 and H9N2 avian virus subtypes to infect humans has raised concern over the possibility of a new flu pandemic equal in impact to the 1918 H1N1 Spanish influenza.

The key first step in the infection, transmission and virulence of these viruses is the binding of HA to sialylated glycans on the epithelial cell surface3,4,7,8,9. Transmission from birds to humans is believed to be closely associated with the ability of the HA to switch its preference from α2-3 sialylated glycans (α2-3) to α2-6 sialylated glycans (α2-6), which are extensively expressed in the human upper respiratory epithelia1,8,10. Crystal structures of HAs from H1 (human and swine), H3 (avian and human) and H5 (avian) and their complexes with α2-3 and/or α2-6 oligosaccharides have provided molecular insights into key residues involved in specific HA-glycan interactions2,3,11,12,13,14,15 and glycan arrays have been used to investigate the glycan binding specificity of wild-type and mutant H1, H3 and H5 HAs16,17. The relationship between the HA glycan binding specificity and transmission efficiency has been demonstrated in ferrets, using the highly pathogenic and virulent 1918 H1N1 viruses18. The reliability of this animal model for studying human transmission is attributed to the predominance of human-like α2-6 in the upper respiratory tract epithelium of ferrets6,9. Switching the receptor binding specificity of the highly transmissible and pathogenic human H1N1 (A/South Carolina/1/18; SC18) virus from α2-6 to α2-3 has produced a virus (AV18) not transmissible in the ferrets18. Although A/New York/1/18 (NY18) H1N1 virus, which shows mixed α2-3/α2-6 binding, does not transmit efficiently, the A/Texas/36/91 (Tx91) H1N1 strain—that also binds to both α2-3 and α2-6—transmits efficiently18.

These confounding results with respect to the correlation between glycan binding specificity of HA and transmissibility of the viruses led us to pose several questions. First, how diverse are the sialylated glycan receptors in the human upper respiratory tissues and could this diversity account for the specificity in the tissue tropism of the virus? Second, are there nuances of glycan conformation that might play a role in α2-3 and/or α2-6 binding specificity of HA? Taken together, what are the glycan binding requirements of HA, beyond binding to sialylated glycans with a specific linkage, necessary for human adaptation? Currently, avian H5N1 viruses have α2-3 glycan specificity and also show some α2-6 binding19,20, but have not yet transmitted between humans. Answers to these questions would aid the understanding of human adaptation of these viruses and facilitate effective surveillance of the evolution of H5N1 into a potentially pandemic human virus. To address these issues, we developed a framework comprising four complementary analyses to investigate binding of human-adapted HA to α2-3 and α2-6. Integration of these complementary analyses led to the identification of the necessary glycan determinant for human adaptation of HA.

The first of the four complementary analyses aimed to delineate the diversity of sialylated glycans in the human upper respiratory tissues to answer, in part, a long-standing question on the different types of α2-6 glycans expressed in these tissues. To elaborate the diversity of α2-6, which predominate in the upper respiratory tissues8, human tracheal tissue sections were costained with concanavalin A (ConA)/Jacalin and SNA-I/Jacalin (Fig. 1a). The glycan binding specificity of these lectins has been previously characterized using glycan arrays. As SNA-I binds to diverse α2-6, its binding pattern indicated the distribution of α2-6. Jacalin, which specifically binds to -Galβ1-3GalNAcα- and -GlcNAcβ1-3GalNAcα- motifs, characteristic of O-linked glycans, predominantly stained goblet cells. On the other hand, ConA, which specifically binds to mannose commonly found in high mannose, hybrid and complex type N-linked glycans, predominantly stained ciliated cells. The lectin binding patterns in Figure 1a demonstrated a wide distribution of N-linked α2-6, compared with the more localized distribution of O-linked α2-6 on the apical side of the human tracheal epithelium. Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) profiling of N-linked sialylated glycans of a representative upper respiratory epithelial cell line was performed using techniques that were specifically modified to better suit the analysis of these complex acidic glycans. The results showed a substantial diversity (Fig. 1b), as well as predominant expression of α2-6 (Supplementary Fig. 1 online), in the human upper respiratory epithelium. Furthermore, desialylation and fragmentation of representative mass peaks using MALDI tandem time of flight (TOF-TOF) (Fig. 1c,d) demonstrated the presence of long oligosaccharide branches with multiple lactosamine repeats.

Figure 1: Glycan diversity in human upper respiratory tissues.

(a) Costaining of tracheal tissue sections with ConA (red)/Jacalin (green) and SNA-I (red)/Jacalin (green). The localized regions of Jacalin binding correspond to goblet cells expressing O-linked glycans and the regions of ConA binding correspond to ciliated cells expressing N-linked glycans (white arrow) on the apical side of the tracheal epithelium. The extensive binding of SNA-I to both goblet cells (costained with Jacalin in yellow) and ciliated cells indicates predominant expression of O-linked and N-linked α2-6 on the apical side. (b) MALDI-MS glycan profile of human bronchial epithelial (HBE) cells using graphical representation (without explicit linkage assignment) of possible sialylated glycan structures that satisfy the mass peaks (within ± 3.5 Daltons). HBEs predominantly express α2-6 (in comparison with α2-3) sialylated glycans (Supplementary Fig. 1). (c) Desialylation using Sialidase A and subsequent 2-AB labeling of the N-linked glycans observed in b to deconvolute the branching pattern from the number of sialic acids. The peaks highlighted in cyan in b and c were further analyzed using TOF-TOF MS. (d) The MS-MS profile of a representative peak at m/z 2148 shows critical fragment ions at m/z 548 and 713 and their corresponding counter ions (shown in red) that support the long oligosaccharide branch (with multiple lactosamine repeats) over multiple short lactosamine branches. The MS-MS profile of m/z 2660 also supports a long oligosaccharide branch (data not shown). Glycans are represented using the graphical nomenclature adopted by the Consortium for Functional Glycomics (CFG).
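The mass-matching step mentioned in the caption for panel b (candidate structures that satisfy a peak within ± 3.5 Da) can be illustrated with a short Python sketch. This is not the authors' software. It assumes average residue masses for free, underivatized glycans and ignores the ion type (for example, loss of a proton in negative mode); the candidate compositions and the observed mass used in the example are hypothetical.

```python
# Average residue masses (Da) of common N-glycan building blocks.
# Assumption: neutral, underivatized glycans; ion adducts are ignored.
AVG_RESIDUE_MASS = {
    "Hex": 162.1406,     # hexose (Gal, Man)
    "HexNAc": 203.1925,  # N-acetylhexosamine (GlcNAc)
    "Fuc": 146.1412,     # fucose (deoxyhexose)
    "Neu5Ac": 291.2546,  # N-acetylneuraminic acid
}
WATER = 18.0153  # one water for the free reducing end


def composition_mass(composition):
    """Average mass of a glycan given residue counts, e.g. {"Hex": 5, ...}."""
    return sum(AVG_RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


def matching_compositions(observed_mass, candidates, tolerance=3.5):
    """Names of candidate compositions within +/- tolerance Da of a peak."""
    return [
        name
        for name, comp in candidates.items()
        if abs(composition_mass(comp) - observed_mass) <= tolerance
    ]


# Hypothetical candidate list and hypothetical observed mass.
CANDIDATES = {
    "Hex5HexNAc4Neu5Ac2": {"Hex": 5, "HexNAc": 4, "Neu5Ac": 2},
    "Hex5HexNAc4Fuc1Neu5Ac2": {"Hex": 5, "HexNAc": 4, "Fuc": 1, "Neu5Ac": 2},
    "Hex6HexNAc5Neu5Ac2": {"Hex": 6, "HexNAc": 5, "Neu5Ac": 2},
}
print(matching_compositions(2225.0, CANDIDATES))
```

Because several isomeric structures share one composition, a match of this kind only narrows the possibilities, which is why the caption speaks of possible structures without explicit linkage assignment.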

The observation of extensive diversity of the glycan structures in the human upper respiratory tissues prompted the second complementary analysis, involving examination of the conformational features of these diverse sialylated glycans and their role in HA-glycan interactions. Analyses of all the HA-glycan cocrystal structures indicated that the orientation of the Neu5Ac sugar is fixed relative to the HA glycan binding site. A highly conserved set of amino acids Y98, S/T136, W153, H183, L/I194 (numbered based on H3 HA) across different HA subtypes are involved in anchoring the Neu5Ac sugar. The specificity of HA to α2-3 or α2-6 is therefore governed by interactions of the HA glycan binding site with the glycosidic oxygen atom and sugars besides Neu5Ac.

The conformation of the Neu5Acα2-3Gal linkage is such that the Gal and sugars beyond Gal (at the reducing end, and in both linear and branched α2-3) occupy a cone-like region of space. Hence, we define it as a cone-like glycan topology (Fig. 2). In the cone-like topology, HA interactions with the glycans primarily involve contacts with Neu5Ac and Gal sugars in a three-sugar (or trisaccharide) α2-3 motif (Neu5Acα2-3Galβ1-3/4GlcNAc-) and this observation is corroborated by the co-crystal structures2,3,12,13. In addition to the two critical amino acids, E190 and Q226 (Fig. 2), the contacts with the α2-3 motif and substitutions on this motif, such as sulfation and fucosylation, appear to involve key amino acid positions (Fig. 2). The variability of the amino acids in these positions potentially accounts for the differential binding specificity of HA to the diverse α2-3 sialylated glycans (see below). Compared with the Neu5Acα2-3Gal linkage, the presence of the C6-C5 bond provides additional glycan conformational flexibility to the Neu5Acα2-6Gal linkage. This enables the α2-6 motif to adopt a cone-like topology, as well as the potential to span a wider region of space analogous to the opening of an umbrella. We thus define the latter conformation—which spans a wider region on the HA surface than the cone-like topology—as an umbrella-like glycan topology (Fig. 2). In contrast to the cone-like topology, the length of the oligosaccharide and its degree of branching beyond a trisaccharide critically influence HA binding contacts in the umbrella-like topology. Therefore, the amino acids involved in interactions with the umbrella-like topology (key numbered positions shown in Fig. 2) are not conserved across the human-adapted H1 and H3 HAs. Depending upon the HA subtype, a combination of amino acids at these positions are involved in interacting with a long α2-6. This observation is corroborated by the cocrystal structures of H1 and H3 with α2-6 oligosaccharides3,11,14. 
Therefore, whereas the cone-like topology is characteristic of α2-3 as well as short α2-6 glycans such as single lactosamine branches, the umbrella-like topology is unique to α2-6 (Fig. 2) and is typically adopted by long glycans with multiple repeating lactosamine units.

Figure 2: Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans.

The topology of α2-3 and α2-6 is governed by the glycosidic torsion angles of the trisaccharide motifs Neu5Acα2-3Galβ1-3/4GlcNAc and Neu5Acα2-6Galβ1-4GlcNAc, respectively (Supplementary Fig. 3 online). A parameter (θ), the angle between the C2 atom of Neu5Ac and C1 atoms of the subsequent Gal and GlcNAc sugars in these trisaccharide motifs, characterizes the topology. Superimposition of the θ contour and the conformational maps of the α2-3 and α2-6 motifs showed that the α2-3 motif adopted exclusively the cone-like topology, whereas the α2-6 motif sampled both the cone-like and umbrella-like topologies (Supplementary Fig. 3). In the cone-like topology sampled by α2-3 and α2-6, the GlcNAc and subsequent sugars are positioned along a region spanning a cone. The interactions of HA with the cone-like topology primarily involve contacts of amino acids at the numbered positions (based on H3 HA numbering) with Neu5Ac and Gal sugars. On the other hand, in the umbrella-like topology, which is unique to α2-6, the GlcNAc and subsequent sugars bend toward the HA binding site (as observed in HA-α2-6 cocrystal structures). Longer α2-6 oligosaccharides (at least a tetrasaccharide) would favor this conformation as it is stabilized by intra-sugar van der Waals contact between acetyl groups of GlcNAc and Neu5Ac. HA interactions with sialylated glycans with the umbrella-like topology involve contacts of amino acids at the numbered positions (based on H3 HA numbering) with GlcNAc and subsequent sugars in addition to contacts with Neu5Ac and Gal sugars.
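A minimal Python sketch of how the θ parameter could be computed from atomic coordinates follows. It assumes that the vertex of the angle is the C1 atom of Gal, which the caption does not state explicitly, and it deliberately applies no cone or umbrella cutoff because none is given in the text. The function name and the coordinates are illustrative only.

```python
import math


def theta_degrees(neu5ac_c2, gal_c1, glcnac_c1):
    """Angle (degrees) defined by Neu5Ac C2, Gal C1 and GlcNAc C1.

    Assumption: the angle is measured at the middle atom (Gal C1).
    Each argument is an (x, y, z) coordinate, e.g. taken from a PDB file.
    """
    v1 = [a - b for a, b in zip(neu5ac_c2, gal_c1)]
    v2 = [a - b for a, b in zip(glcnac_c1, gal_c1)]
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("atoms must not coincide")
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (norm1 * norm2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Made-up coordinates (in angstroms), for illustration only.
print(theta_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

Evaluating such a function over the sampled glycosidic torsion angles would give the kind of θ contour that the caption describes being superimposed on the conformational maps.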

Interrogation of the HA-glycan cocrystal structures highlights the fact that the human-adapted H1 and H3 HAs have mutated from their presumed avian counterparts to gain additional contacts with α2-6 in the umbrella-like topology. Defining HA-glycan interactions based on trans and cis conformations (adopted by α2-3 and α2-6 linkages, respectively) is inadequate as this does not fully capture the structural features and conformational flexibility of the diverse sialylated glycans observed in human tissues. In contrast, the cone-like and umbrella-like classifications for HA binding represent the full extent of the structural diversity and conformational plurality of sialylated glycans and are able to distinguish the α2-3 and α2-6 binding of avian HAs from that of the α2-6 binding of human-adapted H1 and H3 HAs.

The requirements for HA binding to the cone-like and umbrella-like topologies were further corroborated using the third complementary data mining approach, involving analysis of the extensive glycan array data available for H1, H3 and H5 HAs2,16. The data mining analysis provided correlations between glycan features (Fig. 3a) abstracted from the glycans on the array and the HA binding to these glycans. These correlations are given as rules or classifiers (Fig. 3b) that verify the above structural constraints. Consistent with these observations, the distinct α2-3 classifiers (Fig. 3b) indicated that variations around the trisaccharide α2-3 motif primarily influence the differential α2-3 binding of H1, H3 and H5 HAs. On the other hand, the length-dependent α2-6 classifiers (Fig. 3b) support the critical role of oligosaccharide length in HA binding to α2-6 in the umbrella-like topology. The α2-6 classifier common to the human-adapted H1 and H3 HAs is consistent with its gain in ability to bind long α2-6. Although the glycan binding of wild-type and mutant H5N1 HAs is not supported by the long α2-6 classifier, it is consistent with both α2-3 and short α2-6 classifiers (Fig. 3b).

Figure 3: Data mining analysis of HA binding glycan array data.

(a) Examples of the types of glycan features (e.g., pairs, triplets and quadruplets) abstracted from a representative complex glycan structure. A comprehensive set of these features was abstracted from the glycans in the glycan array. (b) Graphical representations of the complex classifier rules (Supplementary Table 1) for each HA analyzed using the glycan array. The α2-3 Type A represents broadest specificity, whereas the Type B and Type C classifiers represent constraints imposed by structural variations around the trisaccharide α2-3 motif. The α2-6 Type A represents binding to long α2-6, whereas Type B represents binding to short α2-6 (linear or branched). The core corresponds to either the spacer attached to the reducing end or the trimannosyl core in case of the single α2-6 biantennary glycan on the array. ᵃBinding signals observed for fucosylated α2-3 motif only if it has GlcNAc[6S]. ᵇBinding signals observed only for 6′-sialyl lactose. ᶜBinding signals also observed for short 6′-sialyl lactosamine (Type B). ᵈBinding signals are significantly lower than α2-3 Type B of H5N1 double mutant. ᵉBinding signals observed only for short α2-6 with GlcNAc[6S]. ᶠBinding signals just above background observed for α2-3 motif with GlcNAc[6S]. *The origin of A/Vietnam/1203/04 is avian but this viral strain was isolated from an infected human.

The final complementary analysis involved corroborating the above findings by establishing the tissue tropism and binding specificity of human-adapted HA to upper respiratory tissue glycans through investigating human tissue binding and direct glycan binding of recombinant HAs. The HAs from the hallmark pandemic SC18 and the human vaccine strain A/Moscow/10/99 H3N2 (Mos99) were chosen as representative human-adapted H1 and H3 candidates. SC18 and Mos99 HAs showed distinct binding patterns to human upper respiratory (tracheal) and deep lung (alveolar) tissues (Fig. 4a). More importantly, both H1 and H3 HA showed substantial binding to the apical side of tracheal tissue sections. As noted earlier (Fig. 1), long-branch α2-6 are predominantly expressed on the apical side of the upper respiratory epithelia. The α2-6 binding specificity of the human-adapted HAs was verified using dose-dependent direct binding of HA to defined α2-3 and α2-6 oligosaccharides (Fig. 4a). The binding of SC18 and Mos99 HAs over a range of concentrations to 6′SLN-LN indicated their high-affinity binding to the long α2-6 oligosaccharide. In addition to the long α2-6, Mos99 also bound with high affinity to the short α2-6 oligosaccharide (6′SLN) and at a relatively lower affinity to the long α2-3 oligosaccharide (3′SLN-LN-LN).

Figure 4: Glycan binding specificity of SC18 and Mos99 HAs, as well as H5N1 viruses.

(a) Both SC18 and Mos99 HAs show substantial and preferential binding to the apical side of the tracheal tissue (green, as against propidium iodide staining in red) compared with the alveolar tissue sections, although binding of SC18 is more restricted than that of Mos99. The sialic acid–specific binding of HA is demonstrated by a substantial reduction in binding after pretreatment of tissues with Sialidase A (Supplementary Fig. 4 online). Moreover, competition experiments involving SC18 HA and SNA-I showed dramatic reductions in SC18 HA binding to tracheal sections (data not shown). Furthermore, AV18 HA does not bind to tracheal sections (data not shown). The binding specificity of SC18 and Mos99 HAs was demonstrated with dose-dependent direct binding to defined α2-3 and α2-6 oligosaccharides (shown on the right). The characteristic binding pattern of SC18 and Mos99 HAs is their binding at saturating levels to the long α2-6 (6′SLN-LN) over a range of HA dilution from 40 to 5 μg/ml. The narrow specificity of SC18 HA correlates with its restrictive tracheal tissue binding. On the other hand, the ability of Mos99 HA to bind to diverse sialylated glycans is consistent with its more extensive binding to tracheal tissue sections (as compared with SC18). The high affinity binding of SC18 and Mos99 HAs to long α2-6 was confirmed using another human-adapted H3N2 (A/Wyoming/3/03) HA (Supplementary Fig. 5 and Supplementary Methods online). (b) In contrast to the dose-dependent binding profile of the human-adapted SC18 and Mos99 HAs, Viet0304 and HK486 H5N1 viruses bind with high affinity to the α2-3 and minimal affinity to the long α2-6 oligosaccharide. The dose-dependent direct binding of the whole virus corroborates the binding of soluble HA protein, and the deep lung tissue tropism of the H5N1 HA is consistent with its α2-3 binding specificity (Supplementary Fig. 6 and Supplementary Methods online). HAU, virus titer in hemagglutinating units.

Taken together, the above findings show that the human-adapted HAs bind specifically to the long α2-6 (in the umbrella-like topology), which are predominantly expressed in the human upper respiratory tissues. The α2-3 binding of Mos99 further suggests that a switch in the glycan binding preference (that is, loss of ability to bind α2-3) is not a necessary determinant for human adaptation of HA. Tx91, also a mixed α2-6/α2-3 binding virus that shows HA binding to long α2-6 (Fig. 3b), is capable of efficient transmission18. On the other hand, NY18, another mixed α2-3/α2-6 binding virus that does not have HA binding specificity to long α2-6 (Fig. 3b), is not transmitted efficiently18. The efficient human adaptation of these viruses is, therefore, correlated with HA binding to sialylated glycans of a characteristic umbrella-like topology, going beyond the specific α2-3 or α2-6 linkage.

Understanding the glycan binding specificity of influenza A virus HA is critical for surveillance of the evolution of highly pathogenic strains, such as H5N1, which threaten to gain a foothold in the human population. Various strains of the highly pathogenic α2-3-specific H5N1 viruses show mixed α2-3/α2-6 binding5,20 and yet these viruses have shown inefficient transmission in ferrets5. However, dose-dependent direct binding studies of representative H5N1 viruses—A/Vietnam/1203/04 (Viet0304) and A/Hong Kong/486/97 (HK486)—showed minimal binding affinity to long α2-6 (Fig. 4b). The observed initiation of H5N1 infection in tissues predominantly expressing α2-6 (ref. 21) could possibly be explained by the α2-6 binding at high viral loads. Although additional factors such as the neuraminidase and other viral gene products may play a role in H5N1 viral transmission, a necessary condition for the human adaptation of its HA is to acquire mutations that would provide binding specificity to long α2-6. The glycan topology could also play an important role in the balance between the α2-6 binding of HA and cleavage properties of neuraminidase22,23. Thus, the identification of the α2-6 sialylated glycans present on lung tissue can now be used to analyze any correlation between glycan topology and neuraminidase stalk length in virulence during human-to-human transmission.

Three related issues have confounded the interpretation of glycan receptor specificity leading to human adaptation of HA. The first arises from the association of 'human-like receptors' exclusively with the Neu5Acα2-6Gal linkage, as evidenced by using short α2-6 oligosaccharides 6′SL and 6′SLN to define glycan binding specificity of the H5N1 viruses5,20. These and other earlier studies have predominantly focused on α2-3 and α2-6 linkages, as well as a switch in the linkage preference. However, as both α2-3 (short, long, linear and branched) and short α2-6 adopt the cone-like topology, as opposed to long α2-6 (which adopts the characteristic umbrella-like topology), glycan topology, and not just linkage per se, is the key determinant for human adaptation of HAs. The second issue deals with the binding specificity of HA-glycan interactions. Many HA glycan array experiments17 have been performed at a relatively high HA concentration, without a dose-dependent relationship to establish specificity. Our dose-dependent binding studies enabled investigation of the HA-glycan binding affinities and thereby demonstrate the specificity of human-adapted HAs for binding long α2-6 (Fig. 4a). These studies can be further extended to derive parameters such as binding affinity constants to quantify the relative glycan binding affinities of different HAs. The two representative H5N1 viruses showed binding signals for long α2-6 at the highest viral concentrations. However, the binding affinity for long α2-6 is minimal compared to the high affinity for α2-3 over the entire viral concentration range (Fig. 4b). Interpretation of the glycan binding data at the highest concentration alone would have led to an erroneous conclusion that these viruses have acquired binding to human-like receptors. The third issue concerns the use of chicken red blood cell (cRBC) agglutination assays. Desialylated and α2-6 resialylated cRBC have been extensively used in agglutination assays to study α2-6 binding specificity of wild-type and mutant viruses18,24,25. The MALDI-MS glycan profile of cRBC (Supplementary Fig. 2 online) shows limited abundance of N-linked α2-6 with long branches. Thus, α2-6 resialylation of these cRBC is unlikely to provide the required human-like receptors defined in the approach presented in this study.

These findings suggest that using glycan arrays with long α2-6 will be valuable for surveillance of the evolution of human adaptation of influenza A viral subtypes. Specifically, this can be achieved either by expanding the diversity of the glycans to include upper respiratory tissue–specific glycans on the current glycan microarrays, or through the development of a glycan array that includes representative long α2-6 structures based on those observed in the upper respiratory tissues. Furthermore, these arrays need to be developed for rapid dose-dependent analyses of HA binding to establish specificity to long α2-6 structures. Adaptation studies with the avian viruses using appropriate long α2-6 structures might provide the needed selection pressure for human adaptation. The ability of H5N1 to bind cone-like topology glycans with high specificity may explain why selection pressure strategies based just on glycan linkages have been unsuccessful. Finally, a recent study reported that the mutations in the glycan binding site of HA (leading to human receptor binding) could facilitate the generation of antibodies with better neutralizing sensitivity24. A sufficient understanding of the avian H5N1 HA mutations leading to long α2-6 binding specificity offers an opportunity for intervention through vaccine development to negate the eventuality of a H5N1 pandemic.

Methods

Lectin staining of human upper respiratory tissues.

Normal human trachea tissue sections (US Biological) were deparaffinized and rehydrated, followed by blocking endogenous biotin using the streptavidin/biotin blocking kit (Vector Labs). Sections were then incubated with FITC-labeled Jacalin, biotinylated ConA and biotinylated Sambucus nigra agglutinin (SNA-I) (Vector Labs; 10 μg/ml in PBS with 0.05% Tween-20) for 3 h. After washing with TBST (Tris-buffered saline with 0.1% Tween-20), the sections were incubated with Alexa Fluor 546 streptavidin (Invitrogen) for 1 h. Slides were washed with TBST and viewed under a confocal microscope (Zeiss LSM510 laser scanning confocal microscope). All incubations were performed at 22 °C. The glycan array data for the plant lectins can be accessed from the Consortium for Functional Glycomics (CFG) web site, http://www.functionalglycomics.org/glycomics/publicdata/primaryscreen.jsp by selecting Plant lectins in the “Analyte Category.”

MALDI-MS and TOF-TOF MS analysis of N-linked glycans.

Human bronchial epithelial (HBE) cells were chosen as representative upper respiratory ciliated epithelial cells based on the extensive attachment of human-adapted H1N1 and H3N2 viruses to these cells1. These cells were harvested at >90% confluency with 100 mM citrate saline buffer and the cell membrane was isolated by homogenization. Washed and pooled RBCs (Rockland Immunochemicals) were lysed by resuspending in deionized water for 15 min and the cell membrane was isolated as described above. The cell membrane fractions were treated with PNGaseF (New England Biolabs) and the reaction mixture was incubated overnight at 37 °C. The reaction mixture was boiled for 10 min to deactivate the enzyme and the deglycosylated peptides and proteins were removed using a Sep-Pak C18 SPE cartridge (Waters). The glycans were further desalted and purified into neutral (25% acetonitrile fraction) and acidic (50% acetonitrile containing 0.05% trifluoroacetic acid) fractions using graphitized carbon solid-phase extraction columns (Supelco) and lyophilized.

The neutral and acidic fractions were analyzed by MALDI-TOF MS in positive and negative linear modes, respectively, with soft ionization conditions (accelerating voltage 22 kV, grid voltage 93%, guide wire 0.3% and extraction delay time of 150 ns). Matrices used were as follows: 10 mg/ml 6-aza-thiothymine in ethanol spotted on perfluorinated Nafion resin for acidic glycans and 40 mg/ml 2,5-dihydroxybenzoic acid in ethanol for neutral glycans. The peaks were calibrated as nonsodiated species using external glycan standards. The predominant expression of α2-6 sialylated glycans was confirmed by treatment of samples using Sialidase A and S. Briefly, the isolated glycans were incubated with 0.1 U of Arthrobacter ureafaciens sialidase (Sialidase A, cleaves both α2-3 and α2-6 sialic acid linkages) or Streptococcus pneumoniae sialidase (Sialidase S, specifically cleaves α2-3 sialic acid linkage) in a final volume of 100 μl of 50 mM sodium phosphate, pH 6.0, at 37 °C for 14 h. The glycans were purified as before and the sialidase-treated glycans were analyzed by MALDI-TOF MS.
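The reasoning behind the two-sialidase comparison can be written out as a small Python sketch. It relies only on the enzyme specificities stated above (Sialidase A removes both α2-3 and α2-6 sialic acids, whereas Sialidase S removes only α2-3). The function name and the boolean inputs are hypothetical simplifications; real spectra would be compared by peak shifts and intensities rather than yes/no flags.

```python
def infer_sialic_acid_linkage(removed_by_sialidase_a, removed_by_sialidase_s):
    """Interpret whether a sialylated peak disappears after each treatment.

    Sialidase A cleaves alpha2-3 and alpha2-6; Sialidase S cleaves alpha2-3 only.
    """
    if removed_by_sialidase_s and not removed_by_sialidase_a:
        # Sialidase A should remove anything that Sialidase S removes.
        return "inconsistent"
    if removed_by_sialidase_s:
        return "α2-3 present"
    if removed_by_sialidase_a:
        return "α2-6 only"
    return "no sialidase-sensitive sialic acid"


# A peak that survives Sialidase S but is lost after Sialidase A
# is interpreted as carrying only α2-6-linked sialic acid.
print(infer_sialic_acid_linkage(True, False))
```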

Sialidase A–treated glycans were 2-aminobenzamide (2-AB) derivatized, as described previously26, purified27 and lyophilized. MS/MS mass spectra were acquired on an Applied Biosystems 4700 TOF/TOF Proteomics analyzer, equipped with delayed extraction and a UV laser (355 nm) controlled by AB 4700 Data Explorer software. 1 μl of a 5 mg/ml solution of α-cyano-4-hydroxycinnamic acid prepared in acetonitrile/water mixture (60:40, vol/vol) was seeded on a standard 192-well stainless steel MALDI sample plate and allowed to air dry. Subsequently, 1 μl of 2-AB–labeled glycan sample was spotted on the matrix and allowed to dry. Typically, 1,000–5,000 shots/spectrum were collected in the positive reflector mode for MS/MS spectra. CID mode (with air as the collision gas) with a chamber pressure of 1–6 × 10⁻⁶ Torr and acceleration voltage of 2 kV was used for glycan fragmentation and internal peptide standards were used for calibration.

Cloning, baculovirus synthesis, expression and purification of HA.

The soluble form of HA was expressed using the Baculovirus Expression Vector System (BEVS). H5N1 (A/Vietnam/1203/2004; Viet0304) baculovirus was created from a pAcGP67-Viet04-HA construct2 using Baculogold system (BD Biosciences) according to manufacturer's instructions. The crystal structure of the BEVS expressed uncleaved HA0 from SC18 (ref. 15) is identical to the structure of the SC18 HA isolated from the whole virus after bromelain treatment14. Studies have also shown that glycan binding specificity of BEVS-expressed HAs is in agreement with that of whole viruses17. H1, H3 and H5 baculoviruses were used to infect 500 ml suspension cultures of Sf9 cells (Invitrogen) cultured in Sf900 II SFM medium (Invitrogen). The infection was monitored and the cells were harvested 3–4 d after infection. The soluble form of HA was purified from the supernatant of the infected cells using the protocol described previously15. Briefly, the supernatant was concentrated using Centricon Plus-70 centrifugal filters (Millipore) and the soluble HA was recovered from the concentrated cell supernatant by performing affinity chromatography using Ni-NTA beads (Qiagen). Eluting fractions containing HA were pooled and dialyzed overnight with a 10 mM Tris-HCl, 50 mM NaCl buffer (pH 8.0). Subsequently, ion exchange chromatography was performed on the dialyzed samples using a Mono-Q HR10/10 column (GE Healthcare). The fractions containing HA were pooled together and subjected to ultrafiltration using Amicon Ultra 100 K NMWL membrane filters (Millipore). The protein was concentrated and reconstituted in PBS. The purified protein was quantified using Bio-Rad's protein assay (Bio-Rad).

Binding of H1, H3 and H5 HA to human lung tissues.

Normal human trachea (US Biological) and lung (US Biomax) tissue sections were deparaffinized, rehydrated and incubated with 1% BSA in PBS for 30 min to prevent nonspecific binding. SC18 and Mos99 were precomplexed with primary antibody (mouse anti-6 × His tag, Abcam) and secondary antibody (Alexa fluor 488 goat anti-mouse, Invitrogen) in a ratio of 4:2:1, respectively, for 20 min on ice. The tissue binding was performed over different HA concentrations by diluting the precomplexed HA in 1% BSA-PBS. Tissue sections were then incubated with the HA-antibody complexes for 3 h at 22 °C. Sections were counterstained with propidium iodide (Invitrogen; 1:100 in TBST), washed extensively and then viewed under a confocal microscope (Zeiss LSM510 laser scanning confocal microscopy). In the case of sialidase pretreatment, tissue sections were incubated with 0.2 units of Sialidase A (recombinant from Arthrobacter ureafaciens, Prozyme) for 3 h at 37 °C before incubation with the proteins.

Dose-dependent direct binding of H1, H3 HA and H5 viruses.

A streptavidin-coated high binding capacity 384-well plate (Pierce) was rinsed with PBS and each well was incubated with 50 μl of 2.4 μM solution of biotinylated glycans (3′SLN, 6′SLN, 3′SLN-LN, 6′SLN-LN and 3′SLN-LN-LN) in PBS overnight at 4 °C. LN corresponds to lactosamine (Galβ1-4GlcNAc) and 3′SLN and 6′SLN, respectively, correspond to Neu5Acα2-3 and Neu5Acα2-6 linked to LN. These glycans were obtained from the Consortium for Functional Glycomics (CFG). The plate was subsequently washed with PBS to remove excess glycan and used without further processing. Appropriate amounts of His-tagged HA protein, primary (mouse anti-6 × His tag IgG, Abcam) and secondary (HRP-conjugated goat anti-mouse IgG, Santa Cruz Biotechnology) antibodies were mixed in the ratio 4:2:1 and incubated on ice for 20 min. The highest HA concentration of 40 μg/ml was chosen based on typical concentrations used in CFG glycan array analysis17. The mixture (precomplexed HA) was made up to a final volume of 250 μl with 1% BSA in PBS buffer and 50 μl of precomplexed HA was added to each glycan-coated well and incubated at 22 °C for 2 h. The wells were extensively washed with PBS containing 0.05% Tween-20 followed by washes with PBS.
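The quantities in this protocol lend themselves to a short worked calculation, sketched here in Python. The glycan load per well follows directly from the stated volume and concentration. The twofold dilution step is an assumption: the text gives only the endpoints of the HA range (40 to 5 μg/ml), not the dilution factor. The function names are illustrative.

```python
def glycan_pmol_per_well(volume_ul, concentration_um):
    """Amount of glycan offered per well: microliters x micromolar = picomoles."""
    return volume_ul * concentration_um


def serial_dilution(start, lowest, factor=2.0):
    """Concentrations from `start` down to `lowest`, dividing by `factor`.

    Assumption: a twofold series; only the 40 and 5 ug/ml endpoints are stated.
    """
    series = []
    concentration = start
    while concentration >= lowest - 1e-9:
        series.append(concentration)
        concentration /= factor
    return series


# 50 ul of a 2.4 uM glycan solution corresponds to about 120 pmol per well.
print(glycan_pmol_per_well(50, 2.4))
# HA concentrations (ug/ml) under the assumed twofold series.
print(serial_dilution(40.0, 5.0))
# A 250 ul precomplexed mixture provides five 50 ul aliquots.
print(250 // 50)
```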

Virus stocks (from CDC) were propagated in the allantoic cavity of 10-d-old embryonated hens' eggs at 37 °C. The allantoic fluids were harvested 24 h after inoculation and inactivated by treatment with β-propiolactone (BPL; 1/1,000) for 3 d at 4 °C. Virus binding to the glycan-coated wells was performed as described20 by adding an appropriate amount of virus to each well after diluting in PBS containing 1% BSA and incubating overnight at 4 °C. After rinsing excess virus with PBS containing 0.05% Tween-20 and PBS, the wells were incubated with antibody against the virus (ferret-anti-influenza A raised against A/Hong Kong/213/03 H5N1) for 5 h at 4 °C. After extensive washing of the antibody, the plate was incubated with HRP-linked goat-anti-ferret antibody (Rockland Immunochemicals) for 2 h at 4 °C. After extensive washes with PBS containing 0.05% Tween-20 and PBS, in all cases, HRP activity was estimated using Amplex Red Peroxidase Assay Kit (Invitrogen) according to manufacturer's instructions. Appropriate negative controls were included and all assays were done in triplicate.

Data mining analysis of glycan microarray data.

The data from glycan microarray screening of H1, H3 and H5 HA were obtained from the Consortium for Functional Glycomics (CFG) web site, http://www.functionalglycomics.org/glycomics/publicdata/primaryscreen.jsp. From this web page, the “Other” option in the “Analyte category” was selected to obtain a list of microbial lectins screened on the glycan array. The data for all influenza virus HAs were obtained by sorting this list by investigator and searching the sorted list using “Jim Paulson” as the investigator name. The principle and methodologies for data mining analysis are well established in many fields and involve two main aspects: feature abstraction and classification. A variety of features, including all possible di-, tri- and tetrasaccharide combinations, present in each glycan in the glycan array were abstracted (Fig. 3a). The rationale behind choosing these features is based on the binding of di-, tri- or tetrasaccharides to the glycan binding site of HA. The final data set has features from the glycans, as well as the binding signals for each of the HAs screened on the array. The rule-induction classification method was used. One of the main advantages of this method is that it generates IF-THEN rules, which can be interpreted more easily when compared to the other statistical or mathematical methods (Supplementary Table 1 online).
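The two aspects described above, feature abstraction and IF-THEN classification, can be illustrated with a toy Python sketch. It is not the rule-induction method or the classifiers of Supplementary Table 1. It assumes that each glycan is reduced to a linear chain of residue-plus-linkage tokens (branching is ignored), that lactosamine repeats are joined by a β1-3 linkage, and that a hand-written rule stands in for an induced one. All names are hypothetical.

```python
def abstract_features(chain, sizes=(2, 3, 4)):
    """Collect all contiguous pairs, triplets and quadruplets of a linear chain."""
    features = set()
    for n in sizes:
        for i in range(len(chain) - n + 1):
            features.add(tuple(chain[i:i + n]))
    return features


# Quadruplet used here as a stand-in marker for a long α2-6 glycan.
LONG_A26 = ("Neu5Acα2-6", "Galβ1-4", "GlcNAcβ1-3", "Galβ1-4")


def toy_rule(chain):
    """Hand-written IF-THEN rule (illustrative, not an induced classifier)."""
    features = abstract_features(chain)
    if LONG_A26 in features:
        return "long α2-6"
    if ("Neu5Acα2-6", "Galβ1-4") in features:
        return "short α2-6"
    if any(f[0] == "Neu5Acα2-3" for f in features):
        return "α2-3"
    return "other"


# Linear encodings of three of the glycans used in the direct binding assay.
SIX_SLN_LN = ["Neu5Acα2-6", "Galβ1-4", "GlcNAcβ1-3", "Galβ1-4", "GlcNAc"]
SIX_SLN = ["Neu5Acα2-6", "Galβ1-4", "GlcNAc"]
THREE_SLN = ["Neu5Acα2-3", "Galβ1-4", "GlcNAc"]

for glycan in (SIX_SLN_LN, SIX_SLN, THREE_SLN):
    print(toy_rule(glycan))
```

In a real analysis each glycan's feature set would be paired with its measured binding signal for a given HA, and the rules would be learned from those data rather than written by hand.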

Note: Supplementary information is available on the Nature Biotechnology website.