Trefoil factors share a lectin activity that defines their role in mucus

The mucosal epithelium secretes a host of protective disulfide-rich peptides, including the trefoil factors (TFFs). The TFFs increase the viscoelasticity of the mucosa and promote cell migration, though the molecular mechanisms underlying these functions have remained poorly defined. Here, we demonstrate that all TFFs are divalent lectins that recognise the GlcNAc-α-1,4-Gal disaccharide, which terminates some mucin-like O-glycans. Degradation of this disaccharide by a glycoside hydrolase abrogates TFF binding to mucins. Structural, mutagenic and biophysical data provide insights into how the TFFs recognise this disaccharide and rationalise their ability to modulate the physical properties of mucus across different pH ranges. These data reveal that TFF activity is dependent on the glycosylation state of mucosal glycoproteins and allude to a lectin function for trefoil domains in other human proteins.

The three human TFFs (TFF1, TFF2, and TFF3) 1 are ubiquitous in mucosal environments. They share substantial sequence similarity, which is suggestive of a conserved function, though their biological roles are not redundant 2 . The TFFs protect the mucosal epithelium by increasing the viscoelasticity of mucus 3,4 and enhancing epithelial restitution rates [5][6][7][8][9] . TFF overexpression is prevalent in adenocarcinomas [10][11][12] and a hallmark of chronic inflammatory diseases of the respiratory tract [13][14][15] . The contribution of TFFs to the progression of these diseases is unclear, although in the context of respiratory diseases any increase in mucus viscoelasticity might be expected to facilitate the formation of obstructions in the airways. Exploring the role of TFFs in these and other biological contexts has been confounded by a poor mechanistic understanding of how they exert their biological activities and a dearth of tools for antagonising TFF activity 2 .
The trefoil domain is composed of three loops (foils) formed by three disulfide bonds (Fig. 1a). TFF1 and TFF3 are disulfide-linked homo-dimers with each monomer possessing a single trefoil domain, while TFF2 is a single-chain protein with two trefoil domains (Fig. 1a). The different ways in which these proteins bring together two trefoil domains impact the relative orientation, flexibility, and distance between the domains [16][17][18] , which likely contributes to their non-redundant biological activities. An extensive list of TFF binding partners has accumulated within the literature and includes many cell surface and extracellular glycoproteins 2,19 , including: β-integrin, CD71, CXCR4/7, FCGBP, DMBT1, GKN2, PAR1/4, LINGO2 and the mucins MUC2, MUC5AC, and MUC6. In the context of the mucus-thickening properties of the TFFs, the soluble mucins MUC5AC and MUC6 are the most relevant binding partners.
Lectin activities have also been reported for some of the TFFs. In the structure of porcine TFF2 a hydrophobic groove was hypothesised to bind glycans 20 . It was later confirmed that human TFF2 binds α-GlcNAc-terminated mucin-like O-glycans (Fig. 1b) 21 . A lectin-like activity has also been reported for TFF1, which binds Helicobacter pylori lipopolysaccharide (LPS) in an α-glucosidase-sensitive manner 22 , and gastric mucin 23 . While unique protein-protein interactions may occur between the TFFs and their many reported binding partners 24 , a TFF lectin activity would also explain their association with such a diverse array of glycoproteins. However, at present, the lectin activity of each TFF remains poorly characterised. The minimal glycan structure required for TFF2 binding remains ambiguous and its affinity for its cognate ligand is unknown 21 . It also remains unclear if TFF1 binds a similar glycan, since its affinity for H. pylori LPS is α-Glc dependent 22 . Furthermore, no data have been reported to suggest that TFF3 is also a lectin.
We sought to definitively address this issue by performing a comprehensive investigation of the lectin activities of all TFFs. Using a combination of ELISA, isothermal titration calorimetry (ITC) and tryptophan quenching assays, we demonstrate that the cognate ligand for all TFFs is the GlcNAc-α-1,4-Gal disaccharide. The binding mode of this disaccharide is revealed using X-ray crystallography, and this information was used to inform mutagenesis studies to determine which residues are critical for lectin activity. We demonstrate how these residues define the pH profile of lectin activity and how this correlates with the different biological roles of the TFFs. The lectin activity of the TFFs, and the presence of the GlcNAc-α-1,4-Gal disaccharide, is shown to be essential for cross-linking mucus glycoproteins, suggesting that the mucus-thickening properties of the TFFs arise from their ability to reversibly and non-covalently cross-link these large glycoproteins. This information provides a framework for understanding a larger group of hitherto unrecognised mammalian lectins defined by the trefoil domain. Our findings highlight the importance of considering the glycosylation state of mucosal proteins when interpreting the biology of TFFs and are of particular relevance to current and future clinical trials involving the TFFs.
Results
α-GlcNAc is required for mucin binding by all TFFs. To establish that TFF1 and TFF3 possessed the same α-GlcNAc-dependent binding activity as TFF2 21 , we prepared site-selectively biotinylated monomeric TFF1 and TFF3 (mTFF1 bio and mTFF3 bio ) as well as biotinylated TFF2 (TFF2 bio ) for use as a positive control (Supplementary Table 1). Three mucin samples were also prepared: commercially available porcine gastric mucin (pMucin), which possesses α-GlcNAc-terminated glycans; reduced and alkylated pMucin (pMucin red ), which also possesses α-GlcNAc-terminated glycans but lacks tertiary structure 25 ; and pMucin red treated with an α-N-acetylglucosaminidase from Clostridium perfringens (CpGH89) 26 to remove terminating α-GlcNAc from glycans (pMucin red+GH89 ). ELISA performed using these reagents (Fig. 1c) detected robust binding of the TFF2 bio control, mTFF1 bio , and mTFF3 bio to both pMucin and pMucin red , while no significant binding was observed for any TFF to pMucin red+GH89 . This established that all TFF-mucin interactions require α-GlcNAc-terminated glycans and that these interactions are independent of mucin tertiary structure.
Structural insights into disaccharide recognition by TFFs. Initial attempts to co-crystallise the TFFs and their cognate ligand were confounded by the exceptional solubility of these proteins. Eventually we obtained mTFF1 crystals, though they only yielded an apo structure (Supplementary Table 3 and Supplementary Fig. 2). Following mTFF3 surface lysine methylation, a crystal of the mTFF3-GlcNAc-α-1,4-Gal complex was obtained and the structure was determined to a resolution of 1.55 Å (Fig. 2a, b, Supplementary Table 3 and Supplementary Fig. 2). The disaccharide is accommodated within a hydrophobic cleft of TFF3, with ligand binding driven by steric fit and hydrogen bonds to the peptide backbone (Fig. 2c). Only two side chains, Asp20 and Trp47, make direct contacts with the disaccharide: Asp20 makes a bidentate hydrogen-bonding interaction with O-4 and O-6 of the non-reducing α-GlcNAc, while C-H-π interactions are made between the indole ring of Trp47 and the reducing-end Gal. Trp47 undergoes considerable motion between liganded and unliganded forms (Fig. 2d, Supplementary Fig. 3). These two residues are conserved in TFF1-3, although TFF1 and TFF2 feature an Asn in place of Asp20 (Fig. 2e). This binding mode is remarkably similar to that of a bacterial carbohydrate-binding module family 32 protein (CBM32), which binds the GlcNAc-α-1,4-Gal glycan (K d of 72 µM) in the same conformational pose and with analogous intermolecular interactions, despite sharing no sequence, structural or ancestral commonalities with the TFFs (Fig. 2f, Supplementary Fig. 4) 27 .
Mutagenesis of the ligand-binding Asn/Asp and Trp in biotinylated dimeric TFF1 and TFF3 constructs (dTFF1 bio and dTFF3 bio ) enabled an examination of the role that these residues play in mucin binding. Using our ELISA, the dTFF1 bio -N14A mutant was found to have reduced affinity for pMucin red while the dTFF3 bio -D20A mutant had barely any detectable affinity for pMucin red (Fig. 3a). The dTFF1 bio -W41A and dTFF3 bio -W47A mutants were also completely inactive by ELISA. We were unable to detect any binding of GlcNAc-α-1,4-Gal to dTFF1 bio -N14A or the other TFF mutants by ITC (Supplementary Fig. 1), suggesting that the signal observed for dTFF1 bio -N14A most likely arises from amplification of a weak residual activity through avidity effects. Indeed, in this assay we have a polyvalent surface (immobilised mucin) interacting with a multivalent binding complex composed of tetravalent streptavidin-HRP conjugates cross-linked by dimeric TFFs with two biotinylation sites.
Features that define the pH profile of TFF activity. An alignment of all mammalian TFF sequences revealed that all TFF1 proteins utilise a disaccharide-binding Asn, while all TFF3 proteins retain an Asp (Supplementary Fig. 5). TFF1 operates in the low pH gastric mucosa, while TFF3 does not, which suggested that a ligand-binding Asn or Asp may impact the pH profile of the lectin activity. To supplement binding data collected at pH 7.4 (Fig. 1e), the K d values for the disaccharide-TFF complexes were determined at pH 5.0 and pH 2.6 using a tryptophan fluorescence quenching assay (Supplementary Fig. 6). At pH 2.6, the K d for mTFF1 increased only slightly to 67 ± 1 µM, while for mTFF3 the K d was 350 ± 60 µM, six-fold higher than at pH 7.4. Like TFF1, TFF2 is abundant in gastric mucus, has a conserved Asn (Supplementary Fig. 5) and also binds mucins at low pH 21 . This result suggests that a non-ionisable disaccharide-binding Asn side chain is important for TFF activity at low pH.
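To put these dissociation constants in context, the following is a minimal sketch, assuming simple 1:1 binding at a single site with free ligand in excess, of how a K d translates into fractional site occupancy and into a binding free energy. The ligand concentration (100 µM) and temperature (298 K) are illustrative assumptions rather than values from this study, and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_bound(ligand_uM, kd_uM):
    """Fractional occupancy of a single site for 1:1 binding (ligand in excess)."""
    return ligand_uM / (kd_uM + ligand_uM)


def delta_g_kj(kd_uM, temp_K=298.0):
    """Binding free energy in kJ/mol from Kd, relative to a 1 M standard state."""
    return R * temp_K * math.log(kd_uM * 1e-6) / 1000.0


# Kd values at pH 2.6 as reported in the text (µM)
kd_ph26 = {"mTFF1": 67.0, "mTFF3": 350.0}
ligand_uM = 100.0  # assumed, illustrative disaccharide concentration

for name, kd in kd_ph26.items():
    occupancy = fraction_bound(ligand_uM, kd)
    print(f"{name}: occupancy {occupancy:.2f}, dG {delta_g_kj(kd):.1f} kJ/mol")

# Free-energy penalty of the weaker mTFF3 complex relative to mTFF1 at pH 2.6
ddg = delta_g_kj(kd_ph26["mTFF3"]) - delta_g_kj(kd_ph26["mTFF1"])
print(f"ddG (mTFF3 - mTFF1): {ddg:.1f} kJ/mol")
```

Under these assumptions, a several-fold difference in K d corresponds to only a few kJ/mol of binding free energy, which is consistent with the loss or weakening of a single hydrogen-bonding interaction rather than a wholesale change in binding mode.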
Divalent lectin activity is required for mucin cross-linking. To demonstrate that the TFFs cross-link soluble mucins in a glycan-dependent manner, we performed agglutination assays using pMucin and our monomeric, dimeric, and mutant TFF1/3 constructs (Fig. 3b, Supplementary Fig. 7). Dimeric TFF1 and TFF3 induced a dose-dependent increase in light scattering over time at

Fig. 1 […] The α-GlcNAc-capped core 2 O-glycan identified as a ligand for TFF2 21 . c ELISA data demonstrating the ability of all TFFs to bind to pMucin and pMucin red but not pMucin red+GH89 . Data are presented as mean values ± SD for three independent replicates. d Representative ITC isotherms of mTFF1 (blue), TFF2 (orange), and mTFF3 (green) titrated against the GlcNAc-α-1,4-Gal disaccharide. e mTFF1 and mTFF3 binding to GlcNAc-α-1,4-Gal as determined by a tryptophan fluorescence quenching assay. Data are presented as mean values ± SD for three independent replicates. Source data are provided as a Source data file.

TFF activity is abrogated by a glycoside hydrolase. We were interested in establishing if the enzymatic degradation of the GlcNAc-α-1,4-Gal structure could liberate TFFs from mucins, since an antagonist of all TFF activity would be a useful tool in the mucosal biology field. ELISAs were used to monitor the liberation of mucin-bound dTFF1 bio and dTFF3 bio by the CpGH89 enzyme (Fig. 3c). The efficacy of CpGH89 under these conditions was very similar for both TFFs, with EC 50 values of 0.8 ± 0.1 nM for TFF1 and 0.4 ± 0.1 nM for TFF3: this is a very effective tool for disrupting mucin-TFF interactions.
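As an illustration of what these EC 50 values imply, the sketch below evaluates a four-parameter logistic dose-response curve for the fraction of TFF remaining bound to mucin as a function of CpGH89 concentration. The curve model, the Hill slope of 1, the normalised plateaus (1 and 0), and the example enzyme concentrations are all assumptions made for illustration; the fitting model actually used is not stated here, and the function name is hypothetical.

```python
def fraction_remaining(enzyme_nM, ec50_nM, hill=1.0, top=1.0, bottom=0.0):
    """Four-parameter logistic: normalised signal left at a given enzyme concentration."""
    if enzyme_nM <= 0:
        return top  # no enzyme, no displacement
    return bottom + (top - bottom) / (1.0 + (enzyme_nM / ec50_nM) ** hill)


# EC50 values as reported in the text (nM)
ec50 = {"TFF1": 0.8, "TFF3": 0.4}

# Assumed, illustrative CpGH89 concentrations (nM)
for conc in (0.1, 1.0, 10.0):
    row = {name: round(fraction_remaining(conc, value), 2) for name, value in ec50.items()}
    print(f"{conc} nM CpGH89 -> fraction of TFF still bound: {row}")
```

By construction, half of the signal remains when the enzyme concentration equals the EC 50, so under these assumptions low-nanomolar CpGH89 would displace most of the bound TFF in either case.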
Other mammalian trefoil domains have lectin activities. It occurred to us that trefoil domains in other mammalian proteins, which are analogous to those in the TFFs, may also have a lectin activity. A phylogenetic analysis of all mammalian proteins with a trefoil domain revealed three clades of putative lectins associated with either amylose-processing enzymes, the GlcNAc-α-1,4-Gal-binding mucosal TFFs studied here in detail, or the glycoprotein-binding zona pellucida proteins (Fig. 4). A re-evaluation of structures available for some of these proteins provided evidence that the trefoil domain of human lysosomal α-glucosidase binds isomaltose (Glc-α-1,6-Glc) (PDB ID: 5KZW) (Supplementary Fig. 8). Biological context suggests that the trefoil domains in sucrase-isomaltase and maltase-glucoamylase likely bind (iso)maltose, while it is not obvious what ligand the trefoil domains from zona pellucida sperm-binding protein 1 and 4 might bind.

Discussion
While previous efforts had identified α-GlcNAc-terminated O-glycans as ligands for TFF2 21 , it remained unclear what the minimal glycan structure required for binding was, due to the use of ring-opening reductive amination chemistry to prepare neoglycolipids and the absence of direct biophysical measurements of the TFF-glycan interaction. Here, we have used orthogonal techniques to demonstrate that the cognate ligand for all TFFs is the GlcNAc-α-1,4-Gal disaccharide and that the dissociation constant for these complexes is approximately 50 µM at physiological pH. This relatively weak interaction is typical of many lectins and probably reflects the avidity effects at play in mucus, where each mucin glycoprotein displays many hundreds of O-glycans. Our agglutination assays demonstrated that TFF-mediated aggregation of mucins requires their divalent lectin activity. This supports a model of TFF-modulated mucus rheology where the large mucin glycoprotein polymers are reversibly and non-covalently cross-linked by the TFFs through their α-GlcNAc-terminated O-glycans. This phenomenon would be dominated by avidity effects and necessitates the facile association and dissociation of TFF-mucin complexes to maintain the fluid properties of the mucus. Our structure of TFF3 in complex with GlcNAc-α-1,4-Gal revealed how these simple proteins recognise their disaccharide ligand using just two residue side chains, and how this approach to binding the disaccharide mimics that of the CBM32 domains, which share no sequence or structural similarities with the TFFs. The structure also provides insights into how TFF1 might bind H. pylori in an α-glucosidase-sensitive manner 22 . The core oligosaccharide of many H. pylori strains bears a Glc-α-1,4-Gal branch 28 , while the N-acetyl group of GlcNAc-α-1,4-Gal bound to the TFFs is solvent exposed and makes no interactions with the peptide. Clearly, the Glc-α-1,4-Gal branch of H. pylori core glycans could be easily accommodated in the TFF binding site. We speculate that this unusual H. pylori LPS core oligosaccharide evolved to facilitate the adhesion of this stomach pathogen to the gastric mucosa in a TFF-dependent manner. Evidence that bacteria in mucosal environments have evolved to manipulate mucin-TFF interactions also comes from our observation that recombinant CpGH89, and by inference its native variant secreted by C. perfringens, is very effective at antagonising TFF activity. This alludes to a role for this enzyme and virulence factor both in scavenging host glycans for sustenance 29 and in disrupting the structural integrity of the mucosa.
Both TFF1 and TFF3 have been the subject of clinical trials: TFF3 was administered to colitis patients 30 and a Lactococcus lactis strain engineered to secrete TFF1 is being administered to patients undergoing chemoradiation therapy to combat oral mucositis (clinicaltrials.gov ID NCT03234465). Our data definitively establish that the cognate TFF ligand is the GlcNAc-α-1,4-Gal disaccharide and suggest that the other binding partners identified to date, which are all cell-surface and extracellular proteins, may bear mucin-like O-glycans terminated with α-GlcNAc. As such, the presence of the GlcNAc-α-1,4-Gal disaccharide in patient mucus samples could prove to be an important biomarker for predicting responses to the TFF therapies being investigated in the clinic.
While this work has focused on rigorously defining the interactions between TFFs and the soluble mucins, it remains unclear how TFF lectin activities might promote cell migration to achieve epithelial restitution. Conceivably, the divalent TFF lectins could promote the co-localisation of cell-surface glycoproteins like LINGO2, which immunoprecipitates with TFF3 19 , to facilitate signalling events that promote cell migration. Assessing the plausibility of this hypothesis is confounded by a paucity of knowledge concerning what proteins bear these unusual α-GlcNAc-terminated O-glycans, and in what biological contexts. These glycans are assembled in the Golgi by α-1,4-N-acetylglucosaminyltransferase (α4GnT) 31,32 , which is constitutively expressed only in gastric mucous and Brunner's gland cells 33 . Exploring how α4GnT expression changes in mucosal epithelia in response to inflammatory stimuli, and cataloguing which proteins are modified by this enzyme, are important next steps in understanding the biology of the TFFs.

Methods
Production of untagged monomeric TFF1/3. A series of dsDNA oligonucleotides encoding human TFF1 (UniProt ID: P04155) and TFF3 (UniProt ID: Q07654) with an N-terminal PelB signal peptide, no interchain Cys, and a C-terminal His 6 -tag codon-harmonised for E. coli (Supplementary Table 4) were synthesised (IDT) and cloned into the pET29b(+) vector (Novagen) using the NdeI and XhoI restriction sites. The resulting plasmids were verified using Sanger sequencing. Each plasmid was transformed into T7 Express cells (NEB) and plated onto LB-agar + 2% glucose + 50 µg ml −1 Kan and incubated at 37°C for 16 h. Single colonies were picked to generate overnight cultures, which were used to inoculate SB media + 0.2% glucose + 50 µg ml −1 Kan. The culture was incubated at 37°C and 220 rpm until it reached an OD 600

Fig. 3 Perturbing TFF activity through mutagenesis and glycan degradation. a Affinity of wild-type and mutant dTFF1 bio (blue) and dTFF3 bio (green) for pMucin red , as determined by ELISA. Data are presented as mean values ± SD for three independent replicates. b pMucin agglutination assays using monomeric wild-type, dimeric wild-type or mutant TFF1 (top) and TFF3 (bottom) with optical density at 405 nm monitored with respect to time. c CpGH89-mediated displacement of dimeric TFF1 (blue circles) and TFF3 (green squares) from immobilised pMucin, as determined by ELISA. Data are presented as mean values ± SD for three independent replicates. Source data are provided as a Source data file.

Production of TFF2. A dsDNA oligonucleotide encoding human TFF2 (UniProt ID: Q03403) with an N-terminal gp67 signal peptide and a C-terminal His 10 -tag codon-harmonised for Spodoptera frugiperda (Supplementary Table 4) was synthesised (IDT) and cloned into the pFastBac vector (ThermoFisher) using the SpeI and XhoI restriction sites. The resulting plasmid was verified using Sanger sequencing. Expression of TFF2 was achieved in Sf21 cells using the "Bac-to-Bac Baculovirus Expression System" (ThermoFisher) in accordance with the manufacturer's instructions. Briefly, one litre of cell culture at a density of 1 × 10 6 cells ml −1 was infected with 30 ml of P3 baculovirus and cultured at 27°C for 72 h. The culture was centrifuged (8000×g, 20 min, 4°C) and the supernatant collected, adjusted to pH 7.5 and filtered (0.45 µm). The supernatant was applied to a Ni-affinity column (HisTrap Excel, 5 ml, GE Healthcare), the column washed with 10 CV of 50 mM Tris, 300 mM NaCl, 40 mM ImH, pH 7.5, and the protein eluted with 50 mM Tris, 300 mM NaCl, 400 mM ImH, pH 7.5. Fractions containing product, as judged by SDS-PAGE, were pooled and further purified by size exclusion chromatography (Superdex ® 75 10/300, GE Healthcare) using 50 mM Tris-HCl, 150 mM NaCl, pH 7.5 as buffer. SDS-PAGE and intact ESI-MS data confirmed that TFF2 was homogeneous and had seven disulfide bonds (Supplementary Figs. 9 and 10). A sample of TFF2 was non-specifically biotinylated using EZ-Link TM NHS-PEG 4 -biotin (ThermoFisher) according to the manufacturer's protocol.

Structure determination of apo-mTFF1. Sitting drops composed of 1 µl well solution (1 M ammonium sulfate, 0.1 M Tris-HCl, pH 8.5) and 1 µl mTFF1 solution (10 mg ml −1 ) supplemented with GlcNAc-α-1,4-Gal (5 mM) afforded small clusters of rod-like crystals after one month at 20°C. The crystals were cryo-protected by supplementing the mother liquor with 3 M ammonium sulfate before being collected on a crystal loop and cryogenically stored in liquid nitrogen. Data were collected on the Australian Synchrotron MX2 beamline 34 at a wavelength of 0.9537 Å and temperature of 100 K, then processed using XDS 35 . The structure was solved by molecular replacement using PHASER 36 42 .
Sequence logos for each of the four mucosal TFF domains in the three human trefoil factors were generated by WebLogo (weblogo.berkeley.edu) 43 using full-length protein sequences annotated as mammalian TFF1, TFF2, or TFF3 retrieved from the UniProt database 44 and aligned with Clustal Omega 45 . The phylogenetic tree of all mammalian trefoil domains was created as follows. All trefoil domain sequences corresponding to PF00088 were retrieved pre-aligned according to their hidden Markov model logo generated by the Pfam database 46 . Based on UniProt 44 metadata, entries marked as obsolete or truncated were discarded. The remaining sequences were annotated as belonging to one of ten groups (glucoamylase, isomaltase, lysosomal α-glucosidase, maltase, sucrase, TFF1, TFF2-1, TFF2-2, TFF3 or zona pellucida) based on their UniProt annotation. The resulting alignment was used as input to calculate phylogenetic distances with phylogeny.fr 47 . The phylogenetic tree was visualised with iTOL 48 and colour-coded according to the described metadata. Where available, a structural representation of each clade of TFF-domains was created using PyMOL and included for eight of the ten groups mentioned above.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All relevant data are available from the authors upon request.

Received: 17 January 2020; Accepted: 20 April 2020