Introduction
Over a decade ago, investigators discovered that most sulfatases contain an aldehyde-bearing formylglycine residue within their active site that is required for catalytic activity1. The formylglycine residue is produced by co- or post-translational modification of a conserved cysteine residue found within the "sulfatase motif" (Fig. 1a). Defined as a contiguous 13-amino-acid sequence, this motif contains a highly conserved CXPXR submotif, where X is usually serine, threonine, alanine or glycine. Formylglycine-generating enzyme (FGE) oxidizes cysteine to formylglycine and has recently been identified in eukaryotes2, 3 and prokaryotes (data not shown). Notably, FGE recognizes its substrates based solely on primary sequence (refs. 3,4 and data not shown). This observation suggests that a small peptide tag based on the sulfatase motif, which we have termed the "aldehyde tag," might direct formation of formylglycine independent of its context (Fig. 1b). Furthermore, many organisms have endogenous FGE activity5, which suggests that this post-translational modification system could serve as a general means for engineering proteins for site-specific labeling6, 7, 8, 9, 10, 11, 12, 13.
Figure 1: The sulfatase motif can serve as an aldehyde tag for site-specific protein modification.
(a) Sequence alignment of the sulfatase motifs from a variety of sulfatases found in diverse organisms. (b) Nucleotides encoding the 6-amino-acid sequence can be appended to or inserted within a gene of interest. Upon expression, the encoded cysteine is modified to an aldehyde, which can be used as a chemical handle for a variety of applications such as fluorophore labeling. fGly, formylglycine.
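Because FGE is described as recognizing its substrates from primary sequence alone, the core CXPXR submotif lends itself to a simple pattern search. The following is a minimal sketch, not part of the original work: it assumes X is restricted to serine, threonine, alanine or glycine (the text says only that X is "usually" one of these), and the helper names and the placeholder sequence MAAAK are hypothetical.

```python
import re

# Core submotif from the text: C-X-P-X-R. Restricting X to S/T/A/G is a
# simplifying assumption; the text says X is "usually" one of these residues.
# The lookahead lets overlapping hits be reported as well.
CORE_SUBMOTIF = re.compile(r"(?=(C[STAG]P[STAG]R))")

# The two aldehyde tag variants named in the text.
ALD6 = "LCTPSR"
ALD13 = "LCTPSRGSLFTGR"


def find_core_submotifs(sequence):
    """Return (0-based index of the Cys, matched pentapeptide) for each hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CORE_SUBMOTIF.finditer(seq)]


def add_tag(protein, tag, terminus):
    """Hypothetical helper: fuse a tag to the N or C terminus of a sequence."""
    if terminus == "N":
        return tag + protein
    if terminus == "C":
        return protein + tag
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    # "MAAAK" is a made-up placeholder, not a real protein sequence.
    tagged = add_tag("MAAAK", ALD6, "C")
    print(tagged, find_core_submotifs(tagged))
```

A sequence-only scan of this kind shows where a candidate cysteine sits; it says nothing about whether conversion actually occurs in a given host.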
To evaluate this strategy, we expressed recombinant proteins with either N- or C-terminal aldehyde tags in Escherichia coli. We explored three protein targets: a C-terminally tagged maltose-binding protein (MBP), an N-terminally tagged human growth hormone (hGH) and an N-terminally tagged mycobacterial sulfotransferase (Stf0). Additionally, we tested two variants of the aldehyde tag: a 13-amino-acid tag that represents a full-length sulfatase motif (LCTPSRGSLFTGR) and a 6-amino-acid tag (LCTPSR) that includes only the core conserved residues (Table 1). Although E. coli has endogenous FGE activity14, we coexpressed the tagged proteins with an additional prokaryotic FGE from Mycobacterium tuberculosis (GenBank accession NP_215226) to maximize formylglycine formation efficiency (see Supplementary Methods and Supplementary Table 1 online).
Tryptic digestion of the 13-mer aldehyde-tagged Stf0 (ald13-Stf0) allowed direct identification of formylglycine by mass spectrometry (Supplementary Fig. 1 online). The presence of formylglycine was further confirmed by treating the tryptic digest with methoxyamine (compound 1), a reagent that reacts selectively with aldehydes and ketones to form the corresponding oximes (Supplementary Fig. 1).
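The mass changes that such an experiment looks for follow from the chemistry alone. As a rough sketch (not taken from the Supplementary Methods), the code below computes the monoisotopic shift for conversion of cysteine to formylglycine, treated as a net loss of H2S and gain of O, and for condensation of the aldehyde with methoxyamine (CH3ONH2) with loss of water. The atomic masses are standard monoisotopic values; the variable and function names are illustrative.

```python
# Standard monoisotopic atomic masses in Da.
MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}


def formula_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MASS[element] * n for element, n in counts.items())


H2S = formula_mass({"H": 2, "S": 1})
H2O = formula_mass({"H": 2, "O": 1})
METHOXYAMINE = formula_mass({"C": 1, "H": 5, "N": 1, "O": 1})  # CH3ONH2

# Cys side chain (CH2-SH) -> fGly side chain (CHO): lose H2S, gain O.
CYS_TO_FGLY = MASS["O"] - H2S

# Aldehyde + CH3ONH2 -> O-methyl oxime + H2O.
FGLY_TO_METHYL_OXIME = METHOXYAMINE - H2O

if __name__ == "__main__":
    print(f"Cys -> fGly:           {CYS_TO_FGLY:+.4f} Da")
    print(f"fGly -> O-methyl oxime: {FGLY_TO_METHYL_OXIME:+.4f} Da")
```

Under these assumptions a tag-containing peptide should appear about 18 Da lighter after conversion and about 29 Da heavier again after oxime formation, relative to the aldehyde form.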
To quantify the extent of conversion of cysteine to formylglycine for all three proteins, we used an MS-based standard addition assay (see Supplementary Methods and Supplementary Fig. 2 online). The conversion efficiency was >90% for all but one of the constructs tested in this assay (Table 1). Considering that the sulfatase motif is natively found at an interior region of the sulfatase primary sequence, these results suggest that efficient aldehyde formation is independent of the tag's location within the protein. When ald6-MBP was expressed in E. coli without exogenous FGE, conversion efficiency dropped to 32 ± 5% (s.d.; Supplementary Fig. 3 online).
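The details of the MS-based assay are given in the Supplementary Methods and are not reproduced here. For readers unfamiliar with the general idea of standard addition, the sketch below illustrates it under simple assumptions: known amounts of an analyte are spiked into the sample, the signal is fitted to a straight line, and the amount originally present is read off as intercept divided by slope. The function name and the numbers are synthetic and purely illustrative; they are not data from this study.

```python
def standard_addition_estimate(added, signal):
    """Estimate the analyte amount in the unspiked sample.

    Fits signal = slope * added + intercept by ordinary least squares and
    returns intercept / slope, i.e. the magnitude of the x-intercept.
    Assumes the response is linear over the spiked range.
    """
    n = len(added)
    if n < 2 or n != len(signal):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    if sxx == 0:
        raise ValueError("spiked amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("signal does not respond to the spike")
    intercept = mean_y - slope * mean_x
    return intercept / slope


if __name__ == "__main__":
    # Synthetic, noise-free example: signal = 2 * (added + 3), arbitrary units.
    spiked = [0.0, 1.0, 2.0, 3.0]
    response = [6.0, 8.0, 10.0, 12.0]
    print(standard_addition_estimate(spiked, response))
```

Standard addition is attractive for this kind of measurement because the calibration is performed in the sample itself, so matrix effects act on the standard and the analyte alike.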
We next used the aldehydes for site-specific protein modification with aminooxy- or hydrazide-functionalized probes. A panel of aldehyde-tagged proteins was robustly labeled with Alexa Fluor 647 C5-aminooxyacetamide (Invitrogen; Fig. 2a). By contrast, control proteins in which the critical cysteine had been mutated to alanine demonstrated no significant labeling. Additionally, we showed that aldehyde-tagged proteins can be reversibly modified with multiple epitopes. We reacted ald6-MBP with biotin hydrazide (compound 2), then displaced the N-acylhydrazone with a more thermodynamically stable oxime by subsequent reaction with an aminooxy-conjugated Flag peptide (Fig. 2b). This process enables sequential modification of proteins with independent tags for purification, detection or other applications.
Figure 2: Selective labeling of aldehyde-tagged proteins.
(a) Fluorophore labeling of aldehyde-tagged proteins. Top, labeling with Alexa Fluor 647 C5-aminooxyacetamide; bottom, protein loading was assessed by Sypro Ruby staining. (b) Tandem labeling of ald6-MBP with two different probes. Lane 1, protein incubated with biotin hydrazide; lane 2, protein incubated with biotin hydrazide and subsequently modified with methoxyamine; lane 3, protein incubated with biotin hydrazide and subsequently modified with aminooxy-Flag. Top, western blot probed with α-biotin antibody; middle, same western blot probed with α-Flag antibody; bottom, same western blot stained with Ponceau S. (c) PEGylation of ald6-Stf0 with 5 kDa (5K), 10 kDa (10K) or 20 kDa (20K) aminooxy-PEG. Black arrowhead, protein-PEG conjugate; white arrowhead, unreacted protein.
Modification of pharmaceutical proteins with polyethylene glycol (PEG) chains can increase proteolytic stability while decreasing renal clearance and immunogenicity15. As a final application, we demonstrated the aldehyde tag's utility for site-specific PEGylation by reaction of ald6-Stf0 and ald6-MBP with aminooxy-PEG polymers of various molecular weights (Fig. 2c and Supplementary Fig. 4 online).
For a protein modification technique to find widespread use among biologists, it must be practical, general and available. The aldehyde tag fulfills all of these requirements. The motif is installed using basic molecular biology protocols and can function at either terminus or at an interior position of the protein. Furthermore, many aldehyde-specific reagents are commercially available. We envision many applications of this technology, including the introduction of biophysical probes and epitope tags onto proteins, as well as surface attachment of proteins in the context of microarrays. Additionally, combinatorial use of the aldehyde tag with other bioorthogonal labeling methods might offer new avenues for protein engineering.
Note: Supplementary information and chemical compound information is available on the Nature Chemical Biology website.
Author Contributions
I.S.C. and B.L.C. carried out cloning, expression, purification and fluorescent tagging of the constructs. I.S.C. quantified conversion to fGly and performed multiple epitope assays. B.L.C. performed PEGylation assays. C.R.B. directed the project. All authors worked together to compose the manuscript.

