With the rare exceptions of pyrrolysine and selenocysteine, a standard set of 20 amino acid building blocks, containing a limited number of functional groups, is used by almost all organisms for the biosynthesis of proteins. The use of Genetic Code Expansion technology to enable the site-specific incorporation of noncanonical amino acids (ncAAs) into proteins in living cells has transformed our ability to study biological processes and provided the exciting potential to develop modern medicines1,2,3,4,5,6,7,8. The genetic encoding of ncAAs with distinct chemical, biological, and physical properties requires the engineering of bioorthogonal translational machinery, consisting of an evolved aminoacyl-tRNA synthetase/tRNA pair and a “blank” codon1,6,9,10. The high intracellular concentration of ncAA required to render this machinery operative has usually been achieved via chemical synthesis of the ncAA and its exogenous addition at high levels to the cell culture medium. Although most ncAAs can penetrate the cell membrane for genetic incorporation, ncAAs bearing negative charges or polar structures normally exhibit a low cell penetration efficiency11,12,13,14. The relatively low intracellular concentrations of these ncAAs greatly limit the efficiency of ncAA incorporation into proteins using the Genetic Code Expansion technology12,13,15,16,17,18.

Strategies for engineering the structures of ncAAs or ncAA-binding proteins have been employed to improve the cellular uptake of ncAAs. In 2017, the Schultz group adopted a dipeptide strategy to enable the cellular uptake of phosphotyrosine. The phosphotyrosine-containing dipeptide can be synthesized and transported into cells via an adenosine triphosphate (ATP)-binding cassette transporter, followed by hydrolysis of the dipeptide by nonspecific intracellular peptidases13. In the same year, the Wang lab developed a two-step strategy for producing proteins with site-specific tyrosine phosphorylation14. This strategy utilized the incorporation of a phosphotyrosine analogue with a cage group, followed by chemical deprotection of the purified proteins. However, the synthesis and purification of these dipeptides are challenging, and the required post-purification treatments limit the applicability of this methodology to efficiently incorporate phosphotyrosine in living cells. As an alternative approach, periplasmic binding proteins (PBPs) have been engineered to have improved affinities for specific ncAAs16. These mutant PBPs enhanced uptake of the respective ncAAs up to fivefold, as evidenced by elevated intracellular ncAA concentrations and the yield of ncAA-containing green fluorescent proteins16. Nevertheless, the engineered PBP species are only applicable to a subset of ncAAs, and exogenous feeding of high concentrations of the ncAAs is still required. The problem of ncAA uptake could potentially be bypassed by intracellular biosynthesis of the ncAAs from basic carbon sources12,19,20,21,22,23,24,25. For example, phosphothreonine (pThr) cannot be detected intracellularly even when cells are incubated with 1 mM pThr12. The Chin group overcame the membrane impermeability of pThr by introducing the Salmonella enterica kinase, PduX, which converts L-threonine to pThr intracellularly12.
This biosynthesis of pThr generated intracellular pThr at levels >1 mM, sufficient for genetic incorporation of this amino acid12. A similar strategy was recently applied to the creation of autonomous bacterial cells that can biosynthesize and genetically incorporate p-amino-phenylalanine (pAF), 5-hydroxyl-tryptophan (5HTP) and dihydroxyphenylalanine (DOPA), although no autonomous eukaryotic cells have been reported19,20,22,23,26. We envision that additional biosynthetic pathways for producing polar or negatively-charged ncAAs would greatly expand the utility of genetic code expansion methods.

Tyrosine sulfation is an important post-translational modification of proteins that is essential for a variety of biomolecular interactions, including chemotaxis, viral infection, anti-coagulation, cell adhesion, and plant immunity27,28,29,30,31,32,33,34. Despite its importance and ubiquity, protein sulfation has been difficult to study due to the lack of general methods for preparing proteins with defined sulfated residues31,35. To circumvent this challenge, efforts have previously been made to site-specifically incorporate sulfotyrosine (sTyr) using the Genetic Code Expansion technology36. The resulting sTyr incorporation systems have enabled several applications, including generation of therapeutic proteins with defined sulfated tyrosines, evolution of sulfated anti-gp120 antibodies, and confirmation of tyrosine sulfation sites35,37,38,39,40. To achieve reasonable expression levels of sulfated proteins in E. coli, however, most studies have required the exogenous feeding of 3–20 mM sTyr to compensate for low intracellular uptake of extracellular sTyr37,40.

Here, we report the generation of metabolically modified prokaryotic and eukaryotic cells that can biosynthesize sTyr and incorporate it into proteins in a site-specific manner (Fig. 1a). sTyr is biosynthesized using a sulfotransferase discovered from a sequence similarity network (SSN). sTyr is subsequently incorporated into proteins in response to a repurposed stop codon. The molecular properties of this sulfotransferase were explored using bioinformatics and computational approaches, revealing a loop structure and several residues in the binding pocket of this enzyme that are responsible for its unique specificity for tyrosine. Further optimization of the genome and sTyr biosynthetic pathway of both prokaryotic and eukaryotic cells leads to greater expression yields of sulfated proteins than cells exogenously fed with sTyr. The utility of these sTyr autonomous cells is demonstrated by using them to produce highly potent thrombin inhibitors.

Fig. 1: Discovery of tyrosine sulfotransferase from a sequence similarity network.

a sTyr was biosynthesized from tyrosine and PAPS in the presence of the sulfotransferase identified in this study. The resulting biosynthesized sTyr was site-specifically incorporated into thrombin inhibitors, yielding enhanced thrombin inhibition. b Sequence similarity network (SSN) generated by the EFI-EST server with RnSULT1A1 as an input sequence and an E value of 5. Each circle stands for a representative node containing sequences with over 80% identity. The edge detection threshold was set at an alignment score of 110. The upper and lower yellow representative nodes are RnSULT1A1 (P17988) and HsSULT1C2 (O00338), respectively. c Schematic representation of reported sulfation reactions of P17988 and O00338. d Screening of tyrosine sulfotransferases with the green fluorescent protein assay. All tested proteins are included in the representative nodes of b, and NnSULT1C1 is the protein with the red label in b. d Data are plotted as the means from n = 2 independent samples. a.u. stands for arbitrary unit.


Discovery of tyrosine sulfotransferase using a sequence similarity network

In nature, sulfotransferases allow many organisms to utilize an active form of sulfate, 3′-phosphoadenosine-5′-phosphosulfate (PAPS), for biosynthetic purposes41,42. Based on their substrate preference and cellular location, sulfotransferases can be grouped into three major families: tyrosylprotein sulfotransferase (TPST), cytosolic sulfotransferase (SULT), and carbohydrate sulfotransferase43,44. To identify the enzyme responsible for sulfation of cytoplasmic tyrosine, we focused on SULTs. These enzymes catalyze sulfation of a wide variety of endogenous compounds, including hormones, neurotransmitters, and xenobiotics43. Based on their reported substrate specificities, we examined SULT1A1 and SULT1A3 from Homo sapiens, SULT1A1 from Rattus norvegicus, and SULT1C1 from Gallus gallus43,45, all of which are known to recognize multiple phenolic substrates. To explore the activity of these sulfotransferases toward tyrosine, we used a green fluorescent protein assay20,46. These four sulfotransferase genes were codon-optimized for Escherichia coli and cloned into the pBad vector using the DNA oligos listed in Supplementary Data 1. To generate a suppression plasmid for sTyr incorporation, we used the pUltra-sTyr plasmid encoding the engineered Methanococcus jannaschii tyrosyl-tRNA synthetase (sTyrRS) and its corresponding MjtRNATyrCUA36,40. The suppressor plasmid (pUltra-sTyr) was used to suppress the amber codon (Asp134TAG) within a sfGFP variant encoded by the pLei-sfGFP134TAG plasmid in the presence of sTyr. Expression of full-length sfGFP was carried out in LB medium for 16 h in parallel with control BL21(DE3) cells harboring pUltra-sTyr, pLei-sfGFP134TAG and pBad-Empty in the presence and absence of exogenously fed 1 mM sTyr. As expected, sfGFP was expressed in control cells fed with 1 mM sTyr (Supplementary Fig. 1). Unfortunately, none of these four sulfotransferases led to sfGFP expression, indicating that sTyr was not biosynthesized.
To circumvent the limited substrate range of the reported sulfotransferases, we accessed the full repertoire of protein sequence diversity in nature by using a sequence similarity network (SSN, Fig. 1b)47. SSNs provide an effective way to visualize and analyze the relatedness of large sets of protein sequences on the basis of similarity thresholds of their amino acid sequences48. We initially created an SSN with EFI-EST using SULT1A1 from Rattus norvegicus as the input sequence, since its cognate substrate p-coumaric acid is similar to tyrosine (Fig. 1b, c)45. An alignment score of 110 was set to limit the edges and a sequence identity of 80% was used to generate representative nodes, which resulted in a final SSN of 391 representative enzyme sequences. Interestingly, we found that human SULT1C2, whose substrate is tyramine, was in a different cluster of the SSN (Fig. 1b, c)43. We hypothesized that enzymes with high sequence similarity to SULT1A1 from Rattus norvegicus and SULT1C2 from Homo sapiens would be potential candidates to carry out the sulfation of tyrosine. To test this hypothesis, we selected 27 sequences from the SSN based on their similarity to both RnSULT1A1 and HsSULT1C2. These selected genes were cloned into the pBad vector and tested with the green fluorescent protein assay. To our delight, a 2.5-fold increase in fluorescence was observed for cells expressing A0A091VQH7 compared to cells not given exogenous sTyr, suggesting that sTyr was biosynthesized intracellularly and incorporated into sfGFP proteins (Fig. 1d). A0A091VQH7 is a putative sulfotransferase from Nipponia nippon, with over 90% sequence identity with SULT1C1 reported in other species. We therefore refer to A0A091VQH7 as NnSULT1C1 hereafter49.
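The edge-thresholding idea behind an SSN can be illustrated with a minimal Python sketch. It is not the EFI-EST implementation: it assumes pairwise alignment scores are already available, and the sequence names and scores below are invented purely for illustration. Pairs scoring at or above the cutoff (110 here, mirroring the threshold described above) are joined by an edge, and clusters are read off as connected components.

```python
def build_ssn(scores, threshold):
    """Build an adjacency map, keeping only edges with score >= threshold."""
    graph = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_clusters(graph):
    """Return connected components as sorted lists of node names."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical pairwise alignment scores (illustration only, not real data).
example_scores = {
    ("seqA", "seqB"): 150,
    ("seqB", "seqC"): 120,
    ("seqC", "seqD"): 90,
    ("seqD", "seqE"): 130,
    ("seqA", "seqE"): 40,
}

ssn = build_ssn(example_scores, threshold=110)
ssn_clusters = find_clusters(ssn)
print(ssn_clusters)
```

Raising the threshold splits the network into more, tighter clusters, while lowering it merges them, which is why the choice of alignment score shapes which candidate sequences appear near a reference enzyme.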

Molecular basis of NnSULT1C1 action in the sulfation of tyrosine

To explore the origin of the unique tyrosine specificity of NnSULT1C1 among all the sulfotransferases tested, we analyzed the phylogenetic relationships of the enzymes. Sulfotransferase amino acid sequences were used to generate a phylogenetic tree using the unweighted pair group method with arithmetic mean in the MEGA X software package (Supplementary Fig. 2)50. The tree is subdivided into three major subfamilies, among which NnSULT1C1 falls into subfamily I containing bird sulfotransferases. Most sequences from subfamilies II and III are derived from rodent and primate groups, respectively. To further analyze the molecular basis of the unique tyrosine specificity of NnSULT1C1, we performed a multiple sequence alignment of all sequences within subfamily I of the phylogenetic tree (Supplementary Fig. 3). This sequence alignment revealed that most regions of NnSULT1C1, including the PAPS-binding site, are highly conserved except for a highly variable region corresponding to NnSULT1C1 residues 94-102 (SIQEPPAAS) and residues likely involved in the substrate-binding pocket51,52. To explore the contribution of this highly variable region to substrate binding, the structure of NnSULT1C1 was predicted via AlphaFold2. AlphaFold2 is a machine learning approach that has been shown to predict protein structure with a high degree of accuracy53,54,55,56. More than 90% of the residues in the predicted NnSULT1C1 structure show predicted Local Distance Difference Test (pLDDT) values over 90, indicating that these residues are modeled with very high confidence. Similar to the structures of other SULTs, the overall predicted structure of NnSULT1C1 is composed of classical α/β motifs (Fig. 2a)57,58. This structure includes a β sheet surrounded by α-helices, giving rise to a narrow substrate-binding site (Fig. 2a)59.
We found that the highly variable region (residues 94–102) of NnSULT1C1 constitutes a loop for substrate entry, which also aligns with the substrate entry loop of human SULT (Supplementary Fig. 4)43. Deletion of this loop in NnSULT1C1, however, results in only a 22% decrease in its activity to produce fluorescent protein with sTyr (Fig. 2b).
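The confidence summary quoted above (the share of residues with pLDDT over 90) can be recomputed from a model file with a short Python sketch. It relies on the convention that AlphaFold PDB outputs store per-residue pLDDT in the B-factor column (columns 61–66 of ATOM records); the four-residue model generated below is a made-up placeholder, not the NnSULT1C1 prediction, and the helper `make_ca_line` exists only to build that placeholder.

```python
def ca_plddt_values(pdb_text):
    """Collect the B-factor field (pLDDT in AlphaFold models) of each C-alpha atom."""
    values = []
    for line in pdb_text.splitlines():
        # ATOM records only; atom name occupies columns 13-16 of the fixed-width format.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def fraction_above(values, cutoff=90.0):
    """Fraction of values strictly greater than the cutoff."""
    if not values:
        raise ValueError("no C-alpha records found")
    return sum(v > cutoff for v in values) / len(values)


def make_ca_line(serial, res_seq, plddt):
    """Format a placeholder C-alpha ATOM record in fixed-width PDB layout."""
    return ("ATOM  {:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
                serial, " CA", " ", "ALA", "A", res_seq, " ",
                0.0, 0.0, 0.0, 1.0, plddt)


# Hypothetical per-residue confidence values (illustration only).
demo_scores = [95.5, 92.1, 88.0, 97.3]
demo_pdb = "\n".join(
    make_ca_line(i + 1, i + 1, score) for i, score in enumerate(demo_scores)
)

demo_fraction = fraction_above(ca_plddt_values(demo_pdb), cutoff=90.0)
print(demo_fraction)
```

Applied to a real model file, the same two functions would report the fraction that the text summarizes as "more than 90% of the residues".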

Fig. 2: Exploring the mechanism of unique tyrosine specificity of NnSULT1C1.

a NnSULT1C1 structure (blue) predicted by AlphaFold2 and its active site containing PAPS and Tyr. Tyr was docked into NnSULT1C1 containing PAPS by Glide v8.1 in Schrödinger software. b Green fluorescent protein assay with wildtype NnSULT1C1 (wt) or NnSULT1C1 without the SIQEPPAAS loop (Δloop), p = 0.01. c Green fluorescent protein assay with wildtype NnSULT1C1 (wt) or NnSULT1C1 with alanine mutation at the indicated residues, p = 0.004. d Structural similarity search of NnSULT1C1 using the PDBeFold web server. e Characterization of Tyr docking with NnSULT1C1 and its structurally similar sulfotransferases via docking score and nucleophilic attack distance. Docking scores were calculated using Glide v8.1 in Schrödinger software. Nucleophilic attack distance was defined as the distance between the Tyr phenolic alcohol and the PAPS sulfonate. f Comparison of tyrosine sulfation activity of NnSULT1C1 and its structurally similar sulfotransferases using the green fluorescent protein assay. Cells without any sulfotransferase (−) were used as control. g–j Tyr docking position with NnSULT1C1 (g), mSULT1D1 (h), hSULT1A3 (i), and hSULT1C2 (j). PAPS and Tyr are shown as sticks with green carbon. Docking was performed by Glide v8.1 in Schrödinger software with the same parameters as in a. b, c, f Data are plotted as the mean ± standard deviation from n = 3 independent samples. b, c Two-sided unpaired t-tests were performed with *p < 0.05; **p < 0.01. a.u. stands for arbitrary unit.

To further explore the other residues involved in substrate binding of NnSULT1C1, we performed protein-ligand docking using Glide v8.1 in the Schrödinger software package v2018.460. Tyr was docked into NnSULT1C1 using the OPLS_3 force field, and the lowest-energy pose was examined61. For each docking experiment, a maximum of 200 output poses was set for each protein, and the Emodel energy was used to rank the top 50 poses. The docking structure suggests that the α-amino group of Tyr is stabilized by NnSULT1C1 residues Glu161, Thr30, Ile33, and Trp93. The π-π stacking interactions between Tyr and Phe90 are likely to improve the packing interaction (Fig. 2a). The phenolic hydroxy group of Tyr is positioned in the Lys-Lys-His catalytic site to engage in sulfuryl transfer. The His120 residue serves as a catalytic base that can remove the proton from Tyr. The Lys57 and Lys118 residues interact with and stabilize the sulfuryl group of PAPS and the phenolic hydroxy group of Tyr, respectively. To validate the contribution of these Tyr-interacting residues to NnSULT1C1 activity, Thr30, Ile33, Trp93, and Glu161 were each mutated to alanine. Alanine mutation at Thr30, Trp93, or Glu161 significantly decreased the activity of NnSULT1C1 (Fig. 2c). Among these residues, the E161A mutation exhibits the largest decrease in activity, confirming its important interaction with Tyr. To further explore whether other sulfotransferases may also carry out tyrosine sulfation, we performed a structure similarity search using the PDBeFold web server. Based on the Q score, the three proteins with structures most similar to NnSULT1C1 are mouse SULT1D1 (pdb: 2zvq), human SULT1A3 (pdb: 2a3r), and human SULT1C2 (pdb: 2gwh) (Fig. 2d). The overall secondary structure of NnSULT1C1 aligned well with 2zvq, which indicates its structural consistency with the other SULTs (Supplementary Fig. 5).
To further illustrate the unique specificity of NnSULT1C1 for Tyr, dockings of Tyr to the most similar sulfotransferases, including mSULT1D1, hSULT1A3, and hSULT1C2, were carried out using Glide v8.1 in the Schrödinger software package v2018.4 following the same method of Tyr docking used for NnSULT1C160. Docking of Tyr to NnSULT1C1 exhibits the lowest Glide docking score of −6.88 and the closest distance between the phenolic hydroxyl group and PAPS sulfonate (Fig. 2e). This result is consistent with NnSULT1C1 being the most effective of all tested sulfotransferases at generating sTyr-containing sfGFP in the green fluorescent protein assay (Fig. 2f). The key step of the sulfotransfer reaction involves an SN2-type nucleophilic attack on the PAPS sulfonate by the phenoxide of Tyr. Compared with mSULT1D1, hSULT1A3, and hSULT1C2, the docking of Tyr in NnSULT1C1 results in the closest distance (3.6 Å) between the sulfur atom of PAPS and the phenolic hydroxyl group (Fig. 2g–j). Furthermore, the acceptor phenolic hydroxyl group of Tyr lies on the backside of the S-O bond of PAPS in the Tyr docking structure with NnSULT1C1, indicating a more favorable orientation for the nucleophilic attack (Fig. 2g).
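The two geometric criteria discussed above, the nucleophilic attack distance and the backside (in-line) orientation relative to the S-O bond, can be computed from atomic coordinates with a small Python sketch. The coordinates below are idealized placeholders chosen so that the distance equals the reported 3.6 Å and the attack is perfectly in-line; they are not taken from the docked structures, and the function names are illustrative.

```python
import math


def attack_distance(nucleophile_o, sulfur):
    """Distance (in the units of the input coordinates) between the
    nucleophilic oxygen and the sulfur atom."""
    return math.dist(nucleophile_o, sulfur)


def attack_angle_deg(nucleophile_o, sulfur, leaving_o):
    """Angle at sulfur between the incoming nucleophile and the leaving-group
    oxygen; values near 180 degrees correspond to backside, in-line attack."""
    v1 = [a - b for a, b in zip(nucleophile_o, sulfur)]
    v2 = [a - b for a, b in zip(leaving_o, sulfur)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Idealized placeholder coordinates in angstroms (illustration only).
tyr_phenolic_o = (0.0, 0.0, 0.0)
paps_sulfur = (3.6, 0.0, 0.0)
paps_bridging_o = (5.2, 0.0, 0.0)

demo_distance = attack_distance(tyr_phenolic_o, paps_sulfur)
demo_angle = attack_angle_deg(tyr_phenolic_o, paps_sulfur, paps_bridging_o)
print(demo_distance, demo_angle)
```

In a comparison across enzymes, a shorter distance combined with an angle closer to 180° would indicate a pose better set up for an SN2-type sulfuryl transfer.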

Biosynthesis and genetic encoding of sTyr in Escherichia coli

Having identified NnSULT1C1 as a functional tyrosine sulfotransferase, we explored whether the biosynthesized sTyr can be genetically incorporated into proteins in E. coli in response to the amber codon. As an initial goal, we wanted to increase sTyr production in these cells in order to optimize its availability for incorporation into proteins. Since NnSULT1C1 utilizes tyrosine and PAPS for producing sTyr, we quantified sTyr production in five knockout E. coli cell lines in which the gene knockout has been shown to improve the yield of either tyrosine or PAPS in E. coli62,63,64,65. To evaluate the effect of knocking out these genes on the biosynthesis of sTyr, we transformed the suppression plasmid pUltra-sTyr, reporter plasmid pET22b-T5-sfGFP151TAG, and the biosynthesis plasmid pEvol-NnSULT1C1 into wildtype E. coli BW25113 or knockout strains (Fig. 3a). The expression of sfGFP with sTyr at position 151 (sfGFP-sTyr) was carried out in LB medium for 18 h. To our delight, we found that knockout of the cysH gene significantly improved the production of sTyr-containing sfGFP, compared to that seen in the wildtype BW25113 strain (Fig. 3b). CysH encodes the PAPS sulfotransferase responsible for degradation of PAPS to 3′-phosphoadenosine-5′-phosphate (PAP). This observation of enhanced sfGFP-sTyr production in BW25113ΔcysH is consistent with the previous report that knockout of the cysH gene can increase cellular PAPS concentration and the production of sulfated products in E. coli65,66. Next, we examined whether manipulation of PAPS synthetic and recycling pathways in E. coli could further enhance intracellular PAPS levels. We amplified the gene cysDNC encoding adenosine-5′-triphosphate (ATP) sulfurylase and adenosine 5′-phosphosulfate kinase to increase the intracellular level of PAPS, followed by the introduction of the gene cysQ encoding adenosine‐3′,5′‐diphosphate (PAP) nucleotidase for PAP recycling45,66,67,68.
We found that cells expressing all these genes exhibited the largest increase in fluorescence, suggesting a higher expression level of sfGFP-sTyr (Fig. 3c). The NnSULT1C1 expression level, set by the concentration of the inducer L-arabinose, has a significant influence on the production of sfGFP-sTyr. Among all L-arabinose concentrations tested, NnSULT1C1 expression induced by 15 mg/L L-arabinose yielded the highest production of sfGFP-sTyr, even higher than cells with 27 mM external sTyr addition35,36,37,38,39,40. Thus, 15 mg/L L-arabinose was used in subsequent experiments (Fig. 3d). We also screened other conditions for sfGFP-sTyr expression, including expression medium, Tyr addition, SO42− addition, and glycerol addition, none of which altered the expression level of sfGFP-sTyr (Supplementary Fig. 6). To examine the contribution of the biosynthetic pathway to intracellular sTyr concentration, we measured the intracellular sTyr concentrations in cells when sTyr was either biosynthesized or delivered via exogenous feeding. To our delight, the cellular concentration of sTyr in cells endowed with the sTyr biosynthetic pathway is 756.3 μM, which is 28-fold higher than that from cells exogenously fed with 1 mM sTyr and higher even than in cells fed with 27 mM sTyr (Fig. 3e). Consistent with these intracellular levels of sTyr, endogenous biosynthesis of sTyr results in much higher sfGFP-sTyr expression than that produced via exogenous feeding (Fig. 3d, e). To further investigate the efficiency and specificity of incorporation of biosynthesized sTyr in these autonomous E. coli cells, sfGFP-sTyr proteins derived from exogenously fed sTyr and from biosynthesized sTyr were purified by Ni2+-NTA affinity chromatography and characterized by SDS-PAGE and ESI-MS. Intact sfGFP was only expressed after exogenous sTyr feeding or after induction of sTyr biosynthesis.
The yield of sfGFP-sTyr derived from biosynthetic sTyr is 5.67 mg/L under the optimal condition, compared with 1.5 mg/L sfGFP-sTyr produced by feeding with 1 mM exogenous sTyr (Fig. 3f). The mass of sfGFP-sTyr produced from biosynthetic sTyr was 27,674 Da, which is in good agreement with the calculated mass (Fig. 3g, h). To test the activity of NnSULT1C1 in vitro, its kinetic parameters were measured. It exhibits a Km, Vmax, and kcat of 0.60 μM, 85.76 nmol/min/mg and 3.08 min−1, respectively (Supplementary Fig. 7). Its catalytic efficiency (kcat/Km = 85.36 s−1 mM−1) is comparable with the activity of human SULT1C1 reported previously69,70.
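To make the meaning of Km and Vmax concrete, the following Python sketch evaluates the standard Michaelis–Menten rate law with the values reported above. This assumes simple Michaelis–Menten behavior, which the text does not state explicitly as the fitted model; the substrate concentrations are arbitrary illustrative choices, and the units simply follow those given for Km (μM) and Vmax (nmol/min/mg).

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S]) for a single-substrate enzyme."""
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate / (km + substrate)


# Kinetic parameters as reported in the text.
KM_UM = 0.60    # Km in micromolar
VMAX = 85.76    # Vmax in nmol/min/mg

# Arbitrary substrate concentrations (micromolar) chosen for illustration.
rate_at_km = michaelis_menten_rate(KM_UM, VMAX, KM_UM)
rate_at_10x_km = michaelis_menten_rate(10 * KM_UM, VMAX, KM_UM)
print(rate_at_km, rate_at_10x_km)
```

By construction the rate at [S] = Km is half of Vmax, and the rate approaches Vmax as the substrate concentration rises well above Km.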

Fig. 3: Generation of completely autonomous sTyr synthesizing E. coli.

a Schematic representation of genetic circuits used for generating completely autonomous sTyr synthesizing E. coli. b Screening of the knockout strains for sfGFP-sTyr production after the expression of NnSULT1C1. c The roles of PAPS recycling enzymes in producing sfGFP-sTyr using the ΔcysH BW25113 strain. d Production of sfGFP-sTyr from cells with the addition of chemically synthesized sTyr or with biosynthesized sTyr. The effect of NnSULT1C1 expression level on producing sfGFP-sTyr was screened by altering the concentration of the inducer, L-arabinose (L-ara), p = 0.008. e Cellular concentrations of sTyr of cells with the addition of chemically synthesized sTyr or the biosynthesis of sTyr. f SDS-PAGE analysis of sfGFPs expressed in LB in the presence (+) or absence (−) of exogenous 1 mM sTyr or when inducing NnSULT1C1 expression (bio). g, h ESI-MS analysis of sfGFP-sTyr proteins expressed in cells with the addition of 1 mM chemically synthesized sTyr or the biosynthesis of sTyr. b–e Data are plotted as the mean + standard deviation from n = 3 independent samples. d Two-sided unpaired t-tests were performed with **p < 0.01. a.u. stands for arbitrary unit.

Biosynthesis and genetic incorporation of sTyr in mammalian cells

Post-translational tyrosine sulfation occurs exclusively in eukaryotes. Although this modification has been estimated to occur on 1% of all tyrosine residues in eukaryotic proteomes, its functional significance is not well understood41,71,72. One approach to determine the biological importance of protein tyrosine sulfation is to express sulfated protein in living cells in a site-specific and homogeneous fashion, a goal that is difficult to achieve by chemical synthesis or recombinant expression. Genetic code expansion based on E. coli-derived tyrosyl-tRNA synthetase (EcTyrRS)/tRNA has been proven to overcome these challenges by site-specifically incorporating sTyr in proteins in mammalian cells73,74. To promote the efficient expression of mammalian proteins sulfated on specific tyrosines, we have generated mammalian cells equipped with both sTyr biosynthetic and translational machinery. To generate mammalian cells capable of biosynthesizing sTyr, we used the piggyBac system to stably integrate NnSULT1C1 into the genome of HEK293T cells, yielding the HEK293T-NnSULT1C1 cell line (Fig. 4a)75. The EcTyrRS/tRNA pair was used to construct pAcBac2.tR4-sTyrRS/EGFP*, containing EGFP with a stop codon at position 39 as well as two copies of E. coli and Bacillus stearothermophilus tRNACUATyr (Fig. 4a)76. To evaluate the function of NnSULT1C1 in mammalian cells, pAcBac2.tR4-sTyrRS/EGFP* was transfected into HEK293T and HEK293T-NnSULT1C1 cells, which were then incubated in the presence or absence of exogenous sTyr. The expression of EGFP was monitored by confocal microscopy 2 days after transfection. As expected, the addition of 1 mM sTyr to HEK293T cells resulted in moderate expression of full-length EGFP, while minimal EGFP fluorescence was observed in the absence of sTyr addition (Fig. 4b). Gratifyingly, higher expression of EGFP was observed in HEK293T-NnSULT1C1 cells without exogenous sTyr addition than that seen in HEK293T cells fed with 3 mM sTyr.
In addition to confocal imaging, flow cytometry was used to quantify expression levels of EGFP in cells fed with exogenous sTyr and in cells biosynthesizing sTyr. As shown in Fig. 4c and Supplementary Fig. 8, significantly higher EGFP fluorescence was observed in HEK293T-NnSULT1C1 cells endowed with sTyr biosynthetic capability than in HEK293T cells fed with 3 mM sTyr. As direct evidence of sTyr biosynthesis in mammalian cells, the cellular sTyr concentration in HEK293T-NnSULT1C1 cells is higher than that in HEK293T cells fed with 3 mM sTyr (Supplementary Fig. 9). The fidelity of site-specific incorporation of sTyr was evaluated by mass spectral analysis of purified sTyr-containing EGFP proteins. The observed mass was 29,761 Da, consistent with the calculated mass of EGFP with sTyr at position 39 and with the observed mass of EGFP39sTyr purified from HEK293T cells with external sTyr addition (Fig. 4d and Supplementary Fig. 10). These results demonstrate that the generation of mammalian cells autonomously able to biosynthesize sTyr and incorporate it into proteins significantly enhances the expression level of sTyr-containing protein in mammalian cells.
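The normalized fluorescence metric used for the flow cytometry comparison (geometric mean fluorescence multiplied by the percentage of EGFP-positive cells, as defined in the Fig. 4 legend) can be sketched in Python as follows. Two points are assumptions made for illustration: the geometric mean is taken over the EGFP-positive events only, and positivity is decided by a simple intensity gate. The event intensities and gate value are invented.

```python
import math


def normalized_fluorescence(intensities, gate):
    """Geometric mean fluorescence of EGFP-positive events multiplied by the
    percentage of EGFP-positive events.

    Assumes (for illustration) that events with intensity above `gate` are
    EGFP-positive and that the geometric mean is computed over those events.
    """
    if not intensities:
        raise ValueError("no events supplied")
    positive = [x for x in intensities if x > gate]
    if not positive:
        return 0.0
    geometric_mean = math.exp(sum(math.log(x) for x in positive) / len(positive))
    percent_positive = 100.0 * len(positive) / len(intensities)
    return geometric_mean * percent_positive


# Hypothetical per-cell fluorescence intensities (arbitrary units).
demo_events = [10.0, 20.0, 1000.0, 10000.0]
demo_gate = 100.0

demo_normalized = normalized_fluorescence(demo_events, demo_gate)
print(demo_normalized)
```

Multiplying by the percentage of positive cells means that a population with a few very bright cells is not scored higher than one in which most cells express the reporter at a moderate level.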

Fig. 4: Generation of completely autonomous mammalian cells with sTyr-containing proteins.

a Schematic representation of genetic circuits used for generating completely autonomous mammalian cells with sTyr-containing proteins. b Confocal images of HEK293T (exogenously fed) and HEK293T-NnSULT1C1 (bio) cells expressing sTyrRS, tRNACUA and EGFP containing an amber codon at the Tyr39 position. Scale bar = 20 μm. c Flow cytometric analysis of EGFP expression levels of HEK293T (exogenously fed) and HEK293T-NnSULT1C1 (bio) cells with sTyrRS, tRNACUA and EGFP containing an amber codon at the Tyr39 position. The normalized fluorescence was calculated by multiplying the geometric mean fluorescence by the percentage of EGFP-positive cells. Error bars represent standard deviations. d Mass spectral analysis of EGFP with sTyr (EGFP-39-sTyr) purified from HEK293T-NnSULT1C1 cells. c Data are plotted as the mean ± standard deviation from n = 3 independent samples. c Two-sided unpaired t-tests were performed with p = 0.001. **p < 0.01. a.u. stands for arbitrary unit.

Using completely autonomous sTyr biosynthetic cells to synthesize potent thrombin inhibitors with site-specific sulfation

Thrombin inhibitors represent an important class of anticoagulants used to prevent blood clotting. In addition, several thrombin inhibitors from hematophagous organisms have been shown to facilitate the acquisition and digestion of bloodmeal77,78,79. Recent studies have reported that post-translational sulfation of these proteins has a dramatic effect on their inhibitory activity31,80. For example, tyrosine sulfation of hirudin increases its affinity for thrombin by more than 10-fold80,81. Tyrosine sulfation of madanin-1 and chimadanin significantly increases their affinities for thrombin by promoting strong electrostatic interactions with positively-charged residues (Fig. 5a). Current methods for studying these site-specifically sulfated thrombin inhibitors rely heavily on solid-phase peptide synthesis and subsequent chemical ligation, processes that are time-consuming and may result in sub-optimal protein folding31,82,83. To explore the generation of sTyr-containing thrombin inhibitors using cells endowed with autonomous sTyr biosynthetic machinery, we chose both madanin-1 and chimadanin, identified in the salivary gland of Haemaphysalis longicornis (Fig. 5b)31,84. As shown in Fig. 5a, sulfation of madanin-1 converts Tyr32 and Tyr35 to negatively charged residues, thus enhancing madanin-1’s direct electrostatic interaction with the ε-amino groups of K236 and K240 located within exosite II of thrombin. To express the site-specifically sulfated thrombin inhibitors, we constructed plasmids encoding each thrombin inhibitor with amber codons substituted at either or both of the indicated Tyr sites. sTyr-containing inhibitors were expressed by transforming ΔcysH BW25113 cells with pEvol-NnSULT1C1-cysDNCQ, pUltra-sTyr, and a plasmid encoding the thrombin inhibitor. In parallel, we utilized the ΔcysH BW25113 cells lacking the sTyr biosynthetic systems but exogenously fed with 3 mM sTyr.
The site-specific sulfation of madanin-1 and chimadanin was further validated using SDS-PAGE and ESI-MS analysis (Fig. 5c and Supplementary Figs. 11–13).

Fig. 5: Production of thrombin inhibitors with site-specific sTyr insertion using completely autonomous E. coli.

a Madanin-1 with sulfation at Tyr32 and Tyr35 positions binds to exosite II in thrombin, analyzed from PDB: 5L6N. Surface representation shows positive electrostatic potential in blue and negative electrostatic potential in red. b Amino acid sequences of madanin-1 and chimadanin. Sulfation sites are shown in red. c SDS-PAGE analysis of thrombin inhibitors with site-specific sTyr insertion expressed in completely autonomous E. coli. d, e Inhibition of thrombin activity by madanin-1 and chimadanin proteins. d, e Data were plotted as mean ± standard error from n = 3 independent samples and fitted to the Morrison model.

To test the thrombin-inhibiting activity of the wildtype inhibitors and their sTyr-containing mutants, we performed chromogenic thrombin amidolytic activity assays in the presence of a range of concentrations of each inhibitor. Compared with wildtype madanin-1 (Ki = 16.0 ± 0.9 nM), incorporation of a single sTyr at either the Tyr32 (Ki = 1.3 ± 0.1 nM) or Tyr35 (Ki = 6.1 ± 0.6 nM) position significantly enhanced its inhibition of thrombin (Fig. 5d and Supplementary Fig. 14). To our delight, madanin-1 mutants sulfated at both Tyr32 and Tyr35 exhibited the highest potency (Ki = 0.5 ± 0.1 nM) against thrombin activity (Fig. 5d and Supplementary Fig. 14). Following a similar trend, incorporating a single biosynthesized sTyr at either Tyr28 or Tyr31 of chimadanin yields more potent inhibition of thrombin activity (Ki = 0.6 ± 0.1 nM and 1.5 ± 0.1 nM, respectively) than achieved with wildtype chimadanin (Ki = 12.9 ± 0.1 nM, Fig. 5e and Supplementary Fig. 14). Double sulfation of chimadanin at both Tyr28 and Tyr31 further improved its Ki to 0.1 nM, consistent with the madanin-1 study (Fig. 5e and Supplementary Fig. 14). Furthermore, sTyr-containing thrombin inhibitors prepared using cells with completely autonomous sTyr biosynthetic machinery are more potent than chemically synthesized ones31. This may be due to co-translational folding being more efficient than the folding achieved after chemical synthesis. These data demonstrate the advantages of producing therapeutic proteins with site-specific sTyr modifications using completely autonomous cells with the ability to biosynthesize and genetically encode sTyr.
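Because the inhibition data are fitted to the Morrison model (Fig. 5 legend), a brief Python sketch of the commonly used Morrison equation for tight-binding inhibitors may help show how Ki shapes the dose-response curve. The enzyme and inhibitor concentrations below are hypothetical, since the assay concentrations are not given in this section, and the Ki values are passed in as apparent constants without any correction for substrate competition.

```python
import math


def morrison_fractional_activity(inhibitor, enzyme, ki_app):
    """Fractional enzyme activity v_i / v_0 under the Morrison equation.

    All three arguments must share the same concentration unit.
    """
    if enzyme <= 0:
        raise ValueError("enzyme concentration must be positive")
    total = enzyme + inhibitor + ki_app
    # Concentration of enzyme-inhibitor complex from the quadratic binding equation.
    bound = (total - math.sqrt(total ** 2 - 4.0 * enzyme * inhibitor)) / 2.0
    return 1.0 - bound / enzyme


# Hypothetical assay conditions (nM); not taken from the source.
ENZYME_NM = 1.0
INHIBITOR_NM = 5.0

# Ki values reported in the text for madanin-1 variants (nM).
activity_wildtype = morrison_fractional_activity(INHIBITOR_NM, ENZYME_NM, 16.0)
activity_double_sulfated = morrison_fractional_activity(INHIBITOR_NM, ENZYME_NM, 0.5)
print(activity_wildtype, activity_double_sulfated)
```

Under these placeholder conditions the doubly sulfated variant leaves far less residual thrombin activity than the wildtype inhibitor, which is the qualitative behavior expected from its lower Ki.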


In this research, we have generated completely autonomous bacterial and mammalian cells endowed with machinery for both sTyr biosynthesis and site-specific incorporation into proteins. NnSULT1C1-mediated biosynthesis of sTyr from tyrosine and PAPS was discovered using an SSN, and the unique specificity of NnSULT1C1 for tyrosine was systematically explored using both bioinformatic and computational methods. Use of NnSULT1C1 and other optimized components allowed us to engineer both bacterial and mammalian cells capable of autonomously biosynthesizing sTyr and genetically incorporating it into proteins. The resulting cells produce site-specifically sulfated proteins at higher yields than cells exogenously fed with 3–27 mM sTyr. The value of these completely autonomous cells was further demonstrated via their use in the preparation of therapeutic sTyr-containing proteins with enhanced efficacy.

More than 300 ncAAs have been genetically incorporated into proteins in a site-specific manner, providing powerful tools for investigating protein structures and functions1,2,3,6,85,86,87,88,89,90,91,92,93. To date, utilizing these ncAAs in the context of Genetic Code Expansion has required both exogenous feeding and good membrane permeability of chemically synthesized ncAAs. Cell membranes are poorly permeable to ncAAs with charged or polar structures. Thus, intracellular biosynthesis of these ncAAs is likely to significantly expand the utility of Genetic Code Expansion technology. Attempts to engineer cells for autonomous ncAA biosynthesis without external addition of precursors have frequently been hindered by the scarcity of verified biosynthetic pathways for producing ncAAs at high concentrations. For this reason, biosynthetic pathways for pAF, pThr, 5HTP, and DOPA are the only ones that have been applied to bacterial cells for intracellular ncAA biosynthesis from simple carbon sources12,19,20,21,22. We expect that the combination of bioinformatics and ncAA screening methods reported in this work can be a powerful strategy for enlarging the repertoire of biosynthesized ncAAs for Genetic Code Expansion. Our study further reports the construction of a completely autonomous mammalian cell line capable of biosynthesizing sTyr and incorporating it into proteins in response to the amber codon. The creation of additional mammalian cells with the endogenous ability to biosynthesize ncAAs and use them for protein synthesis will expand the preparation of therapeutic proteins, as well as allow application of the Genetic Code Expansion technology at the level of whole organisms.


Sequence similarity network (SSN)

The SSN was generated by inputting the amino acid sequence of RnSULT1A1 as the query sequence. The UniProt database was selected and the e value was set as 5. The resulting network was finalized by setting the alignment score threshold to 110 to generate edges representing pairwise sequence similarities. The representative node network with %ID of 80% was downloaded in xgmml format and visualized in Cytoscape.
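The edge-thresholding step described above can be illustrated with a minimal sketch. It assumes that pairwise alignment scores are already available as (node, node, score) tuples and that an edge is kept when its score is at or above the threshold. The sequence names and scores are hypothetical placeholders, not data from this study, and the helper functions are illustrative rather than part of the tool used to build the SSN.

```python
from collections import defaultdict

ALIGNMENT_SCORE_THRESHOLD = 110  # threshold stated in the Methods text


def build_ssn_edges(pairwise_scores, threshold=ALIGNMENT_SCORE_THRESHOLD):
    """Keep only sequence pairs whose alignment score meets the threshold.

    Assumption: an edge is drawn when score >= threshold.
    """
    return [(a, b, s) for a, b, s in pairwise_scores if s >= threshold]


def connected_clusters(nodes, edges):
    """Group nodes into clusters (connected components) of the network."""
    adjacency = defaultdict(set)
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical scores for illustration only.
scores = [("seqA", "seqB", 150), ("seqB", "seqC", 95), ("seqC", "seqD", 120)]
nodes = ["seqA", "seqB", "seqC", "seqD"]

edges = build_ssn_edges(scores)
clusters = connected_clusters(nodes, edges)
print(clusters)
```

In this toy input the seqB/seqC pair falls below the threshold, so the network splits into two clusters. Raising the threshold separates sequences into finer groups, while lowering it merges them.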

Optimized expression of sfGFP-sTyr from sTyr biosynthesis

ΔcysH BW25113 cells, transformed with pUltra-sTyrRS, pET22b-T5-sfGFP151TAG, and pEvol-NnSULT1C1-cysDNCQ, were grown in Luria-Bertani (LB) medium at 37 °C. When the OD600 of the cell culture reached 0.6, NnSULT1C1 expression was induced with 15 mg/L l-arabinose and the cells were grown at 30 °C. After 6 h of induction, the cells were diluted fivefold to an OD600 of 0.6. Expression of the sfGFP reporter and sTyrRS was induced with 1 mM IPTG. Additional l-arabinose was also added to maintain its final concentration at 15 mg/L. The control cells, transformed with pUltra-sTyrRS, pET22b-T5-sfGFP151TAG, and pEvol-empty, were grown under the same conditions with the indicated concentration of sTyr. After growth at 30 °C for 18 h, cells were harvested by centrifugation at 4750 × g for 10 min and used for GFP fluorescence and cell optical density measurements. Proteins were purified on Ni-NTA resin (Qiagen) following the manufacturer’s instructions. The purified protein was used for SDS-PAGE and ESI-MS analysis.

Predicting the structure of NnSULT1C1 by AlphaFold

The structure of NnSULT1C1 was predicted by AlphaFold2 using GitHub AlphaFold code 2.0. The databases, including reduced BFD, PDB70, MGnify, and Uniclust30, were used to filter structural templates. All other settings were left at their defaults. The top-ranked structure based on pLDDT was output and used in this study.
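Selecting the top structure by pLDDT can be sketched as follows. This is a minimal illustration that assumes each candidate model comes with a list of per-residue pLDDT values and that models are ranked by their mean pLDDT. The model names and scores are hypothetical and are not output from the prediction reported here.

```python
from statistics import mean


def rank_models_by_plddt(plddt_by_model):
    """Return model names sorted from highest to lowest mean pLDDT."""
    return sorted(
        plddt_by_model,
        key=lambda name: mean(plddt_by_model[name]),
        reverse=True,
    )


# Hypothetical per-residue pLDDT values (three residues per model for brevity).
predictions = {
    "model_1": [92.1, 88.4, 95.0],
    "model_2": [85.3, 90.2, 80.7],
    "model_3": [93.5, 94.8, 91.2],
}

ranking = rank_models_by_plddt(predictions)
top_model = ranking[0]
print(top_model, ranking)
```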

Protein-ligand docking

Protein-ligand docking was performed with Glide v8.1 in the Schrödinger software package v2018.4. Glide uses the OPLS3 force field to evaluate the docking procedure. OPLS3 is an enhanced version of the OPLS_2005 all-atom force field that provides larger coverage of organic functionality. Four protein structures, 2zvq, 2a3r, 2gwh, and the predicted structure of NnSULT1C1, were considered for docking. The PAPS-binding site of the predicted NnSULT1C1 structure was inferred by alignment with the structure of 2a3r. For the other structures, we used the original PAP sites in the reported co-crystal structures to install PAPS. A short protein-ligand energy minimization was performed to remove steric clashes in each complex. The docking box was inferred from the position of dopamine in 2a3r. The RMSD was set to 0.5 to sample distinct conformations. All other parameters were set to the defaults of SP mode in Glide. The maximum number of output poses for each docked protein was set to 200, and the top 50 poses ranked by Emodel score were selected. The best docking pose for each complex was compared using the Glide docking score, an empirical scoring function that approximates the ligand binding free energy in units of kcal/mol.
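The two-stage pose selection described above can be sketched in a few lines. This is a minimal illustration, not Schrödinger's API. The `Pose` record and `select_best_pose` function are hypothetical names, the pose values are invented for demonstration, and the sketch assumes that lower (more negative) Emodel and docking scores are better and that the best pose is the one with the lowest docking score among the Emodel-ranked subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    pose_id: int
    emodel: float         # Emodel score; lower is assumed better
    docking_score: float  # docking score in kcal/mol; lower is assumed better


def select_best_pose(poses, max_output=200, keep_top=50):
    """Pick a representative pose for one protein-ligand complex.

    1. Consider at most `max_output` poses.
    2. Keep the `keep_top` poses with the lowest Emodel score.
    3. Among those, return the pose with the lowest docking score.
    """
    considered = poses[:max_output]
    top_by_emodel = sorted(considered, key=lambda p: p.emodel)[:keep_top]
    return min(top_by_emodel, key=lambda p: p.docking_score)


# Hypothetical poses for illustration only.
poses = [
    Pose(1, emodel=-40.0, docking_score=-5.1),
    Pose(2, emodel=-55.0, docking_score=-6.3),
    Pose(3, emodel=-30.0, docking_score=-7.9),
    Pose(4, emodel=-52.0, docking_score=-6.8),
]

# A small keep_top makes the effect of the Emodel filter visible.
best = select_best_pose(poses, keep_top=2)
print(best)
```

With `keep_top=2`, pose 3 is excluded by the Emodel filter even though it has the lowest docking score, so pose 4 is returned. With the default of 50, all four toy poses survive the filter and pose 3 is selected.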

Characterization of HEK293T-NnSULT1C1 with confocal microscopy and flow cytometry

To generate HEK293T-NnSULT1C1, HEK293T cells were transfected with PB-NnSULT1C1 (100 ng) and Piggybac transposase plasmids (20 ng) using Polyjet In Vitro DNA Transfection Reagent (SignaGen Laboratories). Puromycin (1 μg/mL) was added to the culture medium from Day 2 to Day 7 to select cells with genomic integration of NnSULT1C1. The puromycin concentration was raised to 3 μg/mL from Day 8 and maintained thereafter. HEK293T and HEK293T-NnSULT1C1 cells were cultured in DMEM supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37 °C and 5% CO2. HEK293T and HEK293T-NnSULT1C1 cells were transfected with pAcBac2.tR4-sTyrRS/GFP* using Polyjet In Vitro DNA Transfection Reagent (SignaGen Laboratories) in the presence or absence of the indicated concentration of sTyr. Media were changed 12–16 h after transfection. At 48 h after transfection, cells were used for confocal microscopy, for which nuclei were stained by incubating cells with Hoechst 33342 (Life Technologies). After being washed three times with PBS (pH 7.4), cells were imaged with a Zeiss LSM710 confocal microscope. The remaining cells were used for flow cytometry analysis with a Sony SA3800 Flow Cytometer, where a total of 20,000 cells were analyzed for each sample. Data were processed with FlowJo. Reported data are the average of three independent samples prepared at the same time, with the standard deviation.

Thrombin activity assay

N-(p-Tosyl)-GPR-pNA acetate (Cayman Chemicals) was used as a chromogenic substrate to test the amidolytic activity of human α-thrombin (Haematologic Technologies). Purified chimadanin and madanin-1 inhibitors were buffer-exchanged into the assay buffer (pH 8) containing 50 mM Tris-HCl and 50 mM NaCl using PD-10 columns. Inhibition assays were performed in the assay buffer with 0.14 nM human α-thrombin, 100 μM substrate, and varying concentrations of inhibitors. The activity of thrombin was monitored by absorbance at 405 nm. Inhibition constants (Ki) were determined using the Morrison equation in GraphPad Prism. Three independent samples were prepared for each group.
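For readers unfamiliar with the Morrison model, the following sketch shows how fractional thrombin activity depends on inhibitor concentration for a tight-binding inhibitor and how Ki can be recovered by least squares. It is an illustration under stated assumptions, not the GraphPad Prism procedure used here. The enzyme and substrate concentrations are taken from the assay conditions above, but the Km value, the inhibitor concentrations, the competitive-inhibition correction, and the simple ternary-search fit are all assumptions made for demonstration, and the "data" are synthetic.

```python
import math

ENZYME_NM = 0.14      # human alpha-thrombin concentration from the assay conditions
SUBSTRATE_UM = 100.0  # substrate concentration from the assay conditions
KM_UM = 5.0           # hypothetical Km, NOT reported in this section


def morrison_fractional_activity(inhibitor_nm, ki_nm, enzyme_nm=ENZYME_NM,
                                 substrate_um=SUBSTRATE_UM, km_um=KM_UM):
    """Fractional velocity v_i/v_0 for a tight-binding inhibitor.

    Assumes competitive inhibition, so Ki_app = Ki * (1 + [S]/Km).
    """
    ki_app = ki_nm * (1.0 + substrate_um / km_um)
    total = enzyme_nm + inhibitor_nm + ki_app
    # Concentration of enzyme-inhibitor complex (root of the binding quadratic).
    bound = (total - math.sqrt(total ** 2 - 4.0 * enzyme_nm * inhibitor_nm)) / 2.0
    return 1.0 - bound / enzyme_nm


def fit_ki(inhibitor_nm, activity, lo_nm=1e-3, hi_nm=1e3, iterations=200):
    """Least-squares estimate of Ki by ternary search over log10(Ki)."""
    def sse(log_ki):
        ki = 10.0 ** log_ki
        return sum((morrison_fractional_activity(i, ki) - y) ** 2
                   for i, y in zip(inhibitor_nm, activity))

    a, b = math.log10(lo_nm), math.log10(hi_nm)
    for _ in range(iterations):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if sse(m1) < sse(m2):
            b = m2
        else:
            a = m1
    return 10.0 ** ((a + b) / 2.0)


# Synthetic, noise-free data generated from an assumed Ki of 0.5 nM.
true_ki = 0.5
concentrations = [0.0, 1.0, 5.0, 10.0, 50.0, 200.0]  # nM, hypothetical series
observed = [morrison_fractional_activity(c, true_ki) for c in concentrations]

fitted_ki = fit_ki(concentrations, observed)
print(f"fitted Ki = {fitted_ki:.3f} nM")
```

The Morrison form is used instead of a simple IC50 curve because the Ki values are close to the enzyme concentration, so depletion of free inhibitor by binding cannot be neglected.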

Statistics and reproducibility

All statistical analyses were performed using GraphPad Prism. Similar results were obtained from three independent experiments.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.