Most genes are synthesized using seamless assembly methods that rely on the polymerase chain reaction (PCR)[1–3]. However, PCR of genes encoding repetitive proteins either fails or generates nonspecific products. Motivated by the need to efficiently generate new protein polymers through high-throughput gene synthesis, here we report a codon-scrambling algorithm that enables the PCR-based gene synthesis of repetitive proteins by exploiting the codon redundancy of amino acids and finding the least-repetitive synonymous gene sequence. We also show that the codon-scrambling problem is analogous to the well-known travelling salesman problem[4], and obtain an exact solution to it by using De Bruijn graphs[5] and a modern mixed integer linear programme solver. As experimental proof of the utility of this approach, we use it to optimize the synthetic genes for 19 repetitive proteins, and show that the gene fragments are amenable to PCR-based gene assembly and recombinant expression.
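To make the objective of codon scrambling concrete, the sketch below scores a DNA sequence by how many of its length-k windows (k-mers) repeat an earlier window, then chooses among synonymous codons to keep that score low. It is deliberately simpler than the method described above: it uses a greedy, left-to-right heuristic in place of the exact De Bruijn graph and mixed integer linear programming formulation, so it can get stuck in locally good but globally suboptimal choices. The partial codon table, the repeat score, the function names and the window length k = 9 are illustrative assumptions, not details taken from the paper. The example repeat VPGVG is an elastin-like pentapeptide motif.

```python
from collections import Counter

# Hypothetical, partial codon table: only the amino acids used in the example.
# Codons follow the standard genetic code; extend the table for other residues.
SYNONYMOUS_CODONS = {
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repeat_score(dna, k):
    """Number of k-mer positions whose k-mer already occurred (0 = no repeated k-mer)."""
    window = kmers(dna, k)
    return len(window) - len(set(window))


def new_kmers(dna, codon, k):
    """k-mers created by appending `codon`, i.e. those containing at least one new base."""
    candidate = dna + codon
    start = max(0, len(dna) - k + 1)
    return kmers(candidate[start:], k)


def codon_cost(dna, codon, seen, usage, k):
    """Increase in repeat_score if `codon` is appended, tie-broken by prior codon usage."""
    fresh = new_kmers(dna, codon, k)
    added_repeats = len(fresh) - len(set(fresh) - seen)
    return (added_repeats, usage[codon])


def greedy_scramble(protein, k):
    """Greedy heuristic (not the exact MILP): pick the least-repetitive codon per residue."""
    dna, seen, usage = "", set(), Counter()
    for aa in protein:
        best = min(SYNONYMOUS_CODONS[aa],
                   key=lambda c: codon_cost(dna, c, seen, usage, k))
        seen.update(new_kmers(dna, best, k))  # keep `seen` equal to set(kmers(dna, k))
        dna += best
        usage[best] += 1
    return dna


def translate(dna):
    """Translate codon by codon using the partial table above."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    protein = "VPGVG" * 6  # elastin-like pentapeptide repeat
    naive = "".join(SYNONYMOUS_CODONS[aa][0] for aa in protein)  # one codon per residue
    scrambled = greedy_scramble(protein, k=9)
    print("naive repeat score:", repeat_score(naive, 9))
    print("scrambled repeat score:", repeat_score(scrambled, 9))
    print("synonymous:", translate(scrambled) == protein)
```

Because `codon_cost` counts exactly how much `repeat_score` would grow, each greedy step is locally optimal; an exact solver instead optimizes the whole sequence at once, which is what makes the problem comparable to the travelling salesman problem.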
Ma, S., Tang, N. & Tian, J. DNA synthesis, assembly and applications in synthetic biology. Curr. Opin. Chem. Biol. 16, 260–267 (2012).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods 6, 343–345 (2009).
Engler, C., Gruetzner, R., Kandzia, R. & Marillonnet, S. Golden gate shuffling: a one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS ONE 4, e5553 (2009).
Laporte, G. The traveling salesman problem: an overview of exact and approximate algorithms. Eur. J. Oper. Res. 59, 231–247 (1992).
Pevzner, P. A., Tang, H. & Waterman, M. S. An Eulerian path approach to DNA fragment assembly. Proc. Natl Acad. Sci. USA 98, 9748–9753 (2001).
Kaplan, D. L., Mello, S. M., Arcidiacono, S., Fossey, S. &, S. K. in Protein Based Materials (eds McGrath, K. & Kaplan, D.) 103–131 (Birkhäuser, 1998).
Cranford, S. W. & Buehler, M. J. Biomateriomics 165 (Springer, 2012).
McDaniel, J. R., Mackay, J. A., Quiroz, F. G. & Chilkoti, A. Recursive directional ligation by plasmid reconstruction allows rapid and seamless cloning of oligomeric genes. Biomacromolecules 11, 944–952 (2010).
Anderson, D. & Maugh, K. Escherichia coli expression vector encoding bioadhesive precursor protein analogs comprising three to twenty repeats of the decapeptide (Ala-Lys-Pro-Ser-Tyr-Pro-). US Patent 5,149,657 (1992).
Lyons, R. E. et al. Design and facile production of recombinant resilin-like polypeptides: gene construction and a rapid protein purification method. Protein Eng. Des. Sel. 20, 25–32 (2007).
Su, R. S.-C., Renner, J. N. & Liu, J. C. Synthesis and characterization of recombinant abductin-based proteins. Biomacromolecules 14, 4301–4308 (2013).
Cappello, J., Ferrari, F. & Richardson, C. Methods for preparing synthetic repetitive DNA. US Patent 5,641,648 (1997).
Cappello, J. & Causey, S. Peptides comprising repetitive units of amino acids and DNA sequences encoding the same. US Patent 6,018,030 (2000).
Widmaier, D. M. et al. Engineering the Salmonella type III secretion system to export spider silk monomers. Mol. Syst. Biol. 5, 309 (2009).
Tokareva, O., Michalczechen-Lacerda, V. A., Rech, E. L. & Kaplan, D. L. Recombinant DNA production of spider silk proteins. Microb. Biotechnol. 6, 651–663 (2013).
Amiram, M., Quiroz, F., Callahan, D. & Chilkoti, A. A highly parallel method for synthesizing DNA repeats enables the discovery of ‘smart’ protein polymers. Nature Mater. 10, 141–148 (2011).
Ousterout, D. G. et al. Reading frame correction by targeted genome editing restores dystrophin expression in cells from Duchenne muscular dystrophy patients. Mol. Ther. 21, 1718–1726 (2013).
Farmer, R. S., Top, A., Argust, L. M., Liu, S. & Kiick, K. L. Evaluation of conformation and association behavior of multivalent alanine-rich polypeptides. Pharm. Res. 25, 700–708 (2008).
McDaniel, J. R., Radford, D. C. & Chilkoti, A. A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14, 2866–2872 (2013).
Shur, O. & Banta, S. Rearranging and concatenating a native RTX domain to understand sequence modularity. Protein Eng. Des. Sel. 26, 171–180 (2013).
Steiner, D., Forrer, P. & Plückthun, A. Efficient selection of DARPins with sub-nanomolar affinities using SRP phage display. J. Mol. Biol. 382, 1211–1227 (2008).
Lee, B. W. et al. Strongly binding cell-adhesive polypeptides of programmable valencies. Angew. Chem. Int. Ed. 49, 1971–1975 (2010).
Higashiya, S., Topilina, N. I., Ngo, S. C., Zagorevskii, D. & Welch, J. T. Design and preparation of β-sheet forming repetitive and block-copolymerized polypeptides. Biomacromolecules 8, 1487–1497 (2007).
Petka, W., Harden, J., McGrath, K., Wirtz, D. & Tirrell, D. Reversible hydrogels from self-assembling artificial proteins. Science 281, 389–392 (1998).
Davis, N. E., Ding, S., Forster, R. E., Pinkas, D. M. & Barron, A. E. Modular enzymatically crosslinked protein polymer hydrogels for in situ gelation. Biomaterials 31, 7288–7297 (2010).
Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: technologies and applications. Nature Methods 11, 499–507 (2014).
Ma, S., Saaem, I. & Tian, J. Error correction in gene synthesis technology. Trends Biotechnol. 30, 147–154 (2012).
Hommelsheim, C. M., Frantzeskakis, L., Huang, M. & Ülker, B. PCR amplification of repetitive DNA: a limitation to genome editing technologies and many other applications. Sci. Rep. 4, 5052 (2014).
O’Brien, J. P. et al. in Silk Polymers Vol. 544, 10–104 (American Chemical Society, 1993).
Kurihara, H., Morita, T., Shinkai, M. & Nagamune, T. Recombinant extracellular matrix-like proteins with repetitive elastin or collagen-like functional motifs. Biotechnol. Lett. 27, 665–670 (2005).
Goeden-Wood, N. L., Conticello, V. P., Muller, S. J. & Keasling, J. D. Improved assembly of multimeric genes for the biosynthetic production of protein polymers. Biomacromolecules 3, 874–879 (2002).
Elmorjani, K. et al. Synthetic genes specifying periodic polymers modelled on the repetitive domain of wheat gliadins: conception and expression. Biochem. Biophys. Res. Commun. 239, 240–246 (1997).
Carlson, R. The changing economics of DNA synthesis. Nature Biotechnol. 27, 1091–1094 (2009).
Gendreau, M. & Potvin, J.-Y. Handbook of Metaheuristics Vol. 146 (Springer, 2010).
Whitaker, W. R., Lee, H., Arkin, A. P. & Dueber, J. E. Avoidance of truncated proteins from unintended ribosome binding sites within heterologous protein coding sequences. ACS Synth. Biol. 4, 249–257 (2015).
Meyer, D. E., Trabbic-Carlson, K. & Chilkoti, A. Protein purification by fusion with an environmentally responsive elastin-like polypeptide: effect of polypeptide length on the purification of thioredoxin. Biotechnol. Prog. 17, 720–728 (2001).
Meyer, D. E. & Chilkoti, A. Genetically encoded synthesis of protein-based polymers with precisely specified molecular weight and sequence by recursive directional ligation: examples from the elastin-like polypeptide system. Biomacromolecules 3, 357–367 (2002).
Tuller, T., Waldman, Y. Y., Kupiec, M. & Ruppin, E. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl Acad. Sci. USA 107, 3645–3650 (2010).
Goodman, D. B., Church, G. M. & Kosuri, S. Causes and effects of N-terminal codon bias in bacterial genes. Science 342, 475–479 (2013).
Richardson, S. M., Wheelan, S. J., Yarrington, R. M. & Boeke, J. D. GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res. 16, 550–556 (2006).
Markham, N. & Zuker, M. in Bioinformatics (ed. Keith, J.) Vol. 453, 3–31 (Humana Press, 2008).
We thank S. Mukherjee for valuable advice on mathematical optimization, and K. Dooley for useful discussions on soluble protein expression. This work was financially supported by the NIH through grant no. GM061232 to A.C. and by the NSF through the Research Triangle MRSEC (NSF DMR-11-21107). N.C.T. was supported by an NIH Biotechnology Training Grant (T32 GM008555).
The authors declare no competing financial interests.
About this article
Cite this article
Tang, N., Chilkoti, A. Combinatorial codon scrambling enables scalable gene synthesis and amplification of repetitive proteins. Nature Mater 15, 419–424 (2016). https://doi.org/10.1038/nmat4521