Combinatorial codon scrambling enables scalable gene synthesis and amplification of repetitive proteins


Most genes are synthesized using seamless assembly methods that rely on the polymerase chain reaction (PCR) [1–3]. However, PCR of genes encoding repetitive proteins either fails or generates nonspecific products. Motivated by the need to efficiently generate new protein polymers through high-throughput gene synthesis, here we report a codon-scrambling algorithm that enables the PCR-based gene synthesis of repetitive proteins by exploiting the codon redundancy of amino acids and finding the least-repetitive synonymous gene sequence. We also show that the codon-scrambling problem is analogous to the well-known travelling salesman problem [4], and obtain an exact solution to it by using De Bruijn graphs [5] and a modern mixed integer linear programme solver. As experimental proof of the utility of this approach, we use it to optimize the synthetic genes for 19 repetitive proteins, and show that the gene fragments are amenable to PCR-based gene assembly and recombinant expression.
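To make the idea of a "least-repetitive synonymous gene sequence" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy repetitiveness score (the number of repeated k-mers in the DNA) and replaces the De Bruijn graph and mixed integer linear programme formulation with brute-force enumeration, which is only feasible for very short peptides. The function names, the choice of k = 5, and the example peptide GVPGVP are all illustrative assumptions.

```python
from itertools import product

# Subset of the standard genetic code, limited to the residues used below.
SYNONYMOUS_CODONS = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons}


def repeat_score(dna, k=5):
    """Toy repetitiveness score (an assumption): k-mer occurrences beyond the first."""
    kmers = [dna[i:i + k] for i in range(len(dna) - k + 1)]
    return len(kmers) - len(set(kmers))


def translate(dna):
    """Translate an in-frame DNA string back into its amino-acid sequence."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def scramble_codons(peptide, k=5):
    """Enumerate every synonymous coding and return one with the lowest score.

    Exhaustive search stands in for the exact solver described in the text;
    the search space grows exponentially with peptide length.
    """
    choices = [SYNONYMOUS_CODONS[aa] for aa in peptide]
    candidates = ("".join(codons) for codons in product(*choices))
    return min(candidates, key=lambda dna: repeat_score(dna, k))


peptide = "GVPGVP"  # two copies of a hypothetical three-residue repeat
naive = "".join(SYNONYMOUS_CODONS[aa][0] for aa in peptide)  # same codon every time
scrambled = scramble_codons(peptide)

print(naive, repeat_score(naive))
print(scrambled, repeat_score(scrambled))
```

In this toy case the naive coding reuses one codon per residue, so the second repeat duplicates the first at the DNA level, while the scrambled coding encodes the same peptide with no repeated 5-mers. Realistic repetitive proteins are far too long for enumeration, which is why an exact graph-based formulation of the kind described above is needed.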

Figure 1: Computational analysis of codon scrambling.
Figure 2: Gene assembly of a diverse set of repetitive proteins.
Figure 3: Protein expression of a diverse set of repetitive proteins.


1. Ma, S., Tang, N. & Tian, J. DNA synthesis, assembly and applications in synthetic biology. Curr. Opin. Chem. Biol. 16, 260–267 (2012).
2. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods 6, 343–345 (2009).
3. Engler, C., Gruetzner, R., Kandzia, R. & Marillonnet, S. Golden gate shuffling: a one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS ONE 4, e5553 (2009).
4. Laporte, G. The traveling salesman problem: an overview of exact and approximate algorithms. Eur. J. Oper. Res. 59, 231–247 (1992).
5. Pevzner, P. A., Tang, H. & Waterman, M. S. An Eulerian path approach to DNA fragment assembly. Proc. Natl Acad. Sci. USA 98, 9748–9753 (2001).
6. Kaplan, D. L., Mello, S. M., Arcidiacono, S., Fossey, S. &, S. K. in Protein Based Materials (eds McGrath, K. & Kaplan, D) 103–131 (Birkhäuser, 1998).
7. Cranford, S. W. & Buehler, M. J. Biomateriomics 165 (Springer, 2012).
8. McDaniel, J. R., Mackay, J. A., Quiroz, F. G. & Chilkoti, A. Recursive directional ligation by plasmid reconstruction allows rapid and seamless cloning of oligomeric genes. Biomacromolecules 11, 944–952 (2010).
9. Anderson, D. & Maugh, K. Escherichia coli expression vector encoding bioadhesive precursor protein analogs comprising three to twenty repeats of the decapeptide (Ala-Lys-Pro-Ser-Tyr-Pro-). US Patent 5,149,657 (1992).
10. Lyons, R. E. et al. Design and facile production of recombinant resilin-like polypeptides: gene construction and a rapid protein purification method. Protein Eng. Des. Sel. 20, 25–32 (2007).
11. Su, R. S.-C., Renner, J. N. & Liu, J. C. Synthesis and characterization of recombinant abductin-based proteins. Biomacromolecules 14, 4301–4308 (2013).
12. Cappello, J., Ferrari, F. & Richardson, C. Methods for preparing synthetic repetitive DNA. US Patent 5,641,648 (1997).
13. Cappello, J. & Causey, S. Peptides comprising repetitive units of amino acids and DNA sequences encoding the same. US Patent 6,018,030 (2000).
14. Widmaier, D. M. et al. Engineering the Salmonella type III secretion system to export spider silk monomers. Mol. Syst. Biol. 5, 309 (2009).
15. Tokareva, O., Michalczechen-Lacerda, V. A., Rech, E. L. & Kaplan, D. L. Recombinant DNA production of spider silk proteins. Microbiol. Biotechnol. 6, 651–663 (2013).
16. Amiram, M., Quiroz, F., Callahan, D. & Chilkoti, A. A highly parallel method for synthesizing DNA repeats enables the discovery of ‘smart’ protein polymers. Nature Mater. 10, 141–148 (2011).
17. Ousterout, D. G. et al. Reading frame correction by targeted genome editing restores dystrophin expression in cells from Duchenne muscular dystrophy patients. Mol. Ther. 21, 1718–1726 (2013).
18. Farmer, R. S., Top, A., Argust, L. M., Liu, S. & Kiick, K. L. Evaluation of conformation and association behavior of multivalent alanine-rich polypeptides. Pharm. Res. 25, 700–708 (2008).
19. McDaniel, J. R., Radford, D. C. & Chilkoti, A. A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14, 2866–2872 (2013).
20. Shur, O. & Banta, S. Rearranging and concatenating a native RTX domain to understand sequence modularity. Protein Eng. Des. Sel. 26, 171–180 (2013).
21. Steiner, D., Forrer, P. & Plückthun, A. Efficient selection of DARPins with sub-nanomolar affinities using SRP phage display. J. Mol. Biol. 382, 1211–1227 (2008).
22. Lee, B. W. et al. Strongly binding cell-adhesive polypeptides of programmable valencies. Angew. Chem. Int. Ed. 49, 1971–1975 (2010).
23. Higashiya, S., Topilina, N. I., Ngo, S. C., Zagorevskii, D. & Welch, J. T. Design and preparation of β-sheet forming repetitive and block-copolymerized polypeptides. Biomacromolecules 8, 1487–1497 (2007).
24. Petka, W., Harden, J., McGrath, K., Wirtz, D. & Tirrell, D. Reversible hydrogels from self-assembling artificial proteins. Science 281, 389–392 (1998).
25. Davis, N. E., Ding, S., Forster, R. E., Pinkas, D. M. & Barron, A. E. Modular enzymatically crosslinked protein polymer hydrogels for in situ gelation. Biomaterials 31, 7288–7297 (2010).
26. Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: technologies and applications. Nature Methods 11, 499–507 (2014).
27. Ma, S., Saaem, I. & Tian, J. Error correction in gene synthesis technology. Trends Biotechnol. 30, 147–154 (2012).
28. Hommelsheim, C. M., Frantzeskakis, L., Huang, M. & Ülker, B. PCR amplification of repetitive DNA: a limitation to genome editing technologies and many other applications. Sci. Rep. 4, 5052 (2014).
29. O’Brien, J. P. et al. in Silk Polymers Vol. 544, 10–104 (American Chemical Society, 1993).
30. Kurihara, H., Morita, T., Shinkai, M. & Nagamune, T. Recombinant extracellular matrix-like proteins with repetitive elastin or collagen-like functional motifs. Biotechnol. Lett. 27, 665–670 (2005).
31. Goeden-Wood, N. L., Conticello, V. P., Muller, S. J. & Keasling, J. D. Improved assembly of multimeric genes for the biosynthetic production of protein polymers. Biomacromolecules 3, 874–879 (2002).
32. Elmorjani, K. et al. Synthetic genes specifying periodic polymers modelled on the repetitive domain of wheat gliadins: conception and expression. Biochem. Biophys. Res. Commun. 239, 240–246 (1997).
33. Carlson, R. The changing economics of DNA synthesis. Nature Biotech. 27, 1091–1094 (2009).
34. Gendreau, M. & Potvin, J.-Y. Handbook of Metaheuristics Vol. 146 (Springer, 2010).
35. Whitaker, W. R., Lee, H., Arkin, A. P. & Dueber, J. E. Avoidance of truncated proteins from unintended ribosome binding sites within heterologous protein coding sequences. ACS Synth. Biol. 4, 249–257 (2015).
36. Meyer, D. E., Trabbic-Carlson, K. & Chilkoti, A. Protein purification by fusion with an environmentally responsive elastin-like polypeptide: effect of polypeptide length on the purification of thioredoxin. Biotechnol. Prog. 17, 720–728 (2001).
37. Meyer, D. E. & Chilkoti, A. Genetically encoded synthesis of protein-based polymers with precisely specified molecular weight and sequence by recursive directional ligation: examples from the elastin-like polypeptide system. Biomacromolecules 3, 357–367 (2002).
38. Tuller, T., Waldman, Y. Y., Kupiec, M. & Ruppin, E. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl Acad. Sci. USA 107, 3645–3650 (2010).
39. Goodman, D. B., Church, G. M. & Kosuri, S. Causes and effects of N-terminal codon bias in bacterial genes. Science 342, 475–479 (2013).
40. Richardson, S. M., Wheelan, S. J., Yarrington, R. M. & Boeke, J. D. GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res. 16, 550–556 (2006).
41. Markham, N. & Zuker, M. in Bioinformatics (ed. Keith, J.) Vol. 453, 3–31 (Humana Press, 2008).

We thank S. Mukherjee for valuable advice on mathematical optimization, and K. Dooley for useful discussions on soluble protein expression. This work was financially supported by the NIH through grant no. GM061232 to A.C. and by the NSF through the Research Triangle MRSEC (NSF DMR-11-21107). N.C.T. was supported by an NIH Biotechnology Training Grant (T32 GM008555).

Author information




N.C.T. designed and performed experiments, developed algorithms, and prepared the manuscript. A.C. designed experiments and prepared the manuscript.

Corresponding author

Correspondence to Ashutosh Chilkoti.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Cite this article

Tang, N., Chilkoti, A. Combinatorial codon scrambling enables scalable gene synthesis and amplification of repetitive proteins. Nature Mater 15, 419–424 (2016).
