Functional optimization of gene clusters by combinatorial design and assembly

Journal name: Nature Biotechnology


Large microbial gene clusters encode useful functions, including energy utilization and natural product biosynthesis, but genetic manipulation of such systems is slow, difficult and complicated by complex regulation. We exploit the modularity of a refactored Klebsiella oxytoca nitrogen fixation (nif) gene cluster (16 genes, 103 parts) to build genetic permutations that could not be achieved by starting from the wild-type cluster. Constraint-based combinatorial design and DNA assembly are used to build libraries of radically different cluster architectures by varying part choice, gene order, gene orientation and operon occupancy. We construct 84 variants of the nifUSVWZM operon, 145 variants of the nifHDKY operon, 155 variants of the nifHDKYENJ operon and 122 variants of the complete 16-gene pathway. The performance and behavior of these variants are characterized by nitrogenase assay and strand-specific RNA sequencing (RNA-seq), and the results are incorporated into subsequent design cycles. We have produced a fully synthetic cluster that recovers 57% of wild-type activity. Our approach allows the performance of genetic parts to be quantified simultaneously in hundreds of genetic contexts. This parallelized design-build-test-learn cycle, which can access previously unattainable regions of genetic space, should provide a useful, fast tool for genetic optimization and hypothesis testing.
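The constraint-based enumeration described above can be illustrated with a minimal sketch. It is not the authors' design software (the supplementary design files are written in Eugene); the promoter labels, the dictionary layout of a design and the two example rules are hypothetical, and operon occupancy is not modeled.

```python
from itertools import permutations, product

# Genes of the nifUSVWZM operon (from the abstract).
GENES = ["nifU", "nifS", "nifV", "nifW", "nifZ", "nifM"]
# Hypothetical promoter labels, for illustration only.
PROMOTERS = ["P_weak", "P_medium", "P_strong"]


def immediately_before(a, b):
    """Rule factory (hypothetical): gene a must directly precede gene b."""
    def rule(design):
        order = design["order"]
        i = order.index(a)
        return i + 1 < len(order) and order[i + 1] == b
    return rule


def same_orientation(a, b):
    """Rule factory (hypothetical): genes a and b must share a strand."""
    def rule(design):
        return design["orientation"][a] == design["orientation"][b]
    return rule


def enumerate_designs(genes, promoters, constraints):
    """Yield every design (gene order, per-gene orientation, promoter
    choice) that satisfies all constraint functions."""
    for order in permutations(genes):
        for strands in product("+-", repeat=len(genes)):
            for promoter in promoters:
                design = {
                    "order": order,
                    "orientation": dict(zip(order, strands)),
                    "promoter": promoter,
                }
                if all(rule(design) for rule in constraints):
                    yield design


if __name__ == "__main__":
    rules = [immediately_before("nifU", "nifS"),
             same_orientation("nifU", "nifS")]
    designs = list(enumerate_designs(GENES, PROMOTERS, rules))
    print(len(designs), "designs satisfy the example rules")
```

Even this toy space holds 6! × 2^6 × 3 = 138,240 unconstrained designs, which shows why constraints are needed to reduce a library to a buildable size.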

At a glance


  1. Figure 1: Combinatorial design and construction of gene cluster libraries.

    (a) The design-build-test-learn cycle. (b) Cluster assembly steps (left) illustrate the application of different techniques at different stages of the hierarchy, and an assembly graph (right) traces the path from parts to complete clusters for the nifUSVWZM library. The complete constructs corresponding to the letter codes are provided in Supplementary Figure 3.

  2. Figure 2: Screening results for the nif cluster optimization in K. oxytoca.

    (a) The rank-ordered list of 62 members of the nifUSVWZM library, sorted by nitrogenase activity (gray) and growth (OD600 = 0.88–0.74 (dark green), 0.73–0.69 (medium green), 0.68–0.62 (light green) and 0.60–0.28 (yellow)). Details of the part sequences and functions are provided in Supplementary Notes 3 and 11. (b) Robustness of clusters to changes in T7* RNAP concentration. Top left, illustration of how inducible expression of T7* RNAP feeds forward to effect expression of the refactored gene cluster. Bottom, responses of clusters to changes in T7* RNAP concentration, grouped into increasing activity, flat and decreasing activity. Red traces and labels highlight the robustness of the most active variant (#30) and the three that exhibit fast growth and high activity (#1, #61 and #68). Top right, cluster activities plotted against their 'robustness', calculated as the area under the induction curves (Online Methods). a.u., arbitrary units. (c) The activity of each fragment of the cluster is shown in a Klebsiella strain where the genes (labeled on the x axis) are knocked out and the remainder are wild type. nifHDKY activity is shown in K. oxytoca NF9 (ΔnifHDKY) as an example. Light bars indicate the fragments before optimization (from refactored v1.0); dark bars are those after optimization. (d) The activity of the refactored clusters in K. oxytoca NF26, in which the complete cluster is knocked out (Δnif). Cluster v1.0 was improved by the substitution of the optimized nifHDKY (v1.1), both the optimized nifHDKY and USVWZM#1 (v2.0) and the addition of the optimized nifENJ (v2.1). Error bars denote s.d. from two (OD) or four (nitrogenase activity) replicates performed on different days (a–d). (e) The parts composition of the optimized cluster (v2.1). Part classes are indicated by symbols (top), part name (bottom) and function (middle). Function values denote REUs for promoter parts, arbitrary units for RBSs, and TS for terminators. Numbers at the corner of each rectangle measure the location (in bp) within the genetic construct. Part details and sequences are provided in Supplementary Note 11.

  3. Figure 3: Transcriptomic analysis of the optimized refactored clusters and nifUSVWZM library.

    (a) Strand-specific RNA-seq reads are mapped to the wild-type nif gene cluster (gray) and the refactored clusters before (v1.0, pink) and after (v2.1, blue) optimization. For the wild-type cluster, the transcription of the sense strand is shown in dark gray and the antisense strand is shown in light gray. The architecture of each gene cluster is shown and known promoters (green arrows) and terminators (red tees) are indicated. The purple lines map the genes from the wild-type to refactored clusters. (b) The ratio of the transcripts (RPKM values) for the refactored clusters compared to the wild type (v1.0, pink; v2.1, blue). x-axis labels correspond to each gene in the refactored nif gene cluster. (c) The ratios of proteins for the refactored clusters compared to wild type, as determined by global iTRAQ proteomics with directed mass spectrometry measurement (Online Methods). Asterisks mark proteins for which no ions were identified; error bars indicate the sample s.d. from two technical replicates of two biological replicates. The horizontal line marks the ratio of 1. (d) Representative RNA-seq traces for USVWZM#1 and USVWZM#32. The sequencing reads mapped to the sense and antisense strands are shown above and below the cluster diagrams, respectively. Examples are shown for the calculation of the strength of a promoter, expression level of a gene (RPKM) and terminator efficiency (Online Methods and Supplementary Note 10). (e) Expression levels of the nif genes (RPKM values) are shown for the nifUSVWZM library, organized by activity and growth rate as in Figure 2a. Red box indicates variants that are both highly active and fast growing. The expression levels of the genomic nif genes are shown in blue. The scales of the y axes for the genomic and plasmid-carried genes are shown (bottom right). (f) Expression levels in nifUSVWZM variants. Red lines indicate the high and low extremes of the expression levels for variants #1, #61 and #68. (g) The expression ratios between all pairs of nifUSVWZM genes were calculated and averaged (Online Methods). Each box represents one member of the library; the red boxes show variants #1, #61 and #68. (h) Behavior of terminator parts in the nifUSVWZM library. n, the number of instances of the part. Each trace is the normalized number of mapped RNA-seq reads, which has been aligned by the location of the part (the beginning and end of the part are shown as vertical dashed lines). Gray traces represent every instance of the part in the library; the red trace is the average across all genetic contexts. Inset, histogram of percentage termination among library terminators; red vertical line indicates mean. (i) Part function for the weak (green), medium (orange) and strong (blue) promoters in the library, labeled as in Figure 3h. Middle, histograms showing the gain of transcription for library promoters; colored lines indicate average gain. Right, data for promoter strength collected in isolation using an mRFP expression construct (Supplementary Note 3). Error bars denote s.d.

  4. Figure 4: Transfer of refactored nif clusters into E. coli MG1655.

    (a) Nitrogenase activity of synthetic clusters in E. coli MG1655 compared to the wild-type nif cluster expressed in K. oxytoca M5al. Data are shown for the original refactored cluster (v1.0; ref. 6), the optimized cluster (v2.1), the best obtained from the RBS library (RBS) and the intermediate containing the optimized USVWZM#1 in the v1.0 background (v1.2). Error bars represent s.d. of four experiments done on different days. (b) The RBS library built on the basis of v1.0 and screening data for E. coli. The most active mutant (top row) corresponds to the RBS bar in a.
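
Two quantities named in the captions above, RPKM (Figure 3) and robustness as the area under an induction curve (Figure 2b), can be sketched as follows. The authors' exact procedures are given in the Online Methods and are not reproduced here; this sketch assumes the standard RPKM definition (reads per kilobase of gene per million mapped reads) and a simple trapezoidal rule, and the function names and example numbers are hypothetical.

```python
def rpkm(reads_on_gene, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads
    (standard definition, assumed here)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return reads_on_gene * 1e9 / (gene_length_bp * total_mapped_reads)


def area_under_curve(x, y):
    """Trapezoidal-rule area under a sampled curve, e.g. activity (y)
    versus inducer level (x). Assumes x is sorted in ascending order."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(rpkm(500, 1000, 1_000_000))            # 500 reads on a 1-kb gene
    print(area_under_curve([0, 1, 2], [0, 1, 1]))
```

A larger area indicates a cluster that retains activity across more of the induction range, which is the sense in which the caption uses 'robustness'.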


  1. Czar, M.J., Anderson, J.C., Bader, J.S. & Peccoud, J. Gene synthesis demystified. Trends Biotechnol. 27, 63–72 (2009).
  2. Gibson, D.G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
  3. Kröger, J.D. et al. The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium. Proc. Natl. Acad. Sci. USA 109, E1277–E1286 (2012).
  4. Arnold, W., Rump, A., Klipp, W., Priefer, U.B. & Pühler, A. Nucleotide sequence of a 24,206-base-pair DNA fragment carrying the entire nitrogen fixation gene cluster of Klebsiella pneumoniae. J. Mol. Biol. 203, 715–738 (1988).
  5. Chan, L.Y., Kosuri, S. & Endy, D. Refactoring bacteriophage T7. Mol. Syst. Biol. 1, 2005.0018 (2005).
  6. Temme, K., Zhao, D. & Voigt, C.A. Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc. Natl. Acad. Sci. USA 109, 7085–7090 (2012).
  7. Temme, K., Hill, R., Segall-Shapiro, T.H., Moser, F. & Voigt, C.A. Modular control of multiple pathways using engineered orthogonal T7 polymerases. Nucleic Acids Res. 40, 8773–8781 (2012).
  8. Beatty, P.H. & Good, A.G. Future prospects for cereals that fix nitrogen. Science 333, 416–417 (2011).
  9. Arnold, W., Rump, A., Klipp, W., Priefer, U.B. & Pühler, A. Nucleotide sequence of a 24,206-base-pair DNA fragment carrying the entire nitrogen fixation gene cluster of Klebsiella pneumoniae. J. Mol. Biol. 203, 715–738 (1988).
  10. Eady, R.R., Issack, R., Kennedy, C., Postgate, J.R. & Ratcliffe, H.D. Nitrogenase synthesis in Klebsiella pneumoniae: comparison of ammonium and oxygen regulation. J. Gen. Microbiol. 104, 277–285 (1978).
  11. Lowe, D.J. & Thorneley, R.N.F. The mechanism of Klebsiella pneumoniae nitrogenase action: pre-steady-state kinetics of H2 formation. Biochem. J. 224, 877–886 (1984).
  12. Dixon, R., Cheng, Q., Shen, G.F., Day, A. & Dowson-Day, M. Nif gene transfer and expression in chloroplasts: prospects and problems. Plant Soil 194, 193–203 (1997).
  13. Dukes, P., Lamken, E. & Wilson, R. Workshop report: Combinatorial design theory (Banff International Research Station meeting 08w5098) (2008).
  14. Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102, 12678–12683 (2005).
  15. Beynon, J., Cannon, M., Buchanan-Wollaston, V. & Cannon, F. The nif promoters of Klebsiella pneumoniae have a characteristic primary structure. Cell 34, 665–671 (1983).
  16. Bilitchenko, L. et al. Eugene—a domain specific language for specifying and constraining synthetic biological parts, devices, and systems. PLoS ONE 6, e18882 (2011).
  17. Salis, H.M., Mirsky, E.A. & Voigt, C.A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).
  18. Davis, J.H., Rubin, A.J. & Sauer, R.T. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic Acids Res. 39, 1131–1141 (2011).
  19. Crook, N.C., Freeman, E.S. & Alper, H.S. Re-engineering multicloning sites for function and convenience. Nucleic Acids Res. 39, e92 (2011).
  20. Weber, E., Engler, C., Gruetzner, R., Werner, S. & Marillonnet, S. A modular cloning system for standardized assembly of multigene constructs. PLoS ONE 6, e16765 (2011).
  21. Wang, X. et al. Using synthetic biology to distinguish and overcome regulatory and functional barriers related to nitrogen fixation. PLoS ONE 8, e68677 (2013).
  22. Cannon, F.C., Dixon, R.A. & Postgate, J.R. Derivation and properties of F-prime factors in Escherichia coli carrying nitrogen fixation genes from Klebsiella pneumoniae. J. Gen. Microbiol. 93, 111–125 (1976).
  23. Price, M.N., Huang, K.H., Arkin, A.P. & Alm, E.J. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 15, 809–819 (2005).
  24. Lim, H.N., Lee, Y. & Hussein, R. Fundamental relationship between operon organization and gene expression. Proc. Natl. Acad. Sci. USA 108, 10626–10631 (2011).
  25. Liang, L.W., Hussein, R., Block, D.H.S. & Lim, H.N. Minimal effect of gene clustering on expression in Escherichia coli. Genetics 193, 453–465 (2013).
  26. Endy, D., You, L., Yin, J. & Molineux, I. Computation, prediction, and experimental test of fitness for bacteriophage T7 mutants with permuted genomes. Proc. Natl. Acad. Sci. USA 97, 5375–5380 (2000).
  27. von Dassow, G., Meir, E., Munro, E.M. & Odell, G.M. The segment polarity network is a robust developmental module. Nature 406, 188–192 (2000).
  28. Hamilton, T.L. et al. Transcriptional profiling of nitrogen fixation in Azotobacter vinelandii. J. Bacteriol. 193, 4477–4486 (2011).
  29. Yan, Y. et al. Global transcriptional analysis of nitrogen fixation and ammonium repression in root-associated Pseudomonas stutzeri A1501. BMC Genomics 11, 11 (2010).
  30. Poza-Carrión, C., Jiménez-Vicente, E., Navarro-Rodríguez, M., Echavarri-Erasun, C. & Rubio, L.M. Kinetics of nif gene expression in a nitrogen-fixing bacterium. J. Bacteriol. 196, 595–603 (2014).
  31. Jeng, S.T., Gardner, J.F. & Gumport, R.I. Transcription termination by bacteriophage T7 RNA polymerase at rho-independent terminators. J. Biol. Chem. 265, 3823–3830 (1990).
  32. McAllister, W.T. & Morris, C. Utilization of bacteriophage T7 late promoters in recombinant plasmids during infection. J. Mol. Biol. 153, 527–544 (1981).
  33. Cardinale, S. & Arkin, A.P. Contextualizing context for synthetic biology–identifying causes of failure of synthetic biological systems. Biotechnol. J. 7, 856–866 (2012).
  34. Dixon, R.A. & Postgate, J.R. Genetic transfer of nitrogen fixation from Klebsiella pneumoniae to Escherichia coli. Nature 237, 102–103 (1972).
  35. Dixon, R. & Cannon, F. Construction of a P plasmid carrying nitrogen fixation genes from Klebsiella pneumoniae. Nature 260, 268–271 (1976).
  36. Moser, F. et al. Genetic circuit performance under conditions relevant for industrial bioreactors. ACS Synth. Biol. 1, 555–564 (2012).
  37. Gorochowski, T.E., van den Berg, E., Kerkman, R., Roubos, J.A. & Bovenberg, R.A.L. Using synthetic biological parts and microbioreactors to explore the protein expression characteristics of Escherichia coli. ACS Synth. Biol. 3, 129–139 (2014).
  38. Plackett, R.L. & Burman, J.P. The design of optimum multifactorial experiments. Biometrika 33, 305–325 (1946).
  39. May, O., Voigt, C.A. & Arnold, F.H. in Enzyme Catalysis in Organic Synthesis: A Comprehensive Handbook 2nd edn. (eds. Drauz, K. & Waldmann, H.) Ch. 4 (Wiley-VCH Verlag, 2002).
  40. Ran, L. et al. Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular cyanobacterium. PLoS ONE 5, e11486 (2010).
  41. Stucken, K. et al. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS ONE 5, e9235 (2010).
  42. Endy, D., You, L., Yin, J. & Molineux, I.J. Computation, prediction, and experimental tests of fitness for bacteriophage T7 mutants with permuted genomes. Proc. Natl. Acad. Sci. USA 97, 5375–5380 (2000).
  43. Densmore, D., Kittleson, J.T., Bilitchenko, L., Liu, A. & Anderson, J.C. Rule based constraints for the construction of genetic devices. Proc. 2010 IEEE ISCAS, doi:10.1109/ISCAS.2010.5537540 (2010).
  44. Suh, M.H., Pulakat, L. & Gavini, N. Functional expression of the FeMo-cofactor-specific biosynthetic genes nifEN as a NifE-N fusion protein synthesizing unit in Azotobacter vinelandii. Biochem. Biophys. Res. Commun. 299, 233–240 (2002).
  45. Fischbach, M. & Voigt, C.A. Prokaryotic gene clusters: a rich toolbox for synthetic biology. Biotechnol. J. 5, 1277–1296 (2010).
  46. Stacy, G.S., Burris, R.H. & Evans, H.J. Biological Nitrogen Fixation (Chapman and Hall, 1992).
  47. Gibson, D.G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
  48. Chen, Y.J. et al. Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat. Methods 10, 659–664 (2013).
  49. Stewart, W.D., Fitzgerald, G.P. & Burris, R.H. In situ studies on nitrogen fixation with the acetylene reduction technique. Science 158, 536 (1967).
  50. Giannoukos, G. et al. Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes. Genome Biol. 13, r23 (2012).
  51. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  52. Barnett, D.W., Garrison, E.K., Quinlan, A.R., Stromberg, M.P. & Marth, G.T. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691–1692 (2011).
  53. Li, H. et al. 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).


Author information


  1. Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Michael J Smanski,
    • Dehua Zhao,
    • YongJin Park,
    • Lauren B A Woodruff,
    • Johnathan Calderon,
    • D Benjamin Gordon &
    • Christopher A Voigt
  2. Electrical and Computer Engineering Department, Boston University, Boston, Massachusetts, USA.

    • Swapnil Bhatia &
    • Douglas Densmore
  3. Broad Technology Labs, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Lauren B A Woodruff,
    • Georgia Giannoukos,
    • Dawn Ciulla,
    • Michele Busby,
    • Robert Nicol,
    • D Benjamin Gordon &
    • Christopher A Voigt


M.J.S., D.Z. and C.A.V. conceived and designed the experiments and wrote the manuscript. M.J.S. performed the nifUSVWZM, monocistronic and RBS library construction and analysis. D.D. and S.B. performed the clustering analysis, wrote the design files and analyzed data. D.Z. constructed and analyzed the nifHDKY, nifENJ and complete cluster library. Y.P., D.B.G., M.B., G.G., R.N. and D.C. performed the RNA-seq experiments and analysis. L.B.A.W. and J.C. performed experiments.

Competing financial interests

S.B. and D.D. are co-founders of Lattice Automation, Inc., a company that produces biodesign automation software.

Supplementary information

PDF files

  1. Supplementary Text and Figures (9,537 KB)

    Supplementary Figures 1–27, Supplementary Tables 1 and 2 and Supplementary Notes 1–11

Excel files

  1. Supplementary Data File 1.eug (28.0 KB)

    Eugene file for nifUSVWZM library design

  2. Supplementary Data File 2.eug (42.0 KB)

    Eugene file for newly identified rules for refactored nif cluster

  3. Supplementary Data File 3.xlsx (6,500 KB)

    Characterization data for nifUSVWZM library

  4. Supplementary Data File 4.xlsx (34 KB)

    Characterization data for nifHDKY, nifENJ, and full cluster libraries

  5. Supplementary Data File 5.xlsx (71 KB)

    Characterization data for 16-gene monocistronic library

  6. Supplementary Data File 6.xlsx (43 KB)

    Characterization data for 16-gene RBS swapping library

  7. Supplementary Data File 7.eug (27.5 KB)

    Eugene file for 16-gene monocistronic library
