A rapid and reliable strategy for chromosomal integration of gene(s) with multiple copies

Direct optimization of metabolic pathways on the chromosome requires tools that can fine-tune the overexpression of a desired gene or optimize the combination of multiple genes. Although plasmid-dependent overexpression has been used for this task, fundamental issues concerning its genetic stability and operational repeatability have not been addressed. Here, we describe a rapid and reliable strategy for chromosomal integration of gene(s) with multiple copies (CIGMC), which uses the flippase from the yeast 2-μm plasmid. Using green fluorescent protein as a model, we verified that the fluorescence intensity was in accordance with the integrated copy number of the target gene. When a narrow-host-range replicon, R6K, was used in the integrative plasmid, the maximum integrated copy number in Escherichia coli reached 15. Applying the CIGMC method to optimize the overexpression of single or multiple genes in amino acid biosynthesis, we successfully improved the product yield and the stability of production. As a flexible strategy, CIGMC can be used in various microorganisms other than E. coli.

the numerous recombination steps. To reduce the operation time and steps, St-Pierre et al. 21 recently developed a novel method named "clonetegration" to integrate DNA into prokaryotic chromosomes.
Although several chromosomal integration strategies have been extensively reported, little attention has been paid to the integration of gene(s) with multiple copies. The only typical example in the literature is a chemically inducible chromosomal evolution (CIChE) system 22. Through multiple rounds of evolution with increased antibiotic concentrations, an engineered strain with 40-60 copies of target genes on the chromosome was obtained via recA-dependent homologous recombination. Recently, flippase (FLP)-dependent recombination from the yeast 2-μm plasmid has attracted our attention 23. The only requirements for FLP-dependent recombination are an FLP recombinase and a 34-bp FLP recombination target (FRT) site. The FLP/FRT recombination system has mainly been used to excise selective markers in the Red recombination system 24. However, because FLP recombinase catalyzes a reversible reaction, it could also be used to integrate DNA into the chromosome. Thus, in the present study, we developed a strategy for the chromosomal integration of gene(s) with multiple copies (CIGMC) based on FLP recombinase. With this method, multiple copies of a single gene or multiple genes can be stably integrated into the chromosome in one step.

Results
Development of the CIGMC system based on FLP/FRT recombination. The CIGMC system was developed based on FLP recombinase, which can catalyze recombination between an FRT-containing circular DNA and an FRT-containing linear DNA (or genomic DNA). Thus, the target strain in this strategy should possess at least one FRT site on the chromosome, which can be easily introduced by the Tn5 transposon 25. In this study, GPT101, a previously constructed L-tryptophan-producing strain containing four FRT sites on the chromosome, was selected 26. The recA gene, which encodes the DNA strand exchange and recombination protein, was then deleted in GPT101 to prevent possible homologous recombination. This deletion generated an additional FRT site. Therefore, the final target strain, named GPF-5, contains five FRT sites on the chromosome. The construction outline of CIGMC is presented in Fig. 1.
The integrative plasmid pG-1 was constructed by ligating the FRT site, gfp, and kan in vitro. To avoid the replication of this plasmid in the target strain during recombination, which may result in false-positive clones, an integrative vector without a replicon was initially used. The constructed integrative plasmid pG-1 was then electroporated into the host strain GPF-5, and the integrants were visually screened on plates and then fluorescently analyzed in microtiter plates. The relative fluorescence units (RFU) per OD600 of the integrants spanned from 114.3 to 495.0 (Fig. 2a). This observation demonstrated that pG-1 was successfully integrated into the chromosome at various copy numbers. Using quantitative polymerase chain reaction (qPCR), we showed that the integrated copy number ranged from 1 to 8, with an average of 2.36 (Fig. 2b). Of the 150 strains tested, 63 had only one copy of pG-1 integrated, and only two strains had eight copies integrated (Fig. 2c). This is possibly due to the relatively low final concentration (2.5 ng/μL) of the integrative plasmid pG-1.
Figure 1 | Outline of the chromosomal integration of gene(s) with multiple copies (CIGMC) system in Escherichia coli. recA, encoding the DNA strand exchange and recombination protein, was previously deleted to prevent subsequent homologous recombination that could reduce the integrated copy number. kan, kanamycin resistance gene; R6K, narrow-host-range replicon; gfp, green fluorescent protein; FRT, flippase recombination target.
To improve the final concentration of the integrative plasmid, a narrow-host-range replicon, R6K, was inserted into plasmid pG-1, generating plasmid pG-2. Because replicon R6K is strictly dependent on the π protein encoded by the pir gene, the integrative plasmid pG-2 can only replicate in strains carrying pir, for instance, E. coli BW25141 27,28. Thus, a high concentration of pG-2 can be isolated for integration from E. coli BW25141 containing pG-2. Using pG-2, the maximal fluorescence intensity of the integrants increased from 495.0 to 675.9 RFU/OD600, which is equal to a copy number of 12 (Fig. 3). The average copy number also increased from 2.36 to 3.75, suggesting a positive correlation between the integrative plasmid concentration and the copy number of the integrated gene(s) on the chromosome. To obtain a strain containing a high copy number of target genes, a high concentration of integrative plasmid appears to be a prerequisite.
Factors that affect the integrated copy number of CIGMC. The relationship between the integrated copy number and the final concentration of integrative plasmid was analyzed (Table 1), and a positive correlation between them was identified. The maximum integrated copy number reached 15 and the average integrated copy number was 5.41 when 40 ng/μL of pG-2 was applied. Further increases in plasmid concentration did not improve the integrated copy number, indicating that integration reaches saturation. For general operation, 30 ng/μL of integrative plasmid is sufficient for CIGMC. Additionally, an increased number of FRT sites on the chromosome was predicted to improve CIGMC efficiency because of the increased availability of integration sites. To test this, we selected five strains with one to five FRT sites, respectively, as CIGMC targets. As shown in Table 2, an average integrated copy number of 2.28 and a maximum integrated copy number of five were obtained for the strain with only one FRT site. Increasing the number of FRT sites on the chromosome from one to five in single increments raised the maximum integrated copy number from 5 to 7, 9, 10, and 12, while the average integrated copy number increased from 2.28 to 2.59, 2.81, 3.29, and 3.75, respectively. This result demonstrated that increasing the number of FRT sites on the chromosome is beneficial to CIGMC and that there is an approximately linear correlation between the number of FRT sites and the average integrated copy number.
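The approximately linear trend between FRT site number and average integrated copy number can be checked with a simple least-squares fit. The Python sketch below uses only the five average copy numbers quoted in the text; the helper name `linear_fit` is illustrative and is not part of the original analysis.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns slope, intercept, Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Number of chromosomal FRT sites and the average integrated copy numbers from the text
frt_sites = [1, 2, 3, 4, 5]
avg_copies = [2.28, 2.59, 2.81, 3.29, 3.75]

slope, intercept, r = linear_fit(frt_sites, avg_copies)
print(f"slope = {slope:.3f} copies per FRT site, intercept = {intercept:.3f}, r = {r:.3f}")
```

With these values, each additional FRT site adds roughly 0.36 to the average integrated copy number, and the correlation coefficient is close to 1, which is consistent with the approximately linear relationship described.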
Application of CIGMC to optimize gene overexpression. Because CIGMC generates a library of integrants with various copy numbers of target gene(s), this method is well suited to optimizing the overexpression of a key gene in a metabolic pathway. A tryptophan-producing strain, GPT1002, was constructed previously by our group 26. Three genes, tktA, aroG FR, and trpE FR, were overexpressed in this strain. Another enzyme, shikimate kinase, encoded by aroK, has been shown to have a pivotal function in aromatic amino acid biosynthesis 29. Optimized overexpression of aroK was expected to improve L-tryptophan production. However, plasmid-based overexpression of aroK did not improve production. Therefore, we attempted to optimize the overexpression of aroK with CIGMC. The integrated copy number of aroK was similar to that obtained using GFP as a CIGMC target (Fig. 4a and Supplementary Fig. S1). All CIGMC strains grew similarly to the control, as indicated by the maximum OD600 achieved after 24 h of batch cultivation. Single-copy integration of aroK on the chromosome improved L-tryptophan production per OD600 from 0.159 to 0.214 g/L. Two integrated copies of aroK further improved L-tryptophan production to 0.298 g/L per OD600, which was 87.4% higher than that of the initial strain. However, L-tryptophan production dropped sharply when the integrated copy number increased further. This demonstrates that two integrated copies of aroK are optimal and explains why plasmid-based overexpression was unsuccessful. Together, these findings suggest that CIGMC provides an optimization strategy to directly regulate gene overexpression on the chromosome, especially when single-copy integration with a strong promoter and ribosome-binding site (RBS) is not sufficient.
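The relative improvements quoted above follow directly from the per-OD600 titers. A minimal Python sketch of that arithmetic, using only the three values given in the text (the function name `percent_increase` is illustrative):

```python
def percent_increase(baseline, value):
    """Relative change of `value` over `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0

# L-tryptophan production per OD600 (g/L), as reported in the text
control = 0.159      # initial strain, no integrated aroK copy
one_copy = 0.214     # single integrated copy of aroK
two_copies = 0.298   # two integrated copies of aroK

gain_one = percent_increase(control, one_copy)
gain_two = percent_increase(control, two_copies)
print(f"one copy: +{gain_one:.1f}%, two copies: +{gain_two:.1f}%")
```

The two-copy value reproduces the 87.4% increase stated in the text; the one-copy figure (about 35%) is derived here from the same reported titers.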
Application of CIGMC to optimize the metabolic pathway. In a previous study, we constructed a recombinant E. coli strain that can produce 0.38 g/L L-serine per OD600 in batch cultivation by overexpressing serA FR, serB, and serC, encoding deregulated 3-phosphoglycerate dehydrogenase, phosphoserine phosphatase, and phosphoserine aminotransferase, respectively, in a medium-copy plasmid 30. However, L-serine production was unstable because of plasmid instability. To test whether CIGMC could provide an alternative overexpression strategy, serA FR, serB, and serC were combined as an artificial operon in recombinant plasmid pG-7 (Supplementary Fig. S2) and were integrated into the chromosome together. Direct integration of this operon into the chromosome by CIGMC generated integrants with a relatively low integrated copy number, and the maximum L-serine production was only 0.183 g/L per OD600 (Supplementary Fig. S3). This indicated that either the operon was too large or the three genes needed to be expressed at different levels. Thus, we designed an independent integration strategy, wherein all three genes were simultaneously integrated with separate integrative plasmids pG-4, pG-5, and pG-6. In this case, serA FR, serB, and serC were integrated into the host at different copy numbers. As shown in Fig. 4b-4d, a strain library with a combination of different copies of the three genes was generated in one step, and the optimal L-serine-producing strain could then be screened from the library.
The fermentation results showed that strains Y-7, Y-22, and Y-23 exhibited higher L-serine production than the control strain GPF-11, which was constructed by the plasmid-based overexpression of serA FR, serB, and serC (Fig. 4e and Supplementary Fig. S4). Strain Y-7 produced 0.425 g/L/OD600 L-serine, which was the highest among the 40 strains tested. qPCR analysis indicated that it contained 10 copies of serA FR, four of serB, and four of serC. Another strain, containing two copies of serA FR, two of serB, and three of serC, produced only 0.132 g/L/OD600 L-serine. Analysis of all integrants indicated that strains with high copy numbers of serA FR produced more L-serine, suggesting that serA FR is crucial for E. coli L-serine production. Additionally, different CIGMC strains exhibited similar maximum OD600 values, indicating that little additional metabolic burden was generated as the copy number of target genes increased on the chromosome (Supplementary Fig. S4). We next tested the stability of the engineered strain and demonstrated that the genetic constructs and L-serine production of Y-7 were stable after 10 rounds of subculture without antibiotics (Fig. 4f). By contrast, plasmid pYF-1 in GPF-11 was completely lost after six rounds of subculture without antibiotics, leading to a sharp decrease in L-serine production (Supplementary Fig. S5).
In this study, we first used quantitative reverse transcription PCR (qRT-PCR) to verify the copy number of target genes (Supplementary Fig. S6-S8). However, the instability of RNA may influence the accuracy of the results. Thus, we then chose qPCR instead, directly using genomic DNA as the template (Figs. 2-4). Although most results were similar, some differences existed between the copy numbers obtained by these two methods, indicating that qPCR was a better choice for determining the copy number of target gene(s) in CIGMC.

Discussion
In this study, a CIGMC strategy based on FLP/FRT site-specific recombination was developed. Using this strategy, we successfully integrated 15 copies of a single gene or 18 total copies of three genes into the chromosome of E. coli in one step; the only requirements were a target strain with recA deletion and FRT sites on the chromosome.
The initially constructed integrative plasmid did not contain a replicon. Thus, the concentration of donor plasmid used for integration was low. To increase its concentration, we introduced a narrow-host-range replicon, R6K, which can replicate in strains harboring the π protein. This approach significantly increased the concentration of integrative plasmid and the integration efficiency. Yu et al. 31 also reported that the targeting efficiency increased in a near-linear relationship with increasing concentrations of donor DNA from 0-300 ng in the λ recombination system of E. coli. FLP recombinase can catalyze excision, inversion, integration, or translocation in response to different substrates. FLP/FRT recombination consists of four main steps: DNA binding, synapsis, recombination, and dissociation 32. The in vitro study of Ringrose et al. 33 demonstrated that the conversion rate of the FLP-bound substrate DNA into the excised synaptic complex (k34) is 2.45-fold higher than that of the reverse process (k-34). Thus, they suggested that the preferred direction of FLP was excision. However, in our study, a high integrative plasmid concentration was used. When the final concentration of pG-2 was 20 ng/μL, each recipient cell would receive at least $1.34 \times 10^{4}$ copies of the integrative plasmid. [Because there were approximately $5 \times 10^{5}$ E. coli cells in 1 μL of competent cell solution, the relative molecular mass of the integrative plasmid pG-2 was $1.85 \times 10^{6}$, and the Avogadro constant is $6.02 \times 10^{23}$, each recipient competent cell would receive $20 \times 10^{-9} \times 6.02 \times 10^{23} / (1.85 \times 10^{6} \times 5 \times 10^{5}) = 1.34 \times 10^{4}$ copies of pG-2 when 20 ng of pG-2 was applied.] Even if only part of the pG-2 was electroporated into the competent cells, the in vivo pG-2 concentration should be much higher than that of the chromosomal DNA, thus guaranteeing the occurrence of integration.
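The plasmid-per-cell estimate in the bracketed calculation can be reproduced in a few lines. The Python sketch below plugs in the inputs exactly as stated in the text (20 ng of pG-2, a relative molecular mass of 1.85 × 10^6, about 5 × 10^5 recipient cells, and the Avogadro constant rounded to 6.02 × 10^23); the function name `plasmid_copies_per_cell` is illustrative only.

```python
AVOGADRO = 6.02e23  # molecules per mole, rounded as in the text

def plasmid_copies_per_cell(mass_ng, molar_mass_g_per_mol, cell_count):
    """Average number of plasmid molecules available per recipient cell."""
    moles = mass_ng * 1e-9 / molar_mass_g_per_mol   # ng -> g -> mol
    molecules = moles * AVOGADRO
    return molecules / cell_count

copies = plasmid_copies_per_cell(
    mass_ng=20,                  # pG-2 applied
    molar_mass_g_per_mol=1.85e6, # relative molecular mass of pG-2
    cell_count=5e5,              # competent cells receiving the plasmid
)
print(f"{copies:.2e} copies of pG-2 per cell")
```

With the inputs as printed, the estimate comes out at roughly 1.3 × 10^4 copies per cell, in line with the reported value of 1.34 × 10^4; the small difference presumably reflects rounding of the inputs.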
Increasing the number of FRT sites on the chromosome also improved the integration efficiency and the average copy number (Table 2). This result is similar to the integration at the attB/attP site catalyzed by ΦC31 integrase 34. We deduced that the increased number of FRT sites may improve the binding opportunity of FLP recombinase.
FLP/FRT recombination was previously used to integrate exogenous DNA at defined locations in E. coli 25,35. However, this approach was not investigated further, probably because FLP/FRT site-specific recombination has no obvious advantages over other recombination strategies. On the other hand, there is a pressing need to optimize the overexpression of gene(s) directly on the chromosome because of the lack of chromosome integration tools in this field. The common optimization strategies for gene overexpression include constructing plasmids with different copy numbers or with promoters 36-38, ribosome-binding sites 39,40, or terminators 41 of different strengths. Strains that contain various plasmids or plasmid libraries can then be screened with respect to their phenotypes. Libraries of promoters, ribosome-binding sites, or terminators can also be generated on the chromosome using novel technologies such as multiplex automated genome engineering (MAGE) 42,43. However, the expression level of the target gene cannot exceed the maximum strength of the promoters, which is normally too weak for overexpression. Moreover, the screening of such libraries is both tedious and difficult.
The CIGMC method provides a strategy to directly optimize the overexpression of a single gene or multiple genes in a metabolic pathway on the chromosome, with several advantages. First, the direct integration of genes is more stable and reliable than plasmid-based overexpression. A previous study demonstrated that a recombinant strain harboring a medium-copy plasmid completely lost the ability to produce polyhydroxybutyrate (PHB) after 35 generations even in the presence of antibiotics, whereas the PHB accumulation of the recombinant strain derived from chromosome integration was stable without antibiotics for a long period 22. In addition, as shown in Supplementary Figure S1 and Supplementary Figure S4, different CIGMC strains exhibited similar maximum OD600 values, indicating that little additional metabolic burden was generated as the copy number of target genes on the chromosome increased. Second, a similar effect can be achieved with fewer copies of genes on the chromosome than with plasmid-based overexpression. Among our constructed amino acid-producing strains, strain Y-7, with an average integrated copy number of six per gene, could produce more L-serine than the recombinant strain that harbors a plasmid of medium copy number (15-20 copies). In some cases, even when plasmids with tandem promoters were employed, expression was normally too weak for overexpression 44, but the integration of target genes into the chromosome can overcome this problem. Finally, CIGMC can be used to integrate several genes simultaneously and to generate a gene integration library in which every targeted gene has a random copy number on the chromosome. From this library, an optimized overexpression combination of several genes in the metabolic pathway can be obtained after screening. In our experiment, strain Y-7, which had 10 copies of serA FR, four of serB, and four of serC, exhibited the best combination of the three genes and produced the most L-serine (0.425 g/L/OD600).
As an effective genome integration method, CIChE also exhibits several advantages for chromosomal integration. The maximum integrated copy number achieved by CIChE was 40-60, which is substantially higher than that achieved by CIGMC. Moreover, CIChE could integrate large operons at high copy numbers, while only three copies of the serA FR-serB-serC operon were integrated into the chromosome by CIGMC. However, CIGMC can generate a strain library with combinations of different copy numbers of multiple genes in one step. By using 96-well plates and a high-throughput screening method, integrants with the desired phenotype can be selected in 3-4 days. By contrast, at least several weeks were needed for CIChE because of the chromosomal evolution process. Therefore, CIChE is suited to integrating ≥ 15 copies of target gene(s), while CIGMC is a better choice for the quick integration of 1-15 copies of target gene(s) and for exploring the optimal integrated copy number of a single gene or multiple genes.
In summary, CIGMC directly generates a library of integrants with various copies of different genes integrated on the chromosome, which can be used to optimize the overexpression of gene(s). For single-gene integration together with GFP, fluorescence-activated cell sorting would be a good choice. To screen candidate integrants with a combination of multiple genes, online monitored microtiter plates are necessary. Because FRT/FLP recombination can function well in bacteria 45 and fungi 46 , CIGMC is a flexible strategy that can be applied in other microorganisms.

Methods
Bacterial strains. All strains used in this study are listed in Supplementary Table S1. E. coli strains DH5α and BW25141 were used as the hosts for recombinant DNA manipulation.
Plasmid construction. The plasmids and oligonucleotides used in this study are listed in Supplementary Tables S2 and S3, respectively. To construct plasmid pG-1, the FRT-kan-trc-gfp module was amplified using primers p-F/p-R and template plasmid pLYK. This DNA fragment was then digested with XhoI and self-ligated using T4 DNA ligase. To construct integrative plasmid pG-2, the FRT-kan-trc-gfp module and narrow-host-range replicon R6K were amplified using primers KG-F/KG-R and R6K-F/R6K-R, with pG-1 and pKD4 as templates, respectively. These two fragments were then assembled and cyclized with the one-step sequence- and ligation-independent cloning (SLIC) method 47.
To integrate random copies of aroK into the genome of GPF-5, three DNA fragments, R6K-FRT-kan, lac promoter, and aroK, were generated using OFK-F/OFK-R, lac-F/lac-R, and aroK-F/aroK-R as the primers and plasmid pG-2, pCL1920, and the genome of wild-type E. coli W3110 as the templates, respectively. Next, these three fragments with 30-40 homologous bases were assembled and cyclized by the SLIC method to generate pG-3. Plasmid pG-4 was obtained in the same way by assembling and cyclizing three fragments: R6K-FRT-kan, the lac promoter, and serA FR .
To obtain integrative plasmid pG-5, two template plasmids, pCLB and pCLC, were initially constructed. Two genes involved in the L-serine biosynthesis pathway, serB and serC, were amplified by primers serB-F/serB-R and serC-F/serC-R, respectively, with the genome of wild-type E. coli W3110 as the template. serB and serC were each double-digested with HindIII and PstI, and pCLB and pCLC were generated by ligating these two DNA fragments into pCL1920, respectively. Afterwards, four DNA fragments, R6K-FRT, trc promoter, aadA1 gene, and lac-serB, were amplified by primers OF-F/OF-R, trc-F/trc-R, aadA1-F/aadA1-R, and LSB-F/LSB-R, with template plasmids pG-1, pLYK, pCL1920, and pCLB, respectively. These four fragments with 30-40 homologous bases were assembled using the SLIC method to obtain pG-5. Plasmid pG-6 was constructed by the same method except that fragments aadA1 and lac-serB were replaced by tetA and lac-serC, respectively. To construct plasmid pG-7, three DNA fragments, FRT-kan, R6K, and trc-serA FR-serB-serC, were amplified using primers G-1F/G-1R, G-2F/G-2R, and G-3F/G-3R, with template plasmids pG-1, pG-2, and pYF-1, respectively. These three fragments were then assembled and self-cyclized using the SLIC method.
Gene deletion and insertion. To prevent subsequent homologous recombination, which could reduce the copy number of integrated gene(s), recA, encoding a DNA strand exchange and recombination protein with protease and nuclease activity, was deleted in GPT101 using the one-step inactivation method 24. Primers recA-F/recA-R and template plasmid pKD4 were used to obtain the linearized DNA flanked by homologous sequences. Electroporation was then conducted according to the manufacturer's instructions. Positive clones on the plates were verified by PCR using the primers recA-TF/recA-TR, and the kanamycin cassette was removed by the helper plasmid pCP20 to obtain strain GPF-5. In addition, recA was also deleted in wild-type W3110, GPT98, GPT99, GPT100, and the L-serine-producing strain YF-6 by the same method. To verify the integrated copy number of candidate strains, three control strains with only one copy of the relevant resistance gene were constructed. Primers kan1-F/kan1-R, spc1-F/spc1-R, and tet1-F/tet1-R, as well as template plasmids pKD4, pG-5, and pG-6, were used to obtain the linearized DNA of kan, aadA1, and tetA for recombination, respectively. Next, each of the three fragments separately replaced the recA gene of YF-6. After electroporation and overnight cultivation, positive clones on the plates were verified by PCR using the primers recA-TF/recA-TR. The control strains GPF-7, GPF-8, and GPF-9 were thus obtained.
Integration of gene(s) by CIGMC. To integrate gene(s) with multiple copies, plasmid pCP20 overexpressing the FLP recombinase was first transformed into the target strain. A single clone of this target strain was pre-cultivated in 5 mL Luria-Bertani medium (1% tryptone, 0.5% yeast extract, and 1% NaCl) at 30°C on a rotary shaker at 250 rpm overnight. Afterwards, 1 mL of the overnight culture was inoculated into 50 mL SOB medium (2% tryptone, 0.5% yeast extract, 0.05% NaCl, 2.5 mM KCl, and 10 mM MgCl2) and cultivated at 30°C to an OD600 of 0.5. Strains were then cultivated at 42°C for 20-30 min to induce the expression of FLP recombinase. Integrative plasmid was then electroporated into the host cells, and 1 mL SOC medium (2% tryptone, 0.5% yeast extract, 0.05% NaCl, 2.5 mM KCl, 10 mM MgCl2, and 20 mM glucose) was added to the shocked cells, which were incubated for 1 h at 42°C. Strains with the relevant antibiotic-resistant phenotype were selected on plates. The integrated copy number was roughly estimated using the fluorescence method and accurately determined by qPCR.
To investigate the influence of the FRT site number on the efficiency of CIGMC, GPF-1, GPF-2, GPF-3, GPF-4, and GPF-5, containing one, two, three, four, and five FRT sites on their chromosomes, respectively, were used in the CIGMC reaction. To explore the influence of integrative plasmid concentration on the CIGMC efficiency, integrative plasmid pG-2 was used at different final concentrations ranging from 5 ng/μL to 50 ng/μL. The final concentration of integrative plasmid in CIGMC was defined as the mass of supplemented integrative plasmid per unit volume of competent cells.
Process of strain construction using CIGMC. An L-tryptophan-producing strain was constructed by introducing the integrative plasmid pG-3 containing the FRT-kan-lac-aroK module into the genome of GPF-5. The copy number of aroK was determined using qPCR with GPF-10 containing one copy of kan as a control. L-tryptophan production was then evaluated using the integrants with different copies of aroK and plasmid pTAT containing the L-tryptophan synthesis genes.
An L-serine-producing strain was constructed by introducing integrative plasmid pG-7 containing the serA FR-serB-serC module into the genome of GPF-6. The integrated copy number was determined using qPCR. To realize an independent integration of genes serA FR, serB, and serC, the three integrative plasmids pG-4, pG-5, and pG-6 were constructed and simultaneously integrated into the chromosome of GPF-6 by CIGMC. GPF-7, GPF-8, and GPF-9 were selected as control strains to determine the integrated copy number of serA FR, serB, and serC, respectively. The L-serine production of the integrants with different combinations of serA FR, serB, and serC was then evaluated.
qPCR analysis. The integrated copy number of target gene(s) was detected by qPCR on genomic DNA isolated from the candidate strains using the TIANamp Bacteria DNA Kit (Tiangen Biotech, Beijing, China). qPCR was performed with SYBR Premix Ex TaqII (Takara Bio, Dalian, China) following the protocol of the LightCycler 480 RT-PCR System (Roche, Basel, Switzerland). Primers used in qPCR are listed in Supplementary Table S3.
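The text does not spell out how qPCR signals were converted into copy numbers. One common approach, shown here purely as an assumed illustration, is relative quantification against a single-copy control strain (the 2^-ΔΔCt method), normalizing the resistance-marker signal to a single-copy chromosomal reference gene. The function name, the use of a reference gene, the assumed perfect amplification efficiency, and all Ct values below are hypothetical and are not taken from the study.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control,
                         efficiency=2.0):
    """Copy number of the target in a sample relative to a single-copy control.

    Assumes (hypothetically) that target and reference amplify with the same
    per-cycle efficiency; 2.0 corresponds to perfect doubling each cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize sample to reference gene
    delta_control = ct_target_control - ct_ref_control   # same for the single-copy control
    delta_delta = delta_sample - delta_control
    return efficiency ** (-delta_delta)

# Hypothetical Ct values for illustration only: the target crosses threshold
# two cycles earlier in the integrant than in the single-copy control.
estimate = relative_copy_number(
    ct_target_sample=18.0, ct_ref_sample=20.0,
    ct_target_control=20.0, ct_ref_control=20.0,
)
print(f"estimated integrated copy number: {estimate:.1f}")
```

Under these assumptions, a two-cycle shift corresponds to about four integrated copies. In practice the estimate would be rounded to the nearest integer and the amplification efficiency checked with a standard curve.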
Analytical methods. Cell growth was monitored at OD600 with a spectrophotometer (Shimadzu, Kyoto, Japan). L-tryptophan was determined using the fluorometric method 49. L-serine was quantitatively analyzed using high-performance liquid chromatography (Shimadzu) equipped with a Venusil AA column (250 mm × 4.6 mm, Agela Technologies, USA). The fluorescence of integrants was determined in 96-well microtiter plates as described previously 44.