Benefits of using genomic insulators flanking transgenes to increase expression and avoid positional effects

For more than 20 years, plant biologists have tried to achieve complete control of transgene expression. Until techniques to target transgenes to safe harbor sites in the genome become routine, flanking transgenes with genetic insulators, DNA sequences that create independent domains of gene expression, can help avoid positional effects and stabilize their expression. We have, for the first time, compared the effect of three insulator sequences previously described in the literature and one never tested before. Our results indicate that their use increases transgene expression, but only the last one reduces variability between lines and between individuals. We have analyzed the integration of insulator-flanked T-DNAs using whole genome re-sequencing (to our knowledge, also for the first time) and found data suggesting that chiMARs can shelter transgene insertions from neighboring repressive epigenetic states. Finally, we also observed a loss of accuracy of the RB insertion in the lines harboring insulators, evidenced by a high frequency of T-DNA truncation and of vector backbone insertion that, however, did not affect transgene expression. Our data support the view that the effect of each genetic insulator is different and that their use in transgenic constructs should depend on the needs of each specific experiment.

www.nature.com/scientificreports www.nature.com/scientificreports/ driven by the nopaline synthase (pNOS) promoter, also in tobacco 8 . These results could, however, not be replicated in Arabidopsis thaliana first generation plants, where the chiMARs was found to have no influence on the level or variability of expression of transgenes driven by the 35S promoter 9 . In fact, later studies applying different transformation methods reported no boost effect on transgene expression of Arabidopsis wild type plants 10 , but an increase in silencing mutant backgrounds 3 .
In 1996 11 , it was shown that stably transformed tobacco cell lines in which a GUS reporter gene was flanked by the tobacco MAR isolated from a genomic clone containing a root specific gene (Rb7) 12 produced more than 140 times more GUS enzyme activity than control transformants without it. However, the use of Rb7 did not reduce variation between different transformants.
The increase in transgene expression caused by the Rb7 MAR in tobacco cell lines was also reported in a 2003 study 13 that analyzed in depth how the results depended on the promoter used. The authors reported that highly active promoters exhibited significant increases in GUS activity in constructs flanked by Rb7 compared to controls, but that its presence did not significantly increase GUS activity when expression was driven by weak promoters. Importantly, most transgenes flanked by the insulator showed a large reduction in the number of low-expressing GUS transformants, suggesting that MARs can reduce the frequency of gene silencing.
Following that line, the effects of Rb7 were tested in conjunction with regulated transcription using a doxycycline-inducible luciferase transgene in tobacco cell cultures 14 . The Rb7 lines showed higher reporter gene expression levels and, in the absence of active transcription, avoided the onset of silencing caused by the spreading of condensed chromatin.
Another well characterized genetic insulator, defined initially by its ability to block interactions between enhancers and promoters when positioned between them, is the petunia transformation boost sequence (TBS) 15 . This sequence has been shown to function in Arabidopsis and tobacco plants, and a detailed analysis of the motifs it contains showed that several specific regions are required for maximum enhancer-blocking function 16 .
It was only a few years ago that another work showed that the TBS could similarly function in synthetic constructs, sheltering transgene promoters from the regulatory elements of the host plant genome. The TBS sequence was found to produce enhanced transgene expression in tobacco plants, but did not prevent gene silencing in transformants with multiple and rearranged gene copies 17 .
It has been almost 25 years since the description of the first insulators and new examples are still being discovered nowadays 18 , but their use is not common practice in plant genetic engineering. This is in part due to the trouble that cloning them through traditional methods entails, and because the reports on their effects are scattered over different systems, organisms and transformation methods that do not allow for a clear comparison between them.
Targeting transgenes to a specific integration site in the plant genome might reduce chromosomal position effects, but until there are routine, efficient techniques for directed gene targeting in plants, an alternative method is needed for that purpose.
With the advent of modular cloning techniques that allow rapid and straightforward generation of multigene constructs, the incorporation of genetic insulators at the flanks of T-DNAs is no longer a problem. Therefore, we decided to perform a systematic and parallel study comparing the activity and effectiveness of different boundary elements flanking transgenes as a strategy in T-DNA design to maximize and stabilize transgene expression. We have, moreover, used whole genome re-sequencing for the molecular characterization of the insertion of insulator-flanked T-DNAs, finding interesting results that point to previously unknown functions of the barrier sequences.
Whole Genome Re-sequencing. Isolation of Arabidopsis genomic DNA was performed using a DNeasy Plant Mini Kit (Qiagen). Samples were sent to Novogene Co., Ltd. for library construction and sequencing. There, genomic DNA of each sample was randomly sheared into short fragments of about 350 bp. These fragments were subjected to library construction using the Illumina TruSeq Library Construction Kit, strictly following the manufacturer's instructions. Following end-repair, dA-tailing and ligation with Illumina adapters, the required fragments (between 300 bp and 500 bp) were selected by PCR and amplified. After gel electrophoresis and subsequent purification, the required fragments were obtained for library construction.
Quality control of the constructed libraries was performed afterwards. A Qubit 2.0 fluorometer (Life Technologies) was used to determine the concentration of the DNA libraries. After that, the libraries were diluted to 1 ng/µl and an Agilent 2100 Bioanalyzer was used to assess the insert size. Finally, a quantitative real-time PCR (qPCR) was performed to determine the effective concentration of each library. Paired-end sequencing was performed on the Illumina platform, with a read length of 150 bp at each end.
Bisulfite conversion and sequencing. Genomic DNA of 12-day-old plants of line chiMARs 6.13 was extracted using a DNeasy Plant Mini Kit (Qiagen). Bisulfite treatment was done using the EZ DNA Methylation Gold kit (Zymo Research) following the manufacturer's instructions. Amplification from converted DNA was performed with the NXT Taq PCR kit (EURx) using primers 642 (AATTTCCCGGACGTAGCGTA) and 635 (ATCCAAGCTTTCAAGCCACAC). PCR fragments were checked on a 1% agarose gel for size verification. 4 µl of PCR product was cloned into pGEM-T Easy (Promega) and transformed into chemically competent E. coli DH5α cells. Nine clones were selected for the analysis. Plasmid DNA of each clone was sent for sequencing (GATC), and results were checked using Geneious version 10.2.2 software 24 . Comparison of the converted clones to the original unconverted sequences was done using CyMate software 25 , to count the converted/unconverted cytosines at each site. Percentage of DNA methylation was calculated as (number of methylated C residues in each context (CG, CHG or CHH)/total number of C residues in that context) * 100.
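The percentage calculation described above can be illustrated with a minimal Python sketch. It is not the CyMate implementation. It assumes that the clone reads are already aligned to the unconverted reference without gaps, that only top-strand cytosines are scored, and that a C read at a reference C means "unconverted, hence methylated" while a T means "converted, hence unmethylated". The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def cytosine_context(seq, i):
    """Classify the cytosine at index i as CG, CHG or CHH (H = A, C or T).

    Returns None when too few downstream bases exist to decide the context.
    """
    downstream = seq[i + 1:i + 3].upper()
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_percent(reference, clones):
    """Percent methylation per context, pooled over all clones.

    Assumption: every clone is the same length as the reference and is
    aligned to it position by position (no gaps).
    """
    methylated = Counter()
    total = Counter()
    ref = reference.upper()
    for i, base in enumerate(ref):
        if base != "C":
            continue
        context = cytosine_context(ref, i)
        if context is None:
            continue
        for clone in clones:
            read = clone[i].upper()
            if read not in ("C", "T"):
                continue  # ambiguous or mismatched call: not counted
            total[context] += 1
            if read == "C":  # unconverted cytosine = methylated
                methylated[context] += 1
    return {ctx: 100.0 * methylated[ctx] / total[ctx] for ctx in total}


# Toy data (hypothetical): one CG, one CHG and one CHH site, two clones.
example_reference = "ACGTCAGTCATT"
example_clones = ["ACGTCAGTTATT", "ACGTTAGTTATT"]
example_result = methylation_percent(example_reference, example_clones)
```

With the toy data, both clones keep the CG cytosine, one of the two keeps the CHG cytosine, and neither keeps the CHH cytosine, so the per-context values follow directly from the formula in the text.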

Results
Since the advent of plant genetic transformation, plant biologists have tried to maximize transgene expression level and minimize variability by flanking transgenes with genetic insulators. There are numerous studies that describe the use of a certain insulator sequence in a host organism and analyze different aspects of its barrier and enhancer-blocking ability, but they were performed under such diverse conditions that they do not allow for comparison, and their results are sometimes contradictory. Our work consists of the use of four different insulator sequences flanking a LUC transgene, with the aim of conducting a parallel and systematic analysis of their effect on transgene integration, expression level and variance in Arabidopsis seedlings.
Taking advantage of the capacities of modular cloning systems, we generated five identical constructs harboring the firefly luciferase transgene driven by the constitutive mannopine synthase Agrobacterium gene promoter (pMAS) and followed by the Basta resistance selection marker cassette. One of these constructs was used as a control, and the other four were flanked by different sequences reported in the literature to have some type of insulator activity (Fig. 1A). The insulator sequences used in this work were the MAR located next to the tobacco root specific gene Rb7 (Rb7) 12 , the chicken lysozyme A MAR region (chiMARs) 5 , the petunia transformation booster sequence (TBS) 15 and one of the scaffold/matrix attachment region sequences isolated from Arabidopsis chromosome 4 (AtS/MAR10) 26 .
The pMAS promoter is known to be most active in the roots of the emerging seedlings and also very active in the cotyledons and lower leaves, with progressively less signal towards the apex of the shoot 27,28 . Accordingly, a time course study of the LUC expression conferred by pMAS showed that its activity was maximal in young seedlings (Fig. 1B). Given these results, for the following experiments, LUC activity was always measured in 12-day-old seedlings. Eight 3:1 segregating Arabidopsis Col0 T2 lines were randomly selected and a 100% Basta resistant T3 line coming from each of them was used for LUC activity imaging to assess their levels of transgene expression (Fig. 1C). Our results confirmed previous reports, indicating that all constructs flanked by insulator elements led to plants with higher transgene expression than the control (Fig. 1D).
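As an illustration of how a 3:1 segregation of Basta resistance in a T2 population can be checked, the following Python sketch applies a chi-square goodness-of-fit test. This is an assumption for illustration only: the text does not state which test, if any, was used to score segregation. The function names and seedling counts are hypothetical; 3.841 is the standard chi-square critical value for one degree of freedom at a significance level of 0.05.

```python
# Critical chi-square value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def consistent_with_3_to_1(resistant, sensitive):
    """True when the 3:1 hypothesis is not rejected at the 5% level."""
    return chi_square_3_to_1(resistant, sensitive) < CHI2_CRITICAL_DF1_ALPHA05


# Hypothetical counts of resistant and sensitive T2 seedlings.
single_locus_like = consistent_with_3_to_1(75, 25)
multi_locus_like = consistent_with_3_to_1(90, 10)
```

A line passing this check behaves as a single segregating locus, although two closely linked insertions would pass it as well.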
Another property of insulator sequences is their ability to decrease variability between transgenic lines transformed with the same construct by reducing positional effects. When the transgene was flanked by Rb7, chiMARs or TBS, the increase in LUC expression described above was also accompanied by a statistically significant increase in the coefficient of variation between lines, which measures the extent of variation in relation to the mean within a population (Fig. 2A, B). Line 40.01 from AtS/MAR10 behaved very differently from the rest in terms of expression (Fig. 1D). We confirmed it was an outlier (expression value above Q3 + 1.5 × InterQuartileRange) and thus did not consider it for this analysis. When the outlier line data were removed, the presence of AtS/MAR10 flanking the transgene led to the opposite effect to that of the other insulators: a statistically significant reduction in the coefficient of variation between lines, that is, a reduction in inter-line variation (Fig. 2A, B).
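The two quantities used in this analysis, the coefficient of variation and the Q3 + 1.5 × IQR outlier fence, can be sketched in Python as follows. This is a minimal illustration and not the analysis script used in the study. The use of the sample standard deviation and of the "inclusive" quartile method are assumptions, since the text does not specify either, and the expression values are made up.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def upper_outliers(values):
    """Values lying above the upper fence Q3 + 1.5 * IQR."""
    # Assumption: quartiles computed with the 'inclusive' method.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [v for v in values if v > fence]


# Hypothetical mean LUC values for eight lines of one construct.
line_means = [10, 12, 11, 13, 12, 11, 10, 60]
outliers = upper_outliers(line_means)
filtered = [v for v in line_means if v not in outliers]
cv_all = coefficient_of_variation(line_means)
cv_filtered = coefficient_of_variation(filtered)
```

In this toy data set a single extreme line inflates the CV considerably, which is why removing a confirmed outlier before comparing constructs changes the picture.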
To analyze the level of variation between genetically identical individuals within a population, we measured the expression of 16 seedlings from each line, and evaluated the effect of insulators on inter-individual (intra-line) variation (Fig. 2C). For Rb7, chiMARs and TBS, the increase in expression induced was not homogeneous between individuals and, as a result, there was a greater variance in these lines compared to the control. For AtS/MAR10, there was a small variance, similar to that of the control with no insulator (CV around 25%) (Fig. 2D).
Next, we compared LUC expression in segregating lines from the T2 generation with homozygous lines from the T3 generation, in an effort to establish whether, in our system, LUC expression was dependent on gene dosage. Our experiments confirm an increase in expression in all T3 lines compared to T2, consistent with the establishment of homozygous populations (Fig. 2E). Rb7 and AtS/MAR10 lines were the ones where expression increased most in the transition to T3 (T2/T3 expression ratio of 4.1 and 5.5 respectively versus 2.5 of Control, 1.8 of chiMARs and 3.1 of TBS).
In an effort to characterize the insulator lines in more detail than previous works, we performed whole genome re-sequencing (WGR) on some of the lines obtained by transformation with each construct (Fig. 3A). The results allowed us to select 21 lines with a single T-DNA insertion locus. Even though all the lines showed a 3:1 Basta resistance segregation in the T2, we found three T3 lines in which there were multiple insertions in different chromosomes, suggesting that some of them were not leading to proper transgene expression. An interesting finding was that AtS/MAR10 40.01, the outlier line that showed abnormally high LUC expression, had two insertions in the same region of chromosome 1, which could explain its behavior as a single locus in our segregation analysis and the reported increased transgene expression. The WGR data also allowed us to map the T-DNA insertion site of each line and to identify the deletions in the host genome associated with the insertion (Figure S2, Fig. 3B and Table 2). Surprisingly, integration was not homogeneous among all chromosomes (we found none of the mapped insertions to be located in chromosome 2), and for Rb7 lines there was a clear preference for insertion within chromosome 3 (60%, 3 out of 5 lines) and with the T-DNA in the 3′->5′ direction (100%, 5 out of 5 lines), while for the rest of the lines chromosome 3 integrations and reverse T-DNA insertions represented only 31% in each case (5 out of 16 for each) (Table 2).
The existence of a selection bias towards T-DNA integrations in euchromatin where the transgenes used for selection of transformants are efficiently expressed has been reported previously in the literature 29 . This was the case for most of the insertions we mapped (insertion sites in euchromatin, chromatin states 1 to 7 as described in 30 , Fig. 3C), and when we plotted LUC activity versus state of the chromatin at the T-DNA insertion site, we could observe that lines grouped high or low depending on the construct they belonged to, and not left or right depending on the chromatin state where the T-DNA integration was located (Fig. 3C). However, 2 lines carrying the chiMARs insulator presented T-DNA insertions in regions of the host genome featuring "chromatin state 8", described as an A/T rich heterochromatic region characterized by methylated DNA and chromatin modifications such as H3K9me2 and H3K27me1 30 .
We analyzed the DNA methylation levels at the junction between the host genome and the T-DNA insertion for chiMARs line 6.13. Our results show that the DNA at the insertion site is indeed heavily methylated while the T-DNA remains devoid of this modification even in the T3 generation, consistent with a boundary role of the insulator in preventing the spread of the repressive mark (Fig. 4).
The data from WGR also allowed us to characterize the genomic sequence generated as a result of the T-DNA integration, and we could observe that for 8 out of 17 of the lines that contained insulator sequences, we had evidence of a lack of precision in the insertion of the RB, while that was not the case for any of the 4 control lines (Fig. 5).

Effect of insulators on transgene expression level and variation between lines. Most previous works have reported positive evidence of the effects of insulators on transgene expression, although some works can be found in the literature that report no such effect. The experiments were, however, very diverse in terms of species (some experiments had been done in tobacco and others in Arabidopsis) and in terms of method of transformation (some performed in primary transformants after regeneration and some in floral-dipped Arabidopsis).
An important motivation for the present study was to compare the effects of the different insulators under the same conditions: organism, developmental stage and transformation method. Our results do in fact support most results from the literature, since we detect an increase in expression for lines where LUC is flanked by any of the four insulators, and previous negative results could reflect a dependency of the function of insulators on the experimental conditions. Notably, the use of AtS/MAR10, which had never been tested before for insulator activity, resulted in a moderate but very consistent increase in LUC expression. In our hands, neither chiMARs, Rb7 nor TBS reduced inter-line or inter-individual variation; in fact, they increased both significantly. However, previous studies on chiMARs had highlighted its effect on the reduction of expression variability among transgenic lines 7,8 . This inconsistency could derive from a few factors in which our study differs fundamentally from these other works. First, in our system we have used the pMAS promoter (versus the p35S used by Mlynarova et al. 7,8 ), which never reaches such high levels of expression as the p35S but results in normally distributed expression levels in populations of transformants 9 . It might be possible that the chiMARs works by reducing the variance of strong promoters but that its effect is not so apparent in promoters with an intrinsically low level of variation such as pMAS, as Mankin et al. 13 described for Rb7. Second, in our study we have analyzed expression in homozygous T3 lines, which are already established lines with low variance in comparison with the T1 transformants analyzed by Mlynarova et al. 7,8 .
It is interesting to note that the levels of variability between lines in the LUC control are in the same range as the variability between genetically identical individuals (around 30%), supporting the consistency and small intrinsic variance of our experimental setup in which we analyze T3.
In fact, it is striking that AtS/MAR10 is able to diminish inter-line variance, proving efficient in modifying both of the parameters measured, increasing transgene expression and reducing variability between lines, which makes it the best performing of the insulators analyzed.

Effect of insulators on T-DNA insertion. Two interesting observations have been made regarding the effect of insulators on the insertion of T-DNAs. On the one hand, it is reported that T-DNA integrations recovered by selection are mostly located in "open chromatin" or euchromatin, while, without selection, integration is biased towards regions with marks of heterochromatin 29 . This is explained by the silencing of the selection genes when integration takes place within heterochromatin, a phenomenon that prevents transformant recovery. Our results show the ability of chiMARs to shelter T-DNAs from heterochromatin spreading and to allow for transgene expression regardless of the position effect.
On the other hand, the observation of an increased frequency of truncated T-DNAs in the lines containing insulators had been reported before by Li et al. 31 . Our results can be interpreted in the light of a role of insulators in the protection of transgenes at the right border end of the T-DNA from deletions. This would also explain the low correlation of expression between reporter genes located within the same T-DNA observed in many previous studies, and shown to improve by the use of insulators flanking them 8 .
An in silico analysis of the insulator sequences using NonB DB 32 showed that the 5′ region of AtS/MAR10 contains an inverted repeat and a mirror repeat rich in purines, features that lead to the formation of cruciform and triplex structures, respectively, which have been associated with genomic instability 33 . Future experiments could be directed at understanding the role of these repeats in the insertion of truncated T-DNAs or vector backbone in constructs harboring AtS/MAR10.
As a general conclusion, we can state that there are many different insulators described in the literature with very different properties. Their functions might reflect differences in their action mechanisms and their use in transgenic constructs should depend on the needs of a specific experiment. In our experimental setup, the best performing insulators were Rb7 in terms of increase of transgene expression, and AtS/MAR10 in terms of reducing variance.
Plant biologists should invest more efforts in the development of technologies that can render transgenes with high and stable expression with rapidity and ease. The future of synthetic biology and biotechnology projects depends on our ability to stabilize transgene expression and alleviate interference with the host genome regulation. In this work we show that the use of genetic insulators can help achieve these objectives with their simple addition at the flanks of the constructs used for transformation.

Data Availability
All materials, data and associated protocols are available to readers.