Nonlinear control of transcription through enhancer–promoter interactions

Chromosome structure in mammals is thought to regulate transcription by modulating three-dimensional interactions between enhancers and promoters, notably through CTCF-mediated loops and topologically associating domains (TADs)1–4. However, how chromosome interactions are actually translated into transcriptional outputs remains unclear. Here, to address this question, we use an assay to position an enhancer at large numbers of densely spaced chromosomal locations relative to a fixed promoter, and measure promoter output and interactions within a genomic region with minimal regulatory and structural complexity. A quantitative analysis of hundreds of cell lines reveals that the transcriptional effect of an enhancer depends on its contact probabilities with the promoter through a nonlinear relationship. Mathematical modelling suggests that nonlinearity might arise from transient enhancer–promoter interactions being translated into slower promoter bursting dynamics in individual cells, therefore uncoupling the temporal dynamics of interactions from those of transcription. This uncovers a potential mechanism of how distal enhancers act from large genomic distances, and of how topologically associating domain boundaries block distal enhancers. Finally, we show that enhancer strength also determines absolute transcription levels as well as the sensitivity of a promoter to CTCF-mediated transcriptional insulation. Our measurements establish general principles for the context-dependent role of chromosome structure in long-range transcriptional regulation.

Chromosome structure in mammals is thought to regulate transcription by modulating three-dimensional interactions between enhancers and promoters, notably through CTCF-mediated loops and topologically associating domains (TADs) [1][2][3][4] . However, how chromosome interactions are actually translated into transcriptional outputs remains unclear. Here, to address this question, we use an assay to position an enhancer at large numbers of densely spaced chromosomal locations relative to a fixed promoter, and measure promoter output and interactions within a genomic region with minimal regulatory and structural complexity. A quantitative analysis of hundreds of cell lines reveals that the transcriptional effect of an enhancer depends on its contact probabilities with the promoter through a nonlinear relationship. Mathematical modelling suggests that nonlinearity might arise from transient enhancer-promoter interactions being translated into slower promoter bursting dynamics in individual cells, therefore uncoupling the temporal dynamics of interactions from those of transcription. This uncovers a potential mechanism of how distal enhancers act from large genomic distances, and of how topologically associating domain boundaries block distal enhancers. Finally, we show that enhancer strength also determines absolute transcription levels as well as the sensitivity of a promoter to CTCF-mediated transcriptional insulation. Our measurements establish general principles for the context-dependent role of chromosome structure in long-range transcriptional regulation.
Transcriptional control in mammals critically depends on enhancers, which control tissue specificity and developmental timing of many genes 5 . Enhancers are often located hundreds of kilobases away from target promoters and are thought to control gene expression by interacting with the promoters in the three-dimensional space of the nucleus. Chromosome conformation capture (3C) methods 6 revealed that enhancer-promoter interactions predominantly occur within sub-megabase domains known as topologically associating domains (TADs). These mainly arise from nested looping interactions between sites that are bound by the DNA-binding protein CTCF that act as barriers for the loop extrusion activity of cohesin 7 .
TAD boundaries and CTCF loops are thought to favour enhancer-promoter communication within specific genomic regions and disfavour it with respect to surrounding sequences 1,3,4,8 . However, this view has recently been challenged by reports that disruption of TAD boundaries 9,10 or depletion of CTCF and cohesin 11,12 do not lead to systematic changes in gene expression, and that some regulatory sequences can act across TAD boundaries 13 . The manipulation of single CTCF sites has also been reported to result in variable effects on gene expression 2,4,10, [14][15][16][17][18] . The very notion that physical proximity is required for transcriptional regulation has been questioned by the observed lack of correlation between transcription and proximity in single cells 19,20 . Thus, it is highly debated whether there are indeed general principles that determine how physical interactions enable or prevent enhancer action 21 . Enhancer-promoter genomic distance might also contribute to transcriptional regulation 22,23 , but it is unclear whether an enhancer acts uniformly within a TAD 24,25 , or whether its effect depends on the genomic distance from a promoter 23,26 .

Enhancer action depends on genomic distance
Addressing these questions requires a quantitative understanding of the relationship between transcription and enhancer-promoter interactions in conditions in which confounding effects by additional regulatory and structural interactions are minimized. Here we provide such a description using an experimental assay in which an enhancer is mobilized from an initial location and reinserted at large numbers of genomic positions with respect to a promoter. This enables the measurement of transcription levels as a function of the enhancer location and, therefore, of enhancerpromoter contact frequencies (Fig. 1a). Specifically, we generated mouse embryonic stem (mES) cells carrying a transgene in which a promoter drives the expression of enhanced green fluorescent protein (eGFP). 572 | Nature | Vol 604 | 21 April 2022

Article
The eGFP transcript is split in two by a piggyBac transposon containing the cognate enhancer of the promoter (Fig. 1b). After expression of the PBase transposase, the transposon is excised and reintegrated randomly into the genome, but preferentially in the vicinity of the initial site 27 . Excision leads to reconstitution of functional eGFP of which the expression is used to isolate clonal cell lines by sorting single eGFP + cells (Fig. 1c, d). This enables the rapid generation of hundreds of cell lines, each with the enhancer in a distinct genomic position. Enhancer position and eGFP expression are then determined in every cell line (Fig. 1d).
To minimize confounding effects, we integrated the transgene within a 560 kb TAD on chromosome 15 carrying minimal regulatory and structural complexity. This TAD does not contain expressed genes or active enhancers, is mostly composed of 'neutral' chromatin 28 except for a repressive ~80 kb region at its 3′ side (Extended Data Fig. 1a), and displays minimal structure mediated by two internal forward CTCF sites (Extended Data Fig. 1a, b). To further decrease the structural complexity, we deleted the two internal CTCF sites. This led to the loss of the associated loops (Extended Data Fig. 1c) and resulted in a simple homogeneous internal structure, as revealed by capture-C with tiled oligonucleotides spanning 2.9 Mb around the transgene ( Fig. 1e and Extended Data Fig. 1c).
We first heterozygously inserted a single copy (Extended Data Fig. 1e) of a version of the transgene carrying the mouse Sox2 promoter and the essential 4.8 kb region of its distal enhancer known as Sox2 control region (SCR) 29,30 (Extended Data Fig. 1d and Methods), from which we deleted its single CTCF site, which is not essential for transcriptional regulation at the endogenous locus 17 . Transgene insertion did not lead to substantial structural rearrangements within the TAD besides new moderate interactions with the CTCF sites at the 3′ and 5′ end of the   Fig. 1h). The light blue area shows the interval between the mean ± s.d. of eGFP levels in three promoter-only cell lines. i, Data as in h, colour-coded according to SCR genomic orientation.
Nature | Vol 604 | 21 April 2022 | 573 TAD (Extended Data Fig. 1f). Mobilization of the piggyBac-SCR cassette led to random genomic reinsertions with a preference for chromosome 15 itself (Extended Data Fig. 1g). Individual experiments resulted in several tens of cell lines of which the eGFP levels were unimodally distributed (Fig. 1f), generally higher than those detected in control lines in which transcription was driven by the Sox2 promoter alone (Fig. 1f), and remained stable over cell passages (Fig. 1g). Mean eGFP levels in single cell lines were linearly correlated with average numbers of eGFP mRNAs measured using single-molecule RNA fluorescence in situ hybridization (smRNA-FISH) (Extended Data Fig. 1h). We therefore used flow cytometry as a readout of transcriptional activity.
Mapping of piggyBac-SCR positions in more than 300 cell lines revealed that, although in around 15% of them the transposon had not been successfully mobilized, in 99% of those in which it had (262 out of 264), the enhancer reinserted within the initial TAD ( Fig. 1h and Extended Data Fig. 1i). In the two cell lines in which the enhancer transposed outside the TAD, eGFP levels were comparable to basal transcription driven by promoter-only control cell lines (Extended Data Fig. 1j). Notably, within the TAD, expression levels decreased with increasing enhancerpromoter genomic distance (Fig. 1h). Genomic distance accounted for a tenfold dynamic range in gene expression, from around 5 to 60 mRNAs per cell on average on the basis of smRNA-FISH calibration (Extended Data Fig. 1h). Insertions downstream of the non-transcribed Npr3 gene generated lower transcription levels (Fig. 1h), possibly because this is a predominantly repressive region. Mild positive and negative deviations from the average decay in transcription levels indeed correlated with local enrichment in active and repressive chromatin states, respectively (Extended Data Fig. 1k). Consistent with the classical notion derived from reporter assays that enhancer activity is independent of genomic orientation 31 , enhancers inserted in forward or reverse orientations generated equivalent transcription levels (Fig. 1i). Interestingly, cell-to-cell heterogeneity in eGFP levels (assessed using coefficients of variation (CVs)) showed an opposite trend to mean expression levels and increased with increasing enhancer-promoter genomic distance (Extended Data Fig. 1l; examples of eGFP intensity distributions are provided in Extended Data Fig. 1m). Importantly, these results did not depend on the specific fluorescence gate used to define eGFP + cells (Extended Data Fig. 1n, o). Together, these data show that the range of activity of the enhancer extends to the entire TAD and is delimited by its boundaries. However, transcription levels and their cell-to-cell variability quantitatively depend on enhancer-promoter genomic distance.

Enhancer contacts modulate burst frequency
We next examined the relationship between transcription levels and contact probabilities. Although reads from the wild-type allele might underemphasize changes introduced by the heterozygous insertion of the transgene, contact patterns detected in capture-C did not change substantially in individual cell lines in which the SCR was mobilized compared to the founder line before piggyBac mobilization (Extended Data Fig. 2a). Thus, the ectopic enhancer and promoter do not create prominent specific interactions, which enabled us to use capture-C data from the founder line (Methods) 32 to infer contact probabilities between promoter and enhancer locations (Fig. 2a). Contact probabilities steeply decayed with increasing genomic distance from the promoter, fell considerably while approaching TAD boundaries (from 1 to around 0.05) and further dropped by a factor of around 3 across boundaries (Fig. 2a). This is consistent with previous estimations 33 confirmed using cross-linking and ligation-free methods 34 and is representative of the contact probabilities  Article experienced by promoters in mES cells (Extended Data Fig. 2b, c). However, such a trend is at odds with our observation that transcription levels rather mildly decreased inside the TAD and dropped to promoter-only levels outside its boundaries ( Fig. 1h and Extended Data Fig. 2d). Interestingly, plotting the mean eGFP mRNA numbers as a function of contact probabilities revealed a highly nonlinear relationship (Fig. 2b). We sought to understand whether such a nonlinear relationship could be related to how enhancer-promoter interactions translate into transcription in individual cells. Transcription occurs in intermittent bursts 35 that give rise to variable mRNA numbers in single cells. smRNA-FISH analysis revealed substantial cell-to-cell variability in eGFP mRNA numbers in a panel of cell lines in which promoter-SCR contact probabilities ranged from zero (promoter-only control cell line) to one (Fig. 2c). Similar to eGFP protein distributions (Extended Data Fig. 2e), CVs of mRNA distributions increased with decreasing contact probabilities (Extended Data Fig. 2f). Bursty promoter behaviour can generally be described in terms of a two-state model of gene expression 36 in which the promoter stochastically switches with rates k on and k off between an OFF and an ON state in which transcription can initiate with rate µ. Consistent with this notion, mRNA number distributions (Fig. 2d) and mean transcription levels ( Fig. 2e) in individual cell lines could be well approximated by a phenomenological two-state model in which the 'on' rate k on (and therefore the burst frequency) nonlinearly depends on enhancer-promoter contact probability through a Hill function ( Fig. 2f and Supplementary Information, model description). Interestingly, the best agreement with experimental data occurred with a Hill coefficient (h) of 2.8 (95% confidence interval = 2.4-3.2; Extended Data Fig. 3a, b). This corresponds to a sigmoidal transcriptional response in which the enhancer would be no longer able to activate the promoter outside the approximately threefold drop in contact probabilities generated by TAD boundaries (Fig. 2g, h). Importantly the sigmoidal behaviour of k on was not an artefact due to systematic errors in estimation of contact probabilities (Extended Data Fig. 3c), confounding effects of CTCF sites and repressive chromatin in the 3′ part of the TAD, or inclusion of promoter-only cell lines in the fit (Extended Data Fig. 3d). Alternative two-state models in which 'off' or initiation rates depend on contact probability rather than the on rate failed to reproduce the observed decrease in CV with contact probabilities (Supplementary Information, model description).

Mechanistic model of enhancer regulation
We next examined which mechanism could in principle generate such a phenomenological two-state model with sigmoidal modulation of k on . Enhancer-promoter contacts are stochastic 32,37,38 and probably dynamic 39 in single cells. Molecular processes that are thought to transmit regulatory information from enhancers to promoters (such as recruitment of transcription factors and coactivators, assembly of the Mediator complex 40 ), as well as those that are associated with promoter operation itself (such as pre-initiation complex assembly, RNA polymerase II pausing and release 41,42 ) are also stochastic and dynamic 43 . We reasoned that the interplay between the timescales of these processes might generate nonlinear effects, as was recently hypothesized to explain promoter bursting 44 . To investigate this concept in a quantitative manner, we developed a mechanistic model describing the simple hypothesis that, in single cells, the on rate of the promoter is transiently increased after stochastic interactions with an enhancer. We assumed that enhancer-promoter interactions occur and disassemble with rates k close and k far , corresponding to a steady-state contact probability of k close / (k close + k far ) (Fig. 3a). When the enhancer is close to the promoter, it triggers one or more (n) reversible regulatory steps that transmit information to the promoter with forward and reverse rates k forward and k back (Fig. 3b). These steps are an abstract representation of any stochastic regulatory processes occurring at the enhancer-promoter interface. When the enhancer is far, no information is transmitted to the promoter and regulatory steps can only revert at rate k back (Fig. 3b). The promoter operates in a basal two-state regime with a small on rate (k on basal ) ( Fig. 3c) unless all regulatory steps have been completed. In this case, the promoter transiently enters an 'enhanced' two-state regime with a higher on rate (k on enh ), thus transiently increasing its transcriptional activity ( Fig. 3c and Supplementary Information, model description). A transient increase in promoter activity therefore requires enhancer interactions that are either long enough (Extended Data Fig. 4a) or frequent enough (Extended Data Fig. 4b) to allow the completion of the n regulatory steps.
This mechanistic model does not generally reproduce the phenomenological two-state behaviour observed in Fig. 2e, f for the ectopic Sox2 promoter. However, when the timescales of enhancer-promoter interactions are faster than those of intermediate regulatory steps, and both are faster than the promoter's intrinsic bursting dynamics (k close,far ≫ k forward,back ≫k on basal,enh , k off , µ) (Fig. 3d, e), the mechanistic model reduces to an apparent two-state model ( Fig. 3f and Supplementary Information, model description). If forward transitions through n > 1 regulatory steps are favoured over backward reactions (k forward > k back ), then the on rate of the apparent two-state model (k on app ) depends sigmoidally on contact probabilities (Fig. 3g). This shows that, in principle, the promoter's phenomenological two-state behaviour with sigmoidal modulation of k on observed in Fig. 2e, f could arise from stochastic enhancer-promoter interactions being transmitted into slower promoter ON/OFF dynamics through small numbers of intermediate regulatory processes. The resulting sigmoidal transcriptional response would enable an enhancer to act efficiently even when contact probabilities rapidly decay away from the promoter (Extended Data Fig. 2d), and contribute to block enhancer action when small drops in contact probabilities occur across TAD boundaries (Fig. 2h). The mechanistic model also predicts that enhancer-promoter contacts should not correlate with transcription bursts (Fig. 3e), as recently suggested by simultaneous imaging of Sox2 transcription and genomic locations flanking the endogenous Sox2 and SCR 20 .
Finally, we verified that, when reduced to a two-state model, the mechanistic model could simultaneously fit the experimental transcriptional response to contact probabilities and smRNA-FISH distributions (Fig. 3h 20 , promoter ON/OFF transitions that occur in the timescale of several minutes (considering that the time unit in the model is mRNA lifetime, expected to be around 1.5 h) 45 (Extended Data Fig. 4c, d). Regulatory processes at the interface between enhancers and promoters have been estimated to occur in the order of tens of seconds 41,43,46 , consistent with the condition that intermediate regulatory steps should be faster than bursting kinetics (Fig. 3f). The requirement that enhancer-promoter interactions should be even faster (Fig. 3f) therefore predicts that they should occur on a timescale of seconds or less.

Enhancer strength controls insulation levels
We next set out to examine whether CTCF binding affects the observed nonlinear relationship between transcription and contact probabilities.

Article
To this aim, we repeated the enhancer mobilization assay in mES cells in which only one of the two internal CTCF sites was homozygously deleted. The remaining forward CTCF site is located 36 kb downstream of the transgene and loops onto the reverse CTCF sites at the 3′ end of the domain (Fig. 4a). SCR mobilization in this context resulted in 172 cell lines of which the transcription levels were indistinguishable from those generated in the 'empty' TAD, except across the CTCF site that severely, but not completely, insulated the ectopic Sox2 promoter from the enhancer (Fig. 4b). Transcription levels across the CTCF site were about 60% lower than those generated in the absence of the CTCF site (Fig. 4c). Strikingly, this occurred in the absence of notable changes in the promoter's interaction probabilities with the region downstream of the CTCF site, at least in the current experimental set-up (capture-C data with 6.4 kb resolution) (Fig. 4c). This suggests that a single CTCF site might exert transcriptional insulation through additional mechanisms beyond simply driving physical insulation, possibly depending on site identity 47 and flanking sequences 16 .
The SCR is a strong enhancer that accounts for most of the transcriptional activity of endogenous Sox2 29,30 . We reasoned that a weaker enhancer should lead to a different transcriptional response to contact probabilities with the promoter. There are two ways in which the parameters in the model shown in Fig. 3f might change when reducing enhancer strength. The ratio between transition rates through regulatory steps k forward and k back (β in Fig. 3h) might decrease, resulting in a slower transmission of regulatory information (Fig. 5a). This would generate a transcriptional response with maximal transcriptional levels that are similar to those generated by the SCR but different sensitivity to changes in contact probabilities (Fig. 5a). Alternatively (although not exclusively), the on rate in the enhanced promoter regime k on enh could decrease (Fig. 5b). This would conserve the shape of the transcriptional response but decrease the maximal transcription level (Fig. 5b). To test these predictions, we performed the enhancer mobilization assay using a truncated version of the SCR (Extended Data Fig. 5a). This contained only one of the two ~1.5 kb subregions that share similar transcription-factor-binding sites 29 and independently operate as weaker enhancers of the Sox2 promoter in transient reporter assays 29 (Extended Data Fig. 5b). Mobilization of the truncated SCR in mES cells with a forward CTCF site downstream of the promoter (compare with Fig. 4a) led to 74 eGFP + cell lines displaying approximately twofold lower transcription levels compared with those generated by the full-length SCR at comparable genomic distances (Fig. 5c). In contrast to the full-length SCR, the truncated enhancer was completely insulated from the promoter by the CTCF site (Fig. 5c). Thus, the level of functional insulation generated by the same CTCF site depends on the strength of the enhancer. In the region upstream of the CTCF sites, the transcriptional response generated by the truncated SCR ( Fig. 5d) was in quantitative agreement with model predictions under the hypothesis that enhancer strength decreases the on rate rather than changing the intermediate regulatory steps (Fig. 5b), and could be predicted using the full-length SCR best-fit parameters with a two-fold decreased k on enh . This further strengthens our interpretation that enhancer strength modulates the ability of the promoter to turn on, possibly by regulating chromatin state, transcription factor binding or RNA polymerase II dynamics at the promoter 35,44 .
In the nonlinear transcriptional response that we identified, high sensitivity in the low contact probability regime (that is, at long genomic distances) might contribute to secure insulation by TAD boundaries of even strong enhancers such as the SCR. Interestingly, in mES cells, the contact probabilities of most (~75%) active promoters with the nearest TAD boundary are comparable to those experienced by the ectopic Sox2 promoter in our experiments (lower than 0.2) (Extended Data Fig. 5c). These promoters should therefore experience the same insulation mechanisms. The remaining promoters are closer (or adjacent) to a TAD boundary and therefore experience larger contact probabilities with the boundary, at which the transcriptional response is less sensitive (Extended Data Fig. 5d). However, interestingly, drops in contact probabilities across a boundary increase with decreasing genomic distance from the boundary itself (Extended Data Fig. 5d). This might contribute to the functional insulation of this class of promoters. Boundaries associated with clusters of CTCF sites might also benefit from the fact that insulation from CTCF sites can exceed the changes in contact probabilities that they generate (Fig. 4).

Discussion
Our study provides unbiased and systematic measurements of promoter output as a function of large numbers of enhancer positions with minimal confounding effects. The analysis of hundreds of cell lines enables us to move beyond locus-specific observations, and establishes a quantitative framework for understanding the role of chromosome Nature | Vol 604 | 21 April 2022 | 577 structure in long-range transcriptional regulation. Our data reveal that, within a TAD, absolute transcription levels generated by an enhancer depend on its genomic distance from the promoter and are determined by a nonlinear relationship with their contact probabilities. Minimal regulatory and structural complexities introduce deviations from this behaviour and might therefore confound its detection outside a highly controlled genomic environment, notably when studying regulatory sequences in their endogenous context 23 . Mathematical modelling suggests that the observed nonlinear transcriptional response involves a modulation of the promoter's burst frequency, which could arise from transient enhancer-promoter interactions being translated into slower promoter bursting dynamics in individual cells. In addition to readily explaining the absence of correlation between transcription and physical proximity in single-cell experiments, this argues that the absence of such correlation should not be interpreted as the absence of causality. Although alternative explanations cannot be ruled out (such as cooperative effects through biomolecular condensates 21,48 ), our model provides a simple explanatory framework for both population-averaged and single-cell behaviour of enhancer-driven transcription, based on a minimal set of general and realistic hypotheses. Future live-cell imaging experiments with improved spatial and temporal resolution 49 will probably enable the testing of the model's prediction that enhancerpromoter interactions should occur on a timescale of seconds or less, therefore enabling the assessment of the model's premises. Finally, our study reveals that enhancer strength is not only a determinant of absolute transcription levels, but also of the level of insulation provided by CTCF. Our data therefore imply that transcriptional insulation is not an intrinsic absolute property of TAD boundaries or CTCF interactions but, rather, a graded variable depending on enhancer strength, boundary strength and distance from a promoter.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-04570-y.

Generation of enhancer-promoter piggyBac targeting vectors
Homology arms necessary for the knock-in, the Sox2 promoter, the SCR and the truncated version of the SCR (Ei) were amplified from E14 mES cell genomic DNA by Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific, F549) using primers compatible with Gibson assembly cloning (NEB, E2611). The targeting vector was generated starting from the 3-SB-EF1-PBBAR-SB plasmid 50 , gifted by Rob Mitra. To clone homology arms into the vector, BspEI and BclI restriction sites were introduced using Q5 Site-Directed Mutagenesis Kit (NEB, E0554). The left homology arm was cloned using Gibson assembly strategy by linearizing the vector with BspEI (NEB, R0540). The right homology arm was cloned using Gibson assembly strategy by linearizing the vector with BclI (NEB, R0160). The Sox2 promoter was cloned by first removing the Ef1a promoter from the 3-SB-EF1-PBBAR-SB vector using NdeI (NEB, R0111) and SalI (NEB, R0138) and subsequently using Gibson assembly strategy. The SCR and its truncated version (truncated SCR or Ei) were cloned between the piggyBac transposon-specific inverted terminal repeat sequences (ITR) by linearizing the vector with BamHI (NEB, R3136) and NheI (NEB, R3131). A transcriptional pause sequence from the human alpha2 globin gene and an SV40 poly(A) sequence were inserted at both 5′ and 3′ ends of the enhancers using Gibson assembly strategy. A selection cassette carrying the puromycin resistance gene driven by the PGK promoter and flanked by FRT sites was cloned in front of the Sox2 promoter by linearizing the piggyBac vector with the AsiSI (NEB, R0630) restriction enzyme. A list of the primers used for cloning is provided in Supplementary Table 1.

Generation of founder mES cell lines carrying the piggyBac transgene
The gRNA sequence for the knock-in of the piggyBac transgene on chromosome 15 was designed using the online tool (https://eu.idtdna. com/site/order/designtool/index/CRISPR_SEQUENCE) and purchased from Microsynth AG. gRNA sequence was cloned into the PX459 plasmid (Addgene) using the BsaI restriction site. E14 mES cell founder lines carrying the piggyBac transgene were generated using nucleofection with the Amaxa 4D-Nucleofector X-Unit and the P3 Primary Cell 4D-Nucleofector X Kit (Lonza, V4XP-3024 KT). Cells (2 × 10 6 ) were collected with accutase (Sigma-Aldrich, A6964) and resuspended in 100 µl transfection solution (82 µl primary solution, 18 µl supplement, 1 µg piggyBac targeting vector carrying the SCR, truncated SCR or promoter alone, and 1 µg of PX459 ch15_gRNA/Cas9) and transferred into a single Nucleocuvette (Lonza). Nucleofection was performed using the protocol CG110. Transfected cells were directly seeded in prewarmed 37 °C culture in E14 standard medium. Then, 24 h after transfection, 1 µg ml −1 of puromycin (InvivoGen, ant-pr-1) was added to the medium for 3 days to select cells transfected with PX459 gRNA/ Cas9 vector. Cells were then cultured in standard E14 medium for an additional 4 days. To select cells with insertion of the piggyBac targeting vector, a second pulse of puromycin was carried out by culturing cells in standard medium supplemented with 1 µg ml −1 of puromycin. After 3 days of selection, single cells were isolated by fluorescence-activated cell sorting (FACS) on 96-well plates. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg µl −1 primocin (Invi-voGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were then cultured in standard E14 medium with 1 µg ml −1 of puromycin. Genomic DNA was extracted by lysing cells with lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) and subsequent isopropanol precipitation. Individual cell lines were analysed by genotyping PCR to determine heterozygous insertion of the piggyBac donor vector. Cell lines showing the corrected genotyping pattern were selected and expanded. A list of the primers used for genotyping is provided in Supplementary Table 1.

Puromycin resistance cassette removal
Cells (1 × 10 6 ) were transfected with 2 µg of a pCAG-FlpO-P2A-HygroR plasmid encoding for the flippase (Flp) recombinase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer's instructions. Transfected cells were cultured in standard E14 medium for 7 days. Single cells were then isolated using FACS on 96-well plates. Genomic DNA was extracted by lysing cells with lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) and subsequent isopropanol precipitation. Individual cell lines were analysed by genotyping PCR to verify the deletion of the puromycin resistance cassette. A list of the primers used for genotyping is provided in Supplementary Table 1. Cell lines showing the correct genotyping pattern were selected and expanded. Selected cell lines were processed for targeted Nanopore sequencing with Cas9-guided adapter ligation (nCATS) 51 and only the ones showing unique integration of the piggyBac donor vector were used as founder lines for the enhancer mobilization experiments.

Mobilization of the piggyBac-enhancer cassette
A mouse codon-optimized version of the piggyBac transposase (PBase) was cloned in frame with the red fluorescent protein tagRFPt (Evrogen) into a pBroad3 vector (pBroad3_hyPBase_IRES_tagRFPt) using Gibson assembly cloning (NEB, E2611). Cells (2 × 10 5 ) were transfected with 0.5 µg of pBroad3_hyPBase_IRES_tagRFPt using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer's instructions. To increase the probability of enhancer transposition, typically 12 independent PBase transfections were performed at the same time in 24-well plates. Transfection efficiency as well as expression levels of hyPBase_IRES_tagRFPt transposase within the cell population were monitored by flow cytometry analysis. Then, 7 days after transfection with PBase, individual eGFP + cell lines were isolated using FACS in 96-well plates. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg ml −1 primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were cultured in E14 standard medium for additional 7 days and triplicated for genomic DNA extraction, flow cytometry analysis and freezing.

Sample preparation for mapping piggyBac-enhancer insertion sites in individual cell lines
Mapping of enhancer insertion sites in individual cell lines was performed using splinkerette PCR. The protocol was performed as described previously 52 with a small number of modifications. Genomic DNA from individual eGFP + cell lines was extracted from 96-well plates using the Quick-DNA Universal 96 Kit (Zymo Research, D4071) according to the manufacturer's instructions. Purified genomic DNA was digested by 0.5 µl of Bsp143I restriction enzyme (Thermo Fisher Scientific, FD0784) for 15 min at 37 °C followed by a heat-inactivation step at 65 °C for 20 min. Long (HMSpAa) and short (HMSpBb) splinkerette adapters were first resuspended with 5× NEBuffer 2 (NEB, B7002) to reach a concentration of 50 µM. Then, 50 µl of HMSpA adapter was mixed with 50 µl of HMSpBb adapter (Aa+Bb) to reach a concentration of 25 µM. The adapter mix was denatured and annealed by heating it to 95 °C for 5 min and then cooling to room temperature. Then, 25 pmol of annealed splinkerette adapters was ligated to the digested genomic DNA using 5 U of T4 DNA ligase (Thermo Fisher Scientific, EL0011) and incubating the samples for 1 h at 22 °C followed by a heat-inactivation step at 65 °C for 10 min. For splinkerette amplifications, PCR 1 was performed combining 2 µl of the splinkerette sample, 1 U of Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM of HMSp1 and 0.1 µM of PB5-1 (or PB3-1) primer, and splinkerette PCR 2 was performed using 2 µl of PCR 1, 1 U of Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM of HMSp2 and 0.1 µM of PB5-5 (or PB3-2) primer. The quality of PCR amplification was checked by agarose gel electrophoresis. Samples were sent for Sanger Sequencing (Microsynth AG) using the PB5-2 (or PB3-2) primer. A list of the primers used for splinkerette PCRs and sequencing is provided in Supplementary Table 1. Mapping of enhancer insertion sites in individual cell lines was performed as described in the 'Mapping of piggyBac-enhancer insertion sites in individual cell lines' section.
Flow cytometry eGFP fluorescence intensity measurements and analysis eGFP + cell lines were cultured in serum + 2i medium for 2 weeks before flow cytometry measurements. eGFP levels of individual cell lines were measured on the BD LSRII SORP flow cytometer using BD High Throughput Sampler (HTS), which enabled sample acquisition in 96-well plate format. Measurements were repeated three times for each clone. Mean eGFP fluorescence intensities were calculated for each clone using FlowJo and all three replicates were averaged.

Normalization of mean eGFP fluorescence intensities
Mean eGFP fluorescence levels of each cell line measured in flow cytometry were first corrected by subtracting the mean eGFP fluorescence intensities measured in wild-type E14 mES cells cultured in the same 96-well plate. The resulting mean intensities were then normalized by dividing them by the average mean intensities of all cell lines where the SCR was located within a 40 kb window centred at the promoter location, and multiplied by a common factor.
Sample preparation for high-throughput sequencing of piggyBac-enhancer insertion sites Cells (5 × 10 5 ) were transfected with 2 µg of PBase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer's instructions. Transfection efficiency as well as expression levels of PBase within the cell population were monitored by flow cytometry analysis. Then, 5 days after transfection with PBase, genomic DNA was purified using the DNeasy Blood & Tissue Kit (Qiagen, 69504). To reduce the contribution from cells in which excision of piggyBac-enhancer did not occur, we depleted eGFP sequences using an in vitro Cas9 digestion strategy. gRNA sequences for eGFP depletion were designed using the online tool (https://eu.idtdna.com/site/order/designtool/index/ CRISPR_SEQUENCE) (Supplementary Table 1). Custom-designed Alt-R CRISPR-Cas9 crRNAs containing the gRNA sequences targeting eGFP (gRNA_1_3PRIME and gRNA_2_3PRIME), Alt-R CRISPR-Cas9 tracrRNA (IDT, 1072532) and Alt-R Streptococcus pyogenes Cas9 enzyme (IDT, 1081060) were purchased from IDT. In vitro cleavage of the eGFP fragment by Cas9 was performed according to the IDT protocol 'In vitro cleavage of target DNA with ribonucleoprotein complex'. In brief, 100 µM of Alt-R CRISPR-Cas9 crRNA and 100 µM of Alt-R CRISPR-Cas9 tracrRNA were assembled by heating the duplex at 95 °C for 5 min and allowing to cool to room temperature (15-25 °C). To assemble the RNP complex, 10 µM of Alt-R guide RNA (crRNA:tracrRNA) and 10 µM of Alt-R SpCas9 enzyme were incubated at room temperature for 45 min. To perform in vitro digestion of eGFP, 300 ng of genomic DNA extracted from the pool cells transfected with the PBase was incubated for 2 h with 1 µM Cas9/RNP. After the digestion, 40 µg of proteinase K was added and the digested sample was further incubated at 56 °C for 10 min to release the DNA substrate from the Cas9 endonuclease. After purification using AMPURE beads XP (Beckman Coulter, A63881), genomic DNA was digested by 0.5 µl of Bsp143I restriction enzyme (Thermo Fisher Scientific, FD0784) for 15 min at 37 °C followed by a heat-inactivation step at 65 °C for 20 min. Annealed splinkerette adapters (Aa+Bb; 125 pmol) were then ligated to the digested genomic DNA using 30 U of T4 DNA ligase HC (Thermo Fisher Scientific, EL0013), and the samples were incubated for 1 h at 22 °C followed by a heat-inactivation step at 65 °C for 10 min. For splinkerette amplifications, 96 independent PCR 1 reactions were performed combining 100 ng of the splinkerette sample, 1 U of Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM of HMSp1 and 0.1 µM of PB3-1 primer, and splinkerette PCR 2 was performed using 4 µl of PCR 1 product, 1 U of Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM of HMSp2 and 0.1 µM of PB3-2 primer. A list of the primers used for splinkerette PCRs is provided in Supplementary Table 1. Splinkerette amplicon products were processed using the NEB Ultra II kit according to the manufacturer's protocol, using 50 ng of input material. Mapping of genome-wide insertions was performed as described in the 'Mapping of piggyBac-enhancer insertion sites in population-based splinkerette PCR' section.

Sample preparation for tagmentation-based mapping of PiggyBac insertions
PiggyBac integrations in pools of cells were mapped using a Tn5-transposon-based ITR mapping technique based on ref. 53 with minor alterations. Cells (2 × 10 5 ) were transfected with 0.5 µg of PBase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer's instructions in 24-well plates. Eight independent transfections were performed in parallel. Transfection efficiency as well as expression levels of PBase within the cell population were monitored by flow cytometry analysis. Then, 7 days after transfection with PBase, 6 cell pools of 10,000 cells from low GFP values (gates low 1 and low 2) and 6 cell pools of 337 cells of high GFP values (gate high) were sorted in a 24-well plate. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg ml −1 primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (StemCell Technologies, Y-27632). Cells were cultured in E14 standard medium for either 1 passage (pools from gates low 1 and low 2) or 2 passages (pools from gate high) and genomic DNA from individual pools was extracted using the Quick-DNA Miniprep Plus Kit (Zymo Research, D4069) according to the manufacturer's instructions. The Tn5 transposon was produced as described in ref. 54

Article
The PCR mix was 5 µl of enrichment PCR, 1 µl of 10 µM primers, 2 µl dNTPs (10 mM), 4 µl 5× Phusion HF Buffer and 0.25 µl Phusion HS Flex polymerase (NEB), in a final volume of 25 µl and amplified as follows: 30 s at 98 °C; 3 cycles of 10 s at 98 °C, 20 s at 62 °C and 30 s at 72 °C; and 8 cycles of 10 s at 98 °C, 50 s at 72 °C. In PCR 2 the N7xx (Illumina, Nextera Index Kit) adapters were added to the PiggyBac specific locations as follows. PCR was performed with TAC0103 (both ITRs) and N7xx. The PCR mix was 2 µl of PCR1, 1 µl of 10 µM primers, 2 µl dNTPs (10 mM), 4 µl 5× Phusion HF Buffer and 0.25 µl Phusion polymerase (Thermo Fisher Scientific), in a final volume of 22 µl and amplified as follows: 30 s at 98 °C; 10 cycles of 10 s at 98 °C, 20 s at 63 °C and 30 s at 72 °C. Then, 5 µl of library was checked on a 1% agarose gel and different samples were pooled according to smear intensity. Finally, the library was purified by bead purification using CleanPCR (CleanNA) beads at a ratio 1:0.8 sample:beads. The final library was sequenced using the Illumina MiSeq (150 bp, paired-end) system. Mapping of genome-wide insertions was performed as described in the 'Mapping of piggyBac-enhancer insertion sites by tagmentation' section.

Deletion of genomic regions containing CTCF-binding sites
gRNA sequences for depletion of the genomic regions containing the CTCF-binding sites were designed using the online tool (https:// eu.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE) and purchased from Microsynth AG (Supplementary Table 1). gRNA sequences were cloned into the PX459 plasmid (Addgene) using the BsaI restriction site. To remove the first forward CTCF-binding site (chromosome 15: 11520474-11520491), 3 × 10 5 cells were transfected with 0.5 µg of PX459 CTCF_KO_gRNA3/Cas9 and 1 µg of PX459 CTCF_ KO_gRNA10/Cas9 plasmids using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer's instructions. To remove the second forward CTCF-binding sites (chromosome 15: 11683162-11683179), 1 × 10 6 cells were transfected with 1 µg of PX459 gRNA2_CTCF_KO/Cas9 and 1 µg of PX459 gRNA6_CTCF_KO/Cas9 plasmids using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer's instructions. Then, 24 h after transfection, 1 µg ml −1 of puromycin was added to the medium for 3 days. Cells were then cultured in standard E14 medium for an additional 4 days. To select cell lines with homozygous deletion, single cells were isolated by FACS on 96-well plate. Sorted cells were kept for 2 days in E14 standard medium supplemented with 100 µg ml −1 primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were then cultured in standard E14 medium. Genomic DNA was extracted by lysing cells with lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) and subsequent isopropanol precipitation. Individual cell lines were analysed by genotyping PCR to determine homozygous deletion of the genomic regions containing the CTCF-binding sites. Cell lines showing the corrected genotyping pattern were selected and expanded. A list of the primers used for genotyping is provided in Supplementary Table 1.

smRNA-FISH
Cells were collected with accutase (Sigma-Aldrich, A6964) and adsorbed on poly-l-lysine (Sigma-Aldrich, P8920) precoated coverslips. Cells were then fixed with 3% PFA (EMS, 15710) in PBS for 10 min at room temperature, washed with PBS and kept in 70% ethanol at −20 °C. After at least 24 h incubation in 70% ethanol, the coverslips were incubated for 10 min with freshly prepared wash buffer composed of 10% formamide (Millipore Sigma, S4117) in 2× SSC (Sigma-Aldrich, S6639). The coverslips were hybridized overnight (around 16 h) at 37 °C in freshly prepared hybridization buffer composed of 10% formamide, 10% dextran sulfate (Sigma-Aldrich, D6001) in 2× SSC and containing 125 nM of RNA-FISH probe sets against Sox2 labelled with Quasar 670 (Stellaris) and against eGFP labelled with Quasar 570 (Stellaris). After hybridization, the coverslips were washed twice with wash buffer prewarmed to 37 °C for 30 min at 37 °C with shaking, followed by 5 min incubation with 500 ng ml −1 DAPI solution (Sigma-Aldrich, D9564) in PBS (Sigma-Aldrich, D8537). The coverslips were then washed twice in PBS and mounted on slides with Prolong Gold medium (Invitrogen, P36934) and cured at room temperature for 24 h. The coverslips were then sealed and imaged within 24 h.

RNA-FISH image acquisition
Images were acquired on a Zeiss Axion Observer Z1 microscope equipped with 100 mW 561 nm and 100 mW 642 nm HR diode solid-state lasers, an Andor iXion 885 EMCCD camera, and an α Plan-Fluar ×100/1.45 NA oil-immersion objective. Quasar 570 signal was collected with the DsRed ET filter set (AHF Analysentechnik, F46-005), Quasar 670 with Cy5 HC mFISH filter set (AHF Analysentechnik, F36-760) and DAPI with the Sp. Aqua HC-mFISH filter set (AHF Analysentechnik, F36-710). The typical exposure time for RNA-FISH probes was set to around 300-500 ms with 15-20 EM gain and 100% laser intensity. DAPI signal was typically imaged with an exposure time of 20 ms with EM gain 3 and 50% laser intensity. The pixel size of the images was 0.080 × 0.080 µm with a z-step of 0.25 µm for around 55-70 z-planes.

Image processing and quantification of mRNA numbers
Raw images were processed in KNIME, python and Fiji to extract the numbers of RNAs per cell. The KNIME workflow described below is based on a previously published workflow 55 . z-stacks were first projected to a maximal projection for each fluorescence channel. Individual cells were then segmented using the DAPI channel using Gaussian convolution (σ = 3), followed by filtering using global threshold with Otsu filter, watershed and connected component analysis for nuclei segmentation. Cytoplasmic areas were then estimated with seeded watershed. Cells with nuclei partially outside the frame of view were automatically excluded. Cells containing obvious artifacts, wrongly segmented or not fully captured in xyz dimensions were manually excluded from the final analysis. Spot detection is based on the Laplacian of Gaussian method implemented in TrackMate 56 . For the channels containing RNA-FISH probes signal, RNAs spots were detected after background subtraction (rolling ball radius 20-25 pixels) by selecting spot size 0.2 µm and threshold for spot detection based on visual inspection of multiple representative images. Spot detection is based on the Laplacian of Gaussian method from TrackMate. Subpixel localization of RNA spots was detected for RNA channels and a list of spots per cell for each experimental condition and replicate was generated. Spots in each channel were then aggregated by cell in python to extract the number of RNAs per cell.

Enhancer reporter assays
To generate vectors for the enhancer reporter assay, the Sox2 promoter, SCR and the truncated versions of the SCR (Ei and Eii) were amplified from E14 mES cell genomic DNA with Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific, F549) using primers compatible with Gibson assembly strategy. The Sox2 promoter was cloned into the 3-SB-EF1-PBBAR-SB vector as described above. The SCR and the truncated versions Ei and Eii were cloned in front of the Sox2 promoter by linearizing the vector with AgeI (NEB, R3552) and subsequently using Gibson assembly cloning. A transcriptional pause sequence from the human α2-globin gene and an SV40 poly(A) sequence was inserted at both the 5′ and 3′ ends of the enhancers. To test enhancers activity, 3 × 10 5 cells were co-transfected with 0.5 µg of the different versions piggyBac vectors and 0.5 µg of pBroad3_ hyPBase_IRES_tagRFPt using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer's instructions. As a control, only 0.5 µg of the piggyBac vector carrying the Sox2 promoter was transfected. 24 h after transfection, cells were collected and analysed by flow cytometry.

Capture-C sample preparation
Cells (20 × 10 6 ) were cross-linked with 1% formaldehyde (EMS, 15710) for 10 min at room temperature and quenched with glycine (final concentration, 0.125 M). Cells were lysed in 1 M Tris-HCl pH 8.0, 5 M NaCl and 10% NP40 and complete protease inhibitor (Sigma-Aldrich, 11836170001) and enzymatically digested using 1,000 U of MboI (NEB, R0147). Digested chromatin was then ligated at 16 °C with 10,000 U of T4 DNA ligase (NEB, M0202) in ligase buffer supplemented with 10% Triton X-100 (Sigma-Aldrich, T8787) and 240 µg of BSA (NEB, B9000). Ligated samples were de-cross-linked with 400 µg proteinase K (Macherey Nagel, 740506) at 65 °C and phenol-chloroform purified. 3C library preparation and target enrichment using a custom-designed collection of 6,979 biotinylated RNA 'baits' targeting single MboI restriction fragments chromosome 15: 10283500-13195800 (mm9) (Supplementary Table 2; Agilent Technologies; designed as in ref. 57 Table 1). Custom-designed Alt-R CRISPR-Cas9 crRNAs (5 crRNAs targeting the region upstream and 5 crRNAs targeting the region downstream the integrated transgene), Alt-R CRISPR-Cas9 tracrRNA (IDT, 1072532) and Alt-R SpCas9 enzyme (IDT, 1081060) were purchased from IDT. Sample preparation and Cas9 enrichment were performed according to a previously described protocol 51 with a few modifications. Genomic DNA from mES cell founder lines was extracted using the Gentra Puregene Cell Kit (Qiagen, 158745) according to the manufacturer's instructions. The quality of the high molecular mass DNA was checked using the TapeStation (Agilent) system. Typically, 5 µg of high molecular mass DNA was processed for incubation using shrimp alkaline phosphatase (rSAP; NEB, M0371) for 30 min at 37 °C followed by 5 min at 65 °C to dephosphorylate DNA-free ends. For Cas9 enrichment of the target region, all ten Alt-R CRISPR-Cas9 crRNAs were first pooled at an equimolar amount (100 µM) and subsequently incubated with 100 µM of Alt-R CRISPR-Cas9 tracrRNA at 95 °C for 5 min to assemble the Alt-R guide RNA duplex (crRNA:tracrRNA). To assemble the RNP complex, 4 pmol of Alt-R SpCas9 enzyme was incubated with 8 pmol Alt-R guide RNA (crRNA:tracrRNA) at room temperature for 20 min. In vitro digestion and A-tailing of the DNA were performed by adding 10 µl of the RNP complex, 10 mM of dATP (NEB, N0440) and 5 U of Taq Polymerase (NEB, M0267) and incubating the samples for 30 min at 37 °C followed by 5 min at 72 °C. Adapter ligation for Nanopore sequencing was performed using the Ligation Sequencing Kit (Nanopore, SQK-CAS109) according to the manufacturer's instructions. After purification with AMPure PB beads (Witec, 100-265-900), the samples were loaded into the MniION system, selecting the SQK-CAS109 protocol.

Capture-C analysis
Capture-C data were analysed using HiC-Pro 60 (v.2.11.4); the parameters can be found at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021). In brief, read pairs were mapped to the mouse genome (build mm9). Chimeric reads were recovered after recognition of the ligation site. Only unique valid pairs mapping to the target regions were used to build contact maps. Iterative correction 61 was then applied to the binned data. The target regions can be found at GitHub (https://github.com/ zhanyinx/Zuin_Roth_2021). For SCR_ΔΔCTCF, SCR_ΔCTCF and the derived clonal lines, data from replicate one were used to make the quantification and plots throughout the manuscript.

Differential capture-C maps
To evaluate the structural perturbation induced by the insertion of the transgene and the mobilization of the enhancer (ectopic sequences), we accounted for differences in genomic distances due to the presence of the ectopic sequence. In the founder cell line (for example, SCR_ ΔΔCTCF), insertion of the transgene modifies the genomic distance between loci upstream and downstream the insertion site. To account for these differences, we generated distance-normalized capture-C maps in which each entry corresponds to the interaction normalized to the corrected genomic distance between the interacting bins. Outliers (defined using the interquartile rule) or bins with no reported interactions from capture-C were treated as noise and filtered out. Singletons, defined as the top 0.1 percentile of Z-score, were also filtered out. The Z-score is defined as (obs - exp)/stdev, where obs is the capture-C signal for a given interaction and exp and stdev are the genome-wide average and standard deviation, respectively, of capture-C signals at the genomic distance separating the two loci. We next calculated the ratios between distance normalized and noise-filtered capture-C maps. A bilinear smoothing with a window of 2 bins was applied to the ratio maps to evaluate the structural perturbation induced by the insertion of the ectopic sequence.

Chromatin state calling with ChromHMM
Chromatin states were called using ChromHMM 28 with four states. The list of histone modification datasets used is provided in Supplementary Table 3. States with enrichment in H3K9me3 and H3K27me3 were merged, therefore resulting in three chromatin states: active (enriched in H3K27ac, H3K36me3, H3K4me1 and H3K9ac), repressive (enriched in H3K9me3 and H3K27me3) and neutral (no enrichment).

Mapping of piggyBac-enhancer insertion sites in population-based splinkerette PCR
To identify true-positive enhancer re-insertion sites, we first filtered out reads containing eGFP fragments. We then retained only read pairs for which one side mapped to the ITR sequence and the other side mapped to the splinkerette adapter sequence. We mapped separately the ITR/splinkerette sides of the read pair to the mouse genome (build mm9) using BWA mem 62 with the default parameters. Only integration sites that had more than 20 reads from both ITR and splinkerette sides were retained.

Mapping of piggyBac-enhancer insertion sites in individual cell lines
To map the enhancer position in individual cell lines, Sanger sequencing (Microsynth) without the adapter sequences were filtered out. The first 24 bp of each read after the adapter was then mapped to the mouse genome (mm9) using vmatchPattern (Biostrings v.2.58.0). The script used to map Sanger sequencing can be found at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021).

Mapping of piggyBac-enhancer insertion sites by tagmentation
Before aligning paired-end sequencing reads, reads were filtered using an adaptation of cutadapt 63 , processing each read pair in multiple steps. Sequence patterns originating from Tn5 and each ITR were removed. The paired-end reads coming from both ITRs were treated the same. First, the presence of the unique part of the 5′ ITR and 3′ ITR sequence was detected at the start of the second read of the pair and, if present, this sequence was trimmed. Next, the sequence up to and including the TTAA site that was found on both the 5′ITR and 3′ITR was trimmed off. This sequence only partly contained the respective primers used for each ITR, and was used to filter reads that contained the sequence expected for a correct PCR product starting at the transposon. The sequence up to, but not including, the TTAA was removed. Next, all of the other sequence patterns coming from either Tn5 or the ITR were removed from the 5′ end of the first read in the pair and the 3′ end of both reads.
After filtering and trimming the reads, the reads were aligned to a reference genome with an in silico insertion of the split-GFP construct, but with a single TTAA motif instead of the PiggyBac transposon. This was done by aligning the homology arms found in the plasmid against mm10 reference genome. The complete sequence on the reference matching both arms was replaced by the plasmid sequence inserted.
Alignment was performed using Bowtie2 with the fragment length set to a minimum of 0 bp and maximum of 2,000 bp and the very-sensitive option was used. After reads were aligned to the genome, sambamba 64 was used to remove duplicates and samtools 65 was used to filter out read pairs that were not properly paired. We then designated, for each read pair, the position of the first 4 nucleotides of the second read as a putative insertion site. To calculate the fraction of reads originating from the non-mobilized position, the number of read pairs that overlapped the non-mobilized position (the TTAA replacing the PiggyBac of the in silico insert) was divided over the total number of reads originating from putative insertion sites supported by at least one read pair with a mapping quality higher than 2. Confident insertions were identified as those with at least one read for both 5′ and 3′ ITR.

Calibration of the mean number of mRNAs per cell with smRNA-FISH
A linear model was used to predict the average number of eGFP mRNAs on the basis of the mean eGFP intensity. The model was fitted on 7 data points corresponding to the average number of eGFP mRNAs obtained using single-molecule RNA fluorescence in situ and the mean eGFP intensity obtained by flow cytometry (Extended Data Fig. 1h; R 2 = 0.9749, P < 0.0001, t -te st).

Mathematical model and parameter fitting
The phenomenological two-state model (Fig. 2) and the apparent twostate model deduced from the mechanistic enhancer-promoter model (Fig. 3) were both fitted simultaneously to the mean eGFP levels measured in individual cell lines and to the distributions of RNA numbers measured by smRNA-FISH in six cell lines where the SCR was located at different distances from the promoter. The mean number of mRNAs was calculated analytically and the steady-state distribution of the number of mRNA per cell was approximated numerically ( Supplementary  Information, model description). The parameters for the phenomenological two-state model are the minimum on rate k on 0 , the minimum on rate k on 1 , the off rate k off , the initiation rate µ and the constant c and Hill exponent h, which together control the nonlinear dependency of k on on contact probability. The parameters for the apparent two-state model are the basal on rate k on basal , the enhanced on rate k on enh , the off rate k off , the initiation rate µ, the ratio between the forward and backward rates of the regulatory steps β and the number of regulatory steps n. All of these parameters were considered to be free in the fitting procedure. The apparent two-state model was also fitted to the binned mean number of mRNA molecules inferred from the eGFP + cell lines with the truncated version of the SCR (Fig. 4). In this case, three versions of the apparent two-state model were fitted to the data using log-transformed likelihood ratios. The parameter β (version 1) or k on enh (model 2) or both (model 3) were considered to be free parameters, whereas the other parameters were fixed to the best fit values obtained for the full-length SCR dataset. Using log-transformed likelihood ratios, the fit of the three versions was compared to the fit of the model for which all of the parameters were considered to be free. The mathematical description of the enhancer-promoter communication model, the derivation of the apparent two-state model, and the fitting procedures are explained in detail in the Supplementary Information (model description).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All capture-C, RNA-seq, Oxford Nanopore, tagmentation and population-based splinkerette PCR sequencing fastq files generated in this study have been uploaded to the Gene Expression Omnibus (GEO) under accession number GSE172257.     Fig. 1g). Average eGFP values calculated within equally spaced 20 kb bins (black dots) are shown. Mean mRNA numbers per cell were inferred from eGFP counts using calibration with smRNA-FISH, see Extended Data Fig. 1h. Shaded light blue area indicates the interval between mean +/-standard deviation of eGFP levels in three promoter-only cell lines. j. Same plot as Fig. 1h showing the only two SCR insertions we detected outside the TAD boundaries (brown dot) and on another chromosome (yellow dot). k. Left: Log10 average eGFP expression (from Fig. 1h) as a function of log10 absolute genomic distance between transgene position and SCR reinsertion. Points are colour-coded as in panel A (chromHMM active, neutral, and repressive states). Black line denotes linear regression. Black circles denote SCR reinsertions within the Npr3 gene body. Right: deviations of eGFP expression levels from the linear regression correlate with chromatin states called using ChromHMM (n: active = 16; neutral = 83; Npr3 = 17; repressive = 7). Reinsertion of SCR within active or repressive regions respectively increases or decreases enhancer activity compared to neutral regions. Box plot: centre line denotes the median; boxes denote lower and upper quartiles (Q1 and Q3, respectively); whiskers denote 1.5x the interquartile region (IQR) below Q1 and above Q3; points denote outliers. l. Coefficients of variation (CV) of eGFP levels measured by flow cytometry plotted against SCR insertion locations in eGFP+ cell lines (light red dots). Data are presented as mean +/-standard deviation (n = 3 measurements in different days). Shaded light blue area indicates the interval between mean +/-standard deviation of eGFP level CVs in three promoter-only cell lines. m. Representative eGFP distributions (normalized to mean eGFP level) in clones with increasing absolute genomic distance (1.7 kb, 42.4 kb, 112.5 kb, and 259.43 kb) between the mobilized enhancer and the ectopic Sox2 promoter. Vertical line indicates normalized mean eGFP levels. n. FACS plot showing standard (top) and less stringent (bottom) gates on eGFP levels used for single cells sort and insertion analysis of corresponding clonal cell lines. o. Left: FACS plot showing the gates used to sort pools of cells for tagmentation-based mapping of PiggyBac-enhancer insertions. For gates "low 1" and "low 2", six pools of 10000 cells were sorted while for gate "high", six pools of 337 cells were sorted. Gate "high" corresponds to the standard gate used to isolate eGFP positive cell lines for the mobilization experiments. Centre: Barplot showing the fraction of sequencing reads mapping to non-mobilized enhancer cassette determined by tagmentation-based mapping from the different pools sorted in gates "low 1", "low 2" and "high". See Methods for a detailed description of the protocol. Right: Numbers and genomic locations of confident insertion sites (identified as those with at least one read for both 5′oth 5 mapping from the different pools sorted in gates "low 1", "low 2" and "higeGFP gates.    Fig. 2 | Analysis of chromosome structure around the transgenic locus and genome-wide in mES cells. a. Top: capture-C maps (6.4 kb resolution) of four cell lines where the SCR (black arrow) has been reinserted at different distances from the promoter (blue arrow). Bottom: differential contact map between individual cell lines and the founder line. Grey pixels: correspond to 'noisy' interactions that did not satisfy quality control filters (see Methods). Right: barplot showing the change in average interaction probabilities between the SCR reinsertion and the cassette, calculated using a square of 5 bins (6.4 kb resolution) centred at the cassette SCR reinsertion interaction. b. Left: example of Hi-C heatmap in mES cells at 6.4 kb resolution. Centre: scheme depicting how the probability of interaction between a promoter and the region immediately before the nearest TAD boundary (P in , 12.8 kb i.e. two 6.4 kb bins before the boundary called using CaTCH 66 ) and after the nearest TAD boundary (P out ) are calculated. Right: distribution of contact probability between all active promoters in mES cellss and the closest inner TAD boundary (P in ) (n = 9655). Box plot description as in Extended Data Fig. 1k. c. Box plots showing the distribution of contact probability changes within the TAD and across the closest TADs boundary for all active promoters in mES cells (n = 9655) whose contact probability outside the TAD is higher than 0.001 (n = 834). Box plot description as in Extended Data Fig. 1k; outliers not shown. d. Contact probabilities of the founder line from the location of the ectopic Sox2 transgene (black line) and normalized averaged mean number of mRNAs per cell (highest value = 1) generated in individual eGFP+ lines by the SCR mobilization are plotted as a function of its genomic position (dashed red line). The average is calculated within equally spaced 20 kb bins as in Fig. 1h (black dots). e. Coefficients of variation (CV) of eGFP levels measured by flow cytometry plotted against contact probabilities between the ectopic Sox2 promoter and the locations of SCR insertions. Data are presented as mean values +/-standard deviation (n = 3measurements in different days). Shaded light blue area indicates the interval between mean +/-standard deviation of eGFP level CVs in three promoter-only cell lines. f. Coefficients of variation (CV) of mRNA number per cell measured by smRNA-FISH plotted against contact probabilities between the ectopic Sox2 promoter and the locations of SCR in the cell the lines shown in Fig. 2c, d. Data are presented as mean values +/-standard deviation (n = 3 technical replicates).    This case illustrates a scenario where, the enhancer-promoter interaction is long enough to allow the completion of the 2 regulatory steps and transiently increases the promoter activity. b. In an alternative scenario, the interactions are shorter but frequent enough to allow the completion of the 2 regulatory steps and transiently increase the promoter activity. c. Parameter values and 95% confidence intervals for the best fitting apparent two-state model. The rates are in the unit of RNA decay rate (δ   Fig. 2e). Shaded areas correspond to promoters with contact probability with the closest TAD boundary below 0.2. d. Left panel: scheme of how the probabilities of interaction between promoter and the region before (P in ) and after the TAD boundary (P out ) are calculated, same criteria as in Extended Data Fig. 2b. Central panel: promoters with higher contact probabilities with TAD boundaries experience stronger drops of contact probability across boundaries. Right panel: promoters closer to TAD boundaries experience a stronger drop of contact probability across boundaries.