A gain-of-function single nucleotide variant creates a new promoter which acts as an orientation-dependent enhancer-blocker

Many single nucleotide variants (SNVs) associated with human traits and genetic diseases are thought to alter the activity of existing regulatory elements. Some SNVs may also create entirely new regulatory elements which change gene expression, but the mechanism by which they do so is largely unknown. Here we show that a single base change in an otherwise unremarkable region of the human α-globin cluster creates an entirely new promoter and an associated unidirectional transcript. This SNV downregulates α-globin expression causing α-thalassaemia. Of note, the new promoter lying between the α-globin genes and their associated super-enhancer disrupts their interaction in an orientation-dependent manner. Together these observations show how both the order and orientation of the fundamental elements of the genome determine patterns of gene expression and support the concept that active genes may act to disrupt enhancer-promoter interactions in mammals as in Drosophila. Finally, these findings should prompt others to fully evaluate SNVs lying outside of known regulatory elements as causing changes in gene expression by creating new regulatory elements.

A large proportion of the single nucleotide variants (SNVs) associated with human traits and predisposition to human disease lie within non-coding regions of the genome 1. Therefore, the causative variants are presumed to affect the fundamental regulatory elements of the genome including enhancers, promoters and boundary elements 2,3. We previously described the emergence of a new transcriptional unit in a non-regulatory region of the α-globin locus which causes downregulation of α-globin expression (α-thalassaemia) in individuals from Melanesia. In the same study we identified a candidate for the causal mutation: a single nucleotide (T to C) change which resulted in a new binding site for the erythroid master regulator GATA-1 and the production of a new RNA transcript 4. This provides an example of a trait-associated SNV that appears to create a new regulatory element rather than disrupting an existing element. Few "gain-of-function" SNVs have been identified [4][5][6][7], and the prevalence and full repertoire of mechanisms by which such SNVs might alter gene expression have not yet been determined.
The candidate SNV (a T to C transition at coordinate hg19 chr16:209,709) which is thought to downregulate α-globin expression is found within an unremarkable non-coding region of the α-globin locus. This multi-gene cluster lies within a relatively small, well characterised topologically associated sub-domain (a sub-TAD of ~80 kb) flanked by CTCF-boundary elements within the telomeric region of chromosome 16 [8][9][10][11]. The sub-TAD includes embryonic (ζ-HBZ), fetal/adult (α-HBA1/2) and theta (θ-HBQ) globin genes which are regulated by four enhancers (R1, R2, R3 and R4) arranged in the order 5′-R1-R2-R3-R4-ζ-α2-α1-θ1-3′. The locus is only active in erythroid cells where active regulatory regions are found in open chromatin associated with acetylated lysine at position 27 of histone H3 (H3K27ac). Enhancers are marked by monomethyl lysine 4 of histone H3 (H3K4me1) and promoters by trimethyl lysine 4 of histone H3 (H3K4me3). Although we have previously shown that the emergence of a new site of transcriptional activity is associated with downregulation of α-globin expression 4, whether the T to C change is causative, and the mechanism by which it disrupts normal regulation, are not yet clear.
Here we demonstrate that a point mutation in a non-regulatory region of the human α-globin locus can establish a promoter element that is capable of affecting the expression of nearby native genes. We show that the activity of the novel element cannot be easily explained by a competition model since it disrupts the native promoter-enhancer interactions in an orientation-dependent manner. Our observations underline the link between the order and orientation of the fundamental elements of the genome and highlight the complexity of gene regulation.

Results
Point mutation creates new promoter that downregulates native genes. To investigate the SNV created by the T-C transition we generated induced pluripotent stem cell (iPSC) lines from cells that carry the C allele (C-SNV) at position 209,709 (hg19) on both copies of chromosome 16 (Supplementary Fig. 1a, b). The iPSC lines were differentiated down the erythroid pathway (Supplementary Fig. 1c) 12. To characterise the new transcriptional unit we analysed chromatin accessibility (ATAC-seq), the epigenetic landscape (ChIP-seq) and transcription profile (RNA-seq) of the α-globin locus. In the mutant cells (C-SNV) there is a new region of open chromatin associated with the site of the candidate (T-C) mutation (Fig. 1A). This newly formed accessible site is bound by RNA polymerase II (RNAP II) and is marked by prominent peaks of H3K27ac and H3K4me3, while the difference in H3K4me1 between the wild type and the mutant C-SNV cells is far less pronounced (Fig. 1B). This chromatin signature is consistent with that observed at most active promoters [13][14][15][16]. Strand-specific RNA-seq demonstrates that the C-SNV promoter produces both non-polyadenylated (pA-) and polyadenylated (pA+) transcripts, but only in the sense direction with respect to the published human genome (Fig. 1C). Thus, transcription is directed away from the enhancers, in the same direction as transcription of the α-globin genes. To determine if the mutant cells have reduced levels of α-globin mRNA, as seen in individuals carrying this mutation, we performed qPCR analysis on three mutant C-SNV iPSC clones and three independently generated normal human iPSC lines (WT) differentiated to erythroblasts. There was a roughly two-fold reduction in α-globin mRNA in the C-SNV erythroid cells compared to the wild type lines (Fig. 1D). These findings show that the iPSC system recapitulates the phenotype observed in erythroid cells from individuals carrying the C-SNV.
To understand how the local DNA sequence might facilitate the establishment of the novel C-SNV promoter we studied the site at the 5' end of the transcribed region. We mapped the transcription start site (TSS) using RLM-RACE and identified the most likely transcription initiation element to be the core promoter element XCPE1 17 (Supplementary Fig. 2). The T-C variant generates a recognition motif for the erythroid transcription factor (TF) GATA1 and this particular GATA1 site is part of a wider half E-box-GATA motif, commonly co-occupied by TAL1 [18][19][20]. Correspondingly, the site of the mutation is bound both by GATA1 and TAL1 in primary erythroblasts homozygous for the mutation 21. In addition, we identified a strong binding motif for another key erythroid TF, KLF1, in the immediate vicinity of the GATA1 motif (Fig. 2A). Using ChIP-seq we found a near two-fold increase in GATA1 occupancy and over four-fold increase in KLF1 binding in the vicinity of the C-SNV (Fig. 2B). This is of interest since both GATA1 and KLF1 are master regulators of erythroid genes [22][23][24][25], have a role in establishing and/or maintaining chromatin conformation [26][27][28][29], and are often found to co-occupy the same sites in erythroid cells 30. We also analysed the predicted chromatin accessibility of the human α-globin gene cluster with and without the candidate SNV using a recently established convolutional neural network (DeepHaem) [https://github.com/rschwess/deepHaem]. DeepHaem has previously been used to predict chromatin accessibility and its effect on higher order chromatin structures using DeepC 31. This network predicts that the C-SNV mutation alone is sufficient to create an open chromatin site whose accessibility increases as cells differentiate along the erythroid pathway (Supplementary Fig. 2b).
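The way a single T to C change can create a GATA1 recognition motif can be illustrated with a simple consensus scan. The following is a minimal sketch and not the motif analysis used in this study: it assumes the commonly used GATA-family consensus WGATAR (W = A/T, R = A/G), and it uses a short window transcribed from the sequence fragment printed in this article, in which Y is taken to mark the variant position. The strand of that fragment relative to hg19, and the function names, are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumed consensus for GATA-family sites (WGATAR); illustrative only,
# not the motif model used by the authors. The lookahead allows overlapping hits.
GATA_CONSENSUS = re.compile(r"(?=([AT]GATA[AG]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gata_sites(seq: str) -> List[Tuple[int, str, str]]:
    """Return (0-based start on the input strand, strand, site) for each WGATAR hit."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in GATA_CONSENSUS.finditer(seq)]
    for m in GATA_CONSENSUS.finditer(reverse_complement(seq)):
        # Map the reverse-strand match back to input-strand coordinates.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Window transcribed from the sequence fragment in the text; Y = T or C (assumed SNV).
WINDOW = "GCTGCCACACCCACATTATYAGAAAATAACAGCACAGG"
t_allele = WINDOW.replace("Y", "T")  # reference allele
c_allele = WINDOW.replace("Y", "C")  # C-SNV allele

print("T allele:", find_gata_sites(t_allele))
print("C allele:", find_gata_sites(c_allele))
```

Under these assumptions the scan finds no WGATAR site in the T-allele window and a single site on the opposite strand in the C-allele window. A consensus match of this kind only indicates a candidate site; occupancy still has to be measured, as was done here by ChIP-seq.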
To determine experimentally whether the T to C transition is the sole causative variant we introduced this point mutation homozygously into wild type cells and reverted the mutation in mutant cells (for genetic screening and quality control see Supplementary Fig. 2). Editing the mutant C allele to a T (C-T lines) abolished the accessible chromatin site surrounding the mutation and transcription at the site of the SNV became undetectable (Fig. 2C-E). Conversely, introducing the mutant C allele in wild type cells (T-C lines) produced a region of accessible chromatin at the site of the SNV and transcription of this region (Fig. 2C-E). We performed qPCR to determine if the T to C transition causes downregulation of α-globin. The C-T lines showed an over two-fold increase in α-globin mRNA when compared to their isogenic control C-SNV line. Conversely, the T-C lines showed over 55% reduction in α-globin transcript (Fig. 2D). These results show that the T to C transition is both necessary and sufficient to cause the observed phenotype and that the accumulation of erythroid TFs around the TSS is causative rather than correlative in creating a new promoter element.
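The relationship between qPCR measurements and the fold changes quoted above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' calculation: it assumes the standard 2^-ΔCt relative quantification against a reference gene (RPS18 in the figure legends) with equal amplification efficiency, which the text does not specify, and all Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to the reference gene: 2^-(Ct_target - Ct_reference).

    Assumes ~100% amplification efficiency for both amplicons.
    """
    return 2.0 ** -(ct_target - ct_reference)


def percent_change(sample: float, control: float) -> float:
    """Signed percentage change of sample relative to control."""
    return 100.0 * (sample - control) / control


# Hypothetical Ct values, chosen only to make the arithmetic easy to follow.
wt = relative_expression(ct_target=18.0, ct_reference=20.0)      # alpha-globin vs RPS18, wild type
mutant = relative_expression(ct_target=19.0, ct_reference=20.0)  # alpha-globin vs RPS18, C-SNV

print(f"WT relative expression: {wt}")
print(f"C-SNV relative expression: {mutant}")
print(f"Change vs WT: {percent_change(mutant, wt)}%")
```

In this toy case a one-cycle increase in the target Ct at constant reference Ct corresponds to a two-fold (50%) reduction, the same scale of effect as reported for the C-SNV lines.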
The new promoter sequesters enhancers away from native promoters. The emergence of a new promoter, induced by key erythroid transcription factors like GATA1, in an erythroid-specific locus suggests that its expression may be regulated by the local enhancers. Indeed, GATA1 is thought to promote chromatin looping through its binding partners; among those that have been shown to play a role in chromatin conformation are the cofactor FOG1 32-34 and the multimeric LDB1-complex. The LDB1-complex consists of TAL1/E-protein heterodimer bound to GATA1 via LMO2 and LDB1 35 and is thought to achieve transcriptional regulation through facilitating interactions between distal regulatory elements and promoters 23,36,37. Genome wide studies have identified frequently occurring composite TAL1 and GATA1 binding sites known as Ebox-GATA1 motifs at erythroid regulatory elements 38 confirming their co-association. In addition, GATA1 has been shown to cooperate with the TF NF-E2 to contribute to the recruitment of BRG1, the ATPase component of the SWI/SNF chromatin remodelling complex, in order to facilitate the formation of accessible chromatin 38 and to maintain higher order chromatin structure 27,29. In particular, BRG1 is required for long range interactions in the mouse α-globin locus 39 where GATA1 binds both the enhancers and the globin promoters early 40. To test if the C-SNV element comes into close proximity with the α-globin enhancers, we assessed the interaction between key sites in the α-globin locus using Capture-C 41. Analysis of the interaction profiles of C-SNV erythroid cells and wild type cells, from the viewpoint of the new promoter, shows an increase in mean interaction frequency between the α-globin enhancers and the new promoter (Fig. 3A).
To determine if this interaction occurs at the expense of contacts between the α-globin enhancers and their cognate promoters we performed Capture-C from the viewpoint of the promoters of the α-globin genes (HBA1 and HBA2). This showed a marked decrease in mean interaction frequency between the α-globin promoters and enhancers in the presence of the new C-SNV promoter (Fig. 3B). To confirm that this observation holds true from the point of view of the α-globin enhancers we also performed Capture-C from the R2 enhancer (Fig. 3C). This demonstrates that in erythroid cells homozygous for the C allele, the α-globin enhancers display a higher mean interaction frequency over the region of the active C-SNV promoter and a reduction of interactions with the α-globin promoters, and that this is likely to be an important component underlying the reduction in α-globin expression.
Reduced contacts between the enhancers and the α-globin promoters in the presence of the newly formed promoter could be explained by a model of mutually exclusive promoter competition, which has previously been proposed to explain several forms of complex gene regulation 42,43. Alternatively, this could be explained by the formation of an insulator element by the new promoter, reminiscent of that observed at a subset of Drosophila promoters which act as enhancer blockers, a type of insulator element that restricts enhancer-promoter interactions 44. Structural boundary elements that regulate chromatin architecture are commonly associated with the presence of the CCCTC-binding factor (CTCF) 45,46. A CTCF ChIP-seq analysis showed that no new CTCF peaks appear in the C-SNV lines, demonstrating that the activity of the C-SNV promoter alone is the cause of the shift in chromatin interactions (Fig. 3D).
The competition model cannot explain the full effect of the new promoter. We next wanted to understand whether the effect of the C-SNV promoter was due to mutually exclusive promoter competition which should result in the promoter exerting its effect on chromatin interactions and gene expression irrespective of its position in the sub-TAD. Alternatively, if the C-SNV promoter is acting as an enhancer blocker its location (and perhaps orientation) relative to the native promoters and enhancers in the locus would determine its effect on gene regulation. To test the competition model, we placed the sequence of the active C-SNV promoter upstream of the α-globin enhancers in a wild type line, in the antisense orientation, directed away from the enhancers (Fig. 4A) (for genome editing design and genetic screen see Supplementary Fig. 3). Importantly, this insertion still lies within the α-globin sub-TAD and therefore should be equally accessible to the enhancers. In addition, the ectopically placed C-SNV promoter lies closer to the major R2 enhancer (14 kb) than the α-globin genes (59 kb) or the original position of the C-SNV (46 kb). ATAC-seq showed that the transposed C-SNV promoter sequence opens chromatin in this new position (Fig. 4B). The associated chromatin showed a large increase in H3K27ac and H3K4me3, and a less pronounced increase in H3K4me1 (Fig. 4C) together with unidirectional transcription originating at the TSS of the transposed C-SNV promoter (Fig. 4D). This transcript extends away from the enhancers on what is now the antisense strand of DNA. Thus, when placed within the sub-TAD upstream of the enhancers, the C-SNV sequence can still act as a bona fide promoter. Interestingly, when the promoter was placed in the sense orientation, it again created a region of open chromatin marked by H3K27ac and H3K4me1 but not by H3K4me3. In this case there were no detectable RNA transcripts ( Supplementary  Fig. 4). 
Together, these observations suggest that the orientation of this promoter relative to other elements may determine whether or not it is recognised as a promoter.
To test if the transposed transcriptionally active C-SNV promoter affected α-globin expression, we performed qPCR analysis. There was a reduction (25.6%, p = 0.0322) in mean α-globin transcript levels seen between cells where the transcriptionally active C-SNV promoter is placed upstream of the α-globin enhancers and wild type cells. However, we observed a larger reduction (55.7%, p = 0.0005) in cells of the same genomic background in which the C-SNV promoter is in its original position between the enhancers and α-globin promoters (T-C) (Fig. 4E). The effect of the transposed promoter is therefore not sufficient to explain the severity of the phenotype that the C-SNV promoter produces in its native location. To determine if the transposed C-SNV promoter exerts any effect on chromatin interactions in the α-globin locus when placed upstream of the enhancers we performed Capture-C from enhancer R2 and the promoters of HBA1 and HBA2. This showed no major change in chromatin interactions between the α-globin promoters and their enhancers in line with the minor change in α-globin expression (Fig. 4F). These observations indicate that placing the element within the sub-TAD but outside of the region between the enhancers and promoters does not significantly interfere with enhancer-promoter interactions as judged by 3C experiments. This is in line with our recent studies showing that the extension of the α-globin domain to encompass additional genes does not result in promoter competition but results in the formation of a larger chromatin hub where the extra elements together with the native elements take part in multiway interactions without causing major changes in α-globin expression 47. Together these findings suggest that although there may be some competition between the newly formed promoter and the enhancers, the C-SNV promoter may also act by blocking enhancer-promoter interactions when placed between the α-globin enhancers and their promoters. One caveat of this interpretation lies in the uncertainty of whether promoter strength might determine its ability to successfully compete with other promoters. Indeed, the signal levels of the promoter mark H3K4me3 are much lower in the promoter insertion lines; for this reason we returned to examining the promoter in its native position.

Fig. 1 New transcriptional unit bears the marks of a unidirectional promoter and causes α-globin downregulation. A Chromatin accessibility in the α-globin locus as measured by ATAC-seq. The enhancer elements (R1-R4) are highlighted in orange, the site of the T to C mutation is highlighted in green (labelled SNV for single nucleotide variant), gene annotation by Refseq is in blue. Read-densities represent an average of 3 independent differentiation experiments, 3 independent wild type iPSC lines (labelled WT) or 3 iPSC clones obtained from the same patient material homozygous for the C allele of the SNV located at coordinate (hg19) chr16:209,709 (labelled C-SNV), differentiated to erythroblasts. Coordinates (hg19) chr16:108,000-238,000. B ChIP-seq, highlighted regions are as in A. Read-densities represent an average of 3 independent experiments, 3 replicates for wild type iPSC line AH017-13 (in blue) or 3 C-SNV iPSC clones obtained from the same patient material (in red) differentiated to erythroblasts. The tracks are overlaid on top of each other, black indicates shared signal while red and blue indicate signal unique for the mutant and wild type lines, respectively. Coordinates (hg19) chr16:108,000-238,000. The level of the signal in the middle of the C-SNV transcriptional unit is most likely affected by the presence of a variable number tandem repeat (inter-ζ VNTR), a 1 kb sequence in the reference genome which in reality can be much larger (over 2 kb). The artificial reduction of the reference genome means that the same signal is collapsed to a smaller length resulting in at least 2-fold signal increase over the middle of the region of enrichment at the C-SNV element. Since the exact sequence or size of the repeat is not known in the genomes analysed, this has not been corrected for. C Strand-specific RNA-seq of polyadenylated selected (pA+) and non-polyadenylated (pA−) RNA, read density (in RPKM) represents an average of 3 independent experiments, 3 replicates for wild type line AH017-13 or 3 clones of C-SNV iPSCs differentiated to erythroid cells. The region of the T to C mutation is highlighted in green (SNV), gene annotation by Refseq is in blue, pseudogenes are in pink. Coordinates (hg19) chr16:209,000-217,000. D qPCR quantification of HBA1/HBA2 in reference to RPS18 in mRNA obtained from 3 independent wild type iPSC lines (WT) or 3 iPSC clones obtained from the same patient material (C-SNV) differentiated to erythroblasts. All lines were differentiated twice (one replicate was removed as an outlier due to low levels): WT (n = 6) in blue, C-SNV (n = 5) in red. Violin plots display median (dotted black line), quartile lines (coloured dotted line) and individual data points (black dots). P-values are obtained using unpaired, two-tailed Student's t-test.

[Sequence panel (figure residue); Y marks the T/C variant position: TGCTGAGCTGCCACACCCACATTATYAGAAAATAACAGCACAGGCTTGGGGTGGAGGCGGGACACAAGACTAGCCAGAAGGAGAAAGAAAGGTGAAAAGCTGTTGGTGCAAGGAAGCTCTTGGTATTTCC]
The new promoter acts as an orientation-dependent enhancer-blocker. Elements which limit the interactions between promoter and enhancer in mammalian cells, such as boundaries, have been shown to commonly act in an orientation-dependent manner 48. It is possible that this novel element may act in a similar manner, although we could find no evidence for orientation-specificity in promoters that act as enhancer-blockers in Drosophila or yeast. To test if the orientation of the C-SNV element contributes to its ability to block enhancer-promoter interactions, we inverted the active C-SNV promoter sequence in its natural position (Fig. 5A) (for genome editing design and genetic screen see Supplementary Fig. 5). To ask whether the element still acted as a unidirectional promoter after the inversion, we showed that the inverted element is still capable of opening chromatin, is marked by H3K4me3 and is capable of recruiting RNAP II (Fig. 5B, C). Importantly, the ATAC-seq and H3K4me3 signal levels over the C-SNV promoter were similar to those in the C-SNV lines, suggesting that promoter strength is maintained even when the element is inverted. We also showed that the unidirectional nature of the transcriptional activity is preserved, with transcripts of similar expression levels now extending towards the enhancers (Fig. 5D). To determine whether the C-SNV promoter still blocked enhancer-promoter interactions when inverted, we performed Capture-C from the α-globin promoters and enhancer R2, and qPCR analysis for α-globin expression. This showed that after inversion of the C-SNV promoter the α-globin promoters interact more frequently with their enhancers while the contacts between the α-globin enhancers and the inverted C-SNV promoter decrease (Fig. 5F). Consistent with this, the levels of α-globin mRNA return to wild type levels following the inversion of the C-SNV promoter (Fig. 5E).
Since the C-SNV sequence appears to act as an equally strong promoter in either orientation, it should "compete" equally for the enhancer in either orientation. By contrast, we have shown that the inverted C-SNV promoter no longer causes local gene mis-regulation. These observations support the hypothesis that the new (C-SNV) promoter acts as an enhancer-blocker and that this effect is constrained by its orientation in the active locus.

Discussion
In this work, we have found that a gain-of-function SNV causes the emergence of an active promoter in a tissue-specific gene locus, disrupting normal chromatin interactions and gene regulation. By changing the position and orientation of the novel promoter, we show that rather than simply competing for the activity of the enhancer within a defined sub-TAD, the new promoter acts in an orientation-dependent manner. It predominantly disrupts chromatin interactions only when placed between the enhancers and the promoters of the α-globin genes, and only in one orientation (see Fig. 6). This raises two possibilities to explain the effects of the C-SNV promoter at its natural location between the enhancers and the α-globin promoters. One is that the resulting reduction in α-globin expression could reflect a role for the C-SNV element as an orientation-dependent enhancer-blocker. Alternatively, it could be that the enhancer-promoter interaction is weaker when the C-SNV promoter is inverted with respect to the enhancer, with the promoter inversion altering the enhancer's ability to recognise its target. This, in turn, might reduce the C-SNV promoter's ability to compete for the enhancer activity. However, there is no current evidence, to our knowledge, to suggest that enhancers differ in their interactions with respect to the orientation of their cognate promoters. Were this the case, it would contest the commonly held hypothesis that enhancers and promoters interact equally regardless of their orientation 49. Of interest, we have also recently shown that the α-globin promoters themselves may act to delimit enhancer contacts within the context of the sub-TAD 50. Together these observations support the concept that promoters may act to block enhancer-promoter interactions in mammals as in Drosophila. However, the mechanism(s) by which a promoter may act as an orientation-dependent enhancer-blocker are not yet known.
We can only speculate that the motif grammar, the conformation of multiprotein complexes at the promoter and/or the direction and the act of transcription may all play a role in this. Further studies varying these parameters will be required to determine such mechanisms. The observation that a tissue-specific promoter can act to block an enhancer-promoter interaction in an orientation-dependent manner raises the question of whether other promoters, most notably those that are situated at the edges of self-interacting domains, might exhibit the same type of effect and whether their orientation might serve as an additional mechanism that cells use to direct specificity of interactions and shape local chromatin conformation. Finally, variants like C-SNV which generate de novo regulatory elements can only be identified by observation in the correct genotype or, as shown here, through predictive machine learning approaches. These findings should prompt others to re-evaluate SNVs lying outside of known regulatory elements when studying human traits associated with natural variation.

Fig. 5 Changing the orientation of the promoter alleviates repression and increases α-globin enhancer-promoter contacts. A The native promoter was inverted using heterotypical loxP sites located 4011 bp upstream and 2975 bp downstream of the XCPE1 TSS (fragment chr16:205,734-212,720, hg19). B ATAC-seq: the enhancer elements (R1 to R4) are highlighted in orange, the site of the T to C mutation is highlighted in green (SNV), the inverted region is highlighted in light blue, gene annotation by Refseq is in blue. Read-densities represent an average of: 3 C-SNV iPSC clones (labelled C-SNV), 3 clones of edited C-SNV cell line LA01 where the heterotypical loxP sites are inserted but the C-SNV promoter segment is not inverted (labelled nonINV), 3 clones of edited patient line LA01 where the C-SNV promoter segment is inverted (labelled INV). Reads for C-SNV and nonINV were mapped to the normal genome, reads for INV were mapped to a custom genome where a 7 kb segment containing the C-SNV promoter is inverted. Coordinates (hg19) chr16:108,000-238,000. C ChIP-seq, highlighted regions are as in a). Read-densities represent an average of 2 or 3 independent differentiation experiments, 3 C-SNV iPSC clones (C-SNV in red), 2 clones of INV cells (INV in purple). Reads for C-SNV were mapped to the normal genome, reads for INV were mapped to a custom genome. Coordinates (hg19) chr16:108,000-238,000. D Strand-specific RNA-seq of polyadenylated selected (pA+) and non-polyadenylated (pA-) RNA, read density (in RPKM) represents an average of 2 or 3 independent differentiation experiments: 3 C-SNV iPSC clones (C-SNV) or 2 clones of INV cells (INV). The inverted segment is highlighted in blue, the location of the LoxP sites is marked by orange triangles, the region of the T to C mutation is highlighted in green (SNV), gene annotation by Refseq is in blue with pseudogenes in pink. Reads for C-SNV were mapped to the normal genome, reads for INV were mapped to a custom genome. Coordinates (hg19) chr16:202,000-217,000. E qPCR quantification of HBA1/HBA2 in reference to RPS18 in mRNA obtained from independent differentiation experiments: 3 independent wild type iPSC lines differentiated twice (WT), 3 C-SNV iPSC clones

Methods
Reprogramming to pluripotency. Reprogramming of EBV immortalised B-cells from an individual homozygous for the C allele (C-SNV) was performed using the CytoTune-iPS 2.0 Sendai Reprogramming Kit (Life Technologies) according to the manufacturer's protocol using the feeder-based reprogramming method on Mitomycin C inactivated MEF plated dishes. When iPSC colonies were ready for transfer (day 18-20) they were transferred onto a Matrigel coated plate in mTeSR1 medium. Subsequently, the lines were maintained for 2 months until they reached passage 17-20 and then quality control was performed, and the lines were stored in cryogenic storage.
Human iPSC lines at passage 17-20 were assessed for pluripotency marker expression using the PSC 4-Marker Immunocytochemistry Kit (Invitrogen) as per the manufacturer's protocol. Copy number variation analysis was performed on high quality genomic DNA using the Infinium HD assay on Human CytoSNP 12 Beadchip v2.1 (Illumina) at the Wellcome Trust Centre for Human Genetics (Oxford). Data were analysed using the KaryoStudio software package (Illumina).
Genome editing of iPSCs. DNA was introduced into human iPSCs by a lipofection based method using Fugene 6 (Promega). Up to 8 μg of DNA was used per 10⁶ cells, with equal amounts of single-stranded repair template DNA and pSpCas9 (BB)-2A-Puro (PX459) V2.0 plasmid DNA. For point mutation and lox site insertion single-stranded oligonucleotides were ordered from IDT. For promoter insertion single-stranded DNA was generated in house using λ exonuclease and a 5' phosphorylated primer (for sequences see Supplementary Table 1). Exogenous DNA containing cells were selected using puromycin for 24 h; cells were then maintained as normal for one passage. The puromycin resistant cells were brought to a single cell suspension using Accutase (Millipore) and seeded at low density (~800 cells in suspension per 10 cm²). After 7-9 days iPSC colonies were picked into a 96-well plate coated with Matrigel in mTeSR1 medium with 10 μM ROCK inhibitor. The 96-well plate was split to create 2 replica plates; one of the replica plates was used for genotyping and the other was used to expand the clones with correct edits.
ATAC-Seq. ATAC-seq was performed as previously published 52. Briefly, 7 × 10⁴ cells per replicate were lysed and nuclei were isolated prior to tagmentation with Tn5 transposase (Illumina) for 30 min at 37°C. Tagmented DNA was purified using the MinElute kit (Qiagen) and amplified using the NEBNext 2x Mastermix (NEB) and custom barcoded primers. The libraries were sequenced on the NextSeq platform using High-output 75 cycle kits, paired end sequencing. Data were analysed using an in-house pipeline 53. Briefly, reads were mapped onto human genome build hg19 or custom genome builds based on human genome build hg19 using bowtie 2 (version 2.3.2 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). PCR duplicates and ploidy regions were removed, and biological replicates were normalised to reads per 10⁸. Mitochondrial DNA was excluded from the normalisation.
Predicted open chromatin scores for reference and C-SNV sequences were generated using deepHaem 31 using both variant alleles with 1 kb of flanking reference sequence.
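Preparing the two input sequences for such a prediction can be sketched as follows. This is a minimal illustration, not the deepHaem input pipeline: it assumes that "1 kb of flanking reference sequence" means 1 kb on each side of the variant, that positions are 0-based, and the function name `allele_windows` is hypothetical. The toy sequence stands in for the hg19 reference.

```python
from typing import Tuple


def allele_windows(reference: str, pos: int, ref: str, alt: str,
                   flank: int = 1000) -> Tuple[str, str]:
    """Return (reference-allele window, variant-allele window) around a SNV.

    pos is a 0-based index into `reference`; `flank` bases are taken on each
    side (an assumption), truncated at the sequence ends.
    """
    if reference[pos].upper() != ref.upper():
        raise ValueError(f"reference base at {pos} is {reference[pos]!r}, expected {ref!r}")
    start = max(0, pos - flank)
    end = min(len(reference), pos + flank + 1)
    left, right = reference[start:pos], reference[pos + 1:end]
    return left + ref + right, left + alt + right


# Toy example with a 3 bp flank instead of 1 kb.
toy_reference = "AAAAATGGGGG"
ref_window, alt_window = allele_windows(toy_reference, pos=5, ref="T", alt="C", flank=3)
print(ref_window, alt_window)
```

Checking the reference base before substitution guards against off-by-one errors when converting between 1-based genome coordinates and 0-based string indices.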
Next generation capture C. Next generation capture C was performed as previously published 41. Briefly, 1 × 10⁷ cells per replicate were fixed in 2% formaldehyde. 3C libraries were prepared following digestion with NlaIII enzyme in CutSmart buffer (NEB). Libraries for capture were generated using the NEBNext DNA library Prep Reagent Set (NEB) following the manufacturer's protocol up to the addition of adapters, then the libraries were indexed using the Herculase II Fusion Polymerase kit (Agilent) and the NEBNext Multiplex Oligos for Illumina Primers (NEB) following the manufacturer's protocol. Capture was performed on pooled indexed libraries using biotinylated DNA probes (Supplementary Table 1) and the NimbleGen SeqCap EZ Reagent kit (Roche) following the manufacturer's protocol. The libraries were sequenced on the NextSeq platform using Mid-output 300 cycle kits, paired end sequencing. NG Capture-C data were analysed using the CaptureCompendium toolkit 53 with human reference genome build hg18. Reporter counts were normalised to 10⁵ for the calculation of the mean and standard deviation for each replicate (n = 3). Mean reporter counts were divided into 250 bp bins and smoothed using a 5 kb window.
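The normalisation, binning and smoothing steps described above can be sketched in a few lines. This is an illustration under stated assumptions and not the CaptureCompendium implementation: it assumes that reporter counts are keyed by a single genomic position per restriction fragment, that normalisation scales each profile to a total of 10⁵ reporters, and that smoothing is a centred running mean. All function names are hypothetical.

```python
from typing import Dict, List


def normalise_counts(counts: Dict[int, float], total: float = 1e5) -> Dict[int, float]:
    """Scale reporter counts so that the profile sums to `total` (assumed 10^5)."""
    observed = sum(counts.values())
    if observed == 0:
        raise ValueError("no reporter counts to normalise")
    return {pos: c * total / observed for pos, c in counts.items()}


def bin_counts(counts: Dict[int, float], start: int, end: int,
               bin_size: int = 250) -> List[float]:
    """Sum counts into consecutive bins of `bin_size` bp covering [start, end)."""
    n_bins = (end - start + bin_size - 1) // bin_size
    bins = [0.0] * n_bins
    for pos, c in counts.items():
        if start <= pos < end:
            bins[(pos - start) // bin_size] += c
    return bins


def smooth(bins: List[float], bin_size: int = 250, window: int = 5000) -> List[float]:
    """Centred running mean spanning roughly `window` bp, truncated at the edges."""
    half = (window // bin_size) // 2
    out = []
    for i in range(len(bins)):
        lo, hi = max(0, i - half), min(len(bins), i + half + 1)
        out.append(sum(bins[lo:hi]) / (hi - lo))
    return out


# Toy profile: two fragments, a 1 kb region and a 750 bp smoothing window.
toy = normalise_counts({100: 1.0, 300: 3.0})
toy_bins = bin_counts(toy, start=0, end=1000, bin_size=250)
toy_smooth = smooth(toy_bins, bin_size=250, window=750)
print(toy_bins, toy_smooth)
```

With the defaults, the running mean covers 21 bins (the centre bin plus 10 on each side), which is the closest symmetric window to 5 kb at 250 bp resolution; other window conventions would give slightly different profiles.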
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data that support this study are available from the corresponding authors upon reasonable request. ATAC-seq, ChIP-seq, RNA-seq and NG Capture-C raw data and bigwig files generated in this study are available under Gene Expression Omnibus (GEO) accession GSE159875. Analyses and coordinates referenced here are for either the hg19 or hg18 human reference genomes, or the custom genomes hg19_INV (inverted C-SNV promoter), hg19_Vas (promoter insertion in anti-sense behind enhancers) or hg19_Vs (promoter insertion in sense behind enhancers) based on reference human genome hg19, as indicated in figure legends. Sequences for chromosome 16 from the custom genomes are available in FASTA format as supplementary files in GEO Subseries GSE159871.