Spontaneous reactivation of latent HIV-1 promoters is linked to the cell cycle as revealed by a genetic-insulators-containing dual-fluorescence HIV-1-based vector

Long-lived latently HIV-1-infected cells represent a barrier to cure. We developed a dual-fluorescence HIV-1-based vector containing a pair of genetic insulators flanking a constitutive fluorescent reporter gene to study HIV-1 latency. The protective effects of these genetic insulators are demonstrated through long-term (up to 394 days) stable fluorescence profiles in transduced SUP-T1 cells. Analysis of 1,941 vector integration sites confirmed reproduction of HIV-1 integration patterns. We sorted monoclonal cells representing latent HIV-1 infections and found that both vector integration sites and integrity of the vector genomes influence the reactivation potentials of latent HIV-1 promoters. Interestingly, some latent monoclonal cells exhibited a small cell subpopulation with a spontaneously reactivated HIV-1 promoter. Higher expression levels of genes involved in cell cycle progression are observed in these cell subpopulations compared to their counterparts with HIV-1 promoters that remained latent. Consistently, larger fractions of spontaneously reactivated cells are in the S and G2 phases of the cell cycle. Furthermore, genistein and nocodazole treatments of these cell clones, which halted cells in the G2 phase, resulted in a 1.4–2.9-fold increase in spontaneous reactivation. Taken together, our HIV-1 latency model reveals that the spontaneous reactivation of latent HIV-1 promoters is linked to the cell cycle.

fluorescent reporter gene whose expression is driven by a constitutive promoter to identify transduced cells irrespective of their HIV-1 promoter activity.
These models have proved to be valuable in understanding molecular features of HIV-1 latency. For instance, it has been shown that vector integration sites have an influence on the HIV-1 promoter activity 7,13,14 , and Dahabieh et al. showed that the HIV-1 promoter activity correlates with the degree of activation of cells during transduction 11 . Sherrill-Mix et al. recently showed that the association between HIV-1 latency and chromosomal position is model dependent 15 . This is likely a consequence of the heterogeneous nature of the latent HIV-1 reservoir 16 , thus arguing for novel models to complement existing ones in order to unravel key determinants of HIV-1 latency-associated events.
We have developed a novel HIV-1-based vector (LTatC[M]) consisting of two fluorescent reporter gene cassettes: Cerulean and mCherry (Fig. 1a). The expression of Cerulean is driven by the HIV-1 promoter in the viral 5′ long terminal repeat (5′ LTR) and supported by the HIV-1 transactivator of transcription (Tat) via a positive feedback loop, whereas the expression of mCherry is driven by the constitutive human eukaryotic translation initiation factor 4A1 (heIF4A1) promoter (Fig. 1a). As such, Cerulean reports the activity of the HIV-1 promoter (active or latent) and mCherry labels transduced cells. This enables the identification of cells harbouring an active HIV-1 promoter and simultaneously distinguishes cells harbouring a latent HIV-1 promoter from untransduced cells. Additionally, the mCherry cassette is flanked by a pair of genetic insulators, the chicken hypersensitive site 4 core (cHS4) and synthetic matrix attachment region (sMAR) (Fig. 1a), to prevent (i) transcriptional interference between Cerulean and mCherry [17][18][19] that might lead to artefactual HIV-1 latency and (ii) silencing of mCherry due to position-effect variegation [20][21][22][23][24][25] , thus ensuring its long-term constitutive expression.
In the present study, we first characterized our vector LTatC[M] and verified its capacity to reproduce features of active and latent HIV-1 infections. We then transduced SUP-T1 cells with LTatC[M] pseudotyped with vesicular stomatitis virus glycoprotein G (VSV-G) and subsequently sorted bulk cell populations, as well as monoclonal cells, representing active and latent HIV-1 infections for further analysis. With our model, we observed that some monoclonal cells consistently exhibited a spontaneously reactivated HIV-1 promoter and found that this phenomenon is linked to the cell cycle.

Transduction of SUP-T1 cells with LTatC[M] yields active and latent HIV-1 phenotypes.
Transduction of SUP-T1 cells with VSV-G-pseudotyped LTatC[M] (Fig. 1a) yielded four cell populations: double positive (DP), single Cerulean positive (C+), single mCherry positive (M+), and double negative (DN) (Fig. 1b). The HIV-1 promoter in DP and C+ cells was active since Cerulean was expressed, and thus they represented active HIV-1 infections. Correspondingly, latent HIV-1 infections were represented by M+ cells due to their lack of Cerulean expression (Fig. 1b). Henceforth, DP and C+ will be called active and M+ will be called latent to denote the state of the HIV-1 promoter harboured in these cells and not of the host cells.
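The four populations are defined by two independent fluorescence gates, one per reporter. The Python sketch below illustrates that quadrant logic under stated assumptions: the gate values `CERULEAN_GATE` and `MCHERRY_GATE` are hypothetical placeholders in arbitrary fluorescence units (real gates would be drawn against untransduced SUP-T1 controls), and `Event`, `classify`, `promoter_state`, and `population_percentages` are illustrative names, not part of any analysis software used in this study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical gate thresholds (arbitrary fluorescence units). Real gates are
# drawn against untransduced control cells, not fixed constants like these.
CERULEAN_GATE = 1_000.0
MCHERRY_GATE = 1_000.0
LABELS = ("DP", "C+", "M+", "DN")


@dataclass(frozen=True)
class Event:
    cerulean: float  # reports HIV-1 5' LTR promoter activity
    mcherry: float   # reports constitutive heIF4A1-driven expression


def classify(event: Event) -> str:
    """Assign one cytometry event to DP, C+, M+ or DN with two quadrant gates."""
    cerulean_pos = event.cerulean > CERULEAN_GATE
    mcherry_pos = event.mcherry > MCHERRY_GATE
    if cerulean_pos and mcherry_pos:
        return "DP"
    if cerulean_pos:
        return "C+"
    if mcherry_pos:
        return "M+"
    return "DN"


def promoter_state(label: str) -> str:
    """State of the HIV-1 promoter implied by a population label."""
    return {"DP": "active", "C+": "active", "M+": "latent"}.get(label, "undetermined")


def population_percentages(events: Iterable[Event]) -> Dict[str, float]:
    """Percentage of events falling into each of the four populations."""
    counts = Counter(classify(e) for e in events)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}
```

DN events are left as "undetermined" because the two reporters alone cannot distinguish untransduced cells from cells in which both cassettes are silent.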
As we were interested in cell populations with stable phenotypes representing active and latent HIV-1 infections, we examined the kinetics of the emergence of the four cell populations (DP, C+, M+, and DN) for 10 days post transduction (Fig. 1c; Supplementary Fig. S1). The majority of transduced cells appeared latent up to 3 days post transduction; this phenomenon was transient, however, and a stable fluorescence profile for all transduced cells was established only at 5 days post transduction. The initial fluctuations in fluorescence could result from differential temporal regulation of gene expression post integration at different vector integration sites and from spontaneous reactivation of latent HIV-1 promoters in M+ cells. Similar kinetics were observed at transduction efficiencies of 5% and 15% (Fig. 1c; Supplementary Fig. S1). Therefore, we sorted all four cell populations by flow cytometry at 10 days post transduction with a purity of >99% for subsequent analyses (Fig. 1b).
Next, DN, C+, and M+ cell populations sorted from LTatC[M]-transduced SUP-T1 cells were treated with TNF-α and SAHA for 24 hours and the induction of DP cells from these cell populations was measured with flow cytometry. Approximately 7% of M+ cells expressed Cerulean and mCherry after induction with TNF-α and SAHA (Fig. 1d), showing that latent M+ cells could be activated to become double positive. The expression of either Cerulean or mCherry in DN cells and mCherry in C+ cells could not be induced with TNF-α and SAHA (Fig. 1d).

The genetic insulators, cHS4 and sMAR, confer long-term stability of fluorescence profiles in LTatC[M]-transduced SUP-T1 cells.
The short-term kinetics and ratio of active (DP and C+) to latent (M+) cells were comparable between LTatC[M] and a vector variant with no genetic insulators flanking the mCherry cassette (LTatCM) (Supplementary Fig. S1), demonstrating that the genetic insulators did not influence the emergence of latent cells. As the present study focused on the long-term maintenance and persistence of HIV-1 latency, we examined whether the genetic insulators were capable of protecting the mCherry cassette from being silenced over time by monitoring the fluorescence profiles of the sorted DP, M+, and DN cell populations transduced with LTatC[M] or LTatCM up to 191 days post transduction. Over 90% of cell populations transduced with LTatC[M] retained their initial fluorescence profiles for the entire period analysed (Fig. 2). In contrast, mCherry expression in DP cells transduced with LTatCM decreased gradually down to 48% at 191 days post transduction (Fig. 2a). The decrease in mCherry expression in LTatCM-transduced M+ cells was less drastic, although it remained consistently lower by ~5% compared to the LTatC[M]-transduced M+ cells (Fig. 2a). Our results were reproducible in a second transduction of SUP-T1 cells with LTatC[M]. Consistently, over 90% of all sorted cell populations in the second transduction retained their initial fluorescence profiles over 394 days post transduction (Fig. 2b). Therefore, the pair of genetic insulators, cHS4 and sMAR, flanking the mCherry cassette confers long-term protection against silencing of mCherry expression. 26. M TS C+ and M TS C− cells were sorted from latent M+ cells in which Cerulean expression was inducible and non-inducible with TNF-α and SAHA, respectively (Fig. 1b). A total of 676 unique LTatC[M] integration sites were analysed. Across all cell populations, the majority of vector integration sites were found in transcription units, slightly more so in the inducible M TS C+ cells (Fig. 3a), and in a convergent transcriptional orientation relative to vector-hosting genes, except for M TS C+ cells (Fig. 3c). The HIV-1 signature weakly conserved palindromic sequence was also observed at vector integration sites for all cell populations (Fig. 3b). These features were reproducible in a second transduction experiment in which a total of 1,265 unique vector integration sites were analysed (Supplementary Fig. S2). (Fig. 1b). Vector integration sites were sequenced for all cell clones, and cell clones with identical vector integration sites were treated as biological replicates, resulting in 7 M TS C+, 13 M TS C−, and 6 DP independent cell clones.
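The two integration-site features used above, location within a transcription unit and transcriptional orientation relative to the vector-hosting gene, reduce to an interval lookup and a strand comparison. A minimal Python sketch, assuming toy gene annotations with hypothetical names (`GENE_A`, `GENE_B`) and taking "convergent" to mean that the provirus and the host gene lie on opposite strands. It is not the InStAP pipeline, and for simplicity it uses the first overlapping gene when annotations overlap.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Site:
    chrom: str
    pos: int
    strand: str  # strand of the integrated provirus, "+" or "-"


def hosting_gene(site: Site, genes: Sequence[Gene]) -> Optional[Gene]:
    """First annotated transcription unit overlapping the site (simplification)."""
    for gene in genes:
        if gene.chrom == site.chrom and gene.start <= site.pos <= gene.end:
            return gene
    return None


def orientation(site: Site, genes: Sequence[Gene]) -> str:
    """'same', 'convergent' (opposite strands), or 'outside' a transcription unit."""
    gene = hosting_gene(site, genes)
    if gene is None:
        return "outside"
    return "same" if site.strand == gene.strand else "convergent"


def summarize(sites: Sequence[Site], genes: Sequence[Gene]) -> Dict[str, float]:
    """Fraction of sites in transcription units, and fraction convergent among those."""
    labels = [orientation(s, genes) for s in sites]
    in_tu = [lab for lab in labels if lab != "outside"]
    return {
        "fraction_in_transcription_units": len(in_tu) / len(labels) if labels else 0.0,
        "fraction_convergent_within_tu": in_tu.count("convergent") / len(in_tu) if in_tu else 0.0,
    }
```

Comparing these two fractions between sorted populations (for example M TS C+ versus DP) gives the kind of contrast summarized in Fig. 3a,c.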
We subsequently evaluated the reactivation potentials of latent HIV-1 promoters after expansion of M TS C+ and M TS C− cell clones in culture without TNF-α and SAHA, during which M TS C+ cell clones reverted to their initial phenotype of single mCherry positive. Surprisingly, highly variable reactivation potentials (26–89%) were observed among the seven M TS C+ cell clones with distinct vector integration sites (Fig. 4a; Table 1). To examine the effects of vector-hosting genes' activity upon HIV-1 infection on the reactivation potentials of latent HIV-1 While this could explain why M TS C 1 +16 had a significantly lower reactivation potential than M TS C 1 +8 and M TS C 1 +15, it did not explain the much lower reactivation potentials of M TS C 1 +12 and M TS C 2 +13, since the vector-hosting genes in these cell clones were not significantly downregulated post HIV-1 integration (Fig. 4a,b). Therefore, we examined the Cerulean cassettes of all cell clones to determine whether mutations contributed to low reactivation potentials of latent HIV-1 promoters. None (0/6) of the DP cell clones analysed had any mutations in their Cerulean cassettes, whereas mutations were found in 5/7 M TS C+ cell clones (Fig. 5). Notably, M TS C 1 +12 had a mutation in the HIV-1 transactivation response (TAR) element, which was predicted to disrupt the 3-nucleotide bulge essential for HIV-1 Tat binding and subsequent transcription elongation from the HIV-1 promoter 28,29 (Fig. 4c), and M TS C 2 +13 had numerous mutations throughout its HIV-1 5′ LTR (Supplementary Table S1). Mutations in these cell clones could account for their low reactivation potentials. The mutations in the HIV-1 Tat region found in M TS C 1 +8 and M TS C 1 +16 (Fig. 5; Supplementary Table S1) have been reported to have wild-type transactivation activities 30,31 .
Interestingly, no mutation was found in M TS C 1 +3 and M TS C 2 +15 while the reactivation potentials of latent HIV-1 promoters in these clones differed by ∼60% (Fig. 4a), further showing the influence of vector integration sites on the reactivation potentials of latent HIV-1 promoters. Taken together, our data provide evidence that the reactivation potentials of latent HIV-1 promoters are influenced by both vector integration sites and integrity of the Cerulean cassettes.
We then examined the Cerulean cassettes of the 18 M TS C− cell clones, in which Cerulean expression was not inducible with TNF-α and SAHA. The Cerulean cassettes of 17/18 cell clones contained large internal deletions in the HIV-1 Tat and/or Cerulean region (Fig. 5). Although cell clone MT 2 -5 did not have internal deletions, multiple mutations were found throughout its Cerulean cassette (Fig. 5), e.g. one in the stem of HIV-1 TAR and another resulting in a C37Y amino acid substitution in HIV-1 Tat, which likely abolished transactivation by Tat 28,29,31 . Thus, defective vector genomes are the main cause of non-inducible HIV-1 promoters in M TS C− cell clones. This phenomenon has also been observed in cells from HIV-1-infected individuals 4 .
Higher expression of a distinct set of cell cycle regulators in TNF-α and SAHA-responsive single mCherry positive (M TS C+) cell subpopulations with spontaneously reactivated HIV-1 promoters.
Some M TS C+ cell clones, when reverted to their initial M+ phenotype upon TNF-α and SAHA withdrawal, consistently exhibited a small subpopulation of cells with a spontaneously reactivated HIV-1 promoter, i.e., cells that became double positive without further external stimulation. These spontaneous double positive cells are termed M sp C+ here, while the majority of cells that remained single mCherry positive are termed M r C− (Fig. 6a). To examine whether M sp C+ and M r C− phenotypes were reversible, we sorted out the Table 1). The percentages of spontaneous double positive cells that remained in the sorted M sp C+ cell subpopulations and of those that emerged from the sorted M r C− cell subpopulations over time were monitored with flow cytometry. For all M TS C+ cell clones, the majority of the sorted M sp C+ cells became single mCherry positive, while a small fraction of M r C− cells became double positive to the extents of their respective parental cell clones (Fig. 6c).
Given that the M sp C+ and M r C− phenotypes were reversible, we hypothesized that the spontaneous reactivation of HIV-1 promoters was linked to the cell cycle. The transcriptomes of M sp C+ and M r C− cells at day 0 post sorting were sequenced to identify differentially expressed genes between the two cell subpopulations (Supplementary Fig. S3). Cells of the same fluorescence phenotype and vector integration site were treated as biological replicates. Using the cut-offs of fold change >1.5× (|log2 ratio| > 0.585) and P < 0.02, we identified 18 genes that had a higher expression level in M sp C+ cell subpopulations than in M r C− cell subpopulations and were common to both vector integration sites (Fig. 6b; Supplementary Table S2a). Of these 18 genes, three (16.7%) are Table S2a). For instance, the two top hits, FOSB and NEAT1, have been reported to induce cell cycle entry by activating cyclin D1 in mouse fibroblasts 32 and to promote cell proliferation in cancer cells 33 , respectively. Furthermore, NEAT1 and EGR1 have also been reported to be involved in the HIV-1 replication cycle 34 Table S2b), such as RN7SK, the RNA product of which sequesters CDK9 and CycT1 from acting as positive transcription elongation factors at the HIV-1 promoter 36 , and RN7SL1, RN7SL2, RNY1, and RNY3, which are cellular RNAs typically co-packaged in the HIV-1 viral particle 37 . Six (7.0%) and three (3.5%) of these 86 genes are known to promote and suppress cell cycle progression, respectively (Supplementary Table S2b). Fig. S4). The ratio of percentages of M sp C+ cells to M r C− cells in each of the three cell cycle phases is shown in Fig. 7a. Consistently, treatment of these cell clones with genistein and nocodazole, which halted SUP-T1 cells in the G2 phase of the cell cycle (Supplementary Fig. S5), increased the percentages of (Fig. 7b).
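The gene-selection rule stated above (fold change >1.5×, i.e. |log2 ratio| > log2(1.5) ≈ 0.585, and P < 0.02, keeping genes higher in M sp C+ than in M r C− at both vector integration sites) can be expressed as a small filter. This Python sketch assumes the per-site differential expression results were already computed elsewhere (for example with edgeR) and are supplied as a hypothetical mapping of gene to (log2 ratio, P value); `passes` and `higher_in_spontaneous` are illustrative names, and any gene names used with it are placeholders, not data from this study.

```python
import math
from typing import Mapping, Set, Tuple

FOLD_CHANGE_CUTOFF = 1.5
LOG2_CUTOFF = math.log2(FOLD_CHANGE_CUTOFF)  # 0.58496..., reported as 0.585
P_CUTOFF = 0.02


def passes(log2_ratio: float, p_value: float) -> bool:
    """Two-sided filter: |log2 ratio| above the cut-off and P below 0.02."""
    return abs(log2_ratio) > LOG2_CUTOFF and p_value < P_CUTOFF


def higher_in_spontaneous(
    results_by_site: Mapping[str, Mapping[str, Tuple[float, float]]]
) -> Set[str]:
    """Genes up in M sp C+ versus M r C- (positive log2 ratio) at every site.

    results_by_site maps a site label to {gene: (log2 ratio MspC+/MrC-, P value)}.
    """
    per_site = [
        {gene for gene, (lr, p) in table.items() if lr > 0 and passes(lr, p)}
        for table in results_by_site.values()
    ]
    return set.intersection(*per_site) if per_site else set()
```

Requiring the gene to pass at every site, rather than pooling sites, mirrors the "common between the two vector integration sites" criterion.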
Taken together with the transcriptomic data above, our findings suggest that spontaneous reactivation of latent HIV-1 promoters is linked to the cell cycle.
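Both readouts in the cell cycle experiments are simple ratios of percentages: the per-phase ratio of M sp C+ to M r C− cells (as in Fig. 7a) and the fold increase in spontaneously double positive cells after genistein or nocodazole treatment (1.4–2.9-fold in this study). A minimal Python sketch, assuming phase percentages have already been obtained from DNA-content gating (not shown) and reading the per-phase ratio as "% of M sp C+ cells in a phase divided by % of M r C− cells in that phase"; the function names and all numbers used with them are hypothetical.

```python
from typing import Dict

PHASES = ("G1", "S", "G2")


def phase_ratios(msp_pct: Dict[str, float], mr_pct: Dict[str, float]) -> Dict[str, float]:
    """Per phase: % of MspC+ cells in that phase / % of MrC- cells in that phase."""
    return {phase: msp_pct[phase] / mr_pct[phase] for phase in PHASES}


def fold_increase(treated_pct: float, untreated_pct: float) -> float:
    """Fold change in % spontaneously double positive cells after treatment."""
    if untreated_pct <= 0:
        raise ValueError("untreated percentage must be positive")
    return treated_pct / untreated_pct
```

For example, hypothetical phase distributions of 40/36/24% (M sp C+) and 50/30/20% (M r C−) for G1/S/G2 give ratios of 0.8, 1.2, and 1.2; ratios above 1 in S and G2 correspond to the enrichment pattern described above.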

Discussion
HIV-1-based vectors have previously been utilized to dissect various aspects of HIV-1 infection 7,10,11 . In the present study, we have developed a novel vector, LTatC[M], and characterized it in great detail with regard to its representation of HIV-1 infection and latency in SUP-T1 cells. In contrast to previous vectors, LTatC[M] contains a pair of genetic insulators, cHS4 and sMAR, that flank the constitutive mCherry cassette to protect it from position-effect variegation that might be exerted at some integration sites 7,38 . Additionally, mCherry expression is driven by the human eIF4A1 promoter, which has been shown to yield high levels of gene expression in macrophage cell lines 39 , potentially enabling the use of LTatC[M] to study HIV-1 infection in this other target cell type of HIV-1. Furthermore, cHS4 is capable of alleviating transcriptional interference between tandem gene cassettes [17][18][19] . The functionality of the genetic insulators was evident in two ways: (i) stable expression of Cerulean and/or mCherry in long-term bulk and monoclonal cell cultures for up to 394 days and (ii) absence of reactivatable HIV-1 and heIF4A1 promoters in the double negative cell population. In contrast, in a previous study using a dual-fluorescence vector without genetic insulators, a substantial fraction of double negative cells was found to harbour a reactivatable vector 40 . More importantly, these genetic insulators did not alter the kinetics and ratio of emergence of active and latent cells and vector integration site patterns compared to the control vector lacking the genetic insulators. With our model, we were able to examine HIV-1 latency in monoclonal cells with distinct vector integration sites. We focused on factors that contribute towards two aspects of HIV-1 latency reversal: (i) spontaneous reactivation and (ii) reactivation potentials upon external induction. 
Spontaneous reactivation of latent HIV-1 promoters in vitro has also been observed by others 7 , although a mechanistic explanation is lacking. We found that spontaneous reactivation of latent HIV-1 promoters is linked to the cell cycle, as evidenced by transcriptomic profiling of differentially expressed genes between spontaneously reactivated and the remaining non-activated cell subpopulations within cell clones. This linkage was further corroborated by two observations: (i) larger fractions of spontaneously reactivated cells were in the S and G2 phases of the cell cycle compared to the non-activated cells and (ii) halting the cell cycle at the G2 phase with genistein and nocodazole increased the fraction of spontaneous reactivation. This is analogous to the partial reactivation of latent HIV-1 observed in central memory CD4+ T cells undergoing homeostatic proliferation 41 . Furthermore, viral production is known to increase in the G2 phase 42,43 . Interestingly, two cell cycle regulators expressed at higher levels in spontaneously reactivated cells, NEAT1 and EGR1, have also been reported to be involved in the HIV-1 replication cycle. NEAT1 is essential for the formation of nuclear paraspeckles and has been found to be upregulated in peripheral blood mononuclear cells of untreated HIV-1-infected individuals (productive infection) compared to those of virally suppressed individuals (latent infection) 44 . Additionally, knockdown of NEAT1 enhances HIV-1 production 35 . These studies suggest that NEAT1 might be upregulated as an antiviral response to HIV-1 expression, e.g. to HIV-1 Tat in spontaneously reactivated cells. EGR1 encodes a transcription factor that regulates genes related to cell growth and differentiation. The expression of EGR1 is upregulated by HIV-1 Tat 45,46 and its activation leads to the reversal of HIV-1 latency 34 , consistent with our observation that EGR1 was expressed at higher levels in spontaneously reactivated cells.
The vast majority of genes identified to have a higher expression level in non-activated cells were non-coding small nuclear RNAs (snRNAs) with unknown relevance to HIV-1 infection, except for RN7SK, which forms an RNA scaffold that binds CDK9 and CycT1, thus sequestering these molecules from acting as a positive transcription elongation factor at the HIV-1 promoter 36 . The role of non-coding RNAs, especially long non-coding RNAs and microRNAs, in the modulation of HIV-1  47,48 and snRNAs identified in our screen might represent an additional layer of potential candidates for further investigation in the context of HIV-1 latency establishment and maintenance.
The observations that larger fractions of spontaneously reactivated cells are in the later phases of the cell cycle and that spontaneous reactivation increased upon treatment with agents that halt the cell cycle in G2 suggest that cell cycling might lead to spontaneous reactivation. This would provide a mechanistic explanation for the spontaneous reactivation of latent HIV-1 proviruses observed in HIV-1-infected individuals during treatment interruption as a result of homeostatic proliferation of latently infected cells. Our model, however, does not exclude the alternative, namely that cell cycling is a result of spontaneous reactivation. Spontaneous reactivation of latent HIV-1 proviruses has been described as an intrinsic characteristic of the virus 49,50 . We detected a higher expression of EGR1, a positive cell cycle regulator, in spontaneously reactivated cells, and upregulation of EGR1 by HIV-1 Tat has been shown 45,46 . Tat is contained in our vector and is an early-expressed gene in the HIV-1 replication cycle 36 . Thus, the intrinsic spontaneous reactivation of latent HIV-1 proviruses may drive cell cycling, potentially contributing towards clonal expansion of latently infected cells [51][52][53] . Of note, our minimalistic HIV-1-based vector does not encode, for example, HIV-1 Vpr, which is also known to modulate the host cell's cell cycling programme 54 . Nonetheless, individual components of the viral genome could be cloned into our vector, making it possible to systematically examine the individual or combined effects of these components on cell cycling as well as on other aspects of the host cell.
HIV-1 integration site selection has been implicated in determining whether the integrated provirus is active or latent, as well as the reactivation potentials of latent proviruses 13,14 . Vector integration sites identified in our model conformed to HIV-1 integration site patterns 8,26,[55][56][57] . Furthermore, cells representing latent HIV-1 infections consistently exhibited modestly different vector integration site patterns from their active counterparts in two independent transductions: higher frequencies in transcription units and in the same transcriptional orientation relative to vector-hosting genes. These differences between active and latent cells were also reported in a primary-cell model transduced with HIV-1 NL4-3 whose env was partially replaced with a fluorescent reporter gene 58 . More HIV-1 integrations in a convergent transcriptional orientation have also been observed in viraemic (active infections) than in treated (latent infections) individuals 55 . We further dissected cells with a latent HIV-1 phenotype into those that were inducible with TNF-α and SAHA to become active and those that were non-inducible. Inducible cell clones with distinct vector integration sites and expression profiles of vector-hosting genes displayed variable reactivation potentials. The reactivation potentials were dramatically attenuated in inducible cell clones with mutations in elements important for the expression of Cerulean, demonstrating that the integrity of the viral genome, in addition to features of the viral integration site, has an impact on HIV-1 latency reversal. On the other hand, we did not find any non-inducible cell clone with an intact Cerulean cassette, although the number of cell clones analysed was relatively small. Major internal deletions and mutations, possibly a result of the error-prone HIV-1 reverse transcriptase 59,60 and copy-choice recombination 61,62 , rendered these clones non-reactivatable.
Such a phenomenon is also observed in cells derived from HIV-1-infected individuals 4 , although the rate and site of mutation and recombination are likely to differ in our system utilizing a minimalistic HIV-1-based vector. Nonetheless, our observation accentuates the need to carefully examine singly fluorescent cell populations, e.g. latency-representing cells, in HIV-1 latency models using HIV-1-based vector systems.
In a genome-wide shRNA screen, Besnard et al. recently identified candidate genes that either promote or inhibit HIV-1 latency, and subsequently showed that the mechanistic target of rapamycin (mTOR) complex is a positive regulator of HIV-1 latency reversal 63 . In their screen, actin cytoskeleton reorganization pathways were also found to be potentially involved in the regulation of HIV-1 latency 63 . Interestingly, 4/7 inducible cell clones analysed had LTatC[M] integrated into genes related to actin cytoskeleton reorganization: EHBP1 64 , AKAP13 65 , CTNND1 66 , and CDC42BPA 67 . For example, EHBP1 links actin to the cell membrane for endosomal tubulation 64 , and CDC42BPA (CDC42 binding protein kinase α) promotes actin and myosin reorganization 67 ; notably, CDC42 effector protein 5 is the second highest-ranked latency-promoting gene in the study of Besnard et al. 63 . These four genes are also listed as latency promoting in the study of Besnard et al. 63 . The linkage between integration into these genes and HIV-1 latency establishment is currently unclear. Nevertheless, our observation complements current knowledge on HIV-1 latency and presents an interesting finding for further investigation.

Conclusions
In summary, our well-characterized dual-fluorescence, genetic-insulators-containing HIV-1-based vector is capable of recapitulating various facets of HIV-1 infection and latency and stably maintains active and latent HIV-1 phenotypes over an observation period spanning at least one year. Application of this vector might provide insights into the pathomechanisms of HIV-1 latency. We applied this vector to study spontaneous reactivation of latent HIV-1 promoters in SUP-T1 cells and found a linkage between this phenomenon and the cell cycle. This provides a basis for further investigation into the mechanisms by which the HIV-1 reservoir is maintained and by which spontaneous viral rebound occurs in HIV-1-infected individuals off ART.

Methods
Plasmid vector construction. LTatC[M] was synthesized by GeneArt (Thermo Fisher Scientific), except that the HIV-1 Rev response element (RRE) was amplified and cloned from pEV731 7 (kindly provided by Eric Verdin) with primer pair 5′NNNNNNGTCGACCTCGAGATGGGTGCGAGAGCGTCAG3′ and 5′NNNNNNGTCGACGGTGGCATCGATACCGTCGAG3′, and the woodchuck hepatitis virus post-transcriptional regulatory element (WPRE) was amplified from pCLX-UBI-GFP (kindly provided by Patrick Salmon; Addgene plasmid #27245) with primer pair 5′NNNNNNGGATCCCGATAATCAACCTCTGGATTAC3′ and 5′NNNNNNACCGGTAATTCCCAGGCGGGGAGG3′. Genetic elements in LTatC[M] are shown in Fig. 1a and tabulated in Supplementary Table S3.
Amplification, sequencing, and mapping of 5′ vector integration junctions. The 5′ vector integration junctions were amplified, sequenced, and mapped as described previously 26 . Briefly, 5′ vector integration junctions were amplified with non-restrictive linear amplification-mediated PCR 70 , and 12–14 pM purified PCR products were sequenced on the Illumina MiSeq platform using the MiSeq Reagent Kit v2 (300 cycles) (Illumina) with 8% PhiX. Mapping of sequencing reads to the human genome assembly GRCh37.p13 was performed using an in-house bioinformatic pipeline, Integration Site Analysis Pipeline (InStAP). All vector integration junctions are listed in Supplementary Table S4.
Amplification and sequencing of Cerulean cassettes. The Cerulean cassettes, spanning from the HIV-1 5′ LTR to cHS4, in transduced cells were amplified using Platinum Taq DNA polymerase high fidelity (Thermo Fisher Scientific) with the forward primer 5′GACAAGAGATCCTTGATCTGTGGATC3′ and reverse primer 5′CACTGATAGGGAGTAAACATATGC3′ or 5′GAAGGACAGCTTCAAGTAGTCG3′. The cycling conditions were 94 °C for 2 min, 40 cycles of 94 °C for 30 s, 55 °C for 30 s and 68 °C for 5 min 30 s, and a final extension of 68 °C for 10 min.
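The thermocycling program above can be written down as a small data structure, which makes the total hold time explicit. This Python sketch is illustrative only: `Step`, `Program`, and `CERULEAN_CASSETTE_PCR` are hypothetical names, and ramp times, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees Celsius
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]
    n_cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total time spent at set temperatures; ramping is not counted."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# Conditions as stated in the Methods text.
CERULEAN_CASSETTE_PCR = Program(
    initial=Step(94, 2 * 60),
    cycle=(Step(94, 30), Step(55, 30), Step(68, 5 * 60 + 30)),
    n_cycles=40,
    final=Step(68, 10 * 60),
)
```

The hold time works out to 2 min + 40 × (30 s + 30 s + 5 min 30 s) + 10 min = 272 min, i.e. roughly 4.5 h before ramping is added.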
The amplicons were processed with the Nextera XT DNA Library Preparation Kit (Illumina) and subsequently sequenced with the MiSeq Reagent Kit v2 (50 cycles) with 1% PhiX.
Transcriptomic analysis. Total RNA was extracted using the AllPrep DNA/RNA Kit (Qiagen), and reverse-transcribed libraries were prepared and normalized using the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (Takara). The TruSeq SR Cluster Kit v4-cBot-HS or TruSeq PE Cluster Kit v4-cBot-HS (Illumina) was used for cluster generation with 8 pM pooled normalized libraries on the Illumina cBot system. Sequencing was performed on the Illumina HiSeq 2500 (paired-end 2 × 126 bp or single-end 126 bp) using the TruSeq SBS Kit v4-HS (Illumina).
Bioinformatic analysis was performed using the R package ezRun (https://github.com/uzh/ezRun) within the data analysis framework SUSHI 71 . Raw reads were quality checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and FastQ Screen (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/). Quality-controlled reads were aligned to the reference genome GRCh38 using the STAR aligner 72 . Expression counts were computed using featureCounts in the Bioconductor package Subread 73 . Differential expression analysis was performed using the edgeR package 74 , where raw read counts were normalized using the Trimmed Mean of M values (TMM) method 75 and differential expression was computed using the Generalized Linear Model (GLM) likelihood ratio test. Quality checkpoints 76 , such as quality control of the alignment and count results, were implemented in ezRun and applied throughout the analysis workflow to ensure correct data interpretation.
Data availability. Data generated and analysed during this study are included in this published article and its supplementary information files unless stated otherwise, in which case they are available from the corresponding author on reasonable request.