Chromosomal architecture is known to influence gene expression, yet its role in controlling cell fate remains poorly understood. Reprogramming of somatic cells into pluripotent stem cells (PSCs) by the transcription factors (TFs) OCT4, SOX2, KLF4 and MYC offers an opportunity to address this question but is severely limited by the low proportion of responding cells. We have recently developed a highly efficient reprogramming protocol that synchronously converts somatic into pluripotent stem cells. Here, we used this system to integrate time-resolved changes in genome topology with gene expression, TF binding and chromatin-state dynamics. The results showed that TFs drive topological genome reorganization at multiple architectural levels, often before changes in gene expression. Removal of locus-specific topological barriers can explain why pluripotency genes are activated sequentially, instead of simultaneously, during reprogramming. Together, our results implicate genome topology as an instructive force for implementing transcriptional programs and cell fate in mammals.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank D. Higgs, J. Hughes, J. Davies and Z. Duan for advice on Hi-C technology; C. Schmidl for ChIPmentation advice; C. van Oevelen for help with CTCF ChIP–seq; C. Segura for mouse-colony management; T. Tian for bone marrow collection; the CRG Genomics Core Facility and the CRG-CNAG Sequencing Unit for sequencing; and members of the laboratory of T.G. for discussions. This work was supported by the European Research Council under the 7th Framework Programme FP7/2007-2013 (ERC Synergy Grant 4D-Genome, grant agreement 609989 to T.G., G.J.F., M.A.M.-R. and M.B.) and the Ministerio de Educacion y Ciencia, SAF.2012-37167. R.S. was supported by an EMBO Long-term Fellowship (ALTF 1201-2014) and a Marie Curie Individual Fellowship (H2020-MSCA-IF-2014). We also acknowledge support from ‘Centro de Excelencia Severo Ochoa 2013-2017’ (SEV-2012-0208) and AGAUR to the CRG.
Integrated supplementary information
(a) Genome browser view of Sall4 gene expression measured by RNA-Seq (two biological replicates per timepoint). Bar graph inset depicts qRT-PCR measurements of Sall4 expression in two independent biological replicate reprogramming experiments (bars indicate mean values). (b) Scatterplot of RPKM gene expression values (n = 16,332 genes) for biological replicates 1 and 2 (iPS samples shown). (c) Pearson correlation (R²) values between RNA-Seq replicates (n = 16,332 genes) for all timepoints. (d) Genome browser views of the Ctsg and Rag genes with H3K4Me2 ChIPmentation (red) or ATAC-Seq (blue) profiles during reprogramming. Bar graphs below show gene expression dynamics (bars indicate mean values, n = 2). (e) qRT-PCR measurements of Nanog (top) and Sox2 (bottom) expression (mean values, n = 2) using primers that detect both mRNA and primary transcripts. Red rectangles indicate the area depicted in the smaller zoom-in graph on the right. (f) Normalized genome-wide H3K4Me2 (marking active chromatin) coverage per timepoint. (g) Fraction of Oct4 binding sites in PSCs overlapping with an ATAC-Seq peak (‘ATAC+’) during a representative reprogramming time course. Absolute numbers of sites are shown. (h) ATAC-Seq and (i) H3K4Me2 coverage profiles for Oct4 binding sites in PSCs inside (left, n = 821) and outside (right, n = 31,869) PSC superenhancers (SEs) during reprogramming. Error bars in the figure denote 95% CI.
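The replicate comparison in panels b and c comes down to a Pearson correlation between two vectors of per-gene expression values. A minimal sketch of that calculation follows, assuming the RPKM values of the two replicates are available as equal-length Python lists; the function name and the toy values are hypothetical and are not taken from the study's analysis code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy RPKM values for two hypothetical replicates (illustration only).
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(replicate_1, replicate_2)
r_squared = r ** 2
```

Whether the values were log-transformed before correlation is not stated in the caption, so any transformation would have to be applied to the inputs beforehand.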
(a) Representative in-situ Hi-C contact maps (50 kb resolution) of a 22.5 Mb region on chromosome 3. (b) Pearson correlation coefficient (R²) heatmap of PC1 value comparisons between timepoints. (c) Line chart depicting genome fractions assigned to A or B compartments at the different timepoints. Regions that could not be assigned (PC1 = 0, e.g. telomeric regions) are shown in gray. (d) Overall contact enrichment for 100 kb bins within the A (left) or B (middle) compartment or between A and B (right) compartments during reprogramming. (e) Fraction of the genome that switches compartment at any point during the time course. Bar graph depicts switching percentages per timepoint. (f) Overlay of principal component analyses for gene expression (blue) and compartmentalization (red) dynamics reveals similar trajectories. Sample sizes are indicated in Fig.1d and Fig.2c. (g) Gene expression changes for genes in bins that switch compartment at any timepoint (n = 2,676 for A-to-B; n = 2,667 for B-to-A) or do not switch (‘stable’, n = 21,027) during reprogramming (*P<2.2e-16, Wilcoxon rank-sum test). (h) Gene ontology terms associated with the two categories of switching genes. (i) Average absolute PC1 score of switching or non-switching (‘stable’) bins as a function of their distance to the nearest A/B compartment border. Sample sizes as in panel g. (j) Average distance to the nearest compartment border of non-switching stable bins divided by the average distance of the two types of switching bins. Switching bins are significantly closer to borders than stable bins at all timepoints (Poisson regression, P<4.97e-31). (k) Cartoon summarizing characteristics of compartment switching dynamics: compartmentalization dynamics are highest in regions of low PC1 and near compartment domain borders. Error bars in all plots denote 95% CI.
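Panels c and e rest on assigning each genomic bin to the A or B compartment from the sign of its PC1 value (positive for A, negative for B, unassigned at PC1 = 0) and then asking whether that assignment changes over the time course. The bookkeeping can be sketched as follows, assuming each bin is represented by a list of PC1 values, one per timepoint; the function names and the toy values are hypothetical, and the study's own switching criteria (for example, reproducibility across replicates) may be stricter.

```python
def assign_compartment(pc1):
    """Return 'A' for positive PC1, 'B' for negative PC1, None for PC1 == 0."""
    if pc1 > 0:
        return "A"
    if pc1 < 0:
        return "B"
    return None  # bin could not be assigned


def switches(pc1_series):
    """True if the bin is assigned to both A and B somewhere in the series."""
    states = {assign_compartment(value) for value in pc1_series}
    return "A" in states and "B" in states


def switching_fraction(bins):
    """Fraction of assignable bins that switch compartment at any timepoint."""
    # Bins with PC1 == 0 at every timepoint are left out of the denominator.
    assignable = [series for series in bins if any(v != 0 for v in series)]
    if not assignable:
        return 0.0
    return sum(switches(series) for series in assignable) / len(assignable)


# Toy PC1 time courses for four hypothetical bins (illustration only).
toy_bins = [
    [0.5, 0.4, 0.3],     # stable A
    [-0.2, 0.1, 0.3],    # B-to-A switch
    [-0.4, -0.3, -0.1],  # stable B
    [0.0, 0.0, 0.0],     # never assigned
]
fraction = switching_fraction(toy_bins)
```

Excluding never-assigned bins from the denominator is a choice made for this sketch; the caption does not say how such bins enter the reported percentages.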
Supplementary Figure 3 Relationship between subnuclear compartmentalization and gene expression changes.
(a,b) Comparison of gene expression and PC1 dynamics for key B cell (panel a) and pluripotency (panel b) genes (n = 25). Genes were grouped into those stably associated with the A compartment (left, n = 10) and those that switch (right, n = 15). Pie charts depict changes in compartment status for these genes during reprogramming. (c,d) Gene expression (top) and PC1 (bottom) kinetics for downregulated genes (<-0.5 log2, panel c) or upregulated genes (>0.5 log2, panel d) between reprogramming endpoints. Genes were grouped into those stably associated with the A compartment (left; n = 6,119 for upregulated genes, n = 6,696 for downregulated genes) and those that switch (right; n = 1,191 for upregulated genes, n = 1,755 for downregulated genes). Grey shading marks first timepoint of significant change (versus B, *P<0.01, Wilcoxon rank-sum test). Boxplots on the right depict the extent of expression change (PSC versus B) for the two groups of genes. (e) Gene expression clusters of genes stably upregulated during reprogramming at different stages. Line graphs on the right depict average kinetics, gray shading marks first timepoint of significant change. (f) Gene expression (top) and PC1 (bottom) kinetics for stably upregulated genes (from two clusters shown in panel e; n = 64 for the left plot, n = 86 for the right plot) that switch compartment preceding transcriptional upregulation. Gray shading indicates timepoint at which switching was completed. (g) Change in PC1 value (relative to B cells) during reprogramming for bins containing PSC superenhancers (n = 262, P = 0.0004, Wilcoxon rank-sum test). Error bars denote SEM.
Supplementary Figure 4 Integrated kinetics of gene expression, compartmentalization and chromatin state.
(a) Dynamics of average gene expression versus PC1 (left) and H3K4Me2 levels versus PC1 (right) for all 20 individual switching clusters (see Fig.2g). Arrows indicate timepoints where the correlation between either expression or H3K4Me2 and PC1 is lost. Sample sizes are shown in panel b. (b) Summarized gene ontology (GO) annotation of the 20 switching clusters grouped by switching type and with relationship class (gene expression versus PC1; concom. = concomitant) indicated. Error bars in all plots denote SEM.
(a) Number of TAD borders identified per timepoint for each biological replicate. (b) TAD border reproducibility between replicates as measured by the Jaccard index. (c) Average enrichment of Ctcf and transcription start sites (TSS) at borders (compared to their genome-wide distribution, n = 3,100) in two replicate datasets for B cells and PSCs (*P<2.2e-16, Wilcoxon rank-sum test). (d) In-situ Hi-C contact maps (50 kb resolution) centered on TAD border 999, which is progressively lost during reprogramming. Black arrows indicate position of TAD border calls per timepoint. Bar graphs on the right show insulation score (I-score) values for both independent biological replicates. (e) Number of TAD borders reproducibly called per timepoint. Invariant borders were present at all timepoints; variable borders were lost/acquired during reprogramming. (f) Boxplots showing TAD size distributions (n = 3,100) during reprogramming. (g) Genome browser view of the Sox2 locus (2.2 Mb region, centered on Sox2) with H3K4Me2 (red) and Ctcf binding (dark grey, peaks indicated by black rectangles) dynamics indicated below. Sox2 gene location is indicated in blue on top, superenhancer (SE) position in red and neighboring genes as black rectangles. (h) Boxplots showing gene expression (RPKM) dynamics for border regions that were acquired (‘gained’, top) or decommissioned (‘lost’, bottom) during reprogramming. Borders were further grouped according to the timepoint at which they appeared/disappeared. Very few borders are lost at the Bα and D2 stages, leaving <5 genes available for downstream analysis, so these analyses were omitted. (i) Proportion of borders where gene expression is on average upregulated, downregulated or not changed. Borders were separated based on whether they were gained or lost during reprogramming. (j) Gene expression dynamics at transcriptionally modulated border regions (divided in up or downregulated groups per timepoint) gained or lost during reprogramming (#P<0.1, *P<0.05 versus B cells; unpaired two-tailed t-test).
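The Jaccard index used in panel b to score border reproducibility is the number of borders shared by two replicates divided by the number of borders found in either. A minimal sketch follows, assuming each replicate's border calls are given as genomic bin indices at a fixed resolution and that two calls match only when the indices are identical; the function name, the exact-match rule and the toy values are assumptions of this sketch, since the caption does not state whether a positional tolerance was allowed.

```python
def jaccard_index(borders_a, borders_b):
    """Jaccard index of two collections of border positions (bin indices)."""
    set_a, set_b = set(borders_a), set(borders_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty call sets are identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Toy border calls for two hypothetical replicates (illustration only).
replicate_1_borders = [10, 20, 30]
replicate_2_borders = [20, 30, 40]
reproducibility = jaccard_index(replicate_1_borders, replicate_2_borders)
```

In the toy example two borders are shared out of four distinct calls, so the index is 0.5.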
(a) Insulation score (I-score) dynamics for TAD borders stably gained (n = 431), lost (n = 124) or invariant (n = 2,185) during reprogramming. Error bars denote 95% CI; percentages indicate the proportion of all borders that belong to the various classes. (b) Boxplots depicting I-score of borders harboring no Ctcf sites, 1-5 Ctcf sites or >5 Ctcf sites for indicated timepoints. (c) Meta-border plots for all borders that gain I-score, do not change I-score or lose I-score. (d) Principal component analysis (PCA) and unsupervised hierarchical clustering of I-score values (n = 3,100). (e) Boxplot showing the average distance of pluripotency genes (n = 25) or all other genes (n = 16,307) to the nearest TAD border. (f) Gene ontology terms significantly associated with genes found within dynamic (top) or stable (bottom) border regions in both independent biological replicates.
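The distance-to-nearest-border measure in panel e can be computed with a binary search over sorted border coordinates. A minimal sketch follows, assuming each gene is reduced to a single coordinate (for example its transcription start site) and that borders on the same chromosome are given as a sorted list of positions; the function name, the choice of reference coordinate and the toy values are assumptions of this sketch and are not specified in the caption.

```python
from bisect import bisect_left


def distance_to_nearest_border(position, sorted_borders):
    """Distance from a coordinate to the closest entry of a sorted border list."""
    if not sorted_borders:
        raise ValueError("at least one border is required")
    # Index of the first border at or to the right of the position.
    i = bisect_left(sorted_borders, position)
    candidates = []
    if i < len(sorted_borders):
        candidates.append(sorted_borders[i] - position)  # border to the right
    if i > 0:
        candidates.append(position - sorted_borders[i - 1])  # border to the left
    return min(candidates)


# Toy border positions on one hypothetical chromosome (illustration only).
borders = [100, 500, 900]
gene_positions = [450, 50, 1000, 500]
distances = [distance_to_nearest_border(p, borders) for p in gene_positions]
```

Averaging such distances over a gene set gives one value per group, which is the kind of summary the boxplot compares between pluripotency genes and all other genes.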
(a) Conventional 4C-Seq analysis (representative experiment shown) of the Dppa3-Nanog locus at early reprogramming timepoints using the Nanog promoter as a viewpoint. Border region defined by Hi-C and superenhancer (SE) are indicated in blue. (b) 2.25 Mb in-situ Hi-C contact maps (50 kb resolution) centered on the Sox2 gene and its superenhancer (SE) for both independent biological replicate reprogramming experiments. TAD border calls per timepoint are indicated by black arrows. Note the progressive insulation of Sox2 and its SE into a smaller domain as the gene is activated (indicated by a black arrow in the PSC maps). (c-e) Kinetics of mean H3K4Me2 (panel c), D-score (panel d) and PC1 (panel e) changes at dynamic borders harboring genes that are either upregulated (n = 22) or downregulated (n = 21) after I-score changes are initiated. Shading denotes SEM.
(a) Average expression of genes (plotted as an expression percentile) in TADs (n = 1,664) having a low (-0.26;-0.02), average (-0.02;0.1) or high (0.1;0.6) relative domain score (D-score). (b) Boxplots showing relative D-score values for TADs in the A (n = 953-1,039) or B (n = 1,141-1,227) compartment at each timepoint. Statistical significance was assessed using a Wilcoxon rank-sum test. (c) Percentage of expression variance explained by TADs (relative to a linear model, see Supplemental Materials for a detailed explanation) for each timepoint. (d) Collection of all PCA trajectories generated in this study. Points denote average data from two biological replicates. (e) Average D-score and PC1 kinetics during reprogramming for clusters of TADs that gain (n = 705, left) or lose (n = 869, right) D-score. Pearson correlation coefficients (R) are indicated. (f) Top dynamic TADs gaining (n = 279, upper half) or losing (n = 252, lower half) D-score during reprogramming. Line graphs show mean PC1 values for switching and non-switching TADs. Percentages of A-to-B and B-to-A switching for both groups of TADs are depicted by triangles. Tables show selected gene ontology (GO) terms for the genes within the corresponding TADs. (g) Fraction of TADs that switch compartment in groups of TADs with low (0-0.02), average (0.02-0.07) or high (>0.07) absolute changes in D-score for both independent biological replicate experiments. Error bars in all plots denote SEM.
(a) Meta-loop analysis at 5 kb resolution of B cell or PSC loops (ref. 49). Area shown is centered on the respective TF binding sites (+/- 50 kb). (b) Gene ontology (GO) annotation of the genes within B cell (left) or PSC (right) specific loops. In-situ Hi-C data from two independent biological replicate reprogramming experiments were pooled for these analyses.
(a) Compartment switching induced by C/EBPα (B-to-Bα, top) or OSKM (Bα-to-D2, bottom). Line graphs depict average expression changes of genes located in regions that have stably switched. (b) GO annotation of genes (n = 358) that stably switch B-to-A compartment at the Bα-D2 transition. (c) Average gene expression changes of the genes (n = 10) associated with the gene ontology (GO) term ‘Embryo Development’ that stably switch B-to-A compartment at the B-Bα transition. (d) Klf4 binding enrichment (over the genome-wide average) at the 20 switching clusters shown in Fig.2g. Mean values with 95% CI are shown. (e) Percentage of TAD border regions (n = 3,100) bound by Oct4 at each timepoint. (f) Oct4 (left) and Klf4 (right) enrichment kinetics at border regions that are already targeted by these factors at D2 (n = 37 for Oct4, n = 22 for Klf4) or border regions not yet targeted at D2 (n = 147 for Oct4, n = 162 for Klf4). (g) C/EBPα (left) or Oct4 (right) enrichment at border regions bound by indicated transcription factors at the earliest timepoint (n = 123 for C/EBPα, n = 37 for Oct4) or unbound regions (n = 61 for C/EBPα, n = 147 for Oct4). Mean values +/- SD are indicated, as well as individual data points. P values were calculated using a Wilcoxon rank-sum test. (h) Venn diagram showing the overlap between the number of dynamic borders bound by Oct4 (at D2), Klf4 (at D2) and C/EBPα (at Bα). (i) Kinetics of key transcriptional, epigenomic and topological events during somatic cell reprogramming. Light-to-dark color intensity range signifies quantitative differences. Ect., ectopic; chr., chromosome.
(a) Comparison of A/B compartmentalization in B cells (representative experiment shown) and replication timing in a B cell line (CH12 Repli-chip data obtained from the ENCODE consortium) for chromosome 2. Note the extremely high correlation between positive PC1 values (i.e. A compartment domains) and positive replication timing signal (i.e. early replication timing domains). (b) Residence time of switching 100 kb genomic bins in either the A or B compartment (as measured in timepoints during reprogramming). See Methods for a detailed description of the analysis procedure employed.