Main

Recent works suggest that the regulation of gene expression involves the formation of biomolecular condensates on DNA1,2,3,4,5,6,7,8,9,10,11,12. Condensation from solution is an attractive concept to explain the spatial and temporal organization of transcription and the high local density of proteins at the transcription site. This concept invokes the collective behaviours of many molecules that emerge from their interactions, such as phase separation. However, the biophysics of the collective properties of transcription factors binding to DNA remain unresolved. What is the physical nature of condensates on DNA? What are the collective properties of their molecular components and how are they guided by DNA sequence11,13,14,15,16? Well-developed concepts from soft matter physics, such as wetting and prewetting17,18,19, provide a powerful framework to understand the relationship between droplet formation in bulk solution and condensation on surfaces.

Here we use optical tweezers to directly observe the condensation of the pioneer transcription factor20 Krüppel-like factor 4 (Klf4) on DNA in vitro. We demonstrate that Klf4 forms sequence-dependent liquid-like condensates that are enabled by interaction with the DNA surface. This sets their typical size and allows them to form below the saturation concentration for phase separation. By combining experiments with theory, we show that these condensates form via a switch-like transition similar to prewetting, a precursor to wetting that occurs below the saturation concentration for bulk-phase separation17,18,19. This transition amplifies the sequence specificity of Klf4 binding to DNA. Polymer-surface-mediated condensation reconciles several observations that were previously thought to be at odds with the idea of phase separation as an organizing principle in the nucleus.

The human pioneer transcription factor Klf4, one of the Yamanaka factors, is a driver of differentiation, cell growth and proliferation21,22. Klf4 has a domain organization typical for transcription factors: an activation domain predicted to be disordered and a structured DNA-binding region23. Human Klf4 with a C-terminal green fluorescent protein (GFP) tag purified from insect cells binds to DNA oligonucleotides in a sequence-specific manner24,25,26,27 (Fig. 1a,b and Extended Data Fig. 1). In the absence of DNA, Klf4 forms liquid-like condensates at physiological salt and pH above a concentration of ~1.0 µM, which is above the estimated nuclear concentration of Klf428 (Fig. 1c,d, Supplementary Methods and Extended Data Figs. 1b and 2). Untagged Klf4 behaves in a similar way (Extended Data Fig. 2c–e). The addition of λ-DNA triggered the formation of foci below the saturation concentration (CSAT) (Fig. 1e), confirming previous observations that DNA can trigger the foci formation of transcription factors11.

Fig. 1: DNA drives condensation of Klf4 at concentrations below the saturation concentration for liquid–liquid phase separation.

a, Sodium dodecyl sulphate gel showing recombinantly expressed and purified MBP-Klf4-GFP. b, Test of the affinity of Klf4-GFP to short dsDNA oligonucleotides using EMSA (Extended Data Fig. 1 and Supplementary Tables 1 and 2). Green, N = 3: oligonucleotide containing Class B sequence; blue, N ≥ 3: oligonucleotide containing Class A sequence; grey, N = 3: oligonucleotide without reported Klf4 consensus sequence. Large circles, mean values; error bars, standard deviation; dots, individual experiments. c,d, Bulk-phase separation assay of Klf4-GFP reveals droplet formation above a saturation concentration of ~1.0 µM (Extended Data Fig. 2a). Confocal microscopy images of Klf4-GFP FRAP (Extended Data Fig. 2b) and droplet fusion. The droplets retain their liquid nature for about 40 min (Extended Data Fig. 6b,c). e, Addition of λ-DNA to 500 nM Klf4-GFP triggers foci formation. f, Schematic of the optical tweezers assay: a single λ-DNA molecule (black) is held between two optically trapped (orange cones) beads via biotin–streptavidin interactions (red and orange) at tension of ~8 pN (Extended Data Fig. 3f,g). The exposure of DNA to a solution of Klf4-GFP (light green) triggers foci formation (dark green). g, Top, representative kymograph revealing Klf4-GFP foci formation and dynamics. White horizontal bar, foci displacement on DNA (Extended Data Fig. 5f–h). Bottom, confocal image 200 s after exposure of DNA to Klf4-GFP. h, Average number of Klf4-GFP molecules per focus saturates over time. Black line, mean; grey shading, standard error of the mean at 95% confidence (20 foci from 13 experiments). i, Foci fusion observed in the indicated region in the kymograph (white box in g) (Extended Data Fig. 5a–c and Supplementary Video 1).

To further examine the behaviour of Klf4 on DNA, we used dual-trap optical tweezers with confocal microscopy to hold a linearized λ-DNA molecule stretched between two polystyrene beads (Fig. 1f)29. We observed many Klf4 foci on the DNA molecule at a Klf4 concentration of 115 nM, which varied with regard to the amount of Klf4 they contained (Fig. 1g, Extended Data Figs. 3 and 4a–d and Supplementary Video 1). Notably, even at concentrations closer to CSAT (250 nM), they grew with time until reaching a finite size with an average of approximately 800 molecules per cluster (Fig. 1h). Furthermore, foci can fuse, they can recover after photobleaching and their position can fluctuate on DNA (Fig. 1i, Extended Data Fig. 5 and Supplementary Video 2). Importantly, because condensates form on DNA well below CSAT (Extended Data Figs. 2 and 3k), DNA does not serve as a classic nucleator for the formation of bulk-phase droplets of the kind depicted in Fig. 1c. This is because after nucleation, bulk-phase droplets can only grow if the solution remains above the saturation concentration11,30,31 (Extended Data Figs. 3i and 6). This suggests that at concentrations below CSAT, a mechanism is at play that is qualitatively different from the standard picture of phase separation in a bulk solution.

To further study the physical nature of Klf4 foci on DNA, we investigated their dependence on protein concentration. Figure 2a shows representative fluorescence images and corresponding traces of Klf4 intensities at different concentrations, recorded 200 s after the Klf4 solution was introduced to the observation chamber. The number and intensity of Klf4 foci increased with the concentration (Fig. 2a). A probability density histogram of pixel intensities at concentrations above 210 nM reveals a bimodal distribution, indicative of two distinct populations of Klf4 foci (Fig. 2b and Extended Data Fig. 7). The peak of the histogram at a low intensity characterizes Klf4 regions with, on average, fewer than one molecule per binding site (corresponding to a 10 bp footprint of the Klf4 zinc fingers32; Extended Data Fig. 4a,b). We refer to this mode of association as the adsorbed state. The peak at high intensity encompasses Klf4 regions that contain foci with several hundred and up to a few thousand Klf4 molecules (corresponding to ~2–10 molecular layers of Klf4; Extended Data Fig. 4a,d). We refer to this mode of association as the condensed state. Next, we analysed the intensity histogram as a function of time (Fig. 2c). At low concentrations (below 80 nM), the histogram rapidly forms a peak at intensities that correspond to the adsorbed state that persists over time (Fig. 2c, top). In contrast, at higher concentrations (above 210 nM), the histogram reveals a bifurcation: it first rapidly forms a peak at intensities that correspond to the adsorbed state that subsequently decreases in amplitude, whereas a second peak emerges and quickly moves to higher intensities that correspond to the condensed state (Fig. 2c, bottom, Extended Data Fig. 4g and Supplementary Video 3). This bifurcation reveals a switch-like transition from an adsorbed state to a condensed state via a bimodal intensity distribution. The occurrence of these two states and the bimodal intensity distribution depends on the concentration. It is noteworthy that the fraction of DNA that is occupied by the condensed state 200 s after exposure of Klf4 to DNA suddenly increases at a concentration CPW of 86 ± 5 nM (Fig. 2d). We conclude that in our experimental system, Klf4 condensates on DNA are formed in a two-step manner: at low concentrations, the protein merely adsorbs to DNA. At sufficiently high Klf4 concentrations, adsorbed proteins switch to a thick condensate.
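
As an illustration of how a bimodal distribution of log-intensities can be split into adsorbed and condensed ranges, the following minimal sketch locates the point between two weighted normal components at which their densities are equal. The function names and all parameter values are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def intersection_between_means(w1, mu1, s1, w2, mu2, s2, tol=1e-10):
    """Bisect for the x in (mu1, mu2) where w1*N(mu1, s1) equals w2*N(mu2, s2).

    Assumes mu1 < mu2 and that each component dominates at its own mean.
    """
    def diff(x):
        return w1 * normal_pdf(x, mu1, s1) - w2 * normal_pdf(x, mu2, s2)

    lo, hi = mu1, mu2
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("components do not cross between their means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0:
            lo = mid  # low-intensity component still dominates
        else:
            hi = mid  # high-intensity component dominates
    return 0.5 * (lo + hi)

# Hypothetical mixture in log10(intensity): weights, means, standard deviations.
threshold_log10 = intersection_between_means(0.6, 2.3, 0.25, 0.4, 3.4, 0.30)
threshold_counts = 10 ** threshold_log10  # back to intensity counts
```

Pixels with intensity below `threshold_counts` would then be assigned to the adsorbed state and those above it to the condensed state.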

Fig. 2: Klf4 foci on λ-DNA switch from an adsorbed state to a condensed state via a prewetting-like transition at positions predicted by a Klf4 position weight matrix.

a, Confocal images (top) and intensity profiles (bottom) of Klf4-GFP on λ-DNA. Light and dark green, adsorbed and condensed intensity ranges. b, Probability density of the logarithm of intensities; N = 37 experiments. Black line, fit to the sum of two normal distributions. Intersection (658.5 counts) defines the threshold between adsorbed (light green, Ads.) and condensed (dark green, Cond.) states (Methods and Extended Data Fig. 7). c, Same as b as a function of time. Top: N = 60 (Extended Data Fig. 4g and Supplementary Video 3). d, Fraction of DNA occupied by the condensed state. Light dots, individual experiments; red circles, binned medians; error bars, 95% confidence interval (bootstrapping); red line, fit to the heterogeneous two-state model. e, Phase diagram for a binary fluid in the presence of a surface17,54. In the coexistence regime (dark green, liquid–liquid phase separation (LLPS)) and close to the surface, the dense phase transits from partial to complete wetting when the system crosses a characteristic temperature (yellow dashed line). This first-order transition extends into the single-phase region through the prewetting line (solid yellow line). f, Crossing the prewetting line (at CPW) leads to condensation from a thin adsorbed layer (left and middle). Above the saturation concentration CSAT, liquid droplets spontaneously appear in the bulk (right). g, Consensus recognition motif of Klf4 in vivo26 and probability of binding to a given sequence along λ-DNA relative to the consensus motif (Methods and Extended Data Fig. 8). h, Blue and colour map, binding energy landscape hi (kT is the unit of energy, where k is the Boltzmann constant, and T is the temperature), coarse grained over 1 kb (Methods and Extended Data Fig. 9a). Red, average Klf4 intensity along λ-DNA ([Klf4-GFP]: 8–281 nM, N = 79). Shaded area, standard error of the mean at 95% confidence.

How does the DNA surface facilitate the formation of Klf4 condensates? Our data indicate that the formation of Klf4 condensates on DNA is a wetting phenomenon, best described as a process of surface condensation known as prewetting17,18,19,33. Such prewetting transitions occur at a concentration denoted as CPW, which is below the saturation concentration CSAT in the bulk (Fig. 2e,f) and can be understood as follows: Klf4 has an affinity for the DNA surface; hence, the concentration of Klf4 tends to increase in the vicinity of DNA. Therefore, local condensation is facilitated by the surface. However, it cannot extend far away from it, because condensation is not possible in the bulk. Put differently, protein–protein interactions are expected to mediate condensate formation, but the condensate is only stable because of DNA–protein interactions that confine it to the vicinity of the surface.

We next set out to investigate the role of DNA sequence in Klf4 surface condensation. The average intensity profile along the DNA across all Klf4 concentrations tested reveals enrichment at preferred locations (Fig. 2h). Several Klf4-binding motifs have been reported from in vivo and in vitro studies24,25,26,27. We chose five of these for further investigation, and used the position weight matrix of these recognition motifs to infer the binding energy landscapes34 for Klf4 on λ-DNA (Fig. 2g, Extended Data Fig. 8, Supplementary Table 1 and Methods). We find that the measured profile positively correlates with the energy landscapes inferred from four of the five motifs24,25,26,27 (Fig. 2h and Extended Data Fig. 8a–c), with Pearson correlation coefficients of approximately 0.74. We refer to these as class A motifs. The Klf4 intensity profile did not positively correlate with the energy landscape inferred from the fifth motif, which we refer to as a class B motif24 (Extended Data Fig. 8d). This shows that the position weight matrices of class A, but not class B, motifs provide an accurate parameterization of the binding energy landscape of Klf4 on λ-DNA. Interestingly, electrophoretic mobility shift assay (EMSA) analysis reveals that an oligonucleotide representing the class B consensus motif binds with about 1.3 kT stronger affinity to Klf4 than one representing a class A consensus motif (Extended Data Fig. 1c–f and Supplementary Table 2). Even though Klf4 has higher affinity for class B motifs, these are less well represented on λ-DNA than class A motifs, explaining the observed binding pattern (Fig. 1b and Extended Data Fig. 8e,f). These results show that in our experiments, the localization of Klf4 condensates on DNA is guided by the underlying DNA sequence. Future work, both in vivo and in vitro, will be required to provide a full parameterization of the sequence-dependent energy landscape of the interaction of Klf4 with DNA.
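
One common way to turn a position weight matrix into an energy landscape is to score every window of the sequence by its log-probability relative to the consensus base at each motif position. The sketch below follows that convention under stated assumptions: the function names, the pseudocount, the forward-strand-only scan and the toy matrix in the usage note are all hypothetical, and the exact procedure used in this study is the one described in the Methods.

```python
import math

def site_energies(sequence, pwm, pseudocount=1e-3):
    """Return one energy (in kT) per window start; 0 for the consensus, larger is weaker.

    pwm is a list of dicts, one per motif position, mapping base -> probability.
    Energy relative to the consensus: E = -sum_k ln(p_k(base) / max_b p_k(b)).
    Only the forward strand is scanned in this simplified sketch.
    """
    width = len(pwm)
    energies = []
    for start in range(len(sequence) - width + 1):
        energy = 0.0
        for k in range(width):
            column = pwm[k]
            p = column.get(sequence[start + k], 0.0) + pseudocount
            p_best = max(column.values()) + pseudocount
            energy -= math.log(p / p_best)
        energies.append(energy)
    return energies

def coarse_grain(values, block):
    """Average consecutive non-overlapping blocks of the given length."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]
```

For example, with a toy two-position matrix `[{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}, {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]`, the window `AG` scores zero and every other window scores a positive penalty. Calling `coarse_grain` with a block of 1,000 positions would mimic the 1 kb coarse graining of the landscape.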

The data so far indicate that Klf4 condensation on DNA corresponds to a prewetting phenomenon on a heterogeneous substrate, where the heterogeneity is provided by the DNA sequence. Position weight matrices reflect the binding preference of individual molecules to short DNA recognition motifs. However, the condensation phenomenon seen in our experiments is the result of the collective behaviour of many molecules. How can we reconcile single-molecule binding at the length scale of a few base pairs with the sequence dependence of condensation at larger length scales as seen in our experiments (Fig. 2g,h)? We developed a simplified model that considers transitions between a thin adsorbed state and a thick condensed state, modulated by a binding energy landscape specified by the DNA sequence (Fig. 3a and Supplementary Note). We represent the stretched DNA polymer by a one-dimensional set of N discrete sites at which Klf4 association can be in either a thin adsorbed state (si = −1) or a thick condensed state (si = +1), where i = 1,...,N (Fig. 3a). This two-state model is formally equivalent to a heterogeneous Ising model35,36. Each site i represents a putative Klf4-binding site spanning ten base pairs32. At low bulk concentrations, the thin adsorbed state is thermodynamically favoured, whereas at sufficiently high concentrations, the adsorbed molecules collectively switch to form a thick condensed state. This balance is captured by energy h, which is proportional to the bulk chemical potential of Klf4. We capture the DNA sequence by introducing a site-dependent energy bias (hi for site i) determined using the position weight matrix corresponding to a binding motif of class A (Fig. 3b and Extended Data Fig. 9a, bottom). The free energy of condensation in the model is given by

$$E = -J \sum_{\langle i,j \rangle} s_i s_j - \sum_i \left( h + h_i \right) s_i,$$

where the first sum is over pairs 〈ij〉 of adjacent sites i and j on the DNA (every pair is counted once). Further, J is an energetic cost related to interfacial tensions. Numerical solutions for the time dependence of condensation as well as for the condensation patterns in the steady state show that the model captures key features seen in experiments: (1) the formation of condensates that coexist with regions that are in the adsorbed state (Figs. 2d and 3c); (2) the dependence of the condensed fraction on protein bulk concentration (Fig. 2d); (3) the sequence dependence of the average spatial condensation pattern in the steady state (Fig. 3c and Extended Data Fig. 9c,f,g); and (4) the time dependence of condensate formation as revealed in kymographs (Fig. 3c and Extended Data Fig. 9f,g). Notably, our model connects the molecular scale with the emerging condensation patterns that involve many molecules, and can be used to predict the condensation pattern for a given sequence of DNA (Fig. 3b,c).
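
To make the two-state model concrete, the following minimal sketch evaluates the energy E for an open one-dimensional chain and samples configurations with single-site Metropolis updates. The choice of Metropolis dynamics, the open boundary conditions and all function names are assumptions made for illustration; the numerical scheme and parameter values actually used are those given in the Supplementary Note.

```python
import math
import random

def energy(spins, J, h, h_local):
    """E = -J * sum_<i,j> s_i s_j - sum_i (h + h_i) s_i for an open 1D chain."""
    coupling = sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))
    field = sum((h + hi) * s for hi, s in zip(h_local, spins))
    return -J * coupling - field

def metropolis_sweep(spins, J, h, h_local, rng):
    """One sweep of single-site Metropolis updates (energies in units of kT)."""
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        left = spins[i - 1] if i > 0 else 0
        right = spins[i + 1] if i < n - 1 else 0
        # Energy change for flipping s_i -> -s_i.
        delta = 2.0 * spins[i] * (J * (left + right) + h + h_local[i])
        if delta <= 0 or rng.random() < math.exp(-delta):
            spins[i] = -spins[i]

def condensed_fraction(spins):
    """Fraction of sites in the condensed state (s_i = +1)."""
    return sum(1 for s in spins if s == 1) / len(spins)
```

Starting from an all-adsorbed chain (`[-1] * N`) and repeating `metropolis_sweep` with a seeded `random.Random` generator while increasing h would trace out how the condensed fraction switches on, with sites of large hi condensing first.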

Fig. 3: Two-state heterogeneous model of prewetting on DNA captures the sequence-dependent switching of Klf4 to a condensed state.

a, Schematic of the two-state heterogeneous model of prewetting (Supplementary Note). The DNA is considered as a one-dimensional lattice of sites that can be in either one of the two states, adsorbed state (si = –1) or condensed state (si = +1). b, DNA sites have an inhomogeneous propensity for condensation (denoted by hi; Supplementary Note) given by the binding energy landscape inferred from the position weight matrix (Fig. 2g and Extended Data Fig. 9a, bottom). c, Average experimental kymograph ([Klf4-GFP]: 105–281 nM, N = 34; top) shows a spatial localization pattern consistent with the coarse-grained binding energy landscape (middle, colour map representation of hi; Fig. 2h). The average kymograph obtained from the model (N = 400; bottom) captures this pattern. The spatial dimension of the model kymograph was convolved with a Gaussian function with a full-width at half-maximum corresponding to that of the point spread function of the microscopy system (Extended Data Fig. 3c). The unconvolved average kymograph and individual kymographs are shown in Extended Data Fig. 9f,g.

To further analyse the interplay of surface condensation and sequence, we fused a maltose-binding protein (MBP) tag to the disordered N-terminus of Klf4. For this variant, which has an unchanged DNA-binding domain, no bulk-phase separation was observed (data not shown). Notably, MBP-Klf4 forms a thin adsorption layer on λ-DNA following a Hill–Langmuir adsorption isotherm37, which saturates at a density of less than one molecule per binding site (Fig. 4b, top, Extended Data Fig. 4e,f and Supplementary Note). We next removed the MBP tag from Klf4 after DNA binding (Fig. 4a). Strikingly, the adsorbed layer rapidly rearranged into several condensed foci that localized to positions predicted from the sequence (Fig. 4b,c, Extended Data Fig. 10a–f and Supplementary Video 4). These results show that the properties that drive bulk-phase separation also enable the formation of Klf4 condensates on DNA in a sequence-dependent manner.

Fig. 4: Surface condensation enables robust sequence-specific localization of Klf4.

a, Schematic depicting the in situ condensation assay: MBP-Klf4-GFP-coated DNA is transferred to a Klf4-free buffer containing 3C protease that can cleave off the MBP tag. b, Top, confocal images before (time, t = 0 s) and after (t = 350 s) transfer to the 3C solution. Bottom, representative kymograph of the 3C-dependent rearrangement of Klf4-GFP on DNA (Extended Data Fig. 10a–f and Supplementary Video 4). c, Top, colour map representation of the average intensity profile (N = 22 experiments) 350 s after cleavage induction. Bottom, colour map representation of the coarse-grained binding energy landscape (hi; Fig. 2h). d, Average intensity along the DNA for Klf4-GFP (red) and MBP-Klf4-GFP (black) as a function of concentration. Light dots, individual experiments; filled circles, binned medians (Klf4-GFP: 11–34 experiments per bin; MBP-Klf4-GFP: 3–9 experiments per bin); error bars, 95% confidence interval; red line, fit of Klf4-GFP data using the two-state heterogeneous model (Supplementary Note); black line, fit of MBP-Klf4-GFP data to the Hill–Langmuir model (Supplementary Methods). e, Correlation between the coarse-grained binding energy landscape (hi; Fig. 2h,c, bottom) and the average intensity profile (quantified by Pearson’s correlation coefficient) for Klf4-GFP (red) and MBP-Klf4-GFP (black) as a function of protein concentration (Extended Data Fig. 10g,h). Circles, correlation resulting from the average intensity profiles selected from a moving window along the concentration axis (N = 15 and 12 experiments per bin for Klf4-GFP and MBP-Klf4-GFP, respectively); shaded area, 95% confidence interval (bootstrapping); red line, prediction from the two-state heterogeneous model for condensation using parameter values extracted from the fit in Fig. 2d; black line, prediction from the two-state heterogeneous model for adsorption (Supplementary Note provides the parameter values and Extended Data Fig. 9d,e).

We analysed the correlation between the protein localization pattern and the underlying DNA sequence as a function of bulk protein concentration (Fig. 4e and Extended Data Fig. 10g,h). For MBP-Klf4, the correlation coefficient initially increases until it reaches a maximum at ~70 nM (ρ = 0.66), followed by a sharp loss of correlation at higher MBP-Klf4 concentrations. This illustrates that the sequence sensitivity of protein localization patterns depends on the protein concentration, and is lost at concentrations beyond the typical binding constant, as expected from the Hill–Langmuir binding kinetics38. In the case of Klf4 without the MBP tag, the correlation initially shows a similar increase as a function of concentration (ρ = 0.76 at 83 nM), but here the correlation remains high at higher protein concentrations. This reveals that above a certain concentration, the pattern of sequence-dependent localization of condensates is insensitive to bulk protein concentration. We conclude that in contrast to single-molecule binding, surface condensation enables a large dynamic range of bulk concentrations for which the localization pattern remains sequence specific.

Prewetting is an attractive concept for transcription factors because it provides a mechanism for the sequence-dependent formation of small condensates on DNA that are limited in size by interactions with the DNA surface. The transition from an adsorbed layer to a condensed layer serves as a collective amplifier of sequence information that effectively expands the dynamic range at which sequence specificity is achieved. However, how can surface condensation maintain sequence specificity at higher concentrations? Two mechanisms are at play here. First, in the adsorbed state, sequence information is independently used by individual molecules, whereas in the condensed state, sequence information is collectively integrated by the molecules. This is presumably because surface condensation is triggered by a local increase in concentration, which is promoted by the local clustering of binding sites11 as previously suggested for the formation of transcriptional condensates in the bulk. This might explain how pioneer transcription factors can distinguish recognition sites within enhancers from isolated sites in other regions of the genome39,40. Second, when molecules adsorb independently, binding becomes saturated at higher bulk concentrations because each site can only be occupied once41. However, a condensate can accommodate a variable number of molecules, even as the concentration increases. Indeed, molecules associated with DNA will be incorporated into existing condensates either directly or via one-dimensional diffusion34,39,42,43 rather than occupying unfavourable sites. Consequently, a further increase in protein concentration can result in the growth of condensates without altering their localization pattern, rendering the process insensitive to molecular noise44,45.

Since polymer-surface-mediated condensation leads to the formation of liquid-like compartments, features such as the fusion of transcriptional condensates and recruitment of downstream factors that have been observed previously7,8,9,28 can be accounted for here. The limited size of transcription factor condensates provides a possible explanation for the small size of transcriptional foci observed in vivo8,46,47. We suggest that polymer-surface-mediated condensation provides a general framework to explain the formation of other nuclear condensates such as heterochromatin or paraspeckles on chromatin or RNA surfaces48,49,50,51,52,53.

Methods

Protein expression and purification

Proteins were expressed in Sf9 cells (Expression Systems, 94-001F) for 72 h using the baculovirus system55. For all the Klf4 constructs, the preparation was done on ice using pre-cooled solutions. The cells were resuspended in lysis buffer (50 mM bis-tris propane (pH 9.0), 500 mM KCl, 500 mM arginine-HCl, 6.25 µM ZnCl2, 5% glycerol, EDTA (ethylenediaminetetraacetic acid)-free protease inhibitor cocktail set III (Calbiochem) and 0.25 U ml–1 benzonase (in-house)) and lysed by sonication. The lysate was cleared by centrifugation for 1 h at 13,000 × g and 4 °C. The supernatant was incubated with amylose resin (NEB) for at least 30 min at 4 °C. After washing with wash buffer I (50 mM bis-tris propane (pH 9.0), 500 mM KCl, 500 mM arginine-HCl and 5% glycerol), the beads were transferred into Econo-Pac gravity columns (Bio-Rad) and washed with wash buffer II (50 mM bis-tris propane (pH 9.0), 1 M KCl, 500 mM arginine-HCl and 5% glycerol) followed by wash buffer I. MBP-Klf4 was eluted using elution buffer (50 mM bis-tris propane (pH 9.0), 500 mM KCl, 500 mM arginine-HCl, 10 mM maltose and 5% glycerol). The eluate was concentrated using Vivaspin 50,000 MWCO concentrators (GE Healthcare or Sartorius) and subjected to size-exclusion chromatography at 4 °C using a Superdex 200 column (GE Healthcare) and size-exclusion chromatography buffer (50 mM bis-tris propane (pH 9.0), 500 mM KCl, 5% glycerol and 1 mM DTT (dithiothreitol)).

After concentrating the sample as described above, the proteins were stored at 4 °C for no longer than 2 weeks. MBP-Klf4 was buffer exchanged using Zeba Spin Desalting Columns (Thermo Scientific) into Klf4 buffer (25 mM tris (pH 7.4), 500 mM KCl, 1 mM DTT and 0.1 mg ml–1 BSA (bovine serum albumin)). For MBP-Klf4-GFP (plasmid TH1528) and MBP-Klf4-mCherry (plasmid TH1529), unless stated otherwise, the MBP moiety was cleaved off with 10% (v/v) 3C protease (in-house; 1 U µl–1) for at least 1 h on ice (Extended Data Fig. 1a). For MBP-Klf4-MBP (plasmid TH1696), both MBP tags were cleaved off with 10% (v/v) 3C protease and 10% (v/v) TEV (tobacco etch virus) protease (both in-house) for at least 2 h on ice (Extended Data Fig. 2c). In both cases, the sample was spun for 10 min at 20,000 × g and 4 °C and the concentration was remeasured using either absorbance at 280 nm or GFP fluorescence.

Phase separation assays

Klf4-GFP was kept on ice and diluted with a pre-cooled solution to prevent premature phase separation at higher temperatures (Extended Data Fig. 2f). The protein was pre-diluted with cold Klf4 buffer to four times the final concentration and then mixed in a ratio of 1:4 with cold dilution buffer (25 mM tris (pH 7.4), 1 mM DTT and 0.1 mg ml–1 BSA) to obtain the Klf4 assay buffer (25 mM tris (pH 7.4), 125 mM KCl, 1 mM DTT and 0.1 mg ml–1 BSA) in a total volume of 20 µl. For assays containing DNA, the dilution buffer would also contain appropriate amounts of DNA. The samples were mixed by pipetting, and 18 µl was transferred into 384-well medium-binding microplates (Greiner Bio-One). The samples were incubated at room temperature for 20 min before imaging. Samples that contained DNA were additionally spun at 3,200 × g for 2 min. The images were taken using an Andor Eclipse Ti inverted spinning-disc microscope with an Andor iXon 897 electron-multiplying charge-coupled device camera and a UPLSAPO ×40/0.95 numerical aperture (NA) air objective or ×60/1.20 NA water-immersion objective (Nikon). Data from at least three independent experiments were averaged. Data analysis was performed as described elsewhere56. For untagged Klf4, differential interference contrast microscopy was done using a Zeiss LSM 880 inverted single-photon point-scanning confocal system utilizing a transmitted-light detector and ×40/1.2 NA C-Apochromat water-immersion objective (Zeiss), which is suitable for differential interference contrast.
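
The mixing step can be checked with a short calculation. The sketch below assumes that the 1:4 ratio means one part protein solution in four parts total (5 µl plus 15 µl in the 20 µl reaction), which is the reading consistent with a final KCl concentration of 125 mM; the function name and the example protein concentration are hypothetical.

```python
def mix_concentration(stock_conc, stock_volume, diluent_conc, diluent_volume):
    """Concentration after mixing two solutions, assuming additive volumes."""
    total = stock_volume + diluent_volume
    return (stock_conc * stock_volume + diluent_conc * diluent_volume) / total

# 5 µl of protein in Klf4 buffer (500 mM KCl) plus 15 µl of KCl-free dilution buffer.
kcl_final_mM = mix_concentration(500.0, 5.0, 0.0, 15.0)

# Protein pre-diluted to four times a hypothetical final concentration of 0.5 µM.
protein_final_uM = mix_concentration(4 * 0.5, 5.0, 0.0, 15.0)
```

Under this reading, `kcl_final_mM` evaluates to 125 mM, matching the stated Klf4 assay buffer, and the protein ends up at one quarter of its pre-diluted concentration.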

Determining the concentration of the dilute phase

The Klf4 samples were set up as those for the phase separation assays, but instead of transfer to a microplate, the samples were incubated for 20 min at room temperature in 1.5 ml Eppendorf tubes. To obtain a standard curve, samples with a final KCl concentration of 500 mM were also prepared and treated in parallel. After incubation, the samples were spun in a temperature-controlled centrifuge at 20,000 × g for 15 min at 21 °C. Here 5 µl supernatant was added to 15 µl Klf4 buffer or, in the case of the control samples, a corresponding buffer, to reach the same final KCl concentration. These samples were transferred to 384-well non-binding microplates (Greiner Bio-One) and imaged with a wide-field fluorescence microscope (DeltaVision Elite, Applied Precision) using ×10/0.4 NA dry objective and a Photometrics electron-multiplying charge-coupled device camera. The median fluorescence values for each field of view were obtained using the Fiji software (https://fiji.sc/). The control samples were used to generate a standard curve that correlates fluorescence intensity with protein concentration. This curve was then used to calculate the original protein concentration in the sample supernatant. To determine CSAT, we fitted a two-component piecewise linear function to the curve of dilute phase concentration versus the total concentration (Extended Data Fig. 2a, left). The number of points considered on each side was varied and the two optimal lines were selected as those having the maximum difference between their slopes. The CSAT value for a given dataset was determined as the intercept of the lowest slope curve (Extended Data Fig. 2a, left). Fitting was done in MATLAB (version R2018b).
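
The piecewise fitting procedure can be sketched as follows. The analysis in this study was done in MATLAB, so this pure-Python version is only an illustrative reimplementation under assumptions: the function names are hypothetical, the intercept is read as the y-intercept of the flatter line, and at least two points are required on each side of the split.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_csat(total, dilute, min_points=2):
    """Scan split points; keep the pair of lines whose slopes differ most.

    total and dilute are equally long lists sorted by total concentration.
    Returns the y-intercept of the lower-slope line as the C_SAT estimate.
    """
    best = None
    for k in range(min_points, len(total) - min_points + 1):
        left = fit_line(total[:k], dilute[:k])
        right = fit_line(total[k:], dilute[k:])
        gap = abs(left[0] - right[0])
        if best is None or gap > best[0]:
            best = (gap, left, right)
    if best is None:
        raise ValueError("need at least 2 * min_points data points")
    _, left, right = best
    flat = min(left, right, key=lambda line: line[0])
    return flat[1]
```

For idealized data in which the dilute-phase concentration follows the total concentration up to a plateau, the plateau line has zero slope and its intercept equals the plateau value, which is the saturation concentration.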

EMSAs

Reactions were set up at 4 °C at the indicated final protein concentrations. The Klf4 samples contained 25 mM tris (pH 7.4), 125 mM KCl, 6% glycerol, 1 mM DTT, 0.1 mg ml–1 BSA, 7.5 nM Cy5-dsDNA and 37.5 ng poly-d(IC) (poly-(5'-phosphono-3'-deoxy-cytidine compound with 5'-phosphono-2'-deoxy-inosine)). The oligonucleotides used in this study are listed in Supplementary Table 2. The absence of condensed material under these conditions was confirmed by fluorescence microscopy (data not shown). The samples were incubated for 20 min at 4 °C before they were loaded onto a pre-run 4–20% Novex TBE gel (Invitrogen). Electrophoresis was performed at 250 V for 45 min in TBE buffer (89 mM tris, 89 mM boric acid and 2 mM EDTA). The gels were then imaged using a Typhoon FLA 9500 fluorescence imager (GE Healthcare). Band intensities were determined using the Fiji software, and data plotting and fitting were done in MATLAB. The following expression was used to fit the data57:

$$f = b + \frac{m - b}{1 + \left( \frac{K_\mathrm{d}}{P_\mathrm{t}} \right)^n},$$

where Pt is the total protein concentration and Kd is the dissociation constant; m and b are normalization factors for the upper and lower asymptotes of the DNA titration curve, respectively; and n is the Hill coefficient.
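
The expression above can be fitted in several ways. The fitting in this study was done in MATLAB, so the following is only a minimal sketch of one possible approach: a grid search over Kd and n, with b and m obtained by linear least squares at each grid point, since the model is linear in those two parameters. The function names and the grids are hypothetical.

```python
def hill(p_total, kd, n, m=1.0, b=0.0):
    """f = b + (m - b) / (1 + (Kd / Pt)^n), defined for Pt > 0."""
    return b + (m - b) / (1.0 + (kd / p_total) ** n)

def fit_hill(p_values, f_values, kd_grid, n_grid):
    """Grid search over (Kd, n); returns (sse, Kd, n, m, b) with the smallest sse."""
    best = None
    for kd in kd_grid:
        for n in n_grid:
            g = [1.0 / (1.0 + (kd / p) ** n) for p in p_values]
            # The model is linear in (b, m): f = b * (1 - g) + m * g.
            a11 = sum((1 - gi) ** 2 for gi in g)
            a12 = sum((1 - gi) * gi for gi in g)
            a22 = sum(gi ** 2 for gi in g)
            r1 = sum((1 - gi) * fi for gi, fi in zip(g, f_values))
            r2 = sum(gi * fi for gi, fi in zip(g, f_values))
            det = a11 * a22 - a12 * a12
            if abs(det) < 1e-12:
                continue  # b and m are not separately identifiable here
            b = (r1 * a22 - r2 * a12) / det
            m = (a11 * r2 - a12 * r1) / det
            sse = sum((b * (1 - gi) + m * gi - fi) ** 2
                      for gi, fi in zip(g, f_values))
            if best is None or sse < best[0]:
                best = (sse, kd, n, m, b)
    return best
```

At Pt = Kd the bracketed term equals one half of (m − b) regardless of n, which gives a quick sanity check on any fitted curve.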

Optical tweezers with confocal microscopy

Experiments involving optical tweezers were performed on a Lumicks C-trap instrument with integrated confocal microscopy and microfluidics. Bacteriophage λ-DNA was biotinylated on both ends as described elsewhere29. Attachment of the λ-DNA-dCas9 complex to 4.42 μm Spherotech streptavidin-coated polystyrene beads was done using the laminar flow. For all the experiments, the trap position was kept constant to render an average force of 8.22 ± 2.65 pN (Extended Data Fig. 3f).

The protein stock was centrifuged for 10 min at 20,000 × g. The supernatant concentration was measured and diluted in Klf4 assay buffer following a dilution series: the solution containing the maximum concentration of a given series was flushed into the flow cell. After recording for 10–15 experiments, the remaining volume in the syringe was removed, and the protein was diluted and reloaded into the syringe. The flow chamber was flushed before each experiment and sealed during the course of it.

For confocal imaging, a 488 nm laser was used for excitation, with emission detected in the channel with a blue filter (525/25 nm). After a λ-DNA molecule was tethered between the beads, an image of the dCas9-EGFP (enhanced green fluorescent protein) probe was acquired in the buffer channel with 10% excitation intensity (this imaging setting is referred to as high excitation). We then started continuous acquisition with 5% excitation intensity as the beads–DNA system was transferred to the channel of the microfluidics chip containing the protein of interest. The interaction process was monitored for 200 s at a frame rate of 1 s–1 with a low pixel integration time of 0.08 ms (Fig. 1g and Extended Data Fig. 3). After 200 s, an image was acquired using the high-excitation imaging conditions. Analysis of the intensity distributions and quantification of the number of molecules (Fig. 2a,b and Extended Data Fig. 4) were done for the high-excitation settings. To determine the number of molecules per cluster over time (Fig. 1h), time series were acquired using a pixel integration time of 1 ms (referred to as low excitation; Extended Data Fig. 3), conditions in which the dCas9-EGFP probe was detectable for the first few frames, before the beads–DNA system reached the protein solution.

For the in situ condensation assay (Fig. 4b and Extended Data Fig. 10), after the binding process was recorded, the beads–DNA system was transferred back to the buffer channel containing either the assay buffer or the assay buffer with 2% (v/v) 3C protease (in-house; 1 U μl–1). This process was recorded for more than 500 s under the low-excitation settings at a frame rate of 0.2 s–1.

Fluorescence recovery after photobleaching (FRAP) experiments were performed as follows: after a binding experiment (low-excitation settings), the chamber was gently flushed. A pre-bleach time series was acquired for 20 s. A smaller region of interest (ROI) for the FRAP experiment was imaged with a high-excitation laser intensity (90% excitation). To capture recovery, a 200-s-long time series was then acquired at a frame rate of 1 s–1 (Extended Data Fig. 5).

Analysis of tweezers data

Intensity emission per EGFP

For each experiment, confocal images of the dCas9-sgRNA-λ-DNA complex were acquired under the high-excitation settings. For the time series (low-excitation settings), the dCas9 probe was detectable for the first few frames, before the beads–DNA complex reached the protein solution. To confirm the position of the dCas9 probe, intensity profiles along the DNA were aligned using the bead centres as the reference and flipped when required. To confirm the position of the target sequence, the target locations were superimposed with the average profile (Extended Data Fig. 3). These alignment criteria were then used to analyse all the Klf4 and MBP-Klf4 intensity profiles shown throughout this work. The sequence information was converted to spatial units by taking into account the extension per base pair (xbp = 0.32 nm bp–1) at the average experimental force. The integration of the total intensity in an ROI of 21 pixels × 21 pixels around the detected probe rendered the total number of counts under the given imaging conditions. The probability distribution of integrated counts for several experiments exhibits a multi-mode Gaussian distribution, consistent with having four sites for dCas9 binding in λ-DNA (Supplementary Table 3). A fit to a Gaussian mixture model rendered the mean and standard deviation of each mode. The emission intensity per EGFP was then calculated as follows:

$$I_{\mathrm{GFP}} = \frac{{\mathop {\sum }\nolimits_{j = 1}^{N - 1} \left( {I_{j + 1} - I_j} \right)}}{{N - 1}},$$

where Ij is the mean of mode j and N is the number of modes (Extended Data Fig. 3 and Supplementary Table 5).
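
The calculation above amounts to averaging the spacing between consecutive mode means. A minimal Python sketch follows; the function name and the example mode means are hypothetical and are not values from Supplementary Table 5.

```python
def intensity_per_egfp(mode_means):
    """Average spacing between consecutive Gaussian-mixture mode means.

    mode_means: mean integrated counts of each mode (one mode per number
    of bound dCas9-EGFP probes), in any order.
    """
    means = sorted(mode_means)
    n_modes = len(means)
    if n_modes < 2:
        raise ValueError("need at least two modes")
    # Differences I_{j+1} - I_j between neighbouring modes.
    diffs = [means[j + 1] - means[j] for j in range(n_modes - 1)]
    return sum(diffs) / (n_modes - 1)


# Hypothetical mode means (arbitrary counts), for illustration only.
example_means = [105.0, 198.0, 310.0, 401.0]
i_gfp = intensity_per_egfp(example_means)
```

Because the sum telescopes, the result equals the difference between the highest and lowest mode means divided by N − 1.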

Intensity distributions

The pixel values used in the calculation of intensity distributions were obtained as follows: after background subtraction (to remove the contributions from the protein in solution), the maximum projection intensity profile along the DNA was determined in a region of 20 pixels around the DNA axis (Extended Data Fig. 7). We next filtered the profiles with a spatial mask. In brief, using the ‘findpeaks’ function in MATLAB, we detected the peaks above a threshold corresponding to the background value of the background-subtracted image (Extended Data Fig. 7). Data points in a five-pixel window, along the horizontal direction and centred at the position of each peak, were selected. The window was displaced from left to right and accepted if there was no overlap. From the histograms of the obtained pixel intensity values, we computed the probability density of the logarithm of pixel intensities58. The probability density versus logarithm of intensities was fitted to either a one- or a two-component Gaussian mixture model in the linear scale. To compare the intensity distributions (Fig. 2b) with the intensity distributions over time (Fig. 2c, Extended Data Fig. 4g and Supplementary Video 3), time-series images were multiplied by a factor (13.4 ± 2.9) to compensate for the intensity-value differences between the low- and high-excitation imaging conditions. This factor was obtained as follows: from the last frame of each time series and the corresponding high-excitation image acquired immediately thereafter, we computed the mean intensity in an ROI of 30 pixels × 100 pixels in the centre of the confocal image. The time-series intensities were then multiplied by the ratio of these means.
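
The masking step can be sketched in Python as follows. This is a simplified stand-in, not the MATLAB code used here: peaks are detected as simple local maxima above the threshold rather than with ‘findpeaks’, and the function name and toy profile are hypothetical.

```python
def masked_pixels(profile, threshold, half_width=2):
    """Indices kept by non-overlapping windows centred on detected peaks.

    half_width=2 gives the five-pixel window described in the text.
    """
    # Local maxima above the threshold (stand-in for MATLAB's findpeaks).
    peaks = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > threshold
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]
    kept = []
    last_end = -1  # right edge of the last accepted window
    for p in peaks:  # peaks are already ordered from left to right
        start = max(p - half_width, 0)
        end = min(p + half_width, len(profile) - 1)
        if start > last_end:  # accept the window only if it does not overlap
            kept.extend(range(start, end + 1))
            last_end = end
    return kept


# Toy profile: peaks at indices 2, 7 and 9; the window around 9 overlaps
# the window around 7 and is therefore rejected.
profile = [0, 1, 5, 1, 0, 0, 2, 9, 3, 8, 2, 0, 0]
selected = masked_pixels(profile, threshold=4)
```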

Classification of pixels into adsorbed or condensed

Pixels were classified into adsorbed or condensed based on their intensity. An intensity above the background and below the layer threshold resulted in a classification as adsorbed, whereas an intensity above the layer threshold was classified as condensed. To determine the background threshold, we extracted the background values (after background subtraction) along a line away from the DNA and pooled all the experiments corresponding to Klf4-GFP. The probability density of the logarithm of pixel intensities was fitted to a normal probability density function. The background threshold was defined as the mean plus three times the standard deviation of this distribution (Extended Data Fig. 7). To determine the layer threshold, we computed the probability density of the logarithm of pixel intensities along the masked maximum projection profiles, pooling 60 Klf4-GFP experiments recorded at low concentrations ([Klf4]: 3–80 nM). We extracted the mean value of this distribution by fitting the data to a normal probability density function. We next computed the same quantity for 37 experiments recorded at higher concentrations ([Klf4]: 210–281 nM). Here the probability density shows bimodality, and we fitted this distribution to a two-component Gaussian mixture model, constraining the mean of the low-intensity mode to the value obtained at low concentrations. Fitting was done in MATLAB using the ‘nonlinear least squares’ method and weights of 1/(probability density) + w, where w = 10 sets the strength of the weights.
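
The classification logic can be illustrated with the following minimal Python sketch. Several choices here are assumptions made for illustration: thresholds are compared on the natural-logarithm scale, the background threshold uses sample moments rather than a fitted normal density, and the layer threshold is passed in as a given number rather than derived from the mixture-model fit. All function names and example values are hypothetical.

```python
import math
import statistics


def background_threshold(background_pixels):
    """Mean + 3 SD of the log-intensities (returned on the log scale).

    Assumption: sample mean and SD replace the fitted normal density.
    """
    logs = [math.log(v) for v in background_pixels if v > 0]
    return statistics.mean(logs) + 3.0 * statistics.pstdev(logs)


def classify_pixel(intensity, log_bg_threshold, log_layer_threshold):
    """Label one pixel as 'background', 'adsorbed' or 'condensed'."""
    if intensity <= 0:
        return "background"  # log is undefined; treat as background
    log_i = math.log(intensity)
    if log_i <= log_bg_threshold:
        return "background"
    if log_i <= log_layer_threshold:
        return "adsorbed"
    return "condensed"


# Hypothetical thresholds and intensities, for illustration only.
log_bg = math.log(2.0)
log_layer = math.log(20.0)
labels = [classify_pixel(v, log_bg, log_layer) for v in [1.0, 5.0, 50.0]]
```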

Condensed fraction

The condensed fraction (Fig. 2d) was determined for each experiment as the number of pixels falling into the condensed category divided by the length of the considered ROI (161 pixels). For this calculation, we considered the pixels obtained by the masking procedure (as discussed above; Extended Data Fig. 7). Binned medians with error bars (95% confidence interval) were obtained by bootstrapping (‘bootci’ function in MATLAB) using 10,000 bootstrap samples. Binned medians contain 11–36 individual experiments.
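
A minimal Python sketch of these two steps is given below. It uses a plain percentile bootstrap, which is a simpler technique than, and may differ from, the interval type computed by ‘bootci’; the function names, the random seed and the example data are hypothetical.

```python
import random
import statistics

ROI_LENGTH = 161  # pixels, as stated in the text


def condensed_fraction(labels, roi_length=ROI_LENGTH):
    """Number of pixels labelled 'condensed' divided by the ROI length."""
    return sum(1 for lab in labels if lab == "condensed") / roi_length


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-experiment labels and fractions, for illustration only.
example_labels = ["condensed"] * 16 + ["adsorbed"] * 10
fraction = condensed_fraction(example_labels)
fractions_in_bin = [0.02, 0.05, 0.04, 0.10, 0.07, 0.03, 0.06, 0.08, 0.05, 0.09, 0.04]
ci_low, ci_high = bootstrap_median_ci(fractions_in_bin, n_boot=2000)
```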

Analysis of Klf4 sequence-specific binding to λ-DNA

To assess the sequence specificity of Klf4 localization on λ-DNA, we used a Klf4 sequence motif reported elsewhere26. The reported sequence logo was converted to a position weight matrix using the Logo2PWM tool59. A position weight matrix contains one row for each of the four DNA bases and a column for each position of the motif (Fig. 2g, Extended Data Fig. 8 and Supplementary Table 1). The values of the matrix represent the relative frequency of finding a certain base at a given position within the motif. Since these matrices are generally derived from the genome-wide in vivo data of protein binding to DNA, they inform us of how likely it is to find a protein bound to a given sequence of bases.

We denote the matrix by \(M_{nb}^\mathrm{F}\), where \(M_{nb}^\mathrm{F}\) is the relative frequency of finding base b at nucleotide position n on the forward strand when the protein is bound to DNA. Here b can be any of the four possible nucleotides, namely, A, T, G or C. The position weight matrix describing the frequency of bases on the complementary (reverse) strand is defined analogously and denoted as \(M_{nb}^\mathrm{R}\). Although position weight matrices give a fairly accurate estimate of the consensus sequences, any sequence further away from the consensus (represented by small values in the position weight matrix) is not well represented60. To account for this limitation, any element in the position weight matrix where Mnb ≤ e is replaced by a minimal value e. To discuss Klf4 binding to either strand of DNA, we therefore use the following average position weight matrix:

$$\bar M_{nb} = \frac{{M_{nb}^\mathrm{F} + M_{nb}^\mathrm{R}}}{2}.$$
(1)

For a sequence that differs from the consensus sequence only by having base b at position n, the probability ratio λn,b of Klf4 binding to the two sequences is related to the position weight matrix by

$$\bar M_{nb} = \frac{{\lambda _{n,b}}}{{\mathop {\sum }\nolimits_b \lambda _{n,b}}}.$$
(2)

Using this equation, we can obtain the probability ratio of Klf4 binding to a given sequence \(\bar B\) (where \(\bar B = \left( {b_1,...,b_L} \right)\); bn is the base at position n along the sequence) relative to the consensus motif, which reads

$$P\left( {\bar B} \right) = \mathop {\prod }\limits_{n = 1}^L \lambda _{n,b_n},$$
(3)

where L is the number of bases along the sequence. Here P lies between 0 and 1; for consensus sequence \(\bar B^ \ast\), \(P(\bar B^ \ast ) = 1\). We use equation (3) to infer the landscape of Klf4-λ-DNA-binding probability, as shown in Fig. 2h. To infer the binding energy landscape along λ-DNA from the position weight matrix, we use the following equation:

$${\it{\epsilon }}_{b,n} - {\it{\epsilon }}_{b^ \ast ,n} = - kT\ln \left( {\frac{{\bar M_{nb}}}{{\bar M_{nb^ \ast }}}} \right),$$
(4)

where εb,n is the binding energy contribution of base bn at nucleotide position n. Further, εb*,n is the binding energy contribution corresponding to consensus base \(\left( {b_n^ \ast } \right)\) at nucleotide position n. Hence, the binding energy difference for a given sequence \(\bar B\) with respect to binding to the consensus sequence is given by

$$\delta E\left( {\bar B} \right) = \mathop {\sum }\limits_n \left( {{\it{\epsilon }}_{b,n} - {\it{\epsilon }}_{b^ \ast ,n}} \right).$$
(5)

Equations (4) and (5) allow us to infer the binding energy landscape for Klf4 binding to λ-DNA (Fig. 2h and Extended Data Figs. 8 and 9a). For details of how we obtain these equations, see Supplementary Note.
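
To make the use of the position weight matrix concrete, the following minimal Python sketch computes the probability ratio of equation (3) for each window along a sequence, together with the summed log ratio that enters equations (4) and (5) in units of kT. It assumes that an averaged matrix is already available and takes the consensus base at each position to be the one with the largest matrix entry, so that λ for the consensus base equals 1. The toy matrix, the value of e and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def floor_pwm(pwm, e):
    """Replace every entry <= e by e (pwm: list of dicts, base -> frequency)."""
    return [{b: max(col[b], e) for b in BASES} for col in pwm]


def binding_ratio(pwm, site):
    """Probability ratio P relative to the consensus site (equation (3))."""
    p = 1.0
    for col, base in zip(pwm, site):
        p *= col[base] / max(col.values())  # lambda_{n,b} = M_nb / M_nb*
    return p


def log_ratio_sum(pwm, site):
    """Sum over positions of ln(M_nb / M_nb*); its exponential equals P."""
    return sum(
        math.log(col[base] / max(col.values())) for col, base in zip(pwm, site)
    )


def scan(pwm, sequence):
    """Probability ratio for every window of motif length along a sequence."""
    length = len(pwm)
    return [
        binding_ratio(pwm, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


# Toy three-position matrix (not the Klf4 motif), for illustration only.
toy_pwm = floor_pwm(
    [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.0, "C": 0.1, "G": 0.8, "T": 0.1},
        {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
    ],
    e=0.01,
)
landscape = scan(toy_pwm, "TTAGCAA")
```

In this toy case the consensus site AGC gives P = 1, and the floor value e prevents the zero entry from forcing P to zero for sites such as AAC.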

Correlation as a function of concentration

Independent experiments were sorted based on the experimental protein concentration. Intensity profiles were normalized to their maximum intensity and then averaged in groups selected from a moving window along the concentration axis (Fig. 4e) or in specific concentration bins (Extended Data Fig. 10g,h). The coarse-grained binding energy profile was first interpolated to the dimensions of the experimental profile. For each average intensity profile, the correlation with the coarse-grained binding energy profile was then quantified as the Pearson correlation coefficient.
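
The steps above can be sketched in Python as follows. Linear interpolation is an assumption, since the interpolation scheme is not specified here, and all function names and example data are hypothetical.

```python
import math


def normalize_to_max(profile):
    """Scale a profile so that its maximum equals 1."""
    peak = max(profile)
    return [v / peak for v in profile]


def average_profiles(profiles):
    """Pixel-wise mean of equally long profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def interpolate_linear(values, n_out):
    """Resample values (length >= 2) onto n_out evenly spaced points."""
    n_in = len(values)
    if n_out == 1:
        return [values[0]]
    out = []
    for j in range(n_out):
        t = j * (n_in - 1) / (n_out - 1)  # fractional index into values
        i0 = min(int(t), n_in - 2)
        frac = t - i0
        out.append(values[i0] * (1 - frac) + values[i0 + 1] * frac)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: two intensity profiles and a coarse energy profile.
profiles = [normalize_to_max(p) for p in ([1, 4, 2, 8, 3], [2, 6, 2, 10, 4])]
mean_profile = average_profiles(profiles)
energy = interpolate_linear([0.0, -2.0, -1.0], len(mean_profile))
r = pearson(mean_profile, energy)
```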

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.