Sequence-dependent surface condensation of a pioneer transcription factor on DNA

Biomolecular condensates are dense assemblies of proteins that form distinct biochemical compartments without being surrounded by a membrane. Some, such as P granules and stress granules, behave as droplets and contain many millions of molecules. Others, such as transcriptional condensates that form on the surface of DNA, are small and contain thousands of molecules. The physics behind the formation of small condensates on DNA surfaces is still under discussion. Here we investigate the nature of transcription factor condensates using the pioneer transcription factor Krüppel-like factor 4 (Klf4). We show that Klf4 can phase separate on its own at high concentrations, but at low concentrations, Klf4 only forms condensates on DNA. Using optical tweezers, we demonstrate that these Klf4 condensates form on DNA as a type of surface condensation. This surface condensation involves a switch-like transition from a thin adsorbed layer to a thick condensed layer, which shows hallmarks of a prewetting transition. The localization of condensates on DNA correlates with sequence, suggesting that the condensate formation of Klf4 on DNA is a sequence-dependent form of surface condensation. Prewetting together with sequence specificity can explain the size and position control of surface condensates. We speculate that a prewetting transition of pioneer transcription factors on DNA underlies the formation and positioning of transcriptional condensates and provides robustness to transcriptional regulation. A DNA-binding protein condenses on DNA via a switch-like transition. Surface condensation occurs at preferential DNA locations suggesting collective sequence readout and enabling sequence-specificity robustness with respect to protein concentration.

liquid-like condensates that are enabled by interaction with the DNA surface. This sets their typical size and allows them to form below the saturation concentration for phase separation. By combining experiments with theory, we show that these condensates form via a switch-like transition similar to prewetting, a precursor to wetting that occurs below the saturation concentration for bulk-phase separation [17][18][19] . This transition amplifies the sequence specificity of Klf4 binding to DNA. Polymer-surface-mediated condensation reconciles several observations that were previously thought to be at odds with the idea of phase separation as an organizing principle in the nucleus.
The human pioneer transcription factor Klf4, one of the Yamanaka factors, is a driver of differentiation, cell growth and proliferation 21,22 . Klf4 has a domain organization typical of transcription factors: an activation domain predicted to be disordered and a structured DNA-binding region 23 . Human Klf4 with a C-terminal green fluorescent protein (GFP) tag, purified from insect cells, binds to DNA oligonucleotides in a sequence-specific manner [24][25][26][27] (Fig. 1a,b and Extended Data Fig. 1). In the absence of DNA, Klf4 forms liquid-like condensates at physiological salt and pH above a concentration of ~1.0 µM, which is above the estimated nuclear concentration of Klf4 28 (Fig. 1c,d, Supplementary Methods and Extended Data Figs. 1b and 2). Untagged Klf4 behaves in a similar way (Extended Data Fig. 2c-e). The addition of λ-DNA triggered the formation of foci below the saturation concentration (C SAT ) (Fig. 1e), confirming previous observations that DNA can trigger the formation of transcription factor foci 11 .
To further examine the behaviour of Klf4 on DNA, we used dual-trap optical tweezers combined with confocal microscopy to hold a linearized λ-DNA molecule stretched between two polystyrene beads (Fig. 1f) 29 . We observed many Klf4 foci on the DNA molecule at a Klf4 concentration of 115 nM; these foci varied in the amount of Klf4 they contained (Fig. 1g, Extended Data Figs. 3 and 4a-d and Supplementary Video 1). Notably, even at concentrations closer to C SAT (250 nM), the foci grew with time until reaching a finite size, with an average of approximately 800 molecules per cluster (Fig. 1h). Furthermore, foci can fuse, they recover after photobleaching and their position can fluctuate on DNA (Fig. 1i, Extended Data Fig. 5 and Supplementary Video 2). Importantly, because condensates form on DNA well below C SAT (Extended Data Figs. 2 and 3k), DNA does not serve as a classic nucleator for the formation of bulk-phase droplets of the kind depicted in Fig. 1c. This is because after nucleation, bulk-phase droplets can only grow if the solution remains above the saturation concentration 11,30,31 (Extended Data Figs. 3i and 6). This suggests that at concentrations below C SAT, a mechanism is at play that is qualitatively different from the standard picture of phase separation in a bulk solution.
To further study the physical nature of Klf4 foci on DNA, we investigated their dependence on protein concentration. Figure 2a shows representative fluorescence images and the corresponding traces of Klf4 intensity at different concentrations, recorded 200 s after the Klf4 solution was introduced into the observation chamber. The number and intensity of Klf4 foci increased with concentration (Fig. 2a). A probability density histogram of pixel intensities at concentrations above 210 nM reveals a bimodal distribution, indicative of two distinct populations of Klf4 foci (Fig. 2b and Extended Data Fig. 7). The low-intensity peak of the histogram characterizes Klf4 regions with, on average, less than one molecule per binding site (corresponding to a 10 bp footprint of the Klf4 zinc fingers 32 ; Extended Data Fig. 4a,b). We refer to this mode of association as the adsorbed state. The high-intensity peak encompasses Klf4 regions that contain foci with several hundred and up to a few thousand Klf4 molecules (corresponding to ~2-10 molecular layers of Klf4; Extended Data Fig. 4a,d). We refer to this mode of association as the condensed state. Next, we analysed the intensity histogram as a function of time (Fig. 2c). At low concentrations (below 80 nM), the histogram rapidly forms a peak at intensities corresponding to the adsorbed state, which persists over time (Fig. 2c, top). In contrast, at higher concentrations (above 210 nM), the histogram reveals a bifurcation: it first rapidly forms a peak at intensities corresponding to the adsorbed state, which subsequently decreases in amplitude, while a second peak emerges and quickly moves to higher intensities corresponding to the condensed state (Fig. 2c, bottom, Extended Data Fig. 4g and Supplementary Video 3). This bifurcation reveals a switch-like transition from an adsorbed state to a condensed state via a bimodal intensity distribution.
The occurrence of these two states and the bimodal intensity distribution depend on the concentration. Notably, the fraction of DNA occupied by the condensed state 200 s after exposure of DNA to Klf4 increases abruptly at a concentration C PW of 86 ± 5 nM (Fig. 2d). We conclude that in our experimental system, Klf4 condensates on DNA form in a two-step manner: at low concentrations, the protein merely adsorbs to DNA; at sufficiently high Klf4 concentrations, the adsorbed proteins switch to a thick condensed layer.
How does the DNA surface facilitate the formation of Klf4 condensates? Our data indicate that the formation of Klf4 condensates on DNA is a wetting phenomenon, best described as a process of surface condensation known as prewetting [17][18][19]33 . Such prewetting transitions occur at a concentration denoted as C PW , which is below the saturation concentration C SAT in the bulk (Fig. 2e,f) and can be understood as follows: Klf4 has an affinity for the DNA surface; hence, the concentration of Klf4 tends to increase in the vicinity of DNA. Therefore, local condensation is facilitated by the surface. However, it cannot extend far away from it, because condensation is not possible in the bulk. Put differently, protein-protein interactions are expected to mediate condensate formation, but the condensate is only stable because of DNA-protein interactions that confine it to the vicinity of the surface.
We next set out to investigate the role of DNA sequence in Klf4 surface condensation. The average intensity profile along the DNA across all Klf4 concentrations tested reveals enrichment at preferred locations (Fig. 2h). Several Klf4-binding motifs have been reported from in vivo and in vitro studies [24][25][26][27] . We chose five of these for further investigation and used the position weight matrices of these recognition motifs to infer the binding energy landscapes 34 for Klf4 on λ-DNA (Fig. 2g, Extended Data Fig. 8, Supplementary Table 1 and Methods). We find that the measured profile correlates positively with the energy landscapes inferred from four of the five motifs 24-27 (Fig. 2h and Extended Data Fig. 8a-c), with Pearson correlation coefficients of approximately 0.74. We refer to these as class A motifs. The Klf4 intensity profile did not correlate positively with the energy landscape inferred from the fifth motif, which we refer to as a class B motif 24 (Extended Data Fig. 8d). This shows that the position weight matrices of class A, but not class B, motifs provide an accurate parameterization of the binding energy landscape of Klf4 on λ-DNA. Interestingly, electrophoretic mobility shift assay (EMSA) analysis reveals that an oligonucleotide representing the class B consensus motif binds Klf4 about 1.3 kT more strongly than an oligonucleotide representing a class A consensus motif (Extended Data Fig. 1c-f and Supplementary Table 2). Even though Klf4 has a higher affinity for class B motifs, these are less well represented on λ-DNA than class A motifs, which explains the observed binding pattern (Fig. 1b and Extended Data Fig. 8e,f). These results show that in our experiments, the localization of Klf4 condensates on DNA is guided by the underlying DNA sequence. Future work, both in vivo and in vitro, will be required to provide a full parameterization of the sequence-dependent energy landscape of the interaction of Klf4 with DNA.
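Comparing a measured intensity profile with an inferred landscape comes down to a Pearson correlation between two profiles sampled on the same grid. The sketch below assumes both profiles have already been aligned along λ-DNA and resampled to one value per base pair; the helper names and the use of non-overlapping block averages for coarse-graining are illustrative assumptions, not the analysis code behind Fig. 2h.

```python
import numpy as np

def block_average(profile, block):
    """Coarse-grain a 1D profile by averaging non-overlapping blocks of `block` samples.
    Trailing samples that do not fill a whole block are dropped (an assumption)."""
    profile = np.asarray(profile, dtype=float)
    n_blocks = len(profile) // block
    return profile[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)

def pearson(x, y):
    """Pearson correlation coefficient between two profiles of equal length."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

# Synthetic stand-ins (not experimental data): a 48-kb landscape sampled per base pair
# and an intensity profile that depends linearly on it.
x = np.linspace(0.0, 6.0 * np.pi, 48_000)
landscape = np.sin(x)
intensity = 3.0 * landscape + 0.5
rho = pearson(block_average(intensity, 1000), block_average(landscape, 1000))
print(f"rho = {rho:.3f}")  # rho = 1.000
```

A 1,000-sample block corresponds to the 1-kb coarse-graining used for the landscape in Fig. 2h; with real data, noise and any nonlinearity between occupancy and energy pull ρ below 1.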
Fig. 2 (caption, panels e-h). e, … 17,54 . In the coexistence regime (dark green, liquid-liquid phase separation (LLPS)) and close to the surface, the dense phase transits from partial to complete wetting when the system crosses a characteristic temperature (yellow dashed line). This first-order transition extends into the single-phase region through the prewetting line (solid yellow line). f, Crossing the prewetting line (at C PW ) leads to condensation from a thin adsorbed layer (left and middle). Above the saturation concentration C SAT , liquid droplets spontaneously appear in the bulk (right). g, Consensus recognition motif of Klf4 in vivo 26 and probability of binding to a given sequence along λ-DNA relative to the consensus motif (Methods and Extended Data Fig. 8). h, Blue and colour map, binding energy landscape h_i (kT is the unit of energy, where k is the Boltzmann constant and T is the temperature), coarse grained over 1 kb (Methods and Extended Data Fig. 9a). Red, average Klf4 intensity along λ-DNA ([Klf4-GFP]: 8-281 nM, N = 79). Shaded area, standard error of the mean at 95% confidence.

The data so far indicate that Klf4 condensation on DNA corresponds to a prewetting phenomenon on a heterogeneous substrate, where the heterogeneity is provided by the DNA sequence. Position weight matrices reflect the binding preference of individual molecules for short DNA recognition motifs. However, the condensation seen in our experiments results from the collective behaviour of many molecules. How can we reconcile single-molecule binding at the length scale of a few base pairs with the sequence dependence of condensation at larger length scales (Fig. 2g,h)? We developed a simplified model that considers transitions between a thin adsorbed state and a thick condensed state, modulated by a binding energy landscape specified by the DNA sequence (Fig. 3a and Supplementary Note). We represent the stretched DNA polymer by a one-dimensional set of N discrete sites at which Klf4 association can be in either a thin adsorbed state (s_i = −1) or a thick condensed state (s_i = +1), where i = 1,...,N (Fig. 3a). This two-state model is formally equivalent to a heterogeneous Ising model 35,36 . Each site i corresponds to a putative Klf4 binding site spanning ten base pairs 32 . At low bulk concentrations, the thin adsorbed state is thermodynamically favoured, whereas at sufficiently high concentrations, the adsorbed molecules collectively switch to form a thick condensed state. This balance is captured by the energy h, which is proportional to the bulk chemical potential of Klf4. We capture the DNA sequence by introducing a site-dependent energy bias (h_i for site i) determined using the position weight matrix corresponding to a class A binding motif (Fig. 3b and Extended Data Fig. 9a, bottom). The free energy of condensation in the model is given by

$$F = -J \sum_{\langle ij \rangle} s_i s_j - \sum_{i=1}^{N} (h + h_i)\, s_i ,$$

where the first sum is over pairs ⟨ij⟩ of adjacent sites i and j on the DNA (every pair is counted once). Further, J is an energetic cost related to interfacial tensions. Numerical solutions for the time dependence of condensation as well as for the condensation patterns in the steady state show that the model captures key features seen in experiments: (1) the formation of condensates that coexist with regions in the adsorbed state (Figs. 2d and 3c); (2) the dependence of the condensed fraction on protein bulk concentration (Fig. 2d); (3) the sequence dependence of the average spatial condensation pattern in the steady state (Fig. 3c and Extended Data Fig. 9c,f,g); and (4) the time dependence of condensate formation as revealed in kymographs (Fig. 3c and Extended Data Fig. 9f,g). Notably, our model connects the molecular scale with the emerging condensation patterns that involve many molecules, and can be used to predict the condensation pattern for a given DNA sequence (Fig. 3b,c).
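The two-state model can be explored with a standard Metropolis Monte Carlo simulation of a one-dimensional Ising chain. This sketch assumes the free energy F = −J Σ s_i s_{i+1} − Σ (h + h_i) s_i in units of kT with open ends, starts from the all-adsorbed state and uses arbitrary illustrative parameters. Metropolis sampling is a generic choice made here; the numerical scheme used for the model is described in the Supplementary Note and may differ.

```python
import numpy as np

def flip_cost(s, i, J, h, h_site):
    """Free-energy change (in kT) for flipping site i, given
    F = -J * sum_i s_i s_{i+1} - sum_i (h + h_i) s_i with open boundaries."""
    neighbours = (s[i - 1] if i > 0 else 0.0) + (s[i + 1] if i < len(s) - 1 else 0.0)
    # Terms containing s_i equal -s_i * (J * neighbours + h + h_i); flipping s_i negates them.
    return 2.0 * s[i] * (J * neighbours + h + h_site[i])

def metropolis(h_site, J, h, n_sweeps, rng):
    """Run Metropolis sweeps from the all-adsorbed state (s_i = -1).
    Returns the final configuration and the condensed fraction after each sweep."""
    s = -np.ones(len(h_site))
    fractions = np.empty(n_sweeps)
    for sweep in range(n_sweeps):
        for _ in range(len(s)):
            i = rng.integers(len(s))
            dF = flip_cost(s, i, J, h, h_site)
            if dF <= 0 or rng.random() < np.exp(-dF):
                s[i] = -s[i]
        fractions[sweep] = np.mean(s > 0)
    return s, fractions

# Hypothetical landscape: 200 sites with a favourable 40-site stretch in the middle.
h_site = np.zeros(200)
h_site[80:120] = 2.0
rng = np.random.default_rng(0)
for h in (-3.0, -0.5, 3.0):
    s, frac = metropolis(h_site, J=1.0, h=h, n_sweeps=200, rng=rng)
    print(f"h = {h:+.1f}: condensed fraction {frac[-50:].mean():.2f}, "
          f"inside stretch {np.mean(s[80:120] > 0):.2f}")
```

With these toy values, a strongly negative h keeps the chain adsorbed, an intermediate h condenses only the favoured stretch while the rest stays adsorbed, and a large h condenses the whole chain.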
To further analyse the interplay of surface condensation and sequence, we fused a maltose-binding protein (MBP) tag to the disordered N-terminus of Klf4. For this variant, which has an unchanged DNA-binding domain, no bulk-phase separation was observed (data not shown). Notably, MBP-Klf4 forms a thin adsorption layer on λ-DNA following a Hill-Langmuir adsorption isotherm 37 , which saturates at a density of less than one molecule per binding site (Fig. 4b, top, Extended Data Fig. 4e,f and Supplementary Note). We next removed the MBP tag from Klf4 after DNA binding (Fig. 4a). Strikingly, the adsorbed layer rapidly rearranged into several condensed foci that localized to positions predicted from the sequence (Fig. 4b,c, Extended Data Fig. 10a-f and Supplementary Video 4). These results show that the properties that drive bulk-phase separation also enable the formation of Klf4 condensates on DNA in a sequence-dependent manner.
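A Hill-Langmuir isotherm gives the mean occupancy of independent binding sites as a function of bulk concentration. The sketch below uses the common form θ(c) = θ_max c^n / (K^n + c^n) and recovers K and n from a linear Hill plot, assuming θ_max is known; the parameter values are invented for illustration, and the parameterization in the Supplementary Note may differ.

```python
import numpy as np

def hill_langmuir(c, theta_max, K, n):
    """Mean occupancy (molecules per binding site) at bulk concentration c."""
    c = np.asarray(c, dtype=float)
    return theta_max * c**n / (K**n + c**n)

def hill_plot_fit(c, theta, theta_max):
    """Estimate (K, n) from the Hill plot ln[theta/(theta_max - theta)] = n ln c - n ln K."""
    c = np.asarray(c, dtype=float)
    theta = np.asarray(theta, dtype=float)
    y = np.log(theta / (theta_max - theta))
    n, intercept = np.polyfit(np.log(c), y, 1)
    return float(np.exp(-intercept / n)), float(n)

# Illustrative parameters: saturation below one molecule per site, as described for MBP-Klf4.
c = np.array([5.0, 10.0, 20.0, 40.0, 80.0, 160.0])     # nM, hypothetical concentrations
theta = hill_langmuir(c, theta_max=0.6, K=40.0, n=1.3)
K_fit, n_fit = hill_plot_fit(c, theta, theta_max=0.6)
print(f"K = {K_fit:.1f} nM, n = {n_fit:.2f}")  # K = 40.0 nM, n = 1.30
```

At c = K the occupancy is θ_max/2, so K sets the concentration scale beyond which independent adsorption saturates.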
We analysed the correlation between the protein localization pattern and the underlying DNA sequence as a function of bulk protein concentration (Fig. 4e and Extended Data Fig. 10g,h). For MBP-Klf4, the correlation coefficient initially increases until it reaches a maximum at ~70 nM (ρ = 0.66), followed by a sharp loss of correlation at higher MBP-Klf4 concentrations. This illustrates that the sequence sensitivity of protein localization patterns depends on the protein concentration and is lost at concentrations beyond the typical binding constant, as expected from a Hill-Langmuir adsorption isotherm 38 . In the case of Klf4 without the MBP tag, the correlation initially shows a similar increase as a function of concentration (ρ = 0.76 at 83 nM), but here the correlation remains high at higher protein concentrations. This reveals that above a certain concentration, the pattern of sequence-dependent localization of condensates is insensitive to bulk protein concentration. We conclude that, in contrast to single-molecule binding, surface condensation enables a large dynamic range of bulk concentrations over which the localization pattern remains sequence specific.
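Why independent binding loses sequence contrast can be seen in the simplest Langmuir model (n = 1), in which each site has its own dissociation constant. For two sites with K_1 < K_2, the occupancy difference c/(c + K_1) − c/(c + K_2) = c(K_2 − K_1)/[(c + K_1)(c + K_2)] is largest at c = √(K_1 K_2) and vanishes as both sites saturate. The sketch checks this numerically; the value of K_1 and the use of a 1.3 kT energy gap to set K_2 = K_1 e^1.3 are illustrative assumptions.

```python
import numpy as np

def occupancy(c, K):
    """Langmuir occupancy of a site with dissociation constant K (no cooperativity)."""
    return c / (c + K)

K1 = 50.0                      # nM, hypothetical strong site
K2 = K1 * np.exp(1.3)          # weaker site, 1.3 kT less favourable (illustrative)
c = np.geomspace(1.0, 1e5, 2001)
contrast = occupancy(c, K1) - occupancy(c, K2)

c_best = c[np.argmax(contrast)]
print(f"maximum contrast {contrast.max():.3f} at c = {c_best:.0f} nM "
      f"(sqrt(K1*K2) = {np.sqrt(K1 * K2):.0f} nM)")
print(f"contrast at 100 x K2: {occupancy(100 * K2, K1) - occupancy(100 * K2, K2):.4f}")
```

This saturation argument applies only to independent adsorption; a condensate can accommodate a variable number of molecules, so its localization is not limited in the same way.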
Prewetting is an attractive concept for transcription factors because it provides a mechanism for the sequence-dependent formation of small condensates on DNA that are limited in size by interactions with the DNA surface. The transition from an adsorbed layer to a condensed layer serves as a collective amplifier of sequence information that effectively expands the dynamic range at which sequence specificity is achieved. However, how can surface condensation maintain sequence specificity at higher concentrations? Two mechanisms are at play here. First, in the adsorbed state, sequence information is independently used by individual molecules, whereas in the condensed state, sequence information is collectively integrated by the molecules. This is presumably because surface condensation is triggered by a local increase in concentration, which is promoted by the local clustering of binding sites 11 as previously suggested for the formation of transcriptional condensates in the bulk. This might explain how pioneer transcription factors can distinguish recognition sites within enhancers from isolated sites in other regions of the genome 39,40 . Second, when molecules adsorb independently, binding becomes saturated at higher bulk concentrations because each site can only be occupied once 41 . However, a condensate can accommodate a variable number of molecules, even as the concentration increases. Indeed, molecules associated with DNA will be incorporated into existing condensates either directly or via one-dimensional diffusion 34,39,42,43 rather than occupying unfavourable sites. Consequently, a further increase in protein concentration can result in the growth of condensates without altering their localization pattern, rendering the process insensitive to molecular noise 44,45 .
Since polymer-surface-mediated condensation leads to the formation of liquid-like compartments, features such as the fusion of transcriptional condensates and recruitment of downstream factors that have been observed previously 7-9,28 can be accounted for here. The limited size of transcription factor condensates provides a possible explanation for the small size of transcriptional foci observed in vivo 8, 46,47 . We suggest that polymer-surface-mediated condensation provides a general framework to explain the formation of other nuclear condensates such as heterochromatin or paraspeckles on chromatin or RNA surfaces [48][49][50][51][52][53] .

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41567-021-01462-2.
Phase separation assays. Klf4-GFP was kept on ice and diluted with a pre-cooled solution to prevent premature phase separation at higher temperatures (Extended Data Fig. 2f). The protein was pre-diluted with cold Klf4 buffer to four times the final concentration and then mixed in a ratio of 1:4 with cold dilution buffer (25 mM tris (pH 7.4), 1 mM DTT and 0.1 mg ml⁻¹ BSA) to obtain the Klf4 assay buffer (25 mM tris (pH 7.4), 125 mM KCl, 1 mM DTT and 0.1 mg ml⁻¹ BSA) in a total volume of 20 µl. For assays containing DNA, the dilution buffer also contained the appropriate amount of DNA. The samples were mixed by pipetting, and 18 µl was transferred into 384-well medium-binding microplates (Greiner Bio-One). The samples were incubated at room temperature for 20 min before imaging. Samples that contained DNA were additionally spun at 3,200 × g for 2 min. The images were taken using an Andor Eclipse Ti inverted spinning-disc microscope with an Andor iXon 897 electron-multiplying charge-coupled device camera and a UPLSAPO ×40/0.95 numerical aperture (NA) air objective or ×60/1.20 NA water-immersion objective (Nikon). Data from at least three independent experiments were averaged. Data analysis was performed as described elsewhere 56 . For untagged Klf4, differential interference contrast microscopy was done using a Zeiss LSM 880 inverted single-photon point-scanning confocal system with a transmitted-light detector and a ×40/1.2 NA C-Apochromat water-immersion objective (Zeiss), which is suitable for differential interference contrast.
Determining the concentration of the dilute phase. The Klf4 samples were set up as for the phase separation assays, but instead of being transferred to a microplate, the samples were incubated for 20 min at room temperature in 1.5 ml Eppendorf tubes. To obtain a standard curve, samples with a final KCl concentration of 500 mM were also prepared and treated in parallel. After incubation, the samples were spun in a temperature-controlled centrifuge at 20,000 × g for 15 min at 21 °C. Then, 5 µl of supernatant was added to 15 µl Klf4 buffer or, for the control samples, to a corresponding buffer, to reach the same final KCl concentration. These samples were transferred to 384-well non-binding microplates (Greiner Bio-One) and imaged with a wide-field fluorescence microscope (DeltaVision Elite, Applied Precision) using a ×10/0.4 NA dry objective and a Photometrics electron-multiplying charge-coupled device camera. The median fluorescence values for each field of view were obtained using the Fiji software (https://fiji.sc/). The control samples were used to generate a standard curve that relates fluorescence intensity to protein concentration. This curve was then used to calculate the protein concentration in the sample supernatant. To determine C SAT , we fitted a two-component piecewise linear function to the curve of dilute-phase concentration versus total concentration (Extended Data Fig. 2a, left). The number of points considered on each side was varied, and the two optimal lines were selected as those with the maximum difference between their slopes. The C SAT value for a given dataset was determined as the intercept of the line with the lower slope (Extended Data Fig. 2a, left). Fitting was done in MATLAB (version R2018b).
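The two-line procedure for C SAT can be sketched as follows. This is a hypothetical Python re-implementation, not the MATLAB code used for the paper: it tries every split of the sorted data, fits a straight line to each side, keeps the split whose slopes differ most and, following one reading of the text, reports the y-intercept of the flatter line, which approximates the dilute-phase plateau.

```python
import numpy as np

def estimate_csat(total, dilute, min_points=3):
    """Two-segment linear fit of dilute-phase versus total concentration.
    Returns the intercept of the flatter of the two best-separated lines."""
    order = np.argsort(total)
    x = np.asarray(total, dtype=float)[order]
    y = np.asarray(dilute, dtype=float)[order]
    best_diff, best_intercept = -np.inf, np.nan
    for k in range(min_points, len(x) - min_points + 1):
        slope_lo, icpt_lo = np.polyfit(x[:k], y[:k], 1)   # left segment
        slope_hi, icpt_hi = np.polyfit(x[k:], y[k:], 1)   # right segment
        diff = abs(slope_lo - slope_hi)
        if diff > best_diff:
            best_diff = diff
            best_intercept = icpt_lo if abs(slope_lo) < abs(slope_hi) else icpt_hi
    return float(best_intercept)

# Idealized synthetic data (not measurements): dilute = total below 1 uM, flat above.
total = np.linspace(0.1, 3.0, 30)
dilute = np.minimum(total, 1.0)
print(f"C_SAT ~ {estimate_csat(total, dilute):.2f} uM")  # C_SAT ~ 1.00 uM
```

`min_points` (a hypothetical parameter) keeps each segment long enough for a meaningful line fit; with noisy data the choice of split becomes sensitive to it.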

EMSAs.
Reactions were set up at 4 °C at the indicated final protein concentrations. The Klf4 samples contained 25 mM tris (pH 7.4), 125 mM KCl, 6% glycerol, 1 mM DTT, 0.1 mg ml⁻¹ BSA, 7.5 nM Cy5-dsDNA and 37.5 ng poly-d(IC) (poly-(5'-phosphono-3'-deoxy-cytidine compound with 5'-phosphono-2'-deoxyinosine)). The oligonucleotides used in this study are listed in Supplementary Table 2. The absence of condensed material under these conditions was confirmed by fluorescence microscopy (data not shown). The samples were incubated for 20 min at 4 °C before they were loaded onto a pre-run 4-20% Novex TBE gel (Invitrogen). Electrophoresis was performed at 250 V for 45 min in TBE buffer (89 mM tris, 89 mM boric acid and 2 mM EDTA). The gels were then imaged using a Typhoon FLA 9500 fluorescence imager (GE Healthcare). Band intensities were determined using the Fiji software, and data plotting and fitting were done in MATLAB. The following expression was used to fit the data 57 : where P_t is the total protein concentration and K_d is the dissociation constant; m and b are normalization factors for the upper and lower asymptotes of the DNA titration curve, respectively; and n is the Hill coefficient.
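The fitting expression itself is not reproduced above. As an illustration only, the sketch assumes a Hill function with lower asymptote b and upper asymptote m, f(P_t) = b + (m − b) P_t^n / (K_d^n + P_t^n), which matches the roles described for the parameters but may differ from the exact expression used. Because b and m enter linearly, they are solved by least squares for each (K_d, n) on a grid; a nonlinear least-squares routine such as `scipy.optimize.curve_fit` would be the more usual choice.

```python
import numpy as np

def hill_fraction(p_total, kd, n):
    """Hill term P_t^n / (K_d^n + P_t^n)."""
    p = np.asarray(p_total, dtype=float)
    return p**n / (kd**n + p**n)

def fit_emsa(p_total, signal, kd_grid, n_grid):
    """Grid search over (K_d, n) with linear least squares for the asymptotes.
    Returns (K_d, n, m, b) with the smallest sum of squared residuals."""
    y = np.asarray(signal, dtype=float)
    best = (np.inf, None)
    for kd in kd_grid:
        for n in n_grid:
            x = hill_fraction(p_total, kd, n)
            A = np.column_stack([np.ones_like(x), x])      # y ~ b + (m - b) * x
            (b, amp), *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((A @ np.array([b, amp]) - y) ** 2))
            if rss < best[0]:
                best = (rss, (float(kd), float(n), float(b + amp), float(b)))
    return best[1]

# Synthetic, noise-free titration (hypothetical values, not EMSA data).
p = np.array([1, 3, 10, 30, 60, 100, 300, 1000], dtype=float)   # nM
y = 0.05 + (0.9 - 0.05) * hill_fraction(p, kd=50.0, n=1.5)
kd, n, m, b = fit_emsa(p, y, kd_grid=np.arange(1.0, 201.0), n_grid=np.arange(5, 41) / 10)
print(f"K_d = {kd:.0f} nM, n = {n:.1f}, m = {m:.2f}, b = {b:.2f}")  # K_d = 50 nM, n = 1.5, m = 0.90, b = 0.05
```

Separating the linear parameters (m, b) from the nonlinear ones (K_d, n) keeps the search two-dimensional; the grid spacing limits the attainable precision on K_d and n.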
Optical tweezers with confocal microscopy. Experiments involving optical tweezers were performed on a Lumicks C-trap instrument with integrated confocal microscopy and microfluidics. Bacteriophage λ-DNA was biotinylated on both ends as described elsewhere 29 . The λ-DNA-dCas9 complex was attached to 4.42 μm Spherotech streptavidin-coated polystyrene beads in laminar flow. For all experiments, the trap position was kept constant, yielding an average force of 8.22 ± 2.65 pN (Extended Data Fig. 3f). The protein stock was centrifuged for 10 min at 20,000 × g. The concentration of the supernatant was measured, and the protein was diluted in Klf4 assay buffer following a dilution series: the solution with the highest concentration of a given series was flushed into the flow cell first. After 10-15 experiments had been recorded, the remaining volume in the syringe was removed, and the protein was diluted further and reloaded into the syringe. The flow chamber was flushed before each experiment and sealed during its course.
For confocal imaging, a 488 nm laser was used for excitation, and emission was detected in the blue channel (525/25 nm filter). After a λ-DNA molecule was tethered between the beads, an image of the dCas9-EGFP (enhanced green fluorescent protein) probe was acquired in the buffer channel with 10% excitation intensity (this imaging setting is referred to as high excitation). We then started continuous acquisition with 5% excitation intensity as the beads-DNA system was transferred to the channel of the microfluidics chip containing the protein of interest. The interaction process was monitored for 200 s at a frame rate of ∼1 s⁻¹ with a low pixel integration time of 0.08 ms (Fig. 1g and Extended Data Fig. 3). After 200 s, an image was acquired using the high-excitation imaging conditions. Analysis of the intensity distributions and quantification of the number of molecules (Fig. 2a,b and Extended Data Fig. 4) were done for the high-excitation settings. To determine the number of molecules per cluster over time (Fig. 1h), time series were acquired using a pixel integration time of 1 ms (referred to as low excitation; Extended Data Fig. 3), conditions under which the dCas9-EGFP probe was detectable in the first few frames, before the beads-DNA system reached the protein solution.
For the in situ condensation assay (Fig. 4b and Extended Data Fig. 10), after the binding process was recorded, the beads-DNA system was transferred back to the buffer channel containing either the assay buffer or the assay buffer with 2% (v/v) 3C protease (in-house; 1 U μl⁻¹). This process was recorded for more than 500 s under the low-excitation settings at a frame rate of 0.2 s⁻¹.
Fluorescence recovery after photobleaching (FRAP) experiments were performed as follows: after a binding experiment (low-excitation settings), the chamber was gently flushed. A pre-bleach time series was acquired for 20 s. A smaller region of interest (ROI) was then bleached by imaging it at high laser intensity (90% excitation). To capture recovery, a 200-s-long time series was acquired at a frame rate of ∼1 s⁻¹ (Extended Data Fig. 5).

Analysis of tweezers data. Emission intensity per EGFP.
For each experiment, confocal images of the dCas9-sgRNA-λ-DNA complex were acquired under the high-excitation settings. For the time series (low-excitation settings), the dCas9 probe was detectable in the first few frames, before the beads-DNA complex reached the protein solution. To confirm the position of the dCas9 probe, intensity profiles along the DNA were aligned using the bead centres as the reference and flipped when required. To confirm the position of the target sequence, the target locations were superimposed on the average profile (Extended Data Fig. 3). This alignment criterion was then used to analyse all the Klf4 and MBP-Klf4 intensity profiles shown throughout this work. The sequence information was converted to spatial units by taking into account the extension per base pair (x_bp = 0.32 nm bp⁻¹) at the average experimental force. Integrating the total intensity in an ROI of 21 pixels × 21 pixels around the detected probe yielded the total number of counts under the given imaging conditions. The probability distribution of integrated counts for several experiments exhibits a multi-mode Gaussian distribution, consistent with the four dCas9 binding sites in λ-DNA (Supplementary Table 3). A fit to a Gaussian mixture model yielded the mean and standard deviation of each mode. The emission intensity per EGFP was then calculated as follows: where I_j is the mean of mode j and N is the number of modes (Extended Data Fig. 3 and Supplementary Table 5).
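The expression for the emission intensity per EGFP is not reproduced above. A natural estimator, assumed here for illustration, treats mode j as j bound dCas9-EGFP probes, so that each I_j / j estimates the single-EGFP intensity, and averages these N estimates. The sketch also converts base pairs to distance with the 0.32 nm bp⁻¹ extension given in the text and expresses an integrated intensity as molecules per 10-bp binding site; the helper names and example numbers are hypothetical.

```python
import numpy as np

X_BP_NM = 0.32      # extension per base pair at the average force (nm per bp), from the text
FOOTPRINT_BP = 10   # Klf4 zinc-finger footprint defining one binding site, from the text

def intensity_per_egfp(mode_means):
    """Assumed estimator: mode j (1-based) contains j EGFPs, so average I_j / j over the N modes."""
    I = np.asarray(mode_means, dtype=float)
    j = np.arange(1, len(I) + 1)
    return float(np.mean(I / j))

def bp_to_nm(n_bp, x_bp=X_BP_NM):
    """Convert a sequence distance in base pairs to extension in nm."""
    return n_bp * x_bp

def molecules_per_site(integrated_counts, region_bp, counts_per_egfp):
    """Number of Klf4-GFP molecules in a region divided by its number of 10-bp sites."""
    return (integrated_counts / counts_per_egfp) / (region_bp / FOOTPRINT_BP)

per_egfp = intensity_per_egfp([98.0, 203.0, 297.0, 404.0])   # hypothetical mode means (counts)
print(f"{per_egfp:.1f} counts per EGFP")                       # 99.9 counts per EGFP
print(f"lambda-DNA extension: {bp_to_nm(48_502) / 1000:.2f} um")  # 15.52 um
print(f"{molecules_per_site(2000.0, 1000, per_egfp):.2f} molecules per site")
```

A value below 1 from `molecules_per_site` corresponds to the adsorbed regime described in the main text; several hundred molecules in one diffraction-limited focus correspond to the condensed regime.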
Intensity distributions. The pixel values used in the calculation of intensity distributions were obtained as follows: after background subtraction (to remove the contribution of the protein in solution), the maximum projection intensity profile along the DNA was determined in a region of 20 pixels around the DNA axis (Extended Data Fig. 7). We next filtered the profiles with a spatial mask. In brief, using the 'findpeaks' function in MATLAB, we detected peaks above a threshold corresponding to the background value of the background-subtracted image (Extended Data Fig. 7). Data points in a five-pixel window along the horizontal direction, centred at the position of each peak, were selected. The window was moved from left to right and accepted if it did not overlap a previously accepted window. From the histograms of the obtained pixel intensity values, we computed the probability density of the logarithm of pixel intensities 58 . The probability density versus the logarithm of intensities was fitted to either a one- or a two-component Gaussian mixture model on a linear scale. To compare the intensity distributions (Fig. 2b) with the intensity distributions over time (Fig. 2c, Extended Data Fig. 4g and Supplementary Video 3), time-series images were multiplied by a factor (13.4 ± 2.9) to compensate for the differences in intensity values between the low- and high-excitation imaging conditions. From the last frame of each time series and the corresponding high-excitation image acquired immediately thereafter, we computed the mean intensity in an ROI of 30 pixels × 100 pixels in the centre of the confocal image. Time-series intensities were multiplied by the ratio of these means.
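The masking and rescaling steps can be sketched with plain NumPy. The strict local-maximum detector below is a simplified stand-in for MATLAB's `findpeaks` (which offers further criteria such as prominence), skipping windows that would run past the ends of the profile is an assumption, and the toy profile is invented for illustration.

```python
import numpy as np

def local_peaks(profile, threshold):
    """Indices of strict local maxima above `threshold` (no prominence or width criteria)."""
    p = np.asarray(profile, dtype=float)
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:]) & (p[1:-1] > threshold)
    return np.flatnonzero(is_peak) + 1

def masked_pixels(profile, threshold, window=5):
    """Pixel values in non-overlapping windows centred on peaks, scanned left to right.
    Windows that would extend past either end of the profile are skipped (an assumption)."""
    p = np.asarray(profile, dtype=float)
    half = window // 2
    kept, last_end = [], -1
    for k in local_peaks(p, threshold):
        start, end = k - half, k + half            # inclusive window bounds
        if start < 0 or end >= len(p) or start <= last_end:
            continue                               # outside the profile or overlapping the previous window
        kept.append(p[start:end + 1])
        last_end = end
    return np.concatenate(kept) if kept else np.empty(0)

def log_density(values, bins=50):
    """Probability density of ln(intensity); non-positive values are discarded."""
    v = np.asarray(values, dtype=float)
    density, edges = np.histogram(np.log(v[v > 0]), bins=bins, density=True)
    return 0.5 * (edges[:-1] + edges[1:]), density

def excitation_scale(last_low_frame_roi, high_excitation_roi):
    """Ratio of mean intensities that puts low-excitation frames on the high-excitation scale."""
    return float(np.mean(high_excitation_roi) / np.mean(last_low_frame_roi))

profile = np.array([0, 1, 5, 1, 0, 0, 2, 7, 2, 0, 0, 0, 3, 0], dtype=float)  # toy profile
print(local_peaks(profile, 0.5))      # peaks at indices 2, 7 and 12
print(masked_pixels(profile, 0.5))    # the window around index 12 is skipped at the edge
```

The densities returned by `log_density` are what a one- or two-component Gaussian mixture would then be fitted to.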
Classification of pixels into adsorbed or condensed. Pixels were classified as adsorbed or condensed based on their intensity. Pixels with an intensity above the background threshold and below the layer threshold were classified as adsorbed, whereas pixels with an intensity above the layer threshold were classified as condensed. To determine the background threshold, we extracted the background values (after background subtraction) along a line away from the DNA and pooled all the experiments corresponding to Klf4-GFP. The probability density of the logarithm of pixel intensities was fitted to a normal probability density function.
The background threshold was defined as the mean plus three times the standard deviation of this distribution (Extended Data Fig. 7). To determine the layer threshold, we computed the probability density of the logarithm of pixel intensities along the masked maximum projection profiles, pooling 60 Klf4-GFP experiments recorded at low concentrations ([Klf4]: 3-80 nM). We extracted the mean value of this distribution by fitting the data to a normal probability density function. We next computed the same quantity for 37 experiments recorded at higher concentrations ([Klf4]: 210-281 nM). Here the probability density is bimodal, and we fitted this distribution to a two-component Gaussian mixture model, constraining the mean of the low-intensity mode to the value obtained at low concentrations. Fitting was done in MATLAB using the 'nonlinear least squares' method and weights of 1/(probability density) + w, where w = 10 sets the strength of the weights.
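The two-threshold classification can be sketched as follows. Fitting the normal distribution by sample moments and dropping non-positive values before the log transform are assumptions, and because the rule that turns the constrained two-component fit into a layer threshold is not spelled out here, the sketch takes the layer threshold as an input with a hypothetical value.

```python
import numpy as np

def background_threshold(background_pixels):
    """Mean + 3 SD of ln(intensity) for background pixels, returned on the linear scale.
    The normal fit uses sample moments; non-positive values are dropped (assumptions)."""
    v = np.asarray(background_pixels, dtype=float)
    logs = np.log(v[v > 0])
    return float(np.exp(logs.mean() + 3.0 * logs.std()))

def classify_pixels(pixels, bg_threshold, layer_threshold):
    """Label pixels 'background', 'adsorbed' (between thresholds) or 'condensed' (above the layer threshold)."""
    p = np.asarray(pixels, dtype=float)
    labels = np.full(p.shape, "background", dtype=object)
    labels[(p > bg_threshold) & (p <= layer_threshold)] = "adsorbed"
    labels[p > layer_threshold] = "condensed"
    return labels

rng = np.random.default_rng(0)
bg = np.exp(rng.normal(0.0, 0.5, size=5000))     # synthetic log-normal background
t_bg = background_threshold(bg)                   # close to exp(1.5) for these parameters
labels = classify_pixels([0.5, 8.0, 40.0], t_bg, layer_threshold=20.0)  # hypothetical layer threshold
print(round(t_bg, 2), labels.tolist())
```

Comparing intensities on the linear scale against exp(mean + 3 SD) is equivalent to comparing log intensities against mean + 3 SD, because the exponential is monotonic.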
Condensed fraction. The condensed fraction (Fig. 2d) was determined for each experiment as the number of pixels classified as condensed divided by the length of the considered ROI (161 pixels). For this calculation, we considered the pixels obtained by the masking procedure (as discussed above; Extended Data Fig. 7). Binned medians with error bars (95% confidence intervals) were obtained by bootstrapping ('bootci' function in MATLAB) with 10,000 bootstrap samples. Each binned median contains 11-36 individual experiments.
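A minimal Python sketch of the condensed fraction and of a bootstrapped confidence interval for a binned median follows. Assumptions: condensed pixels carry the hypothetical label code 2, and the interval is a plain percentile bootstrap. MATLAB's `bootci` defaults to a bias-corrected and accelerated interval, so the bounds from this sketch may differ slightly from those reported.

```python
import numpy as np

CONDENSED = 2        # hypothetical label code for condensed pixels
ROI_LENGTH_PX = 161  # ROI length stated in the Methods


def condensed_fraction(labels, roi_length: int = ROI_LENGTH_PX) -> float:
    """Number of condensed pixels divided by the ROI length."""
    return float(np.count_nonzero(np.asarray(labels) == CONDENSED) / roi_length)


def bootstrap_median_ci(values, n_boot: int = 10_000, alpha: float = 0.05, seed=None):
    """Median of `values` with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    # Resample the experiments of one bin with replacement, n_boot times.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_medians = np.median(x[idx], axis=1)
    lo, hi = np.percentile(boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.median(x)), float(lo), float(hi)
```

In use, `bootstrap_median_ci` would be called once per concentration bin on the condensed fractions of the experiments in that bin.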
Analysis of Klf4 sequence-specific binding to λ-DNA. To assess the sequence specificity of Klf4 localization on λ-DNA, we used a Klf4 sequence motif reported elsewhere 26. The reported sequence logo was converted to a position weight matrix using the Logo2PWM tool 59. A position weight matrix contains one row for each of the four DNA bases and one column for each position of the motif (Fig. 2g, Extended Data Fig. 8 and Supplementary Table 1). Its entries are the relative frequencies of finding a given base at a given position within the motif. Because these matrices are generally derived from genome-wide in vivo data on protein binding to DNA, they indicate how likely it is to find the protein bound to a given sequence of bases. We denote the matrix by $M^{F}_{nb}$, the relative frequency of finding base $b$ at nucleotide position $n$ on the forward strand when the protein is bound to DNA; $M^{R}_{nb}$ is defined analogously for the reverse strand. Here $b$ can be any of the four nucleotides, namely, A, T, G or C. The position weight matrix describing the frequency of bases on the complementary strand is denoted $M^{C}_{nb}$. Although position weight matrices give a fairly accurate estimate of the consensus sequences, sequences further away from the consensus (represented by small values in the position weight matrix) are not well represented 60. To account for this limitation, any element of the position weight matrix with $M_{nb} \le e$ is replaced by a minimal value $e$.
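The flooring step takes only a few lines. This sketch assumes the matrix rows are ordered A, C, G, T and that no renormalization follows the flooring, since none is described; the value of $e$ is left as a parameter. The reverse-complement helper shows one common way of obtaining the matrix for the opposite strand; whether it corresponds exactly to $M^{R}_{nb}$ or $M^{C}_{nb}$ as used here is an assumption, and both function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # assumed row order of the position weight matrix


def floor_pwm(pwm, e: float) -> np.ndarray:
    """Replace every entry M_nb <= e by the minimal value e (no renormalization)."""
    m = np.asarray(pwm, dtype=float)
    return np.where(m <= e, e, m)


def reverse_complement_pwm(pwm) -> np.ndarray:
    """Matrix read along the opposite strand.

    With rows ordered A, C, G, T, complementing bases (A<->T, C<->G) reverses the
    row order, and reading the opposite strand 5'->3' reverses the column order.
    """
    m = np.asarray(pwm, dtype=float)
    return m[::-1, ::-1].copy()
```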
To describe Klf4 binding to either strand of DNA, we therefore use an average position weight matrix, $M_{nb}$, taken over the two strands (equation (1)). For a sequence that differs from the consensus sequence by a single base $b$ at position $n$, the probability ratio $\lambda_{n,b}$ of Klf4 binding to the two sequences is related to the position weight matrix by

$$\lambda_{n,b} = \frac{M_{nb}}{M_{nb^{*}_{n}}}, \qquad (2)$$

where $b^{*}_{n}$ is the consensus base at position $n$. Using this equation, we obtain the probability ratio of Klf4 binding to a given sequence $B = (b_1, \ldots, b_L)$, where $b_n$ is the base at position $n$ along the sequence, relative to the consensus motif $B^{*}$:

$$P(B) = \prod_{n=1}^{L} \lambda_{n,b_n} = \prod_{n=1}^{L} \frac{M_{nb_n}}{M_{nb^{*}_{n}}}, \qquad (3)$$

where $L$ is the number of bases along the sequence. Here $P$ lies between 0 and 1, and $P(B^{*}) = 1$ for the consensus sequence. We use equation (3) to infer the landscape of the Klf4-λ-DNA-binding probability, as shown in Fig. 2h. To infer the binding energy landscape along λ-DNA from the position weight matrix, we use

$$\lambda_{n,b} = \exp\left[-\frac{\varepsilon_{b,n} - \varepsilon_{b^{*},n}}{k_{\mathrm{B}}T}\right], \qquad (4)$$

where $\varepsilon_{b,n}$ is the binding energy contribution of base $b$ at nucleotide position $n$, and $\varepsilon_{b^{*},n}$ is the contribution of the consensus base $b^{*}_{n}$ at that position. Hence, the binding energy difference of a given sequence $B$ with respect to the consensus sequence is

$$\Delta\varepsilon(B) = \sum_{n=1}^{L}\left(\varepsilon_{b_n,n} - \varepsilon_{b^{*}_{n},n}\right) = -k_{\mathrm{B}}T \ln P(B). \qquad (5)$$

Equations (4) and (5) allow us to infer the binding energy landscape for Klf4 binding to λ-DNA (Fig. 2h and Extended Data Figs. 8 and 9a). For details of how we obtain these equations, see Supplementary Note.
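Equations (3) and (5) translate directly into a scan of the motif along a DNA sequence. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`), a floored matrix with rows ordered A, C, G, T, the consensus base at each position taken as the column maximum, and a Boltzmann relation between $\lambda_{n,b}$ and the energy difference, so that energies come out in units of $k_{\mathrm{B}}T$. The function names are hypothetical, only one strand is scanned, and any coarse-graining of the resulting profile to the pixel size is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

BASES = "ACGT"  # assumed row order of the matrix


def encode(seq: str) -> np.ndarray:
    """Map a DNA string to row indices of the matrix."""
    lookup = {b: i for i, b in enumerate(BASES)}
    return np.array([lookup[b] for b in seq.upper()], dtype=int)


def log_binding_ratio(pwm, seq: str) -> np.ndarray:
    """ln P(B) for every motif-length window B along `seq` (one strand)."""
    m = np.asarray(pwm, dtype=float)
    L = m.shape[1]
    # ln lambda_{n,b} = ln M_nb - ln M_{n b*_n}, with b*_n the column maximum.
    log_lambda = np.log(m) - np.log(m.max(axis=0))
    windows = sliding_window_view(encode(seq), L)  # shape (n_windows, L)
    # Pick ln lambda for the base at each motif position and sum over positions.
    return log_lambda[windows, np.arange(L)].sum(axis=1)


def binding_energy_landscape(pwm, seq: str) -> np.ndarray:
    """Delta epsilon(B) = -ln P(B) in units of k_B T; zero at the consensus motif."""
    return -log_binding_ratio(pwm, seq)
```

Summing logarithms rather than multiplying ratios avoids numerical underflow for long motifs, and $-\ln P$ is the energy profile directly.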
Correlation as a function of concentration. Independent experiments were sorted by protein concentration. Intensity profiles were normalized to their maximum intensity and then averaged either over groups of experiments selected by a moving window along the concentration axis (Fig. 4e) or within specific concentration bins (Extended Data Fig. 10g,h). The coarse-grained binding energy profile was first interpolated onto the pixel grid of the experimental profile, and the correlation between each average intensity profile and the binding energy profile was then quantified by Pearson's correlation coefficient.
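The concentration-resolved correlation analysis can be sketched as follows. Assumptions, all hypothetical: every intensity profile has already been resampled to a common pixel grid; the moving window contains a fixed number of experiments (the text does not say whether the window is defined by count or by concentration span); and the intensity and energy profiles span the same stretch of λ-DNA, so that linear interpolation onto the pixel grid is meaningful.

```python
import numpy as np


def moving_window_averages(concentrations, profiles, window: int):
    """Sort experiments by concentration and average max-normalized profiles in a sliding window."""
    order = np.argsort(concentrations)
    conc = np.asarray(concentrations, dtype=float)[order]
    prof = np.asarray(profiles, dtype=float)[order]
    prof = prof / prof.max(axis=1, keepdims=True)  # normalize each profile to its maximum
    centers, averages = [], []
    for start in range(len(conc) - window + 1):
        sl = slice(start, start + window)
        centers.append(np.median(conc[sl]))  # representative concentration of the window
        averages.append(prof[sl].mean(axis=0))
    return np.array(centers), np.array(averages)


def correlation_with_energy(avg_profile, energy_profile) -> float:
    """Pearson correlation after interpolating the energy profile onto the pixel grid."""
    avg = np.asarray(avg_profile, dtype=float)
    energy = np.asarray(energy_profile, dtype=float)
    x_pixels = np.linspace(0.0, 1.0, avg.size)     # relative position along the DNA
    x_energy = np.linspace(0.0, 1.0, energy.size)
    energy_on_pixels = np.interp(x_pixels, x_energy, energy)
    return float(np.corrcoef(avg, energy_on_pixels)[0, 1])
```

The same `correlation_with_energy` call applies unchanged to profiles averaged within fixed concentration bins rather than a moving window.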
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All data generated or analysed in this study are available from the corresponding authors upon reasonable request.

Materials
All materials will be made available upon request after the completion of a Material Transfer Agreement.

Code availability
The code used to analyse the data and perform numerical simulations is available from the corresponding authors upon reasonable request.

Fig. 7). After an initial binding step (first ∼50 s in top), the intensity distribution bifurcates into the adsorbed and condensed states (low- and high-intensity branches, respectively). Over time, these states coexist and the condensed state increases in brightness (the high-intensity branch migrates toward higher intensity values) until it reaches a stable value (∼10³ photon counts in top). The way these states are populated is suggestive of a switch-like behaviour: the condensed state becomes more populated over time at the expense of the adsorbed one. Increasing the bulk concentration reduces the time required for the bifurcation. At higher bulk concentrations, the condensed state becomes brighter (∼5 × 10³ in bottom) and, overall, more populated.

indicated. e, Representative kymographs corresponding to the 3C-dependent MBP-Klf4 condensation process. The coarse-grained binding energy profile is shown at the bottom as a guide to the eye (see Fig. 2h and Methods). f, Red, average intensity profile along λ-DNA for N = 22 experiments binned at the indicated time after transfer of the bead-DNA system to a solution containing the 3C protease. Shaded area, standard error of the mean at 95% confidence. The coarse-grained binding energy profile is shown in blue (see Fig. 2h and Methods). The Pearson's correlation between the average intensity and the binding energy profiles increases from 0.51 at t = 0 s to 0.69 at t = 450 s. g, h, Average intensity profile along λ-DNA for N experiments binned in the indicated concentration range (red) for Klf4 (g) and MBP-Klf4 (h). Shaded area, standard error of the mean at 95% confidence. The coarse-grained binding energy profile is shown in blue (see Fig. 2h and Methods). The correlation between the average intensity profile and the binding energy profile, quantified by Pearson's correlation coefficient, is indicated in each case.