Correspondence: Reply to ‘DNA shape is insufficient to explain binding’

Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Sequence-specific TFs recognize DNA via specific amino acid-base hydrogen bonds and contacts that read local DNA shape (ref. 1). Studying base and shape readout modes of TFs in vivo has been challenging due to technical issues associated with current approaches for mapping TF-binding sites (TFBSs). We recently introduced Chromatin Endogenous Cleavage with sequencing (ChEC-seq), an in vivo mapping method based on fusing Micrococcal Nuclease (MNase) to a TF (ref. 2). Upon addition of calcium to permeabilized cells, tethered MNase cuts DNA adjacent to the bound TF and the released fragments are sequenced to provide a high-resolution genome-wide TFBS map. We used ChEC-seq to map the budding yeast TFs Abf1, Reb1 and Rap1 and obtained data similar to high-resolution ChIP-seq without the need for cross-linking, chromatin solubilization or antibodies.
When cells were collected <1 min after calcium addition, most TFBSs contained a TF-specific sequence motif ('fast' sites). We also reported 'slow' sites with low motif scores that appeared after ~10 min. We found that DNA shape features of high-scoring (mostly fast) and low-scoring (mostly slow) TFBSs corresponded closely, but differed from randomly chosen sites not overlapping high- or low-scoring sites. In our study, DNA shape features of fast and slow sites were centred on the best match to the TF consensus motif; however, randomly chosen genomic intervals were not similarly centred on the best motif match. Rossi, Lai and Pugh now find that when random sites are motif-centred, the shape features correspond closely to slow site features (ref. 3), which might suggest that DNA shape is insufficient to explain binding site selection by the TFs Abf1, Reb1 and Rap1. However, given that sequence and shape features covary (ref. 4), it is problematic to rely on motif-dependent analyses to draw conclusions about whether a TF recognizes DNA shape (ref. 5).
To address this problem, we aligned DNA shape feature vectors for unique fast and slow ChEC-seq sites for each TF using a procedure that relied only on shape data and was not directly informed by sequence alignment. Given the possibility for overlap between nearby TFBSs, we identified unique sites that do not intersect with any other ChEC-seq sites within intervals ranging from 100 to 500 bp surrounding ChEC-seq peak maxima, with larger windows associated with increasing stringency. For Abf1 and Reb1, we found that average fast and slow site shape features were well correlated at a range of interval widths (P ≪ 0.001; Fig. 1a-c). We also searched sites using a 'shape profile' defined using the average fast site features and found that score distributions for fast and slow sites only slightly differed (P > 0.03), but were very different from random and free MNase sites (P ≪ 10⁻¹⁰) for Abf1 (Fig. 1b) and Reb1 (not shown). The major shape feature proximal to Abf1 motifs is a deformation to the helix indicative of motif-proximal poly(dA:dT) tracts (Fig. 1a), a sequence feature we observed at slow sites in our original study (ref. 2). Consistent with the recognition of a preferred shape signature by Abf1 and Reb1 at fast and slow sites, random sites and free MNase sites were not well correlated with fast and slow sites (Fig. 1a-c). We do not observe shape features enriched for poly(dA:dT) tracts at free MNase sites (Fig. 1a,b), suggesting that the detection of this shared shape feature at fast and slow ChEC-seq sites is not simply due to the higher prevalence of these features within nucleosome-depleted regions. Shape features at Rap1 fast and slow sites were not well correlated (P < 0.1; Fig. 1c,d). The robustness of the correlation between average fast and slow shape features for Abf1 and Reb1 across a range of interval widths (Fig. 1d) suggests that sampling of similar shapes by TFs may explain binding events, even within promoters where fast and slow sites co-occur.
From these motif-independent analyses, we conclude that fast and slow binding sites for Abf1 and Reb1 have similar shape features.
We next queried a TF-gene regulatory association database (ref. 6), and asked whether TF-slow site associations had been previously observed in mapping or gene expression studies orthogonal to ChEC-seq. Consistent with our previous demonstration that slow sites were recovered as sites without the canonical motif in other studies (ref. 2), the proportion of fast and slow sites documented or proposed to regulate proximal genes in previous studies (Fig. 1e) was similar across a range of interval widths. This suggests that slow sites with shape features similar to fast sites are likely true binding sites and not simply experimental noise due to cleavage proximal to fast sites.
What accounts for the differential sensitivity of these TFs to DNA shape? All three TFs are essential and have roles in maintaining nucleosome organization (refs 7,8); however, Rap1 is unique in that it also functions in chromatin silencing at the mating type locus and telomeres (ref. 9). Promoter architecture in Saccharomyces cerevisiae may provide a basis for this functional specialization (ref. 4). We observed marked deviations in DNA shape in the average aligned fast and slow site profiles for Abf1 (Fig. 1a) and Reb1, but not Rap1 (not shown), consistent with the presence of poly(dA:dT) tracts, which are known to exclude nucleosomes and play a role in establishing canonical chromatin architecture (refs 4,10). Abf1 and Reb1 have been proposed to be dependent on poly(dA:dT) tracts for their localization and function (refs 11-13). It has been suggested that poly(dA:dT) tracts may participate in regulating ribosomal protein gene promoters, which are also bound by Rap1 (ref. 14); however, our inability to detect significant DNA shape contributions to Rap1 binding may be due to the comparatively small number of sites tested. We speculate that promoters with poly(dA:dT) tracts not only exclude nucleosomes, but also have shape features that help recruit TFs that actively maintain nucleosome depletion (ref. 15). Indeed, binding site-proximal poly(dA:dT) tracts have been proposed to enhance binding (ref. 16), potentially by increasing accessibility of the adjacent major groove (ref. 17). Thus, TF functional diversity and architecture of yeast promoters may explain the varying sensitivities of TFs to DNA shape. In this context, we anticipate that ChEC-seq will be a useful tool for generating high-resolution maps of protein-DNA interactions, with the potential to provide insights into the in vivo role of DNA shape in TFBS recognition.

Methods
We defined unique sites as those for which intervals of 100-500 bp width centred on Abf1, Reb1, Rap1 and free MNase ChEC-seq peak maxima did not intersect. As a null set, we generated 1,500 random intervals from the sacCer3 genome assembly that did not overlap with ChEC-derived peaks. Shape features in 201-bp windows centred on peak maxima were determined as described (ref. 4) using the DNAshapeR package (ref. 18). At each interval width for a given TF, sites that did not have overlapping shape alignment windows were selected for alignment. Motif-independent alignment involved comparing each site against every other site within a given class and determining the shift that maximized the cosine similarity. Within a class, all sites were aligned to an internal centroid, defined as the site with the smallest sum of squared cosine similarities versus all other sites. Sites were then shifted relative to the centroid and class-specific average features were computed. Pearson's r was used to quantify the similarity of average shape features between classes (reported P values are two-tailed) without shifting the average features relative to each other. Given the strong A/T MNase cleavage preference (not shown) in the 5-bp window centred on peak maxima, we excluded these positions from the alignment. Further, because shape readout likely occurs near the TFBS, the largest shift considered was 25 bp and alignment was limited to the 90-bp interval centred at the peak maximum. Parameters used for all site classes, including the random and free MNase sites, were identical. Shape profiles for Abf1 and Reb1 were defined as the regions in the average fast shape features with the largest information gain relative to shuffled sequences.
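The shift search at the core of the motif-independent alignment can be illustrated with a minimal Python sketch. It is not the published implementation (see the repository cited at the end of the Methods). It assumes a single, already-scaled shape feature track per site, takes the reference site as a given index rather than computing the centroid, and applies the 5-bp peak mask in the frames of both sites; the names `best_shift` and `align_class` and the synthetic data are hypothetical.

```python
import numpy as np

MAX_SHIFT = 25    # largest shift considered (bp)
HALF_WINDOW = 45  # half of the 90-bp alignment interval
MASK_HALF = 2     # half-width of the excluded 5-bp window at the peak maximum


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_shift(ref, site):
    """Return (shift, similarity) maximising cosine similarity of `site` to `ref`."""
    centre = len(ref) // 2
    keep_full = np.ones(len(ref), dtype=bool)
    keep_full[centre - MASK_HALF: centre + MASK_HALF + 1] = False  # mask peak positions
    lo, hi = centre - HALF_WINDOW, centre + HALF_WINDOW
    best = (0, -np.inf)
    for s in range(-MAX_SHIFT, MAX_SHIFT + 1):
        # Assumption: drop positions that are masked in either site's own frame.
        keep = keep_full[lo:hi] & keep_full[lo + s: hi + s]
        sim = cosine_similarity(ref[lo:hi][keep], site[lo + s: hi + s][keep])
        if sim > best[1]:
            best = (s, sim)
    return best


def align_class(profiles, ref_index):
    """Shift every site onto the reference site and average the aligned 90-bp windows."""
    ref = profiles[ref_index]
    centre = profiles.shape[1] // 2
    lo, hi = centre - HALF_WINDOW, centre + HALF_WINDOW
    shifts = [best_shift(ref, p)[0] for p in profiles]
    aligned = np.stack([p[lo + s: hi + s] for p, s in zip(profiles, shifts)])
    return np.array(shifts), aligned.mean(axis=0)


# Synthetic check: three copies of one 201-bp track, displaced by known offsets.
rng = np.random.default_rng(0)
ref = rng.standard_normal(201)
profiles = np.stack([ref, np.roll(ref, 7), np.roll(ref, -12)])
shifts, mean_profile = align_class(profiles, ref_index=0)
```

On the synthetic tracks the recovered shifts equal the offsets used to displace the copies, so the averaged window reproduces the reference. With real shape features, the choice of feature scaling would affect cosine similarity and should follow the published code.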
Score distributions were generated by scoring the aligned fast, slow, free MNase and random sites in the same 90-bp interval used for shape alignment using correlation distance to the shape profile; Mann-Whitney U-tests were performed for pairwise comparisons of the resulting distributions. To determine whether putative TFBSs regulate nearby genes, we assigned them to their closest (≤1 kb) genes and queried YEASTRACT (ref. 6). Source code for these analyses is publicly available (https://github.com/sivakasinathan/shape_align).
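The scoring and comparison step can likewise be sketched in Python under stated assumptions. Correlation distance is taken here as 1 minus Pearson's r, and the Mann-Whitney U-test is run with SciPy's `mannwhitneyu`. The profile position (`start`), the site counts, the noise level and all data below are synthetic and hypothetical, chosen only to show the mechanics.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def correlation_distance(u, v):
    """Correlation distance, taken here as 1 - Pearson r."""
    return 1.0 - np.corrcoef(u, v)[0, 1]


def score_sites(aligned_sites, profile, start):
    """Score each aligned 90-bp site against the shape profile.

    `start` (hypothetical parameter) is the offset of the profile region
    within the aligned window.
    """
    end = start + len(profile)
    return np.array([correlation_distance(site[start:end], profile)
                     for site in aligned_sites])


# Synthetic data: "bound" sites carry the profile, random sites are noise only.
rng = np.random.default_rng(1)
profile = np.sin(np.linspace(0.0, np.pi, 20))
START = 35


def make_sites(n, signal):
    """Generate n noisy 90-bp tracks, optionally carrying the profile."""
    sites = 0.3 * rng.standard_normal((n, 90))
    sites[:, START:START + len(profile)] += signal * profile
    return sites


fast_scores = score_sites(make_sites(60, 1.0), profile, START)
slow_scores = score_sites(make_sites(60, 1.0), profile, START)
random_scores = score_sites(make_sites(60, 0.0), profile, START)

# Two-sided Mann-Whitney U-tests for pairwise comparison of score distributions.
p_fast_vs_slow = mannwhitneyu(fast_scores, slow_scores, alternative="two-sided").pvalue
p_fast_vs_random = mannwhitneyu(fast_scores, random_scores, alternative="two-sided").pvalue
```

Lower scores indicate closer agreement with the profile, so sites that carry the profile should score well below noise-only sites, while two classes drawn from the same model should yield similar distributions.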