(a) The binding preference of CTCF (ref. 2), represented as a sequence logo (ref. 9), in which the total height of the letters at each position is proportional to the information content at that position. (b) The 20 top-scoring occurrences of the CTCF binding site in human chromosome 21. Coordinates of the starting position of each occurrence are given with respect to human genome assembly NCBI 36.1. (c) A histogram of the scores produced by scanning a shuffled version of human chromosome 21 with the CTCF motif. (d) A close-up of the right tail of the distribution shown in c. The blue histogram is the empirical null distribution of scores observed from scanning the shuffled chromosome; the gray line is the analytic null distribution. The P-value associated with an observed score of 17.0 is equal to the area under the curve to the right of 17.0 (shaded pink). (e) The false discovery rate (FDR) is estimated from the empirical null distribution for a score threshold of 17.0. There are 35 null scores >17.0 and 519 observed scores >17.0, giving an estimated FDR of 35/519 ≈ 6.7%. This procedure assumes that the number of observed scores equals the number of null scores.
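The calculations in panels d and e can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names `empirical_p_value` and `estimate_fdr` are hypothetical, and the inputs are assumed to be plain lists of motif scores: one from the shuffled (null) chromosome and one from the real chromosome. The P-value here is the fraction of null scores at or above the observed score, which is an empirical stand-in for the area under the curve in panel d. The FDR is the number of null scores above the threshold divided by the number of observed scores above it. The rescaling by the ratio of list sizes is an added assumption. When the two lists have equal length, as the caption assumes, it reduces exactly to 35/519.

```python
from bisect import bisect_left, bisect_right


def empirical_p_value(null_scores, score):
    """Fraction of null scores >= score (empirical right-tail area)."""
    ranked = sorted(null_scores)
    n_at_or_above = len(ranked) - bisect_left(ranked, score)
    return n_at_or_above / len(ranked)


def estimate_fdr(null_scores, observed_scores, threshold):
    """Estimate the FDR at a score threshold from an empirical null.

    Counts scores strictly greater than the threshold in each list.
    Assumption (not from the source): if the lists differ in size,
    the null count is rescaled by len(observed) / len(null); with
    equal sizes this is simply null_above / observed_above.
    """
    null_sorted = sorted(null_scores)
    obs_sorted = sorted(observed_scores)
    null_above = len(null_sorted) - bisect_right(null_sorted, threshold)
    obs_above = len(obs_sorted) - bisect_right(obs_sorted, threshold)
    if obs_above == 0:
        return 0.0  # nothing called significant, so no false discoveries
    scale = len(observed_scores) / len(null_scores)
    return min(1.0, null_above * scale / obs_above)


# Toy data mimicking panel e: 35 null and 519 observed scores exceed 17.0.
null = [0.0] * 965 + [18.0] * 35
observed = [0.0] * 481 + [18.0] * 519
print(round(estimate_fdr(null, observed, 17.0), 3))  # prints 0.067
```

The toy lists only reproduce the counts quoted in the caption. They are not the actual CTCF scan scores.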