Genomic footprinting

Journal name:
Nature Methods
Volume:
13
Pages:
213–221
DOI:
10.1038/nmeth.3768

Abstract

The advent of DNA footprinting with DNase I more than 35 years ago enabled the systematic analysis of protein-DNA interactions, and the technique has been instrumental in the decoding of cis-regulatory elements and the identification and characterization of transcription factors and other DNA-binding proteins. The ability to analyze millions of individual genomic cleavage events via massively parallel sequencing has enabled in vivo DNase I footprinting on a genomic scale, offering the potential for global analysis of transcription factor occupancy in a single experiment. Genomic footprinting has opened unique vistas on the organization, function and evolution of regulatory DNA; however, the technology is still nascent. Here we discuss both prospects and challenges of genomic footprinting, as well as considerations for its application to complex genomes.

Figures

    Figure 1: Principles of a DNase I footprinting experiment.

    (a) The classical DNase I footprinting technique was performed in vitro and combined purified protein or nuclear extract with a radiolabeled DNA probe. A limited DNase I digestion resulted in a series of nested fragments that were resolved using gel electrophoresis. (b) Digital genomic footprinting combines exposure of nuclei to DNase I, purification of small DNase I–released fragments, and massively parallel sequencing of fragment ends (DNase I cleavage sites) to generate a digital readout of per-nucleotide cleavages genome-wide.

    Figure 2: Resolving cis-regulatory architecture at nucleotide resolution in individual regulatory regions.

    Digital genomic footprints within the promoter of TMEM143 and SYNGR4 in regulatory T cells. Dashed boxes highlight individual TF footprints, which are marked by decreased cleavage rates (blue) relative to those expected (yellow) given the intrinsic sequence preference of DNase I.

    Figure 3: An illustrative example of interpretation of DNase I cleavage to determine TF occupancy.

    (a) Heat map of per-nucleotide DNase I cleavages surrounding AP-1 recognition sequences within DHSs in regulatory T cells, sorted by decreasing cleavage density in a ±25-bp window. Red dashed lines demarcate the 20% most and least accessible recognition sequences. (b) Aggregate profiles of observed and expected mean per-nucleotide DNase I cleavages of the 20% most (top) and least (bottom) accessible recognition sequences. For computation of expected cleavages, observed cleavages were reassigned with respect to a 6-mer preference model. (c) Heat map of the ratio of observed (obs) cleavages to expected (exp) cleavages surrounding AP-1 recognition sequences sorted as in a. (d) Aggregate profiles of the log2 observed/expected ratio of the 20% most (top) and least (bottom) accessible recognition sequences. (e) Footprints identified at AP-1 recognition sequences. Blue tick marks indicate that the recognition sequence has an associated DNase I footprint (from ref. 10).

    Figure 4: De novo versus TF recognition site–directed analysis of TF occupancy.

    (a) Conceptual strategy for de novo delineation of TF footprints in digital genomic data. Footprints are called where the number of observed cleavages drops over a short stretch of contiguous nucleotides relative to the number in the adjacent flanking regions. Additional corrections for DNase I cleavage preferences directed by primary DNA structure (Fig. 5) can be incorporated explicitly or applied in a subsequent step. De novo footprint detection depends critically on cleavage density over the regulatory region, which in turn depends on sequencing depth and on the sample's signal-to-noise ratio (SNR) (Supplementary Table 1). (b) Strategy for determining TF occupancy at a predefined candidate recognition element (e.g., one defined by a match to a consensus sequence motif). Motif occupancy is determined by a statistical framework that assesses the probability of the observed cleavage pattern under a model of expected cleavage. Because the nucleotide stretch is predefined, statistically robust classification of occupancy requires fewer total cleavage events than de novo detection.

    Figure 5: Modeling variation in DNase I cleavage rates due to primary DNA structure.

    (a) Strategy for determining expected cleavage counts from observed data. Observed cleavage counts are reassigned within a local window (±5 bp) according to the relative sequence preference of the overlapping 6-mer (derived from empirical data in ref. 21). Only the value at the center position is retained as the expected cleavage rate. The window is stepped in 1-bp increments to compute expected cleavage rates genome-wide. (b) Median observed cleavage rates versus rates predicted using the strategy described in a. Local resampling of DNase I cleavages using a sequence-preference model outperforms uniform shuffling. The dashed red line indicates a hypothetical preference model that fully predicts the observed cleavage rates. (c) The negative binomial model fits the variation in observed per-nucleotide rates better than the commonly used Poisson distribution. Shown is a density histogram of the observed cleavage rates at sites with 20 expected cleavages under the 6-mer model from b. The brown dashed line indicates the probability density of the Poisson distribution with a mean of 20. The orange line shows the probability density of the negative binomial fitted to the observed data.

    Supplementary Fig. 1: Aggregated DNase I cleavage patterns for TF recognition sequences reflecting diverse DNA-binding domains.

    (a) Heat maps of per-nucleotide DNase I cleavages and discovered footprints surrounding NRF1 recognition sequences. Left, observed cleavages. Right, the ratio of observed to expected cleavages, where expected cleavages were computed by reassigning tags according to a hexamer model of DNase I cleavage bias. Blue ticks indicate that the recognition sequence has an associated DNase I footprint. Line plots show the aggregate profile of mean per-nucleotide DNase I cleavages at the 20% most (left column) and 20% least (right column) accessible NRF1 recognition sequences. Top row, observed cleavages. Middle, expected cleavages computed using the hexamer model. Bottom, the log2 ratio of observed to expected. (b-g) The same as (a) for the recognition sequences of (b) SP1, (c) ELK1, (d) USF1, (e) RFX3, (f) NFIB and (g) CTCF within accessible chromatin. In each case, the cleavage patterns at occupied templates (coinciding with de novo TF footprint calls) parallel known structural features of the respective DNA-binding domains.

    Supplementary Fig. 2: General features of DNase I sequence preference.

    (a) Relative cleavage preference of all 4,096 hexamers with respect to the median hexamer, as determined by deep sequencing (~100 million tags) of a DNase I digestion of deproteinized DNA from human IMR90 cells (data from ref.). (b) Biased hexamers contribute disproportionately to total DNase I cleavages, for both naked DNA and chromatin (regulatory T cell cleavages mapping within DHSs), relative to their representation in the 36-bp mappable genome. Shown is the cumulative fraction of all mappable positions or sequencing tags with respect to their hexamer context. Hexamers are ranked by decreasing cleavage preference as in a.
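The Figure 1b caption describes sequencing the ends of DNase I-released fragments to obtain a per-nucleotide readout of cleavage sites. The Python sketch below illustrates that bookkeeping step. It assumes alignments arrive as hypothetical BED-like tuples `(chrom, start, end, strand)` with 0-based, half-open coordinates, and it takes the read's 5' end as the cut. Strand offsets differ between protocols and pipelines, so the minus-strand convention used here is an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical BED-like alignment: (chrom, start, end, strand), 0-based half-open.
Alignment = Tuple[str, int, int, str]


def cleavage_site(aln: Alignment) -> Tuple[str, int]:
    """Position of the read's 5' end, treated here as the DNase I cleavage site."""
    chrom, start, end, strand = aln
    if strand == "+":
        return chrom, start          # forward read: 5' end is the first aligned base
    if strand == "-":
        return chrom, end - 1        # reverse read: 5' end is the last aligned base
    raise ValueError(f"unknown strand: {strand!r}")


def count_cleavages(alignments: Iterable[Alignment]) -> Counter:
    """Per-nucleotide cleavage counts keyed by (chrom, position)."""
    return Counter(cleavage_site(a) for a in alignments)


reads = [
    ("chr1", 100, 136, "+"),
    ("chr1", 100, 136, "+"),
    ("chr1", 64, 101, "-"),
    ("chr1", 150, 186, "-"),
]
counts = count_cleavages(reads)
print(counts[("chr1", 100)])  # 3
```

Identical reads are counted separately here. A real pipeline would normally decide explicitly how to treat PCR duplicates before counting.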
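Figure 3 and Supplementary Fig. 1 rank recognition sequences by cleavage density in a ±25-bp window. They then compare mean observed and expected profiles for the 20% most and least accessible sites on a log2 scale. The pure-Python sketch below shows that aggregation. It assumes each site is already represented as equal-length lists of observed and expected per-nucleotide counts centered on the motif. The pseudocount of 1 added before the log2 ratio is an assumption of this sketch, not a value taken from the source.

```python
import math
from typing import Dict, List, Tuple

Profile = List[float]


def rank_by_accessibility(obs: List[Profile], center: int, half: int = 25) -> List[int]:
    """Site indices sorted by decreasing observed cleavages within center +/- half."""
    if center - half < 0 or center + half >= len(obs[0]):
        raise ValueError("window extends past the profile")
    density = [sum(row[center - half:center + half + 1]) for row in obs]
    return sorted(range(len(obs)), key=lambda i: density[i], reverse=True)


def mean_profile(rows: List[Profile]) -> Profile:
    """Aggregate (column-wise mean) per-nucleotide profile."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def log2_ratio(obs: Profile, exp: Profile, pseudo: float = 1.0) -> Profile:
    """log2 observed/expected with a pseudocount (an assumption of this sketch)."""
    return [math.log2((o + pseudo) / (e + pseudo)) for o, e in zip(obs, exp)]


def top_bottom_profiles(obs: List[Profile], exp: List[Profile], center: int,
                        frac: float = 0.2) -> Dict[str, Tuple[Profile, Profile, Profile]]:
    """Observed, expected and log2 ratio profiles for the most and least accessible sites."""
    order = rank_by_accessibility(obs, center)
    k = max(1, int(len(order) * frac))
    groups = {"top": order[:k], "bottom": order[-k:]}
    out = {}
    for name, idx in groups.items():
        o = mean_profile([obs[i] for i in idx])
        e = mean_profile([exp[i] for i in idx])
        out[name] = (o, e, log2_ratio(o, e))
    return out


# Toy data: 10 sites, 61-bp profiles centered on the motif (index 30).
obs = [[float(i)] * 61 for i in range(10)]
exp = [[float(i)] * 61 for i in range(10)]
for row in obs[5:]:                 # carve a 7-bp "footprint" into the busiest sites
    for j in range(27, 34):
        row[j] = 0.0
res = top_bottom_profiles(obs, exp, center=30)
print(round(res["top"][2][30], 2), res["bottom"][2][30])
```

In the toy data, only the most accessible sites carry a depletion at the motif. As a result, the log2 ratio dips below zero for the top group and stays flat for the bottom group.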
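Figure 4 contrasts two strategies. In (a), de novo detection, the method looks for a short stretch with fewer cleavages than its flanks. In (b), motif-directed scoring, it asks how surprising the cleavage count over a predefined recognition element is under an expected model. The sketch below implements a simple version of each. The core and flank widths, the 0.3 depletion threshold and the fixed negative binomial dispersion (`size`) are all illustrative assumptions. Treating the summed count over the motif as a single negative binomial variable is a simplification, not the statistical framework of any particular published tool.

```python
import math
from typing import List, Sequence, Tuple


def depletion_scan(counts: Sequence[float], core: int = 10, flank: int = 10,
                   max_ratio: float = 0.3) -> List[Tuple[int, float]]:
    """(a) De novo: core start positions whose mean cleavage is far below both flanks."""
    hits = []
    for s in range(flank, len(counts) - core - flank + 1):
        left = sum(counts[s - flank:s]) / flank
        mid = sum(counts[s:s + core]) / core
        right = sum(counts[s + core:s + core + flank]) / flank
        weaker_flank = min(left, right)
        if weaker_flank > 0 and mid / weaker_flank <= max_ratio:
            hits.append((s, mid / weaker_flank))
    return hits


def nb_cdf(x: int, mu: float, size: float) -> float:
    """P(X <= x) for a negative binomial with mean mu and dispersion `size`."""
    if mu <= 0:
        return 1.0
    p = size / (size + mu)
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                   + size * math.log(p) + k * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def motif_depletion_pvalue(obs: Sequence[int], exp: Sequence[float],
                           size: float = 5.0) -> float:
    """(b) Motif-directed: probability of this few (or fewer) cleavages over the motif."""
    return nb_cdf(int(sum(obs)), sum(exp), size)


profile = [10] * 20 + [1] * 10 + [10] * 20
hits = depletion_scan(profile)
best = min(hits, key=lambda h: h[1])
print(best)                                            # (20, 0.1)
print(motif_depletion_pvalue([0] * 10, [2.0] * 10))    # about 3.2e-4
```

The motif-directed test evaluates one predefined span instead of scanning every offset, which is consistent with the caption's point that it needs fewer cleavage events.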
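Figure 5a describes how expected cleavage counts are derived. Observed counts in a ±5-bp window are redistributed in proportion to each position's 6-mer preference, and only the value at the window center is kept. Figure 5c motivates a negative binomial model, rather than a Poisson model, for the remaining variation. The sketch below implements both pieces in plain Python, under several assumptions. It takes the 6-mer "overlapping" a cut to be the one spanning three bases on each side of it. Hexamers missing from the preference table default to a weight of 1. The method-of-moments dispersion estimate is one simple fitting choice, not necessarily the one used for Fig. 5c.

```python
from statistics import mean, pvariance
from typing import Dict, List, Optional, Sequence, Tuple


def hexamer_at(seq: str, j: int) -> str:
    """6-mer assumed to span three bases on each side of the cut before base j."""
    return seq[j - 3:j + 3]


def expected_cleavages(seq: str, obs: Sequence[float], prefs: Dict[str, float],
                       half: int = 5) -> List[Optional[float]]:
    """Reassign observed counts in each +/- half window by 6-mer preference; keep the center."""
    n = len(seq)
    # Weight for every position with a complete 6-mer; None near the sequence ends.
    w = [prefs.get(hexamer_at(seq, j), 1.0) if 3 <= j <= n - 3 else None
         for j in range(n)]
    exp: List[Optional[float]] = [None] * n
    for i in range(half, n - half):           # step the window 1 bp at a time
        window = range(i - half, i + half + 1)
        if any(w[j] is None for j in window):
            continue
        total_obs = sum(obs[j] for j in window)
        total_w = sum(w[j] for j in window)
        exp[i] = total_obs * w[i] / total_w
    return exp


def fit_nb_moments(counts: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments (mean, size); size is infinite (Poisson limit) if not overdispersed."""
    m = mean(counts)
    v = pvariance(counts, m)
    if v <= m:
        return m, float("inf")
    return m, m * m / (v - m)


seq = "ACGT" * 10
obs = [0] * 40
obs[20] = 11
flat = expected_cleavages(seq, obs, {})                     # uniform preference
biased = expected_cleavages(seq, obs, {"CGTACG": 10.0})     # one favored 6-mer
print(flat[20], round(biased[20], 3))
print(fit_nb_moments([10, 30]))                             # mean 20, size 5
```

With a uniform preference table, the expected value at each position is simply the window mean of observed counts. A favored 6-mer pulls more of the window's cleavages onto the positions it occupies. For the dispersion estimate, a variance larger than the mean yields a finite `size`, which is the overdispersion the negative binomial captures and the Poisson cannot.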
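Supplementary Fig. 2 expresses each hexamer's cleavage preference relative to the median hexamer. It then reports what cumulative fraction of genomic positions and of sequencing tags fall in hexamers ranked by decreasing preference. The sketch below computes both quantities from toy counts. Here, preference is taken as tags per genomic occurrence divided by the median of that rate over the hexamers present. That normalization and the toy counts are assumptions for illustration.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple


def preference_table(occ: Counter, tags: Counter) -> Dict[str, float]:
    """Tags per genomic occurrence for each hexamer, relative to the median hexamer."""
    rate = {k: tags[k] / n for k, n in occ.items() if n > 0}
    med = median(rate.values())
    if med == 0:
        raise ValueError("median hexamer has no tags; more sequencing depth is needed")
    return {k: r / med for k, r in rate.items()}


def cumulative_fractions(pref: Dict[str, float], occ: Counter,
                         tags: Counter) -> Tuple[List[str], List[float], List[float]]:
    """Rank hexamers by decreasing preference; return cumulative position and tag fractions."""
    ranked = sorted(pref, key=pref.get, reverse=True)
    total_pos = sum(occ[k] for k in ranked)
    total_tags = sum(tags[k] for k in ranked)
    pos_frac: List[float] = []
    tag_frac: List[float] = []
    run_pos = run_tags = 0
    for k in ranked:
        run_pos += occ[k]
        run_tags += tags[k]
        pos_frac.append(run_pos / total_pos)
        tag_frac.append(run_tags / total_tags)
    return ranked, pos_frac, tag_frac


# Toy counts: genomic occurrences and cleavage tags for three hexamers.
occ = Counter({"AAAAAA": 100, "CCCCCC": 100, "GGGGGG": 100})
tags = Counter({"AAAAAA": 300, "CCCCCC": 100, "GGGGGG": 50})
pref = preference_table(occ, tags)
ranked, pos_frac, tag_frac = cumulative_fractions(pref, occ, tags)
print(ranked[0], pos_frac[0], tag_frac[0])
```

In this toy case, the most preferred hexamer covers one third of positions but two thirds of tags. This is the kind of disproportion that panel b of Supplementary Fig. 2 shows for real data.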

References

  1. Ptashne, M. & Gann, A. Genes and Signals (Cold Spring Harbor Laboratory Press, 2002).
  2. Galas, D.J. & Schmitz, A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. 5, 3157–3170 (1978).
  3. Church, G.M., Ephrussi, A., Gilbert, W. & Tonegawa, S. Cell-type-specific contacts to immunoglobulin enhancers in nuclei. Nature 313, 798–801 (1985).
  4. Jackson, P.D. & Felsenfeld, G. A method for mapping intranuclear protein-DNA interactions and its application to a nuclease hypersensitive site. Proc. Natl. Acad. Sci. USA 82, 2296–2300 (1985).
  5. Zinn, K. & Maniatis, T. Detection of factors that interact with the human beta-interferon regulatory region in vivo by DNAase I footprinting. Cell 45, 611–618 (1986).
  6. Ephrussi, A., Church, G.M., Tonegawa, S. & Gilbert, W. B lineage–specific interactions of an immunoglobulin enhancer with cellular factors in vivo. Science 227, 134–140 (1985).
  7. Becker, P.B., Ruppert, S. & Schütz, G. Genomic footprinting reveals cell type-specific DNA binding of ubiquitous factors. Cell 51, 435–443 (1987).
  8. Hesselberth, J.R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
  9. Boyle, A.P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).
  10. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
  11. Sullivan, A.M. et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep. 8, 2015–2030 (2014).
  12. Stergachis, A.B. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014).
  13. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).
  14. Galas, D.J. The invention of footprinting. Trends Biochem. Sci. 26, 690–693 (2001).
  15. Gross, D.S. & Garrard, W.T. Nuclease hypersensitive sites in chromatin. Annu. Rev. Biochem. 57, 159–197 (1988).
  16. Tullius, T.D. Physical studies of protein-DNA complexes by footprinting. Annu. Rev. Biophys. Biophys. Chem. 18, 213–237 (1989).
  17. Hampshire, A.J., Rusling, D.A., Broughton-Head, V.J. & Fox, K.R. Footprinting: a method for determining the sequence selectivity, affinity and kinetics of DNA-binding ligands. Methods 42, 128–140 (2007).
  18. Weiss, B., Live, T.R. & Richardson, C.C. Enzymatic breakage and joining of deoxyribonucleic acid. V. End group labeling and analysis of deoxyribonucleic acid containing single stranded breaks. J. Biol. Chem. 243, 4530–4542 (1968).
  19. Ehrlich, S.D., Bertazzoni, U. & Bernardi, G. The specificity of pancreatic deoxyribonuclease. Eur. J. Biochem. 40, 143–147 (1973).
  20. Dingwall, C., Lomonossoff, G.P. & Laskey, R.A. High sequence specificity of micrococcal nuclease. Nucleic Acids Res. 9, 2659–2673 (1981).
  21. Lazarovici, A. et al. Probing DNA shape and methylation state on a genomic scale with DNase I. Proc. Natl. Acad. Sci. USA 110, 6376–6381 (2013).
  22. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
  23. Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119 (2010).
  24. Johnson, A.D., Meyer, B.J. & Ptashne, M. Interactions between DNA-bound repressors govern regulation by the lambda phage repressor. Proc. Natl. Acad. Sci. USA 76, 5061–5065 (1979).
  25. Payvar, F. et al. Sequence-specific binding of glucocorticoid receptor to MTV DNA at sites within and upstream of the transcribed region. Cell 35, 381–392 (1983).
  26. Dynan, W.S. & Tjian, R. The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter. Cell 35, 79–87 (1983).
  27. Church, G.M. & Gilbert, W. Genomic sequencing. Proc. Natl. Acad. Sci. USA 81, 1991–1995 (1984).
  28. Mueller, P.R. & Wold, B. In vivo footprinting of a muscle specific enhancer by ligation mediated PCR. Science 246, 780–786 (1989).
  29. Warshawsky, D. & Miller, L. Mapping protein-DNA interactions using in vivo footprinting. Methods Mol. Biol. 127, 199–212 (1999).
  30. Weston, S.A., Lahm, A. & Suck, D. X-ray structure of the DNase I-d(GGTATACC)2 complex at 2.3 Å resolution. J. Mol. Biol. 226, 1237–1256 (1992).
  31. Drew, H.R. & Travers, A.A. DNA structural variations in the E. coli tyrT promoter. Cell 37, 491–502 (1984).
  32. Melgar, E. & Goldthwait, D.A. Deoxyribonucleic acid nucleases. II. The effects of metals on the mechanism of action of deoxyribonuclease I. J. Biol. Chem. 243, 4409–4416 (1968).
  33. Campbell, V.W. & Jackson, D.A. The effect of divalent cations on the mode of action of DNase I. The initial reaction products produced from covalently closed circular DNA. J. Biol. Chem. 255, 3726–3735 (1980).
  34. Lutter, L.C. Precise location of DNase I cutting sites in the nucleosome core determined by high resolution gel electrophoresis. Nucleic Acids Res. 6, 41–56 (1979).
  35. Rhodes, D. & Klug, A. Helical periodicity of DNA determined by enzyme digestion. Nature 286, 573–578 (1980).
  36. Stamatoyannopoulos, J.A., Goodwin, A., Joyce, T. & Lowrey, C.H. NF-E2 and GATA binding motifs are required for the formation of DNase I hypersensitive site 4 of the human beta-globin locus control region. EMBO J. 14, 106–116 (1995).
  37. Ptashne, M. A Genetic Switch (Cold Spring Harbor Laboratory Press, 2004).
  38. Dabrowiak, J.C., Goodisman, J. & Ward, B. Quantitative DNA footprinting. Methods Mol. Biol. 90, 23–42 (1997).
  39. Pellerin, I., Schnabel, C., Catron, K.M. & Abate, C. Hox proteins have different affinities for a consensus DNA site that correlate with the positions of their genes on the hox cluster. Mol. Cell. Biol. 14, 4532–4545 (1994).
  40. Renda, M. et al. Critical DNA binding interactions of the insulator protein CTCF: a small number of zinc fingers mediate strong binding, and a single finger-DNA interaction controls binding at imprinted loci. J. Biol. Chem. 282, 33336–33345 (2007).
  41. N'soukpoé-Kossi, C.N., Diamantoglou, S. & Tajmir-Riahi, H.A. DNase I-DNA interaction alters DNA and protein conformations. Biochem. Cell Biol. 86, 244–250 (2008).
  42. Coulon, A., Chow, C.C., Singer, R.H. & Larson, D.R. Eukaryotic transcriptional dynamics: from single molecules to cell populations. Nat. Rev. Genet. 14, 572–584 (2013).
  43. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268 (2011).
  44. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
  45. Sherwood, R.I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
  46. Siersbæk, R. et al. Molecular architecture of transcription factor hotspots in early adipogenesis. Cell Rep. 7, 1434–1442 (2014).
  47. He, H.H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat. Methods 11, 73–78 (2014).
  48. Sung, M.-H., Guertin, M.J., Baek, S. & Hager, G.L. DNase footprint signatures are dictated by factor dynamics and DNA sequence. Mol. Cell 56, 275–285 (2014).
  49. Thurman, R.E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
  50. Sabo, P.J. et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat. Methods 3, 511–518 (2006).
  51. Vierstra, J., Wang, H., John, S., Sandstrom, R. & Stamatoyannopoulos, J.A. Coupling transcription factor occupancy to nucleosome architecture with DNase-FLASH. Nat. Methods 11, 66–72 (2014).
  52. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
  53. Kähärä, J. & Lähdesmäki, H. BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data. Bioinformatics 31, 2852–2859 (2015).
  54. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2012).
  55. Piper, J. et al. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 41, e201 (2013).
  56. Gusmao, E.G., Dieterich, C., Zenke, M. & Costa, I.G. Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications. Bioinformatics 30, 3143–3151 (2014).
  57. Chen, X., Hoffman, M.M., Bilmes, J.A., Hesselberth, J.R. & Noble, W.S. A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data. Bioinformatics 26, i334–i342 (2010).
  58. Grant, C.E., Bailey, T.L. & Noble, W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
  59. Frank, C.L., Crawford, G.E. & Ohler, U. Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection. Nucleic Acids Res. 42, 11865–11878 (2014).
  60. Teytelman, L., Thurtle, D.M., Rine, J. & van Oudenaarden, A. Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins. Proc. Natl. Acad. Sci. USA 110, 18602–18607 (2013).
  61. Lickwar, C.R., Mueller, F., Hanlon, S.E., McNally, J.G. & Lieb, J.D. Genome-wide protein-DNA binding dynamics suggest a molecular clutch for transcription factor function. Nature 484, 251–255 (2012).
  62. Koohy, H., Down, T.A. & Hubbard, T.J. Chromatin accessibility data sets show bias due to sequence specificity of the DNase I enzyme. PLoS One 8, e69853 (2013).
  63. Hashimoto, T.B., Edwards, M.D. & Gifford, D.K. Universal count correction for high-throughput sequencing. PLoS Comput. Biol. 10, e1003494 (2014).
  64. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
  65. Jolma, A. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015).
  66. Luo, K. & Hartemink, A.J. Using DNase digestion data to accurately identify transcription factor binding sites. Pac. Symp. Biocomput. 2013, 80–91 (2013).


Author information

Affiliations

  1. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Jeff Vierstra &
    • John A Stamatoyannopoulos
  2. Altius Institute for Biomedical Sciences, Seattle, Washington, USA.

    • Jeff Vierstra &
    • John A Stamatoyannopoulos
  3. Division of Oncology, Department of Medicine, University of Washington, Seattle, Washington, USA.

    • John A Stamatoyannopoulos

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1 and 2, Supplementary Box 1 and Supplementary Table 1
