The advent of DNA footprinting with DNase I more than 35 years ago enabled the systematic analysis of protein-DNA interactions, and the technique has been instrumental in the decoding of cis-regulatory elements and the identification and characterization of transcription factors and other DNA-binding proteins. The ability to analyze millions of individual genomic cleavage events via massively parallel sequencing has enabled in vivo DNase I footprinting on a genomic scale, offering the potential for global analysis of transcription factor occupancy in a single experiment. Genomic footprinting has opened unique vistas on the organization, function and evolution of regulatory DNA; however, the technology is still nascent. Here we discuss both prospects and challenges of genomic footprinting, as well as considerations for its application to complex genomes.
- Supplementary Figure 1: Aggregated DNase I cleavage patterns for TF recognition sequences reflecting diverse DNA-binding domains. (307 KB)
(a) Heatmaps of per-nucleotide DNase I cleavages and discovered footprints surrounding NRF1 recognition sequences. Left, observed cleavages. Right, the ratio of observed cleavages to expected cleavages, computed by reassigning tags according to a hexamer model of DNase I cleavage bias. Blue ticks indicate that the recognition sequence has an associated DNase I footprint. Line plots show the aggregate profile of mean per-nucleotide DNase I cleavages at the 20% most (left column) and 20% least (right column) accessible NRF1 recognition sequences. Top row, observed cleavages. Middle, expected cleavages computed using the hexamer model. Bottom, the log2 ratio of observed to expected. (b-g) The same as (a) for the recognition sequences of (b) SP1, (c) ELK1, (d) USF1, (e) RFX3, (f) NFIB, and (g) CTCF within accessible chromatin. In each case the cleavage patterns at occupied templates (coinciding with de novo TF footprint calls) parallel known structural features of the respective DNA-binding domains.
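The observed-to-expected comparison described in this caption can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the source: each cleavage position is scored by the hexamer spanning it (three bases on either side of the cut), the tags observed in a window are redistributed in proportion to those hexamer weights, and a pseudocount of 1 stabilizes the log2 ratio. The function names, the `offset` parameter and the pseudocount are hypothetical choices made for illustration only.

```python
import math


def expected_cleavages(sequence, observed, bias, k=6, offset=3):
    """Redistribute a window's observed tags in proportion to k-mer bias.

    Assumption (illustrative): the cut at position i is scored by the k-mer
    starting at i - offset. Positions without a full k-mer, or whose k-mer
    is missing from `bias`, receive weight 0 and are excluded.
    """
    if len(sequence) != len(observed):
        raise ValueError("sequence and observed must have the same length")
    n = len(sequence)
    weights = []
    for i in range(n):
        start = i - offset
        if start < 0 or start + k > n:
            weights.append(0.0)  # no complete k-mer around this position
            continue
        weights.append(bias.get(sequence[start:start + k], 0.0))
    total_weight = sum(weights)
    if total_weight == 0:
        return [0.0] * n
    # Only tags at scorable positions are reassigned, so totals are preserved.
    total_tags = sum(o for o, w in zip(observed, weights) if w > 0)
    return [total_tags * w / total_weight for w in weights]


def log2_ratio(observed, expected, pseudocount=1.0):
    """Per-nucleotide log2(observed / expected) with a hypothetical pseudocount."""
    return [
        math.log2((o + pseudocount) / (e + pseudocount))
        for o, e in zip(observed, expected)
    ]


if __name__ == "__main__":
    # Toy window with made-up bias values; real weights would come from a
    # naked-DNA digestion such as the one described for Supplementary Figure 2.
    seq = "ACGTACGTAC"
    obs = [0, 0, 0, 5, 5, 0, 5, 5, 0, 0]
    toy_bias = {"ACGTAC": 2.0, "CGTACG": 1.0, "GTACGT": 1.0, "TACGTA": 4.0}
    exp = expected_cleavages(seq, obs, toy_bias)
    print(log2_ratio(obs, exp))
```

Under these assumptions, a position protected by a bound factor shows a negative log2 ratio because it carries fewer tags than its sequence context alone would predict.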
- Supplementary Figure 2: General features of DNase I sequence preference. (67 KB)
(a) Relative cleavage preference of all 4,096 hexamers with respect to the median hexamer, as determined by deep sequencing (~100 million tags) of a DNase I digestion of deproteinized DNA from human IMR90 cells (data from ref.). (b) Biased hexamers contribute disproportionately to total DNase I cleavages for both naked DNA and chromatin (regulatory T cell cleavages mapping within DHSs) when compared to the 36-bp mappable genome. Shown is the cumulative fraction of all mappable positions or sequencing tags with respect to their hexamer context. Hexamers are ranked by decreasing cleavage preference, as in a.
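The two quantities in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to produce the figure: cleavage preference is taken here as tags per genomic occurrence of a hexamer, divided by the median of that rate across hexamers, and the cumulative curve is built by ranking hexamers from most to least preferred. The function names and the toy counts are hypothetical; a real analysis would cover all 4^6 = 4,096 hexamers.

```python
from statistics import median


def relative_preference(cut_counts, genome_counts):
    """Cleavage rate per hexamer, expressed relative to the median hexamer.

    Assumption (illustrative): rate = tags at the hexamer / occurrences of
    the hexamer among mappable positions.
    """
    rates = {
        hexamer: cut_counts.get(hexamer, 0) / occurrences
        for hexamer, occurrences in genome_counts.items()
        if occurrences > 0
    }
    median_rate = median(rates.values())
    if median_rate == 0:
        raise ValueError("median cleavage rate is zero; cannot normalize")
    return {hexamer: rate / median_rate for hexamer, rate in rates.items()}


def cumulative_fraction(preference, counts):
    """Cumulative share of `counts`, with hexamers ranked by decreasing preference."""
    ranked = sorted(preference, key=preference.get, reverse=True)
    total = sum(counts.get(hexamer, 0) for hexamer in ranked)
    running = 0
    fractions = []
    for hexamer in ranked:
        running += counts.get(hexamer, 0)
        fractions.append(running / total)
    return ranked, fractions


if __name__ == "__main__":
    # Toy counts for three hexamers only (made-up numbers).
    cuts = {"AAAAAA": 10, "CCCCCC": 40, "GGGGGG": 90}
    genome = {"AAAAAA": 10, "CCCCCC": 20, "GGGGGG": 30}
    pref = relative_preference(cuts, genome)
    print(cumulative_fraction(pref, cuts))    # share of sequencing tags
    print(cumulative_fraction(pref, genome))  # share of mappable positions
```

Comparing the curve for sequencing tags with the curve for mappable positions shows how much of the total cleavage is concentrated in the most preferred hexamers, which is the comparison panel b describes.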
- Supplementary Text and Figures (1,081 KB)
Supplementary Figures 1 and 2, Supplementary Box 1 and Supplementary Table 1