Analysis of computational footprinting methods for DNase sequencing experiments

Gusmao, Eduardo G; Allhoff, Manuel; Zenke, Martin; Costa, Ivan G

doi:10.1038/nmeth.3772

Analysis
Published: 22 February 2016

Eduardo G Gusmao^1,2 (ORCID: orcid.org/0000-0001-7461-1443),
Manuel Allhoff^1,3,
Martin Zenke^2 &
Ivan G Costa^1,2,3

Nature Methods volume 13, pages 303–309 (2016)

Abstract

DNase-seq allows nucleotide-level identification of transcription factor binding sites on the basis of a computational search of footprint-like DNase I cleavage patterns on the DNA. Frequently in high-throughput methods, experimental artifacts such as DNase I cleavage bias affect the computational analysis of DNase-seq experiments. Here we performed a comprehensive and systematic study on the performance of computational footprinting methods. We evaluated ten footprinting methods in a panel of DNase-seq experiments for their ability to recover cell-specific transcription factor binding sites. We show that three methods—HINT, DNase2TF and PIQ—consistently outperformed the other evaluated methods and that correcting the DNase-seq signal for experimental artifacts significantly improved the accuracy of computational footprints. We also propose a score that can be used to detect footprints arising from transcription factors with potentially short residence times.

**Figure 1: FLR-Exp evaluation metric.**

**Figure 2: Clustering of bias estimates.**

**Figure 3: Effects of sequence bias on methods.**

**Figure 4: Evaluation of computational footprinting methods.**

**Figure 5: Impact of TF binding time on computational footprinting.**

Accession codes

Accessions

Gene Expression Omnibus

Sequence Read Archive

SRP004871

References

The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Crawford, G.E. et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 16, 123–131 (2006).
Sabo, P.J. et al. Genome-wide identification of DNase I hypersensitive sites using active chromatin sequence libraries. Proc. Natl. Acad. Sci. USA 101, 4537–4542 (2004).
Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
Boyle, A.P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).
Piper, J. et al. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 41, e201 (2013).
Sung, M.-H.H., Guertin, M.J., Baek, S. & Hager, G.L. DNase footprint signatures are dictated by factor dynamics and DNA sequence. Mol. Cell 56, 275–285 (2014).
Gusmao, E.G., Dieterich, C., Zenke, M. & Costa, I.G. Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications. Bioinformatics 30, 3143–3151 (2014).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Cuellar-Partida, G. et al. Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics 28, 56–62 (2012).
Sherwood, R.I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
Yardımcı, G.G., Frank, C.L., Crawford, G.E. & Ohler, U. Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection. Nucleic Acids Res. 42, 11865–11878 (2014).
Kähärä, J. & Lähdesmäki, H. BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data. Bioinformatics 31, 2852–2859 (2015).
Stergachis, A.B. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014).
He, H.H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat. Methods 11, 73–78 (2014).
Meyer, C.A. & Liu, X.S. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet. 15, 709–721 (2014).
Park, P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).
Teytelman, L., Thurtle, D.M., Rine, J. & van Oudenaarden, A. Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins. Proc. Natl. Acad. Sci. USA 110, 18602–18607 (2013).
The difficulty of a fair comparison. Nat. Methods 12, 273 (2015).
Davis, J. & Goadrich, M. The relationship between precision-recall and ROC curves. Proc. 23rd International Conference on Machine Learning—ICML 2006 233–240 (2006).
Tewari, A.K. et al. Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity. Genome Biol. 13, R88 (2012).
Sharp, Z.D. et al. Estrogen-receptor-alpha exchange and chromatin dynamics are ligand- and domain-dependent. J. Cell Sci. 119, 4101–4116 (2006).
McNally, J.G., Müller, W.G., Walker, D., Wolford, R. & Hager, G.L. The glucocorticoid receptor: rapid exchange with regulatory sites in living cells. Science 287, 1262–1265 (2000).
Malnou, C.E. et al. Heterodimerization with different Jun proteins controls c-Fos intranuclear dynamics and distribution. J. Biol. Chem. 285, 6552–6562 (2010).
Nakahashi, H. et al. A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 3, 1678–1689 (2013).
Lazarovici, A. et al. Probing DNA shape and methylation state on a genomic scale with DNase I. Proc. Natl. Acad. Sci. USA 110, 6376–6381 (2013).
Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).
Yu, J. et al. An integrated network of androgen receptor, polycomb and TMPRSS2-ERG gene fusions in prostate cancer progression. Cancer Cell 17, 443–454 (2010).
Guertin, M.J., Zhang, X., Coonrod, S.A. & Hager, G.L. Transient estrogen receptor binding and p300 redistribution support a squelching mechanism for estradiol-repressed genes. Mol. Endocrinol. 28, 1522–1533 (2014).
John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268 (2011).
Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 42, D142–D147 (2014).
Robasky, K. & Bulyk, M.L. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res. 39, D124–D128 (2011).
Matys, V. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).
Boyle, A.P., Guinney, J., Crawford, G.E. & Furey, T.S. F-seq: a feature density estimator for high-throughput sequence tags. Bioinformatics 24, 2537–2538 (2008).
Hesselberth, J.R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
Sabo, P.J. et al. Discovery of functional noncoding elements by digital analysis of chromatin structure. Proc. Natl. Acad. Sci. USA 101, 16837–16842 (2004).
Madden, H.H. Comments on the Savitzky-Golay convolution method for least-squares fit smoothing and differentiation of digital data. Anal. Chem. 50, 1383–1386 (1978).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Hubbard, T. et al. The Ensembl genome database project. Nucleic Acids Res. 30, 38–41 (2002).
Grant, C.E., Bailey, T.L. & Noble, W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Stormo, G.D. DNA binding sites: representation and discovery. Bioinformatics 16, 16–23 (2000).
Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Korhonen, J., Martinmäki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009).
Wilczynski, B., Dojer, N., Patelak, M. & Tiuryn, J. Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs. BMC Bioinformatics 10, 82 (2009).
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
Ritchie, M.E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).

Acknowledgements

This work was supported by the Interdisciplinary Center for Clinical Research (IZKF Aachen), RWTH Aachen University Medical School, Aachen, Germany (to E.G.G., M.A. and I.G.C.), and the Excellence Initiative of the German Federal and State Governments and the German Research Foundation (grant GSC 111 to M.A. and I.G.C.).

Author information

Authors and Affiliations

IZKF Computational Biology Research Group, RWTH Aachen University Medical School, Aachen, Germany
Eduardo G Gusmao, Manuel Allhoff & Ivan G Costa
Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, Germany
Eduardo G Gusmao, Martin Zenke & Ivan G Costa
Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, Aachen, Germany
Manuel Allhoff & Ivan G Costa

Contributions

E.G.G., M.Z. and I.G.C. designed the research. E.G.G. wrote HINT program code. E.G.G., M.A. and I.G.C. analyzed data. E.G.G., M.Z. and I.G.C. wrote the manuscript.

Corresponding author

Correspondence to Ivan G Costa.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 FLR-Exp results for different cell-type pairs.

Correlation between Kolmogorov-Smirnov (KS) test statistics from FLR scores versus expression fold change for cell type pairs H1-hESC versus K562 (left), H1-hESC versus GM12878 (middle) and GM12878 versus K562 (right) for footprints predicted by HINT-BC, DNase2TF, Neph and FLR (from top to bottom, respectively). We observe high FLR-Exp (Spearman correlation) values (r > 0.8) in all cases. Moreover, similar rankings of methods are obtained on the FLR-Exp for each cell pair: H1-hESC/K562 versus H1-hESC/GM12878 r = 0.99, H1-hESC/K562 versus GM12878/K562 r = 0.96 and H1-hESC/GM12878 versus GM12878/K562 r = 0.97.
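
The idea behind the FLR-Exp statistic in this legend can be sketched roughly as follows: a KS statistic is computed per TF from footprint scores in two cell types, and these statistics are then correlated with expression fold change using Spearman's rank correlation. This is a minimal illustration only. The toy data, the function names (`signed_ks`, `spearman`) and the sign convention are assumptions, not the authors' implementation.

```python
from bisect import bisect_right


def signed_ks(a, b):
    """Two-sample KS statistic carrying the sign of the largest ECDF gap.

    Positive values mean that scores in `a` tend to be larger than in `b`
    (an assumed convention for this sketch).
    """
    a, b = sorted(a), sorted(b)
    best = 0.0
    for x in a + b:
        gap = bisect_right(b, x) / len(b) - bisect_right(a, x) / len(a)
        if abs(gap) > abs(best):
            best = gap
    return best


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical footprint scores per TF in two cell types (cell 1, cell 2)
# and hypothetical expression fold changes (cell 1 versus cell 2).
scores = {
    "TF_A": ([0.9, 0.8, 0.7], [0.2, 0.1, 0.3]),
    "TF_B": ([0.5, 0.4, 0.6], [0.5, 0.45, 0.55]),
    "TF_C": ([0.1, 0.2], [0.8, 0.9, 0.7]),
}
fold_change = {"TF_A": 2.5, "TF_B": 0.1, "TF_C": -3.0}

tfs = sorted(scores)
ks_values = [signed_ks(*scores[tf]) for tf in tfs]
flr_exp = spearman(ks_values, [fold_change[tf] for tf in tfs])
print(round(flr_exp, 3))
```

In practice a library routine such as SciPy's `ks_2samp` and `spearmanr` would likely replace the hand-written helpers; they are spelled out here only to keep the sketch dependency-free.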

Supplementary Figure 2 FLR-Exp results for different footprint quality metrics.

Correlation between Kolmogorov-Smirnov (KS) test statistics versus expression fold change for cell type pair H1-hESC versus K562, using FLR (left), FS (middle) or TC (right) as the quality metric for the footprints. Footprints were predicted with HINT-BC, DNase2TF, Neph and FLR (from top to bottom, respectively). The use of FLR as quality metric presents the highest Spearman correlation values (FLR-Exp). On the other hand, TC exhibits small correlation values (r < 0.4) and presents several cases in which the sign of the KS statistic and of the fold change disagree (off-diagonal points). Note that FS also has a high average correlation with expression fold change on all evaluated data/methods (average r = 0.73) and indicates a ranking of footprint methods similar to FLR (r = 0.89). Therefore, FS can be used as an alternative to the FLR score as a footprint quality metric.

Supplementary Figure 3 Clustering of sequence bias estimates.

Ward's minimum variance clustering on the pairwise Spearman correlation coefficient (r) of sequence bias estimates of all ENCODE Tier 1 and 2 DNase-seq data sets and naked DNA DNase-seq data sets. DNase-seq experiments were based on single-hit (red) or double-hit (blue) protocols or on naked DNA (yellow). We observe a high average correlation between sequence biases estimated on DNase-seq data sets originating from the same protocol: single-hit = 0.89; double-hit = 0.84. Lower average correlation values are observed between bias estimates from different protocols: single-hit versus double-hit = 0.39. The sequence bias estimates based on the three naked DNA data sets have an average correlation of 0.96.

Supplementary Figure 4 Correlation between the performance of methods and their OBS from the He et al. data set.

The x-axis represents the observed sequence bias. The y-axis represents the ratio between the AUC at 10% FPR for a particular method and that of the TC-Rank method. In accordance with He et al.¹, we observe that the FS-Rank method has a high negative correlation (r = −0.4144; adjusted p-value < 0.001) with the sequence bias score, while no significant correlation is found for the other evaluated methods: HINT, HINT-BCN, HINT-BC and PWM-Rank. Note that the correlation value for the FS-Rank method differs from that reported by He et al.¹. This stems from a different strategy for finding the DHSs and MPBSs used in the evaluation data set. Nevertheless, we were able to observe a strong bias for the FS-Rank method, as in He et al.¹.

1. He, H.H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat. Methods 11, 73–78 (2014).

Supplementary Figure 5 Evaluation of sequence bias correction strategies and CG-content contribution.

(a) Distribution of AUC (at 10% FPR) differences between HINT-BC and HINT; HINT-BCN and HINT; HINT-BC and HINT-BCN for all 233 TFs of the comprehensive data set. TFs are ranked by the difference between HINT-BC and HINT-BCN. There is a clear increase in AUC values between sequence bias-corrected methods (HINT-BC and HINT-BCN) and the uncorrected method HINT (p-value < 10⁻³⁰; Mann-Whitney-Wilcoxon test). Moreover, HINT-BC has higher AUC values for all but seven TFs in the comparison with HINT-BCN. (b) CG content of TF motifs. We observe no correlation between CG content of the motifs and the individual AUC of each method: HINT r = 0.0144, HINT-BC r = 0.0254 and HINT-BCN r = 0.0108 (p-value > 0.05; Spearman correlation test). Furthermore, we observe no correlation between CG content of motifs and differences in AUC: HINT-BC − HINT-BCN r = 0.0188, HINT-BC − HINT r = 0.0724 and HINT-BCN − HINT r = 0.0644 (p-value > 0.05; Spearman correlation test).
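
Several legends report the AUC at a fixed FPR cutoff (for example 10%). One plausible way to compute such a partial AUC is sketched below: the ROC curve is integrated with the trapezoid rule up to `max_fpr`, with the last segment interpolated so that it ends exactly at the cutoff. Dividing the area by `max_fpr` so that a perfect ranking scores 1 is an assumption of this sketch; the legends do not state which normalization was used. The function name `partial_auc` and the toy labels are hypothetical.

```python
def partial_auc(labels, scores, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr.

    `labels` are 1 (true binding site) or 0; higher `scores` rank first.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    area, tp, fp = 0.0, 0, 0
    prev_fpr, prev_tpr = 0.0, 0.0
    i = 0
    while i < len(ranked):
        # Consume all predictions sharing one score as a single ROC step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        fpr, tpr = fp / neg, tp / pos
        if fpr >= max_fpr:
            # Interpolate the final segment so it stops exactly at max_fpr.
            if fpr > prev_fpr:
                tpr = prev_tpr + (tpr - prev_tpr) * (max_fpr - prev_fpr) / (fpr - prev_fpr)
            area += (max_fpr - prev_fpr) * (prev_tpr + tpr) / 2
            return area / max_fpr
        area += (fpr - prev_fpr) * (prev_tpr + tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
        i = j
    return area / max_fpr


# Toy example: two true sites ranked above ten false ones.
labels = [1, 1] + [0] * 10
scores = [0.99, 0.95] + [0.5 - 0.01 * k for k in range(10)]
print(partial_auc(labels, scores, max_fpr=0.1))
```

Setting `max_fpr=1.0` recovers the ordinary AUC, which corresponds to the "AUC at 100% FPR" statistic mentioned in a later legend.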

Supplementary Figure 6 Average DNase-seq signals around selected TFs with ChIP-seq evidence in H1-hESC (DU) cells.

These TFs had the highest AUC gain between HINT-BC and HINT: (a) ATF3, (b) EGR1, (c) NRF1, (d) RAD21, (e) SP1 and (f) SP4. In the top panel of each graph, we show the strand-specific average DNase-seq signal on naked DNA DNase-seq experiments (MCF-7 cell type); the middle panel shows the strand-specific estimated DHS sequence bias signal; and the bottom panel shows the (1) uncorrected – observed DNase I cleavage signal and (2) corrected – DNase-seq signal after the bias correction. Signals in the bottom graph were standardized to be in the interval [0,1]. The motif logo represents all underlying DNA sequences centered on the TFBSs. The bias correction led to a substantial change in the average DNase-seq sequence bias patterns surrounding several TFs. On EGR1, for instance, we observed that the bias-corrected DNase-seq signal presents three clear depletions, which fit the high-affinity regions of the EGR1 motif (two CC and one C). In contrast, the uncorrected EGR1 DNase-seq signal presents a single peak in the center of the motif. The same observations can be made for other TFs, such as NRF1 (with affinity regions (C/G)(C/G)(G/C)C and G(G/C)(C/G)(C/G)C) and SP4 (with affinity region CGCCC). Such patterns reflect bias corrections that are clearly beneficial to footprinting method accuracy.
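
The legend states that signals were standardized to the interval [0,1] but does not say how. A minimal sketch, assuming plain min-max scaling (the function name `standardize` is hypothetical):

```python
def standardize(signal):
    """Rescale a signal to [0, 1] by min-max scaling (assumed method)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A flat signal carries no shape information; map it to zeros.
        return [0.0 for _ in signal]
    return [(value - lo) / (hi - lo) for value in signal]


# Toy average cleavage profile around a motif.
print(standardize([2, 4, 6]))
```

Scaling each profile independently makes the shapes of corrected and uncorrected signals comparable in one panel, at the cost of hiding their absolute magnitudes.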

Supplementary Figure 7 Association between 6-mer CG content and DNase-seq sequence bias.

We sorted 6-mers by their bias estimates and grouped similarly ranked 6-mers. We show scatter plots with CG content versus average sequence bias for 6-mer groups on DNase-seq data generated with the (a) single-hit (DU), (b) double-hit (UW) protocols and (c) naked DNA experiments. There is a strong positive correlation between DNase-seq sequence bias and CG content for all DHS sequence bias estimates from both single-hit and double-hit protocols (p-value < 0.01). Interestingly, we observe a negative correlation for two naked DNA experiments: K562 and IMR90 (p-value < 10⁻⁵).
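
The sorting and grouping step described in this legend could look roughly like the sketch below: 6-mers are ordered by bias estimate, cut into consecutive groups, and each group is summarized by its mean CG content and mean bias. The group size, the function names and the bias values are hypothetical placeholders, not values from the study.

```python
def cg_content(kmer):
    """Fraction of C or G bases in a k-mer."""
    return sum(base in "CG" for base in kmer.upper()) / len(kmer)


def grouped_cg_vs_bias(bias_by_kmer, group_size):
    """Sort k-mers by bias and return (mean CG content, mean bias) per group."""
    ranked = sorted(bias_by_kmer.items(), key=lambda item: item[1])
    groups = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        mean_cg = sum(cg_content(kmer) for kmer, _ in chunk) / len(chunk)
        mean_bias = sum(bias for _, bias in chunk) / len(chunk)
        groups.append((mean_cg, mean_bias))
    return groups


# Hypothetical bias estimates for a handful of 6-mers.
bias = {
    "AAAAAA": 0.2, "AATTAA": 0.3,
    "ACGTAC": 0.9, "CCGGAA": 1.4,
    "GCGCGC": 2.1, "CCCGGG": 2.5,
}
for mean_cg, mean_bias in grouped_cg_vs_bias(bias, group_size=2):
    print(f"CG={mean_cg:.2f} bias={mean_bias:.2f}")
```

Each returned pair corresponds to one point in a scatter plot of CG content against average sequence bias.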

Supplementary Figure 8 Analysis of footprint ranking strategies.

Distribution of AUC values (at 10% FPR) obtained with distinct ranking strategies for the site-centric methods (a) BinDNase, (b) Centipede, (c) Cuellar, (d) FLR and (e) PIQ and for (f) the segmentation methods DNase2TF and Wellington. Ranking strategies (x-axis) are ordered by decreasing median AUC. The site-centric methods are tested based on probability (P) cutoffs of 0.8, 0.85, 0.9, 0.95 and 0.99 and their own ranking strategy (Own rank). Segmentation methods are tested based on the TC metric ranking and their own ranking strategy (Own rank). Methods not shown in this figure do not contain an intrinsic ranking methodology. In all cases, using TC-based strategies/cutoffs was significantly better than the original ranking of the methods (p-value < 10⁻¹²; Mann-Whitney-Wilcoxon test). Concerning site-centric methods, a probability threshold (P) of 0.9 was best for all methods, with the exception of BinDNase, where 0.8 was best. The box plot depicts the distribution median value (middle dot) and first and third quartiles (box extremities). The whiskers represent the 1.5 IQR and external dots represent outliers (data greater than or smaller than 1.5 IQR).

Supplementary Figure 9 Accuracy of methods based on TF ChIP-seq evaluation strategy.

Accuracy distribution for all 15 footprinting methods regarding all TF ChIP-seq validation sets (ordered by Friedman ranking). Accuracies are shown for the statistics: (a) AUC at 100% FPR, (b) AUC at 10% FPR, (c) AUC at 1% FPR and (d) AUPR. We used the Friedman-Nemenyi hypothesis test for statistical evaluation (see Supplementary Tables 3–6). The box plot depicts the distribution median value (middle dot) and first and third quartiles (box extremities). The whiskers represent the 1.5 IQR and external dots represent outliers (data greater than or smaller than 1.5 IQR).
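
Ordering methods by Friedman ranking amounts to ranking the methods within each validation set and averaging those ranks across sets. The sketch below illustrates that averaging step only, not the Friedman-Nemenyi test itself. The method names and accuracy values are placeholders, and the function name `friedman_average_ranks` is hypothetical.

```python
def friedman_average_ranks(scores_by_method):
    """Average per-data-set rank of each method (rank 1 = highest score).

    Tied methods share the average of the ranks they span.
    """
    methods = list(scores_by_method)
    n_sets = len(scores_by_method[methods[0]])
    totals = {method: 0.0 for method in methods}
    for d in range(n_sets):
        column = [scores_by_method[method][d] for method in methods]
        for method, value in zip(methods, column):
            higher = sum(1 for other in column if other > value)
            tied = sum(1 for other in column if other == value)  # includes itself
            totals[method] += higher + (tied + 1) / 2
    return {method: totals[method] / n_sets for method in methods}


# Placeholder accuracies for three methods on three validation sets.
accuracy = {
    "method_a": [0.90, 0.80, 0.85],
    "method_b": [0.85, 0.82, 0.80],
    "method_c": [0.70, 0.60, 0.65],
}
ranks = friedman_average_ranks(accuracy)
for method in sorted(ranks, key=ranks.get):
    print(method, round(ranks[method], 2))
```

A lower average rank means the method was more often among the best across validation sets.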

Supplementary Figure 10 Average sequence bias and DNase-seq signals around nuclear receptors.

Results are shown for the TFs: (a) AR (R1881), (b) GR (with DEX), (c) ER (40 min) and (d) ER (160 min). In the top panel, we show the strand-specific average DNase-seq signal on naked DNA DNase-seq experiments (MCF-7 (DU) for data sets from the single-hit protocol and IMR90 (UW) for data sets from the double-hit protocol); the middle panel shows the strand-specific estimated DHS sequence bias signal; and the bottom panel shows the (1) uncorrected – observed DNase-seq signal and (2) corrected – DNase-seq signal after the bias correction with the DHS sequence bias estimates. Signals in the bottom graph were standardized to be in the interval [0,1]. The motif logo represents all underlying DNA sequences centered on the TFBSs. While corrected DNase-seq profiles from ER have a better match with the underlying motif, this is not the case for AR and GR. However, we observed a small gain in the AUC score comparing HINT-BC and HINT. This difference is in the upper quartile range for all 233 TFs analyzed. These results indicate that cleavage bias correction also brings improvements to footprint prediction of nuclear receptors. However, all these TFs have low AUC scores in all footprinting methods, i.e., lower quartiles for HINT-BC or TC-Rank AUC scores. This indicates that short binding time indeed poses a challenge in footprint prediction.

Supplementary Figure 11 Average sequence bias and DNase-seq signals around binding sites of de novo motifs found using Neph footprints.

Results are shown for de novo motifs: (a) #0458 and (b) #0500 binding on cell type H7-hESC (UW). In the top panel, we show the strand-specific average DNase-seq signal on naked DNA DNase-seq experiments (MCF-7 cell type); the middle panel shows the strand-specific estimated DHS sequence bias signal; and the bottom panel shows the (1) uncorrected – observed DNase-seq signal and (2) corrected – DNase-seq signal after the bias correction using DHS sequence bias estimates. Signals in the bottom graph were standardized to be in the interval [0,1]. The motif logo represents all underlying DNA sequences centered on the TFBSs. These motifs were discovered in the footprint analysis of Neph et al.¹ and indicated in He et al.² to be artifacts of sequence bias. Bias-corrected DNase-seq profiles reveal no clear footprint shape. Furthermore, we compared the overlap between footprints generated by HINT-BC and Neph in H7-hESC (UW) cells. We considered only the MPBSs that overlapped DHSs in H7-hESC. We observed that 24.99% (motif #0458) and 28.58% (motif #0500) of MPBSs were associated with a Neph footprint. In contrast, only 0.73% (motif #0458) and 1.71% (motif #0500) of MPBSs overlapped with a HINT-BC footprint. Altogether, this indicates that these motifs are indeed potential artifacts of sequence bias and reinforces the importance of bias correction prior to any DNase-seq analysis.

1. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
2. He, H.H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat. Methods 11, 73–78 (2014).
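
The overlap percentages quoted in the legend for Supplementary Figure 11 (the share of MPBSs associated with a footprint) can be illustrated with a simple interval-overlap count. This is a sketch under assumptions: intervals are half-open `(chrom, start, end)` tuples, any shared base counts as an overlap, and the coordinates are invented. The actual overlap criterion used in the study is not stated in the legend.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(mpbs, footprints):
    """Percentage of motif-predicted binding sites hit by any footprint."""
    hits = sum(any(overlaps(site, fp) for fp in footprints) for site in mpbs)
    return 100.0 * hits / len(mpbs)


# Invented coordinates for illustration only.
mpbs = [
    ("chr1", 100, 110),
    ("chr1", 200, 210),
    ("chr2", 100, 110),
    ("chr1", 300, 310),
]
footprints = [("chr1", 105, 120), ("chr2", 50, 90)]
print(percent_overlapping(mpbs, footprints))
```

The nested scan is quadratic in the number of intervals; for genome-scale data a sorted sweep or an interval tree would be the usual choice.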

About this article

Cite this article

Gusmao, E., Allhoff, M., Zenke, M. et al. Analysis of computational footprinting methods for DNase sequencing experiments. Nat Methods 13, 303–309 (2016). https://doi.org/10.1038/nmeth.3772

Received: 31 July 2015
Accepted: 27 January 2016
Published: 22 February 2016
Issue Date: April 2016
DOI: https://doi.org/10.1038/nmeth.3772
