Evaluating deep learning for predicting epigenomic profiles

A preprint version of the article is available at bioRxiv.


Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification, relying on peak callers to define functional activity. Recently, quantitative models have emerged that directly predict the experimental coverage values as a regression. As new models with different architectures and training configurations continue to emerge, a major bottleneck is forming because there is no way to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modelling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modelling of epigenomic profiles leads to better generalizability and interpretability.

Fig. 1: GOPHER overview.
Fig. 2: Evaluation of Basenji-based quantitative models.
Fig. 3: Testing model robustness against translational shifts.
Fig. 4: Performance comparison between binary and quantitative models.
Fig. 5: Comparison of functional effect predictions.
Fig. 6: GIA for ResidualBind-32 on PC-3 cell line.

Data availability

The processed ATAC-seq data, the JASPAR 2022 core motifs for vertebrates data set and the CAGI5 challenge data set that support the findings of this study are available in Zenodo at

Code availability

The code to reproduce results and figures in this study is available in Zenodo at The open-source project repository is available in GitHub at A stable version of the code used for generating the results in the paper is available in Zenodo at


This work was supported in part by funding from NIH grant R01HG012131 and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. This work was performed with assistance from the US National Institutes of Health grant S10OD028632-01. We thank J. Kaczmarzyk and other members of the Koo laboratory for helpful discussions.

Author information

S.T., Z.T. and P.K.K. conceived the experiments. S.T. and Z.T. wrote the code base, conducted the experiments and analysed and interpreted the results. All authors contributed to the manuscript.

Corresponding author

Correspondence to Peter K. Koo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Anat Kreimer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of BPNet-based quantitative models.

(a) Loss function analysis. Scatter plot of the whole-chromosome Pearson’s r versus the MSE for different loss functions (shown in different colours) and different target resolutions (shown with different markers). The results for the scaled Pearson’s r loss function were removed due to poor training runs. (b) Bin resolution analysis. Plot of the whole-chromosome Pearson’s r for models trained on a given bin size (y-axis) with predictions that were systematically down-sampled to a lower resolution for evaluation (x-axis). (a, b) Pearson’s r represents the average across cell lines.
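The bin resolution analysis in (b) can be illustrated with a minimal sketch, assuming that down-sampling means averaging consecutive bins and that Pearson’s r is computed over the flattened track. Both choices are assumptions made for illustration (the caption does not say whether bins are averaged or summed), and the function names `downsample`, `pearson_r` and `pearson_at_resolution` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def downsample(coverage, factor):
    """Average every `factor` consecutive bins along the last axis.

    Assumption: down-sampling is done by mean pooling; summing the bins
    would be an equally plausible reading of the caption.
    """
    coverage = np.asarray(coverage, dtype=float)
    length = coverage.shape[-1]
    if length % factor != 0:
        raise ValueError("track length must be divisible by factor")
    new_shape = coverage.shape[:-1] + (length // factor, factor)
    return coverage.reshape(new_shape).mean(axis=-1)

def pearson_r(x, y):
    """Pearson correlation between two arrays after flattening them."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def pearson_at_resolution(pred, obs, factor):
    """Down-sample predictions and observations, then correlate them."""
    return pearson_r(downsample(pred, factor), downsample(obs, factor))
```

For a model trained at a fine bin size, calling `pearson_at_resolution(pred, obs, factor)` with increasing `factor` would give one row of such a plot under these assumptions.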

Extended Data Fig. 2 The effect of smoothing coverage on performance.

Basenji-based models were trained on target resolutions (y-axis) and evaluated using different levels of smoothing with a box-car filter. For each higher resolution model, a box-car filter was applied to both predictions and experimental coverage values with various kernel sizes prior to calculating the average Pearson’s r (x-axis). Pearson’s r represents the average across cell lines.
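The smoothing procedure described above can be sketched as follows. This is a minimal illustration, assuming a one-dimensional coverage track and zero padding at the track edges; the edge handling and the function names `boxcar_smooth` and `smoothed_pearson` are assumptions for illustration, not details from the study.

```python
import numpy as np

def boxcar_smooth(track, kernel_size):
    """Smooth a 1D track with a box-car (moving average) filter.

    Assumption: edges are zero padded (np.convolve with mode="same"),
    so values near the ends of the track are damped.
    """
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(np.asarray(track, dtype=float), kernel, mode="same")

def smoothed_pearson(pred, obs, kernel_size):
    """Apply the same box-car filter to both tracks, then compute Pearson's r."""
    pred_s = boxcar_smooth(pred, kernel_size)
    obs_s = boxcar_smooth(obs, kernel_size)
    return float(np.corrcoef(pred_s, obs_s)[0, 1])
```

Sweeping `kernel_size` over several values for a fixed model would produce one row of the comparison, given these assumptions.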

Extended Data Fig. 3 Performance comparison between quantitative and binary models.

Scatter plot of the classification-based AUPR versus the regression-based Pearson’s r for various binary models (blue) and quantitative models (orange) on peak-centred test data (left) and whole-chromosome test data (right). Metrics represent the average across cell lines.

Extended Data Fig. 4 GIA for optimal flanking nucleotides of motifs in PC-3 cell line for various models.

Ranked plot of the global importance for each tested flank for the AP-1 motif (left column), ATAAA motif (middle column) and GATA motif (right column) for different models (each shown in a different row). Dashed line represents the global importance of the core motif with random flanks. The hue in the first column represents the position-weight-matrix (PWM) score for an AP-1 motif from the JASPAR database (ID: MA0491.1). The first 3 rows are quantitative models, the rest are binary models (with (exp) in the name indicating that the first layer ReLU activation has been replaced with an exponential function). For binary models, the results are based on the logits before the output sigmoid activation. The black dot in each plot (in the first column) indicates ‘TGTGATTCATG’, which has a high PWM score (12.800) but yields a global importance close to that of the core motif with randomized flanks.

Extended Data Fig. 5 GIA for distance dependence between AP-1 and other motifs for PC-3 cell line for various models.

Global importance plot for sequences with an AP-1 motif fixed at the centre of the sequence and another motif that is systematically placed in different locations. Positive and negative values represent the first positions at which the motifs with optimized flanks were embedded so as to be non-overlapping. The first column shows results where the second motif is an identical AP-1 motif, the centre column shows results for the ATAAA motif and the right column for the GATA motif. All the motifs were embedded with optimized flanks. Red vertical dashed lines indicate the 1,024 bp position. Each row corresponds to a different trained model; the first 3 are quantitative models, the rest are binary models (with (exp) in the name indicating that the first layer ReLU activation has been replaced with an exponential function). For binary models, the results are based on the logits before the output sigmoid activation.

Extended Data Fig. 6 GIA for cooperative interactions between AP-1 and other motifs for PC-3 cell line for various models.

Each column corresponds to a motif pair: two copies of AP-1, AP-1 and ATAAA, and AP-1 and GATA. Each row corresponds to a different trained model; the first 3 are quantitative models, the rest are binary models (with (exp) in the name indicating that the first layer ReLU activation has been replaced with an exponential function). For binary models, the results are based on the logits before the output sigmoid activation. Blue and orange box plots show the global importance scores for the 1,000 sampled sequences when motif 1 or motif 2 is individually embedded. Green box plot shows the case when both motifs are embedded in the same sequence. Red box plot shows the sum of the blue and orange boxes as an estimate of the global importance if there is no interaction. The pairs were embedded at the optimal distance identified in the distance dependence GIA experiments. Box plots show the first and third quartiles, the central line is the median, and the whiskers show the range of the data with outliers removed. For each motif pair experiment, n = 1,000 independent samples were drawn from the test set sequences.
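The comparison between the two-motif case and the no-interaction estimate can be sketched numerically. The sketch below assumes that a global importance score is the model prediction for a sequence with the motif embedded minus the prediction for the same background sequence, and that the no-interaction estimate is the sum of the two single-motif effects. Both are assumptions for illustration; the function names and the use of precomputed prediction arrays in place of a trained model are hypothetical.

```python
import numpy as np

def global_importance(pred_with_motif, pred_background):
    """Per-sequence effect of embedding a pattern (assumed definition)."""
    with_motif = np.asarray(pred_with_motif, dtype=float)
    background = np.asarray(pred_background, dtype=float)
    return with_motif - background

def interaction_summary(pred_bg, pred_m1, pred_m2, pred_both):
    """Compare the joint effect of two motifs with the sum of single effects.

    Inputs are model predictions for the same background sequences with
    nothing, motif 1, motif 2, or both motifs embedded.
    """
    gi_1 = global_importance(pred_m1, pred_bg)
    gi_2 = global_importance(pred_m2, pred_bg)
    gi_both = global_importance(pred_both, pred_bg)
    additive = gi_1 + gi_2  # estimate if the motifs do not interact
    return {
        "motif_1": float(gi_1.mean()),
        "motif_2": float(gi_2.mean()),
        "both": float(gi_both.mean()),
        "additive_estimate": float(additive.mean()),
        "excess": float((gi_both - additive).mean()),
    }
```

Under these assumptions, a positive `excess` would suggest that the pair contributes more than the sum of its parts, and a value near zero would be consistent with independent effects.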

Supplementary information

Supplementary Information

Supplementary Tables 1–6, Figs. 1–4 and Notes 1–3.

Reporting Summary

Supplementary Data 1

Full evaluation of model prediction performance.

Supplementary Data 2

Model architecture and hyperparameters.

Supplementary Data 3

CAGI5 data set prediction performance.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Toneyan, S., Tang, Z. & Koo, P.K. Evaluating deep learning for predicting epigenomic profiles. Nat Mach Intell 4, 1088–1100 (2022).
