Improving representations of genomic sequence motifs in convolutional networks with exponential activations

A preprint version of the article is available at bioRxiv.


Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis of synthetic sequences to investigate the role that CNN activations have on model interpretability. We show that employing an exponential activation in the first-layer filters consistently leads to more interpretable and robust representations of motifs than other commonly used activations. Strikingly, we demonstrate that better test performance of a CNN does not necessarily imply more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate that these results generalize to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs (that is, setting exponential activations in the first layer) can substantially improve the robustness and interpretability of learned representations directly in convolutional filters and indirectly with attribution methods.
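The modification described above changes only the activation applied to the first-layer filter scans. The following is a minimal sketch, not the authors' implementation: it scans a one-hot encoded DNA sequence with a single hand-set filter and compares a relu activation with an exponential activation. The filter weights, the example sequence and all function names are hypothetical and chosen only for illustration.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def conv1d_scan(x, filt, bias=0.0):
    """Valid 1D cross-correlation of a one-hot sequence with a single filter.

    x    : L x 4 one-hot rows
    filt : W x 4 weights (hand-set here for illustration, not learned)
    """
    width = len(filt)
    scores = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += x[start + offset][channel] * filt[offset][channel]
        scores.append(total)
    return scores

def relu(values):
    return [max(0.0, v) for v in values]

def exponential(values):
    return [math.exp(v) for v in values]

# Hypothetical filter that prefers the 3-mer "TGA" (columns are A, C, G, T).
filt = [
    [-1.0, -1.0, -1.0, 1.0],   # position 1 prefers T
    [-1.0, -1.0, 1.0, -1.0],   # position 2 prefers G
    [1.0, -1.0, -1.0, -1.0],   # position 3 prefers A
]

# The sequence holds one full match ("TGA") and one partial match ("TGC").
seq = "ACTGACTGC"
pre = conv1d_scan(one_hot(seq), filt)   # pre-activation filter scan
relu_out = relu(pre)
exp_out = exponential(pre)

# Ratio of the full-match response to the partial-match response.
relu_ratio = relu_out[2] / relu_out[6]
exp_ratio = exp_out[2] / exp_out[6]
```

In this toy case the full match scores 3 and the partial match scores 1 before activation, so relu keeps their ratio at 3 while the exponential raises it to e^2 (about 7.4). The sketch shows only how the activation reshapes a filter scan; it makes no claim about training dynamics or about the architectures evaluated in the article.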


Fig. 1: Motif representation performance.
Fig. 2: Interpretability performance of saliency maps.
Fig. 3: Attribution score comparison for real regulatory DNA sequences.

Data availability

Data for tasks 1, 3, 5 and 6, and code to generate data for Task 2, are available at Data for Task 4 are available via ref. 1.

Code availability

Code to reproduce results and figures is available at


  1. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–998 (2016).

  2. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  3. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).

  4. Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell 178, 91–106 (2019).

  5. Koo, P. K. & Ploenzke, M. Deep learning for inferring transcription factor binding sites. Curr. Opin. Syst. Biol. 19, 16–23 (2020).

  6. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at (2013).

  7. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning Vol. 70, 3319–3328 (ICML, 2017).

  8. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In International Conference on Machine Learning Vol. 70, 3145–3153 (ICML, 2017).

  9. Lundberg, S. & Lee, S. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 4765–4774 (NeurIPS, 2017).

  10. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–8 (2015).

  11. Selvaraju, R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision 618–626 (IEEE, 2017).

  12. Jha, A., Aicher, J. K., Gazzara, M. R., Singh, D. & Barash, Y. Enhanced integrated gradients: improving interpretability of deep learning models using splicing codes as a case study. Genome Biol. 21, 1–22 (2020).

  13. Erhan, D., Bengio, Y., Courville, A. & Vincent, P. Visualizing higher-layer features of a deep network. In ICML Workshop on Learning Feature Hierarchies Vol. 1341 (ICML, 2009).

  14. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T. & Lipson, H. Understanding neural networks through deep visualization. Preprint at (2015).

  15. Lanchantin, J., Singh, R., Lin, Z. & Qi, Y. Deep motif: visualizing genomic sequence classifications. Preprint at (2016).

  16. Shrikumar, A. et al. Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.1.1. Preprint at (2018).

  17. Koo, P., Qian, S., Kaplun, G., Volf, V. & Kalimeris, D. Robust neural networks are more interpretable for genomics. Preprint at (2019).

  18. Koo, P. K. & Eddy, S. R. Representation learning of genomic sequence motifs with convolutional neural networks. PLoS Comput. Biol. (2019).

  19. Ploenzke, M. & Irizarry, R. Interpretable convolution methods for learning genomic sequence motifs. Preprint at (2018).

  20. Raghu, M., Poole, B., Kleinberg, J., Ganguli, S. & Sohl-Dickstein, J. On the expressive power of deep neural networks. Preprint at (2016).

  21. Kelley, D. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–50 (2018).

  22. Nair, V. & Hinton, G. E. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning, 807–814 (2010).

  23. Dugas, C., Bengio, Y., Belisle, F., Nadeau, C. & Garcia, R. Incorporating second-order functional knowledge for better option pricing. In Advances in Neural Information Processing Systems 472–478 (NeurIPS, 2001).

  24. Clevert, D. A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). Preprint at (2015).

  25. Pennington, J., Schoenholz, S. & Ganguli, S. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. In Advances in Neural Information Processing Systems 4785–4795 (NeurIPS, 2017).

  26. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).

  27. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proc. 13th International Conference on Artificial Intelligence and Statistics Vol. 9, 249–256 (AISTATS, 2010).

  28. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In IEEE International Conference on Computer Vision 1026–1034 (IEEE, 2015).

  29. LeCun, Y. A., Bottou, L., Orr, G. B. & Müller, K.-R. in Neural networks: Tricks of the Trade 9–48 (Springer, 2012).

  30. Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter, S. Self-normalizing neural networks. In Advances in Neural Information Processing Systems 971–980 (NeurIPS, 2017).

  31. Siggers, T. & Gordan, R. Protein-DNA binding: complexities and multi-protein codes. Nucleic Acids Res. 42, 2099–2111 (2014).

  32. Stormo, G. D., Schneider, T. D., Gold, L. & Ehrenfeucht, A. Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 10, 2997–3011 (1982).

  33. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  34. Grant, C. E., Bailey, T. L. & Noble, W. S. Fimo: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  35. Inukai, S., Kock, K. H. & Bulyk, M. L. Transcription factor-DNA binding: beyond binding site motifs. Curr. Opin. Genet. Dev. 43, 110–119 (2017).

  36. Simcha, D., Price, N. D. & Geman, D. The limits of de novo DNA motif discovery. PLoS One 7, e47836 (2012).

  37. Kulkarni, M. M. & Arnosti, D. N. Information display by transcriptional enhancers. Development 130, 6569–6575 (2003).

  38. Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–99 (2014).

  39. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A. & Madry, A. Robustness may be at odds with accuracy. Preprint at (2018).

  40. Adebayo, J. et al. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems 9505–9515 (NeurIPS, 2018).

  41. Sixt, L., Granz, M. & Landgraf, T. When explanations lie: why modified BP attribution fails. Preprint at (2019).

  42. Adebayo, J., Gilmer, J., Goodfellow, I. & Kim, B. Local explanation methods for deep neural networks lack sensitivity to parameter values. Preprint at (2018).

  43. Piper, M., Gronostajski, R. & Messina, G. Nuclear factor one X in development and disease. Trends Cell Biol. 29, 20–30 (2019).

  44. Forrest, M. P. et al. The emerging roles of TCF4 in disease and development. Trends Mol. Med. 20, 322–331 (2014).

  45. Wei, B. et al. A protein activity assay to measure global transcription factor activity reveals determinants of chromatin accessibility. Nat. Biotechnol. 36, 521–529 (2018).

  46. Koo, P. K., Ploenzke, M., Anand, P., Paul, S. & Majdandzic, A. Global importance analysis: a method to quantify importance of genomic features in deep neural networks. Preprint at (2020).

  47. Mathelier, A. et al. Jaspar 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44, D110–D115 (2016).

  48. Consortium, E. P. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  49. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  50. Vakoc, C. R. ZBED2 is an antagonist of interferon regulatory factor 1 and modifies cell identity in pancreatic cancer. Proc. Natl Acad. Sci. USA 117, 11471–11482 (2020).

  51. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. Preprint at (2015).

  52. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  53. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at (2014).

  54. Tareen, A. & Kinney, J. Logomaker: beautiful sequence logos in python. Preprint at (2019).


Acknowledgements

This work was supported in part by funding from the NCI Cancer Center Support Grant (CA045508) and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. M.P. was supported by NIH NCI RFA-CA-19-002. The authors would like to thank D. Krotov, who provided inspiration for the exponential activation. We would also like to thank J. Kinney, A. Tareen and the members of the Koo laboratory for helpful discussions.

Author information

Contributions

P.K.K. conceived of the experiments. P.K.K. and M.P. conducted the experiments. P.K.K. and M.P. analysed the results. All authors reviewed the manuscript.

Corresponding author

Correspondence to Peter K. Koo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Smita Krishnaswamy and the other, anonymous reviewer(s), for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Task 1 motif representations for CNNs with modified activations.

a, Boxplot of the fraction of filters that match ground truth motifs for different CNNs with traditional and modified activations. b, Boxplot of the fraction of filters that match ground truth motifs for an ablation study of transformations for modified activations. c, First layer filter scans from CNN-deep with relu activations (top) and exponential activations (middle). Each colour represents a different filter. d, Motif scans (top) and PWM scans (middle) using ground truth motifs and their reverse-complements (each colour represents a different filter scan). Negative PWM scan values were rectified to a value of zero. c,d, The information content of the sequence model used to generate the synthetic sequence (ground truth), which has 3 embedded motifs centred at positions 15, 85 and 150, is shown at the bottom. e, Boxplot of the fraction of filters that match ground truth motifs for CNN-deep with various activations: log activations trained with and without L2-regularization (Log-Relu-L2 and Log-Relu, respectively) and relu activations with and without L2-regularization. a,b,e, Each boxplot represents the performance across 10 models trained with different random initialisations (box represents first and third quartile and the red line represents the median).

Extended Data Fig. 2 Interpretability performance comparison of different attribution methods.

Boxplots of the interpretability AUROC (a) and AUPR (b) for CNN-local (top) and CNN-dist (bottom) with relu activations (left) and exponential activations (right) for different attribution methods. Each boxplot represents the performance across 10 models trained with different random initialisations (box represents first and third quartile and the red line represents the median). Sequence logo of a saliency map for a Task 3 test sequence generated with different attribution methods for CNN-deep with relu activations (c) and exponential activations (d). The right y-axis label shows the interpretability AUROC score. c,d, The sequence logo for the ground truth sequence model is shown at the bottom.
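An interpretability AUROC of the kind reported in this figure can be illustrated with a small sketch. It rests on an assumption that is not stated on this page: each sequence position carries a per-position attribution score and a binary ground-truth label (1 where the synthetic sequence model embeds a motif, 0 elsewhere), and the AUROC measures how well the scores rank motif positions above background positions. The function name and the example values are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores : per-position attribution scores (higher = more important)
    labels : per-position ground truth, 1 for motif positions, 0 otherwise
    Ties between a positive and a negative position count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative position")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical ground truth: positions 2, 3 and 5 belong to an embedded motif.
labels = [0, 0, 1, 1, 0, 1, 0, 0]

# A clean attribution map: every motif position outranks every background one.
clean = [0.1, 0.05, 0.9, 0.8, 0.2, 0.7, 0.0, 0.1]

# A noisier map: one background position (index 1) receives a spurious high score.
noisy = [0.1, 0.85, 0.9, 0.8, 0.2, 0.3, 0.0, 0.1]

clean_auroc = auroc(clean, labels)   # 1.0
noisy_auroc = auroc(noisy, labels)   # 13/15, about 0.867
```

The pairwise form is quadratic in sequence length, which is acceptable for a sketch; a rank-based implementation would be preferable for long sequences or large test sets.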

Supplementary information

Supplementary Information

Supplementary Tables 1–8 and Figs. 1–16.

Reporting Summary

Supplementary Data

TomTom results for Task 2 and Task 4.

About this article

Cite this article

Koo, P.K., Ploenzke, M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat Mach Intell 3, 258–266 (2021).
