Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it challenging to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations play in model interpretability. We show that employing an exponential activation in the first-layer filters consistently leads to interpretable and robust representations of motifs compared with other commonly used activations. Strikingly, we demonstrate that CNNs with better test performance do not necessarily yield more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate that these results generalize to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs (that is, setting exponential activations in the first layer) can substantially improve the robustness and interpretability of learned representations, directly in convolutional filters and indirectly with attribution methods.
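The modification described above amounts to replacing the first-layer nonlinearity with exp(x). As a minimal, library-free sketch (the helper names, filter values and motif here are illustrative, not taken from the paper), a first convolutional layer with exponential activations over one-hot DNA can be written as:

```python
import numpy as np

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array (A, C, G, T)."""
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, mapping[base]] = 1.0
    return x

def conv1d_exponential(x, filters):
    """First-layer 1D convolution followed by an exponential activation.

    x       : (L, 4) one-hot sequence
    filters : (n_filters, width, 4) filter weights
    returns : (L - width + 1, n_filters) activations
    """
    n_filters, width, _ = filters.shape
    n_positions = x.shape[0] - width + 1
    out = np.zeros((n_positions, n_filters))
    for i in range(n_positions):
        window = x[i:i + width]
        # dot each filter with the window, then apply exp(x) instead of ReLU
        out[i] = np.exp(np.tensordot(filters, window, axes=([1, 2], [0, 1])))
    return out
```

Because exp amplifies strong matches far more than weak ones, activations concentrate on motif-like positions, which is what makes the first-layer filters directly interpretable.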
Code to reproduce results and figures is available at https://doi.org/10.5281/zenodo.4301062.
This work was supported in part by funding from the NCI Cancer Center Support Grant (CA045508) and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. M.P. was supported by NIH NCI RFA-CA-19-002. The authors would like to thank D. Krotov, who provided inspiration for the exponential activation. We would also like to thank J. Kinney, A. Tareen and the members of the Koo laboratory for helpful discussions.
The authors declare no competing interests.
Peer review information Nature Machine Intelligence thanks Smita Krishnaswamy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Boxplot of the fraction of filters that match ground truth motifs for different CNNs with traditional and modified activations. b, Boxplot of the fraction of filters that match ground truth motifs for an ablation study of transformations for modified activations. c, First layer filter scans from CNN-deep with relu activations (top) and exponential activations (middle). Each colour represents a different filter. d, Motif scans (top) and PWM scans (middle) using ground truth motifs and their reverse-complements (each colour represents a different filter scan). Negative PWM scan values were rectified to a value of zero. c,d, The information content of the sequence model used to generate the synthetic sequence (ground truth), which has 3 embedded motifs centred at positions 15, 85, and 150, is shown at the bottom. e, Boxplot of the fraction of filters that match ground truth motifs for CNN-deep with various activations: log activations trained with and without L2-regularization (Log-Relu-L2 and Log-Relu, respectively) and relu activations with and without L2-regularization. a,b,e, Each boxplot represents the performance across 10 models trained with different random initialisations (box represents first and third quartile and the red line represents the median).
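The PWM scans in panel d, including the rectification of negative scores to zero, can be sketched in a few lines of numpy. This is an illustrative implementation assuming standard log-odds scoring against a uniform background; the motif, pseudo-probabilities and function names are not from the paper:

```python
import numpy as np

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array (A, C, G, T)."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, "ACGT".index(base)] = 1.0
    return x

def pwm_scan(x, pwm, background=0.25):
    """Scan a sequence with a PWM and rectify negative scores to zero.

    x   : (L, 4) one-hot sequence
    pwm : (width, 4) position probability matrix (no zero entries)
    """
    logodds = np.log2(pwm / background)   # log-odds scoring matrix
    width = pwm.shape[0]
    scores = np.array([np.sum(x[i:i + width] * logodds)
                       for i in range(x.shape[0] - width + 1)])
    return np.maximum(scores, 0.0)        # rectify negative values to zero
```

Rectifying the scan makes it directly comparable to filter activations, which are non-negative after the first-layer nonlinearity.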
Boxplots of the interpretability AUROC (a) and AUPR (b) for CNN-local (top) and CNN-dist (bottom) with relu activations (left) and exponential activations (right) for different attribution methods. Each boxplot represents the performance across 10 models trained with different random initialisations (box represents first and third quartile and the red line represents the median). Sequence logo of a saliency map for a Task 3 test sequence generated with different attribution methods for CNN-deep with relu activations (c) and exponential activations (d). The right y-axis label shows the interpretability AUROC score. c,d, The sequence logo for the ground truth sequence model is shown at the bottom.
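The saliency maps in panels c and d are gradients of the model output with respect to the input sequence. In practice these are computed with automatic differentiation (for example, TensorFlow's gradient tape or PyTorch autograd); as a library-free illustration only (the function name and setup are hypothetical), the same quantity can be approximated by finite differences:

```python
import numpy as np

def saliency_map(f, x, eps=1e-4):
    """Approximate the gradient of a scalar model output w.r.t. the input
    by central finite differences, then multiply by the input
    (grad-times-input) so only the observed bases are highlighted.

    f : callable mapping an (L, 4) array to a scalar model output
    x : (L, 4) one-hot input sequence
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        grad[idx] = (f(xp) - f(xm)) / (2 * eps)
    return grad * x
```

Visualising the grad-times-input scores as a sequence logo (for example, with Logomaker) gives plots like those in panels c and d, which can then be compared against the ground truth motif positions to compute an interpretability AUROC.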
Koo, P.K., Ploenzke, M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat Mach Intell 3, 258–266 (2021). https://doi.org/10.1038/s42256-020-00291-x