

Concept whitening for interpretable image recognition

A preprint version of the article is available at arXiv.

Abstract

What does a neural network encode about a concept as we traverse its layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading or unusable, or can rely on the latent space to possess properties that it may not have. Here, rather than attempting to analyse a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network so that we can better understand the computation leading up to that layer. When a concept whitening module is added to a convolutional neural network, the latent space is whitened (that is, decorrelated and normalized) and the axes of the latent space are aligned with known concepts of interest. Experimentally, we show that CW provides a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
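To make the mechanism concrete, below is a minimal sketch of a concept-whitening-style layer in PyTorch (the language of the authors' repository). It is an illustration under stated assumptions, not the authors' released implementation (linked under 'Code availability' below): the name ConceptWhiteningSketch is hypothetical, the layer operates on flat feature vectors rather than convolutional feature maps, and the auxiliary objective that trains the rotation to align axes with labelled concepts is omitted; only the whiten-then-rotate structure described above is shown.

    # Minimal sketch of a concept-whitening-style layer (illustrative only).
    import torch
    import torch.nn as nn

    class ConceptWhiteningSketch(nn.Module):  # hypothetical name
        """Whiten (decorrelate + normalize) a batch of latent vectors, then
        apply an orthogonal rotation whose axes can be aligned with concepts.
        Works on (batch, features) inputs; the paper applies the same idea to
        convolutional feature maps and keeps running statistics, like BN."""

        def __init__(self, num_features: int, eps: float = 1e-5):
            super().__init__()
            self.eps = eps
            # Unconstrained parameter; an orthogonal Q is recovered from its
            # QR decomposition on every forward pass.
            self.rotation = nn.Parameter(torch.eye(num_features))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            xc = x - x.mean(dim=0, keepdim=True)          # centre the batch
            cov = xc.t() @ xc / (x.shape[0] - 1)          # batch covariance
            # ZCA whitening matrix Sigma^(-1/2) via eigendecomposition.
            eigvals, eigvecs = torch.linalg.eigh(cov)
            w = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.t()
            z = xc @ w                                    # identity covariance
            q, _ = torch.linalg.qr(self.rotation)         # orthogonal rotation
            return z @ q                                  # still whitened

    # Quick check: whitened-and-rotated features are (near) decorrelated.
    torch.manual_seed(0)
    layer = ConceptWhiteningSketch(num_features=8)
    x = torch.randn(256, 8) @ torch.randn(8, 8)           # correlated inputs
    z = layer(x)
    cov = (z - z.mean(0)).t() @ (z - z.mean(0)) / (z.shape[0] - 1)
    print(torch.allclose(cov, torch.eye(8), atol=1e-2))   # True

Because an orthogonal rotation preserves the identity covariance produced by whitening, the rotation can be trained to align individual axes with concepts without undoing the decorrelation; the paper optimizes the rotation under an orthogonality constraint, whereas this sketch uses a simple QR parametrization as a stand-in.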


Fig. 1: Possible data distributions in the latent space.
Fig. 2: Top-10 images activated on axes representing different concepts.
Fig. 3: Joint distribution of the bed–person subspace.
Fig. 4: Two-dimensional representation plot of two representative images.
Fig. 5: Normalized intra-concept and inter-concept similarities.
Fig. 6: Concept purity measured by AUC score.
Fig. 7: Concept importance to different Places365 classes measured on the concept axes when CW is applied to the 16th layer.


Data availability

All datasets that support the findings are publicly available, including Places365 at http://places2.csail.mit.edu, MS COCO at https://cocodataset.org/ and ISIC at https://www.isic-archive.com.

Code availability

The code for replicating our experiments is available at https://github.com/zhiCHEN96/ConceptWhitening (https://doi.org/10.5281/zenodo.4052692).


Acknowledgements

We are grateful to W. Zhang, L. Semenova, H. Parikh, C. Zhong, O. Li, C. Chen, and especially C. Tomasi and G. Sapiro for the feedback and assistance they provided during the development and preparation of this research. The authors acknowledge funding from MIT-Lincoln Laboratory and the National Science Foundation.

Author information


Contributions

Z.C. and C.R. conceived the study. Z.C. developed methods, designed visualizations and metrics, ran experiments and contributed to the writing. Y.B. designed metrics, ran experiments and contributed to the writing. C.R. supervised research, method development and contributed to the writing.

Corresponding author

Correspondence to Zhi Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peer review information Nature Machine Intelligence thanks Professor Andreas Holzinger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Extended data

Extended Data Fig. 1 Absolute correlation coefficient of every feature pair in the 16th layer.

a, when the 16th layer is a BN module; b, when the 16th layer is a CW module.
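The quantity plotted above can be reproduced in a few lines; the following NumPy sketch is illustrative (the function name abs_feature_correlation is hypothetical, and this is not the authors' evaluation code). Given activations collected from the layer of interest, it returns the absolute Pearson correlation between every pair of features; after a CW module this matrix should be close to the identity, because whitening decorrelates the features, whereas after BN it generally is not.

    import numpy as np

    def abs_feature_correlation(acts: np.ndarray) -> np.ndarray:
        # acts: (num_samples, num_features) activations from the chosen
        # layer, e.g. spatially pooled convolutional feature maps.
        # rowvar=False treats each column as one feature (variable).
        return np.abs(np.corrcoef(acts, rowvar=False))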

Supplementary information

Supplementary Information

Supplementary sections and Figs. 1–13.


About this article


Cite this article

Chen, Z., Bei, Y. & Rudin, C. Concept whitening for interpretable image recognition. Nat Mach Intell 2, 772–782 (2020). https://doi.org/10.1038/s42256-020-00265-z
