Humans infer much about the intentions of others just by looking at their gaze. Similarly, we want to understand how machine learning systems solve problems. New tools are being developed to find out what strategies a learning machine is using, such as what it is paying attention to when classifying images.
Current AI systems are no longer the rule-based systems of the past that we could inspect and verify. Instead, they are complex machine learning systems that seem to be inscrutable black boxes. How can we know what these machines are ‘thinking’? What strategies are they using to solve a problem? In a comprehensive paper published in Nature Communications, Lapuschkin et al.1 provide new tools and experiments that aim to answer these questions. Building on heat maps that highlight which parts of an image the system is paying attention to, they present a systematic way of extracting the set of strategies that a machine learning system uses to solve a vision problem.
Clever Hans was the famous early twentieth-century horse that appeared to solve arithmetic problems but was simply following the cues that inadvertently emanated from his trainer’s body language. The horse performed well, but only as long as the choreography of the show was kept the same. Measuring performance tells us how well a system works under specific conditions. However, it does not explain what strategies are being used to solve a particular problem, and those strategies often fail to extrapolate when the context changes2.
If you solve 837 − 465 with a different strategy3 from the one you use for 372 − 0, how can we know? We could discover that your error rate for subtractions involving a 0 is lower than for other subtractions, and infer that you may be using different strategies. This is the approach taken by the behavioural sciences and, more recently, in AI as well, comparing how errors differ between subjects: humans, non-human animals or AI systems4. Two systems with the same average performance might behave very differently, depending on how their contingency tables or actions are distributed.
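How identical averages can hide very different behaviour is easy to sketch with invented outcome data (all numbers below are hypothetical, not from any of the cited studies):

```python
import numpy as np

# Hypothetical outcomes for two subtraction solvers (all numbers invented).
# Both get 6 of 10 problems right, but solver B fails exactly on the
# problems that involve a 0, while solver A's errors are spread out.
problems_with_zero = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=bool)
correct_a = np.array([1, 0, 0, 1, 1, 1, 1, 0, 1, 0], dtype=bool)
correct_b = ~problems_with_zero

acc_a = correct_a.mean()          # 0.6
acc_b = correct_b.mean()          # 0.6 -- identical averages

# Error rate conditioned on whether the problem contains a 0.
err_a_zero = 1 - correct_a[problems_with_zero].mean()   # 0.25
err_b_zero = 1 - correct_b[problems_with_zero].mean()   # 1.0
print(acc_a, acc_b, err_a_zero, err_b_zero)
```

Overall accuracy cannot distinguish the two solvers; only the conditional error rates reveal that solver B is using a strategy that breaks down whenever a 0 is involved.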
But the real challenge for machine learning is to determine how a system is going to behave in new situations. Only then will we be able to build robust models that are trustworthy. In the area of computer vision, the efforts have focused on highlighting (through heat maps) the areas of images that are most relevant for a given decision. It’s like determining what Clever Hans was looking at.
Among an increasing range of approaches, such as saliency maps5, layer-wise relevance propagation6 (LRP) is a technique that does exactly that. LRP requires some access to the model’s internals, but it has been extended to several machine learning families. Lapuschkin et al. show how this technique can unmask Clever Hans phenomena. First, whereas a deep neural network (DNN) achieves very good performance in classifying images from the benchmark dataset PASCAL VOC, it turns out that in some cases it relies on spurious features, such as source tags. A similar thing happens with a pinball videogame: a trained DNN performs well through an unintended strategy of tilting the table. Finally, a third showcase of high performance due to a resourceful tactic is Atari Breakout. Here, the heat maps show that the machine has discovered a strategy that humans also use: digging tunnels in the corners.
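The core idea of LRP is to redistribute the network’s output score backwards, layer by layer, onto the input features. A minimal sketch of the epsilon rule for a toy two-layer ReLU network with random weights (an illustration of the principle, not the implementation from refs 1 or 6) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network with random weights and no biases.
W1 = rng.normal(size=(4, 6))   # input -> hidden
W2 = rng.normal(size=(6, 1))   # hidden -> output score

def forward(x):
    a1 = np.maximum(0.0, x @ W1)   # hidden activations
    return a1, a1 @ W2             # activations and output score

def lrp_epsilon(x, eps=1e-9):
    """Redistribute the output score back onto the inputs (epsilon rule)."""
    a1, out = forward(x)
    stab = lambda z: z + eps * np.where(z >= 0, 1.0, -1.0)  # avoid /0
    s2 = out / stab(a1 @ W2)              # output layer
    R1 = a1 * (s2 @ W2.T)                 # relevance of hidden units
    s1 = R1 / stab(x @ W1)
    R0 = x * (s1 @ W1.T)                  # relevance of input features
    return out, R0

x = rng.normal(size=(1, 4))
out, R = lrp_epsilon(x)
# Conservation: input relevances sum (approximately) to the output score.
print(float(out.sum()), float(R.sum()))
```

The key property is conservation: the relevance assigned to the inputs sums back (up to the stabilizing epsilon) to the output score being explained, so each input pixel or feature receives its share of the evidence for the decision.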
Determining what a system is doing can be challenging, even with heat maps, as different strategies may be conflated. For example, identifying a ‘horse’ in the aforementioned image dataset can involve any of four different strategies: (1) detect a horse and rider; (2) detect a source tag in portrait-oriented images; (3) detect wooden hurdles and other contextual elements; or (4) detect a source tag in landscape-oriented images. To facilitate the identification of such strategies, Lapuschkin et al. present SpRAy (spectral relevance analysis), a semi-automated method that creates clusters from the spectral analysis of the LRP heat maps. Each cluster represents a strategy that still requires manual inspection, but SpRAy does the work of disentangling the strategies, circumventing the need to inspect the heat maps of hundreds of examples manually. More importantly, SpRAy shows whether different instances are solved in different ways. If they are, the solution cannot be general, but is rather a collection of specific solutions for particular situations.
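A stripped-down sketch of the SpRAy idea can be built from synthetic heat maps and a plain unnormalized graph Laplacian (the paper’s actual pipeline, with its eigengap analysis, is more elaborate; everything below is an invented stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for LRP heat maps (not the paper's data): one group
# concentrates relevance on a corner "source tag", the other on the object.
def make_map(corner_tag):
    m = rng.random((8, 8)) * 0.1          # background noise
    if corner_tag:
        m[:2, :2] += 1.0                  # relevance on the corner tag
    else:
        m[3:5, 3:5] += 1.0                # relevance on the object itself
    return m.ravel()

maps = np.array([make_map(i < 10) for i in range(20)])

# Pairwise affinities between heat maps, then the Fiedler vector of the
# graph Laplacian splits the maps into two strategy clusters.
d2 = ((maps[:, None, :] - maps[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / d2.mean())
L = np.diag(A.sum(1)) - A
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                      # second-smallest eigenvalue
labels = (fiedler > 0).astype(int)
print(labels)
```

The spectral split recovers the two groups without anyone looking at individual heat maps; a human then only needs to inspect one representative per cluster to name the strategy it corresponds to.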
There are also some limitations. LRP works for some nonlinear models, such as DNNs, but because it requires access to the model’s internals it cannot be applied to arbitrary systems, let alone to humans. Any method that depends on particular architectures has less applicability than purely behavioural methods2,7,8. For instance, in machine vision, adversarial examples could be used to determine what strategies a system is using, or, in reinforcement learning, one could cluster strategies by looking at the actions of the system. These are purely behavioural pathways for future research.
In sum, this and other recent contributions2,4,5,7,8 aim to characterize how AI systems work beyond performance measurement. Higher levels of abstraction than a ‘gaze’ will be needed to extrapolate and validate artificial behaviour more robustly, but efforts to understand the Clever Hanses of AI are well underway.
1. Lapuschkin, S. et al. Nat. Commun. https://doi.org/10.1038/s41467-019-08987-4 (2019).
2. Sturm, B. J. IEEE Trans. Multimedia 16, 1636–1644 (2014).
3. Barrouillet, P., Mignon, M. & Thevenot, C. J. Exp. Child Psychol. 99, 233–251 (2008).
4. Rajalingham, R. et al. J. Neurosci. 38, 7255–7269 (2018).
5. Greydanus, S., Koul, A., Dodge, J. & Fern, A. Preprint at https://arxiv.org/abs/1711.00138 (2017).
6. Bach, S. et al. PLOS ONE 10, e0130140 (2015).
7. Ribeiro, M. T., Singh, S. & Guestrin, C. in Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).
8. Ribeiro, M. T., Singh, S. & Guestrin, C. in Thirty-Second AAAI Conference on Artificial Intelligence 1527–1535 (AAAI, 2018).
The author declares no competing interests.
Hernández-Orallo, J. Gazing into Clever Hans machines. Nat Mach Intell 1, 172–173 (2019). https://doi.org/10.1038/s42256-019-0032-5