In February 2020, with COVID-19 spreading rapidly around the globe and antigen tests hard to come by, some physicians turned to artificial intelligence (AI) to try to diagnose cases1. Some researchers tasked deep neural networks — complex systems that are adept at finding subtle patterns in images — with looking at X-rays and chest computed tomography (CT) scans to quickly distinguish between people with COVID-based pneumonia and those without2. “Early in the COVID-19 pandemic, there was a race to build tools, especially AI tools, to help out,” says Alex DeGrave, a computer engineer at the University of Washington in Seattle. But in that rush, researchers did not notice that many of the AI models had decided to take a few shortcuts.

The AI systems honed their skills by analysing X-rays that had been labelled as either COVID-positive or COVID-negative. They would then use the differences they had spotted between the images to make inferences about new, unlabelled X-rays. But there was a problem. “There wasn’t a lot of data available at the time,” says DeGrave.

Radiographs of people with COVID-19 were being released by a number of hospitals, he explains. Scans of people without COVID-19, meanwhile, came mainly from a repository of lung images held by the US National Institutes of Health, put together before the pandemic. As a result, the data sets had characteristic differences that had nothing to do with whether a person had the disease. For instance, many X-rays use the letter R to label a person’s right side, so a radiologist looking at the image can orient it properly. However, the appearance of these markers differs from one hospital to another. With most of the COVID-negative images coming from a single source, some of the AI systems trained in this way based their diagnoses not just on the biology on display, but on the style and placement of the letter R on the X-ray.

DeGrave and Joseph Janizek, both members of computer scientist Su-In Lee’s Lab of Explainable AI for Biological and Medical Sciences in Seattle, published a paper3 in Nature Machine Intelligence in May 2021 reporting the problem. The decision-making process of a machine-learning model is often referred to as a black box — researchers and users typically know the inputs and outputs, but it is hard to see what’s going on inside. But DeGrave and Janizek were able to prise open these boxes, using techniques designed to test AI systems and explain why they do what they do.

X-ray of human chest alongside same chest with red highlights to show what AI considers important areas

The parts of X-rays that artificial intelligence systems were using to guide their diagnoses of COVID-19 (red pixels) included the letters that marked patients’ right sides. Credit: A. Bustos et al. Med. Image Anal. 66, 101797 (2020)

There are many advantages to building explainable AI, sometimes known as XAI. In a medical setting, understanding why a system made a certain diagnosis can help to convince a pathologist that it is legitimate. In some cases, explanations are required by law: when a system makes a decision on loan eligibility, for example, both the United States and the European Union require evidence that if credit is denied it is not for reasons barred by law, such as race or sex. Insight into an AI system’s inner workings can also help computer scientists to improve and refine the models they create — and might even lead to fresh ideas about how to approach certain problems. However, the benefits of XAI can only be achieved if the explanations it gives are themselves understandable and verifiable — and if the people building the models see it as a worthwhile endeavour.

A neuron by any other name

The deep neural networks that DeGrave and Janizek investigated have become popular for their uncanny ability to learn about what’s in a photograph, the meaning of spoken language and much more, just through exposure. These networks work in a way that is loosely analogous to the human brain. Just as certain living nerve cells fire in a pattern in response to external stimuli (the sight of a cat, for instance, will trigger a different pattern from the sight of a tree), the artificial neurons in a neural network produce a characteristic response on the basis of the input they receive.

The neurons in this case are mathematical functions. Input data comes into the system in numerical form, describing, for instance, the colour of a pixel in a photograph. The neurons then perform a calculation on that data. In the human body, neurons fire off a signal only if the stimulus they receive surpasses a certain electrical threshold. Similarly, each mathematical neuron in an artificial neural network is weighted with a threshold value. If the result of the calculation surpasses that threshold, it is passed to another layer of neurons for further calculations. Eventually, the system learns statistical patterns about how the data coming out relates to the data going in. Images that have been labelled as having a cat in them will have systematic differences from those labelled as not having a cat, and these telltale signs can then be looked for in other images to ascertain the probability of a cat being present.
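The thresholding described above can be sketched in a few lines of Python. This is a simplified illustration that follows the article's description, not code from any of the systems discussed: the function names, weights, thresholds and pixel values are all invented for the example, and real networks typically use learned weights and smoother activation rules.

```python
def neuron(inputs, weights, threshold):
    """Compute a weighted sum of the inputs and pass it on only if it
    surpasses the threshold; otherwise the neuron stays silent (0.0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else 0.0


def layer(inputs, weight_rows, thresholds):
    """Apply several neurons to the same inputs, one per row of weights."""
    return [neuron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


# Hypothetical pixel intensities standing in for numerical image data.
pixels = [0.9, 0.1, 0.8]

# First layer: two neurons with made-up weights and thresholds.
hidden = layer(pixels, [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [1.0, 0.5])

# Second layer: a single neuron that combines the first layer's outputs.
output = neuron(hidden, [1.0, 1.0], 0.5)
```

Here the first hidden neuron fires because its weighted sum exceeds its threshold, whereas the second stays silent, so only the first contributes to the final output.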

There are variations in the design of neural networks, as well as other machine-learning techniques. The more layers of calculation a model applies to an input, the more challenging it becomes to explain what it is doing. Simple models such as small decision trees — which weigh up a handful of competing choices that lead to different answers — are not really black boxes, says Kate Saenko, a computer scientist at Boston University in Massachusetts. Small decision trees are “basically a set of rules where a human can easily understand what that model is doing, so it’s inherently interpretable”, she says. A deep neural network, however, is typically too complex for us to wrap our heads around easily. “A neural network is doing a computation that involves millions, or more likely now billions, of numbers,” Saenko says.

Mapping activity

In general, attempts to explain the mysterious workings of a deep neural network involve finding out what characteristics of the input data are affecting the results, and using that to infer what’s happening inside the black box. One tool that helped DeGrave and Janizek to work out that the orientation markers on chest X-rays were affecting diagnoses was saliency maps — colour-coded charts that show which part of an image the computer paid the most attention to when making its call.

Saenko and her colleagues developed a technique called D-RISE (detector randomized input sampling for explanation) to produce such maps4. The researchers take a photo, for instance of a vase full of flowers, and systematically block out different parts of the image before showing it to an AI tasked with identifying a particular object, such as the vase. They then record how obscuring each cluster of pixels affects the accuracy of the results, and have the system colour-code the whole photo according to how important each part was to the recognition process.

Wearing facemasks, Alex DeGrave and Joseph Janizek sit at a table on a balcony looking at their laptops

Alex DeGrave and Joseph Janizek are students on the Medical Scientist Training Program at the University of Washington, in Seattle. Credit: Alex DeGrave

Unsurprisingly, in a picture of a flower-filled vase, the vase itself is lit up in bright reds and yellows — its presence is important. But it is not the only area of the picture that is highlighted. “The saliency extends all the way up to the bouquet of flowers,” Saenko says. “They’re not labelled as part of the vase, but the model learns that if you see flowers, it’s much more likely that this object is a vase.”

D-RISE highlights the factors that, if removed, would cause the AI model to change its results. “It’s useful for understanding what mistakes they might be making, or if they’re doing something for the wrong reason,” says Saenko, whose work in this area was partly funded by a now-completed XAI programme run by the US Defense Advanced Research Projects Agency.
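The general idea of blocking out parts of an image and recording the effect can be sketched as follows. This is a simplified, grid-based stand-in and not the published D-RISE procedure: `occlusion_saliency` and `toy_score` are hypothetical names, and the toy scoring function merely imitates a model that has latched on to one corner of the image, much as some COVID-19 models latched on to orientation markers.

```python
def occlusion_saliency(image, score_fn, patch=2, fill=0.0):
    """Return a map the same size as `image`. Each cell holds the drop in
    the model's score caused by blanking the patch that covers that cell."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and blank out one patch.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            # A large drop means the model relied on this patch.
            drop = baseline - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def toy_score(image):
    """Hypothetical stand-in for a trained model: it only 'looks at'
    the top-left 2x2 corner of the image."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
saliency = occlusion_saliency(image, toy_score)
```

In this toy case the map lights up only in the top-left corner, which would alert a researcher that the model is ignoring the rest of the picture.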

Altering input data to identify important features is a basic approach to many types of AI model. But the task becomes more challenging in more complex neural networks, says Anupam Datta, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania. In those complex cases, scientists want to tease out not just which features play a part in the decision-making and how big that role is, but also how the importance of a feature alters in relation to changes in other features. “The causality element still carries over because we are trying to still figure out which features have the highest causal effect on the model’s prediction,” Datta says. “But the mechanism for measuring it changes a little bit.” As with Saenko’s saliency maps, he systematically blocks out individual pixels in images. A mathematical value can then be assigned to that portion of the image, representing the magnitude of the change that results from obscuring that part. Seeing which pixels are most important tells Datta which neurons in the hidden layers have the greatest role in the outcome, helping him to map the model’s internal structure and draw conclusions about the concepts it has learnt5.

Advances from explanation

Another way DeGrave and Janizek measured saliency relied on a complex type of neural network known as a generative adversarial network (GAN). A typical GAN consists of a pair of networks. One generates data — an image of a street, for instance — and the other tries to determine whether the output is real or fake.

The two networks continue to interact in this way until the first network is reliably creating images that can fool the other. In their case, the Washington researchers asked a GAN to turn COVID-positive X-rays into COVID-negative images3. By seeing which aspects of the X-rays it altered, the researchers could see what part of the image the computer considered important to its diagnosis.

Although the basic principle of a GAN is straightforward, the subtle dynamics of the pair of networks are not well understood. “The way that a GAN generates images is quite mysterious,” says Antonio Torralba, a computer scientist at the Massachusetts Institute of Technology in Cambridge, who is trying to solve this enigma. Given a random input of numbers, it eventually outputs a picture that looks real. This approach has been used to create photos of faces that don’t exist and produce news stories that read as if they were written by a person.

Torralba and his team decided to dissect a GAN and look at what the individual neurons were doing. Just like Datta, they found some neurons focused on specific concepts6. “We found groups of units that were responsible for drawing trees, other groups responsible for drawing buildings, and some units drawing doors and windows,” he says. And just as Saenko’s models had learnt that flowers suggest a vase, units in his GAN also learnt from context. One developed a detector for beds to decide whether a scene was a bedroom, and another learnt that doors don’t usually exist in trees.

Two images of a dining room table with a vase of flowers on it. In the second the vase is highlighted in red

A saliency map showing that the flowers in a vase help an artificial intelligence system to detect the vase itself. Credit: MS COCO dataset

Being able to recognize which neurons are identifying or producing which objects opens up the possibility of being able to refine a neural network without having to show it thousands of new photographs, Torralba says. If a model has been trained to recognize cars, but all the images it trained on were of cars on a paved surface, it might fail when shown a picture of a car on snow. But a computer scientist who understands the model’s internal connections might be able to tweak the model to recognize a layer of snow as equivalent to a paved surface. Similarly, a computer special-effects designer who might want to automate the creation of an impossible scene could re-engineer the model by hand to accomplish that.

Another value of explainability is that the way a machine performs a task might provide the people watching it with some insight into how they could do things differently or better themselves. Computational biologist Laura-Jayne Gardiner trained an AI to predict which genes were at work in regulating circadian clocks, internal molecular timers that govern a range of biological processes7. Gardiner and her colleagues at IBM Research Europe and the Earlham Institute, a life-sciences research group in Norwich, UK, also made the computer highlight the features that it used to decide whether a gene was likely to play a part in circadian rhythm. Its approach was surprising. “We were only focused on the promoters for gene regulation,” Gardiner says, but the AI found clues in sequences in genes that the researchers would have ignored. “You end up with this ranked list of the features,” Gardiner explains; the team can use this in its lab-based research to further refine its understanding of the biology.

Accuracy and trust

Coming up with explanations is a start, but there should also be a way to quantify their accuracy, says Pradeep Ravikumar, a computer scientist at Carnegie Mellon University who is working on ways to automate such evaluation8. Explanations that seem to make sense to a human could in fact prove to have little relation to what the model is actually doing.

“The question of how to objectively evaluate explanations is still in its early stages,” Ravikumar says. “We need to get better explanations and also better ways to evaluate explanations.” One way to test the veracity of an explanation is to make small changes to the features that it says are important. If they truly are, these minor changes in the input should lead to big changes in the output. Similarly, large alterations to irrelevant features — say, removing a bus from a picture of a cat — should not affect the results. If the evaluation system goes one step further and predicts not just which features are important, but also how the model’s answer would change if small changes were made to those features, this can also be tested. “If an explanation was actually explaining the model, then it would have a better sense of how the model would behave with these small changes,” Ravikumar says.
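One version of the perturbation test described above can be sketched in Python. This is an illustrative check under stated assumptions, not Ravikumar's evaluation method: the function names, the step sizes and the toy model are all hypothetical, and the lists of important and irrelevant features are assumed to be non-empty.

```python
def output_change(model, features, index, delta):
    """Return how much the model's output moves when one feature is shifted."""
    perturbed = list(features)
    perturbed[index] += delta
    return abs(model(perturbed) - model(features))


def explanation_is_consistent(model, features, important, irrelevant,
                              small=0.1, large=10.0):
    """Hypothetical check: a small nudge to each feature flagged as important
    should move the output more than a large change to any feature flagged
    as irrelevant."""
    min_important = min(output_change(model, features, i, small)
                        for i in important)
    max_irrelevant = max(output_change(model, features, i, large)
                         for i in irrelevant)
    return min_important > max_irrelevant


def toy_model(x):
    """Made-up model that depends strongly on feature 0 and barely on feature 1."""
    return 5.0 * x[0] + 0.001 * x[1]


features = [1.0, 1.0]

# An explanation that correctly names feature 0 as the important one.
good = explanation_is_consistent(toy_model, features,
                                 important=[0], irrelevant=[1])

# An explanation that gets the two features the wrong way round.
bad = explanation_is_consistent(toy_model, features,
                                important=[1], irrelevant=[0])
```

The first explanation passes the check and the second fails it, which is the kind of objective signal that could separate a faithful explanation from one that merely sounds plausible.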

The search for explanations can sometimes seem like so much work that many computer scientists might be tempted to skip it, and take the AI’s results at face value. But at least some level of explainability is relatively simple: saliency maps, for instance, can now be generated quickly and inexpensively, Janizek says. By contrast, training and using a GAN is more complex and time-consuming. “You definitely have to have pretty good familiarity with deep-learning stuff, and a nice machine with some graphics processing units to get it to work,” Janizek says. A third method his group tried, altering a few hundred images manually with photo-editing software to identify whether a feature was important, was even more labour-intensive.

Animated image in which a GAN is used to add grass and trees to an image of houses

A type of neural network known as a generative adversarial network (GAN) can be used to produce new images, such as by adding grass and trees. Credit: David Bau

Saenko says many researchers in the machine-learning community have also tended to see a trade-off between explainability and accuracy. They think that the level of detail and the number of calculations that make neural networks more accurate than smaller decision trees also put them out of reach of all human comprehension. But some are questioning whether that trade-off is real, Janizek says. “It could end up being the case that a more interpretable model is a more useful model and a more accurate model.”

It’s also beginning to look as if some of the patterns that neural networks can pick out that are imperceptible to people might not be as important as computer scientists once thought, he adds. “How often are they something that’s truly predictive in a way that’s going to generalize across environments? And how often are they some weird kind of source-specific noise?”

However big or small the challenge of explainability might be, a good explanation is not always going to be enough to convince users to rely on a system, Ravikumar says. Knowing why an AI assistant, such as Amazon’s Alexa, answered a question in a certain way might not foster trust among users as much as, say, laws that prohibit the misuse of recordings of private conversations. Perhaps physicians will need clinical evidence that a computer’s diagnoses have proved right over time, and a verified biological reason why the factors the computer is looking at should be relevant. And policymakers might require that some protections regarding the use of such systems be written into law. “These are broader questions that I think the community hasn’t really thought too deeply about,” Ravikumar says.

However, in the area of explanations, AI researchers have been making strides. Although there might still be specifics to be worked out to cover the variety of machine-learning models in use, the problem will be cracked, probably in a year or two, says Torralba. People, he says, “always talk about this black box, and we don’t think that neural networks are black boxes. If they are working really well, then if you look inside, what they do makes sense.”