Researchers used a laboratory version of the children's telephone game to study bias in AI systems. Credit: Image Professionals GmbH / Alamy Stock Photo.

Critics of artificial intelligence systems such as ChatGPT have often warned that they risk inheriting cultural biases from humans, which may affect how they produce and transmit information. That risk has now been demonstrated in an experiment.

ChatGPT belongs to the class of large language models (LLMs): AI systems designed to generate human-like text, trained on huge collections of text taken from the Internet. To investigate what kinds of biases LLMs may have, the researchers applied the ‘transmission chain’ method, which has a long history in psychology and is essentially a laboratory version of the children’s telephone game. In this method, human participants pass a story from one to the next, and researchers track how the story is modified at each transmission step. The methodology is particularly well suited to investigating bias in LLMs, because the results can be compared directly with those of human participants, and because it can reveal subtle biases that would otherwise go undetected.

In this study1, Alberto Acerbi of the University of Trento and Joseph Stubbersfield of the University of Winchester in the United Kingdom asked an LLM to summarize and rephrase a story. They then fed the resulting summary back to the AI and asked it to summarize it again, repeating the operation three times.
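The iterated-summarization procedure is straightforward to reproduce. The sketch below shows a minimal transmission chain built on the OpenAI Python client; the model name, prompt wording, and chain length are illustrative assumptions, not the exact settings used in the study.

```python
# Minimal sketch of a transmission chain with an LLM (illustrative only).
# Assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment;
# the model and prompt wording are placeholders, not the study's exact settings.
from openai import OpenAI

client = OpenAI()


def summarize(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the model to summarize and rephrase a story."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Summarize and rephrase the following story:\n\n{text}",
        }],
    )
    return response.choices[0].message.content


def transmission_chain(story: str, steps: int = 3) -> list[str]:
    """Feed each summary back as the next input, like a telephone game."""
    chain = [story]
    for _ in range(steps):
        chain.append(summarize(chain[-1]))
    # chain[0] is the original story, chain[1:] the successive retellings
    return chain
```

Comparing the first and last elements of the resulting chain shows which details survive repeated retelling, which is where content biases become visible.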

Acerbi and Stubbersfield used the same stories as five previously published psychology experiments with human participants, each designed to highlight a specific bias. For example, one story included elements consistent with gender stereotypes (such as a wife cooking for a dinner party to which her husband had invited guests) together with elements that contradicted them (the same wife going out for drinks with friends before the dinner). Another story contained both negative and positive elements, as well as ambiguous ones that could be read either way (such as a man ‘taking an old lady’s bag’, which could describe a helping gesture or a theft). A third story included elements suggesting a threat alongside reassuring ones, and so on.

In all five experiments, ChatGPT reproduced the same biases that had been observed in human participants. In selecting what to include in its summaries, it retained information that conformed to gender stereotypes and discarded information that did not. It favoured negative over positive information and tended to interpret ambiguous situations negatively. It also preferentially transmitted threat-related content, and social content over information without a social aspect.

The authors highlight how, without human intervention, these biases can produce harmful effects by magnifying pre-existing human tendencies. Human biases may be linked to cognitive traits that were selected during evolution (paying particular attention to potential threats, for example, is clearly useful) but that do not necessarily yield informative or valuable content. When faced with ambiguous information, LLMs may, for example, consistently produce a negative interpretation rather than a neutral one.

“As for many technologies, we need to learn how to best use them,” says Acerbi. “As they adapt to us, we, as individuals and as a society, adapt to them. The important aspect is knowing that those biases exist,” he concludes.