Social-media bots that pump out computer-generated content have been accused of swaying elections and damaging public health by spreading misinformation. Now, some social scientists have a fresh accusation: bots meddle with research studies that mine popular sites such as Twitter, Reddit and Instagram for information on human health and behaviour.
Data from these sites can help scientists to understand how natural disasters affect mental health, why young people have flocked to e-cigarettes in the United States and how people join together in complex social networks. But such work relies on discerning the real voices from the automated ones.
“Bots are designed to behave online like people,” says Jon-Patrick Allem, a social scientist at the University of Southern California in Los Angeles. “If a researcher is interested in describing public attitudes, you have to be sure that the data you’re collecting on social media is actually from people.”
Computer scientist Sune Lehmann designed his first bots in 2013, as a social-network experiment for a class that he was teaching at the Technical University of Denmark in Kongens Lyngby. Back then, he says, bots on Twitter were simple, obscure and mainly meant to increase the number of followers for specific Twitter accounts. Lehmann wanted to show his students how such bots could manipulate social systems, so together they designed bots that impersonated fans of the singer Justin Bieber.
The ‘Bieber Bots’ were easy to design and quickly attracted thousands of followers. But social-media bots have continued to evolve, becoming more complex and harder to detect. They surged into the spotlight after the 2016 US presidential election – amid accusations that bots had been deployed on social media in an attempt to sway the vote in President Donald Trump’s favour. “All of a sudden, it became something of interest to people,” Allem says.
Since then, Allem has shown that tweets generated by bots are twice as likely as those written by people to attest that e-cigarettes help people quit smoking [1], a claim that is still hotly debated. Bots are also more likely to tout the unproven health benefits of cannabis [2]. These studies rely on algorithms that estimate the likelihood that a Twitter account is automated. But despite bot-detecting tools with names like Botometer and BotSlayer, Allem says that many social-science and public-health researchers still fail to take the step of filtering out probable automated content from their data, in part because some feel that they lack the expertise to do so.
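The filtering step that Allem describes can be small once each account has an automation score. The following is a minimal sketch, not the method used in the studies above: the `Post` type, the `filter_probable_bots` function, the score dictionary and the 0.5 cut-off are all assumptions made for illustration, and the scores would in practice come from whatever bot-detection tool a researcher chooses.

```python
from dataclasses import dataclass


@dataclass
class Post:
    account: str
    text: str


def filter_probable_bots(posts, bot_scores, threshold=0.5):
    """Split posts into (kept, dropped) using per-account automation scores.

    bot_scores maps an account name to an estimated probability (0 to 1)
    that the account is automated. The 0.5 threshold is an illustrative
    assumption, not a recommended value.
    """
    kept, dropped = [], []
    for post in posts:
        score = bot_scores.get(post.account)
        # Accounts with no score are kept here; a stricter study might
        # drop them or score them first.
        if score is not None and score >= threshold:
            dropped.append(post)
        else:
            kept.append(post)
    return kept, dropped


# Hypothetical data: three accounts, one of which has no score.
posts = [
    Post("vape_deals_247", "E-cigarettes helped me quit!"),
    Post("maria_k", "Thinking about quitting smoking this year."),
    Post("unscored_user", "Anyone tried nicotine patches?"),
]
scores = {"vape_deals_247": 0.93, "maria_k": 0.08}

kept, dropped = filter_probable_bots(posts, scores)
print([p.account for p in kept])
print([p.account for p in dropped])
```

Returning the dropped posts alongside the kept ones lets a researcher report how much of the data set was removed, and check whether the conclusions change with the threshold.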
That omission can pollute a data set, cautions Amelia Jamison, who studies health disparities at the University of Maryland in College Park and has mined social media for posts that oppose vaccination. “You might be artificially giving the bots a voice by treating them as if they are really part of the discussion, when they are actually just amplifying something that may not be voiced by the community,” she says. In her case, she notes, failing to weed out bots could lead her to conclude that people are generating more or different anti-vaccination chatter than they actually are.
One problem that the field must grapple with is how to define a bot, says Katrin Weller, an information scientist at the Leibniz Institute for the Social Sciences in Cologne, Germany. Not all bots are maliciously dispensing misinformation: some provide updates from weather stations, data on sea-level changes from buoys or general news updates. Some researchers define Twitter bots as those accounts that send out more than a certain number of messages each day, Weller notes — a loose definition that could rope in prolific human tweeters.
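The weakness of a volume-based definition is easy to see in a few lines. This is a minimal sketch of such a rule, with a hypothetical function name, made-up account names and an arbitrary limit of 50 posts per day; none of these values comes from the researchers quoted here.

```python
from collections import Counter


def flag_by_daily_volume(posts, max_per_day=50):
    """Return the accounts that exceed max_per_day posts on any single day.

    posts is an iterable of (account, day) pairs, one pair per post.
    The limit of 50 is an illustrative assumption.
    """
    counts = Counter(posts)  # keys are (account, day) pairs
    return {account for (account, day), n in counts.items() if n > max_per_day}


# Hypothetical data: an automated feed, a prolific person and a casual user.
posts = (
    [("weather_station_feed", "2020-01-15")] * 96
    + [("prolific_human", "2020-01-15")] * 60
    + [("casual_user", "2020-01-15")] * 4
)

print(sorted(flag_by_daily_volume(posts)))
```

With these inputs the rule flags both the automated feed and the prolific person, which is the over-inclusion that Weller points to.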
Other definitions are more complex, but bot detectors are locked in an arms race with bot developers. The first generation of social-media bots were relatively simple programs that retweeted others’ posts at regular intervals. Now, however, advances in machine learning have enabled the creation of more sophisticated bots that post original content. Some bots will post at random intervals and mimic human patterns, such as not tweeting when a person would probably be asleep. Some developers will mix in human-generated content with automated content to better camouflage their bots.
“Once you know more about the bots and how to detect them, then this knowledge is also available for the bot creators,” says Oliver Grübner, who studies quantitative health geography at the University of Zurich in Switzerland. “It’s a really tricky field.”
Like Lehmann, some social scientists are creating their own bots to conduct social experiments. Pennsylvania State University political scientist Kevin Munger and his colleagues built bots that chided Twitter users who used racist language. One set of bots had profile pictures of white men; the other set had profile pictures of black men. Munger found that Twitter users were more likely to tone down their racist rhetoric after being called out by bots with a white male profile picture [3].
After his Bieber Bot success, Lehmann designed more sophisticated bots to study the spread of behaviours from one group to another. But bots have acquired such a bad reputation that he has decided he will probably abandon the approach, for fear of a public backlash. “The whole thing around bots blew up so much,” he says. “I kind of thought: ‘I’ll find another quiet corner and do my research without courting controversy.’”