Another game long considered extremely difficult for artificial intelligence (AI) to master has fallen to machines. An AI called DeepNash, made by London-based company DeepMind, has matched expert humans at Stratego, a board game that requires long-term strategic thinking in the face of imperfect information.
“The rate at which qualitatively different game features have been conquered — or mastered to new levels — by AI in recent years is quite remarkable,” says Michael Wellman at the University of Michigan in Ann Arbor, a computer scientist who studies strategic reasoning and game theory. “Stratego and Diplomacy are quite different from each other, and also possess challenging features notably different from games for which analogous milestones have been reached.”
Stratego has characteristics that make it much more complicated than chess, Go or poker, all of which have been mastered by AIs (the latter two games in 2015 (ref. 3) and 2019 (ref. 4)). In Stratego, two players place 40 pieces each on a board, but cannot see what their opponent’s pieces are. The goal is to take turns moving pieces to eliminate those of the opponent and capture a flag. Stratego’s game tree (the graph of all possible ways in which the game could go) has 10⁵³⁵ states, compared with Go’s 10³⁶⁰. In terms of imperfect information at the start of a game, Stratego has 10⁶⁶ possible private positions, which dwarfs the 10⁶ such starting situations in two-player Texas hold’em poker.
“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” says Julien Perolat, a DeepMind researcher based in Paris.
So Perolat and colleagues developed DeepNash. The AI’s name is a nod to the US mathematician John Nash, whose work led to the term Nash equilibrium, a stable set of strategies that can be followed by all of a game’s players, such that no player benefits by changing strategy on their own. Games can have zero, one or many Nash equilibria.
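The definition can be made concrete with a toy game. The Python sketch below is illustrative only and is not taken from the DeepNash work: it checks whether a pair of strategies is a Nash equilibrium in matching pennies, a two-player zero-sum game chosen here as an assumption. The function names `expected_payoff` and `is_nash_equilibrium` are hypothetical. Matching pennies has no equilibrium in which both players always pick the same side, but a 50/50 mix for each player is one.

```python
def expected_payoff(payoff, row_strategy, col_strategy):
    """Expected payoff when both players randomize independently."""
    return sum(
        row_strategy[i] * col_strategy[j] * payoff[i][j]
        for i in range(len(row_strategy))
        for j in range(len(col_strategy))
    )


def is_nash_equilibrium(row_payoff, col_payoff, row_strategy, col_strategy, tol=1e-9):
    """True if neither player gains by changing strategy on their own.

    Checking pure-strategy deviations is enough: any mixed deviation is a
    weighted average of pure ones, so it cannot do better than the best pure one.
    """
    row_value = expected_payoff(row_payoff, row_strategy, col_strategy)
    col_value = expected_payoff(col_payoff, row_strategy, col_strategy)

    for i in range(len(row_strategy)):
        pure = [1.0 if k == i else 0.0 for k in range(len(row_strategy))]
        if expected_payoff(row_payoff, pure, col_strategy) > row_value + tol:
            return False

    for j in range(len(col_strategy)):
        pure = [1.0 if k == j else 0.0 for k in range(len(col_strategy))]
        if expected_payoff(col_payoff, row_strategy, pure) > col_value + tol:
            return False

    return True


# Matching pennies (toy example): the row player wins 1 if the coins match,
# the column player wins 1 if they differ.
ROW_PAYOFF = [[1, -1], [-1, 1]]
COL_PAYOFF = [[-1, 1], [1, -1]]

mixed_is_equilibrium = is_nash_equilibrium(
    ROW_PAYOFF, COL_PAYOFF, [0.5, 0.5], [0.5, 0.5]
)
both_heads_is_equilibrium = is_nash_equilibrium(
    ROW_PAYOFF, COL_PAYOFF, [1.0, 0.0], [1.0, 0.0]
)
```

Here `mixed_is_equilibrium` is true, whereas `both_heads_is_equilibrium` is false, because the column player would gain by switching to tails.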
DeepNash combines a reinforcement-learning algorithm with a deep neural network to find a Nash equilibrium. Reinforcement learning involves finding the best policy to dictate action for every state of a game. To learn an optimal policy, DeepNash played 5.5 billion games against itself. If one side gets a reward, the other is penalized, and the parameters of the neural network, which represent the policy, are tweaked accordingly. Eventually, DeepNash converges on an approximate Nash equilibrium. Unlike previous game-playing AIs such as AlphaGo, DeepNash does not search through the game tree to optimize itself.
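The idea of self-play drifting towards an equilibrium without any tree search can be seen in a far smaller game. The sketch below is not DeepNash's algorithm and uses no neural network: it swaps in regret matching, a simple textbook method, on rock-paper-scissors, both chosen here as assumptions for illustration. Two copies of the learner play each other, one side's gain is the other's loss, and the average of the strategies they play approaches the equilibrium of choosing each move one-third of the time. The names `strategy_from_regrets` and `self_play` are hypothetical.

```python
# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations=50_000):
    """Return each player's average strategy after repeated self-play."""
    n = len(PAYOFF)
    # Player 0 starts biased towards rock so the run is not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]

    for _ in range(iterations):
        s0 = strategy_from_regrets(regrets[0])
        s1 = strategy_from_regrets(regrets[1])

        # Expected payoff of each pure action against the opponent's current mix.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(n)) for a in range(n)]
        u1 = [sum(-PAYOFF[a][b] * s0[a] for a in range(n)) for b in range(n)]

        # Expected payoff of the mix actually played (zero-sum: v1 equals -v0).
        v0 = sum(s0[a] * u0[a] for a in range(n))
        v1 = sum(s1[b] * u1[b] for b in range(n))

        for a in range(n):
            regrets[0][a] += u0[a] - v0   # how much better action a would have done
            regrets[1][a] += u1[a] - v1
            strategy_sums[0][a] += s0[a]
            strategy_sums[1][a] += s1[a]

    return [[x / iterations for x in sums] for sums in strategy_sums]


average_strategies = self_play()
```

The strategy played in any single round can swing widely; it is the average over all rounds that settles near one-third for each move. DeepNash operates at a vastly larger scale and learns a policy with a neural network, so this toy only conveys the flavour of learning by self-play.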
For two weeks in April, DeepNash competed with human Stratego players on the online game platform Gravon. After 50 matches, DeepNash was ranked third among all Gravon Stratego players since 2002. “Our work shows that such a complex game as Stratego, involving imperfect information, does not require search techniques to solve it,” says team member Karl Tuyls, a DeepMind researcher based in Paris. “This is a really big step forward in AI.”
“The results are impressive,” agrees Noam Brown, a researcher at Meta AI, headquartered in New York City, and a member of the team that in 2019 reported the poker-playing AI Pluribus (ref. 4).
Brown and his colleagues at Meta AI set their sights on a different challenge: building an AI that can play Diplomacy, a game with up to seven players, each representing a major power of pre-First World War Europe. The goal is to gain control of supply centres by moving units (fleets and armies). Importantly, the game requires private communication and active cooperation between players, unlike two-player games such as Go or Stratego.
“When you go beyond two-player zero-sum games, the idea of Nash equilibrium is no longer that useful for playing well with humans,” says Brown.
So, the team trained its AI — named Cicero — on data from 125,261 games of an online version of Diplomacy involving human players. Combining these with some self-play data, Cicero’s strategic reasoning module (SRM) learnt to predict, for a given state of the game and the accumulated messages, the probable policies of the other players. Using this prediction, the SRM chooses an optimal action and signals its ‘intent’ to Cicero’s dialogue module.
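The step from a predicted policy to a chosen action can be illustrated with a toy calculation. The sketch below is not Cicero's strategic reasoning module: the action names, probabilities and payoff values are all invented for illustration, and the function `choose_action` is hypothetical. It shows only the basic idea of weighing each of one's own options by how likely the other player is to respond in each way, then picking the option with the highest expected value.

```python
def choose_action(predicted_policy, payoffs):
    """Pick the own action with the highest expected value.

    predicted_policy: maps each action of the other player to its probability.
    payoffs: maps each own action to a dict of values, one per other-player action.
    Returns the best own action and the expected value of every own action.
    """
    expected = {
        own: sum(prob * values[other] for other, prob in predicted_policy.items())
        for own, values in payoffs.items()
    }
    best = max(expected, key=expected.get)
    return best, expected


# Invented toy numbers: England predicts how France will respond.
predicted_france = {"support": 0.7, "attack": 0.3}
england_payoffs = {
    "convoy": {"support": 3.0, "attack": -2.0},
    "hold": {"support": 1.0, "attack": 0.0},
}

intent, expected_values = choose_action(predicted_france, england_payoffs)
```

With these made-up numbers, the convoy has the higher expected value (about 1.5 against 0.7 for holding), so `intent` is `"convoy"`. In Cicero, according to the description above, the prediction also takes the accumulated messages into account, and the chosen intent is passed to the dialogue module.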
The dialogue module was built on a 2.7-billion-parameter language model pre-trained on text from the Internet and then fine-tuned using messages from Diplomacy games played by people. Given the intent from the SRM, the module generates a conversational message (for example, Cicero, representing England, might ask France: “Do you want to support my convoy to Belgium?”).
In a 22 November Science paper (ref. 2), the team reported that in 40 online games, “Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game”.
Brown thinks that game-playing AIs that can interact with humans and account for suboptimal or even irrational human actions could pave the way for real-world applications. “If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” he says. Cicero, he adds, is a big step in this direction. “We still have one foot in the game world, but now we have one foot in the real world as well.”
Wellman agrees, but says that more work is needed. “Many of these techniques are indeed relevant beyond recreational games” to real-world applications, he says. “Nevertheless, at some point, the leading AI research labs need to get beyond recreational settings, and figure out how to measure scientific progress on the squishier real-world ‘games’ that we actually care about.”