A complex variant of poker is the latest game to be mastered by artificial intelligence (AI). And it has been conquered not once, but twice, by two rival bots developed by separate research teams.
Both algorithms plays a ‘no limits’ two-player version of Texas Hold 'Em. And each has in recent months hit a crucial AI milestone: they have beaten human professional players.
The game first fell in December to DeepStack, developed by computer scientists at the University of Alberta in Edmonton, Canada, with collaborators from Charles University and the Czech Technical University in Prague. A month later, Libratus, developed by a team at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, achieved the feat.
Over the past decade the groups have pushed each other to make ever better bots, and now the team behind DeepStack has formally published details of its AI in Science1. But the bots are yet to play each other.
Nature looks at how the two AIs stack up, what the accomplishments could mean for online casinos and what’s left for AI to conquer.
Why do AI researchers care about poker?
AIs have mastered several board games, including chess and the complex-strategy game Go. But poker has a key difference from board games that adds complexity: players must work out their strategies without being able to see all of the information about the game on the table. They must consider what cards their opponents might have and what the opponents might guess about their hand based on previous betting.
Games that have such ‘imperfect information’ mirror real-life problem-solving scenarios, such as auctions and financial negotiations, and poker has become an AI test bed for these situations.
Algorithms have already cracked simpler forms of poker: the Alberta team essentially solved a limited version of two-player hold ’em poker in 2015. The form played by DeepStack and Libratus is still a two-player game, but there are no limits on how much an individual player can bet or raise — which makes it considerably more complex for an AI to navigate.
How did the human-versus-AI games unfold?
Over 4 weeks beginning in November last year, DeepStack beat 10 of 11 professional players by a statistically significant margin, playing 3,000 hands against each.
Then, in January, Libratus beat four better professionals who are considered specialists at the game, over a total of around 120,000 hands. The computer ended up almost US$1.8 million up in virtual chips.
What are the mathematics behind the algorithms?
Game theory. Both AIs aim to find a strategy that is guaranteed not to lose, regardless of how an opponent plays. And because one-on-one poker is a zero-sum game — meaning that one player’s loss is always the opponent’s gain — game theory says that such a strategy always exists. Whereas a human player might exploit a weak opponent’s errors to win big, an AI with this strategy isn’t concerned by margins — it plays only to win. That means it also won't be thrown by surprising behaviour.
Previous poker-playing algorithms have generally tried to work out strategies ahead of time, computing massive ‘game trees’ that outline solutions for all the different ways that a game could unfold. But the number of possibilities is so huge — 10160 — that mapping all of them is impossible. So researchers settled for solving fewer possibilities. In a game, an algorithm compares a live situation to those that it has previously calculated. It finds the closest one and ‘translates’ the corresponding action to the table.
Now, however, both DeepStack and Libratus have found ways to compute solutions in real time — as is done by computers that play chess and Go.
How do the approaches of the two AIs compare?
Instead of trying to work out the whole game tree ahead of time, DeepStack recalculates only a short tree of possibilities at each point in a game.
The developers created this approach using deep learning, a technique that uses brain-inspired architectures known as neural networks (and that helped a computer to beat one of the world’s best players at Go).
By playing itself in more than 11 million game situations, and learning from each one, DeepStack gained an ‘intuition’ about the likelihood of winning from a given point in the game. This allows it to calculate fewer possibilities in a relatively short time — about 5 seconds — and make real-time decisions.
The Libratus team has yet to publish its method, so it’s not as clear how the program works. What we do know is that early in a hand, it uses previously calculated possibilities and the ‘translation’ approach, although it refines that strategy as the game gives up more information. But for the rest of each hand, as the possible outcomes narrow, the algorithm also computes solutions in real time.
And Libratus also has a learning element. Its developers added a self-improvement module, which automatically analyses the bot's playing strategy to learn how an opponent had exploited its weaknesses. They then use the information to permanently patch up holes in the AI’s approach.
The two methods require substantially different computing power: DeepStack trained using 175 core years — the equivalent of running a processing unit for 175 years or a few hundred computers for a few months. And during games it can run off a single laptop. Libratus, by contrast, uses a supercomputer before and during the match, and the equivalent of around 2,900 core years.
Can they bluff?
Yes. People often see bluffing as something human, but to a computer, it has nothing to do with reading an opponent, and everything to do with the mathematics of the game. Bluffing is merely a strategy to ensure that a player’s betting pattern never reveals to an opponent the cards that they have.
OK, so which result was more impressive?
It depends on whom you ask. Experts could quibble over the intricacies of both methods, but overall, both AIs played enough hands to generate statistically significant wins — and both against professional players.
Libratus played more hands, but DeepStack didn’t need to, because its team used a sophisticated statistical method that enabled them to prove a significant result from fewer games. Libratus beat much better professionals than did DeepStack, but on average, DeepStack won by a bigger margin.
Will the two AIs now face off?
Maybe. A sticking point is likely to be the big difference in computing power and so the speed of play between the AIs. This could make it difficult to find rules to which both sides can agree.
University of Alberta computer scientist Michael Bowling, one of the developers of DeepStack, says that his team is up for playing Libratus. But Libratus developer Tuomas Sandholm at CMU says that he first wants to see DeepStack beat Baby Tartanian8, one of his team’s earlier and weaker AIs.
Bowling stresses that the match would carry a big caveat: the winner might not be the better bot. Both are trying to play the perfect game, but the strategy closest to that ideal doesn’t always come out in head-to-head play. One program could accidentally hit on a hole in the opponent’s strategy, but that wouldn’t necessarily mean that the strategy overall has more or bigger holes. Unless one team wins by a substantial margin, says Bowling, “my feeling is it won’t be as informative as people would like it to be”.
Does this mean the end of online poker?
No. Many online poker casinos forbid the use of a computer to play in matches, although top players have started to train against machines.
Now that computers have slain another AI milestone, what’s left to tackle?
There are few mountains left for the AI community to climb. In part, this is because many of the games that remain unsolved, such as bridge, have more complicated rules, and so have made for less obvious targets.
The natural next move for both teams is to tackle multiplayer poker. This could mean almost starting from scratch because zero-sum game theory does not apply: in three-player poker, for instance, a bad move by one opponent can indirectly hinder, rather than always advantage, another player.
But the intuition of deep learning could help to find solutions even where the theory doesn’t apply, says Bowling. His team’s first attempts to apply similar methods in the three-player version of limited Texas Hold ’Em have turned out surprisingly well, he says.
Another challenge is training an AI to play games without being told the rules, and instead discovering them as it goes along. This scenario more realistically mirrors real-world problem-solving situations that humans face.
The ultimate test will be to explore how much imperfect-information algorithms really can help to tackle messy real-world problems with incomplete information, such as in finance and cybersecurity.
- Journal name:
- Date published: