Michal Bowling and colleagues at the University of Alberta in Edmonton announced today that they have “solved” poker. Bowling’s group has been working on this problem for a number of years. In Poker, Life and Other Confusing Things I devote a chapter to their earlier work developing ‘bots’ that could beat most top poker pros. But these earlier versions weren’t provably optimal. This one is and this is an extraordinary claim worth discussing.
First, the game they’re talking about is Hold ‘em played heads-up at limit stakes with a set number of raises. If another player is added or the game is changed to no-limit the claim no longer holds.
Second, by “solve” they mean that they have developed a set of decision-making algorithms for every possible situation that are optimal for that situation. In short, no opponent could make checking, folding or raising decisions that would beat it in the long run. By “long run” is meant a sufficient number of hands for a Nash equilibrium to have been reached.[1] That is, the program cannot lose. It can only be played to what I guess we can call a draw where Cepheus (the name of the program) and its opponent both break even.
This fact leads to an interesting paradox: If two players whose games are optimal (in the sense that they use Cepheus’s strategic moves) play against each other in a card room (either in a brick and mortar casino or online) they will both eventually go broke because they will be ground down by the rake.
Several of my poker playing friends tell me they find Bowling’s group’s claims hard to believe. They note that when they play poker they play against an opponent and use knowledge gained about these very humans who are trying to take their money to make their decisions. “How,” they wonder, “can a ‘bot’ win when it can’t do this?”
The answer is that knowledge about an opponent is simply knowledge about how that individual has played in the past and how they are likely to be acting in any given situation. Cepheus has all this information and it uses it to determine the optimal move, just like a human player but with a huge edge. Unlike the human it has a effectively infinite memory and doesn’t make mistakes, doesn’t get tired, doesn’t go on tilt.
The program was developed in a fascinating way. Bowling and co-authors used what’s called a counterfactual regret minimization (CFR) heuristic — that’s techno-speak for calculating the outcomes of “gee, I wish I’d done x” on that hand. In short, after every hand Cepheus looked at the outcome and ascertained what would have happened had it done something else and what the outcome of that decision would have been.
They started from scratch. That is, they didn’t program any strategies into Cepheus; they merely gave it the rules for poker and nothing else. The point was to allow the program to evolve the decision-making metric on its own without any possible contamination by a human who might be using a non-optimal principle.
Then they let it play against itself for a trillion hands. Yes, trillion, or more hands than have been played in real life by all the humans who ever played poker, far more. Each time it adjusted its decisions based on feedback from the CFR and homed in on the action that would have maximized gain and/or minimized loss. Ultimately the set of optimal decisions emerged.
Bowling noted that Cepheus was still learning but the advances were so small that they were negligible so they stopped the operation.
They also were able to prove what every serious poker player knows, the dealer has an edge. In fact, they were able to calculate it as .088 of a big blind. That is, with optimal play, sitting on the button in a heads-up game provides a bit less than a 9% gain. This may seem small but over a large number of hands it’s significant.
From a game theoretic point of view, one of the things most exciting about this project is that it is the first time that a partial (or incomplete) information game has been solved. All the other games for which optimal strategies were developed have been full (or complete) information games like backgammon or checkers (FWIW, it was the same group at the University of Alberta that solved checkers back in 2007).
As they note the procedures they used can have real-world applications in other complex, partial information domains like medical diagnosis, auctions and security screening settings, areas where artificial intelligence (AI) programs have been developed but have proven inadequate relative to human decision-makers.
The full paper describing this accomplishment can be found here.
A nice, non-technical summary is here. This site, which is at the University of Alberta, also has a url that will link you up to Cepheus for a chance to play against it (but not for real money).
[1] Without getting too technical, this is essentially the point where random factors dominate the distribution of events (or, in this case, hands). In short, the luck factor has been distilled from the game and the only operations that are useful are rational ones.