Poker Bots: A Much More Sophisticated One
I don’t know if folks who follow this blog are aware, but there’s a fascinating poker contest going on right now, live. It’s between four online specialists and an Artificial Intelligence (AI) developed at Carnegie Mellon University, and this time (unlike the AI in Alberta described in an earlier posting) they’re playing no-limit (though still heads-up). The human players are in two teams, each playing against the AI. The CMU team has named their AI Claudico (AIs always get names) which, amusingly, is Latin for “I limp.”
A surprising feature of this poker bot is that it tends to limp into hands (i.e., it just calls; almost all pros favor raising when entering a pot). Limping has some things going for it, but to make it pay a player (digital or carbon-based) has to be highly skilled at playing after the flop. I can only assume that Claudico’s post-flop play is very strong.
They’re also using a variation on the “duplicate” style of play for this contest. In duplicate, each side plays the exact same cards. You play all the “A” hands against the bot’s “B” hands while your teammate plays the “B” hands against the bot’s “A” hands. This way each team and the bot see the same deals, but from opposite sides.
Duplicate play is best known from bridge, where it is the standard in tournaments precisely because it limits the luck factor. However, because it does, it would basically ruin poker as an everyday game. The weaker players would just go broke too quickly. One of the great allures of poker is the fun of the random turn of a card, that quirky luck element that keeps the weaker players in the game. Duplicate neutralizes that.
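For the curious, here is a toy Python sketch of why the duplicate setup described above takes the luck out. It assumes (a big simplification) that each deal's result is a fixed skill edge plus a "card luck" term that simply flips sign when the cards are swapped. The function name, the edge, and the size of the luck swings are all made up for illustration; real hands won't cancel this cleanly because the two sides won't play the same cards the same way.

```python
import random


def duplicate_totals(num_deals, human_edge, seed=0):
    """Toy comparison of ordinary vs. duplicate scoring.

    Assumption: a deal's result is a fixed skill edge plus card luck, and
    the luck flips sign when the cards are swapped. All numbers are made up.
    """
    rng = random.Random(seed)
    ordinary = 0.0   # one human always holding the "A" cards
    duplicate = 0.0  # two teammates holding mirrored cards
    for _ in range(num_deals):
        card_luck = rng.gauss(0, 100)    # value of "A" cards over "B" cards
        a_side = human_edge + card_luck  # you: "A" hands vs the bot's "B"
        b_side = human_edge - card_luck  # teammate: "B" hands vs the bot's "A"
        ordinary += a_side
        duplicate += a_side + b_side
    return ordinary, duplicate


ordinary, duplicate = duplicate_totals(num_deals=1000, human_edge=1.0)
print(f"ordinary total:  {ordinary:+.1f}")   # swings with the card luck
print(f"duplicate total: {duplicate:+.1f}")  # about 2 * 1000 * human_edge
```

In this toy model the luck terms cancel pair by pair, so the duplicate total reflects only the skill edge, while the ordinary total can land far from it.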
Here’s where the contest is, as of today:
https://www.cs.cmu.edu/brains-vs-ai
So far the humans are ahead some $166,000. The AI is beating one team but the other is stomping it. They haven’t played enough hands for the result to be statistically significant (that is, the sample isn’t yet large enough for the random components to stop dominating the outcome), so the jury’s still out. If the Polk-Li team has found weaknesses in Claudico, that lead should continue to grow.
The match will continue till May 8th. I’ll report on the final outcomes later.
If Claudico is using “counterfactual regret minimization” (CFR), it could in principle keep learning as they play. That algorithm is an exceedingly clever one and is standard in efforts to solve complex games and in other settings (like teaching an AI to do medical diagnostics or trading commodities). It’s also the one that the University of Alberta team used.
After each hand the AI quickly works out what the result would have been had it played the hand in any of the other myriad possible ways. It then adjusts its strategy by raising the weight of every play that would have done better than the one it chose, in proportion to how much better (the “regret” for not having made that play), and lowering the weight of plays that would have done worse. Slowly, over time, it can home in on the optimal strategy for that situation.
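To make the regret idea concrete, here is a minimal Python sketch of "regret matching," the update rule at the core of CFR, applied to rock-paper-scissors rather than poker. This is not Claudico's code, and a full CFR implementation applies this kind of update at every decision point in the game tree. The opponent's 60% rock habit and all the names here are assumptions for illustration only.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[mine][theirs]: +1 for a win, 0 for a tie, -1 for a loss.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # nothing to regret yet
    return [p / total for p in positive]


def train(opponent_mix, iterations=5000, seed=0):
    """Return the average strategy learned against a fixed opponent mix."""
    rng = random.Random(seed)
    indices = list(range(len(ACTIONS)))
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
        mine = rng.choices(indices, weights=strategy)[0]
        theirs = rng.choices(indices, weights=opponent_mix)[0]
        actual = PAYOFF[mine][theirs]
        # Counterfactual step: what would each other play have earned?
        for alt in indices:
            regrets[alt] += PAYOFF[alt][theirs] - actual
    return [s / iterations for s in strategy_sum]


# Hypothetical opponent that throws rock 60% of the time.
average = train(opponent_mix=[0.6, 0.2, 0.2])
print(dict(zip(ACTIONS, (round(p, 3) for p in average))))
```

Against an opponent who leans on rock, the average strategy should drift heavily toward paper, which is the "homing in" described above on a very small scale.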
And, FWIW, I have no idea who these four “best poker players in the world” are. I suspect they’re online heads-up specialists. I’ve never heard of any of them, but that doesn’t mean much since I don’t play poker online and certainly not at these stakes.
FWIW, last night I made the final table of a local tournament. I’m not happy with how I played at the end. At my age, after some ten hours of play my brain starts to melt — a problem AIs don’t suffer from (no brain!).