In 2015 an Artificial Intelligence (AI) dubbed “Claudico” played 80,000 hands of No Limit Hold ‘Em poker against four of the top online poker pros. The ‘bot (short for “robot”) was a program written by the AI experts at Carnegie Mellon University’s computer science department. While this might seem like a lot of hands to most folks, it turned out not to be enough data to settle the issue: the human players won, but not by a statistically significant margin.
Starting today, a newer, more powerful poker-playing computer (named Libratus) will take on the same four pros. This time they’ll play 120,000 hands, which should be enough to settle the issue. If the four pros win, they will get more than bragging rights: they will split a $200,000 prize. Most handicappers think Libratus still isn’t up to the job, and the international betting markets have the AI as a 4-1 or even 5-1 underdog.
There are some fascinating elements of this project that go well beyond poker. They involve a remarkably effective procedure used to “teach” the computer how to play a strong game of poker: an algorithm called counterfactual regret minimization, which operates as follows.
The AI starts out knowing nothing about poker other than the basics like hand rankings and the rules of the game. It’s dealt a hand and makes a decision about how to play it. It then sees whether it won or lost money on it and how much. The next step is the key (and requires a very fast, powerful computer): it runs through every other possible decision it could have made against all the things its opponent might have done. That is, it looks at all the counterfactuals and selects the one that would have minimized regret (won the most or lost the least) and moves it up in the hierarchy of possible ways to play that hand. The ones that would have won less or lost more are moved down.
The researchers then ran several trillion (that’s not a misprint) iterations, all played against a second instantiation of the program, which was carrying out the same routines. Each time a better decision (i.e., one that minimized regret) was found, the hierarchy of decisions was revised. The “regret” notion is used here in the sense of “Oh damn, I should have done that instead.”
Over time the AI slowly homed in on the most effective strategies. Whether it has found the ones that can beat these top pros will be determined soon.
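For readers who want to see the mechanics, here is a minimal sketch of the core loop described above: regret matching in self-play, where each copy of the program weighs every counterfactual action by how much it regrets not having taken it. To keep it a few lines long, the game is rock-paper-scissors rather than poker, and all the names and numbers are illustrative, not taken from the Libratus code.

```python
import random

random.seed(0)  # reproducible runs

ACTIONS = 3  # rock, paper, scissors: a toy stand-in for poker decisions

def utility(my_action, opp_action):
    """Payoff: +1 for a win, -1 for a loss, 0 for a tie."""
    if my_action == opp_action:
        return 0
    return 1 if (my_action - opp_action) % ACTIONS == 1 else -1

def strategy_from_regrets(regret_sum):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play uniformly

def train(iterations):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        # each copy of the program samples an action from its current strategy
        actions = [random.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for p in (0, 1):
            opp = actions[1 - p]
            # expected payoff of the strategy actually played this round
            actual = sum(strategies[p][a] * utility(a, opp) for a in range(ACTIONS))
            for a in range(ACTIONS):
                # counterfactual: how much better would action a have done?
                regrets[p][a] += utility(a, opp) - actual
                strategy_sum[p][a] += strategies[p][a]
    # it is the *average* strategy over all iterations that converges
    return [[s / iterations for s in row] for row in strategy_sum]

average = train(100_000)
print(average)  # both players' averages approach the equilibrium mix of roughly 1/3 each
```

In rock-paper-scissors the unbeatable strategy is the uniform one, and the averaged strategies drift toward it; poker works the same way in principle, except that the "hierarchy of decisions" spans an astronomically larger space of situations, which is why trillions of iterations and a very fast machine are needed.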
Note that this “brains v. bot” contest is being played “heads-up.” That is, there are only two players in each hand: one of the humans and Libratus. Even though Libratus plays heads-up at a world-class level, poker is so complex a game that the program cannot handle the computational load that a third player at the table produces. The reason is that poker, unlike other complex games such as chess and Go, where there are AIs that can beat any human, is a game of partial, and sometimes unreliable, information. Each player knows only some things and is missing the most important data: what one’s opponent’s cards are and what their bets mean. This simple fact makes the problem one of overarching complexity and makes arriving at an optimal set of decisions extremely difficult.
What’s even more fascinating is that counterfactual regret minimization is a general algorithm. As the researchers at CMU are showing, it can be applied to any situation where the number of counterfactuals is large but within the computational capacity of the machine, where only some of the relevant information is available, and where some of it may be deliberate misdirection. The applications they are exploring include medical diagnosis, cybersecurity, financial markets, economic decision-making, business and political negotiations, and military situations: all circumstances where information is incomplete and potentially misleading but concrete decisions must be made.