Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
2016!Stockfish, the product of a decade’s worth of iterative development by bright humans, was at around Elo 3300, and yet AlphaZero blasted it with 28 wins, 72 draws, and zero losses out of 100 games after just 4 hours of playing itself.
But no surprise there, this is a walk in the park relative to go.
It’s plausible that AlphaZero is converging to “perfect” play in chess, which some have speculated as an Elo rating of as low as 3600.
It also independently discovered the major chess openings. Curiously, towards the end of its run, its favorite openings were the A10 English Opening (c4 e5 g3 d5 cxd5 Nf6 Bg2 Nxd5 Nf3) and the D06 Queen’s Defense (d4 d5 c4 c6 Nc3 Nf6 Nf3 a6 g3 c4 a4). In contrast, it displayed scant interest in the Sicilian Defense – an opening beloved of by many grandmasters – and when it did, it tended to perform relatively poorly. As the cold analytical eyes of AI get applied to more and more spheres, we will see the overturning of much “conventional wisdom.”
Nigel Short, a top British player that once challenged Kasparov for the world title, said that computer chess will have the same interest as people have for the 100m sprints for people wearing rocket shoes, and I tend to agree with this. Even if chess is solved one day, it’s the human competition what people want, obviously ever more anti cheating methods need to be created, but even at grandmaster level playing Sicilian is still going to happen, even if chess has been solved and it is shown that the Sicilian loses with perfect play.
I tend to agree with those that have called this out as yet another gimmick of Google to win fame to gain government money regarding all things AI and the military. Google is like Boeing was in the 1950’s, the go to company for the deep state, but much dangerous than Boeing ever was. Boeing then was still mostly run by WASP types, now its run by jews and SJWs, Google is a creepy combination of the worst of Orwell and Brave New Word. It practices blanket surveillance of everything and pushes ever more hard leftist politics, having these “cold analytical eyes” (as you called it) working for Google is not a pleasant thought.
In contrast, it displayed scant interest in the Sicilian Defense
Maybe Death was on the Line. Ha Ha Ha!
Manning has a book on this in the works:
https://www.manning.com/books/deep-learning-and-the-game-of-go
This is impressive, but both Chess and Go are games with perfect information. The recent attempts in Starcraft II and to a lesser extent DOTA II are more exciting. Curiously, at the recent NIPS conference, Open AI had a presentation but when the Q&A came up and someone asked them about details on their DOTA II bot, it was all “details later”.
Just a reminder: the OpenAI Dota II bot was doing 1 vs 1 in a fairly constrained environment. Also, unlike what they had said, not everything the bot did was unsupervised. Some stunts like creepbagging(I think is the term) was manually added. It’s really the 5 vs 5 which is the holy grail here, because that depends a lot less on reaction time and much more on long-term strategy and co-operation. And again, at NIPS, the Open AI folks were silent.
The bottom line is that AI is moving much faster than the skeptics think it is, but those of us who are optimists nevertheless have to be tempered.
More to the point, it seems everyone is trying to move away from GPUs now. There’s an interesting talk on this subject for those of you who are don’t get headaches from decently technical talks and/or have a layman’s interest.
https://youtu.be/4nSn0JhZX18
Intel also updated some details on their upcoming Nervana chip:
https://www.intelnervana.com/intel-nervana-neural-network-processor-architecture-update/
Just as a reminder: they bought the company which makes the chip. Basically there are a lot of AI hardware start-ups now. With Google’s TPU and similar efforts from major giants, I would worry (a little bit) if I had Nvidia stock. They had a good run, and probably will do well in the next 2-3 years, but after 2020, a lot more ML will be going custom. I encourage people to keep an eye on the stuff being published on arxiv.org on ML. It’s amazing how fast the field is evolving.
I believe Google’s report is somewhat misleading.
First, they used an absurd amount of computing power for these four hours. A normal computer would take many years to get the same result. Second, Stockfish had been limited in memory and time, and with no opening books. And I can find no explanation why they chose such a high threading setting (as one of Stockfish’s developers’ pointed out in chess.com[1], it was never tested or meant to be used at that setting). IMHO, the fact they used such an atypical configuration for Stockfish hints AlphaZero couldn’t beat SF in a normal game….
All that said, AlphaZero being able to learn chess at any reasonable level is a huge advancement. A pity they had to spice it up.
[1]
https://www.chess.com/news/view/alphazero-reactions-from-top-gms-stockfish-author
You can put me in the skeptics camp, I see “AI”** as being in the same place as space travel was when Clarke was writing 2001 Space Odyssey. I remember watching some SyFy (it was not called that then) documentary some time in the late 90’s that said the singularity will occur in 2012, I am going with the prediction that all those predicting the coming nerd rapture are wrong, and will always be wrong.
**I put this in quotes because real AI should be something like Data from Star Trek, or Terminators, or other such entities.
Even Tesla is producing “AI chips” now. I wonder where all those experts with 20+ years of experience in designing NN-on-a-chip are suddenly coming from, are they dropping from planes or something? There is bound to be some crap hitting the market.
I remember the first wave in neural networkery early 90s when “neural network processors” in VLSI were first coming out, but that wave didn’t really make it. There is a Wikipedia page on that stuff: https://en.wikipedia.org/wiki/AI_accelerator
Obligatory
https://www.youtube.com/watch?v=F7qOV8xonfY
So how long until AI replaces human bloggers?
Others have pointed it out.
Apparently the conditions were not equal or
stockfish didnt have opening book, endgame, and less cores.
Alphazero maybe would have won, but the gap would be closer.
Zero-sum games (and chess moves are zero-sum games) are boring anyways. They’re more puzzles than games, since they have a perfect solution regardless of your opponent’s moves.
A much more interesting game AI would be one that can play games where cooperation and betrayal can affect winning conditions. (Basically any economic game.)
The A.I. would have to start by crunching data to evaluate who it’s friends are, are they trustworthy, what are their motives etc. etc. for the whole spectrum of human-A.I. group interactions (and interestingly, it would need a “persona” as a reference point, and a self developed morality set which could well be the experimentally derived “tit for tat”).
However, the A.I. tendency would probably be towards direct control in dictatorial style, to minimize human agency i.e. a sort of super Soviet system with millions of human worker ants, since at some point their relationship to humans could approximate ours with regard to cattle.
Also, who’s to say that there will be a single A.I.?
There may eventually be various AlphaZeros out “in the wild” (with their descendants), which makes a much more interesting question. They would be obliged to evaluate each other, and look at issues of cooperation and betrayal at the God level, as they rapidly evolve in different directions and find safe niches.
What makes you think you’re reading the work of human beings on these blogs currently?
As I understand it, Google’s TPU is a specialized matrix multiplication unit. Others (it might have been Baidu) have announced more powerful chips along the same lines. Nvidia is presumably working on this too. The race is on.
Has anyone analyzed the games? I’m by no means an expert, but I looked at one example and it seemed kind of awkwardly played (on both sides). I’m willing to be schooled.
https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish/1/1/1
What happens to the SJW when AI offers explanations for differential IQ, crime rates, middle east wars, and civilizational achievements?
I look forward to the future.
They are all racing to be first and need to be stopped right now. This would be a good time to have a special government agency of qualified people with full powers for mandatory licencing and monitoring of all AI research, which should be completely boxed. There will have to be penalties for bootlegging or unauthorized publication of research and experiments, and also special prisons, because offenders will have to be held in complete isolation.
Well we’re already being told that algorithms and AI are racist as well right?
http://www.unz.com/isteve/nyt-artificial-intelligence-is-just-as-racist-as-natural-intelligence/
Probably around the time when AI manages to successfully portray art, which is still a bit off.
Yes. Funnily enough people tend to use Stockfish to do it.
Scroll down for some amazing moves by A0.
https://en.chessbase.com/post/the-future-is-here-alphazero-learns-chess
Has AlphaZero had much success with any opening that isn’t one of the top 12 most popular human openings?
I’d like to find a new gem outside of central theory.
AI has already been tried on a non-zero-sum game. Here’s a brief article (from 2011) on computer programs that cooperated and competed with each other in the Prisoner’s Dilemma.
https://www.google.com/amp/s/www.forbes.com/sites/rogerkay/2011/12/19/generous-tit-for-tat-a-winning-strategy/amp/
What did they learn? “For every nine parts Moses, you need one part Jesus.”
Like anonymous coward and others have said, deterministic, public information, zero-sum games: these just aren’t the types of things to best test AI performance other than for fun, or for curiosity about the game theoretic qualities of the game itself.
It’s like the story of two guys running from a bear; you just have to run faster than the other guy.
It’s really not clear how well metrics like ELO and other ways humans have been looking at things are the appropriate way to evaluate performance for AI play beyond human bounds. For any sort of game that is more rock-paper-scissors in strategy, that holds for sure. (Rock might beat Scissors 1000 times in a row but that doesn’t mean Rock is 4000 ELO while Scissors is 3000 ELO. With AI vs AI doing something outside of the bounds of human play it’s hard to tell)
Regardless, I think my gold standard has been the same for years and years; I’ll be impressed when a gaming AI can play a real time video game with a decent amount of strategy (by the human community’s gut test) where its inputs are a camera pointed at a computer screen. (camera doesn’t have to weirdly emulate a human eye or something weird, just no other cheating this requirement)
I actually think this is already feasible for almost everything in the present day already, so anything less is just an even more boring waste of time. Of course this if you had millions of dollars, the best supercomputer hardware, and human domain experts to set things up, which is what I think Go lacked compared to chess or even Arimaa all along; there simply weren’t actually talented humans working on Go AI.
Yes. The fact that they used Stockfish without the opening book and the time management algorithm greatly reduced its capabilities. I wonder if the ELo rating for Stockfish in the configuration DeepMind used it could be calculated.
I guess DeepMind wanted just a straight up comparison between the sheer brute power of Stockfish and the neural prediction network of Alpha 0, but if what happened in Go is a precedent we will see more formal competition. I bet DM will eventually pit A0 against its competitors in official competitions.
It will discover that ‘tit for tat’ is the dominant strategy for most realistic payoff structures (where the payoffs for “unilateral betrayal”, “bilateral betrayal” and “cooperation” have realistic relativities).
That’s kind-of a solved thing – I don’t think there’s ever been a formal proof of it, but it’s been the algorithm that beats all comers in competitive simulations.
It’s something that everyone who’s studied dynamic games under uncertainty kinda carries around in their head.
It also kinda makes intuitive sense: punish the defector once, then get back to cooperation (which always has the highest joint payoff).
That said, goes against my native tendency to want to punish a defector over and over and over and over and over and over.
Maybe that’s just me…
Re. the openings, one option is that while Sicilian Defense is not particularly good against AIs, it may be relatively good against humans due to how they play chess. I.e. optimal strategy due to specific type of sub-optimal opponent play.
Google seems to not have been completely egalitarian:
Checkmate: DeepMind’s AlphaZero AI clobbered rival chess app on non-level playing, er, board: Good effort but the games were seemingly rigged
Here is the TPU2 gear btw, just in time for a marketing push:
Google boffins tease custom AI math-chip TPU2 stats: 45 TFLOPS, 16GB HBM, benchmarks
Missing key info, take with a pinch of salt, YMMV
Slideware
Impressive stuff and judging by the heat sink, it eats a few Watt, too.
I remember the times when Cray Computers built machines and compilers specially tuned to do linear algebra / vector operations fast, fast, fast. This is still basically the same, only on a small board, with much more memory and much faster,
I don’t know. They’ve only published 10 games, all using recognized openings.
The DeepMind paper left a lot of questions unanswered, and as has been noted, it was a mistake to limit Stockfish to a minute per move — figuring out when it makes sense to take more time is one of the program’s features. Still, an amazing accomplishment.
Yes, this is likely true.