Next time you’re at King’s Cross station, take a moment to think about this. Just yards from where you’re standing, some of the world’s most advanced artificial intelligence (AI) technology is being developed by a London company called DeepMind.
We introduce AlphaGo Zero, the latest evolution of AlphaGo, the first computer program to defeat a world champion at the ancient Chinese game of Go. Zero is even more powerful and is arguably the strongest Go player in history. Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.
Humans excel at solving a wide variety of challenging problems, from low-level motor control through to high-level cognitive tasks. Our goal at DeepMind is to create artificial agents that can achieve a similar level of performance and generality. Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards.
Neural networks are the workhorse of many of the algorithms developed at DeepMind. For example, AlphaGo uses convolutional neural networks to evaluate board positions in the game of Go, while DQN and other deep reinforcement learning algorithms use neural networks to choose actions, reaching super-human performance on video games. This post introduces some of our latest research on advancing the capabilities and training procedures of neural networks: Decoupled Neural Interfaces using Synthetic Gradients. This work lets neural networks learn to send messages between themselves in a decoupled, scalable manner, paving the way for multiple neural networks to communicate with each other or for improving how recurrent networks handle long-term temporal dependencies.
Human gaming tactics draw on analogies from the physical world to hide the underlying complexity (chunking) and enable players to think at a higher level. AlphaGo is arguably not limited by physical-world analogies.