Gian-Carlo Pascutto is the creator of Leela Zero, an open-source Go engine based on DeepMind's AlphaZero. In this interview he explains why he built it, how it works, and how he plans to improve it in the future.

What motivated you to build LeelaZero originally?

Leela Zero was based on Leela, so we have to start from there. I originally programmed computer chess, but got fed up with the plagiarism in that domain, so when computer Go sort of got "reset" with the introduction of MCTS (Monte Carlo Tree Search) in 2006, I started exploring that as well. I had some fun and achieved some good results, but, at that time, computers weren’t really strong enough to be useful.

In 2016 my interest in computer Go was rekindled by AlphaGo beating Lee Sedol, so I updated Leela with neural network support, which got a ton of very positive feedback from the Go community. When the AlphaZero paper came out, it was obvious to me that reconstructing it starting from Leela would not be too hard, and that hardware would be the limiting factor. But I also figured that the largest part of the computation was amenable to being distributed, so I open sourced the result as Leela Zero.

How does LeelaZero actually work?

Leela Zero queries a deep residual convolutional neural network (DCNN) to assess which player is in the better position and to estimate, for each move, the probability that it is the best one. Based on this information it builds a search tree, evaluating the network at each new position it considers. This allows it to start with moves that initially look good and then, as the tree grows, shift its focus to moves that actually end up leading to good positions.
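The balance between the network's move probabilities and the accumulated search results is typically handled by a PUCT-style selection rule in AlphaZero-like engines. The sketch below is illustrative, not Leela Zero's actual C++ implementation; the `Node` class, `c_puct` constant, and move keys are hypothetical.

```python
import math

# Hypothetical minimal tree node for a PUCT-style search.
class Node:
    def __init__(self, prior):
        self.prior = prior        # move probability from the policy head
        self.visits = 0           # N: how often the search visited this node
        self.value_sum = 0.0      # accumulated value-head evaluations

    def q(self):
        # Average evaluation; 0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(children, c_puct=1.5):
    """Pick the child maximizing Q + U: Q rewards moves the search has
    found to be good, U rewards moves the network likes but the search
    has not explored much yet."""
    total = sum(ch.visits for ch in children.values())
    def score(ch):
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u
    return max(children.items(), key=lambda kv: score(kv[1]))
```

With no visits anywhere, the rule simply follows the prior; once a move's evaluations come back poor, its Q term drags the score down and the search shifts elsewhere, which is the behaviour described above.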

The reinforcement learning procedure that lets the program teach itself runs the tree search to select moves, plays an entire game, and then feeds the output of the search back to the neural network, so that a single evaluation of the network learns to "predict" what a larger search over hundreds of nodes would eventually have found. The final outcome of the game is used to correct the network's assessment of who was better in each position.
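Concretely, in AlphaZero-style training each self-play position yields two targets: the search's visit counts become the policy target, and the game result becomes the value target. This is a hedged sketch of that bookkeeping; the function name and data shapes are illustrative, not the project's actual training code.

```python
# Turn one self-play position into a training example (illustrative).
def make_training_example(position, visit_counts, game_result):
    """visit_counts: {move: search visits}; game_result: +1 if the
    player to move won the game, -1 if they lost."""
    total = sum(visit_counts.values())
    # Policy target: distribution proportional to search visits, so a
    # single network evaluation learns to predict the search's choice.
    policy_target = {move: n / total for move, n in visit_counts.items()}
    # Value target: the actual outcome corrects the network's in-game
    # assessment of who was better in this position.
    value_target = game_result
    return position, policy_target, value_target
```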

How does LeelaZero differ from AlphaZero?

We followed AlphaZero’s published papers as closely as possible, though we can't be 100% sure because some parts are a bit ambiguous. There have been some small tweaks for efficiency, such as starting the self-play with a smaller network (you don't need a large one if the moves are still mostly random), a better initialization for not-yet-evaluated nodes, etc. The problem with such improvements is that we can't restart the entire learning procedure to find out what their "eventual" effect is (unless we want to wait many months!), so we have to be rather conservative.

How do you leverage distributed computing in training LeelaZero?

About 500 people on average run a client that downloads Leela Zero's "best" network and lets the program play against itself. This generates training data that is collected on a central server. After a certain number of training steps, a new candidate network is uploaded, and clients start playing matches between it and the old network to determine whether the candidate is better (and thus the new "best").
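The promotion decision can be sketched as a simple gating rule: the candidate replaces the current best only if it wins a sufficient fraction of the evaluation matches. The 55% threshold below follows the AlphaGo Zero paper; the exact criterion and function name here are assumptions, not the project's actual server code.

```python
# Illustrative server-side gating step (threshold per AlphaGo Zero).
def promote_candidate(wins, losses, threshold=0.55):
    """Return True if the candidate network's win rate against the
    current best network meets the promotion threshold."""
    games = wins + losses
    if games == 0:
        return False  # no evidence yet; keep the current best
    return wins / games >= threshold
```

Requiring a clear margin rather than a bare majority guards against promoting a network that merely got lucky over a small sample of games.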

In Season 12 of the “Top Chess Engine Championship”, LeelaChessZero (a chess engine adapted from LeelaZero) only won one of its 28 games, but in Season 13, it won the division. How did it improve so quickly over such a short period of time?

Various reasons. A lot of bugs were fixed, it had been trained longer, an optimized client was written to work faster, etc. Barring any bugs (very important), a reinforcement learning approach has a very fast initial phase until the gains start to flatten out, so throwing hardware at it leads to rapid improvement. E.g., Google famously claimed it took "4 hours" for AlphaZero to master chess.

How do you plan to make Leela Zero better in the future?

For pure strength, the size of the network can be enlarged. Of course there is a point of diminishing returns eventually, but we haven’t reached it yet. I strongly suspect the tree search itself can also be improved further to get some strength gain - this is a good area for future research.

What is your goal for Leela Zero?

The goal was to demonstrate that one could replicate the DeepMind result with a distributed effort and make the results and data publicly available. Of course it can always be made a bit better or stronger but my personal goal for the project has been achieved. The program is much, much stronger than my previous closed source engine and it's beyond human level too.

Will Leela learn to play games other than Go and Chess in the future?

I don't plan to work on any other games myself, but obviously the Leela Zero code is open source so it does not only depend on me.

To learn more about Leela Zero, go to zero.sjeng.org.
