AlphaZero: Playing Chess and Controlling Quantum Systems

by Mogens Dalgaard (Aarhus University), Felix Motzoi (Forschungszentrum Jülich) and Jacob Sherson (Aarhus University)

Achieving high-performing control of quantum systems is a formidable challenge that is being addressed by physicists around the world. Pushing beyond the current frontier could help realise quantum technologies in communication, sensing, drug design, machine learning, optimisation, and computation. Our work demonstrates that AlphaZero, a state-of-the-art machine-learning algorithm originally designed for playing board games such as chess, can also control a quantum system.

What do playing chess and controlling a quantum system have in common? "Absolutely nothing" would probably be the immediate answer from most physicists engaged in quantum control. However, the two share several traits. Both are generally very complicated problems in which "expert solutions" do not generalise across the many situations that may be encountered, and both typically require a global search strategy to reach the best solutions. In chess, this global search naturally takes the form of long-term planning: a skilled player must foresee several moves ahead to choose the next one. Quantum physicists, on the other hand, have largely developed methods that gradually improve the "score" of a control solution through local adjustments. Unfortunately, quantum control problems often contain many suboptimal solutions that trap such local optimisation algorithms, even those designed specifically for quantum control. This problem is what inspired us to look outside quantum physics for help.

The help we found came from a reinforcement-learning algorithm developed by the private company Google DeepMind [1]. DeepMind's earlier program, AlphaGo, was the first to defeat a professional human player at the ancient Chinese board game Go. However, AlphaGo was trained in part on records of previous human games, which is not ideal for quantum control optimisation, where reasonable solutions are often not available a priori. Its more powerful successor, AlphaZero [1], published in 2018, instead taught itself purely by playing against itself, starting with no expert knowledge of the game. It performed remarkably well at the complex board games chess, Go, and shogi, outperforming the best game-specific programs, which themselves surpass the best human players.

The key to AlphaZero's success was the combination of two very powerful ideas: Monte Carlo tree search and a deep neural network. A tree search is a tool for foreseeing future outcomes starting from one's current state. However, an exhaustive search would be far too expensive for complicated board games such as chess, while a shallow search would reveal too little about how the game might develop. To avoid this dilemma, AlphaZero's tree search is guided by a deep neural network that, for each position, suggests promising moves and estimates the expected outcome. This lets the search concentrate on the most promising branches and avoid those that are more likely to lead to defeat.
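To make the interplay between search and network concrete, here is a minimal sketch of a network-guided Monte Carlo tree search in Python. It assumes a toy single-player problem: pick `HORIZON` discretised control amplitudes one at a time, with a score revealed only at the end. The function `toy_network` is a hypothetical, hand-written stand-in for the trained deep neural network (uniform move priors plus a crude value estimate), and the selection rule is a PUCT-style rule of the kind used in [1], simplified to a constant `c_puct`. All names and numbers are illustrative, and the training loop that AlphaZero uses to improve its network from the search results is omitted.

```python
import math
from dataclasses import dataclass, field

ACTIONS = (0, 1, 2)    # hypothetical discretised control amplitudes
HORIZON = 4            # hypothetical number of control steps
TARGET = (2, 0, 2, 1)  # hypothetical best sequence for the toy score


def reward(seq):
    """Terminal score in [0, 1]: fraction of steps matching the toy target."""
    return sum(a == b for a, b in zip(seq, TARGET)) / HORIZON


def toy_network(seq):
    """Stand-in for the deep network: (priors over actions, value estimate)."""
    priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    value = sum(a == b for a, b in zip(seq, TARGET)) / HORIZON
    return priors, value


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """PUCT: mean value plus an exploration bonus weighted by the prior."""
    scale = math.sqrt(node.visits)

    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * scale / (1 + child.visits)

    return max(node.children.items(), key=score)


def simulate(root, root_seq):
    node, seq, path = root, list(root_seq), [root]
    # 1. Selection: descend through already-expanded nodes.
    while node.children:
        action, node = select_child(node)
        seq.append(action)
        path.append(node)
    # 2. Evaluation: exact score at terminal states, network value otherwise.
    if len(seq) == HORIZON:
        value = reward(seq)
    else:
        priors, value = toy_network(seq)
        # 3. Expansion: create children carrying the network's priors.
        node.children = {a: Node(prior=p) for a, p in priors.items()}
    # 4. Backup: single-player, so the value is added unchanged along the path.
    for n in path:
        n.visits += 1
        n.value_sum += value


def plan(n_simulations=200):
    """Build the control sequence step by step, searching before each choice."""
    seq = []
    while len(seq) < HORIZON:
        root = Node(prior=1.0)
        for _ in range(n_simulations):
            simulate(root, seq)
        # Commit to the most visited action, as AlphaZero does when playing.
        seq.append(max(root.children.items(), key=lambda kv: kv[1].visits)[0])
    return seq


print(plan())
```

With these settings the search should recover the toy target sequence, because the value estimates steer visits towards branches that score well. In a two-player game such as chess, the backed-up value would instead flip sign at every level to reflect the alternating players.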

In our work [2], we applied AlphaZero to control a superconducting circuit consisting of two coupled quantum bits (qubits), an architecture that could potentially be used in a quantum computer. A major problem in real-life experiments, however, is that the circuit interacts with its surrounding environment, and these unwanted interactions are only negligible for short durations. We therefore need control solutions that work in as short a time as possible.

Figure 1: The infidelity (lower is better) of a quantum computational gate that could potentially be used in a quantum computer. The figure shows results for three methods: the deep-learning algorithm AlphaZero; the local gradient-based optimisation algorithm GRAPE; and a hybrid algorithm that combines the two. Results are taken from [2].
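The gate infidelity plotted in Figure 1 can be illustrated with a short Python sketch. It assumes a hypothetical two-qubit Hamiltonian with a fixed Z⊗Z coupling and a single control amplitude driving both qubits, applied as piecewise-constant pulses; this is not the superconducting-circuit model of [2], and the gate fidelity used here, |Tr(U_target† U)|²/d², is one common choice that ignores the global phase.

```python
import numpy as np

# Single-qubit building blocks.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def expm_hermitian(h, dt):
    """exp(-i h dt) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * dt)) @ v.conj().T


def propagate(controls, duration, J=1.0):
    """Total unitary for piecewise-constant controls u_k under the toy
    Hamiltonian H_k = J Z⊗Z + u_k (X⊗I + I⊗X) (hypothetical model)."""
    h_drift = J * np.kron(Z, Z)
    h_ctrl = np.kron(X, I2) + np.kron(I2, X)
    dt = duration / len(controls)
    U = np.eye(4, dtype=complex)
    for u in controls:
        # Later time steps multiply from the left.
        U = expm_hermitian(h_drift + u * h_ctrl, dt) @ U
    return U


def gate_infidelity(U, U_target):
    """1 - |Tr(U_target^dagger U)|^2 / d^2, insensitive to a global phase."""
    d = U.shape[0]
    overlap = np.trace(U_target.conj().T @ U)
    return 1.0 - abs(overlap) ** 2 / d**2


# With no drive, U = exp(-i J T Z⊗Z); its infidelity against the identity
# is sin^2(J T), so this prints a value close to sin^2(pi/4) = 0.5.
U = propagate(np.zeros(10), duration=np.pi / 4)
print(gate_infidelity(U, np.eye(4)))
```

An optimiser such as GRAPE or AlphaZero would search over the `controls` array (and, for short gates, over `duration`) to drive this number towards zero for a chosen target gate.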

We applied AlphaZero to this control problem and benchmarked it against GRAPE, a local gradient-based quantum control algorithm, which we started from randomly drawn controls. The GRAPE algorithm in particular has benefited from a couple of decades of fine-tuning that incorporates expert knowledge from quantum physics and computer science. Given equal computational resources, both methods performed comparably, but for quite different reasons. AlphaZero learned the overall structure of the solution space and, in particular, identified promising regions, but had limited ability to fine-tune its solutions. GRAPE, by contrast, involves no learning, but as a very efficient local optimisation algorithm it reliably finds the nearest optimum in the space of solutions. However, the many suboptimal solutions ultimately limit its performance. For this reason, we designed a hybrid algorithm in which AlphaZero's solutions are subsequently refined by GRAPE. With this hybrid algorithm, we obtained around 200 times as many high-performing solutions as with either GRAPE or AlphaZero on its own. Based on our results, we believe that deep learning combined with quantum-specific tools could help realise certain quantum technologies.
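The division of labour in the hybrid algorithm (a global stage that finds the right region, followed by a local stage that polishes) can be sketched on a toy problem in Python. Every ingredient here is a stand-in, not the method of [2]: the cost is a hypothetical one-parameter-per-step landscape with many local minima, the global stage is an exhaustive search over a coarse grid of control values (AlphaZero instead learns where to look, which scales far better), and the local stage is plain gradient descent on the analytic gradient rather than GRAPE's gradient of the gate fidelity.

```python
import itertools

import numpy as np

SHIFT = 0.3  # hypothetical location of the global optimum in every coordinate


def cost(x):
    """Toy multimodal landscape: global minimum 0 at x = SHIFT,
    plus a local minimum near every integer offset from SHIFT."""
    y = np.asarray(x, dtype=float) - SHIFT
    return float(np.sum(y**2 + 1.0 - np.cos(2 * np.pi * y)))


def grad(x):
    y = np.asarray(x, dtype=float) - SHIFT
    return 2 * y + 2 * np.pi * np.sin(2 * np.pi * y)


def local_descent(x0, lr=0.01, steps=500):
    """Local stage: gradient descent converges to the nearest local minimum."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def coarse_global_search(dim, levels=np.linspace(-2.0, 2.0, 9)):
    """Global stage (stand-in for AlphaZero): score every combination of
    discretised control values and keep the best one."""
    return np.array(min(itertools.product(levels, repeat=dim), key=cost))


def hybrid(dim):
    """Refine the best coarse solution with the local optimiser."""
    return local_descent(coarse_global_search(dim))


print(cost(local_descent([1.3, 1.3, 1.3])))  # stuck in a local minimum
print(cost(hybrid(3)))                       # essentially zero
```

Starting the local optimiser at a poor point leaves it in a suboptimal minimum, whereas seeding it with the coarse global solution lets it converge to the global one, which mirrors the qualitative argument in the text.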

In our subsequent work [3], we also encoded the entire many-body dynamics in a deep neural network. This could help circumvent the curse of dimensionality that prevents numerical simulation of sufficiently large quantum systems. In doing so, we obtained speed-ups in evaluation time of up to several orders of magnitude.
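The surrogate idea, training a network once on simulator outputs and then querying the cheap network instead of the simulator, can be sketched in a few lines of Python. This is not the architecture or physical system of [3]: the "expensive simulator" here is a stand-in, the textbook Rabi formula for a driven two-level system, and the network size, learning rate and step count are arbitrary illustrative choices for a one-hidden-layer tanh network trained by full-batch gradient descent.

```python
import numpy as np


def rabi_excitation(omega, delta, t=np.pi / 2):
    """Stand-in for an expensive simulation: excited-state population of a
    two-level system driven with Rabi frequency omega and detuning delta."""
    generalized = np.sqrt(omega**2 + delta**2)
    return omega**2 / generalized**2 * np.sin(generalized * t / 2) ** 2


def train_surrogate(inputs, targets, hidden=16, lr=0.02, steps=5000, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    y = targets.reshape(-1, 1)
    losses = []
    for _ in range(steps):
        h = np.tanh(inputs @ W1 + b1)  # hidden activations
        err = h @ W2 + b2 - y          # prediction error
        losses.append(float(np.mean(err**2)))
        # Backpropagate the mean-squared error through both layers.
        d_out = 2.0 * err / len(inputs)
        d_hidden = (d_out @ W2.T) * (1.0 - h**2)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (inputs.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)

    def predict(x):
        """Cheap replacement for the simulator once training is done."""
        return (np.tanh(x @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses


# Sample (omega, delta) pairs, label them with the "simulator", then train.
rng = np.random.default_rng(1)
inputs = np.column_stack([rng.uniform(0.1, 2.0, 200), rng.uniform(-2.0, 2.0, 200)])
targets = rabi_excitation(inputs[:, 0], inputs[:, 1])
predict, losses = train_surrogate(inputs, targets)
print(losses[0], losses[-1])
```

For a formula this cheap there is no speed-up to gain; the payoff comes when each simulator call is expensive, as for many-body dynamics, and the trained network can be evaluated many times at low cost.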

To summarise our experience: deep learning constitutes a powerful set of tools that allows us to tackle problems of increasing complexity. We are looking forward to seeing how the field develops in the years to come.

References:
[1] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ... & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140-1144.
[2] Dalgaard, M., Motzoi, F., Sørensen, J. J., & Sherson, J. (2020). Global optimization of quantum dynamics with AlphaZero deep exploration. npj Quantum Information, 6(1), 1-9.
[3] Dalgaard, M., Motzoi, F., & Sherson, J. (2021). Predicting quantum dynamical cost landscapes with deep learning. arXiv preprint arXiv:2107.00008.

Please contact:
Mogens Dalgaard
Department of Physics and Astronomy, Aarhus University
