Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis

2026-05-11 • Machine Learning

AI summary

The authors use reinforcement learning to synthesize Clifford circuits, a class of quantum circuits. Their agent learns to break a complex Clifford operation down into elementary gates, training on targets produced by random walks away from the identity. They designed a neural network whose output is unaffected by how the qubits are labeled and which can be applied to any number of qubits without modification. On six-qubit circuits the method finds circuits within one two-qubit gate of optimal in milliseconds and finds optimal circuits in 99.2% of instances within seconds. After further training on ten-qubit instances, it handles circuits of up to thirty qubits and uses fewer two-qubit gates on average than Qiskit's Aaronson-Gottesman and greedy synthesizers.

Clifford circuits, quantum circuits, reinforcement learning, symplectic matrix, neural network equivariance, qubit connectivity, quantum gate synthesis, two-qubit gates, Qiskit, Aaronson-Gottesman algorithm

Authors

Richie Yeung, Aleks Kissinger, Rob Cornish

Abstract

We consider the problem of synthesizing Clifford quantum circuits for devices with all-to-all qubit connectivity. We approach this task as a reinforcement learning problem in which an agent learns to discover a sequence of elementary Clifford gates that reduces a given symplectic matrix representation of a Clifford circuit to the identity. This formulation permits a simple learning curriculum based on random walks from the identity. We introduce a novel neural network architecture that is equivariant to qubit relabelings of the symplectic matrix representation, and which is size-agnostic, allowing a single learned policy to be applied across different qubit counts without circuit splicing or network reparameterization. On six-qubit Clifford circuits, the largest regime for which optimal references are available, our agent finds circuits within one two-qubit gate of optimality in milliseconds per instance, and finds optimal circuits in 99.2% of instances within seconds per instance. After continued training on ten-qubit instances, the agent scales to unseen Clifford tableaus with up to thirty qubits, including targets generated from circuits with over a thousand Clifford gates, where it achieves lower average two-qubit gate counts than Qiskit's Aaronson-Gottesman and greedy Clifford synthesizers.
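
To make the formulation concrete, here is a minimal Python sketch of the objects the abstract describes. It is not the authors' implementation. It assumes a particular convention that the abstract does not specify: a 2n x 2n binary matrix over GF(2) in (x | z) ordering, gates acting as row operations, phases ignored, and H, S and CX as the elementary gate set. All function names are hypothetical. The sketch generates a target by a random walk from the identity, shows that undoing the walk reduces the target back to the identity, and checks that relabeling qubits conjugates the matrix by a permutation, which is the symmetry an equivariant policy would respect.

```python
import numpy as np


def is_symplectic(M):
    """Check M^T Omega M = Omega (mod 2) with Omega = [[0, I], [I, 0]]."""
    n = M.shape[0] // 2
    I = np.eye(n, dtype=np.int64)
    Z = np.zeros((n, n), dtype=np.int64)
    omega = np.block([[Z, I], [I, Z]])
    return np.array_equal((M.T @ omega @ M) % 2, omega)


def apply_gate(M, gate):
    """Return a copy of M with one gate applied as a row operation (assumed convention)."""
    n = M.shape[0] // 2
    M = M.copy()
    name, *qubits = gate
    if name == "H":      # swap the x and z rows of qubit a
        a = qubits[0]
        M[[a, n + a]] = M[[n + a, a]]
    elif name == "S":    # z_a <- z_a + x_a
        a = qubits[0]
        M[n + a] ^= M[a]
    elif name == "CX":   # x_t <- x_t + x_c and z_c <- z_c + z_t
        c, t = qubits
        M[t] ^= M[c]
        M[n + c] ^= M[n + t]
    else:
        raise ValueError(f"unknown gate {name!r}")
    return M


def random_walk(n, steps, rng):
    """Apply `steps` random gates to the identity; return the matrix and the gates."""
    M = np.eye(2 * n, dtype=np.int64)
    gates = []
    for _ in range(steps):
        kind = int(rng.integers(3))
        if kind == 2:
            c, t = (int(q) for q in rng.choice(n, size=2, replace=False))
            gate = ("CX", c, t)
        else:
            gate = (("H", "S")[kind], int(rng.integers(n)))
        M = apply_gate(M, gate)
        gates.append(gate)
    return M, gates


def relabel_matrix(M, perm):
    """Conjugate M by the permutation sending qubit q to perm[q]."""
    n = M.shape[0] // 2
    idx = np.concatenate([perm, perm + n])
    out = np.zeros_like(M)
    out[np.ix_(idx, idx)] = M
    return out


def relabel_gate(gate, perm):
    """Rename the qubits a gate acts on."""
    name, *qubits = gate
    return (name, *(int(perm[q]) for q in qubits))
```

Under this convention each of the three gates is its own inverse at the symplectic level, so replaying a walk's gates in reverse order returns the target to the identity. That gives a valid but generally non-optimal synthesis; the learned agent's task is to find a shorter sequence with fewer two-qubit gates.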
