Figure 1

From: Reinforcement learning assisted recursive QAOA
Training QAOA-based policies for reinforcement learning. We consider an RL-enhanced recursive QAOA (RL-RQAOA) scenario in which a hybrid quantum-classical agent learns by interacting with an environment represented as the search tree induced by the recursive framework of RQAOA. Each state corresponds to a weighted graph, so the state space is characterized by a search tree of weighted graphs in which every node is a graph, and the nodes at each level of the tree are the candidate states the agent can perceive by taking an action. The agent samples the next action a (corresponding to selecting an edge and its sign) from its policy \(\pi _{\theta}(a|s)\) and receives feedback in the form of a reward r. For our hybrid agents, the policy \(\pi _{\theta}\) of RL-RQAOA (see Def. 1), along with the gradient estimate \(\nabla _{\theta} \log \pi _{\theta}\), is evaluated on a CPU, since we work in the regime of depth \(l=1\). At higher depths, where classical simulation is efficient only for graphs of small size, the policy can instead be evaluated on a quantum processing unit (QPU). The policy is trained by a classical algorithm such as REINFORCE (see Alg. 1), which uses sampled interactions and policy gradients to update the parameters.
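As a minimal sketch of the REINFORCE-style training loop the caption refers to, the snippet below updates a policy with return-weighted log-policy gradients. It assumes a simple linear-softmax policy over candidate actions and a hypothetical environment interface (reset, step, action_features); both are illustrative stand-ins, not the paper's quantum policy of Def. 1 or its Alg. 1.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(theta, features):
    """pi_theta(a|s): softmax over per-action scores phi(s, a) . theta."""
    return softmax(features @ theta)

def grad_log_policy(theta, features, a):
    """grad_theta log pi_theta(a|s) for a linear-softmax policy."""
    probs = policy(theta, features)
    return features[a] - probs @ features

def reinforce(env, theta, episodes=500, lr=0.01, gamma=1.0):
    # env is a hypothetical interface: reset() -> state,
    # action_features(state) -> (n_actions, dim) array,
    # step(action) -> (next_state, reward, done).
    for _ in range(episodes):
        s, done = env.reset(), False
        traj = []                        # (features, action, reward) per step
        while not done:
            feats = env.action_features(s)
            a = np.random.choice(len(feats), p=policy(theta, feats))
            s, r, done = env.step(a)
            traj.append((feats, a, r))
        G = 0.0
        for feats, a, r in reversed(traj):   # returns-to-go
            G = r + gamma * G
            theta += lr * G * grad_log_policy(theta, feats, a)
    return theta
```

In the paper's setting, the per-action probabilities would come from the (depth-1) QAOA-based policy rather than a linear-softmax model, but the update rule, accumulating \(\nabla _{\theta} \log \pi _{\theta}\) weighted by the return, is the same.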
