Figure 5
From: Reinforcement learning assisted recursive QAOA

Numerical evidence of the advantage of RL-RQAOA over RQAOA in terms of approximation ratio on hard instances. The box plot is generated by taking the mean of the best approximation ratio over 15 independent runs of 1400 episodes for RL-RQAOA. RL-RQAOA clearly outperforms RQAOA in approximation ratio on the instances considered (exactly those instances where RQAOA's approximation ratio is ≤ 0.95). We chose \(n_{c}=8\) in our simulations; the parameters \(\theta = (\alpha, \gamma, \vec{\beta})\) of the RL-RQAOA policy were initialized by setting \(\vec{\beta} = \{25\}^{(n^{2}-n)/2}\), and the angles \(\{\alpha, \gamma\}\) were initialized randomly at every iteration. All agents were trained using REINFORCE (Alg. 1).
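The training loop referenced in the caption can be illustrated with a generic REINFORCE update for a softmax policy over discrete actions. This is a minimal sketch, not the paper's actual RL-RQAOA implementation: the toy bandit environment, reward values, and learning rate are assumptions for illustration; only the large initial weights (\(\vec{\beta}\) entries set to 25) and the 1400-episode budget mirror the caption.

```python
import numpy as np

# Hedged sketch of a REINFORCE update for a softmax policy over discrete
# actions (e.g., candidate moves in an RQAOA-style rounding step). The
# environment and hyperparameters are illustrative stand-ins, not the
# paper's setup; beta's initialization and the episode count echo the caption.

rng = np.random.default_rng(7)

n_actions = 6                        # assumed number of candidate actions
beta = np.full(n_actions, 25.0)      # echoes \vec{beta} = {25}^{(n^2-n)/2}
true_reward = np.array([0.2, 0.9, 0.3, 0.1, 0.5, 0.4])  # assumed rewards

def softmax(x):
    z = x - x.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

baseline, lr = 0.0, 0.1
for t in range(1, 1401):             # 1400 episodes, as in the figure
    probs = softmax(beta)
    a = rng.choice(n_actions, p=probs)            # sample an action
    r = true_reward[a] + rng.normal(scale=0.05)   # noisy reward
    baseline += (r - baseline) / t   # running-mean baseline reduces variance
    grad = -probs                    # gradient of log pi(a) w.r.t. beta ...
    grad[a] += 1.0                   # ... for a softmax policy
    beta = beta + lr * (r - baseline) * grad      # policy-gradient ascent
```

After training, the policy concentrates probability on the highest-reward action; the running-mean baseline is one common variance-reduction choice and is not taken from the paper.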