
Figure 4

From: Reinforcement learning assisted recursive QAOA


Comparison of the success probability of attaining ground-state solutions for RL-RQAOA and RQAOA on cage graphs. The x-axis labels give the properties of the cage graph(s); for example, d3-g6 denotes a 3-regular instance with girth (length of the shortest cycle) 6. Error bars appear only for a few instances (specifically d3-g9, d3-g10 and d5-g5) because multiple graph instances exist with the same properties (degree and girth). RL-RQAOA was evaluated by averaging learning performance over 15 independent runs, whereas for RQAOA the best energy over a fixed budget of 1400 runs is taken. The probability for RL-RQAOA-max is computed by taking the maximum energy attained by the agent over all 15 independent runs for a particular episode. On the other hand, the probability for RL-RQAOA-vote (statistically more significant) is computed by counting the maximum energy attained for a particular episode only if more than 50% of the runs agree. We chose \(n_{c}=8\) for instances with ≤50 nodes and \(n_{c}=10\) otherwise. The parameters \(\theta = (\alpha , \gamma , \vec{\beta})\) of the RL-RQAOA policy were initialized by setting \(\vec{\beta} = \{25\}^{(n^{2}-n)/2}\) and the angles \(\{\alpha , \gamma \}\) (at every iteration) to energy-optimal angles (i.e., by following one run of RQAOA). All agents were trained using REINFORCE (Alg. 1).
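As a reading aid, the two aggregation rules above (max over runs vs. majority vote) could be sketched as follows. This is a minimal illustration under the assumption that per-run, per-episode best energies are available as an array; the function name aggregate_success, the array layout, and the use of NumPy are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def aggregate_success(energies, optimal_energy):
    """Illustrative aggregation of per-run, per-episode energies.

    energies: array of shape (n_runs, n_episodes), the best energy each
              independent run attained at each episode.
    optimal_energy: known ground-state (optimal) energy of the instance.
    Returns (p_max, p_vote): the fraction of episodes counted as successful
    under the 'max' and 'vote' aggregation rules.
    """
    n_runs, _ = energies.shape
    # Per-run, per-episode success flags: did this run reach the optimum?
    hits = np.isclose(energies, optimal_energy)

    # 'max' rule: an episode succeeds if the best of the runs reached the optimum.
    p_max = np.mean(hits.any(axis=0))

    # 'vote' rule: an episode succeeds only if more than 50% of the runs agree.
    p_vote = np.mean(hits.sum(axis=0) > n_runs / 2)

    return p_max, p_vote
```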
