Figure 8
From: Robustness of quantum reinforcement learning under hardware errors
Policy gradient agents on the CartPole environment, trained and evaluated at varying perturbations σ. Panel (a) shows training performance, while panel (b) shows the performance of the same agents after training, evaluated under perturbation levels different from those present during training. Each point is the average score of the 10 agents under the perturbation indicated on the x-axis.