Figure 2

From: Deep reinforcement learning for universal quantum state preparation via dynamic pulse control

(a) The average fidelity and total reward over the validation set as functions of the number of training episodes for the single-qubit \(|0\rangle\) USP. (b) The distribution of state preparation fidelities \(F\) versus pulse design time for preparing the single-qubit \(|0\rangle\) state over 128 sampled tasks in the test set, for different optimization algorithms. The average fidelities are \(\overline{F} = 0.9968\), 0.9721 and 0.9655, and the average pulse design times are \(\overline{t} = 0.0120\), 0.0268 and 0.7504, for USP, GRAPE and CRAB, respectively.