Figure 4

From: Deep reinforcement learning for universal quantum state preparation via dynamic pulse control

(a) The average fidelity and total reward over the validation set as functions of the number of training episodes for two-qubit Bell state USP. (b) The fidelity distribution of 6400 sampled preparation tasks in the test set for the two-qubit Bell state USP, with \(\bar{F}=0.9695\).