From: Deep reinforcement learning for universal quantum state preparation via dynamic pulse control
Parameters \ Target state | \(\vert 0\rangle\) | Bell state |
---|---|---|
Allowed actions a (J(t)) | 0, 1, 2, 3 | a |
Size of the training set | 32 | 256 |
Size of the validation set | 32 | 256 |
Size of the test set | 64 | 6400 |
Batch size \(N_{bs}\) | 32 | 32 |
Memory size M | 20,000 | 40,000 |
Learning rate α | 0.01 | 0.0001 |
Replace period C | 200 | 200 |
Reward discount factor γ | 0.9 | 0.9 |
Number of hidden layers | 2 | 3 |
Neurons per hidden layer | 32/32 | 256/256/128 |
Activation function | ReLU | ReLU |
ϵ-greedy increment δϵ | 0.001 | 0.0001 |
Maximal ϵ in training \(\epsilon _{\max}\) | 0.95 | 0.95 |
ϵ in validation and testing | 1 | 1 |
\(F_{\mathrm{threshold}}\) per episode | 0.999 | 0.999 |
\(\mathrm{episode}_{\max}\) for training | 33 | 731 |
Total time T | 2π | 20π |
Action duration dt | π/10 | π/2 |
Maximum steps per episode | 20 | 40 |
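
The table is easiest to read as the configuration of a standard DQN with experience replay, a periodically replaced target network, and ϵ-greedy exploration. Below is a minimal sketch of how the Bell-state column could be wired up; PyTorch is an assumption (the table does not name a framework), and the state dimension, action count, and all class and function names are illustrative, while the numeric values are copied from the table. Note the table's convention that ϵ is the probability of taking the greedy action, which is why ϵ = 1 in validation and testing.

```python
# Sketch only, not the authors' code: numeric values come from the Bell-state
# column of the table; the state encoding, action count, and class design are
# assumptions made for illustration.
import random
from collections import deque

import torch
import torch.nn as nn

N_ACTIONS      = 4        # assumed; the Bell-state action entry is garbled in the source
STATE_DIM      = 8        # assumed: real and imaginary parts of a two-qubit state vector
BATCH_SIZE     = 32       # N_bs
MEMORY_SIZE    = 40_000   # replay memory size M
LEARNING_RATE  = 1e-4     # learning rate alpha
REPLACE_PERIOD = 200      # C: learning steps between target-network replacements
GAMMA          = 0.9      # reward discount factor
EPS_INCREMENT  = 1e-4     # delta-epsilon added per learning step
EPS_MAX        = 0.95     # maximal epsilon during training (epsilon = 1 in testing)
F_THRESHOLD    = 0.999    # fidelity at which an episode terminates successfully
MAX_STEPS      = 40       # maximum actions per episode (T = 20*pi with dt = pi/2)
EPISODE_MAX    = 731      # number of training episodes


def build_q_net() -> nn.Module:
    """Three hidden layers with 256/256/128 neurons and ReLU activations."""
    return nn.Sequential(
        nn.Linear(STATE_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, N_ACTIONS),
    )


class DQNAgent:
    """DQN with experience replay and a periodically synced target network."""

    def __init__(self) -> None:
        self.q_net = build_q_net()
        self.target_net = build_q_net()
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=LEARNING_RATE)
        self.memory = deque(maxlen=MEMORY_SIZE)   # (s, a, r, s', done) tuples
        self.epsilon = 0.0                        # grows by EPS_INCREMENT up to EPS_MAX
        self.learn_steps = 0

    def act(self, state: torch.Tensor, greedy: bool = False) -> int:
        # In the table's convention, epsilon is the probability of taking the
        # greedy action, so epsilon = 1 in validation/testing (pure exploitation).
        eps = 1.0 if greedy else self.epsilon
        if random.random() < eps:
            with torch.no_grad():
                return int(self.q_net(state).argmax().item())
        return random.randrange(N_ACTIONS)

    def learn(self) -> None:
        if len(self.memory) < BATCH_SIZE:
            return
        batch = random.sample(self.memory, BATCH_SIZE)
        states, actions, rewards, next_states, dones = zip(*batch)
        s  = torch.stack(states)
        a  = torch.tensor(actions, dtype=torch.int64)
        r  = torch.tensor(rewards, dtype=torch.float32)
        s2 = torch.stack(next_states)
        d  = torch.tensor(dones, dtype=torch.float32)

        # Bellman target computed with the (frozen) target network.
        q = self.q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            q_target = r + GAMMA * self.target_net(s2).max(dim=1).values * (1.0 - d)
        loss = nn.functional.mse_loss(q, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Replace the target network every REPLACE_PERIOD learning steps and
        # anneal epsilon toward EPS_MAX.
        self.learn_steps += 1
        if self.learn_steps % REPLACE_PERIOD == 0:
            self.target_net.load_state_dict(self.q_net.state_dict())
        self.epsilon = min(EPS_MAX, self.epsilon + EPS_INCREMENT)
```

For the \(\vert 0\rangle\) column, the same skeleton would use two hidden layers of 32 neurons, a replay memory of 20,000, a learning rate of 0.01, δϵ = 0.001, and at most 20 steps per episode (T = 2π with dt = π/10).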