
Table 1 List of hyperparameters for USP

From: Deep reinforcement learning for universal quantum state preparation via dynamic pulse control

| Parameters \ Target state | |0〉 | Bell state |
| --- | --- | --- |
| Allowed actions \(a\) (\(J(t)\)) | 0, 1, 2, 3 | a |
| Size of the training set | 32 | 256 |
| Size of the validation set | 32 | 256 |
| Size of the test set | 64 | 6400 |
| Batch size \(N_{bs}\) | 32 | 32 |
| Memory size \(M\) | 20,000 | 40,000 |
| Learning rate α | 0.01 | 0.0001 |
| Replace period \(C\) | 200 | 200 |
| Reward discount factor γ | 0.9 | 0.9 |
| Number of hidden layers | 2 | 3 |
| Neurons per hidden layer | 32/32 | 256/256/128 |
| Activation function | ReLU | ReLU |
| ϵ-greedy increment δϵ | 0.001 | 0.0001 |
| Maximal ϵ in training \(\epsilon_{\max}\) | 0.95 | 0.95 |
| ϵ in validation and testing | 1 | 1 |
| \(F_{\mathrm{threshold}}\) per episode | 0.999 | 0.999 |
| \(\mathrm{episode}_{\max}\) for training | 33 | 731 |
| Total time \(T\) | 2π | 20π |
| Action duration \(dt\) | π/10 | π/2 |
| Maximum steps per episode | 20 | 40 |

a. The allowed actions of two-qubit operations satisfy \(\{(J_{1},J_{2})\,|\,J_{1},J_{2}\in \{1, 2, 3, 4, 5\}\}\).
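As a sanity check on the time parameters, the ratio \(T/dt\) equals the maximum number of steps per episode in both columns: 2π/(π/10) = 20 and 20π/(π/2) = 40.

The sketch below collects the tabulated values into plain Python dictionaries and shows one way the ϵ-greedy increment δϵ and cap \(\epsilon_{\max}\) might translate into an exploration schedule. This is only an illustrative assumption about how the schedule is applied (here ϵ is the probability of choosing the greedy action, ramped up by δϵ per training step until \(\epsilon_{\max}\), and fixed at 1 during validation and testing); the variable names are ours, not taken from the authors' code.

```python
import math

# Hyperparameters transcribed from Table 1 (dictionary keys are illustrative).
HPARAMS = {
    "|0>": {
        "train_set": 32, "val_set": 32, "test_set": 64,
        "batch_size": 32, "memory_size": 20_000,
        "learning_rate": 0.01, "replace_period": 200, "gamma": 0.9,
        "hidden_layers": (32, 32), "activation": "relu",
        "eps_increment": 0.001, "eps_max": 0.95,
        "fidelity_threshold": 0.999, "max_episodes": 33,
        "total_time": 2 * math.pi, "dt": math.pi / 10, "max_steps": 20,
    },
    "Bell": {
        "train_set": 256, "val_set": 256, "test_set": 6_400,
        "batch_size": 32, "memory_size": 40_000,
        "learning_rate": 0.0001, "replace_period": 200, "gamma": 0.9,
        "hidden_layers": (256, 256, 128), "activation": "relu",
        "eps_increment": 0.0001, "eps_max": 0.95,
        "fidelity_threshold": 0.999, "max_episodes": 731,
        "total_time": 20 * math.pi, "dt": math.pi / 2, "max_steps": 40,
    },
}


def epsilon(step: int, eps_increment: float, eps_max: float, training: bool = True) -> float:
    """Probability of taking the greedy action after `step` training steps.

    Assumes a linear ramp from 0 up to eps_max during training; during
    validation and testing the agent acts fully greedily (ϵ = 1 in Table 1).
    """
    if not training:
        return 1.0
    return min(eps_increment * step, eps_max)
```

For example, `epsilon(500, **{"eps_increment": 0.001, "eps_max": 0.95})` gives 0.5 for the |0〉 column, while the smaller increment used for the Bell state stretches the ramp over ten times as many steps, consistent with its larger training set and episode budget.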