Benchmarking Machine Learning Algorithms for Adaptive Quantum Phase Estimation with Noisy Intermediate-Scale Quantum Sensors

Quantum phase estimation is a paradigmatic problem in quantum sensing andmetrology. Here we show that adaptive methods based on classical machinelearning algorithms can be used to enhance the precision of quantum phase estimation when noisy non-entangled qubits are used as sensors. We employ the Differential Evolution (DE) and Particle Swarm Optimization (PSO) algorithms to this task and we identify the optimal feedback policies which minimize the Holevo variance. We benchmark these schemes with respect to scenarios that include Gaussian and Random Telegraph fluctuations as well as reduced Ramsey-fringe visibility due to decoherence. We discuss their robustness against noise in connection with real experimental setups such as Mach-Zehnder interferometry with optical photons and Ramsey interferometry in trapped ions,superconducting qubits and nitrogen-vacancy (NV) centers in diamond.


Introduction
Precision measurements play a fundamental role in physics, as they constitute a key ingredient of many state-of-the-art applications and experiments testing the limits of scientific theories. But the accuracy to which such measurements can be performed is itself governed by the laws of physics -and, ultimately, by those of quantum mechanics [1,2,3,4].
A generic measurement protocol for estimating the value of an unknown parameter consists of preparing a probe in a desired initial state, allowing it to interact with the physical system whose state depends on the parameter and, finally, obtaining a measurement result that encapsulates the information about it. This process, however, is often affected by systematic and statistical errors. While the source of the former may stem from imperfect calibration of the measurement instruments, the origin of the latter can either be accidental, due to insufficient control of the measurement chain, or fundamental, deriving from the nature of the physical measurement [5,6]. Fortunately, statistical errors, regardless of their origin, can be minimized by repeating the process and averaging the resulting outcomes, as a consequence of the central limit theorem [7]. This theorem states that, given a large number N of independent measurement results, their average will converge to a Gaussian distribution with standard deviation scaling as 1/ √ N . In metrology, this arXiv:2108.06978v2 [quant-ph] 17 Aug 2021 is referred to as the Standard Quantum Limit (SQL). For many practical applications that involve noisy and error-prone systems, it is essential to devise protocols that yield precision levels close to the SQL. However, it is known that this limit is not fundamental -the ultimate limit set by the laws of quantum physics is the Heisenberg Limit (HL) [8,9,10,11].
Recently, several systems have achieved sufficient levels of technological maturity to allow the experimental exploration of these limits. Due to the non-idealities present in these systems, they are typically referred to as NISQ (noisy intermediatescale quantum). Largely, the motivation for this development has been rooted in quantum information processing, where efficient and precise measurements and state manipulation is required [12], and indeed significant progress towards the implementation of quantum gates and algorithms has been made in optical setups [13,14], NV centers in diamond [15,16], trapped ions [17,18] and superconducting circuits [19,20]. Since it employs similar control and measurement techniques as quantum computing, the exploration of quantum-enhancing techniques has grown as a separate field, usually referred to as quantum metrology [21,22], with applications such as ultrasensitive force detection [23], adaptive environment sensing [24], near-surface electric field measurements [25], sensing of weak signals [26] and even detection of gravitational waves [27].
The paradigmatic protocol in quantum metrology is quantum phase estimation. The quantum phase is a parameter that cannot be directly measured and yet, it contains information about other quantities of interest, such as electric or magnetic fields. The traditional quantum metrology approach to phase estimation has been through the use of highly entangled states such as NOON states [1,3,28], as well as other specially optimized states [29,30,31]. However, highly entangled states tend to be very sensitive to noise: for NOON states even the loss of a single qubit (photon) results in a separable state. Thus, the implementation of this method is challenging with respect to real-life applications. Fortunately, it has been later realized that precise quantum phase estimation does not necessarily require entanglement. In optics, it was demonstrated that adaptive homodyne phase measurements yield uncertainties close to the quantum uncertainty of coherent states [32,33]. It was also shown how to adaptively estimate the angle of a half waveplate that characterizes the linear polarization of photons [34,35] as well as how to perform phase-shift estimation with polarization-encoded qubits [36]. Furthermore, the inverse quantum Fourier transform, which is the last step in Shor's factorization algorithm and which typically requires gate entangling, can be carried out using local measurements and feedback [37]. This approach has been used to break the SQL in experiments with photons [38], superconducting qubits [39] and NV centers in diamond [40].
The incorporation of machine learning techniques in estimation protocols is a natural step forward. Seminal theoretical work on employing reinforcement learning algorithms has demonstrated the potential of these methods for reaching sensitivities below the SQL when used in conjunction with entanglement [41,42,43,44,45,46,47]. Recently, some of these methods have been tested experimentally in an optics setup with their estimation precision being limited by the SQL [48]. This only upholds the potential of applying machine learning for the optimization of parameter estimation under resources and noise level constraints, but the foundations of these methods are still poorly understood. In mathematical statistics, the limits of machine learning algorithms are an active area of investigation -for example deriving bounds on the number of samples needed to reach a certain accuracy [49]. However, the issue at stake is that typically all the mathematical results, including the SQL, are derived from the so-called central limit N 1 and for ideal noiseless systems. Yet the relevant regime for the present status of quantum technologies in the NISQ era is that of moderate N and significant noise. In this context no general unbiased estimator has been found yet [28].
The objective of this work is to employ two machine learning algorithms, the DE and the PSO algorithm, in the design of adaptive estimation schemes capable of driving the measurement precision close to the the SQL. We numerically show that the SQL is still a valid bound even in the regime of not too large N . We also demonstrate that machine learning algorithms can be used even in the presence of various experimental noises, consequently providing robust policies with better performance than non-adaptive protocols. For practical devices such as magnetometers based on single qubits, these methods can be directly applied. More precisely, we observe that in order to increase the precision of these devices, one straightforward option is to increase the sensitivity to the parameter to be estimated by, for example, using higher-energy states or by increasing the response using a different biasing point. However, both of these strategies result in an increased coupling to noise sources and, therefore, a compromise needs to be reached in order to achieve the maximum performance. This will be further detailed in the paper when analyzing the experimental realizations.
Both the DE and the PSO algorithm can also be employed as subroutines in order to enhance other quantum algorithms. For example, algorithms that use variable sensing times, or multipass techniques, are in principle able to breaks the SQL and reach the HL. Instead of averaging at every sensing time, which is the typical approach used one can further increase the sensitivity by using our technique. Beyond quantum metrology, machine learning protocols could become useful in other quantum-information paradigmatic problems that involve phase estimation, such as factorization, sampling, and computation of molecular spectra [50]. In particular, our calculations are relevant for NISQ quantum technologies, where the number of qubits is limited and subject to errors and noise. Overall, by benchmarking these two important protocols for the task of quantum phase estimation, we hope that machine learning algorithms will be more prominently employed in applications such as optical interferometry and magnetometry, where the increase of precision is essential and where the aim is set on reaching the HL.
The paper is organized in the following sections. Section II describes the general concept of adaptive quantum phase estimation in Ramsey interferometry, discussing the updating procedure as well as the relevant sources of noise. Section III presents the two machine learning algorithms, the DE and PSO algorithm. Section IV presents our main results, where we show how the two algorithms allow us to approach the SQL. In this section we also provide an analysis on the effects of Gaussian noise, Random Telegraph noise (RTN) and quantum decoherence on the performance of the algorithms. Section V discusses the implementation of our protocol on several experimental platforms, namely Mach-Zehnder optical interferometers, superconducting qubits, trapped ions and defects in diamond. Finally, we conclude our results in Section VI.

Adaptive Quantum Phase Estimation Scheme
To perform the estimation of an unknown phase φ, we use a generic adaptive scheme that is able to adjust the value of a control phase θ to match the value of φ, based on the results of previous measurements in a Ramsey interferometric sequence. Figure 1: Adaptive quantum phase estimation scheme. A qubit m is injected into the quantum circuit either in the quantum state |0 or |1 and its measured outcome ζ m ∈ {0, 1} is used together with some policy phase instruction x m ∈ [0, 2π[ to prepare the circuit for the next round of measurements. The quantum circuit consists of two Hadamard gates H, a phase shift gate U φ,θ , a detector D, and a processing unit with phase update instructions for the controllable phase shifter.
The schematic representation of the adaptive quantum phase estimation scheme displayed in Fig. 1 consists of a quantum circuit made of a Hadamard gate H, a phase shift gate U φ,θ , another Hadamard gate H and a detector augmented with a processing unit to calculate the value of θ for the next round of measurements. Using standard notations in quantum information, the Hadamard and phase shift gates can be respectively defined as Under this adaptive quantum estimation scheme, an ensemble of N qubits is injected sequentially into the circuit in randomly chosen quantum states, either in state |0 or state |1 , and their outcome is measured to prepare the value of the controllable phase shifter for the next incoming qubit. The input state of the quantum circuit takes the form |0 1 ⊗ |1 2 ⊗ |1 3 ⊗ .... ⊗ |0 N , which is manifestly separable. After the last N th qubit is injected and its outcome measured, the final phase value of θ is considered to be the estimated value of φ. The initial states of the qubits can be represented in Dirac notation as The idea is to have at the end of the process an estimated controllable phase value θ as close as possible to the value of the unknown phase φ. The state of each qubit after the second Hadamard gate is where the upper sign corresponds to a qubit whose initial state was |0 and the lower sign to one whose initial state was |1 . This yields measurement results ζ ∈ {0, 1} with probabilities and If the phase difference φ − θ is zero, then θ approximates very well the unknown phase φ and a qubit prepared in state |0 will also exit in state |0 . Similarly, the same logic can be applied for a qubit injected in the quantum state |1 .
In order to estimate the phase φ at every step m of the algorithm the controllable phase θ m has to be updated based on the previous result ζ m−1 of the measurement. Here we adopt a standard updating rule which has already been employed successfully in similar setups [44,45,46], and where the initial value of θ can be fixed to θ 0 = 0 without any loss of generality. Note that the term x m in Eq. (4) represents the update policy which will be determined by the machine learning algorithms. This update rule can be viewed as a decision tree where for each qubit m, the controllable phase shifter must update its value θ m either by adding or by subtracting x m depending on the measured outcome state of the previous qubit ζ m ∈ {0, 1}. Consequently, for N qubits the number of possible θ m values increases exponentially as 2 (N +1) − 1. This is a Markovian feedback, since the new controllable phase θ m depends only on the latest measured outcome state ζ m .
To evaluate the performance of a policy, we use the Holevo variance [31,30,29], defined as where e i(φ−θ) represents the average value of e i(φ−θ) for different phase values φ and their respective estimates θ considered in the learning process of the machine learning algorithms. Here we make the notation abbreviation θ = θ N , since the Holevo variance of a policy can only be calculated after the last qubit N is injected into the circuit and its outcome measured. The quantity S = e i(φ−θ) ∈ [0, 1] is called the sharpness of the phase distribution, where the value S = 1 corresponds to a perfect estimation of φ. For periodically bounded variables such as the phase, the Holevo variance is a direct measure of the standard deviation by (∆φ) 2 = V H . Therefore we have V H ∼ 1/N for the SQL. It is also important for the model of the designed adaptive quantum phase estimation scheme to include the possibility of errors and imperfections that occur in a real experimental situation. This provides an important test to the robustness of the algorithms to relatively general sources of noise which can be encountered on most experimental platforms. The first source of noises to be considered are the noises in the application of the controlled unitary phase shift transformation U φ−θ , namely the Gaussian and the Random Telegraph noise. Note that the Random Telegraph noise is particularly relevant for experiments involving solid-state qubits.
In the scenario of Gaussian noise, the noise follows a normal distribution parametrized by the standard deviation σ. Letting θ (GSN) represent the actual value of the controllable phase shifter subjected to the noise, the Gaussian noise distribution can be defined as: In the scenario of Random Telegraph noise, the noise follows a discrete distribution where at each time step the the controllable phase shifter value can be randomly offset by a fixed valued λ with a probability η. Letting θ (RTN) represent the value of the controllable phase shifter subject to this source of noise, the Random Telegraph noise distribution can be described as: The other important type of noise to be considered affects the qubit itself. This modifies the unitary evolution generated by a Hermitian Hamiltonian H to a nonunitary evolution given by the Lindblad master equatioṅ where Γ 1 is the relaxation rate, Γ ϕ is the pure dephasing rate and the dissipation superoperator D acting on the density matrix ρ is defined by It is also useful to introduce the standard notations T 1 and T 2 for the relaxation and decoherence time, T 1 = 1/Γ 1 and T 2 = 1/ (Γ 1 /2 + Γ ϕ ).

Machine Learning Algorithms
The problem of quantum phase estimation relies on a sequential and cumulative set of measurements to drive the estimation process, thus making it an ideal problem for reinforcement learning algorithms. In this work, we considered the Differential Evolution (DE) [51,52] and the Particle Swarm Optimization (PSO) [53,54,55], among other reinforcement learning algorithms, as they are the most commonly employed for similar tasks in literature [41,42,43,44,45,46,47]. These algorithms employ a direct search method to the exploration of the search space generated by all the possible policy configurations. Direct search methods use a greedy criterion to drive their exploration. Such methods guarantee fairly fast convergence times, even though such fast convergence times often come at the expense of becoming trapped in a local minimum. This comes as result of the greedy criterion promoting decisions that lead to shorter and more immediate term rewards usually in detriment of fully exploring the total search space. To avoid this scenario, it is important to perform a thorough study on the controllable parameters of each of the mentioned algorithms before applying them to the quantum phase estimation task. To evaluate the performance of an algorithm there are two fundamental criteria: its ability to converge within the imposed number of iterations to a solution and its ability to converge to a valid solution.

Differential Evolution
The implementation of the DE algorithm to the current problem starts with a set of P populations, each representing a candidate solution for the adaptive scheme update policy. Each of these populations is a vector of size N , with each entry representing a new phase instruction to prepare the controllable phase shifter for each of the qubits being injected into the system. The DE algorithm at each iteration will employ its direct search method to P N -dimensional vectors x G i,j , where i ∈ {1, 2, ..., P } and j ∈ {1, 2, ..., N } represent each entry of the candidate solutions vectors and G represents the generation of the population vectors.
Each of these vectors is initialized with random values for each entry in the interval 2π]. Afterwards, at each iteration, the DE algorithm generates possible new candidate solution vectors for the next generation by adding the weighted difference between four population vectors to a fifth vector. This process is referred to as mutation: Here F represents a constant value in the interval [0, 1] which controls the amplification of the difference between the considered populations. Hence, F will be referred to as the amplification parameter. Note as well that all indexes {r 1 , ..., r 5 } are randomly chosen integer values always different between themselves. At this point, the entries of the newly mutated vectorsũ G+1 i,j are randomly mixed with the originally corresponding vectors to increase their diversity. This process is referred to as crossover: This process is controlled by the crossover parameter C which can take any value in the interval [0, 1]. Hence, a crossover only occurs if the random value R 1 , which is generated for each population member at each iteration, is below or equal to the chosen crossover parameter. The value of R 2 is an integer randomly chosen for each population at each iteration of the evaluation process to ensure that at least one entry from the newly mutated vectorsũ G+1 i,j is passed to the trial vector for the next generationx G+1 i,j . Finally, the new trial vectors are compared against the population vectors of the previous generation to see which perform best against the cost function of the problem and, as a result, become a member of the next generation of populations. This process is referred to as selection: Here f (·) represents the cost function associated to the quantum phase estimation which is defined by the Holevo variance in Eq. (5). Therefore, if the new trial vectors x G+1 i,j minimize this cost function when compared to the previous generation vectors x G i,j , then they become part of the next generation of populations. Otherwise, the previous generation populations survive for the next generation. This entire process illustrates the adapted DE algorithm implemented in this work and is schematically represented in Fig. 2.

Particle Swarm Optimization
The implementation of the PSO algorithm starts with a set of P particles, each representing an individual candidate solution to the update policy of the adaptive scheme. Each particle can move in the N -dimensional search space associated with the N different phase instructions of the controllable phase shifter for each of the input qubits. Therefore, each particle will be represented by a position vector x G i,j and velocity vector v G i,j , where i ∈ {1, 2, ..., P } and j ∈ {1, 2, ..., N } represent each entry and G the generation of the vectors. Note that the position vector x G i,j corresponds to a candidate solution vector, while the velocity vector v G i,j represents the change in direction of the corresponding position vector in the search space.
All entries of these vectors are initialized with random values in the interval [0, 2π]. At each iteration, each particle evaluates its current position according to the cost function of the search problem, defined by the Holevo variance in Eq. (5), and compares its value with the positions previously visited by itself and with the positions previously visited by the entire collective of particles. If the current position is better than its own previously visited positions, the particle stores it in a variable pbest i , where i ∈ [1, ..., P ] is the identifier of that particle. If, in addition, the current position is better than all of the other previously visited positions by the entire collective ensemble, the same position is stored in a variable gbest shared among all other particles. This process is illustrated in Fig. 3.
Both of these variables will determine the entire exploration of the search space by the P particle candidate solutions. After each iteration, each particle will use this collective knowledge to adjust its displacement for the next turn according to Here the parameter α controls the desirability of each particle to move towards its best found position, while the parameter β controls the desirability of each particle to move towards the best found solution by the entire collective. Both R a and R b are uniformly distributed random values in the interval [0, 1]. In addition, the parameter w works as a damping weight controlling the change of direction imposed by the new velocity vector at the current position of each particle.
To steer the direction of each particle to a converging candidate policy solution and to avoid overstepping the found minima, an additional parameter v max is imposed to the algorithm which determines the maximum value that each entry in the velocity vector may take. As the algorithm advances in its search for the optimal policy solution, this parameter will decay with the number of iterations, consequently reducing the step size of each particle and forcing them to converge to a solution. This final adjustment completes the description of the adapted PSO algorithm implemented in this work. Initially, all the particles representing the different candidate solution vectors are initialized at random positions with random velocities. At each iteration the particles explore the search space of the problem, eventually converging around the collectively found global optimum. The particles are able to share knowledge regarding the best-found position with other particles, which is generically represented by the red fuzzy lines. In our implementation the topology of inter-particle communication is such that all particles are able to share information with every other particle.

Parameters Analysis
It is important to understand the behaviour and performance of these two algorithms under the different possible configurations of their controllable parameters. To verify the convergence of the solutions, it is important to remember that at each iteration of both algorithms there are P different candidate solutions, each one represented by a given population of N phase value policies. As the algorithm iterates, the candidate solutions should move closer to each other until converging to a final solution. Thus, one way of inferring this convergence value is by calculating the deviation from the average value of each population for each entry of their candidate solution and then averaging these values. To do so, the convergence value L is defined as wherex j corresponds to the average value of entry j over all the candidate solutions and x i,j corresponds to the entry j of the candidate solution vector i. Therefore, lower values of L occur when all the candidate solutions are relatively close to each other and the algorithm has converged to a solution. On the other hand, larger values of L indicate that the algorithm was not able to converge to a solution at the given iteration.
The algorithms should also converge to a valid solution. It is not enough that the algorithms converge to a solution, if it is not the correct one. Letting K represent the total number of different phase values of φ considered in the learning task imposed to the machine learning algorithms, the performance of a policy can be evaluated by the Holevo variance in Eq.(5). This equation, however, is computationally expensive in its current form. So instead, we can approximate it numerically [46,29,42] to reduce computational time. A more efficient evaluation of the Holevo variance can be described as where values for φ (k) −θ (k) N close to zero signify lower values of imprecision (sharpness S ≈ 1) and therefore better performance.
Additionally, the performance V H of each candidate policy vector is evaluated M = 5 separate times and the results averaged in order to smoothen small fluctuations. Repeating the simulation multiple times allows for a more accurate representation of the performance of each policy and thus more consistent results. The number of training instances K can be any arbitrarily large enough number given that it does not hinder the computational time. A large number of training instances K also ensures a faithful representation of φ in the interval [0, 2π[, since they are sampled uniformly and randomly from that same interval. A reasonable choice that satisfies these criteria is K = 10N 2 , a number sufficiently large to guarantee the convergence of the algorithms [46]. Overall, the time complexity of the algorithms At this point it is possible to completely evaluate the performance of each algorithm according to the different possible configurations of each of their controllable parameters and choose those that achieve better results. While Eq. (10) provides a measurement for the convergence of the algorithms, Eq. (11) shows the precision to which they are able to estimate the value of the unknown phase φ.
A thorough study on the performance of both algorithms under the different possible controllable parameter configurations can be found in appendix A.2 and appendix A.3. The optimal parameter configuration obtained for each algorithm is summarized in Table 1. Along with a fixed configuration for each algorithm, it is also important to ensure that all the other variable parameters remain the same in order to draw comparable results. Thus, it is important that for the same number of input qubits N being injected into the system, the population number of candidate solution vectors P and the number of training instances K remain the same under all different scenarios. As previously mentioned, the number of training instances was set to K = 10N 2 , while the number of populations was defined as P = 20 + 2 · int(N/10) − 1, where int(·) represents the integer part of the division. Note that for an increasing number of qubits being injected into the system, the population size of candidate solution vectors P and the number of training instances K must also increase to accommodate the increasing complexity of the problem search space.
Ideally, both algorithms would be allowed to run until all the different candidate solution vectors would have converged to a successful policy vector. However, due to time constraints the number of iterations for which both algorithms were allowed to run for each number N of input qubits was set to G = 100, regardless of having reached convergence or not. Thus, both algorithms would stop either when they had converged to a solution, or when they reached iteration G = 100, and the final policy vector would be the average of all the different candidate solution vectors at that point.

Results
In order to provide a benchmark for the two machine learning algorithms discussed above, we first introduce a non-adaptive protocol that can also be run in the presence of noise. This protocol has the advantage of simplicity and we find that for moderate noise values it yields results that are better or comparable with machine learning algorithms. On the other hand, for increased noise values the machine learning protocols show better results.
We start by discussing the ideal configuration where any source of noise or quantum decoherence is neglected, then we consider the Gaussian and Random Telegraph noise configurations and, finally, the visibility loss due to decoherence. These different sources of imperfection were applied to the quantum phase estimation configuration independently. The number of varying qubits used under all the different scenarios was set in the interval N ∈ [5,25] and all the remaining parameters were left the same across all the different scenarios. For N > 20 we found that the algorithms were already taking more than five days to arrive at a successful policy vector, which ultimately made any larger value of N computationally impracticable under a reasonable amount of time.
The break in performance in non-ideal systems is clearer for larger values of noise, since increasing the value of noise in the estimation process consequently leads to an increased complexity of the search space for both algorithms. Thus, for larger values of N this added level of complexity becomes even more evident in the scattering of the obtained results. However, it is important to stress that when both algorithms had enough time to converge to a successful policy, they were able to perform with precision values close to the SQL, thus attesting their resilience to noise in the quantum phase estimation scheme.

A Non-Adaptive Protocol and the Standard Quantum Limit Benchmark
The SQL is defined strictly speaking in the limit of large N . Since we do not work in this asymptotic regime, it is important to devise a non-adaptive protocol that reproduces the SQL for N 1, yet it yields results also at N ≈ 5 − 25. This protocol can be outlined as follows. We consider the random phase that should be estimated φ and a fixed control phase θ. Based on these values, we calculate the probability P ± (0|φ, θ), see Eqs.
(2), (3) or Eq. (9) with ν = 1. Then, for each N we generate N uniformly random numbers in the interval [0, 1]. If the random number is less than P ± (0|φ, θ) we add it into the 0 bin, otherwise we add it to 1. Next, we count the number of elements in the 0 stack, N ± (0). Finally, we find an estimation for the phase φ est , as φ est = θ + arccos(±2N ± (0)/N ∓ 1), and we calculate the Holevo variance with the exponent in Eq. (11) The number of elements N + (0) and N + (1) = N − N + (0) follows the binomial distribution as shown in Fig. 4. This distribution is obtained from two constant phase estimation by 50 measurements, repeated 250000 times.
This procedure is repeated K = 10N 2 times for each phase φ uniformly distributed in the interval [0, 2π]. The resulting Holevo variance is represented by the blue squares in Fig. 5. We have verified numerically that the non-adaptive method reproduces asymptotically the SQL. For example, even at N = 100 the difference between simulated V H and 1/N is 10 % and reaches 4 % at N = 800. Also, as it is seen from the inset at Fig. 5, the slope for this non-adaptive protocol is equal to −1.0265, while for the ideal SQL this slope should be −1.
The same procedure is used to simulate the variance in the presence of different noises, as it is shown below.

Adaptive Protocols in the Ideal Noiseless Configuration
Before analysing the evolution of the performance obtained by the DE and PSO algorithms under a increasing number of qubits N , it is important to show that this increase in N does indeed lead to better estimation values. To do so, the two algorithms were first allowed to converge to a given value of the unknown phase φ for different values of N . This experiment was repeated 1000 times for each value of N and, at each time, the value of the estimated phase θ was recorded. Finally, these 1000 different results of θ for each value of N were fitted to a probability density function (PDF) and centered all around the same value θ = π to better compare the results. The results for each algorithm are displayed side by side in Fig. 6.
Considering the results obtained in Fig. 6 it is indeed possible to see that by increasing the number of qubits N at the input level the estimation process leads to values of θ converging further towards the given value of the unknown phase φ. It is also possible to see that as N goes higher, this increase in precision starts to be less evident as the algorithms start running into convergence issues. Given that the complexity of the search space of each algorithm scales polynomial in N and that due to time constraints the algorithms were not allowed to run for more than G = 100, a decrease in performance under this time limitation for large values of N is expected.
Having arrived at this conclusion, the final results obtained under the ideal scenario, where there is no source of noise or quantum decoherence, are shown in Fig.  5. Observing the results it is possible to see that both algorithms were able to follow the SQL for the ideal-case scenario. It is also possible to reinforce the previous found conclusion that an increase in N leads to better results. In fact, for N ∈ [5,15] the scaling of the Holevo variance is well approximated by a power law V H ∼ N −α , where α DE = 0.75 for the DE algorithm and α PSO = 0.64 for the PSO algorithm, while the the corresponding value for the SQL is α SQL = 1. This also shows that the DE performs slightly better than PSO at reaching low variances, which is consistent with results obtained in other contexts, such as estimating the bias in the quantum walk algorithm [43]. This scaling in precision close to the SQL for N ∈ [5,15] is consistent with what has been numerically observed in other algorithms [31].
The fact that the machine learning algorithms do not reach the SQL is also consistent with the known results that the adaptive measurements cannot saturate the Cramér-Rao inequality even for optimal measurements [56,57]. It is also noticeable that for larger values of N the performance starts to break due to the limiting number of iterations G = 100 imposed for the convergence of the algorithms. This  is not a malfunction of the algorithms, but a direct consequence of the restricted time available. As the complexity of the search space of the algorithms increases, so does the number of generations required to converge to a successful policy.

Configurations with Noise
Considering the results obtained under the ideal configuration, it is important to study the resilience of the algorithms to different sorts of imperfections that can be found in an experimental quantum phase estimation scheme. First, the performance of the algorithms was evaluated in the presence of Gaussian noise, followed by Random Telegraph noise and finally in the presence of quantum decoherence.

Gaussian Noise
In this scenario the algorithms were tested for increasing amounts of Gaussian noise  It is possible to see that as the noise fluctuations increase the precision of both adaptive and non-adaptive algorithms starts to diminish. This is, nevertheless, expected for any estimation process being conducted under increasing values of noise. The break in performance is also perceptible for larger values of N . However, we see from Fig. 7 that the adaptive algorithms are more sensitive to increasing values of Gaussian noise and that for σ = 0.8 the policies obtained from machine learning become clearly superior to the non-adaptive protocol.

Random Telegraph Noise
Considering now the scenario where the estimation scheme was subject to a Random Telegraph noise following Eq. (7), the algorithms were tested against increasing values of λ = {0.2, 0.4, 0.8} while keeping the probability of switching to the erroneous phase value fixed at η = 0.4. The results obtained by the algorithms under these configurations are displayed in Fig. 8. Similarly to the results obtained under the Gaussian noise, the results obtained in Fig. 8 show that both adaptive and non-adaptive algorithms partly follow the SQL curve even in the presence of Random Telegraph noise. The poorer performance for larger values of noise, as well as the break in performance for larger values of N , is also evident under this scenario for the same reason as before. We can also see that for larger values of λ the machine learning adaptive algorithms are more robust than the non-adaptive algorithm.

Quantum Decoherence
In this last scenario, we study the impact of quantum decoherence in the performance of both algorithms. This is a key source of noise in all non-optical qubits (e.g. trapped ions, superconducting qubits, NV centers). The algorithms were tested against decreasing values of visibility ν = {0.9, 0.8, 0.6} according to Eq. (9). Note that unlike the ideal scenario, the reduced visibility also impacts the SQL, see Eq.
(18) in Appendix A. Also note that the value ν = e −0.5 ≈ 0.6 appears in certain tasks of non-adaptive parameter estimation as the visibility corresponding to an optimal phase accumulation time τ of half the decoherence time [21]. The results obtained under this configuration are shown in Fig. 9. Considering the results obtained in Fig. 9, it is evident that reduced values of visibility have a significant impact on the performance of the algorithms. This is an expected behaviour, since it is known that quantum enhancement for systems operated at around the decoherence rate is asymptotically limited to only a constant factor improvement over the SQL [58]. This can be confirmed by an analysis of the Fisher information: The Fisher information quantifies the knowledge gained at each measurements. It can also be extended from projective to positive operator-valued measurements, see [59], and it is used to define the SQL (see Appendix A.1). Indeed, from Eq. (12) it can be observed that ν reduces the information extracted after each measurement from its maximum value F φ,θm = 1 at ν = 1.
Similarly here we observe that relatively large values of decoherence (visibility around 0.6) result in the suppression of precision obtained by the non-adaptive protocol, while the adaptive DE and PSO are less sensitive.

Experimental Implementations
Summing up the theoretical results obtained so far, we can see that even for noisy systems it is possible to reduce the Holevo variance to values in a band of typically [−1.2, −1.7] (standard deviations between 0.5 and 0.4) with a moderate numbers of N ∈ [10,25] qubits. This protocol can be implemented on several experimental platforms, using either optical or microwave frequencies. We describe here the main experimental platforms and we show how the problem of optimizing the precision in the presence of noise can be addressed using the machine learning framework.
The most straightforward implementation is optical Mach-Zehnder interferometry. In this case, the operator U θ−φ can be realized by placing a phase shifter φ in one branch of the interferometer and a variable-phase shifter θ in the other branch. The latter can be realized physically as a half-wave plate placed on a rotation stage controlled by a computer [60]. The states |0 and |1 correspond to a single photon in one branch or the other of the interferometer (dual-rail encoding). Our results can be compared also with those obtained from running the more powerful Kitaev algorithm, which for the same number of resources (from 10 to 25) results in standard deviations ranging from 0.35 to 0.16 [60], only marginally better than the data reported here. Also, our results are consistent with the theoretical limits reported in [61] for the case of non-entangled initial states (V H = 0.5609, ln V H = −0.25).
We obtain approximately a factor of 2 improvement in the standard deviation. For implementations employing optical interferometry, the visibility is typically very close to 1, and the main noise sources in the feedback loop concern changes in the refractive index of the beam-splitters and variable phase shifter, as well as variations in the optical paths typically caused by temperature.
More recently, an experiment with optical photons has tested the PSO algorithm with sequential non-entangled photons [48]. The single photons were obtained non-deterministically, by generating entangled pairs through spontaneus parametric down-conversion in a 2 mm long beta-barium borate (BBO) crystal and heralding over detection events in another detector. The unknown and the control phases in the two arms of the Mach-Zehnder interferometer were realized with liquid crystal devices, where the relative phase between the horizontal and vertical polarization can be changed depending on the applied electric field. The results with the PSO algorithm obtained agree with our analysis: PSO is quite efficient at reaching values approaching the SQL, especially when the number of resources (number of photons N ) is limited. Similarly to our findings, for N exceeding approximately 15 photons the Holevo variance tends to saturate. The robustness with respect to Gaussian phase noise and dephasing noise was also demonstrated by artificially adding errors to the feedback phase.
In the case of qubits based on discrete levels (solid-state and ions/atoms), Mach-Zehnder interferometry corresponds to Ramsey interference [62,63]. To understand how the phase information is embedded in this system, consider the following generic qubit Hamiltonian driven by a microwave field [64], In a frame defined by the unitary exp(iωtσ z ) (a clockwise rotation around the z-axis), using exp(iωtσ z )σ x exp(−iωtσ z ) = σ x cos ωt − σ y sin ωt and by further applying the rotating-wave approximation we get the effective Hamiltonian A non-zero Rabi frequency Ω = 0 can be thus used to create the two Hadamard gate, while the time in-between is spend for the actual sensing. Indeed, if Ω = 0 for a sensing time τ , the resulting unitary becomes which is exactly Eq. (1) up to an overall irrelevant phase and the identification φ = ω 01 τ , θ = ωτ . Consider now a concrete problem, that of evaluating the magnetic field using a superconducting qubit, a trapped ion, or a nitrogen-vacancy (NV) center in diamond. In these cases, for typical experiments using Ramsey interferometry, the probability is given by Eq. (9), where the dependence of the visibility ν on the sensing time τ can be sometimes different from a simple exponential decay ν(τ ) = exp[−τ /T 2 ], alid for the case of Gaussian noise (see the Lindblad equation used in Section II). For example, another dependence is ν(τ ) = exp[−(τ /T 2 ) 2 ] if the noise experienced by the qubit is 1/f, see e.g. [64,63,40]. Since τ is bounded by T 2 , in these setups one might attempt to increase the precision by increasing the sensitivity of ω 01 to the magnetic field. However, this means increasing the coupling to the magnetic field, which at the same time this will increase the exposure of the qubit to noise. Thus, a tradeoff must be reached between these two competing effects. Our results demonstrate that the increase in noise can be mitigated successfully by the use of machine learning strategies.
In the case of a superconducting qubit in the symmetric transmon design as used in recent magnetometry experiments [63,65,66], the information about magnetic field is embedded in the energy level separation as where B is the magnetic field piercing the SQUID area S of the transmon and modulating the total Josephson energy E JΣ . In order to evaluate B, we can keep τ fixed and adjust the frequency ω (generated by an external signal generator) at every step. In the case of superconducting qubits, with relatively standard values T 2 = 10 µs and τ = 1 µs we obtain ν = 0.9 (one of the values used in Sec. IV) if the noise is Gaussian and ν = 0.99 if it is 1/f noise.
In order to increase the precision of determining the magnetic field, we can use a higher excited state [65,67] -for example, the second excited state |2 , and considering the harmonic approximation for the transmon, the relevant accumulated phase will be ≈ 2ω 01 τ ; or we can bias the transmon to a magnetic field value where the slope dω 01 /dB is larger. Both situations result in an increase in the noise. In the first case, this is due to the fact that higher energy levels have higher couplings to the electromagnetic environment. This causes an increase in T 2 , cause due to both an increase in the T 1 time and an increase in the pure dephasing time. For example, if the second excited state is used, the decay rate of the 1 − 2 transition is twice that of the 0 − 1 transition, which for T 1 -limited qubits results in a significant reduction of the interferometric visibility. In the second situation, due to biasing far from the sweet point. In the latter case, let us restrict ourselves for simplicity to the interval BS ∈ [−Φ 0 /2, Φ 0 /2]. We then have This slope is infinite for BS = ±Φ 0 /2, apparently allowing us to measure B with an infinite precision. However, the displacement of the bias point from the sweet point is accompanied by a significant increase in noise, since the qubit is no longer protected against linear-oder fluctuations. This results again in a visibility ν below unity. The noises in the control phase θ m are typically not caused by electronics in the feedback look, but they result from uncontrollable frequency shifts that are poorly understood and controlled. Experimentally, these shifts are of the order of a few MHz. If τ is of the order of microseconds, then the resulting values of λ and σ are well below those considered in this work. Therefore, this type of noise will not affect the performance of our protocols when run on a superconducting qubit.
In the case of trapped ions, sensitive magnetometry using the well-known 171 Yb + ion has been demonstrated [68]. This uses four hyperfine states, |F = 0, m F = 0 , |F = 1, m F = 1 , |F = 1, m F = −1 , and |F = 1, m F = 0 belonging to the 2 S 1/2 manifold. The latter three states are degenerate and they are separated by the hyperfine splitting ω hf /(2π) = 12.642 GHz from the first state. The degeneracy of these three states can be lifted by the application of a magnetic field. In the first order in magnetic field, the state |F = 1, m F = 0 remains unmodified but |F = 1, m F = ±1 acquires frequency shifts of ±(g e µ B /2h)B, where g e ≈ 2 is the g-factor of the electron and µ B is the Bohr magnetron. Thus, for magnetic field detection one could in principle use the state |F = 0, m F = 0 and either of the magnetic-sensitive states |F = 1, m F = ±1 and drive resonant Ramsey π/2 microwave pulses at around 12 GHz with τ time separation. Then the information about magnetic field is obtained from the phase φ = (ω hf ± g e µ B B/2h)τ . These states would be exposed not only to the magnetic field that we would like to sense, but also to magnetic field noise, making our results for the noisy case relevant. Further improvements may be achieved by the use of a continuous dynamical decoupling technique, where one could identify the |F = 1, m F = 0 ≡ |0 and the dark state 1 √ a T 2 time exceeding one second [69], three orders of magnitude more than the bare atomic states which is a few miliseconds.
Similar ideas can be applied to NV centers in diamond. These defects have very long decay times, of the order of tens of miliseconds, and total decoherence times are of the order of microseconds and can be extended to hundreds of microseconds by the use of dynamical decoupling pulses [70]. Single NV centers have a triplet S = 1 ground-state structure, with the states m S = 0 and m S = ±1 separated by the so-called zero-field splitting D = 2.87 GHz. By applying a magnetic field along the crystal axis, the levels m S = ±1 can be further split by the Zeeman effect [71]; the resulting energy level difference between these levels is 2g e µ B B/h = 2γ e B where µ B is the Bohr magnetron and g e is the electronic g-factor, g e ≈ 2. The gyromagnetic ratio γ e is defined as γ e = g e µ B /h. Because the triplet states can be selectively addressed with microwaves [71] we can immediately identify for example |0 = |m S = 0 and |1 = |m S = 1 as the basis corresponding to our formalism, and we obtain ω 01 = 2πD + γ e B. The encoding of the magnetic field in frequency and subsequently in the phase is linear, Φ = (2πD + γ e B)τ .
To summarize, each of these experimental systems involved relatively well characterized imperfections, and an evaluation of our protocols for any specific setup can be realized by inspecting the general results obtained for various types of noise in Section IV. We find that the precision can be optimized by using machine learning protocols to mitigate the noise resulting from the increase in sensitivity. We also note that this tradeoff is expected to occur also if machine learning algorithms are included as subroutines in protocols that use highly entangled states to increase the precision, since highly entangled states are also very susceptible to noise.

Conclusion
The objective of this work was to study the performance of machine learning algorithms, namely the DE and PSO algorithm, in a quantum phase estimation scheme and determine their ability to come close to the SQL without resorting to multiparticle entanglement. To this end, both algorithms were tested against different configuration scenarios and were able to follow the SQL curve up to a given value of input qubits N . Under the constraint of G = 100 it was possible to notice that the algorithms start to loose performance for larger values of N . This becomes even more relevant in the scenario with quantum decoherence. However, it is important to reiterate that this is not a deficiency of the algorithms, but a direct consequence of the time and computational resources available.
These limitations can be overcome in future works by optimizing the code and making it more time efficient in order to allow both algorithms a larger number of iterations before converging to a valid solution. An immediate improvement would be to fully vectorize the code which would make it significantly faster. Another direct improvement would be to parallelize the code so that it could leverage the power of stronger computers with higher number of cores. This would allow for the different independent simulation threads of both algorithms to be conducted in parallel, making the estimation process even more time efficient. One can also explore recurrent neural networks for the quantum phase estimation task, as they are particularly well suited for regression-based prediction models on time series events.
It would be interesting to see how architectures such as the Long Short-Term Memory (LSTM), the Gated-Recurrent Unit (GRU) and the Temporal Convolutional Networks (TCN) would compare against our reinforcement learning algorithms.
Overall, we have shown that machine learning algorithms can provide noise-robust policies and we benchmarked their performances for various types of noise in close connection to experimental realizations. Above a certain critical value of noise, we found that machine learning based protocols show better results than non-adaptive methods. This opens a door to future developments that may push even further the precision of quantum phase estimation with classical machine learning algorithms without resorting to preparation and measurement of highly entangled states.

Appendix
A.1 Theoretical background Probabilities Here we present analytical results related to the calculation of the Holevo variance. Given a qubit fed into the adaptive scheme with the adjustable phase set to θ m and the unknown phase φ at step m, the probability of obtaining the result ζ m is given by Eq. (9) where the + and − signs correspond to the initial qubit in the state |0 respectively |1 , and ν is the visibility, ν ∈ [0, 1].
To simulate numerically P ± (ζ m |φ, θ m ) we draw a random number from the interval [0,1]. If this number is smaller than P (ν) ± (0|φ, θ m ), then the result ζ m of the measurement is recorded as 0; if it is larger, it is recorded as 1. With these notations, we can understand easily the origin of the SQL. Indeed, the information about phase is contained in the probabilities Eq. (17). However, the evaluation of this probabilities is affected by measurement projection noise.
Standard Quantum Limit Suppose now that we try a simple non-adaptive strategy where we just repeat the Ramsey experiment for N times with an initial states |0 and the same control phase θ, selecting the results with ζ = 1. The precision δφ that we can achieve is given by δP + (1|φ, θ) = (1/2)ν sin(θ − φ)δφ. To achieve a small δφ it is advantageous to measure at phase parameters where the sine in this expression is maximal, that is, at around P + (1|φ, θ) = 1/2. At the same time, the measurement results have a binomial distribution, with [∆P + (1|φ, θ)] 2 = (1/N )P (ν) , which results in ∆P + (1|φ, θ) = 1/(2 √ N ) around the region of maximal sensitivity P + (1|φ, θ) = 1/2. In order to obtain a signal-to-noise ratio of at least 1, the uncertainty noise should be at most the same as the signal, δP + (1|φ, θ) ≥ ∆P + (1|φ, θ), which results in Thus, the precision is limited by the SQL. As expected, a small visibility ν = 1 results in an increased (∆φ) SQL , corresponding to a loss in precision.
We can make this argument more rigorous by using the Cramér-Rao bound, The maximum value of F φ,θ is obtained for φ − θ = π/2, 3π/2 and equals ν 2 . We then obtain the relation Eq. (18). An alternative derivation consists of calculating the fluctuations of the , reflecting the fact that the fluctuations of the σ z 's are uncorrelated.
Adaptive measurements In an adaptive measurement, the control phase is adjusted depending on the previous measurement values. Here, in order to simplify the notation, we will not write anymore the ± subscript for probabilities. Then, the probability of a sequence ζ m ≡ {ζ 1 , ζ 2 , ..., ζ m } of results given the phase adjustments θ m ≡ {θ 1 , θ 2 , ...., θ m } is obtained as From the Bayes' theorem, we have at the m'th measurement The prior P (φ) is taken uniformly distributed P (φ) = 1/(2π) and the marginalization P ( ζ m ) = 2π 0 dφP ( ζ m |φ, θ m )P (φ). To further understand how the Holevo variance changes at every step of the algorithm, we present a recursive calculation procedure based on the Fourier transform. We start by examining the sharpness appearing in the Holevo variance for a given depth m, From Eq. (22) and Eq. (17) we notice that the probability density is a harmonic function of φ, 2φ, 3φ... therefore we can use the Fourier transform where for clarity we use [k] as the index of the k'th Fourier component and with inverse given by The Fourier coefficients can be calculated recursively. To see this, we apply the Bayes rule Eq. (22) where the last fraction is a normalization factor. Thus, if we consider non-normalized probabilities p, we have the Bayes relation p(φ| ζ m , θ m ) = p(φ| ζ m−1 , θ m−1 )p( ζ m |φ, θ m ), from which we find the Fourier recurrence relation To recover the correct normalization we simply impose However, even with the use of these analytical results the evaluation of probabilities would require significant computational resources especially at large N . The reason is that, as it is clear from Eq. (23), the evaluation needs to be done for all 2 N vectors ζ N , requiring exponentially more resources as N increases.

A.2 Differential Evolution
To study the convergence of the DE algorithm, two parameter control tests were done varying the controllable parameters of interest, F and C, while keeping all the other values constant. The tests were conducted for N = 10 qubits, P = 20 populations, G = 50 generations and K = 1000 training instances. All sources of noise were neglected during the test, since they are not needed to study the overall performance of the different controllable parameters configurations and would only slow down the learning process.
The first parameter control test concerned the ability of the algorithm to converge to a solution, whether it was a valid one or not, with different possible parameter configurations. Referring to Eq. (10) to evaluate the convergence of the algorithm the results obtained are displayed in Table 2  Inspecting the results in Table 2 it is possible to find a pattern where only the configurations where the amplification constant F is strictly smaller than the crossover constant C lead to the convergence of the algorithm within the given number of generations. In other words: F < C guarantees the convergence of the DE algorithm. This newfound rule can be understood from a physical interpretation of the process. Larger values for the crossover parameter C encourage the different candidate solutions to experience more configurations, thus covering a larger area of the entire search space. This increased exploration of the search space, leads to a more exhaustive pursuit of the best policy solution. However, by increasing the amplification constant F , the step of this search process is also increased. This may lead the algorithm to overlook possibly desirable candidate solutions, which ultimately may not allow the algorithm to converge. Hence, smaller values of F for larger values of C ensure a more careful examination of the search space at each iteration of the algorithm.
The second parameter control test was concerned with the ability of the algorithm to converge to a valid a solution based on Eq. (5). Recalling that lower values of the Holevo variance correspond to candidate solutions with a higher degree of accuracy and keeping exactly the same configuration setup as before, the results obtained are shown in Table 3.   It is possible to see in Table 3 that the parameter configurations that previously lead the algorithm to converge to a solution also lead the algorithm to converge to a valid solution if and only if F > 0. Hence, the previously found rule can be made more specific: F < C ∀ F > 0 guarantees the convergence of the DE algorithm to a valid solution. Note, however, that the algorithm also obtained good performance results even with configurations that did not follow the previously found rule. These are misleading results and can be quickly disregarded, since the algorithm is told to stop regardless of having converged to a solution or not. Therefore, in these situations the resulting averaged solution among all the populations may haphazardly coincide with a valid one. These situations must, nevertheless, be regarded as fortuitously events and not be considered as possible valid solutions as the algorithm was not indeed able to converge to any solution.
Having arrived at the conclusion that F must be strictly smaller than C while still being bigger than zero for the DE algorithm to work as desired, it is important to find the exact parameter configuration that guarantees the most promising results.
A more thorough control test regarding the performance of the algorithm for each configuration following this rule was conducted three times and its results averaged and presented in Fig. 10. Note that this time, the step between the different parameter configurations was reduced to avoid overlooking possible optimal parameter configurations.
Looking at the results in Fig. 10 it is possible to notice that the best results are obtained among the configurations where C is only slightly bigger than F . It is also possible to notice that these results seem to be stronger for values of F in the interval [0.3, 0.7]. Ensuring that the step size at each iteration of the algorithm is not too big as to overstep a possible valid solution, it should not be too small either to avoid getting stuck in a local minimum. This explains why these values of F lead to better results. Among these parameter configurations, the ones where where the value of C is in the interval [0.4, 0.8] and is only slightly bigger than F lead to the overall best results. Hence, the optimal parameter configuration considered for the DE algorithm is: F = 0.7 and C = 0.8.

A.3 Particle Swarm Optimization
The PSO algorithm has four controllable parameters of interest. Two of these parameters, α and β, are mostly concerned with the ability of the algorithm to converge to a valid solution, while the other two, w and v max , regulate the speed at which the parameter will converge to a solution.
As before, the first step is to study the overall ability of the algorithm to converge to a solution with the different possible parameter configurations. Keeping in mind that only the parameters α and β have a direct impact on the convergence process, all the remaining parameters were set to a constant value and the results obtained are presented in Table 4.  Table 4: Convergence values L(α, β) for the PSO algorithm. Convergence was considered only for values of L ≤ 0.1256, which corresponds to a maximum dispersion of approximately 2% of the entire 2π search space for each entry of the different candidate solution vector.
Considering the results obtained, it is possible to see that as long as β > 0 the algorithm is able to converge to a solution, even if it may take more than G = 50 iterations. This can be intuitively understood by remembering that the β parameter defines the desirability of each individual to follow the best found solution of the entire group until that point. Thus, as long as this value is bigger than zero, each individual will feel the urge to move towards the best globally found solution. In other words: β > 0 guarantees the convergence of the PSO algorithm. Additionally, it is also possible to notice that for values of β equal or larger than α the algorithm is always able to converge to a solution within the stipulated number of iterations. This leads to a second conclusion: β ≥ α ∀ β > 0 guarantees the convergence of the PSO algorithm within the given number of iterations. However, for β values much larger than α the algorithm converges within very few iterations as it gets stuck in local minima. These configurations privilege moving towards the already found best global solution in detriment of exploring other new possible solutions.
Keeping the same configurations as before, it is necessary to study which of these parameter configurations lead to valid solutions. Referring to Eq. (5) to evaluate their performance, the results obtained are shown in Table 5.    Table 4. Lower variance values are obtained by solutions that lead to more precise estimations of the unknown parameter that is being measured.
Considering the previously found rule and the results obtained in Table 5 is is possible to see that for β ≥ α while keeping β > 0 not only guarantees the convergence of the algorithm, but also guarantees the convergence of the algorithm to a valid solution. Therefore, it is possible to come to the conclusion that: β ≥ α ∀ β > 0 guarantees the convergence of the PSO algorithm within the given number of iterations to a valid solution.
Having arrived to this conclusion it is still important to determine which parameter configuration leads to the best results. As before, a more thorough control test on the performance of the algorithm under these conditions was conducted three times and its results averaged and displayed in Fig. 11. Notice that, once again, there is a smaller increment between each parameter step in order to avoid overlooking any possibly better parameter configuration.
Unlike with the DE algorithm, the results obtained for the PSO algorithm are less contrasting. Even though configurations where the value of α and β are similar to each other deliver slightly better results, it appears that having β ≥ α ∀ β > 0 is a condition strong enough to arrive at optimal candidate solutions. This robustness of the algorithm against different possible parameter configurations would have allowed for a multiple number of pair of configurations to be chosen. Therefore, the first two optimal controllable parameters chosen for the PSO algorithm are α = 0.8 and β = 0.8 as they appear to stand out even if only slightly.
To study how the remaining two controllable parameters, the update weight w and the maximum velocity v max , might impact the performance of the algorithm a similar study was also made for them. Keeping all of the other parameters constant and already using the optimal configuration α = 0.8 and β = 0.8, the performance of the algorithm was studied for the different possible configurations of w and v max . However, this time all the different configurations of these two parameters have lead the algorithm to converge to a solution and, most importantly, to a valid one. This comes as no surprise since these parameters, from a physical viewpoint of their behaviour on the search space, were not expected to have any influence on the ability of the algorithm to converge to a solution. Their impact was mostly expected on converging times.
Having arrived to an optimal configuration for the parameters α and β, the algorithm is quite robust to any configuration of the parameters w and v max and is able to converge to a solution in the available number of iterations. It is nevertheless important to bare in mind that these parameter analysis was made for N = 10. For larger values of N , the convergence speed of the algorithms becomes more relevant as the search space scales polynomially with N . Unfortunately, such a parameter evaluation would also have been too computationally expensive and time consuming. Keeping this in mind, the optimal configuration considered for the PSO algorithm is: α = 0.8, β = 0.8, w = 0.8 and v max = 0.2.