Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm
EPJ Quantum Technology volume 9, Article number: 11 (2022)
Abstract
As combinatorial optimization is one of the main quantum computing applications, many methods based on parameterized quantum circuits are being developed. In general, a set of parameters is tweaked to optimize a cost function computed from the quantum circuit output. One of these algorithms, the Quantum Approximate Optimization Algorithm (QAOA), stands out as a promising approach to tackling combinatorial problems. However, finding the appropriate parameters is a difficult task. Although QAOA exhibits concentration properties, they can depend on instance characteristics that may not be easy to identify, but may nonetheless offer useful information to find good parameters. In this work, we study unsupervised Machine Learning approaches for setting these parameters without optimization. We perform clustering with the angle values but also with instance encodings (using instance features or the output of a variational graph autoencoder), and compare different approaches. These angle-finding strategies can be used to reduce calls to quantum circuits when leveraging QAOA as a subroutine. We showcase them within Recursive-QAOA up to depth 3 where the number of QAOA parameters used per iteration is limited to 3, achieving a median approximation ratio of 0.94 for MaxCut over 200 Erdős-Rényi graphs. We obtain similar performances to the case where we extensively optimize the angles, hence saving numerous circuit calls.
1 Introduction
Noisy Intermediate-Scale Quantum (NISQ) era hardware [1] faces many limiting challenges preventing fault-tolerant quantum algorithm execution (e.g., the number of qubits, decoherence, etc.). Hence, near-term hybrid quantum-classical algorithms were designed as an alternative for applications such as quantum chemistry problems [2], quantum machine learning [3] and combinatorial optimization [4].
With a user-specified depth p, the Quantum Approximate Optimization Algorithm (QAOA) [4] consists of a quantum circuit involving 2p real parameters (or angles). QAOA exhibits a few properties that make it interesting for combinatorial optimization, such as a perfect theoretical performance at infinite depth [4], a sampling advantage [5] and the concentration of parameters [6]. The latter suggests that optimal parameters found for one instance can be reused on another. Most importantly, this means we can reduce the classical optimization loop and the number of calls to a quantum device (saving runtime of QAOA-featured algorithms).
Many works have studied or illustrated this concentration property [6–14]. However, in many algorithms which feature QAOA as a subroutine [15–19], many distributions of instances are generated and several areas of parameter concentration may arise. Hence, balancing between finding good QAOA parameters and reducing circuit calls will be key to QAOA-featured algorithms.
In this work, we propose to apply unsupervised learning for setting QAOA angles, namely clustering. Our main contributions are as follows:

We consider different approaches for the problem of setting QAOA angles with clustering: using directly the angle values, instance features, and the output of a variational graph autoencoder as input to the clustering algorithm.

We analyze our methods by comparing them on two types of problems: MaxCut on Erdős-Rényi graphs and Quadratic Unconstrained Binary Optimization (QUBO) problems on random dense matrices.

We demonstrate that our techniques can be used to learn to set QAOA parameters with a less than 1–2% reduction (in relative value) in approximation ratio in cross-validation while reducing circuit calls.

We show that leveraging instance encodings for angle setting strategies yields better results than using angle values only.

Finally, we demonstrate their usage in Recursive-QAOA (RQAOA) [19] up to depth 3 on the Erdős-Rényi graphs. We limit the number of QAOA circuit calls per iteration to 3 (in contrast to a de novo optimization which would require many more calls), and achieve a 0.94 median approximation ratio. With our approaches, we obtain similar performances to the case where we extensively optimize the angles, hence saving numerous circuit calls.
The structure of the paper is as follows. Section 2 provides the necessary background and related works. Section 3 analyses the optimal angles found in both problems, pointing to concentration effects and the suitability of clustering. Section 4 shows different unsupervised learning strategies using different data encoding for clustering and the comparison between them. Section 5 sums up our experiments on RQAOA. We conclude this work with a discussion in Sect. 6.
2 Background
2.1 QUBO and QAOA
Quadratic Unconstrained Binary Optimization (QUBO) problems are specified by the formulation \(\min_{x \in \{0,1\}^{n}} \sum_{i \le j} x_{i} Q_{ij} x_{j}\) where n is the dimensionality of the problem and \(Q\in \mathbb{R}^{n\times n}\). This formulation is connected to the task of finding so-called "ground states" of "Ising models", i.e., configurations of binary labels \(\{-1,1\}\) minimising the energy of spin Hamiltonians, commonly tackled in statistical physics and quantum computing, i.e.:
\(\min_{s \in \{-1,1\}^{n}} \sum_{i} h_{i} s_{i} + \sum_{i<j} J_{ij} s_{i} s_{j}\) (1)
where \(h_{i}\) are the biases and \(J_{ij}\) the interactions between spins. QUBO can express an exceptional variety of combinatorial optimization (CO) problems such as Quadratic Assignment, Constraint Satisfaction Problems, Graph Coloring, and Maximum Cut [20].
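The connection between the two formulations can be made concrete with the change of variables \(x_{i} = (1 - s_{i})/2\). The following is a minimal sketch, assuming an upper-triangular Q and an Ising energy written as a constant offset plus bias and interaction terms; the function names (`qubo_to_ising`, `qubo_value`, `ising_value`) and the toy matrix are illustrative and not taken from the paper. It checks by enumeration that both objectives agree on every configuration.

```python
import itertools
import random


def qubo_to_ising(Q):
    """Map upper-triangular QUBO weights to Ising (h, J, offset) via x_i = (1 - s_i) / 2."""
    n = len(Q)
    h = [0.0] * n
    J = {}
    offset = 0.0
    for i in range(n):
        # Diagonal term: Q_ii * x_i = Q_ii * (1 - s_i) / 2
        h[i] -= Q[i][i] / 2.0
        offset += Q[i][i] / 2.0
        for j in range(i + 1, n):
            # Off-diagonal term: Q_ij * (1 - s_i - s_j + s_i * s_j) / 4
            J[(i, j)] = Q[i][j] / 4.0
            h[i] -= Q[i][j] / 4.0
            h[j] -= Q[i][j] / 4.0
            offset += Q[i][j] / 4.0
    return h, J, offset


def qubo_value(Q, x):
    """QUBO objective sum_{i <= j} x_i Q_ij x_j."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def ising_value(h, J, offset, s):
    """Ising energy: offset + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    energy = offset + sum(h[i] * s[i] for i in range(len(h)))
    energy += sum(w * s[i] * s[j] for (i, j), w in J.items())
    return energy


# Toy instance (hypothetical data): coefficients drawn uniformly in [-1, 1].
rng = random.Random(0)
n = 4
Q = [[rng.uniform(-1, 1) if j >= i else 0.0 for j in range(n)] for i in range(n)]
h, J, offset = qubo_to_ising(Q)

# Largest disagreement between the two objectives over all 2^n configurations.
max_diff = max(
    abs(qubo_value(Q, x) - ising_value(h, J, offset, [1 - 2 * b for b in x]))
    for x in itertools.product([0, 1], repeat=n)
)
```

Since the offset is constant, the minimizers of the QUBO and of the Ising energy coincide under this mapping.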
The QAOA algorithm [4] was inspired by adiabatic quantum computing with the goal of tackling CO problems. It consists of a quantum circuit whose construction depends on the classical cost function. Indeed, the latter is encoded in a quantum Hamiltonian defined on N qubits by replacing each variable \(s_{i}\) in Eq. (1) by the single-qubit operator \(\sigma _{i}^{z}\):
\(H_{C} = \sum_{i} h_{i} \sigma _{i}^{z} + \sum_{i<j} J_{ij} \sigma _{i}^{z} \sigma _{j}^{z}\)
Here, the bitstring corresponding to the ground state of \(H_{C}\) also minimizes the cost function. Another Hamiltonian, named the mixer, \(H_{B} = \sum_{j=1}^{N} \sigma _{j}^{x}\), is also employed in QAOA. These operators are then used for building a quantum circuit with real parameters, organized in layers. This circuit is initialized in the \({ \vert {+} \rangle }^{\otimes N}\) state, corresponding to all bitstrings in superposition with equal probability of being measured. Then, applying p layers sequentially yields the following quantum state:
\({ \vert {\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle } = e^{-i\beta _{p} H_{B}} e^{-i\gamma _{p} H_{C}} \cdots e^{-i\beta _{1} H_{B}} e^{-i\gamma _{1} H_{C}} { \vert {+} \rangle }^{\otimes N}\)
defined by 2p real parameters \(\gamma _{i},\beta _{i}\), \(i=1,\ldots,p\) or QAOA angles as they correspond to angles of parameterized quantum gates. Such output corresponds to a probability distribution over all possible bitstrings. The classical optimization challenge of QAOA is to find the sequence of angles γ, β minimizing the expected value of the cost function from the measurement outcome. In the limit of infinite depth, the distribution will converge to the global optimum.
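To make the layered structure concrete, the following is a minimal sketch of an exact state-vector simulation of a depth-p QAOA circuit for unweighted MaxCut. It assumes that the cost operator is the diagonal cut-counting function (so the expectation is to be maximized) and that each mixer layer applies \(e^{-i\beta \sigma ^{x}}\) to every qubit. The function names and the toy graph are illustrative; this is not the simulator used in our experiments.

```python
import cmath
import math


def cut_value(z, edges):
    """Number of edges cut by bitstring z (an int whose bits label the nodes)."""
    return sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)


def qaoa_expectation(n, edges, gammas, betas):
    """Expected cut value of the depth-p QAOA state, simulated exactly."""
    dim = 1 << n
    costs = [cut_value(z, edges) for z in range(dim)]
    # Initial state |+>^{(x) n}: uniform superposition over all bitstrings.
    state = [complex(1.0 / math.sqrt(dim))] * dim
    for gamma, beta in zip(gammas, betas):
        # Cost layer exp(-i * gamma * H_C): a phase per computational basis state.
        state = [amp * cmath.exp(-1j * gamma * cost) for amp, cost in zip(state, costs)]
        # Mixer layer: exp(-i * beta * X) = cos(beta) * I - i * sin(beta) * X on each qubit.
        cos_b, sin_b = math.cos(beta), -1j * math.sin(beta)
        for q in range(n):
            bit = 1 << q
            for z in range(dim):
                if not z & bit:
                    a0, a1 = state[z], state[z | bit]
                    state[z] = cos_b * a0 + sin_b * a1
                    state[z | bit] = sin_b * a0 + cos_b * a1
    # Average cost under the measurement distribution |amplitude|^2.
    return sum(abs(amp) ** 2 * cost for amp, cost in zip(state, costs))


# Toy usage (hypothetical instance and angles): a triangle graph at depth 1.
triangle = [(0, 1), (1, 2), (0, 2)]
value = qaoa_expectation(3, triangle, gammas=[0.4], betas=[0.3])
```

With all angles set to zero the circuit leaves the uniform superposition unchanged, so the function returns the average cut over all bitstrings; a classical optimizer would then search over `gammas` and `betas` to improve on that value.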
An interesting property of the algorithm is the concentration of the QAOA objective for fixed angles [6], due to typical instances having (nearly) the same value of the objective function. Additionally, the QAOA landscape is instance-independent when instances come from a "reasonable" distribution (one in which the numbers of certain types of subgraphs of fixed size themselves concentrate, which in turn implies that the values concentrate). Hence, we can focus on finding good parameters on a subset of instances that could be reapplied to new ones, with a few extra calls to the quantum device in order to refine them. As stated earlier, in the most general case, characterizing which distributions are "reasonable" may be involved, and even characterizing the distribution at hand may be hard. Previous works [6, 13, 14] referenced in [12] reported concentrations over optimal parameters even when QAOA is applied on random instances. These distributions over optimal parameters are empirically shown to behave non-trivially with respect to n. The authors of [12] referred to this problem as the "folklore of concentrations".
Hence, even though angles concentrate in many settings asymptotically, for finite-size problems, different areas of concentration may arise. Therefore, choosing good angle values is challenging, especially when considering the runtime of quantum algorithms. As such, some studies built on this property and resorted to using Machine Learning (ML) or characterizing instances by some properties for finding good QAOA parameters. We present a few of them in the next subsection.
2.2 Related work
Many previous works have extensively employed the concentration property [7–14]. Among them, a few employed ML or designed strategies for setting good QAOA parameters for different objectives. In [8], a simple kernel density model was trained on the best angles and instances solved by QAOA to exhibit better QAOA optimization than the Nelder-Mead optimizer. Parameter fixing strategies for QAOA are also studied in [7, 9] where the best-found angles at depth p are used as starting points for depth \(p+1\) before using a classical optimizer.
[13] present a strategy to find good parameters for QAOA based on topological properties of the problem graph and tensor network techniques. [10] point out that the success of transferability of parameters between different problem instances can be explained and predicted based on the types of subgraphs composing a graph. Finally, meta-learning is used in [11] to learn good initial angles for QAOA. They focused on initialization-based meta-learners in which a single set of parameters is used for a distribution of problems as initial parameters of a gradient-based optimizer. The meta-learner is a simple neural network that takes as inputs some meta-features of the QAOA circuit (the depth and which angle to output the value for) to predict the angles to apply. However, no instance-related features are involved in their work.
In our case, we focus on clustering with the goal of proposing many parameter values to try for new QAOA circuits. In contrast to all the approaches we discussed above, we do not use a classical optimization loop after setting them. Hence, our approaches allow balancing between circuit calls of small quantum computers and performances. Such settings naturally occur, for instance, in divide-and-conquer-type schemes to enable smaller quantum computers to improve optimization [15–18], or in Recursive-QAOA [19] as we demonstrate later.
3 Revisiting the concentration property
In contrast to previous related works, we propose unsupervised approaches that also exploit these concentration effects. We take a data-driven approach where, from examples of good angles, we infer good angles for new instances. Namely, we use clustering in order to obtain clusters that can be used to reduce calls to the quantum device to small numbers (in our case, fewer than 10) when applying QAOA on new instances, without further optimization.
We take a usual ML approach to this problem. First, from generated instances, we apply exploratory data analysis [21] (EDA) that suggests clustering may be a good approach for recommending good angles to new instances. Namely, we look at the density of angle values and apply t-distributed stochastic neighbor embedding (t-SNE) [22] for visualizing concentration effects. t-SNE is a nonlinear dimensionality reduction technique for mapping high-dimensional data to a lower d-dimensional space (typically \(d\in \{2,3\}\)). Briefly, this method constructs a probability distribution to measure the similarity between each pair of points, where closer pairs are assigned a higher probability. Then, in the lower-dimensional space \(\mathbb{R}^{d}\), we use a Student t-based distribution to quantify the similarity among the embeddings of the original data points. Finally, the optimal embeddings are chosen by minimizing the Kullback–Leibler divergence between the similarity distributions in the original and the lower-dimensional spaces. We follow by explaining how clustering is used in order to recommend angles for new instances. The approaches we outline differ in the input to the clustering algorithm. We consider clustering from the angle values directly but also from instance encodings. Finally, we compare these approaches, allowing us to provide recommendations for their usage.
3.1 Data generation
We generated two datasets that show different concentration behavior. The first one consists of 200 Erdős-Rényi graphs for MaxCut problems. The graphs have 10, 12, 14, 16 and 18 nodes. We used the following probabilities of edge creation: 0.5, 0.6, 0.7, 0.8. We generated 10 graphs per number of nodes and probability. The second dataset consists of 100 instances of QUBO problems, specified by their weight matrix Q (20 per aforementioned number of nodes). Their coefficients are sampled uniformly in \([-1,1]\). For the purpose of computing approximation ratios, we are interested in \(C_{\mathrm{opt}}\), the maximal value of the MaxCut (or QUBO) over all possible bit configurations; as a reference, this was computed by brute force. Our experiments were carried out using a classical simulator.
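As an illustration of this data generation step, the sketch below samples an Erdős-Rényi graph and computes \(C_{\mathrm{opt}}\) for MaxCut by exhaustive enumeration. It is a minimal example under stated assumptions: the function names and the seed are hypothetical, and the code is not the generator used to build our datasets.

```python
import itertools
import random


def erdos_renyi_edges(n, p, rng):
    """Sample G(n, p): each of the n*(n-1)/2 possible edges is kept with probability p."""
    return [(i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < p]


def brute_force_maxcut(n, edges):
    """Return C_opt by enumerating bitstrings.

    Flipping every bit leaves the cut unchanged, so node n-1 is pinned to
    side 0 and only 2^(n-1) assignments are visited.
    """
    best = 0
    for z in range(1 << (n - 1)):
        cut = sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)
        best = max(best, cut)
    return best


# Toy usage: one 10-node graph with edge probability 0.5 (seed chosen arbitrarily).
rng = random.Random(42)
edges = erdos_renyi_edges(10, 0.5, rng)
c_opt = brute_force_maxcut(10, edges)
```

Enumeration scales as \(2^{n-1}\), which remains affordable for the instance sizes considered here (up to 18 nodes) but not much beyond.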
We then obtained for each problem the best set of angles by running the BFGS optimizer [23] 1000 times for \(p=1,2,3\), and selecting the ones which achieve the best QAOA objective. BFGS with random restarts is deemed a very good optimizer for continuous differentiable functions [24]. These angles are saved as a database, to which we apply unsupervised approaches to learn to set optimal angles for unseen instances. Our approach is clearly optimization method-specific but can be applied to other state-of-the-art optimizers. Different optimizers would give different data (as the optimizers could fail to find the optimal QAOA parameters) but they can be combined, and one would select the best set of angles found among all considered.
3.2 Exploratory data analysis
Having obtained the optimal angles, we apply EDA to observe concentration effects. We look at their corresponding performance ratios using the average cost yielded by QAOA for angles γ, β, denoted \(E_{\gamma ,\beta} (C)\). For MaxCut on unweighted Erdős-Rényi graphs, we compute approximation ratios as \(\frac{E_{\gamma ,\beta} (C)}{C_{\mathrm{opt}}}\). This value is upper bounded by 1, which is the optimal value. For QUBOs, we compute optimality gaps \(\frac{C_{\mathrm{opt}} - E_{\gamma ,\beta} (C)}{C_{\mathrm{opt}}} \) as the optima were all negative; the closer to 0, the better. We show boxplots of the ratios with respect to depth in Fig. 1. Increasing depth results in better ratios.
Next, we looked at the distribution of \(\gamma _{i}\), \(\beta _{i}\) values. Figure 2 shows that the concentration of each parameter is significant, since the corresponding density functions are sharply peaked. We also observed multiple clusters of angles, as the density functions are multimodal. Finally, we applied t-SNE with two components to visualize the angle values in 2D for \(p=2,3\). This highlights potentially several clusters for each depth and problem. Note that we may not obtain global optima with these angles, nor know whether they are unique.
We notice that the probability of edge creation, represented by a different color, does not seem to influence the clusters. For dense QUBOs, we observe one important cluster and a few instances that start to form another. Finally, in the dense instances case, we witness a larger spread in angle values at depth 1. This can be explained by differences between instances. Although the concentration effect is present, a spread of this magnitude will impact the performance of parameter setting strategies, and makes for an interesting playground to benchmark them.
Using clustering techniques can then reveal potential areas of QAOA angle values where good angles can be found to try on new instances. The angle values related to clusters can be used as recommendations for new instances. This becomes interesting as it enables lowering runtime and allows comparisons based on function evaluations, or on the number of quantum circuit calls, in algorithms where QAOA would be used as a subroutine.
4 Clustering-based (unsupervised) learning for angles
As the EDA highlights a clustering effect, we propose different clustering approaches that use different data for angle recommendations. Namely, we first describe using the angle values directly for building clusters serving as angles to try. Then, we switch to using instance-related features. Finally, for the unweighted case, we use graph autoencoders whose outputs can be used for clustering instead of computing graph features. In the following, we detail each clustering approach for flexible angle recommendation.
4.1 Identifying clusters of angles or problem instances
We first considered clustering using angle values. Given a database of optimal angles \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots ,(\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\) for Q problem instances \(\{ I_{1}, \ldots , I_{Q} \}\), this can be seen as computing or selecting from the database a good set of angle values to apply on new instances. In this case, we do not use the problem instances during clustering. Given a user-specified number of angles to be tested K, this set of angle values is then applied to new QAOA circuits. To specify them, we can use a clustering algorithm on the database \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots , (\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\). For instance, K-means [25] will output centroids to use directly as angle recommendations for QAOA on new instances. The K-means algorithm aims to partition a set of n data points \(x_{i}\) into K disjoint clusters, each characterized by the mean/centroid of the points within it, denoted \(\mu _{j}\). The partition \(P = \{P_{1}, P_{2}, \ldots , P_{K}\}\) (\(\forall i\neq j \in [1..K]\), \(P_{i} \neq \emptyset \), \(P_{i} \cap P_{j}=\emptyset \), \(\cup _{i} P_{i} = \{x_{i}\}_{i=1}^{n}\)) is chosen by minimizing the within-cluster sum of squares, i.e., \(\operatorname{arg\,min}_{P}\sum_{i=1}^{K}\sum_{x\in P_{i}} \Vert x - \mu _{i} \Vert ^{2}\), where the centroid \(\mu _{i} = \vert P_{i} \vert ^{-1}\sum_{x \in P_{i}}x\). The algorithm iteratively assigns each data point to its nearest centroid and recomputes each centroid as the mean of its assigned points, until convergence.
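The K-means iteration described above can be sketched in a few lines. This is a minimal illustration of Lloyd's algorithm with random initialization from the data points, assuming Euclidean distance; the angle values are toy numbers rather than entries of our database, and a library implementation would normally be preferred in practice.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Initialize the centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy database of depth-1 angles (gamma, beta) with two concentration areas.
angles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = kmeans(angles, k=2, rng=random.Random(0))
```

The K centroids returned are the angle sets that would be tried on a new instance, so K directly sets the circuit-call budget.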
To incorporate knowledge from instances when recommending angles, we change the data fed to the clustering algorithm. We distinguish computing instance features from learning an embedding, that is, a user-defined F-dimensional representation or encoding of the instances as data. We denote an encoding of an instance \(I_{t}\) as \(f(I_{t})\). The angle recommendation framework using a clustering algorithm for such instance representation is presented in Algorithm 1. First, clusters are learned from the encodings extracted from training data. Then, we find the instances in the database that are the closest in distance to the cluster centroids, and their corresponding optimal angles. The latter are then used for QAOA circuits on new instances, from which we keep the best QAOA output.
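The recommendation steps just described can be sketched as follows. This is a hedged illustration rather than a transcription of Algorithm 1: it assumes the centroids have already been computed by some clustering algorithm, uses Euclidean distance, and takes a hypothetical callable `evaluate` that returns the QAOA objective of a new instance for a given angle set (higher is better). All names and numbers are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recommend_angles(centroids, encodings, angles):
    """For each centroid, return the stored angles of the nearest database instance."""
    recommendations = []
    for centroid in centroids:
        nearest = min(range(len(encodings)), key=lambda t: sq_dist(encodings[t], centroid))
        recommendations.append(angles[nearest])
    return recommendations


def best_recommendation(recommendations, evaluate):
    """Try every recommended angle set on a new instance and keep the best one."""
    return max(recommendations, key=evaluate)


# Toy database: encodings f(I_t) and the optimal angles stored for each instance.
encodings = [[0.1, 0.0], [9.0, 9.0], [5.0, 5.0]]
angles = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed output of a clustering step

recs = recommend_angles(centroids, encodings, angles)


def evaluate(angle_set):
    """Stand-in for running QAOA on a new instance (hypothetical objective)."""
    return -abs(angle_set[0] - 0.3)


best = best_recommendation(recs, evaluate)
```

Each call to `evaluate` corresponds to one QAOA circuit evaluation, so the number of centroids is the number of circuit calls spent per new instance.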
4.2 Instance encodings
In this work, we show two main approaches to encoding the instances for clustering. First, we computed a set of features following [17, 26]. Such features were used in [26] to decide among classical heuristics to solve MaxCut and QUBO problems. Inspired by [26], the features were also used for choosing when to apply QAOA against a classical approximation algorithm [17]. For Erdős-Rényi graphs, we took the graph density, the logarithm of the number of nodes and edges, the logarithm of the first- and second-largest eigenvalues of the Laplacian matrix normalized by the average node degree, and the logarithm of the ratio of the two largest eigenvalues. For QUBOs, we reduced them to the MaxCut formulation and used the logarithm of the number of nodes, and the weighted Laplacian matrix eigenvalue-based features.
We also show how to use graph embeddings obtained with Graph Neural Networks (GNNs) [27], sparing the user from computing the features. We employ Variational Graph Auto-Encoders (VGAE) [28]. By design, this technique only works on unweighted graphs. Consequently, we only applied it to the MaxCut instances later in this work. A VGAE learns latent embeddings \(\mathbf{Z} \in \mathbb{R}^{N\times F}\) where F is the dimension of the latent variables and N the number of nodes. Given the adjacency matrix A and node feature vector X, the model outputs the parameters μ, σ of a Gaussian distribution from which the latent representation is generated. We feed the Erdős-Rényi graphs to the model, and we add the degree of the nodes as node features. Once learning is completed, we compute the embeddings by a common average readout operation [27, 29]. The latter operation can be defined as averaging the node embeddings for a graph with vertex set \(\mathcal{V}\): \(\frac{1}{\vert \mathcal{V} \vert }\sum_{n\in \mathcal{V}}Z_{n}\). This allows having a fixed dimension F for the encoding to be used by a clustering algorithm.
Having defined different strategies for clustering, we apply them to the data we generated and compare their performances. In the following section, we present our results obtained by taking a Machine Learning approach, starting from a simple baseline and cross-validating each method.
4.3 Results
In this section, we apply the above-mentioned strategies to the generated data where EDA revealed different areas of concentration. As the first baseline for angle setting strategy, we experiment with simple aggregation of angle values (median and average). We follow this up with K-means as the underlying clustering algorithm, varying the number of clusters from 3 to 10. Finally, we change the K-means data to cluster based on instance encodings instead of angle values. We first computed a set of graph features that were used in a previous study [30]. Then we investigate graph autoencoders to learn the encodings of the MaxCut instances. We cross-validate each method using 5-fold cross-validation where we report the ratios \(\frac{(C_{\mathrm{opt}} - E_{\gamma ,\beta} (C))}{(C_{\mathrm{opt}} - E^{\mathit{cluster}}_{\gamma ,\beta} (C))} \) on test instances. A value higher than 1 would mean that the average cost yielded by clustering has improved over the one found by optimization. We also consider the case where one trains on smaller instances to apply to the bigger ones.
4.3.1 From angle values
As a simple baseline, we compute the average and the median of the optimal angles from the database \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots ,(\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\). From depth-aggregated results, averaging the angle values yielded a median ratio of 0.524 for MaxCut and 0.672 for QUBOs, while taking the median values increased it to 0.950 and 0.941, respectively. This can be explained by the fact that the median value is statistically more robust than the mean when handling data sets with large variability.
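The difference between the two aggregations can be illustrated on a toy sample. The values below are invented for illustration (they are not taken from our database) and mimic a multimodal density with a dominant and a minor concentration area for a single angle.

```python
import statistics

# Hypothetical optimal gamma values: five near 0.2 and two near 2.9.
gammas = [0.20, 0.21, 0.19, 0.22, 0.20, 2.90, 2.95]

mean_gamma = statistics.mean(gammas)      # pulled towards the minor mode
median_gamma = statistics.median(gammas)  # stays inside the dominant mode

# Distance from each aggregate to the closest value actually observed.
gap_mean = min(abs(g - mean_gamma) for g in gammas)
gap_median = min(abs(g - median_gamma) for g in gammas)
```

In this toy case the mean falls between the two concentration areas, far from any angle that was found to be optimal, whereas the median coincides with an observed value.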
As expected with K-means, increasing the number of clusters yielded better median ratios. With \(K=10\), the median ratios are 0.998 and 0.985 on each dataset, a less than 1–2% reduction in performance w.r.t. the optimal angles. Figure 4 shows the improvement with an increased number of clusters. We also observe that with increased depth, median ratio performances are reduced. We conjecture that, when the dimension of the parameter space increases, more clusters are naturally needed to ensure a sensible recommendation.
Also, such a deterioration of performance w.r.t. circuit depth is more substantial on the QUBO instances than on the MaxCut ones, which can be explained by the clustering patterns in the MaxCut scenario being more significant and regular (Fig. 3). In addition, this observation suggests that for future work, for dense QUBO instances where the cluster center is not representative of all points pertaining to it, it is more reasonable to use a supervised learning method, which takes the problem instance as input and predicts the optimal angle values.
We also observed that, for the MaxCut problem, the cluster centroid of K-means can be quite distant from the data points when the number of clusters is small and the circuit depth is high. In particular, this phenomenon deteriorates the median ratio by ca. 30% for 3 and 4 clusters with \(p=3\). Hence, we decided to take the closest data point to the centroid in each cluster as the recommendation, which solves this issue. For QUBOs, using the cluster centroids directly yields better results.
Overall, increasing the number of angles attempted will improve the quality of the QAOA output. Clearly, the results with fewer than 4 clusters present examples where the ratio is low, worsening the median performances. For instance, with 3 clusters on QUBOs, the median ratio is 0.915. In the context where the budget of quantum circuit calls is very limited, this could be problematic and call for more robust approaches. To this end, we consider using instance features for clustering.
4.3.2 From instance encodings
To assess whether using instance features can improve the quality of clustering, we divided the ratios obtained with instance features by the ones using angle values. We show these results in Fig. 5 and Fig. 6 where we can clearly see better ratios with fewer than 4 clusters, and similar results on average otherwise.
As for learned encodings or embeddings with autoencoders, the GNN model configuration we use is the same two-layer graph convolutional architecture as in [28]. Namely, the first layer has a 32-dimensional output and uses the ReLU activation function. This is followed by two 16-dimensional output layers for the generation of the latent variables. We train using Adam with a learning rate of 0.01 for 100 epochs and batch size set to the dataset size. Our implementation uses the Deep Graph Library (DGL) [29]. The embeddings obtained by averaging are of dimension \(F=16\). This allows having a fixed dimension for the encoding as input of the same K-means strategy described above. We observe in Fig. 7 that the results are similar to the ones obtained using instance features. Yet, in some instances, we see better results. Hence, many clustering results can be combined to improve the ratios, canceling each other's weaknesses at the cost of trying more angles to find the best ones. As future work, we could also decide which heuristic to use depending on a given test instance by using an ML model.
Finally, our approaches can save numerous circuit calls compared to de novo optimization. The median numbers of circuit calls for the BFGS runs giving the best QAOA angles were 56, 150, 320 for each depth respectively on MaxCut and 44, 132, 252 for QUBO, while in the cluster approach, the number of calls is always the number of clusters, which is considerably smaller than the cost of BFGS. Instance size does not seem to affect the number of circuit calls by BFGS. In our approaches, we limited circuit calls to 10 and we do not need multiple restarts.
4.4 Aggregating results
Following the presentation of the different clustering approaches, we compare their performances to determine which approach works best. We propose to take the empirical cumulative distribution functions (ECDFs) of the ratios as the performance measure to compare those different approaches. Given a sample \(\{r_{i}\}_{i=1}^{R}\) of the ratios and a value of interest \(t\in [0, 1]\), the ECDF is the fraction of the sample points less than or equal to t: \(F(t) = \frac{1}{R} \sum_{i} \mathbf{1}_{[r_{i}, \infty )}(t)\), where 1 denotes the indicator function, which returns one only if \(r_{i} \le t\) and zero otherwise. ECDFs enable us to aggregate the results of the different numbers of clusters and depths. A better method will have a larger proportion of higher ratios, resulting in an ECDF curve located more to the right. From Fig. 8, we observe that using instance encodings is more successful in yielding better angles than using the angle values. This is also witnessed in Fig. 9 with increased depth and a low number of clusters. Also, VGAE seems to be slightly better than instance features on the MaxCut problems. However, these methods can complement each other, especially as we do not need to increase dataset size. Hence, combining them at the cost of circuit calls becomes an option for running QAOA, as we showcase with RQAOA in the next section.
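As a small illustration of this performance measure, the sketch below evaluates the ECDF of two samples of ratios at a given threshold, following the definition as the fraction of sample points less than or equal to t. The two samples are hypothetical numbers, not results from our experiments.

```python
def ecdf(ratios, t):
    """Fraction of sample ratios that are less than or equal to t."""
    return sum(r <= t for r in ratios) / len(ratios)


# Hypothetical ratios for two angle-setting methods on four test instances.
method_a = [0.90, 0.95, 0.99, 1.00]
method_b = [0.80, 0.85, 0.95, 0.97]

f_a = ecdf(method_a, 0.95)
f_b = ecdf(method_b, 0.95)
```

A lower ECDF value at a given t means that more of the sample lies above t, which is why the curve of the better method sits further to the right.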
4.5 Case when test instances are bigger than training instances
One important consideration for these methods is scaling. This is relevant in settings where one is interested in solving larger instances given small ones. In our case, we apply these approaches in the case \(K=3\) with a 60–40% train-test split. From Fig. 10 and 11, we reach similar conclusions, with VGAE on MaxCut and instance features on the QUBO problems, respectively, yielding better results. Note that we did not use the logarithm of the number of nodes and edges as features when using instance features, as the values between training and test are too different.
5 Demonstration with RQAOA
RQAOA [19] is a recursive algorithm where, given an Ising problem \(\sum_{i,j} w_{ij} Z_{i} Z_{j} \), one starts by applying QAOA to it. The quantum state output \({ \vert {\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle }\) is then used to compute correlations \(M_{ij} = { \langle{\boldsymbol{\gamma },\boldsymbol{\beta }} \vert } Z_{i} Z_{j} { \vert {\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle } \). Then, variable elimination is carried out by selecting a pair of variables satisfying \((i_{l},j_{l}) = \operatorname{arg\,max} \vert M_{ij} \vert \), and substituting \(Z_{j_{l}}\) with \(\operatorname{sign}(M_{i_{l}, j_{l}})Z_{i_{l}}\) in the Ising formulation. This reduces the number of variables by 1. We then get a new reduced problem and we repeat the procedure for a user-defined number of iterations. The choice of the number of iterations fixes the size of the final instance, which is then solved using a brute-force (or some other classical) approach, and the substitutions are then applied to it to obtain a final solution.
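A single variable-elimination step can be sketched as follows. This is a minimal illustration under stated assumptions: weights and correlations are stored in dictionaries keyed by sorted pairs (i, j) with i < j, the pair is selected by largest correlation magnitude, and the correlations are given as inputs (in RQAOA they would be estimated from the QAOA output state). The function name and the toy values are hypothetical.

```python
def eliminate_variable(weights, correlations):
    """One RQAOA reduction step on an Ising problem sum_{i<j} w_ij Z_i Z_j.

    Returns (reduced_weights, constant, (i, j, sign)), where the substitution
    Z_j = sign * Z_i has been applied to every term.
    """
    # Pick the pair whose correlation has the largest magnitude.
    (i, j), m = max(correlations.items(), key=lambda kv: abs(kv[1]))
    sign = 1.0 if m >= 0 else -1.0
    reduced, constant = {}, 0.0
    for (a, b), w in weights.items():
        if (a, b) == (i, j):
            constant += sign * w  # Z_i * Z_j becomes the constant sign
            continue
        # Replace Z_j by sign * Z_i wherever it appears.
        if a == j:
            a, w = i, sign * w
        elif b == j:
            b, w = i, sign * w
        key = (min(a, b), max(a, b))
        reduced[key] = reduced.get(key, 0.0) + w  # merge parallel terms
    return reduced, constant, (i, j, sign)


# Toy usage: a triangle with unit weights and made-up correlations.
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
correlations = {(0, 1): -0.9, (0, 2): 0.1, (1, 2): 0.2}
reduced, constant, step = eliminate_variable(weights, correlations)
```

Recording each `(i, j, sign)` triple allows the eliminated variables to be reconstructed once the final small instance has been solved classically.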
As RQAOA requires optimizing many QAOA instances that iteratively shrink in size, we demonstrate the application of our clustering approaches in this context. We do so for the MaxCut problems where we limit the number of iterations to half of the size of the Erdős-Rényi graphs. We do not consider the dense QUBOs as RQAOA would reduce an original dense graph to non-dense intermediate subproblems not part of the database. As for the number of QAOA parameters attempted per iteration, we limit it to 3 and apply the three clustering approaches: angle-value, instance features, and VGAE-output based. We do so by using our previous database and training each method on all instances to get 3 QAOA parameter recommendations. The latter are then used for QAOA on the RQAOA-generated instances.
Figure 12 shows that with the three approaches, we obtain a median 0.94117 approximation ratio with RQAOA. The minimal ratio obtained is 0.8367 and the optima were found on 33 instances. When looking at each method independently, we observe that the angle-value clustering performances at \(p=3\) are lower than the others. This is due to the fact that we use the K-means centroids directly, as it allowed us to find more instances with a ratio of 1. Graph features and VGAE seem similar in performance, with a small advantage at depth 2 for VGAE. Looking at the frequencies where the best ratio by instance was obtained, VGAE is more successful. Respectively, each method achieves the best-found ratios over 88, 118, and 165 instances. Finally, we also tried using random angles, by sampling values uniformly in \([0, 2\pi ]^{p}\), and further optimizing the angles from each approach with BFGS up to a maximum of 100 iterations. We clearly see better performances with clustering approaches compared to random angles. This is also the case when using BFGS (starting with random angles) limited to 3 circuit calls when optimizing, the same budget as our clustering-based approaches. Dividing the MaxCut ratios obtained with BFGS by the ones without further optimization yielded a median value of 1. Hence, the results were similar to the BFGS-optimized approaches, saving many circuit calls.
To conclude, our unsupervised approaches can be used to run quantum algorithms where QAOA is used as a subroutine. They are then considered as hyperparameters that can be tweaked to achieve better performances for QAOAfeatured algorithms, depending on a userdefined budget definition. In our RQAOA showcase, the maximal depth of QAOA, as well as the number of parameters to try at each iteration, was set to 3, and optimizing further did not improve. For MaxCut on ErdősRényi graphs, leveraging VGAE in RQAOA achieved the best ratios over 82.5% of the instances.
6 Discussion
In this work, we studied different strategies for fixing the parameters of QAOA based on unsupervised learning. We focused on clustering, motivated by previous works highlighting the concentration property and by our exploratory data analysis of the best angles found for MaxCut on Erdős-Rényi graphs and dense QUBOs. Compared to related work, however, we use a methodology closer to machine learning practice by cross-validating.
Furthermore, we demonstrated that these techniques can be leveraged to restrict the number of QAOA circuit calls to small numbers (less than 10) with a reduction of less than 1–2% in approximation ratio on average from the best angles found when cross-validating. We also showed how to compare different clustering strategies, and that leveraging instance encodings (by computing features or computing them with a model, in our case a VGAE) for angle-setting strategies yields better results than using angle values only. Although the VGAE embedding-based approach is quite competitive, we recommend using the simpler instance features in practice, since the VGAE brings extra computational overhead. Regarding generalization with respect to problem scale, both the instance-feature and VGAE-based clustering approaches manage to retain their performance on unseen problem instances larger than those in the training set. For dense QUBOs, increasing the number of clusters is less impactful than for MaxCut; we conjecture that the clusters for QUBO have a large spread and are less separable, hindering the performance of the clustering approach in higher dimensions. For both problems, it is necessary to increase the number of clusters to retain good performance when the circuit becomes deeper.
From an application perspective, we envision these techniques being employed in algorithms where QAOA is run on a small part of the problem to solve, such as divide-and-conquer [15, 16] and iterative algorithms [17–19]. Restricting the number of circuit calls to a few will help decrease the runtime of quantum-featured or quantum-enhanced algorithms, bringing them closer to competing with classical heuristics. We showcased our approaches in the context of Recursive QAOA, treating them as hyperparameters under a limited budget (QAOA depth and number of QAOA parameters per iteration limited to 3), where we were able to achieve a 0.94 median approximation ratio. With our approaches, we obtain performance comparable to the case where we extensively optimize the angles, hence saving numerous circuit calls.
For future work, other clustering techniques can be studied and extended to predict the angle values per instance in a semi-supervised approach, and for different problem instances. In addition, ML can be used to decide which heuristic to use for a given test instance. We also did not apply GNNs to the dense QUBOs, as graph autoencoders are mostly applied to unweighted graphs. Using a VGAE that can reconstruct both the graph adjacency and node features is then another research direction. Since we use unsupervised methods, we expect the same methodology to be usable on noisy hardware. Studying the resilience of the different approaches under different noise settings would also be of major interest. Finally, these approaches can be studied within different QAOA-featured algorithms and under different settings (depth of QAOA, number of clusters, and Ising instance properties, to name a few).
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.
Notes
Since the clustering algorithm outputs encodings that do not contain QAOA angle information, we use the QAOA angles of the closest training instances to the clusters.
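The lookup described in this note can be sketched as follows, assuming the cluster centres have already been computed (for example by K-means on the instance encodings). The function name and the toy encodings and angles are hypothetical placeholders, and Euclidean distance is an assumption of this sketch.

```python
import math


def angles_from_centroids(centroids, encodings, angles):
    """For each centroid, return the QAOA angles of the closest training instance.

    `encodings[i]` is the encoding of training instance i and `angles[i]` its
    best-found QAOA angles; closeness is measured with the Euclidean distance.
    """
    recommended = []
    for centre in centroids:
        nearest = min(
            range(len(encodings)),
            key=lambda i: math.dist(centre, encodings[i]),
        )
        recommended.append(angles[nearest])
    return recommended


# Toy 2-D encodings and angle pairs (illustrative placeholders).
encodings = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
angles = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8)]
centroids = [(0.4, 0.4), (5.6, 5.0)]  # cluster centres, assumed given

recommended = angles_from_centroids(centroids, encodings, angles)
```

Each centroid thus yields one angle recommendation, so the number of clusters directly sets the number of angle sets to try.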
Abbreviations
CO: Combinatorial Optimization
QAOA: Quantum Approximate Optimization Algorithm
ECDF: Empirical Cumulative Distribution Function
VGAE: Variational Graph AutoEncoder
QUBO: Quadratic Unconstrained Binary Optimization
EDA: Exploratory Data Analysis
References
Preskill J. Quantum Computing in the NISQ era and beyond. Quantum. 2018;2:79. https://doi.org/10.22331/q-2018-08-06-79.
Moll N, Barkoutsos P, Bishop LS, Chow JM, Cross A, Egger DJ, Filipp S, Fuhrer A, Gambetta JM, Ganzhorn M, Kandala A, Mezzacapo A, Müller P, Riess W, Salis G, Smolin J, Tavernelli I, Temme K. Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci Technol. 2018;3(3):030503. https://doi.org/10.1088/2058-9565/aab822.
Benedetti M, Lloyd E, Sack S, Fiorentini M. Parameterized quantum circuits as machine learning models. Quantum Sci Technol. 2019;4(4):043001. https://doi.org/10.1088/2058-9565/ab4eb5.
Farhi E, Goldstone J, Gutmann S. A Quantum Approximate Optimization Algorithm. 2014. arXiv:1411.4028.
Farhi E, Harrow AW. Quantum supremacy through the Quantum Approximate Optimization Algorithm. 2016. arXiv:1602.07674.
Brandão FGSL, Broughton M, Farhi E, Gutmann S, Neven H. For Fixed Control Parameters the Quantum Approximate Optimization Algorithm’s Objective Function Value Concentrates for Typical Instances. 2018. arXiv:1812.04170.
Zhou L, Wang ST, Choi S, Pichler H, Lukin MD. Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices. 2018. arXiv:1812.01041.
Khairy S, Shaydulin R, Cincio L, Alexeev Y, Balaprakash P. Learning to optimize variational quantum circuits to solve combinatorial problems. In: Proceedings of the AAAI conference on artificial intelligence 34(03). 2020. p. 2367–75. https://doi.org/10.1609/aaai.v34i03.5616.
Lee X, Saito Y, Cai D, Asai N. Parameters fixing strategy for quantum approximate optimization algorithm. In: 2021 IEEE international conference on quantum computing and engineering (QCE). 2021. p. 10–6. https://doi.org/10.1109/QCE52317.2021.00016.
Galda A, Liu X, Lykov D, Alexeev Y, Safro I. Transferability of optimal QAOA parameters between random graphs. 2021. arXiv:2106.07531.
Sauvage F, Sim S, Kunitsa AA, Simon WA, Mauri M, Perdomo-Ortiz A. FLIP: A flexible initializer for arbitrarily-sized parametrized quantum circuits. 2021. arXiv:2103.08572.
Akshay V, Rabinovich D, Campos E, Biamonte J. Parameter concentrations in quantum approximate optimization. Phys Rev A. 2021;104:010401. https://doi.org/10.1103/PhysRevA.104.L010401.
Streif M, Leib M. Training the Quantum Approximate Optimization Algorithm without access to a quantum processing unit. 2019. arXiv:1908.08862.
Crooks GE. Performance of the Quantum Approximate Optimization Algorithm on the maximum cut problem. 2018. arXiv:1811.08419.
Li J, Alam M, Ghosh S. Large-scale quantum approximate optimization via divide-and-conquer. 2021. arXiv:2102.13288.
Guerreschi GG. Solving Quadratic Unconstrained Binary Optimization with divide-and-conquer and quantum algorithms. 2021. arXiv:2101.07813.
Moussa C, Wang H, Calandra H, Bäck T, Dunjko V. Tabu-driven quantum neighborhood samplers. In: Zarges C, Verel S, editors. Evolutionary computation in combinatorial optimization. Cham: Springer; 2021. p. 100–19.
Shaydulin R, Ushijima-Mwesigwa H, Safro I, Mniszewski S, Alexeev Y. Quantum local search for graph community detection. In: APS March meeting abstracts. APS meeting abstracts. vol. 2019. 2019. p. 42–009.
Bravyi S, Kliesch A, Koenig R, Tang E. Obstacles to state preparation and variational optimization from symmetry protection. 2019. arXiv:1910.08980.
Kochenberger GA, Glover F. A unified framework for modeling and solving combinatorial optimization problems: a tutorial. In: Multiscale optimization methods and applications. Boston: Springer; 2006. p. 101–24. https://doi.org/10.1007/0-387-29550-X_4.
Hinterberger H. Exploratory data analysis. In: Encyclopedia of database systems. Boston: Springer; 2009. p. 1080–. https://doi.org/10.1007/978-0-387-39940-9_1384.
van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(86):2579–605.
Broyden CG. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J Appl Math. 1970;6(1):76–90. https://doi.org/10.1093/imamat/6.1.76.
Hansen N, Auger A, Ros R, Finck S, Posík P. Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Pelikan M, Branke J, editors. Genetic and evolutionary computation conference, GECCO 2010, proceedings, companion material. July 7–11, 2010. Portland, Oregon, USA. New York: ACM; 2010. p. 1689–96. https://doi.org/10.1145/1830761.1830790.
Lloyd S. Least squares quantization in pcm. IEEE Trans Inf Theory. 1982;28(2):129–37. https://doi.org/10.1109/TIT.1982.1056489.
Dunning I, Gupta S, Silberholz J. What works best when? A systematic evaluation of heuristics for maxcut and qubo. INFORMS J Comput. 2018;30(3):608–24. https://doi.org/10.1287/ijoc.2017.0798.
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M. Graph neural networks: a review of methods and applications. 2018. arXiv:1812.08434.
Kipf TN, Welling M. Variational graph autoencoders. NIPS workshop on Bayesian deep learning. 2016.
Wang M, Zheng D, Ye Z, Gan Q, Li M, Song X, Zhou J, Ma C, Yu L, Gai Y, Xiao T, He T, Karypis G, Li J, Zhang Z. Deep graph library: A graph-centric, highly-performant package for graph neural networks. 2019. Preprint. arXiv:1909.01315.
Moussa C, Calandra H, Dunjko V. To quantum or not to quantum: towards algorithm selection in near-term quantum optimization. Quantum Sci Technol. 2020;5(4):044009. https://doi.org/10.1088/2058-9565/abb8e5.
Acknowledgements
CM and VD acknowledge support from TotalEnergies.
Funding
This work was supported by the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). This research is also supported by the project NEASQC funded from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 951821).
Author information
Contributions
CM, HW and VD designed all the experiments. The manuscript was written with contributions from all authors. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Moussa, C., Wang, H., Bäck, T. et al. Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm. EPJ Quantum Technol. 9, 11 (2022). https://doi.org/10.1140/epjqt/s40507-022-00131-4
DOI: https://doi.org/10.1140/epjqt/s40507-022-00131-4
Keywords
 Quantum computing
 Combinatorial optimization
 Quantum Approximate Optimization Algorithm
 Clustering