
Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm


As combinatorial optimization is one of the main applications of quantum computing, many methods based on parameterized quantum circuits are being developed. In general, a set of parameters is tuned to optimize a cost function computed from the quantum circuit output. Among these algorithms, the Quantum Approximate Optimization Algorithm (QAOA) stands out as a promising approach to tackling combinatorial problems. However, finding appropriate parameters is a difficult task. Although QAOA exhibits concentration properties, they can depend on instance characteristics that may not be easy to identify, but may nonetheless offer useful information for finding good parameters. In this work, we study unsupervised Machine Learning approaches for setting these parameters without optimization. We perform clustering with the angle values but also with instance encodings (using instance features or the output of a variational graph autoencoder), and compare the different approaches. These angle-finding strategies can be used to reduce calls to quantum circuits when leveraging QAOA as a subroutine. We showcase them within Recursive-QAOA up to depth 3, where the number of QAOA parameters used per iteration is limited to 3, achieving a median approximation ratio of 0.94 for MaxCut over 200 Erdős-Rényi graphs. We obtain similar performances to the case where we extensively optimize the angles, hence saving numerous circuit calls.

1 Introduction

Noisy Intermediate-Scale Quantum (NISQ) era hardware [1] faces many limiting challenges preventing fault-tolerant quantum algorithm execution (e.g., the number of qubits, decoherence, etc.). Hence near-term hybrid quantum-classical algorithms were designed as an alternative for applications such as quantum chemistry problems [2], quantum machine learning [3] and combinatorial optimization [4].

With a user-specified depth p, the Quantum Approximate Optimization Algorithm (QAOA) [4] consists of a quantum circuit involving 2p real parameters (or angles). QAOA exhibits a few properties that make it interesting for combinatorial optimization, such as a perfect theoretical performance at infinite depth [4], a sampling advantage [5] and the concentration of parameters [6]. The latter suggests that optimal parameters found for one instance can be reused on another. Most importantly, this means we can shorten the classical optimization loop and reduce the number of calls to a quantum device (saving runtime of QAOA-featured algorithms).

Many works have studied or illustrated this concentration property [6–14]. However, in many algorithms which feature QAOA as a subroutine [15–19], many distributions of instances are generated and several areas of parameter concentration may arise. Hence, balancing between finding good QAOA parameters and reducing circuit calls will be key to QAOA-featured algorithms.

In this work, we propose to apply unsupervised learning for setting QAOA angles, namely clustering. Our main contributions are as follows:

  • We consider different approaches for the problem of setting QAOA angles with clustering: using directly the angle values, instance features, and the output of a variational graph autoencoder as input to the clustering algorithm.

  • We analyze our methods by comparing them on two types of problems: MaxCut on Erdős-Rényi graphs and Quadratic Unconstrained Binary Optimization (QUBO) problems on random dense matrices.

  • We demonstrate that our techniques can learn to set QAOA parameters with a relative reduction in approximation ratio of less than 1–2% in cross-validation, while reducing circuit calls.

  • We show that leveraging instance encodings for angle setting strategies yields better results than using angle values only.

  • Finally, we demonstrate their usage in Recursive-QAOA (RQAOA) [19] up to depth 3 on the Erdős-Rényi graphs. We limit the number of QAOA circuit calls per iteration to 3 (in contrast to a de novo optimization which would require many more calls), and achieve a 0.94 median approximation ratio. With our approaches, we obtain similar performances to the case where we extensively optimize the angles, hence saving numerous circuit calls.

The structure of the paper is as follows. Section 2 provides the necessary background and related works. Section 3 analyses the optimal angles found in both problems, pointing to concentration effects and the suitability of clustering. Section 4 shows different unsupervised learning strategies using different data encoding for clustering and the comparison between them. Section 5 sums up our experiments on RQAOA. We conclude this work with a discussion in Sect. 6.

2 Background

2.1 QUBO and QAOA

Quadratic Unconstrained Binary Optimization (QUBO) problems are specified by the formulation \(\min_{x \in \{0,1\}^{n}} \sum_{i \le j} x_{i} Q_{ij} x_{j}\) where n is the dimensionality of the problem and \(Q\in \mathbb{R}^{n\times n}\). This formulation is connected to the task of finding so-called «ground states» of «Ising models», i.e., configurations of binary labels \(\{1,-1\}\) minimising the energy of spin Hamiltonians, commonly tackled in statistical physics and quantum computing:

$$\begin{aligned} \min_{s \in \{-1,1\}^{n}} \sum_{i} h_{i} s_{i} + \sum_{j>i} J_{ij} s_{i} s_{j}, \end{aligned}$$

where \(h_{i}\) are the biases and \(J_{ij}\) the interactions between spins. QUBO can express an exceptional variety of combinatorial optimization (CO) problems such as Quadratic Assignment, Constraint Satisfaction Problems, Graph Coloring, and Maximum Cut [20].
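The correspondence between the two formulations can be made concrete with the substitution \(x_{i} = (1+s_{i})/2\). The following is a minimal sketch, assuming that only the entries of Q with \(i \le j\) contribute (as in the sum above); the function names are illustrative and do not come from the paper. It returns the biases, the couplings and a constant offset, and checks on a small random matrix that both energies agree for every configuration.

```python
import itertools
import numpy as np

def qubo_to_ising(Q):
    """Map sum_{i<=j} x_i Q_ij x_j (x in {0,1}) to Ising form via x_i = (1 + s_i) / 2."""
    Q = np.triu(np.asarray(Q, dtype=float))      # keep only the i <= j terms
    diag = np.diag(Q).copy()
    off = Q - np.diag(diag)                      # strictly upper-triangular part
    J = off / 4.0                                # couplings J_ij for j > i
    h = diag / 2.0 + (off.sum(axis=0) + off.sum(axis=1)) / 4.0
    offset = diag.sum() / 2.0 + off.sum() / 4.0  # constant energy shift
    return h, J, offset

def qubo_energy(Q, x):
    """QUBO cost of a 0/1 vector x, using the i <= j entries of Q."""
    return float(x @ np.triu(np.asarray(Q, dtype=float)) @ x)

def ising_energy(h, J, s):
    """Ising energy of a +/-1 vector s; J is strictly upper triangular."""
    return float(h @ s + s @ J @ s)

# Consistency check on a small random instance (illustrative data only).
rng = np.random.default_rng(0)
Q_demo = rng.uniform(-1.0, 1.0, size=(4, 4))
h_demo, J_demo, offset_demo = qubo_to_ising(Q_demo)
max_gap = max(
    abs(
        qubo_energy(Q_demo, np.array(x))
        - (ising_energy(h_demo, J_demo, 2 * np.array(x) - 1) + offset_demo)
    )
    for x in itertools.product((0, 1), repeat=4)
)
```

The offset does not change which configuration is optimal, so it can be dropped when only the minimizer is of interest.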

The QAOA algorithm [4] was inspired by adiabatic quantum computing with the goal to tackle CO problems. It consists of a quantum circuit whose construction depends on the classical cost function. Indeed, the latter is encoded in a quantum Hamiltonian defined on N qubits by replacing each variable \(s_{i}\) in Eq. (1) by the single-qubit operator \(\sigma _{i}^{z}\):

$$\begin{aligned} H_{C} = \sum_{i} h_{i} \sigma _{i}^{z} + \sum_{j>i} J_{ij} \sigma _{i}^{z} \sigma _{j}^{z} . \end{aligned}$$

Here, the bitstring corresponding to the ground state of \(H_{C}\) also minimizes the cost function. Another Hamiltonian named mixer \(H_{B} = \sum_{j=1}^{N} \sigma _{j}^{x}\) is also employed in QAOA. These operators are then used for building a quantum circuit with real parameters and organized as layers. This circuit is initialized in the \({ \vert {+} \rangle }^{\otimes N}\) state, corresponding to all bitstrings in superposition with equal probability of being measured. Then, applying p layers sequentially yields the following quantum state:

$$ {\vert{\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle } = e^{-i \beta _{p} H_{B}} e^{-i\gamma _{p} H_{C}} \cdots e^{-i\beta _{1} H_{B}} e^{-i \gamma _{1} H_{C}} {\vert{+} \rangle }^{\otimes N}, $$

defined by 2p real parameters \(\gamma _{i},\beta _{i}\), \(i=1,\ldots,p\) or QAOA angles as they correspond to angles of parameterized quantum gates. Such output corresponds to a probability distribution over all possible bitstrings. The classical optimization challenge of QAOA is to find the sequence of angles γ, β minimizing the expected value of the cost function from the measurement outcome. In the limit of infinite depth, the distribution will converge to the global optimum.
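For small N, the state above and its expected cost can be evaluated exactly with a dense statevector. The sketch below is an illustrative NumPy simulation, not the simulator used in this work: it assumes the Ising form of \(H_{C}\) given earlier, stores \(H_{C}\) as a vector of \(2^{N}\) diagonal energies, and applies \(e^{-i\beta \sigma ^{x}}\) qubit by qubit. All function names are hypothetical.

```python
import numpy as np

def ising_diagonal(h, J):
    """Energies h.s + sum_{j>i} J_ij s_i s_j for all 2^N spin configurations."""
    n = len(h)
    idx = np.arange(2 ** n)
    # Bit k of the basis index maps to spin +1 (bit 0) or -1 (bit 1).
    spins = 1.0 - 2.0 * ((idx[:, None] >> np.arange(n)) & 1)
    return spins @ h + np.einsum("ki,ij,kj->k", spins, np.triu(J, 1), spins)

def apply_mixer(state, beta, n):
    """Apply exp(-i beta sigma_x) to every qubit of an n-qubit statevector."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    for q in range(n):
        psi = state.reshape(2 ** (n - q - 1), 2, 2 ** q)
        # Reversing the middle axis swaps |0> and |1> on qubit q (sigma_x).
        state = (c * psi + s * psi[:, ::-1, :]).reshape(-1)
    return state

def qaoa_expectation(gammas, betas, h, J):
    """Expected value of H_C in the depth-p QAOA state |gamma, beta>."""
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    n = len(h)
    energies = ising_diagonal(h, J)
    state = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)  # |+>^N
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * energies) * state       # cost layer
        state = apply_mixer(state, beta, n)                  # mixer layer
    return float(np.real(np.sum(np.abs(state) ** 2 * energies)))

# Single antiferromagnetic edge (illustrative instance): h = 0, J_01 = 1.
h_edge = np.zeros(2)
J_edge = np.array([[0.0, 1.0], [0.0, 0.0]])
value_edge = qaoa_expectation([np.pi / 4], [-np.pi / 8], h_edge, J_edge)
```

The memory cost grows as \(2^{N}\), so this kind of exact evaluation is only practical for instance sizes like the ones considered here.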

An interesting property of the algorithm is the concentration of the QAOA objective for fixed angles [6], due to typical instances having (nearly) the same value of the objective function. Additionally, the QAOA landscape is instance-independent when instances come from a «reasonable» distribution (one in which the numbers of certain types of subgraphs of fixed size themselves concentrate, which in turn implies that the values concentrate). Hence, we can focus on finding good parameters on a subset of instances that could be re-applied to new ones, with a few extra calls to the quantum device in order to refine them. As stated earlier, in the most general case, characterizing which distributions are «reasonable» may be involved, and even characterizing the distribution at hand may be hard. Previous works [6, 13, 14], referenced in [12], reported concentrations over optimal parameters even when QAOA is applied on random instances. These distributions over optimal parameters are empirically shown to behave non-trivially with respect to n. The authors of [12] referred to this problem as the «folklore of concentrations».

Hence, even though angles concentrate in many settings asymptotically, for finite-size problems, different areas of concentration may arise. Therefore, choosing good angle values is challenging, especially when considering the runtime of quantum algorithms. As such, some studies built on this property and resorted to using Machine Learning (ML) or characterizing instances by some properties for finding good QAOA parameters. We present a few of them in the next subsection.

2.2 Related work

Many previous works have extensively employed the concentration property [7–14]. Among them, a few employed ML or designed strategies for setting good QAOA parameters for different objectives. In [8], a simple kernel density model was trained on the best angles and instances solved by QAOA to exhibit better QAOA optimization than the Nelder-Mead optimizer. Parameter fixing strategies for QAOA are also studied in [7, 9] where the best-found angles at depth p are used as starting points for depth \(p+1\) before using a classical optimizer.

[13] present a strategy to find good parameters for QAOA based on topological properties of the problem graph and tensor network techniques. [10] point out that the success of transferring parameters between different problem instances can be explained and predicted based on the types of subgraphs composing a graph. Finally, meta-learning is used in [11] to learn good initial angles for QAOA. They focused on initialization-based meta-learners in which a single set of parameters is used for a distribution of problems as initial parameters of a gradient-based optimizer. The meta-learner is a simple neural network that takes as inputs some meta-features of the QAOA circuit (the depth and which angle's value to output) to predict the angles to apply. However, no instance-related features are involved in their work.

In our case, we focus on clustering with the goal of proposing many parameter values to try for new QAOA circuits. In contrast to all the approaches we discussed above, we do not use a classical optimization loop after setting them. Hence, our approaches allow balancing between circuit calls of small quantum computers and performances. Such settings, for instance, naturally occur in divide-and-conquer-type schemes to enable smaller quantum computers to improve optimization [15–18], or in Recursive-QAOA [19] as we demonstrate later.

3 Revisiting the concentration property

In contrast to previous related works, we propose unsupervised approaches that also exploit these concentration effects. We take a data-driven approach where, from examples of good angles, we infer good angles for new instances. Namely, we use clustering in order to obtain clusters that can be used to reduce calls to the quantum device to small numbers (in our case, fewer than 10) when applying QAOA on new instances, without further optimization.

We take a usual ML approach to this problem. First, from generated instances, we apply exploratory data analysis [21] (EDA) that suggests clustering may be a good approach for recommending good angles to new instances. Namely, we look at the density of angle values and apply t-distributed stochastic neighbor embedding (t-SNE) [22] for visualizing concentration effects. t-SNE is a nonlinear dimensionality reduction technique for mapping high-dimensional data to a lower d-dimensional space (typically \(d\in \{2,3\}\)). Briefly, this method constructs a probability distribution to measure the similarity between each pair of points, where closer pairs are assigned with a higher probability. Then, in the lower-dimensional space \(\mathbb{R}^{d}\), we use a Student t-based distribution to quantify the similarity among the embeddings of the original data points. Finally, the optimal embeddings are chosen by minimizing the Kullback–Leibler divergence between the similarity distributions in the original and the lower-dimensional spaces. We follow by explaining how clustering is used in order to recommend angles for new instances. The approaches we outline differ in input to the clustering algorithm. We consider clustering from the angle values directly but also from instance encodings. Finally, we compare these approaches allowing us to provide recommendations for their usage.

3.1 Data generation

We generated two datasets that show different concentration behavior. The first one consists of 200 Erdős-Rényi graphs for MaxCut problems. The graphs have 10, 12, 14, 16 and 18 nodes. We utilized the following probabilities of edge creation: 0.5, 0.6, 0.7, 0.8. We have generated 10 graphs per number of nodes and probability. The second dataset consists of 100 instances of QUBO problems, specified by their weight matrix Q (20 per aforementioned number of nodes). Their coefficients are sampled uniformly in \([-1,1]\). For the purpose of computing approximation ratios, we are interested in \(C_{\mathrm{opt}}\), the optimal value of the MaxCut (or QUBO) objective over all possible bit configurations; as a reference, this was computed by brute force. Our experiments were carried out using a classical simulator.
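For the graph sizes considered here, both the instance generation and the brute-force reference value are cheap to reproduce. The following is a minimal sketch using only the standard library; the function names and the seed handling are assumptions made for illustration, not the authors' code.

```python
import itertools
import random

def erdos_renyi_edges(n, p, seed=None):
    """Sample an Erdős-Rényi graph G(n, p) as a list of edges (i, j) with i < j."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def cut_value(edges, bits):
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for i, j in edges if bits[i] != bits[j])

def brute_force_maxcut(n, edges):
    """Exact MaxCut value C_opt by enumerating all bipartitions.

    Node 0 is fixed on side 0 because flipping every bit leaves the cut unchanged,
    which halves the search space to 2^(n-1) configurations.
    """
    best = 0
    for rest in itertools.product((0, 1), repeat=n - 1):
        best = max(best, cut_value(edges, (0,) + rest))
    return best

# Small illustrative instance (not one of the 200 graphs of the dataset).
edges_demo = erdos_renyi_edges(10, 0.5, seed=1)
c_opt_demo = brute_force_maxcut(10, edges_demo)
```

At 18 nodes the enumeration visits \(2^{17}\) configurations, which remains feasible in pure Python but is the part worth vectorizing if many instances are generated.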

We then obtained for each problem the best set of angles by running the BFGS optimizer [23] 1000 times for \(p=1,2,3\), and selecting the ones which achieve the best QAOA objective. BFGS with random restarts is deemed a very good optimizer for continuous differentiable functions [24]. These angles are saved as a database, to which we then apply unsupervised approaches to learn to set optimal angles for unseen instances. Our approach is clearly optimization method-specific but can be applied to other state-of-the-art optimizers. Different optimizers would give different data (as the optimizers could fail to find the optimal QAOA parameters) but they can be combined, and one would select the best set of angles found among all those considered.
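The restart procedure described above can be sketched with SciPy's `minimize` and `method="BFGS"`. This is an illustration under stated assumptions: the objective below is a toy periodic landscape in two angles standing in for a depth-1 QAOA objective, the sampling range of the starting points and the number of restarts are arbitrary choices, and `best_of_restarts` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import minimize

def toy_objective(angles):
    """Toy stand-in for a depth-1 QAOA objective in (gamma, beta); minimum value -1."""
    gamma, beta = angles
    return np.sin(4.0 * beta) * np.sin(2.0 * gamma)

def best_of_restarts(objective, n_params, n_restarts=20, seed=0):
    """Run BFGS from random starting points and keep the run with the lowest value."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = rng.uniform(-np.pi, np.pi, size=n_params)  # assumed sampling range
        result = minimize(objective, x0, method="BFGS")
        if best is None or result.fun < best.fun:
            best = result
    return best

best_run = best_of_restarts(toy_objective, n_params=2)
```

The returned `OptimizeResult` exposes `x` (the angles), `fun` (the objective value) and `nfev` (the number of function evaluations of that run), the last of which is the natural quantity to compare against a fixed budget of circuit calls.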

3.2 Exploratory data analysis

Having obtained the optimal angles, we apply EDA to observe concentration effects. We look at their corresponding performance ratios using the average cost yielded by QAOA for angles γ, β, denoted by \(E_{\gamma ,\beta} (C)\). For MaxCut on unweighted Erdős-Rényi graphs, we compute approximation ratios as \(\frac{E_{\gamma ,\beta} (C)}{C_{\mathrm{opt}}}\). This value is upper bounded by 1, which is the optimal value. For QUBOs, we compute optimality gaps \(\frac{C_{\mathrm{opt}} - E_{\gamma ,\beta} (C)}{C_{\mathrm{opt}}} \), as the optima were all negative; the closer to 0, the better. Figure 1 shows the ratios with respect to depth. Increasing depth results in better ratios.
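A short worked example may help with the sign conventions of the two measures. The numbers below are made up for illustration and are not taken from the experiments.

```python
def approximation_ratio(expected_cut, c_opt):
    """MaxCut: E(C) / C_opt, at most 1 when C_opt is the maximum cut."""
    return expected_cut / c_opt

def optimality_gap(expected_cost, c_opt):
    """QUBO with negative optimum: (C_opt - E(C)) / C_opt, at least 0, best at 0."""
    return (c_opt - expected_cost) / c_opt

# Illustrative values: an expected cut of 8 against a maximum cut of 10,
# and an expected QUBO cost of -3 against a minimum of -4.
ratio_demo = approximation_ratio(8.0, 10.0)
gap_demo = optimality_gap(-3.0, -4.0)
```

With a negative optimum, the numerator and denominator of the gap are both non-positive, so the gap is non-negative and shrinks to 0 as the expected cost approaches the optimum.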

Figure 1

Violin plots of ratios on MaxCut and optimality gaps over QUBOs (bottom plot) for \(p = 1, 2, 3\). The respective median by depth is 0.802954, 0.827901, 0.840478 for MaxCuts and 0.457434, 0.335144, 0.278984 for QUBOs, illustrating improved performances with increased depth

Next, we looked at the distribution of \(\gamma _{i}\), \(\beta _{i}\) values. Figure 2 shows that the concentration of each parameter is significant, since the corresponding density functions are sharply peaked. We also observed multiple clusters of angles, as the density functions are multimodal. Finally, we applied t-SNE with two components to visualize the angle values in 2D for \(p=2,3\). This highlights a potential number of clusters for each depth and problem. Note that we may not obtain global optima with these angles, nor do we know whether they are unique.

Figure 2

Distribution of angle values \(\gamma _{i}\), \(\beta _{i}\) for each depth. Plots a), b) and c) concern MaxCut problems while the others refer to the dense QUBO matrices. We witness concentration effects of the angle values, suggesting the suitability of clustering as an angle setting strategy

We notice that the probability of edge creation, represented by a different color, does not seem to influence the clusters. For dense QUBOs, we observe one important cluster and a few instances that start to form another. Finally, in the dense instances case, we witness a larger spread in angle values at depth 1. This can be explained by differences between instances. Although the concentration effect is present, a spread of this magnitude will impact the performances of parameter setting strategies, and makes for an interesting playground to benchmark them.

Using clustering techniques can then reveal potential areas of QAOA angle values where good angles can be found to try on new instances. The angle values related to clusters can be used as recommendations for new instances. This becomes interesting as it lowers runtime and allows comparisons based on function evaluations, or on the number of quantum circuit calls, in algorithms where QAOA is used as a subroutine.

4 Clustering-based (unsupervised) learning for angles

As the EDA highlights a clustering effect, we propose different clustering approaches that use different data for angle recommendations. Namely, we describe first using the angle values directly for building clusters serving as angles to try. Then, we switch to using instance-related features. Finally, for the unweighted case, we use graph auto-encoders whose outputs can be used for clustering instead of computing graph features. In the following, we detail each clustering approach for flexible angle recommendation.

4.1 Identifying clusters of angles or problem instances

We first considered clustering using angle values. Given a database of optimal angles for Q problem instances \(\{ I_{1}, \ldots , I_{Q} \}\), \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots ,(\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\), this can be seen as computing or selecting a good set of angle values from the database to apply on new instances. In this case, we do not use the problem instances during clustering. Given a user-specified number of angles to be tested K, this set of angle values is then applied to new QAOA circuits. To specify them, we can use a clustering algorithm on the database \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots , (\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\). For instance, K-means [25] will output centroids to use directly as angle recommendations for QAOA on new instances. The K-means algorithm aims to partition a set of n data points \(x_{i}\) into K disjoint clusters C, characterized by the mean/centroid of the points within a cluster, denoted \(\mu _{j}\). The partition \(P = \{P_{1}, P_{2}, \ldots , P_{K}\}\) (\(\forall i\neq j \in [1..K]\), \(P_{i} \neq \emptyset \), \(P_{i} \cap P_{j}=\emptyset \), \(\cup _{i} P_{i} = \{x_{i}\}_{i=1}^{n}\)) is chosen by minimizing the within-cluster sum of squares, i.e., \(\operatorname{arg\,min}_{P}\sum_{i=1}^{K}\sum_{x\in P_{i}}||x - \mu _{i}||^{2}\), where the centroid \(\mu _{i} = |P_{i}|^{-1}\sum_{x \in P_{i}}x\). The algorithm iteratively assigns each data point to its nearest centroid and updates each centroid as the mean of its assigned points, until convergence.
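The iteration just described (Lloyd's algorithm) fits in a few lines of NumPy. The sketch below is illustrative: the random initialization from data points, the iteration cap and the helper that returns the data point nearest to each centroid are assumptions chosen for brevity, and the toy array stands in for a database of optimal angles.

```python
import numpy as np

def squared_distances(X, centroids):
    """Matrix of squared Euclidean distances, shape (n_points, n_centroids)."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = squared_distances(X, centroids).argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = squared_distances(X, centroids).argmin(axis=1)
    return centroids, labels

def nearest_points(X, centroids):
    """Index of the data point closest to each centroid."""
    return squared_distances(X, centroids).argmin(axis=0)

# Toy "angle database" with two well-separated groups (illustrative values).
angles_demo = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
)
centroids_demo, labels_demo = kmeans(angles_demo, k=2)
nearest_demo = nearest_points(angles_demo, centroids_demo)
```

Either `centroids_demo` or `angles_demo[nearest_demo]` can then serve as the K angle sets to try on a new instance; the second option returns angles that were actually optimal for some training instance.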

To incorporate knowledge from instances when recommending angles, we change the data fed to the clustering algorithm. We distinguish computing instance features from learning an embedding, that is a user-defined F-dimensional representation or encoding of the instances as data. We denote an encoding of an instance \(I_{t}\) as \(f(I_{t})\). The angle recommendation framework using a clustering algorithm for such instance representation is presented in Algorithm 1. First, clusters are learned from the encodings extracted from training data. Then, we find the instances in the database that are the closest in distance to the clusters, and their corresponding optimal angles. The latter are then used for QAOA circuits on new instances, from which we keep the best QAOA output.

Algorithm 1

K-angle recommendation framework for QAOA
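Following the textual description of the framework, its three steps can be sketched as below. This is a reconstruction from the prose, not the authors' listing: the small K-means routine, the function names and the toy data are assumptions, and `qaoa_cost` is a hypothetical callable standing in for one evaluation of a QAOA circuit on the new instance (lower is better).

```python
import numpy as np

def kmeans_centroids(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns the k centroids of the encodings."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def recommend_angles(encodings, optimal_angles, k, seed=0):
    """Steps 1-2: cluster the encodings f(I_t), then return the optimal angles
    of the training instance closest to each cluster centroid."""
    centroids = kmeans_centroids(encodings, k, seed=seed)
    d2 = ((encodings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=0)          # one training instance per cluster
    return optimal_angles[nearest]

def best_of_candidates(qaoa_cost, candidates):
    """Step 3: evaluate each candidate once and keep the best output."""
    costs = [qaoa_cost(angles) for angles in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Toy database: 8 encodings in two groups, each with its own optimal angles.
encodings_demo = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
)
optimal_demo = np.array([[0.3, 0.2]] * 4 + [[0.9, 0.6]] * 4)
candidates_demo = recommend_angles(encodings_demo, optimal_demo, k=2)
# Hypothetical cost of a new instance whose best angles are (0.9, 0.6).
chosen_demo, cost_demo = best_of_candidates(
    lambda a: (a[0] - 0.9) ** 2 + (a[1] - 0.6) ** 2, candidates_demo
)
```

The number of circuit evaluations per new instance is exactly the number of candidates K, with no optimization loop afterwards.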

4.2 Instance encodings

In this work, we show two main approaches to encoding the instances for clustering. First, we computed a set of features following [17, 26]. Such features were used in [26] to decide among classical heuristics to solve MaxCut and QUBO problems. Inspired by [26], the features were also used for choosing when to apply QAOA against a classical approximation algorithm [17]. For Erdős-Rényi graphs, we took the graph density, the logarithm of the number of nodes and edges, the logarithm of the first and second-largest eigenvalues of the Laplacian matrix normalized by the average node degree and the logarithm of the ratio of the two largest eigenvalues. For QUBOs, we reduced them to the MaxCut formulation and used the logarithm of the number of nodes, and the weighted Laplacian matrix eigenvalues-based features.
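As an illustration, the graph features listed above can be computed from an adjacency matrix as follows. This sketch rests on one reading of the description, namely that each of the two largest Laplacian eigenvalues is divided by the average degree before taking the logarithm; the exact definitions in [17, 26] may differ, and the function name is hypothetical. It assumes a graph with at least two nonzero Laplacian eigenvalues.

```python
import numpy as np

def graph_features(adjacency):
    """Six features of an unweighted graph given its symmetric 0/1 adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    m = A.sum() / 2.0                        # number of edges
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))[::-1]
    avg_degree = degrees.mean()
    lam1 = eigenvalues[0] / avg_degree       # largest, normalized (assumed reading)
    lam2 = eigenvalues[1] / avg_degree       # second largest, normalized
    return np.array([
        2.0 * m / (n * (n - 1)),             # graph density
        np.log(n),
        np.log(m),
        np.log(lam1),
        np.log(lam2),
        np.log(lam1 / lam2),                 # log-ratio of the two largest
    ])

# Complete graph on 4 nodes as a small check case.
k4 = np.ones((4, 4)) - np.eye(4)
features_k4 = graph_features(k4)
```

For weighted instances, the same routine applies to the weighted Laplacian once the QUBO has been reduced to MaxCut, but negative weights can produce non-positive eigenvalues, in which case the logarithms need separate handling.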

We also show how to use graph embeddings using Graph Neural Networks (GNNs) [27], avoiding the need for the user to compute the features. We employ Variational Graph Auto-Encoders (VGAE) [28]. This technique only works on unweighted graphs by its design principle. Consequently, we only applied it to the MaxCut instances later in this work. A VGAE learns latent embeddings \(\mathbf{Z} \in \mathbb{R}^{N\times F}\) where F is the dimension of the latent variables and N the number of nodes. Given the adjacency matrix A and node feature matrix X, the model outputs the parameters μ, σ of a Gaussian distribution from which the latent representation is generated. We feed the Erdős-Rényi graphs to the model, and we add the degree of the nodes as node features. Once learning is completed, we compute the embeddings by a common average readout operation [27, 29]. The latter operation can be defined as averaging the node embeddings for a graph with vertex set \(\mathcal{V}\): \(\frac{1}{|\mathcal{V}|}\sum_{n\in \mathcal{V}}Z_{n}\). This allows having a fixed dimension F for the encoding to be used by a clustering algorithm.
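The role of the readout is simply to turn an \(N\times F\) matrix of node embeddings into a single F-dimensional vector, whatever the number of nodes. A minimal sketch follows, in which random arrays stand in for the output of a trained VGAE (no model is trained here) and `mean_readout` is an illustrative name.

```python
import numpy as np

def mean_readout(node_embeddings):
    """Average the rows Z_n of an (N, F) embedding matrix into one F-vector."""
    return np.asarray(node_embeddings, dtype=float).mean(axis=0)

# Placeholder node embeddings for a 10-node and an 18-node graph, F = 16.
rng = np.random.default_rng(0)
z_small = rng.normal(size=(10, 16))
z_large = rng.normal(size=(18, 16))
g_small = mean_readout(z_small)
g_large = mean_readout(z_large)
```

Both graph-level vectors have the same length, so graphs of different sizes can be clustered together.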

Having defined different strategies for clustering, we apply them to the data we generated and compare their performances. In the following section, we present our results obtained by taking a Machine Learning approach, starting from a simple baseline and cross-validating each method.

4.3 Results

In this section, we apply the above-mentioned strategies to the generated data where EDA revealed different areas of concentration. As the first baseline for the angle setting strategy, we experiment with simple aggregation of angle values (median and average). We then use K-means as the underlying clustering algorithm, varying the number of clusters from 3 to 10. Finally, we change the K-means data to cluster based on instance encodings instead of angle values. We first computed a set of graph features that were used in a previous study [30]. Then we investigate graph autoencoders to learn the encodings of the MaxCut instances. We cross-validate each method using 5-fold cross-validation where we report the ratios \(\frac{(C_{\mathrm{opt}} - E_{\gamma ,\beta} (C))}{(C_{\mathrm{opt}}- E^{\mathit{cluster}}_{\gamma ,\beta} (C))} \) on test instances. A value higher than 1 would mean that the average cost yielded by clustering has improved over the one found by optimization. We also consider the case where one trains on smaller instances to apply to the bigger ones.

4.3.1 From angle values

As a simple baseline, we compute the average and the median of the optimal angles from the database \(\{ (\gamma ^{\ast} , \beta ^{\ast})_{1}, \ldots ,(\gamma ^{\ast} , \beta ^{\ast})_{Q} \}\). From depth-aggregated results, averaging the angle values yielded a median ratio of 0.524 for MaxCut and 0.672 for QUBOs, while taking the median values increased it to 0.950 and 0.941, respectively. This can be explained by the fact that the median is statistically more robust than the mean when handling data sets with large variability.

As expected with K-means, increasing the number of clusters yielded better median ratios. With \(K=10\), the median ratios are 0.998 and 0.985 on each dataset, a less than 1–2% reduction in performances w.r.t. the optimal angles. Figure 4 shows the improvement with increased number of clusters. We observe also that with increased depth, median ratio performances are reduced. We conjecture that, when the dimension of the parameter space increases, more clusters are naturally needed to ensure a sensible recommendation.

Also, such a deterioration of performance w.r.t. circuit depth is more substantial on the QUBO instances than on the MaxCut ones, which can be explained by the clustering patterns in the MaxCut scenario being more significant and regular (Fig. 3). In addition, this observation suggests that, for dense QUBO instances where the cluster center is not representative of all points belonging to it, it would be more reasonable in future work to use a supervised learning method, which takes the problem instance as input and predicts the optimal angle values.

Figure 3

2D angles visualization \(\gamma _{i}\), \(\beta _{i}\) for each depth. Plots a), b) and c) concern MaxCut problems while the others refer to the dense QUBO matrices. For \(p=2,3\), t-SNE is applied for projecting the angle values to 2D. Different areas of concentration are revealed again. We use different colors for differentiating the probability of edge creation of the Erdős-Rényi graphs, showing no correlation with clusters

Figure 4

Boxplot visualization of ratios to optimal angles’ expectation value per clustering method and depth on MaxCut (a) and dense QUBOs (b), when using the angle values. We show also the boxplots when computing the median angle values, yielding a median ratio of 0.524307 for MaxCut and 0.671572 for QUBOs. The median ratios are respectively 0.990579, 0.995610 and 0.998293 for 3, 5 and 10 clusters. For QUBOs, we get 0.956099, 0.970842, and 0.984787 taking the same number of clusters. With reference to the optimal angles’ expectation value, this corresponds on average to a less than 1–2% reduction in performances when using 10 clusters. For MaxCut, we had to use the closest data point in the dataset to the cluster, as it results in better performances. For instance, with 3 clusters at \(p=3\), the median ratio was 0.618275

We also observed that, for the MaxCut problem, the cluster centroid of K-means can be quite distant from the data points when the number of clusters is small and the circuit depth is high. Particularly, this phenomenon deteriorates the median ratio by ca. 30% for 3 and 4 clusters with \(p=3\). Hence, we decided to take the closest data point to the centroid in each cluster as the recommendation, which solves this issue. For QUBOs, using the cluster centroids directly yields better results.

Overall, increasing the number of angles attempted will improve the quality of the QAOA output. Clearly, the results with fewer than 4 clusters present examples where the ratio is low, worsening the median performances. For instance, with 3 clusters on QUBOs, the median ratio is 0.915. In the context where the budget of quantum circuit calls is very limited, this could be problematic and call for more robust approaches. To this end, we consider using instance features for clustering.

4.3.2 From instance encodings

To assess whether using instance features can improve the quality of clustering, we divided the ratios obtained with instance features by the ones obtained using angle values. We show these results in Fig. 5 and Fig. 6, where we can clearly see better ratios with fewer than 4 clusters, and similar results on average otherwise.

Figure 5

Boxplot visualization of ratios to the optimal angles’ expectation value per clustering method and depth on MaxCut (a) and dense QUBOs (b), when using instance features. For Erdős-Rényi graphs, K-means yielded ratios 0.996214, 0.996368 with 3, 4 clusters and 0.998429 with 10. On dense QUBOs, we obtained respective median ratios of 0.963129, 0.971778 and 0.982964. With reference to the optimal angles’ expectation value, this corresponds on average to a less than 1–2% reduction in performances when using 10 clusters

Figure 6

Boxplot of ratios comparing K-means with instance features against angle values on MaxCut (a) and QUBOs (b). A value higher than 1 (highlighted by a horizontal line) means using instance features results in better QAOA objective. We see an overall improvement with 3, 4 clusters mainly at \(p=3\)

As for learned encodings or embeddings with auto-encoders, the GNN model configuration we use is the same two-layer graph convolutional network as in [28]. Namely, the first layer has a 32-dimensional output and uses the ReLU activation function. This is followed by two 16-dimensional output layers for the generation of the latent variables. We train using Adam with a learning rate of 0.01 for 100 epochs and batch size set to the dataset size. Our implementation uses the Deep Graph Library (DGL) [29]. The embeddings obtained by averaging are of dimension \(F=16\). This allows having a fixed dimension for the encoding as input of the same K-means strategy described above. We observe in Fig. 7 that the results are similar to the ones obtained using instance features. Yet, in some instances, we see better results. Hence, several clustering results can be combined to improve the ratios, each compensating for the other's weaknesses, at the cost of trying more angles to find the best ones. As future work, we could also decide which heuristic to use for a given test instance by using an ML model.

Figure 7

Boxplot visualization of ratios on MaxCut obtained using Variational Graph Auto-Encoders compared to using instance features. A value higher than 1 means using VGAE results in a better QAOA objective. Overall, performances are similar as the ratios are close to 1 on average

Finally, our approaches can save numerous circuit calls compared to de novo optimization. The median numbers of circuit calls for the BFGS runs giving the best QAOA angles were 56, 150, 320 for each depth respectively on MaxCut and 44, 132, 252 for QUBO, while in the clustering approach the number of calls always equals the number of clusters, which is considerably smaller than the cost of BFGS. Instance size does not seem to affect the number of circuit calls by BFGS. In our approaches, we limited circuit calls to 10 and we do not need multiple restarts.

4.4 Aggregating results

Following the presentation of the different clustering approaches, we compare their performances to determine which approach works best. We propose to take the empirical cumulative distribution function (ECDF) of the ratios as the performance measure to compare those different approaches. Given a sample \(\{r_{i}\}_{i=1}^{R}\) of the ratios and a value of interest \(t\in [0, 1]\), the ECDF is the fraction of the sample points less than or equal to t: \(F(t) = \frac{1}{R} \sum_{i=1}^{R} \mathbf{1}_{[r_{i}, \infty )}(t)\), where 1 denotes the indicator function, which returns one if \(r_{i} \le t\) and zero otherwise. ECDFs enable us to aggregate the results of the different numbers of clusters and depths. A better method will have a larger proportion of high ratios, resulting in an ECDF curve located further to the right (equivalently, lower). From Fig. 8, we observe that using instance encodings is more successful in yielding better angles than using the angle values. This is also witnessed in Fig. 9 with increased depth and a low number of clusters. Also, VGAE seems to be slightly better than instance features on the MaxCut problems. However, these methods can complement each other, especially as we do not need to increase dataset size. Hence, combining them at the cost of circuit calls becomes an option for running QAOA, as we showcase with RQAOA in the next section.
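Computing the ECDF, understood as the fraction of sample points less than or equal to t, requires only a sort and a binary search. The following sketch uses NumPy; the function name and the sample values are illustrative.

```python
import numpy as np

def ecdf(samples, t):
    """Fraction of samples that are <= t; t may be a scalar or an array."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    # side="right" counts the entries that are <= t.
    return np.searchsorted(ordered, t, side="right") / ordered.size

# Two made-up sets of ratios: method B has more high ratios than method A.
ratios_a = [0.60, 0.80, 0.90, 0.95]
ratios_b = [0.90, 0.95, 0.98, 1.00]
grid = np.linspace(0.0, 1.0, 11)
curve_a = ecdf(ratios_a, grid)
curve_b = ecdf(ratios_b, grid)
```

On this toy data the curve of method B lies at or below that of method A everywhere on the grid, which is the pattern that indicates the better method in the figures.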

Figure 8

Empirical cumulative distribution functions of ratios to optimal angles for all depths and numbers of clusters. A lower curve for an approach means better results when aggregating over depths and numbers of clusters. We see for MaxCut (a) and QUBO problems (b) that instance features achieve better results, with VGAE being competitive with instance features. When using 3 clusters, VGAE on MaxCut instances and instance features on QUBOs lead to better ratios

Figure 9

Empirical cumulative distribution functions of ratios to optimal angles per method and depth. The lower the curve, the better the method. In most cases, the curve corresponding to instance features was lower (except for QUBOs (b) at \(p=1\), and VGAE’s curve was more competitive at \(p=2\) for MaxCut (a)). This was also the case when using 3 clusters

4.5 Case when test instances are bigger than training instances

An important consideration for these methods is how they scale. This is relevant in settings where one is interested in solving larger instances given smaller ones. In our case, we apply these approaches with \(K=3\) using a 60–40% train-test split, where the test set consists of the 40% of instances with the highest number of nodes. From Figs. 10 and 11, we reach similar conclusions, with VGAE on MaxCut and instance features on the QUBO problems, respectively, yielding better results. Note that we did not use the logarithm of the number of nodes and edges as features when using instance features, as the values differ too much between training and test instances.
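A size-based split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the helper `split_by_size`, the `n_nodes` field, and the toy database are hypothetical, and the split simply sends the largest instances to the test set.

```python
def split_by_size(instances, train_fraction=0.6):
    """Train on the smallest instances, test on the largest ones."""
    ordered = sorted(instances, key=lambda inst: inst["n_nodes"])
    cut = round(train_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]


# Hypothetical database: ten instances with 5 to 14 nodes.
database = [{"name": f"g{n}", "n_nodes": n} for n in range(14, 4, -1)]
train, test = split_by_size(database)

print([g["n_nodes"] for g in train])
print([g["n_nodes"] for g in test])
```

Sorting before cutting guarantees that every test instance is at least as large as every training instance, which is what makes the evaluation a test of generalization to larger sizes.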

Figure 10

Boxplot of ratios comparing K-means \(K=3\) using instance features against angle values on MaxCut (a) and QUBOs (b). The ratios are obtained from the 40% of the instances with the highest number of nodes. From depth-aggregated results, on MaxCut, using the median angle values gives a median ratio of 0.859316, clustering with angle values gives 0.928519, instance features give 0.976959, and VGAE gives 0.981618. On QUBOs, we obtained 0.926136 for the median of angle values, 0.936679 when clustering with angle values, and 0.963677 with instance features

Figure 11

Empirical cumulative distribution functions of ratios to optimal angles. The ratios are obtained from the 40% of the instances with the highest number of nodes. Similar results to Fig. 8 and Fig. 9 are obtained

5 Demonstration with RQAOA

RQAOA [19] is a recursive algorithm where, given an Ising problem \(\sum_{i,j} w_{ij} Z_{i} Z_{j} \), one starts by applying QAOA to it. The output quantum state \({ \vert {\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle }\) is then used to compute the correlations \(M_{ij} = { \langle{\boldsymbol{\gamma },\boldsymbol{\beta }} \vert } Z_{i} Z_{j} { \vert {\boldsymbol{\gamma },\boldsymbol{\beta }} \rangle } \). Then, variable elimination is carried out by selecting a pair of variables satisfying \((i_{l},j_{l}) = \operatorname{arg\,max} |M_{ij}|\), and substituting \(Z_{j_{l}}\) with \(\operatorname{sign}(M_{i_{l}, j_{l}})Z_{i_{l}}\) in the Ising formulation. This reduces the number of variables by 1. We thus obtain a new, reduced problem and repeat the procedure for a user-defined number of iterations. The number of iterations fixes the size of the final instance, which is then solved using a brute-force (or some other classical) approach, and the recorded substitutions are applied to its solution to obtain a final solution of the original problem.
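The classical part of one elimination step can be sketched as below. This is a simplified illustration, not the implementation used in this work: the function name `eliminate_variable` and the dictionary-based representation are assumptions, the correlations are taken as given (in practice they come from the QAOA state), and the argmax is taken over the pairs for which a correlation is supplied.

```python
def eliminate_variable(weights, correlations):
    """One RQAOA elimination step on an Ising problem sum_{i<j} w_ij Z_i Z_j.

    weights, correlations: dicts mapping a pair (i, j) with i < j to a float.
    Returns (reduced_weights, constant_shift, (j, i, sign)), where the
    triple records the substitution Z_j = sign * Z_i.
    """
    # Pair with the largest absolute correlation |M_ij|.
    i, j = max(correlations, key=lambda pair: abs(correlations[pair]))
    # A correlation of exactly zero is treated as positive here (assumption).
    sign = 1 if correlations[(i, j)] >= 0 else -1

    reduced, shift = {}, 0.0
    for (a, b), w in weights.items():
        if (a, b) == (i, j):
            # w_ij Z_i Z_j becomes the constant sign * w_ij.
            shift += sign * w
            continue
        if j in (a, b):
            # Replace Z_j by sign * Z_i in the coupling with the other spin.
            other = b if a == j else a
            a, b, w = min(i, other), max(i, other), sign * w
        reduced[(a, b)] = reduced.get((a, b), 0.0) + w
    # Drop couplings that cancelled exactly.
    reduced = {pair: w for pair, w in reduced.items() if w != 0.0}
    return reduced, shift, (j, i, sign)


# Toy example with made-up correlations (illustrative only).
weights = {(0, 1): 1.0, (1, 2): 2.0}
correlations = {(0, 1): 0.3, (1, 2): -0.9}
print(eliminate_variable(weights, correlations))
```

Recording each substitution as a triple makes it straightforward to replay the eliminations in reverse once the final small instance has been solved classically.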

As RQAOA requires optimizing many QAOA instances that iteratively shrink in size, we demonstrate the application of our clustering approaches in this context. We do so for the MaxCut problems, where we limit the number of iterations to half of the size of the Erdős-Rényi graphs. We do not consider the dense QUBOs, as RQAOA would reduce an original dense graph to non-dense intermediate subproblems that are not part of the database. As for the number of QAOA parameters attempted per iteration, we limit it to 3 and apply the three clustering approaches: angle-value, instance features, and VGAE-output based. We do so by using our previous database and training each method on all instances to get 3 QAOA parameter recommendations. The latter are then used for QAOA on the RQAOA-generated instances.
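When clustering is done on instance encodings, the cluster centroids contain no angle information, so the angles of the training instances closest to the centroids are used as recommendations. A minimal sketch of that lookup is given below, assuming the centroids have already been computed (for example by K-means). The function `recommend_angles` and all numerical values are hypothetical.

```python
from math import dist


def recommend_angles(centroids, encodings, angles):
    """For each centroid, return the angles of the nearest training instance.

    encodings[k] and angles[k] belong to the same training instance.
    """
    recommendations = []
    for centroid in centroids:
        nearest = min(
            range(len(encodings)),
            key=lambda k: dist(encodings[k], centroid),
        )
        recommendations.append(angles[nearest])
    return recommendations


# Toy 2D encodings and depth-1 angles (gamma, beta); illustrative only.
encodings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1), (2.0, 0.0)]
angles = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8), (0.9, 1.0)]
centroids = [(0.06, 0.0), (0.95, 1.0), (2.0, 0.1)]  # K = 3

print(recommend_angles(centroids, encodings, angles))
```

With K = 3 this yields three angle sets, so evaluating all of them on a new instance costs three circuit calls.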

Figure 12 shows that with the three approaches combined, we obtain a median approximation ratio of 0.94117 with RQAOA. The minimal ratio obtained is 0.8367 and the optima were found on 33 instances. When looking at each method independently, we observe that the angle-value clustering performances at \(p=3\) are lower than the others. This is due to the fact that we use the K-means clusters directly, as it allowed us to find more instances with a ratio of 1. Graph features and VGAE seem similar in performance, with a small advantage at depth 2 for VGAE. Looking at how often each method obtained the best ratio per instance, VGAE is more successful: the angle-value, instance-feature, and VGAE-based methods achieve the best-found ratios on 88, 118, and 165 instances, respectively. Finally, we also tried using random angles, sampled uniformly in \([0, 2\pi ]^{p}\), and further optimizing the angles from each approach with BFGS for at most 100 iterations. We clearly see better performances with clustering approaches compared to random angles. This is also the case when using BFGS (starting with random angles) limited to 3 circuit calls when optimizing, the same budget as our clustering-based approaches. Dividing the MaxCut ratios obtained with BFGS by the ones obtained without further optimization yielded a median value of 1. Hence, the results were similar to those of the BFGS-optimized approaches, while saving many circuit calls.

Figure 12

The violin plot represents ratios obtained on MaxCut using all three unsupervised approaches, with just 3 circuit calls per RQAOA iteration and without optimizing further with BFGS. A median ratio of 0.94117 was obtained. The boxplots represent the MaxCut ratios obtained using each approach. As baselines, we added random angles per iteration as well as BFGS (starting with random angles) with a budget of 3 angle values attempted during optimization, and we observe that the clustering approaches yielded better results. When dividing the ratios obtained by adding BFGS by the ratios of the methods alone, we obtain a median ratio of 1, meaning we saved many circuit calls for similar results with clustering

To conclude, our unsupervised approaches can be used to run quantum algorithms where QAOA is used as a subroutine. They can then be treated as hyperparameters that can be tweaked to achieve better performances for QAOA-featured algorithms, depending on a user-defined budget. In our RQAOA showcase, the maximal depth of QAOA, as well as the number of parameters to try at each iteration, was set to 3, and optimizing further did not improve the results. For MaxCut on Erdős-Rényi graphs, leveraging VGAE in RQAOA achieved the best ratios on 82.5% of the instances.

6 Discussion

In this work, we study different strategies for fixing the parameters of QAOA based on unsupervised learning. We focused on clustering, motivated by previous works highlighting the concentration property and by exploratory data analysis of the best angles found for MaxCut on Erdős-Rényi graphs and dense QUBOs. Compared to related work, however, we use a methodology closer to machine learning practice by cross-validating.

Furthermore, we demonstrated that these techniques can be leveraged to restrict the number of QAOA circuit calls to small numbers (less than 10) with a less than 1–2% reduction in approximation ratio on average from the best angles found when cross-validating. We also showed how to compare different clustering strategies and that leveraging instance encodings (by computing features or computing them with a model, in our case a VGAE) for angle-setting strategies yields better results than using angle values only. Although the VGAE embedding-based approach is quite competitive, we recommend using the simpler instance features in practice since the VGAE brings extra computational overhead. Regarding generalization with respect to problem scale, both the instance features- and VGAE-based clustering approaches manage to retain their performance on unseen problem instances larger than those in the training set. For dense QUBOs, increasing the number of clusters is less impactful than for MaxCut; we conjecture that the clusters in QUBO have a large spread and are less separable, hindering the performance of the clustering approach in higher dimensions. For both problems, it is necessary to increase the number of clusters to retain a good performance when the circuit becomes deeper.

From an application perspective, we envision these techniques being employed in algorithms where QAOA is run on a small part of the problem to solve, such as divide-and-conquer [15, 16] and iterative algorithms [17–19]. Restricting QAOA to a small number of circuit calls will help decrease the runtime of quantum-featured or quantum-enhanced algorithms, bringing them closer to competing with classical heuristics. We showcased our approaches in the context of Recursive QAOA as hyperparameters under a limited budget (QAOA depth and number of QAOA parameters per iteration limited to 3), where we were able to achieve a 0.94 median approximation ratio. With our approaches, we obtain performance comparable to the case where we extensively optimize the angles, hence saving numerous circuit calls.

For future work, other clustering techniques can be studied and extended to predict the angle values per instance in a semi-supervised approach, and for different problem instances. In addition, machine learning can be used to decide which heuristic to use for a given test instance. We also did not apply GNNs to the dense QUBOs, as graph autoencoders are mostly applied to unweighted graphs. Using a VGAE that can reconstruct both graph adjacency and node features is therefore another research direction. Since we use unsupervised methods, we expect the same methodology to be applicable on noisy hardware. Studying how resilient the different approaches are under various noise settings would also be of interest. Finally, these approaches can be studied within different QAOA-featured algorithms and under different settings (depth of QAOA, number of clusters, and properties of the Ising instances, to name a few).

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.


  1. Since the clustering algorithm outputs encodings that do not contain QAOA angle information, we use the QAOA angles of the closest training instances to the clusters.



Abbreviations

CO: Combinatorial Optimization
QAOA: Quantum Approximate Optimization Algorithm
ECDF: Empirical cumulative distribution functions
VGAE: Variational Graph Auto-Encoders
QUBO: Quadratic Unconstrained Binary Optimization
EDA: Exploratory Data Analysis


  1. Preskill J. Quantum Computing in the NISQ era and beyond. Quantum. 2018;2:79.

  2. Moll N, Barkoutsos P, Bishop LS, Chow JM, Cross A, Egger DJ, Filipp S, Fuhrer A, Gambetta JM, Ganzhorn M, Kandala A, Mezzacapo A, Müller P, Riess W, Salis G, Smolin J, Tavernelli I, Temme K. Quantum optimization using variational algorithms on near-term quantum devices. Quantum Sci Technol. 2018;3(3):030503.

  3. Benedetti M, Lloyd E, Sack S, Fiorentini M. Parameterized quantum circuits as machine learning models. Quantum Sci Technol. 2019;4(4):043001.

  4. Farhi E, Goldstone J, Gutmann S. A Quantum Approximate Optimization Algorithm. 2014. arXiv:1411.4028.

  5. Farhi E, Harrow AW. Quantum supremacy through the Quantum Approximate Optimization Algorithm. 2016. arXiv:1602.07674.

  6. Brandão FGSL, Broughton M, Farhi E, Gutmann S, Neven H. For Fixed Control Parameters the Quantum Approximate Optimization Algorithm’s Objective Function Value Concentrates for Typical Instances. 2018. arXiv:1812.04170.

  7. Zhou L, Wang S-T, Choi S, Pichler H, Lukin MD. Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices. 2018. arXiv:1812.01041.

  8. Khairy S, Shaydulin R, Cincio L, Alexeev Y, Balaprakash P. Learning to optimize variational quantum circuits to solve combinatorial problems. In: Proceedings of the AAAI conference on artificial intelligence 34(03). 2020. p. 2367–75.

  9. Lee X, Saito Y, Cai D, Asai N. Parameters fixing strategy for quantum approximate optimization algorithm. In: 2021 IEEE international conference on quantum computing and engineering (QCE). 2021. p. 10–6.

  10. Galda A, Liu X, Lykov D, Alexeev Y, Safro I. Transferability of optimal QAOA parameters between random graphs. 2021. arXiv:2106.07531.

  11. Sauvage F, Sim S, Kunitsa AA, Simon WA, Mauri M, Perdomo-Ortiz A. FLIP: A flexible initializer for arbitrarily-sized parametrized quantum circuits. 2021. arXiv:2103.08572.

  12. Akshay V, Rabinovich D, Campos E, Biamonte J. Parameter concentrations in quantum approximate optimization. Phys Rev A. 2021;104:010401.

  13. Streif M, Leib M. Training the Quantum Approximate Optimization Algorithm without access to a quantum processing unit. 2019. arXiv:1908.08862.

  14. Crooks GE. Performance of the Quantum Approximate Optimization Algorithm on the maximum cut problem. 2018. arXiv:1811.08419.

  15. Li J, Alam M, Ghosh S. Large-scale quantum approximate optimization via divide-and-conquer. 2021. arXiv:2102.13288.

  16. Guerreschi GG. Solving Quadratic Unconstrained Binary Optimization with divide-and-conquer and quantum algorithms. 2021. arXiv:2101.07813.

  17. Moussa C, Wang H, Calandra H, Bäck T, Dunjko V. Tabu-driven quantum neighborhood samplers. In: Zarges C, Verel S, editors. Evolutionary computation in combinatorial optimization. Cham: Springer; 2021. p. 100–19.

  18. Shaydulin R, Ushijima-Mwesigwa H, Safro I, Mniszewski S, Alexeev Y. Quantum local search for graph community detection. In: APS March meeting abstracts. APS meeting abstracts. vol. 2019. 2019. p. 42–009.

  19. Bravyi S, Kliesch A, Koenig R, Tang E. Obstacles to state preparation and variational optimization from symmetry protection. 2019. arXiv:1910.08980.

  20. Kochenberger GA, Glover F. A unified framework for modeling and solving combinatorial optimization problems: a tutorial. In: Multiscale optimization methods and applications. Boston: Springer; 2006. p. 101–24.

  21. Hinterberger H. Exploratory data analysis. In: Encyclopedia of database systems. Boston: Springer; 2009. p. 1080–.

  22. van der Maaten L, Hinton G. Visualizing data using t-sne. J Mach Learn Res. 2008;9(86):2579–605.

  23. Broyden CG. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J Appl Math. 1970;6(1):76–90.

  24. Hansen N, Auger A, Ros R, Finck S, Posík P. Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Pelikan M, Branke J, editors. Genetic and evolutionary computation conference, GECCO 2010, proceedings, companion material. July 7-11, 2010. Portland, Oregon, USA. New York: ACM; 2010. p. 1689–96.

  25. Lloyd S. Least squares quantization in pcm. IEEE Trans Inf Theory. 1982;28(2):129–37.

  26. Dunning I, Gupta S, Silberholz J. What works best when? A systematic evaluation of heuristics for max-cut and qubo. INFORMS J Comput. 2018;30(3):608–24.

  27. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M. Graph neural networks: a review of methods and applications. 2018. arXiv:1812.08434.

  28. Kipf TN, Welling M. Variational graph auto-encoders. NIPS workshop on Bayesian deep learning. 2016.

  29. Wang M, Zheng D, Ye Z, Gan Q, Li M, Song X, Zhou J, Ma C, Yu L, Gai Y, Xiao T, He T, Karypis G, Li J, Zhang Z. Deep graph library: A graph-centric, highly-performant package for graph neural networks. 2019. Preprint. arXiv:1909.01315.

  30. Moussa C, Calandra H, Dunjko V. To quantum or not to quantum: towards algorithm selection in near-term quantum optimization. Quantum Sci Technol. 2020;5(4):044009.


CM and VD acknowledge support from TotalEnergies.


This work was supported by the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). This research is also supported by the project NEASQC funded from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 951821).

Author information

CM, HW and VD designed all the experiments. The manuscript was written with contributions from all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Charles Moussa or Vedran Dunjko.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Moussa, C., Wang, H., Bäck, T. et al. Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm. EPJ Quantum Technol. 9, 11 (2022).
