Efficient realization of quantum algorithms with qudits

The development of a universal fault-tolerant quantum computer that can efficiently solve various difficult computational problems is an outstanding challenge for science and technology. In this work, we propose a technique for an efficient implementation of quantum algorithms with multilevel quantum systems (qudits). Our method uses a transpilation of a circuit in the standard qubit form, which depends on the parameters of a qudit-based processor, such as the number of qudits and the number of accessible levels. This approach provides a qubit-to-qudit mapping and a comparison to a standard realization of quantum algorithms, highlighting potential advantages of qudits. We provide an explicit scheme for transpiling qubit circuits into sequences of single-qudit and two-qudit gates taken from a particular universal set. We then illustrate our method by considering an example of an efficient implementation of a $6$-qubit quantum algorithm with qudits. We expect that our findings are of relevance for ongoing experiments with noisy intermediate-scale quantum devices that operate with information carriers allowing qudit encodings, such as trapped ions and neutral atoms, as well as optical and solid-state systems.


I. INTRODUCTION
Progress in engineering coherent quantum many-body systems with a significant degree of control makes it realistic to study properties of exotic quantum phases [1][2][3][4][5][6] and to prototype quantum algorithms [7][8][9][10]. One of the key issues in the future scaling of such systems is preserving their coherent properties as the system size increases. Existing prototypes of quantum computing devices are based on various physical platforms, such as superconducting circuits [5], semiconductor quantum dots [11][12][13], trapped ions [3,6,9], neutral atoms [1,2,4], photons [14,15], etc. The use of such objects as two-level systems (qubits) is in many cases an idealization, since the underlying physical systems are essentially multilevel. The idea of using additional levels of quantum objects for realizing quantum algorithms is at the heart of qudit-based quantum information processing. This approach has been widely studied in recent decades [16,17], both theoretically and experimentally. The most recent results are the pioneering realizations of universal multi-qudit processors with trapped ions [61][62][63], superconducting [59,64], and optical [65] systems.
Although manipulating additional levels poses further challenges, recent experiments show dramatic progress in increasing the fidelities of qudit operations and making them comparable with those for qubits. In particular, high-fidelity qutrit CZ and CZ$^\dagger$ gates, with estimated process fidelities of 97.3(1)% and 95.2(3)%, respectively, have been recently demonstrated in Ref. [59]. Also with superconducting systems, a fidelity of 97.7% for a two-qutrit CPHASE gate has been achieved [64]. A two-qudit MVCXd gate on two photonic ququarts has been implemented with a fidelity of 95.2% in Ref. [65]. For the trapped-ion platform, on which a qudit processor with 8-level qudits was developed, a two-qutrit gate fidelity of 97.5(2)% has been achieved [61]. Remarkably, the 8-level qudits are controlled by a single laser with an acousto-optic modulator (AOM), as reported in Ref. [61].
Quantum algorithms within the digital quantum computing model can be presented as qubit-based circuits, so there are several approaches for processing them using qudits. First, a qudit can be decomposed into a set of qubits [19-21, 36, 37, 66]. This approach may decrease the cost of realizing quantum algorithms by replacing some two-qubit operations, which require an interaction between distinct physical objects, with single-qudit ones, which do not. However, this method is not universal in the sense that the total number of operations strongly depends on the mapping, i.e., the way qubits are encoded in qudits. As we demonstrate below, specific mappings applied to specific qubit circuits may even lead to a substantial increase in the number of operations in comparison with the standard qubit-based approach. Second, higher qudit levels can be used for substituting ancilla qubits [30,41,67,68]. This is especially important for decomposing multiqubit gates, such as the generalized Toffoli gate. In particular, an additional (third) energy level of a transmon qubit has been used in the experimental realization of the Toffoli gate [47] (see also Refs. [69,70]), which is a key primitive of many quantum algorithms, such as Shor's and Grover's algorithms. While existing quantum computing schemes based on qubit platforms benefit from several approaches for the realization of quantum algorithms, which require compilers, transpilers, and optimizers, qudit-based quantum computing remains described mostly at the level of logic operations [71].
In the present work, we propose a technique for an efficient realization of qubit-based quantum algorithms, which employs the combination of the two aforementioned approaches for the use of additional levels of qudits. The crucial element of our method is a transpilation of a qubit circuit, which depends on the parameters of an accessible qudit-based processor (e.g., number of levels and fidelity of operations). As a result, one obtains a qubit-to-qudit mapping and a comparison to the standard qubit realization. A qudit circuit can be executed via quantum processors or classical emulators, and the corresponding outcomes can be further post-processed in order to be interpreted as results of an algorithm. Clearly, due to exponential complexity, classical emulation is possible only in the case of low-width or low-depth circuits. We develop an explicit scheme of transpiling qubit circuits into sequences of single-qudit and two-qudit gates taken from a particular universal set, which can be different for quantum processors based on various physical platforms. We provide an illustrative example of the qudit-based transpilation for a six-qubit quantum circuit, where we demonstrate the main features of our approach. We also discuss types of quantum algorithms for which the developed approach can show the greatest improvement compared with a straightforward qubit-based implementation.
The paper is organized as follows. In Sec. II, we review the basic principles of quantum computing with qubits. In Sec. III, we discuss the general approach for implementing qubit circuits on qudit-based processors. In Sec. IV, we provide a concrete realization of a qudit-based transpiler. In Sec. V, we present an example of applying the developed approach for realizing 6-qubit circuits with four 4-level qudits. In Sec. VI, we discuss the scalability of the developed approach and its most promising use cases. Finally, we conclude in Sec. VII.

II. QUBIT-BASED APPROACH
The essence of qubit-based quantum computation is applying a unitary operator $U^{\mathrm{qb}}_{\mathrm{circ}}$ to a set of $n$ two-level particles (qubits), initialized in the fixed state $|0\rangle^{\otimes n}$, and then measuring the resulting state in the computational basis to obtain a sample from the following distribution:
$$p^{\mathrm{qb}}(x) = \left|\langle x|\,U^{\mathrm{qb}}_{\mathrm{circ}}\,|0\rangle^{\otimes n}\right|^{2}. \qquad (1)$$
Here we denote computational basis states of qubits as $|0\rangle$ and $|1\rangle$, $|x\rangle \equiv |x_0\rangle \otimes \ldots \otimes |x_{n-1}\rangle$, and $x = (x_0, \ldots, x_{n-1}) \in \{0, 1\}^n$. Commonly, the same circuit is executed several times, which results in a sequence of independent and identically distributed (i.i.d.) random $n$-bit strings $(x^{(1)}, \ldots, x^{(N)})$, where $N$ is the number of samples, and each sample $x^{(i)}$ is obtained from distribution (1). The operator $U^{\mathrm{qb}}_{\mathrm{circ}}$ is originally represented in the form of a sequence of some standard unitary operators (gates) $U^{\mathrm{qb}}_i$ constituting a hardware-agnostic (idealized) circuit $\mathrm{circ}^{\mathrm{qb}}$. For applying $U^{\mathrm{qb}}_{\mathrm{circ}}$ to real physical objects, an additional transpilation step of decomposing $U^{\mathrm{qb}}_{\mathrm{circ}}$ into native (usually, single-qubit and two-qubit) operations is required [71][72][73]. One of DiVincenzo's criteria [74] for quantum processors is the requirement to realize a universal set of gates that allows obtaining an efficient approximation of an arbitrary unitary operation up to a predefined accuracy. Although multiqubit processors based on various physical principles have been demonstrated, the limited quality of quantum operations restricts the computational capabilities of such systems [75]. A particular issue is the realization of high-quality two-qubit quantum operations that require interactions between quantum information carriers. Another important factor that has to be taken into account is the restricted coupling map of information carriers, which specifies between which carriers two-body interactions can be implemented. This issue can be overcome by adding additional SWAP operations. However, this problem is beyond the scope of our work, and in what follows we assume that the quantum processor has an all-to-all coupling map.

III. QUANTUM COMPUTING WITH QUDITS
The idea of using qudits, i.e., $d$-level quantum systems with $d > 2$, has been widely considered in the context of quantum information processing [18][19][20][21][22][23][24][25][26][27][28][29][30][31]. Clearly, an $m$-qudit system can be used to obtain the same result as in the case of qubit-based computing, namely a number of samples coming from the distribution determined by an $n$-qubit circuit, but potentially with fewer resources, e.g., a smaller number of information carriers and/or operations. The dimension of qudits and their number have to be compatible with the given $n$-qubit circuit. In what follows, we assume that this compatibility condition is satisfied.
A specific question that we are interested in is a transpilation of a circuit given in the qubit form depending on the parameters of a qudit-based processor. A scheme of our approach is presented in Fig. 1. It consists of the three following stages: (i) qubit-to-qudit circuit transpilation, (ii) circuit execution, and (iii) classical post-processing of the measurement results. We note that stages (i) and (iii) are performed with a classical computer, while stage (ii) is realized with an accessible qudit-based processor or its classical software emulator.
The input for our scheme is a hardware-agnostic qubit circuit $\mathrm{circ}^{\mathrm{qb}}$, the required number of runs $N$, and the general information about the accessible qudit-based processor, specifically, the number of qudits $m$, their dimension $d$, and the set of native gates (usually, it consists of single-qudit gates and a set of two-qudit gates within a certain connectivity graph indicating the possibility of direct realization of a two-qubit/two-qudit operation). The output of the transpilation step is an 'optimized' qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}}$, which is a sequence of native qudit gates, and an 'optimized qubit-to-qudit mapping' $\phi_{\mathrm{opt}}$, which is an injective function assigning a qudit computational basis state to each qubit computational basis state. The general idea is that running $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}}$ and processing the measurement outcomes according to $\phi_{\mathrm{opt}}$ provides bitstrings equivalent to the ones obtained after running $\mathrm{circ}^{\mathrm{qb}}$ on a standard qubit-based quantum processor (we formulate the rigorous consistency condition below). The term 'optimized' appears here because various qubit-to-qudit mappings, which are assignments between qubits' and qudits' levels, result in different qudit-based circuits that are equivalent to the input qubit-based circuit under a particular mapping. In this way, the goal of the qudit-based transpiler consists not only in transforming qubit gates of $\mathrm{circ}^{\mathrm{qb}}$ into native qudit ones but also in finding a favorable mapping such that the realization of the resulting qudit-based circuit is beneficial over the straightforward realization of $\mathrm{circ}^{\mathrm{qb}}$ on a qubit-based processor. We note that the optimized mapping depends both on the input circuit (it can be different for different circuits) and on the architecture of the accessible qudit-based processor.
The desirable characteristics of the mapping can be defined in different ways. Below, we consider a particular implementation of a qudit-based transpiler, where we use the number of two-qudit interactions as the main figure of merit for quantifying the performance of the transpilation (see Sec. IV). The reason for this is that two-body gates are usually the main source of errors during the execution of quantum circuits. Nevertheless, alternative metrics, such as circuit depth or resulting fidelity estimation, can be used.
To achieve the goal of reducing the number of two-body gates in going from qubits to qudits, two main techniques, as well as their combination, can be implemented. The first technique [19-21, 36, 37] employs a qudit's $d$-dimensional space for embedding several qubits (the technique works for $d \geq 2^{m'}$ with $m' > 1$). Its main advantage is the possibility to reduce the number of employed physical information carriers (e.g., particles, such as atoms or ions). However, as we show in Sec. IV, this method is not universal in the sense that the total number of operations strongly depends on the mapping, i.e., the way qubits are embedded in qudits. It appears that the cost of realizing a two-qubit operation between two qubits inside one qudit may be close to that of a couple of single-qudit operations, since it does not require any interaction between distinct physical particles. In contrast, in the realization of a two-qubit operation between qubits belonging to different qudits, additional entangling operations are required to preserve the state of the other qubits that reside inside these qudits but are untouched by the two-qubit gate.
The second technique is to use 'upper' qudit levels ($|a\rangle$, $a \geq 2$) for substituting ancillary qubits within standard multiqubit gate decompositions [30,41,46,67,68]. This approach allows decreasing both the number of required two-body interactions (entangling gates) and the number of employed quantum information carriers by removing the necessity of ancillary qubits, and it is useful in the case of quantum circuits containing multiqubit gates. We would like to note that these two approaches can be combined in the case of $d > 2^t$ for some $t \geq 2$: the first $2^t$ levels of a qudit can be used for embedding $\lfloor \log_2 d \rfloor$ qubits, while the remaining ones can be used for substituting ancillas.
There are two main aspects regarding the qudit-based transpilation. The first is related to the possibility of realizing qudit gates. As for qubits, a universal set of gates can be composed of arbitrary single-qudit gates, supplemented with a two-qudit entangling gate of a particular type. One of the approaches for making this two-qudit gate is to employ the original two-qubit gate (used within the qubit-based architecture), yet considered in the full qudit state space. We note that this approach has been successfully demonstrated in experiments with trapped ions, and it has been shown that the resulting gate fidelities are comparable with the ones for corresponding qubit-based architectures [61,62].
The second aspect is related to finding an appropriate qubit-to-qudit mapping. In the case of small- and intermediate-scale circuits, one may use an exhaustive search through all possible mappings. However, this approach requires significant classical computational resources for large-scale circuits. In this case, it suffices to find a mapping that is not the best possible one, but still gives a lower number of two-body gates compared to the standard qubit implementation (or gives a higher fidelity). If the number of available qudits $m$ is not less than the number of qubits $n$ in the input circuit, then it can be assured that the number of two-qudit gates in the resulting qudit circuit does not exceed the number of two-qubit gates in the input circuit. This follows from the fact that there is a trivial mapping, where each qubit is embedded in its own qudit. In the qutrit case ($d = 3$) with $m \geq n$, there is no problem with searching for the appropriate mapping: one can employ $n$ qutrits, each used as a qubit plus the ancillary state. For more complex embeddings of qubits in qudits, we describe several algorithms for finding an optimized mapping in Sec. IV A. The comparison between the number of two-body gates for the best-found mapping and the number of two-body gates for the straightforward qubit-based implementation can serve as a benchmark for the efficiency of the qudit-based transpilation process and could be placed in the supporting information.
Let us return to the description of the main stages of running the qubit-based circuit with the qudit-based processor shown in Fig. 1. At stage (ii), the qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}}$ is the input for the qudit-based processor (or emulator) that applies the gates from $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}}$ to the qudit register initialized in the state $|0\rangle^{\otimes m}$, where we use $\{|l\rangle\}_{l=0}^{d-1}$ to denote computational basis states of each qudit. The resulting qudit state is measured in the computational basis, and a sample from the following distribution is obtained:
$$p^{\mathrm{qd}}(y) = \left|\langle y|\,U^{\mathrm{qd}}_{\mathrm{circ}}\,|0\rangle^{\otimes m}\right|^{2}, \qquad (3)$$
where $U^{\mathrm{qd}}_{\mathrm{circ}}$ is the resulting qudit unitary operator, $y = (y_0, \ldots, y_{m-1})$, $y_i \in \{0, \ldots, d-1\}$, and $|y\rangle \equiv |y_0\rangle \otimes \ldots \otimes |y_{m-1}\rangle$. The circuit is run $N$ times, which yields a length-$N$ sequence $(y^{(1)}, \ldots, y^{(N)})$, where each $y^{(i)}$ is a string of $m$ numbers from $\{0, \ldots, d-1\}$.
The final post-processing stage takes the read-out results $(y^{(1)}, \ldots, y^{(N)})$ and a mapping $\phi$ in order to obtain equivalent qubit outcomes $(\phi^{-1}(y^{(1)}), \ldots, \phi^{-1}(y^{(N)}))$ as output, where $\phi^{-1}$ outputs $n$-length bit strings out of $y^{(i)}$. The general condition for the scheme's correctness is as follows:
$$p^{\mathrm{qd}}(y) = \begin{cases} p^{\mathrm{qb}}\big(\phi^{-1}(y)\big), & y \in \mathrm{image}(\phi), \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$
where $p^{\mathrm{qb}}$ and $p^{\mathrm{qd}}$ denote the output distributions of the qubit and qudit circuits, respectively, and $\mathrm{image}(\phi)$ is the set of all possible outputs of the mapping $\phi$. The consistency condition (4) guarantees that only $y \in \mathrm{image}(\phi)$ can appear as measurement results of the qudit circuit, and the obtained bit strings $(\phi^{-1}(y^{(1)}), \ldots, \phi^{-1}(y^{(N)}))$ are indistinguishable from the ones that can be obtained with a qubit-based processor. The set of bitstrings $(\phi^{-1}(y^{(1)}), \ldots, \phi^{-1}(y^{(N)}))$ together with the supporting information is the final output of our approach. One can see that, from the viewpoint of classical processing, the most challenging stage is the qudit-based transpilation. We discuss it in detail below.
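The post-processing stage can be illustrated with a short Python sketch. It assumes the special kind of mapping considered later in the text, where each qubit is embedded entirely in one qudit and the measured level is read as a binary string. The function name `decode_outcome`, the dictionary layout, the 0-based indices, and the most-significant-bit-first ordering are illustrative assumptions, not conventions fixed by the text.

```python
def decode_outcome(y, mapping, qubits_per_qudit):
    """Convert one qudit read-out y = (y_0, ..., y_{m-1}) into a qubit bitstring.

    mapping[q] = (qudit index, position in qudit), both 0-based (assumed layout).
    qubits_per_qudit[j] = number of qubits embedded in qudit j.
    Returns None if y is outside image(phi), e.g. an ancillary level was measured.
    """
    for level, k in zip(y, qubits_per_qudit):
        if level >= 2 ** k:
            return None  # level not used for encoding qubits
    bits = []
    for q in sorted(mapping):
        j, pos = mapping[q]
        k = qubits_per_qudit[j]
        # Position 0 is taken to be the most significant bit of the level index.
        bits.append((y[j] >> (k - 1 - pos)) & 1)
    return tuple(bits)


# Hypothetical example: qubits 0 and 1 share qudit 0, qubit 2 sits alone in qudit 1.
mapping = {0: (0, 0), 1: (0, 1), 2: (1, 0)}
qubits_per_qudit = [2, 1]
samples = [(2, 1), (0, 0), (3, 2)]
decoded = [decode_outcome(y, mapping, qubits_per_qudit) for y in samples]
```

In an ideal run satisfying the consistency condition, the `None` branch is never reached; on noisy hardware such outcomes could be counted separately as a leakage diagnostic.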

IV. QUDIT-BASED TRANSPILATION
Here we describe the concrete realization of the qudit-based transpiler designed for a specific model of a qudit-based processor. We assume that the available processor consists of $m$ $d$-dimensional qudits, labeled as Q1, ..., Qm. As a set of native qudit gates, we consider the single-qudit operations of Eq. (5), where the phase factor $e^{\imath\theta}$ is located in the $\alpha$th position, and the two-qudit operation $\mathrm{CZ}^{\alpha,\beta}_{Qj_1,Qj_2}$ of Eq. (6), which applies a fixed phase factor $-1$ to the pair of levels given by $\alpha$ and $\beta$. Here we use the following notations: $\sigma_x$, $\sigma_y$, $\sigma_z$ are standard single-qubit Pauli matrices, $\alpha, \beta \in \{0, \ldots, d-1\}$ denote levels in the qudits' space, $\phi$, $\theta$ are arbitrary real-valued angles, and $\mathbb{1}$ stands for the identity matrix. In what follows, we use subscripts on unitary operators to specify the quantum information carrier or carriers (qubits or qudits) on which the operator acts. We assume that two-qudit gates can be implemented for every pair of qudits within the all-to-all coupling map. Note that $\mathrm{CZ}^{1,1}_{Qj_1,Qj_2}$ realizes a standard qubit controlled-phase gate acting in the four-dimensional subspace spanned by the first two levels of Qj$_1$ and Qj$_2$, and acts as identity in the remaining subspace. Moreover, $\mathrm{CZ}^{\alpha,\beta}_{Qj_1,Qj_2}$ with arbitrary $\alpha$ and $\beta$ can be realized by surrounding a single instance of $\mathrm{CZ}^{1,1}_{Qj_1,Qj_2}$ with single-qudit operations.
Figure caption: Example of a mapping of the form given by Eqs. (10) and (11) between n = 5 qubits and m = 3 qudits of dimension d = 5. Within the presented mapping, position in qudit[q4] = 2, qudit index[q4] = Q2, and qubit index[Q2, 2] = q4.
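The verbal description of the diagonal native gates can be made concrete with a small Python sketch. It assumes that the phase gate is a diagonal matrix with $e^{\imath\theta}$ at level $\alpha$ and ones elsewhere, and that the two-qudit gate is diagonal with $-1$ on the state $|\alpha\rangle|\beta\rangle$; only the diagonals are stored. The function names are hypothetical, and ideal level swaps stand in for the surrounding single-qudit operations, which on hardware would be built from the native rotations (possibly up to phases).

```python
import cmath


def phase_diagonal(d, alpha, theta):
    """Diagonal of a d-level phase gate: exp(i*theta) at level alpha, 1 elsewhere."""
    return [cmath.exp(1j * theta) if level == alpha else 1 for level in range(d)]


def cz_diagonal(d, alpha, beta):
    """Diagonal of a two-qudit gate applying -1 to |alpha>|beta> and 1 elsewhere."""
    return [-1 if (x, y) == (alpha, beta) else 1
            for x in range(d) for y in range(d)]


def swap_levels(level, a, b):
    """Action of an ideal single-qudit swap of levels a and b on a level index."""
    if level == a:
        return b
    if level == b:
        return a
    return level


def conjugate_by_level_swaps(diag, d, alpha, beta):
    """Diagonal of (S1 x S2) D (S1 x S2)^dagger, where S1 swaps levels 1 and alpha
    on the first qudit and S2 swaps levels 1 and beta on the second qudit."""
    return [diag[swap_levels(x, 1, alpha) * d + swap_levels(y, 1, beta)]
            for x in range(d) for y in range(d)]


# One CZ^{1,1} surrounded by level swaps acts as CZ^{2,3} on two ququarts.
d = 4
built = conjugate_by_level_swaps(cz_diagonal(d, 1, 1), d, 2, 3)
```

The sketch only checks the statement that a single $\mathrm{CZ}^{1,1}$ suffices for arbitrary $\alpha$ and $\beta$; it does not model the rotation gates.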
We note that to realize the considered single-qudit gates in Eq. (5), it is enough to have a connected (but not necessarily fully connected) coupling graph of allowed transitions between levels, as shown in Ref. [76]. Knowing the exact coupling map between levels, single-qudit operations can be easily reformulated in terms of accessible transitions. This is the case for superconducting [77,78], ion-based [43,61,62], and neutral-atom-based [60] qudits. Moreover, in existing experimental setups, transitions within a given coupling graph are usually addressed with a single laser. For example, in Ref. [61], 10 allowed transitions inside an 8-level qudit realized by $^{40}$Ca$^{+}$ ions are accessed by a single narrowband laser at 729 nm with an AOM. The employed two-qudit gate (6) can be realized via the Rydberg blockade for neutral-atom-based qudits [60], and via a common quantized motional mode on ion-based platforms [79].
The input for the designed transpiler, an $n$-qubit hardware-agnostic qubit circuit $\mathrm{circ}^{\mathrm{qb}}$ acting on qubits denoted by q1, ..., qn, is assumed to consist of single-qubit gates $r_{qi}(\phi, \theta)$ [Eq. (8)], where $\sigma_\phi = \sigma_x \cos\phi + \sigma_y \sin\phi$, and $\kappa$-qubit gates $\mathrm{CZ}_{qi_1,\ldots,qi_\kappa}$ [Eq. (9)] with $\kappa \in \{2, 3, \ldots, n\}$. One can see that the multi-body operations (9) and (6) correspond to acquiring a phase factor of $-1$ on a particular multi-body state. We note that a multi-qubit operation (9) can be transformed into a generalized Toffoli gate by applying single-qubit gates.
We also note that both the considered qudit-based and qubit-based sets of gates are universal. Without loss of generality, we assume that $\mathrm{circ}^{\mathrm{qb}}$ terminates with read-out measurements in the computational basis acting on each of the $n$ qubits. The initial state of the qubit register is assumed to be $|0\rangle^{\otimes n}$. We note, however, that the developed technique of transforming qubit gates into qudit gates is independent of the chosen initial state, and can be applied in the same way with other types of initialization.
To simplify operations within the considered special case of mapping $\phi$, it is convenient to introduce the functions position in qudit[·], qudit index[·], and qubit index[·, ·], where the first two functions provide the address of a given qubit within the qudits' space, and the third function returns the index of a qubit given its address. We also introduce functions that return the indices of the qubits located in a given qudit and the total number of qubits in a given qudit (number of qubits[·]), respectively.
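As an illustration, these lookup functions can be stored as plain dictionaries built from a list of which qubits sit in which qudit. The sketch below is an assumption about one convenient data layout: the name `build_lookup_tables`, the 1-based positions, and the concrete assignment of five qubits to three qudits are hypothetical, with the assignment chosen only to be consistent with the mapping example quoted in the figure caption (q4 at position 2 of Q2).

```python
def build_lookup_tables(qubits_in_qudit):
    """Build address tables from {qudit label: [qubit labels in position order]}."""
    qudit_index = {}        # qubit -> qudit that hosts it
    position_in_qudit = {}  # qubit -> 1-based position inside its qudit
    qubit_index = {}        # (qudit, position) -> qubit
    for qudit, qubits in qubits_in_qudit.items():
        for pos, qubit in enumerate(qubits, start=1):
            qudit_index[qubit] = qudit
            position_in_qudit[qubit] = pos
            qubit_index[(qudit, pos)] = qubit
    number_of_qubits = {qudit: len(qubits)
                        for qudit, qubits in qubits_in_qudit.items()}
    return qudit_index, position_in_qudit, qubit_index, number_of_qubits


# Hypothetical layout for n = 5 qubits and m = 3 qudits.
layout = {"Q1": ["q1", "q2"], "Q2": ["q3", "q4"], "Q3": ["q5"]}
qudit_index, position_in_qudit, qubit_index, number_of_qubits = (
    build_lookup_tables(layout)
)
```

With these tables every address query used by the transpiler is a constant-time dictionary lookup.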
The considered 'qubit-to-qudit' mapping, in particular Eq. (10), implies that the computational basis measurement at the end of the qubit circuit corresponds to a computational basis measurement of qudits, which is also assumed to be realizable on the qudit-based processor.
The developed qudit transpiler consists of two modules: (i) the mapping finder and (ii) the qudit circuit constructor (see Fig. 3). Both of them take as input a qudit processor description (values of $m$ and $d$) and a hardware-agnostic qubit-based circuit $\mathrm{circ}^{\mathrm{qb}}$. The goal of the mapping finder is to search for a mapping $\phi_{\mathrm{opt}} \in \{\phi\}$ that minimizes a chosen figure of merit, while the purpose of the qudit circuit constructor is to generate a qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ that is equivalent to $\mathrm{circ}^{\mathrm{qb}}$ under a mapping $\phi$. Finally, the qudit-based transpiler outputs the optimized mapping $\phi_{\mathrm{opt}}$ and the corresponding circuit $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}} := \mathrm{circ}^{\mathrm{qd}}_{\phi_{\mathrm{opt}}}$. The mapping finder can also output some supporting information, which contains, e.g., the exact number of single- and two-qudit gates in $\mathrm{circ}^{\mathrm{qd}}_{\mathrm{opt}}$ and its comparison with the number of single- and two-qubit gates in the qubit circuit $\mathrm{circ}^{\mathrm{qb}}_{\mathrm{stand}}$ resulting from the standard qubit-based transpilation of $\mathrm{circ}^{\mathrm{qb}}$. Below we describe the operation of the modules in more detail.

A. Mapping finder
Here we introduce several approaches to obtaining the optimized qubit-to-qudit mapping. As a figure of merit for a mapping $\phi$ we consider the number of two-qudit gates in $\mathrm{circ}^{\mathrm{qd}}_{\phi}$. This choice is motivated by the fact that entangling gates typically represent the main source of fidelity loss. However, as mentioned before, one can alternatively use other figures of merit, e.g., circuit depth or fidelity estimations, which can be efficiently calculated given the classical representation of the corresponding qudit-based circuit.

1. Finding the optimal mapping with an exhaustive search
The straightforward way of optimizing the qubit-to-qudit mapping is to employ an exhaustive search over all possible mappings $\Phi \equiv \{\phi\}$ of the form of Eq. (10). This approach is applicable if the number of available qudits $m$ and their dimension $d$ are reasonably small.
The first step of the exhaustive search is to construct a set of all non-equivalent mappings $\tilde{\Phi} \subset \Phi$. Here we call two mappings equivalent if they differ only by permutations of qubit indices within a particular qudit, or by permutations of whole sets of qubit indices belonging to different qudits (and thus definitely provide the same number of entangling gates). Then, the mapping finder sequentially inputs each $\phi \in \tilde{\Phi}$ to the qudit circuit constructor to get the corresponding qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\phi}$. By comparing the numbers of two-qudit gates in $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ while going through all mappings, the mapping finder chooses the one ($\phi_{\mathrm{opt}}$) that provides the smallest number of two-qudit gates.
As we show below, the complexity of generating $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ is linear with respect to the number of gates in the original qubit-based circuit $\mathrm{circ}^{\mathrm{qb}}$, so the possible bottleneck is the number of mappings in $\tilde{\Phi}$. We note that this issue does not appear in the case of qutrits ($d = 3$), where there is only a single non-equivalent mapping: each qubit qi is mapped to a qutrit Qi.
In Fig. 4 we show the behavior of the total number of non-equivalent mappings $|\tilde{\Phi}|$ for different values of $n$ and $d \geq 4$. Given that for $d \leq 31$ and $n \leq 7$ the resulting number of non-equivalent mappings is no more than a thousand, it is possible to go through all $\phi \in \tilde{\Phi}$ within a reasonable time. We note that in Fig. 4 we take the number of qudits $m$ to be equal to the number of qubits $n$, to maximize the number of possible mappings. Since we deal with a special case of mappings, where each qubit is entirely embedded in a single qudit, the number of mappings is the same for all qudit dimensions in a range $d = 2^{n'}, \ldots, 2^{n'+1} - 1$ for a given $n'$.
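The size of the search space can be estimated with a short counting sketch. It rests on an assumption about how the equivalence defined above translates into combinatorics: non-equivalent mappings are taken to be the ways of splitting $n$ labeled qubits into at most $m$ unordered groups, each holding at most $\lfloor \log_2 d \rfloor$ qubits. The function name is hypothetical, and the numbers it returns are counts under this assumption rather than values read from Fig. 4.

```python
from functools import lru_cache
from math import comb


def count_nonequivalent_mappings(n, m, d):
    """Count groupings of n qubits into at most m qudits of dimension d,
    ignoring the order of qubits inside a qudit and the order of qudits."""
    capacity = d.bit_length() - 1  # floor(log2(d)) qubits fit in one qudit

    @lru_cache(maxsize=None)
    def count(remaining, qudits_left):
        if remaining == 0:
            return 1
        if qudits_left == 0:
            return 0
        total = 0
        # Fix the qudit hosting the lowest-index remaining qubit and choose
        # how many and which other qubits join it.
        for size in range(1, min(capacity, remaining) + 1):
            total += comb(remaining - 1, size - 1) * count(
                remaining - size, qudits_left - 1
            )
        return total

    return count(n, m)


qutrit_case = count_nonequivalent_mappings(5, 5, 3)    # 1
ququart_case = count_nonequivalent_mappings(6, 4, 4)   # 60
largest_case = count_nonequivalent_mappings(7, 7, 31)  # 827
```

Under this counting, the qutrit case gives a single mapping, and the case $n = m = 7$, $d = 31$ stays below a thousand.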

2. Searching for optimized mappings with polynomial heuristic algorithms
When the exhaustive search is not applicable, some approximate polynomial methods can be employed. We emphasize that the problem of finding a mapping from $\Phi$ that provides an advantage of the qudit-based approach over the standard qubit-based approach is much easier than finding the best mapping among all mappings in $\Phi$. Indeed, in the case of $m \geq n$, the one-to-one mapping $\phi^{(0)}$ definitely provides no more entangling gates than the standard qubit-based transpilation, since upper qudit levels are used only for multiqubit gate decomposition. We emphasize that in the case of $\phi^{(0)}$, two-qubit entangling CZ gates from the input qubit circuit are realized within the qudit-based version in exactly the same way as in the qubit-based one. If the input qubit circuit has at least one Toffoli gate, then the number of entangling gates in the corresponding qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\phi^{(0)}}$ is strictly smaller than the one in the standard qubit-based transpilation result $\mathrm{circ}^{\mathrm{qb}}_{\mathrm{stand}}$. Comparing $\phi^{(0)}$ with a limited number of candidates from $\Phi$ therefore cannot make the result worse. In the case of $m < n$, yet $m\lfloor \log_2 d \rfloor \geq n$, the developed qudit-based transpilation method makes it possible to run an $n$-qubit circuit with $m$ $d$-level qudits, which is not possible with $m$ qubits at all.
In the case of $m \geq n$, the choice of candidates from $\Phi$ can be guided by the following observations. First, it is advisable to consider embedding a pair of qubits into the space of a single qudit if there is a relatively large number of two-qubit gates connecting these qubits within the given circuit, and a relatively small number of gates affecting each of these qubits separately. Second, it is reasonable to put qubits affected by multiqubit gates into qudits with free upper levels, in order to use these levels as ancillas for multiqubit gate decomposition.
One can also consider an iterative "greedy" approach of poly($n$) complexity for finding an optimized mapping, where a sequential joining of qubits in the space of a qudit is considered. We sketch the idea for the case of $m \geq n$ and $d = 4, \ldots, 7$ (each qudit can embed no more than two qubits). Initially, the one-to-one mapping $\phi^{(0)}$ is considered, and the resulting number of entangling gates $N^{(0)}_{\mathrm{ent}}$ is stored. At the first step, all $n(n-1)/2$ mappings, where one qudit embeds a qubit pair and $n-2$ other qudits embed the remaining $n-2$ qubits, are considered. If the minimal number of entangling gates among these mappings, $N^{(1)}_{\mathrm{ent}}$, is smaller than $N^{(0)}_{\mathrm{ent}}$, then the corresponding mapping $\phi^{(1)}$ with the fewest entangling gates is chosen as a starting point for the next step. Otherwise, $\phi_{\mathrm{opt}} := \phi^{(0)}$ is the output. In the second step, $(n-2)(n-3)/2$ mappings with two qubit pairs (the previously selected pair and a newly tested one) are considered, and so on. The algorithm proceeds until the number of entangling gates starts to grow, or a mapping with the maximal number $\lfloor n/2 \rfloor$ of qubit pairs is obtained. Although this algorithm does not guarantee getting the best possible mapping, it ensures that the resulting number of entangling gates is no more than the one for a straightforward qubit-based realization, and the maximal number of iterations scales as $O(n^3)$. Given the polynomial complexity (in the number of qubits and number of gates) of the transpilation procedure for a given mapping, we obtain a polynomial complexity of the whole qudit-based transpiler.
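The greedy procedure can be sketched in Python as follows. The sketch assumes at most two qubits per qudit and stops as soon as no candidate strictly lowers the gate count, which is one reading of the stopping rule described above. The cost function is passed in by the caller; `toy_cost` is a deliberately crude, hypothetical stand-in for the qudit circuit constructor (in-qudit gates are free, and each cross-qudit gate costs one entangling gate plus one for every involved qubit that shares its qudit) and is not the gate count of the actual transpiler.

```python
from itertools import combinations


def greedy_pairing(n, count_entangling):
    """Greedily choose qubit pairs to co-locate in a qudit.

    count_entangling(pairs) returns the entangling-gate count of the mapping
    in which every pair in the frozenset `pairs` shares one qudit.
    """
    pairs = frozenset()
    best_cost = count_entangling(pairs)  # one-to-one mapping
    while True:
        used = {q for pair in pairs for q in pair}
        free = [q for q in range(n) if q not in used]
        candidates = [pairs | {pair} for pair in combinations(free, 2)]
        if not candidates:
            return pairs, best_cost
        cost, candidate = min(
            ((count_entangling(c), c) for c in candidates), key=lambda t: t[0]
        )
        if cost >= best_cost:  # no strict improvement: stop
            return pairs, best_cost
        pairs, best_cost = candidate, cost


def toy_cost(gates):
    """Hypothetical cost model for a list of two-qubit gates (a, b) with a < b."""
    def cost(pairs):
        paired = {q for pair in pairs for q in pair}
        total = 0
        for a, b in gates:
            if (a, b) in pairs:
                continue  # both qubits in one qudit: single-qudit operation
            total += 1 + (a in paired) + (b in paired)
        return total
    return cost


gates = [(0, 1), (0, 1), (0, 1), (2, 3), (1, 2)]
best_pairs, best_cost = greedy_pairing(4, toy_cost(gates))
```

In this toy instance the search pairs qubits 0 and 1, which interact most often, and then stops because adding the pair (2, 3) does not lower the count any further.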
It is also worth noting an interesting approach for finding a qubit-to-qudit mapping recently proposed in Ref. [80]. The goal of this algorithm is also to lower the number of non-local operations within the realization of qubit circuits with qudits. For this purpose, the authors use a weighted graph representation of a given qubit circuit, where nodes represent qubit levels, edges represent local and nonlocal operations, and weights give the number of corresponding operations. The authors propose to use an adaptation of the K-means algorithm to cluster the graph so as to place edges of the highest weights in distinct clusters. This clusterization is then interpreted in terms of the qubit-to-qudit mapping. We note that this algorithm is applied to input qubit circuits already transpiled down to single- and two-qubit gates, and therefore does not utilize the full potential of qudits for operating with multiqubit gates, which is one of the central features of the approach considered in our work.

B. Qudit circuit constructor
Here we consider in detail how the qudit circuit constructor transpiles the qubit circuit $\mathrm{circ}^{\mathrm{qb}}$ into the qudit circuit $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ according to the given qubit-to-qudit mapping $\phi$. The transpilation of $\mathrm{circ}^{\mathrm{qb}}$ into $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ is performed gate by gate, as shown in Fig. 5. At the very beginning of the process, $\mathrm{circ}^{\mathrm{qd}}_{\phi}$ is initialized as empty. Then, for each gate from the qubit circuit $\mathrm{circ}^{\mathrm{qb}}$, the constructor takes the set of qubits affected by this gate (qubit set) and finds the set of corresponding qudits (qudit set) that host the qubits from qubit set, i.e., the set of values qudit index[q] for q in qubit set. We note that qudit set does not contain duplicates of qudit indices. Thus, the number of elements in qudit set, which we denote by |qudit set|, can be less than the number of involved qubits $\kappa$ if several affected qubits are located in the space of the same qudit according to the mapping $\phi$.
The processing of the gate is determined by the value of |qudit set|. If |qudit set| = 1, then the qubit gate under processing can be realized on the qudit processor as a sequence of single-qudit gates (see subsection IV B 1). In the case of |qudit set| > 1, two-qudit gates become necessary. It is convenient to distinguish the case of |qudit set| = 2 and the case of |qudit set| ≥ 3, which we describe in detail in sections IV B 2 and IV B 3, respectively. For all these cases, we obtain a sequence of qudit gates that implement the processed qubit gate. This sequence is added to the end of $\mathrm{circ}^{\mathrm{qd}}_{\phi}$, and then the procedure is repeated for the next gate from $\mathrm{circ}^{\mathrm{qb}}$ until all gates have been processed. Below we describe the exact decomposition of qubit gates into the sequence of qudit gates for all possible values of |qudit set|.
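The gate-by-gate dispatch can be summarized by the following Python sketch. The function names, the representation of a gate by the list of qubits it acts on, and the string labels for the three cases are illustrative assumptions; the actual decompositions for each case are described in the corresponding subsections and are not reproduced here.

```python
def classify_gate(qubit_set, qudit_index):
    """Return the set of qudits touched by a gate and the case it falls into."""
    qudit_set = {qudit_index[q] for q in qubit_set}  # duplicates collapse
    if len(qudit_set) == 1:
        case = "single-qudit"   # only single-qudit gates are needed
    elif len(qudit_set) == 2:
        case = "two-qudit"
    else:
        case = "multi-qudit"    # three or more qudits involved
    return qudit_set, case


def classify_circuit(gates, qudit_index):
    """Process the qubit circuit gate by gate, in order."""
    return [classify_gate(qubit_set, qudit_index)[1] for qubit_set in gates]


# Hypothetical mapping: qubits 0 and 1 share qudit 0; qubits 2 and 3 are alone.
qudit_index = {0: 0, 1: 0, 2: 1, 3: 2}
circuit = [[0], [0, 1], [1, 2], [0, 1, 2], [0, 2, 3]]
cases = classify_circuit(circuit, qudit_index)
```

Note that the three-qubit gate on qubits 0, 1, and 2 falls into the two-qudit case, because two of its qubits share a qudit under this mapping.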

Single-qudit case
The case of $|\texttt{qudit\_set}| = 1$ can appear in two situations: (i) the processed gate is a single-qubit one ($\kappa = 1$), and (ii) the processed gate is a multi-qubit one ($\kappa \geq 2$) with all affected qubits located in the same qudit according to the mapping $\phi$.
First, let us consider the case of a single-qubit gate acting on a qubit $qi_1$. Let $Qj$ = qudit_index[$qi_1$], pos = position_in_qudit[$qi_1$], and $\#j$ = number_of_qubits[$Qj$]. Recall that in our realization the only type of single-qubit gate is the rotation $r_{qi}(\varphi, \theta)$ defined in Eq. (8). To implement this unitary in the qudit's space, we need to consider a tensor product of a $2 \times 2$ unitary acting in the subspace of the affected qubit with an identity operation acting in the remaining space of the qudit. The resulting correspondence between the qubit gate and a sequence of qudit gates is given by

$$r_{qi_1}(\varphi, \theta) \mapsto \prod_{(\alpha,\beta)} R^{\alpha,\beta}_{Qj}(\varphi, \theta), \tag{16}$$

where the product is taken over all pairs of levels $(\alpha, \beta)$ satisfying the following condition:

$$\mathrm{bin}(\alpha) = x_1 \ldots x_{\mathrm{pos}-1}\,0\,x_{\mathrm{pos}+1} \ldots x_{\#j}, \qquad \mathrm{bin}(\beta) = x_1 \ldots x_{\mathrm{pos}-1}\,1\,x_{\mathrm{pos}+1} \ldots x_{\#j}. \tag{17}$$

Here $\mathrm{bin}(\alpha)$ and $\mathrm{bin}(\beta)$ are the $\#j$-length binary representations of $\alpha$ and $\beta$, respectively (recall that $\alpha, \beta \in \{0, 1, \ldots, d-1\}$ and $\#j \leq \log_2 d$), and $x_1 \ldots x_{\mathrm{pos}-1} x_{\mathrm{pos}+1} \ldots x_{\#j}$ runs over all possible bitstrings of length $\#j - 1$. The sequence of qudit unitaries given by Eqs. (16) and (17) is in agreement with the employed structure of qubit-to-qudit mappings shown in Eq. (11) [see also Fig. 6(a) for an intuitive explanation].
Let us then consider the case of a multi-qubit gate $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$, where all qubits $qi_1, \ldots, qi_\kappa$ are located in the same qudit $Qj$. Let $\mathrm{pos}_{i_k}$ = position_in_qudit[$qi_k$] for $k = 1, \ldots, \kappa$. Then the desired gate $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ can be realized with the following sequence of single-qudit phase gates:

$$\mathsf{CZ}_{qi_1,\ldots,qi_\kappa} \mapsto \prod_{\alpha} \mathsf{Ph}^{\alpha}_{Qj}(\pi), \tag{18}$$

where the qudit levels $\alpha$ satisfy the following condition:

$$\mathrm{bin}(\alpha)[\mathrm{pos}_{i_k}] = 1 \quad \text{for all } k = 1, \ldots, \kappa \tag{19}$$

(here $\mathrm{bin}(\alpha)[\mathrm{pos}_{i_k}]$ stands for the $\mathrm{pos}_{i_k}$th bit in the $\#j$-length binary representation of $\alpha$). As in the case of a single-qubit gate, the resulting sequence provides the proper transformation at the required qudit levels [see Fig. 6(b)].
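The level pairs addressed by an embedded single-qubit rotation can be enumerated with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes that bin(α) is read from left to right, so that position 1 is the most significant bit of the #j-bit string.

```python
from itertools import product
from typing import List, Tuple


def single_qubit_level_pairs(num_qubits: int, pos: int) -> List[Tuple[int, int]]:
    """Pairs (alpha, beta) whose binary strings differ only at position `pos`.

    Assumption: positions are 1-based and counted from the most significant bit.
    alpha carries 0 and beta carries 1 at the position of the rotated qubit.
    """
    pairs = []
    # Enumerate all bitstrings x_1 ... x_{pos-1} x_{pos+1} ... x_{#j}.
    for rest in product("01", repeat=num_qubits - 1):
        prefix = "".join(rest[: pos - 1])
        suffix = "".join(rest[pos - 1:])
        alpha = int(prefix + "0" + suffix, 2)
        beta = int(prefix + "1" + suffix, 2)
        pairs.append((alpha, beta))
    return pairs


# Three qubits in an 8-level qudit, rotated qubit at the 2nd position.
pairs = single_qubit_level_pairs(num_qubits=3, pos=2)
print(pairs)
```

Under the stated bit-ordering assumption, a qubit at the 2nd position of a three-qubit qudit yields four pairs, one for each state of the two remaining qubits.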
Clearly, the processing of the single-qudit case is of O(1) space and time complexity on a classical computer.
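For the in-qudit CZ gate, the levels that acquire the phase factor of -1 can be listed in the same spirit. The sketch below uses a hypothetical function name and again assumes 1-based positions counted from the most significant bit.

```python
from typing import List, Sequence


def cz_phase_levels(num_qubits: int, positions: Sequence[int]) -> List[int]:
    """Levels alpha whose bits at all affected positions are equal to 1.

    Assumption: positions are 1-based and counted from the most significant bit.
    """
    levels = []
    for alpha in range(2 ** num_qubits):
        bits = format(alpha, f"0{num_qubits}b")
        if all(bits[p - 1] == "1" for p in positions):
            levels.append(alpha)
    return levels


# Two affected qubits at the 1st and 3rd positions of a three-qubit qudit.
levels = cz_phase_levels(num_qubits=3, positions=[1, 3])
print(levels)
```

Each qubit of the qudit that is not affected by the gate doubles the number of listed levels.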

Two-qudit case
Here we consider the case where the qubits involved in a certain multi-qubit gate $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ are located in two different qudits, $Qj_1$ and $Qj_2$.

Figure 7. Qudit-based realization of a three-qubit gate $\mathsf{CZ}_{qi_1,qi_2,qi_3}$ in the case where $qi_1$ and $qi_2$ are embedded into $Qj_1$, and $qi_3$ together with some other qubit is embedded into $Qj_2$ ($Qj_1$ and $Qj_2$ are 4-level systems).

As in the case of a single-qubit gate, the resulting transformation performed in the space of two qudits is obtained as a tensor product of the unitary corresponding to $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ in the proper subspace of the two-qudit space and the identity operator in the remaining subspace. This operation reads

$$\mathsf{CZ}_{qi_1,\ldots,qi_\kappa} \mapsto \prod_{(\alpha,\beta)} \mathsf{CZ}^{\alpha,\beta}_{Qj_1,Qj_2}, \tag{20}$$

where the pairs of levels $(\alpha, \beta)$ are all admissible pairs satisfying the condition

$$\mathrm{bin}(\alpha)[\mathrm{pos}_{i_k}] = 1 \ \text{for all affected } qi_k \text{ located in } Qj_1, \qquad \mathrm{bin}(\beta)[\mathrm{pos}_{i_k}] = 1 \ \text{for all affected } qi_k \text{ located in } Qj_2. \tag{21}$$

According to Eqs. (20) and (21), the number of $\mathsf{CZ}^{\alpha,\beta}_{Qj_1,Qj_2}$ gates in the resulting sequence is determined by the number of qubits located in qudits $Qj_1$ and $Qj_2$ and not involved in $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ (see an example for the two-qudit case in Fig. 7). Each unused qubit doubles the number of pairs $(\alpha, \beta)$ satisfying Eq. (21), and so the resulting number of two-qudit gates is given by

$$2^{\#j_1 + \#j_2 - \kappa}, \tag{22}$$

where $\#j_1$ and $\#j_2$ are the numbers of qubits in $Qj_1$ and $Qj_2$, respectively. Eq. (22) captures the intuition behind the qudit CZ gate. On the one hand, the implementation of, e.g., two-qubit gates ($\kappa = 2$) in the case where qudits $Qj_1$ and $Qj_2$ contain other qubits ($\#j_1 + \#j_2 > 2$) costs more two-body CZ-type interactions than the direct qubit-based realization ($2^{\#j_1+\#j_2-2}$ compared to 1). Recall, however, that in the case of the one-to-one mapping $\phi^{(0)}$, $\#j_1 = \#j_2 = 1$ and $\kappa = 2$, so there is no overhead in the number of entangling gates. On the other hand, in the case of multi-qubit gates with $\kappa > 2$, the resulting number of two-body CZ-type interactions can become smaller than the one obtained from known decompositions of multi-qubit gates into single-qubit and two-qubit gates (see, e.g., Ref. [81]). We also recall that in the case where all qubits affected by $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ fall into the same qudit, there is no need for two-body interactions at all.
As in the single-qudit case, the described processing is of O(1) space and time complexity on a classical computer. We also note that for other types of two-qudit interactions, different from $\mathsf{CZ}^{\alpha,\beta}_{Qj_1,Qj_2}$, Eqs. (20) and (22) have to be modified. For example, as shown in Ref. [82], in a trapped-ion platform with a native parametric two-qudit Mølmer-Sørensen gate, implementing a two-qubit CZ gate ($\kappa = 2$) in the case where qudits $Qj_1$ and $Qj_2$ contain two qubits each requires a single two-qudit Mølmer-Sørensen gate with an increased effective rotation angle compared to the rotation angle used in the qubit-based version.
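The counting argument for the two-qudit case can be checked with a short Python sketch. The function names are hypothetical, positions are assumed to be 1-based and counted from the most significant bit, and the position chosen for the third qubit in the example is an arbitrary illustrative choice.

```python
from typing import List, Sequence, Tuple


def levels_with_ones(num_qubits: int, positions: Sequence[int]) -> List[int]:
    """Levels whose bits at all listed (1-based, MSB-first) positions equal 1."""
    return [
        level
        for level in range(2 ** num_qubits)
        if all(format(level, f"0{num_qubits}b")[p - 1] == "1" for p in positions)
    ]


def two_qudit_cz_pairs(
    n1: int, pos1: Sequence[int], n2: int, pos2: Sequence[int]
) -> List[Tuple[int, int]]:
    """Level pairs (alpha, beta) on which a two-qudit CZ-type gate is applied."""
    return [
        (alpha, beta)
        for alpha in levels_with_ones(n1, pos1)
        for beta in levels_with_ones(n2, pos2)
    ]


# Three-qubit CZ: two affected qubits fill the first ququart, and one affected
# qubit (assumed at position 1) shares the second ququart with an idle qubit.
pairs = two_qudit_cz_pairs(n1=2, pos1=[1, 2], n2=2, pos2=[1])
kappa = 3
print(pairs, len(pairs) == 2 ** (2 + 2 - kappa))
```

The single idle qubit in the second ququart doubles the number of pairs, so two two-qudit gates are needed in this example.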

Multi-qudit gate case
Here we describe the most complicated case, where the qubits affected by the gate $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ fall into more than two qudits. To make the description of the decomposition clearer, let us introduce some new notation. For each qudit $Qj$, we define the states $|\bar{0}\rangle_{Qj} \equiv |0\rangle_{Qj}$ and $|\bar{1}\rangle_{Qj} \equiv |2^{\#j} - 1\rangle_{Qj}$ that correspond to the multi-qubit states $|0 \ldots 0\rangle_{qi_{j,1},\ldots,qi_{j,\#j}}$ and $|1 \ldots 1\rangle_{qi_{j,1},\ldots,qi_{j,\#j}}$, respectively, with respect to the considered mapping $\phi$ (recall that $qi_{j,1}, \ldots, qi_{j,\#j}$ denote the labels of the qubits located in the space of qudit $Qj$). If the qudit dimension $d > 2^{\#j}$, we also define an ancillary state $|a\rangle_{Qj} \equiv |2^{\#j}\rangle_{Qj}$ that lies beyond the qubits' subspace in the space of $Qj$, and a flag indicating whether the ancillary level is available in this qudit:

$$\texttt{ancilla}[Qj] = \begin{cases} \text{True}, & d > 2^{\#j}, \\ \text{False}, & \text{otherwise}. \end{cases} \tag{23}$$

Let qudit_set be the set of qudit indices involved in the realization of $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$, see Eq. (15). We assign each qudit from qudit_set to one of three possible types labeled A, B, or C.
We say that $Qj$ belongs to type A if all qubits located in this qudit are affected by $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ and $Qj$ has no ancillary level, that is,

$$(\texttt{indices\_of\_qubits}(Qj) \subset \texttt{qubit\_set}) \wedge \neg\,\texttt{ancilla}[Qj], \tag{24}$$

where $\neg\,\texttt{ancilla}[Qj]$ stands for ancilla[$Qj$] = False. If all qubits located in qudit $Qj$ are involved in the decomposed qubit gate and $Qj$ has an ancillary level, that is,

$$(\texttt{indices\_of\_qubits}(Qj) \subset \texttt{qubit\_set}) \wedge \texttt{ancilla}[Qj], \tag{25}$$

then we say that qudit $Qj$ belongs to type B. A qudit $Qj \in \texttt{qudit\_set}$ belongs to type C if there is at least one qubit located in $Qj$ but not affected by $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$:

$$\texttt{indices\_of\_qubits}(Qj) \not\subset \texttt{qubit\_set}. \tag{26}$$

We denote the numbers of qudits of types A, B, and C as $|A|$, $|B|$, and $|C|$, respectively. The transpilation of $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ is performed by first constructing an intermediate qudit circuit $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$, presented in Fig. 8(a), and then decomposing $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$ down to single-qudit and two-qudit gates.
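The assignment of qudits to types A, B, and C can be summarized by a small Python sketch. The function name and the set-based data layout are illustrative assumptions; the subset test is taken to be non-strict.

```python
from typing import Dict, Set


def qudit_type(qubits_in_qudit: Set[int], qubit_set: Set[int], d: int) -> str:
    """Return 'A', 'B', or 'C' for a qudit involved in a multi-qubit CZ gate."""
    # The ancillary level exists when the qudit has more levels than the
    # embedded qubits occupy.
    has_ancilla = d > 2 ** len(qubits_in_qudit)
    if not qubits_in_qudit <= qubit_set:
        return "C"  # at least one hosted qubit is not affected by the gate
    return "B" if has_ancilla else "A"


# Illustrative layout: three ququarts (d = 4) hosting five qubits.
hosted: Dict[int, Set[int]] = {0: {0, 1}, 1: {2}, 2: {3, 4}}
gate_qubits = {0, 1, 2, 3}
types = {j: qudit_type(qs, gate_qubits, d=4) for j, qs in hosted.items()}
print(types)
```

Here the first ququart is fully occupied by affected qubits (type A), the second hosts a single affected qubit and thus keeps spare levels (type B), and the third hosts an unaffected qubit (type C).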
We then split circ qd-int ϕ into five steps: (i) multi-qudit controlled gate with controls on type A qudits and target on the first qudit of type B; (ii) down-step ladder-like sequence of two-qudit gates on all qudits of type B; (iii) multi-qudit gate acting on the last qudit of type B and all qudits of type C; (iv) up-step ladder-like sequence that is the uncomputation of step (ii); (v) the uncomputation of step (i).
The general idea of this structure is as follows. According to $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$, we have to add a phase factor of −1 to all basis states of the involved qudits such that all corresponding qubits $qi_1, \ldots, qi_\kappa$ embedded in these qudits are in the state 1. We note that for qudits of types A and B, there is a single level that is a candidate for acquiring the phase factor, namely $|\bar{1}\rangle$. For type-C qudits, the situation is different. Owing to the presence of unaffected qubits, the phase factor is acquired or not acquired (depending on the state of the other involved qudits) by several levels of a type-C qudit, namely by all levels $\alpha$ satisfying the condition

$$\mathrm{bin}(\alpha)[\mathrm{pos}] = 1, \tag{27}$$

where the pos values correspond to the positions of the affected qubits in the considered qudit. Roughly speaking, we need to add a phase factor of −1 to a computational basis state of the involved qudits if all type-A and type-B qudits are in the state $|\bar{1}\rangle$ and all affected qubits encoded in type-C qudits are in the state $|1\rangle$. The important feature of type-B qudits is that they possess an ancillary level that can be employed for storing temporary information within the gate decomposition. We use the ancillary level of a type-B qudit for storing a 'flag' indicating whether this qudit and all 'previous' qudits are in the proper state for acquiring the phase factor: this is how the ladder-type sequences [parts (ii) and (iv)] appear in our construction. As we discuss further, the operation with qudits of types A and C is based on standard schemes of decomposing multi-qubit controlled gates down to single-qubit and two-qubit gates. At the same time, the implementation of a two-qubit operation for qubits in qudits that host other uninvolved qubits (qudits of type C) results in an overhead in the number of two-qudit operations (as we discussed in Sec. IV B 2). In order to avoid doubling this overhead in the uncomputation, we put the operation with type-C qudits in the middle of our circuit.
It may happen, however, that the number of two-qudit gates required for processing type-C qudits is lower than the number of two-qudit gates required for processing type-A qudits. In this case, it is preferable to swap type-A and type-C qudits in the structure of $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$.
In order to simplify our description, we next consider the processing of the original circuit presented in Fig. 8(a). The possible improvement related to swapping type-A and type-C qudits in the structure of $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$ is discussed in Appendix B. We also consider modifications of the described scheme for cases where one or two types of qudits are missing (e.g., $|B| = 0$) in Appendix A.
Below we consider the decomposition of each of the described groups of gates to the set of basic single-qudit and two-qudit gates for our transpiler.
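a. Processing the multi-controlled gate of Step (i). The idea of its decomposition is as follows. First, by employing single-qudit rotations between the levels $|\bar{1}\rangle$ and $|a\rangle$ of the affected type-B qudit $Qj$, where the notations $\bar{0} \equiv 0$, $\bar{1} \equiv 2^{\#j} - 1$, and $a \equiv 2^{\#j}$ are used, we turn the desired multi-qudit gate into a gate of CZ type [see Fig. 8(b)]. Namely, it adds the phase factor −1 to the state $|\bar{1}\rangle \otimes \ldots \otimes |\bar{1}\rangle$ of the affected qudits and leaves the remaining states unchanged. Then we take a qubit-based ancilla-free decomposition of the $(|A| + 1)$-qubit gate $\mathsf{CZ}_{p0,\ldots,p|A|}$ as a template [see Fig. 8(d)] and replace its single-qubit rotations with single-qudit rotations $R^{\bar{0},\bar{1}}_{Qj}$ and its two-qubit CZ gates with two-qudit gates $\mathsf{CZ}^{\bar{1},\bar{1}}_{Qj_1 Qj_2}$, where the correspondence between the qubits $p0, \ldots, p|A|$ and the affected qudits is realized via a straightforward ordering [see Fig. 8(e)]. One can see that this construction provides the realization of the CZ operation in the space spanned by a tensor product of the states $|\bar{0}\rangle$ and $|\bar{1}\rangle$ of the affected qudits.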
We note that the employed qubit-based decomposition of $\mathsf{CZ}_{p0,\ldots,p|A|}$ may realize the gate only up to a global phase. In our case, where we embed this decomposition into the qudit space, the global phase turns into a relative one between the CZ operation in the subspace spanned by a tensor product of the $|\bar{0}\rangle$ and $|\bar{1}\rangle$ states and the identity operation in the remaining subspace. However, this relative phase is removed in the uncomputation Step (v), given that all operations in the uncomputation are Hermitian conjugates of the ones in Step (i).
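The conversion of a global phase into a relative one can be seen in a toy numerical example. The sketch below is an illustrative simplification rather than the construction of the paper: the embedding is modeled as a direct sum of a four-dimensional "qubit" block and two extra levels, all operators are diagonal and are stored as lists of diagonal entries, and the function name is hypothetical.

```python
import cmath
from typing import List


def equal_up_to_global_phase(
    u: List[complex], v: List[complex], tol: float = 1e-12
) -> bool:
    """Compare two diagonal unitaries, given by their diagonals, up to a phase."""
    ratio = u[0] / v[0]
    return all(abs(a - ratio * b) < tol for a, b in zip(u, v))


gamma = 0.3
cz = [1, 1, 1, -1]  # diagonal of the two-qubit CZ gate
cz_with_phase = [cmath.exp(1j * gamma) * x for x in cz]

# On qubits alone the extra factor exp(i*gamma) is an irrelevant global phase.
same_on_qubits = equal_up_to_global_phase(cz_with_phase, cz)

# Toy embedding: the decomposition acts on the 4-dimensional block only, while
# two additional levels are left untouched (identity).
embedded_actual = cz_with_phase + [1, 1]
embedded_target = cz + [1, 1]
same_on_qudits = equal_up_to_global_phase(embedded_actual, embedded_target)

print(same_on_qubits, same_on_qudits)
```

After the embedding, the factor multiplies only the block where the decomposition acts, so the two operators are no longer equivalent.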
b. Processing the ladder-like sequence on type-B qudits of Step (ii). Recall that each type-B qudit has an ancillary level $|a\rangle$ that we use to store the information about whether the 'previous' qudits [according to the ordering in Fig. 8(a)] are in the state $|\bar{1}\rangle$. The idea behind the gates employed in the ladder-like sequence of Step (ii) is quite straightforward: each gate turns the state of the target qudit from $|\bar{1}\rangle$ to $|a\rangle$ if and only if the control qudit is in the state $|a\rangle$. One can see that by realizing the sequence of Step (ii), following Step (i), the last type-B qudit appears in the state $|a\rangle$ if and only if all type-A and type-B qudits were initially in the state $|\bar{1}\rangle$. The decomposition of the two-qudit gates employed in Step (ii) into native qudit gates can be performed according to the schemes of Fig. 8(b) and (c).
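The flag propagation of Steps (i) and (ii) can be checked on computational basis states with a purely classical Python sketch. Phase factors are ignored, each qudit is represented only by a label of its level, and all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence

# Labels for the levels |0bar>, |1bar>, and the ancillary level |a>.
ZERO, ONE, ANC = "0bar", "1bar", "a"


def flip_one_anc(state: str) -> str:
    """Inversion between |1bar> and |a>; every other level is left untouched."""
    if state == ONE:
        return ANC
    if state == ANC:
        return ONE
    return state


def steps_i_and_ii(type_a: Sequence[str], type_b: Sequence[str]) -> List[str]:
    """Basis-state action of Steps (i) and (ii) on the type-B qudits."""
    b = list(type_b)
    # Step (i): the first type-B qudit is flipped iff all type-A qudits are in |1bar>.
    if all(s == ONE for s in type_a):
        b[0] = flip_one_anc(b[0])
    # Step (ii): each ladder gate flips its target iff its control is in |a>.
    for k in range(1, len(b)):
        if b[k - 1] == ANC:
            b[k] = flip_one_anc(b[k])
    return b


# Exhaustive check for two type-A and three type-B qudits.
flag_is_correct = all(
    (steps_i_and_ii(a, b)[-1] == ANC) == all(s == ONE for s in a + b)
    for a in product([ZERO, ONE], repeat=2)
    for b in product([ZERO, ONE], repeat=3)
)
print(flag_is_correct)
```

The exhaustive loop confirms, for this small instance, that the last type-B qudit ends in the ancillary level exactly when every type-A and type-B qudit starts in the state labeled `1bar`.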

c. Processing the multi-controlled gate of Step (iii).
The goal of the considered gate is to apply the phase factor of −1 to the input state if the last type-B qudit is in the ancillary state $|a\rangle$ and the type-C qudits are in such a state that all qubits embedded in the type-C qudits and affected by the decomposed gate are in the state $|1\rangle$. As has been mentioned, the important point about type-C qudits is that they also contain unaffected qubits, so the gate of Step (iii) has to apply the phase factor of −1 to several computational basis states. Specifically, the phase factor of −1 has to be applied to $2^{\#\mathrm{unaff}}$ states, where #unaff is the total number of unaffected qubits in the type-C qudits. The intuition behind this fact is exactly the same as behind Eq. (22) for the required number of two-qudit gates realizing an operation between qubits embedded in these qudits.
The idea of decomposing the gate of Step (iii) is very similar to that of decomposing the gate of Step (i). First, we turn the gate of Step (iii) into a multi-controlled gate of CZ type by adding $R^{\bar{1},a}_{Qj}(\pi/2, \pi)$ and $R^{\bar{1},a}_{Qj}(\pi/2, -\pi)$ rotations on the last qudit $Qj$ of type B. Then we take an ancilla-free decomposition of the $(|C| + 1)$-qubit gate $\mathsf{CZ}_{p0,\ldots,p|C|}$ (acting on virtual qubits $p0, \ldots, p|C|$) into single-qubit rotations $r_{pk}(\varphi, \theta)$ and two-qubit $\mathsf{CZ}_{pk_1 pk_2}$ gates [see Fig. 8(f)].
Each single-qubit gate $r_{pk}(\varphi, \theta)$ is transformed into the sequence of single-qudit gates

$$r_{pk}(\varphi, \theta) \mapsto \prod_{(\alpha,\beta)} R^{\alpha,\beta}_{Qj}(\varphi, \theta),$$

where $Qj$ is the qudit corresponding to $pk$ according to the straightforward ordering, and $(\alpha, \beta)$ are all appropriate level pairs fixed by the set $\{\mathrm{pos}_\ell\}$ of positions of the qubits embedded in $Qj$ and affected by the multi-qudit gate of Step (iii). Two-qubit gates $\mathsf{CZ}_{pk_1 pk_2}$ are transformed according to Sec. IV B 2. Namely, we take the qudits $Qj_1$ and $Qj_2$ corresponding to $pk_1$ and $pk_2$, choose the set of qubits $qi'_1, \ldots, qi'_\kappa$ that are embedded in $Qj_1$ and $Qj_2$ and are affected by the multi-qudit gate of Step (iii), and transform $\mathsf{CZ}_{qi'_1,\ldots,qi'_\kappa}$ according to Eqs. (20) and (21). In contrast to the decomposition of the multi-controlled gate of Step (i), here we should also take into account the global phase factor that may appear from the qubit-based decomposition. Though it is insignificant in the qubit case, when we use this decomposition for qubits embedded into qudits, the phase turns from a global into a relative one (this is the relative phase between the CZ operation in the subspace of affected qubits and the identity operation in the remaining subspace). To compensate for this phase explicitly, we add a single-qubit phase gate

$$\mathsf{Ph}_{p0}(\gamma) = \begin{pmatrix} e^{\imath\gamma} & 0 \\ 0 & e^{\imath\gamma} \end{pmatrix} \tag{30}$$

to the qubit $p0$. The value of $\gamma$ is chosen so that the whole sequence of gates applied to $p0, \ldots, p|C|$ realizes $\mathsf{CZ}_{p0,\ldots,p|C|}$ without any global phase.
From the viewpoint of the qudit circuit, $p0$ corresponds to the type-B qudit $Qj$ involved in the multi-controlled gate of Step (iii). The qubit phase gate $\mathsf{Ph}_{p0}(\gamma)$ is accordingly transformed into single-qudit phase operations on $Qj$.

d. Uncomputation steps (iv) and (v). By construction, Steps (i)-(iii) apply the phase factor of −1 to those input states of the qudits in which all embedded qubits affected by the decomposed gate $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ are in the state $|1\rangle$. However, we employ the ancillary level $|a\rangle$ of type-B qudits. To return the population from $|a\rangle$ to the original levels, we employ uncomputation, which is a 'mirror reflection' of Steps (i) and (ii). Namely, Steps (iv) and (v) are obtained as the Hermitian conjugate of the sequence of Steps (i) and (ii): the order of gates is reversed, and each single-qudit rotation $R^{\alpha,\beta}_{Qj}(\varphi, \theta)$ is replaced by $R^{\alpha,\beta}_{Qj}(\varphi, -\theta)$ (note that $\mathsf{CZ}_{Qj_1 Qj_2} = \mathsf{CZ}^{\dagger}_{Qj_1 Qj_2}$). As already mentioned, the uncomputation also removes the relative phase between the subspace of affected qubits and the remaining subspace of type-A qudits, possibly acquired in Step (i).
We see that all routines in the processing of the multi-qudit case are efficient, and the resulting complexity grows no faster than quadratically with the degree of the processed generalized Toffoli gate (the quadratic asymptotics can appear from the template used for the ancilla-free decomposition [81]). That is why the whole complexity of transpiling an n-qubit circuit consisting of L gates scales linearly with L and no more than polynomially with n.

V. REALIZING 6-QUBIT QUANTUM CIRCUIT WITH QUQUARTS
As an example, we consider the realization of an n = 6 qubit circuit, which is presented in Fig. 9(a), with a qudit-based processor consisting of m = 4 'ququarts' (qudits with d = 4). First, let us consider a straightforward implementation of the input circuit with a qubit-based processor. To simplify the transpilation of the multi-qubit gate $\mathsf{CZ}_{q1,\ldots,q5}$, we use two additional ancillary qubits q7 and q8. Using the schemes shown in Fig. 9(b,c) together with the one from Fig. 8(d), $\mathsf{CZ}_{q1,\ldots,q5}$ can be realized with 5 × 6 = 30 two-qubit gates and a number of single-qubit gates. In this way, the straightforward qubit-based decomposition of the input circuit results in $N^{\rm qb}_{\text{2-body}} = 33$ two-qubit operations. We note that no restrictions on the coupling map between qubits are considered here.
In contrast, the qubit-to-qudit mapping, shown in Fig. 9(d), allows realizing the input qubit circuit with only N qd 2-body = 6 two-qudit gates.In Fig. 9(e) we show a transformation of 1024 measurement outcomes, obtained with a qudit-based classical emulator, to the read-out measurement outcomes performed in the input qubit circuit.The transpiled qudit circuit is shown in Fig. 9(f).One can see that the qudit-based realization provides an advantage both in the circuit width and depth.
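The reassignment of qudit read-out results to qubit outcomes can be sketched as follows. The mapping used here is a hypothetical one chosen for illustration and is not claimed to coincide with the mapping of Fig. 9(d); positions are assumed to be 1-based and counted from the most significant bit.

```python
from typing import Dict, Sequence, Tuple


def qudit_levels_to_bitstring(
    levels: Sequence[int],
    mapping: Dict[int, Tuple[int, int]],
    sizes: Sequence[int],
) -> str:
    """Convert measured qudit levels into the bitstring of the embedded qubits.

    levels[j] is the measured level of qudit j, sizes[j] is the number of
    qubits hosted by qudit j, and mapping sends a qubit label to the pair
    (qudit index, position inside the qudit).
    """
    bits = {}
    for qubit, (qudit, pos) in mapping.items():
        level, width = levels[qudit], sizes[qudit]
        if not 0 <= level < 2 ** width:
            raise ValueError("measured level lies outside the qubit subspace")
        bits[qubit] = format(level, f"0{width}b")[pos - 1]
    # Bits are ordered by increasing qubit label.
    return "".join(bits[q] for q in sorted(bits))


# Hypothetical mapping of six qubits q1..q6 onto four ququarts.
mapping = {1: (0, 1), 2: (0, 2), 3: (1, 1), 4: (1, 2), 5: (2, 1), 6: (3, 1)}
sizes = [2, 2, 1, 1]
bitstring = qudit_levels_to_bitstring([3, 1, 1, 0], mapping, sizes)
print(bitstring)
```

Applying such a function to every recorded shot converts a histogram of qudit outcomes into a histogram over qubit bitstrings.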
Recall that the results of the qudit-based transpilation, shown in Fig. 9, remain also valid for initial qubit states other than |0⟩ ⊗n .To realize the qubit circuit with respect to another initial state using qudits, the only thing that is required is to update the initial state of qudit's register in accordance with the qubit-to-qudit mapping.We note that within the considered mappings of the form (10), any separable state of qubits corresponds to a separable state of qudits (but not vice-versa).

VI. DISCUSSION
Here we stress some important points related to the developed qudit-based transpilation approach. First, we emphasize its scalability with respect to the width and depth of a processed input qubit circuit. The scalability is assured by the facts that (i) the complexity of transpiling a given single-, two-, or multi-qubit gate to its qudit version grows no more than polynomially with the number of qubits affected by the gate, (ii) the complexity of transpiling the whole qubit circuit with respect to the given qubit-to-qudit mapping grows linearly with the number of gates, and (iii) a mapping-finding algorithm that is polynomial in the qubit number n and provides an advantage (or at least does not make things worse) compared to a standard qubit-based transpilation can be used. In particular, a greedy algorithm for the mapping finder of $O(n^3)$ complexity for $4 \leq d \leq 7$ is shown. Therefore, the resulting complexity of the qudit-based transpilation is polynomial in the number of qubits n in the processed circuit and linear in the number of gates.
We also recall that in the case of $m < n \leq m\lfloor \log_2 d \rfloor$, where m is the number of available d-dimensional qudits, the developed transpilation approach makes it possible to run an n-qubit circuit that cannot be launched at all with m qubits. Thus, our approach allows one to expand the range of algorithms suitable for running in terms of the required number of qubits. On the other hand, if $m \geq n$, then the corresponding qudit circuit is guaranteed to have no more entangling gates than the circuit obtained with a standard qubit-based transpilation, and strictly fewer entangling gates if there is at least one multiqubit gate in the original qubit circuit.
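The capacity condition on the number of qubits can be evaluated with a short Python sketch; the function names are hypothetical.

```python
def max_embedded_qubits(m: int, d: int) -> int:
    """Largest number of qubits that fits into m qudits of dimension d."""
    if m < 0 or d < 2:
        raise ValueError("expected m >= 0 and d >= 2")
    # For a positive integer d, d.bit_length() - 1 equals floor(log2(d)).
    return m * (d.bit_length() - 1)


def requires_qudit_encoding(n: int, m: int, d: int) -> bool:
    """True when m < n <= m * floor(log2 d): the circuit fits only via qudits."""
    return m < n <= max_embedded_qubits(m, d)


# Four ququarts (d = 4) can host up to eight qubits, so a 6-qubit circuit fits.
print(max_embedded_qubits(4, 4), requires_qudit_encoding(6, 4, 4))
```

The integer bit-length trick avoids floating-point rounding issues that a direct call to a logarithm function could introduce for large d.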
An important feature of the developed approach is its adaptiveness to the qudit dimension in the processing of multi-qubit gates. It allows leveraging the power of extra levels in qudits to a greater extent compared to the approaches (see, e.g., [80]) where a given qubit circuit is first transpiled down to single- and two-qubit gates, and then a qubit-to-qudit mapping is obtained.
These features together demonstrate the applicability of our approach to useful near-term quantum algorithms. The main application area of our approach is quantum algorithms that contain multi-qubit gates. A clear example of the two-particle gate reduction provided by a qudit-based realization is Grover's search algorithm. As shown in Ref. [83], a thousandfold reduction in the number of entangling gates, starting from eight-qubit implementations, can be achieved with ququints (d = 5). Multiqubit gates are also inherent in solving factorization [84] and discrete logarithm [85] problems. It is worth emphasizing that decompositions of general multiqubit unitaries, e.g., Haar-random ones, are also based on generalized Toffoli gates [86]. We also note that the presented qudit-based transpilation approach is promising in the context of the Toffoli + Hadamard universal gate set [87], where new interesting results were reported recently [88].

VII. CONCLUSION AND OUTLOOK
We have presented an approach for an efficient implementation of qubit circuits with qudit-based processors. The proposed approach consists of finding an optimized qubit-to-qudit mapping, transpiling a qubit circuit according to this mapping, running the transpiled circuit on a qudit-based processor (or emulator), and then reassigning the read-out measurement results back to the qubit-based representation. We have developed a qudit-based transpilation algorithm with respect to a particular universal set of single-qudit and two-qudit gates and proposed an idea of a mapping finder algorithm with polynomial complexity. Then we have shown an example of applying the developed approach for realizing a 6-qubit circuit with four 4-level qudits. We have demonstrated that the resulting number of two-particle operations required for implementing the given circuit with qudits is considerably smaller than the one in a straightforward qubit-based implementation. Taking into account recent progress in improving the fidelity of qudit gates, we expect an overall increase in the resulting fidelity of implementing qubit circuits with qudits.
We note that the main goal of the current work is to provide a general approach for qubit circuit execution with qudit-based hardware. The considered example of qudit-based transpilation has to be modified for each particular physical platform with a specific set of native gates and qudits' connection topology. We leave these platform-specific problems for further consideration. Although the realization of $d'$-ary circuits with qudits is beyond the scope of this work, their transpilation for qudit processors with $d \geq d' + 1$ levels is also an improvement option for the developed qudit transpiler.
We also note that one can consider a refinement of the optimized qudit circuit criterion. It can be defined not only by the number of two-particle operations but also as a total qudit circuit fidelity (or its estimation), which takes into account both single-qudit and two-qudit gate fidelities. Although recent papers demonstrate that the fidelities of single- and two-qudit gates are comparable with the fidelities of qubit gates, this metric allows one to take into account more accurately the effects of decoherence arising from the usage of upper levels.

While processing multi-qubit gates $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ with $|\texttt{qudit\_set}| \geq 3$, a situation is possible where one or two of the types (A, B, C) are missing. In this case, the described decomposition needs some slight corrections.
If there are no type-B qudits, then the intermediate circuit consists of a single multi-qudit CZ gate acting on all $|\texttt{qudit\_set}|$ qudits. To decompose this gate, we take a standard multi-qubit gate decomposition as a template, as described in Secs. IV B 3 a and IV B 3 c. Then each qubit gate is replaced with the corresponding qudit gate(s), taking into account the type of the involved qudits (type A or type C). The phase correction discussed in Sec. IV B 3 c has to be applied if necessary.
If there are no type-A qudits, yet there is at least one type-B qudit, the ladder-like part of the decomposition starts with the control on the first qudit in the state $|\bar{1}\rangle$. If type-C qudits are also missing, then the ladder-like part of the decomposition ends with a target on the last type-B qudit in the state $|\bar{1}\rangle$.

Appendix B: Alternative form of the intermediate circuit
The structure of the intermediate qudit circuit $\mathrm{circ}^{\rm qd\text{-}int\text{-}alt}_{\phi}$, which is presented in Fig. 10, is similar to the structure of the previously described $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$. The main difference between them is the location of type-A and type-C qudits in the scheme. In the alternative version, the multi-qudit gate on type-C qudits is employed twice, and the multi-qudit gate on type-A qudits is employed once in the central part of the scheme. We note that the operations on type-B qudits remain the same as in the previously described scheme in Fig. 8(a).
Taking into account the invariability of the operations on type-B qudits and the symmetry of the CZ-type operation in the core of the type-A and type-C multi-qudit gates, the realization of the CZ-type operation on type-A and type-C qudits reduces to the procedures described in Secs. IV B 3 a and IV B 3 c, respectively.

Competing interests
Owing to the employment and consulting activities of authors, A.S.N., E.O.K., and A.K.F. have financial interests in the commercial applications of quantum computing. A.S.N., E.O.K., and A.K.F. do not have any non-financial competing interests.

Figure 10. Alternative intermediate qudit circuit $\mathrm{circ}^{\rm qd\text{-}int\text{-}alt}_{\phi}$ for the decomposition of the $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ gate. The difference from the scheme described in the main text (see Fig. 8) is that the locations of the operations on type-A and type-C qudits are swapped.

Figure 1. The main stages of implementing a qubit circuit with a qudit-based processor or emulator.

Figure 3. Data transfer scheme in the developed qudit-based transpiler.

Figure 4. The total number of non-equivalent mappings |Φ| depending on the number of qudits m for d ∈ {4, . . ., 31}. To maximize the number of non-equivalent mappings, we take the number of qubits n to be the same as the number of qudits m.

Figure 5. Qudit circuit transpilation algorithm implemented in the qudit-based circuit constructor. Gates from $\mathrm{circ}^{\rm qb}$ are sequentially classified by the number of qudits in qudit_set and then transpiled to qudit-based gates depending on this number.

Figure 6. (a) Qudit-based realization of a single-qubit gate $r_{qi}(\varphi, \theta)$ in the case where qi is embedded into an 8-level qudit Qj at the 2nd position (Qj contains three qubits in total). The involved transitions within the 8-level qudit Qj are shown. (b) Qudit-based realization of a two-qubit gate $\mathsf{CZ}_{qi_1,qi_2}$ in the case where the affected qubits $qi_1$ and $qi_2$ are embedded into the same qudit Qj at the 1st and 3rd positions, respectively. The levels acquiring the phase factor of −1 are shown.

Figure 8. (a) Intermediate qudit circuit $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$ used for the decomposition of $\mathsf{CZ}_{qi_1,\ldots,qi_\kappa}$ with $|\texttt{qudit\_set}| \geq 3$. Each gate of $\mathrm{circ}^{\rm qd\text{-}int}_{\phi}$ is then transpiled down to native single-qudit and two-qudit gates. (b) The scheme of transforming an inversion operation between levels $|\bar{1}\rangle$ and $|a\rangle$ into the phase-acquiring operation at level $|\bar{1}\rangle$ via single-qudit operations. (c) The scheme of transforming a control at level $|a\rangle$ into a control at level $|\bar{1}\rangle$ via single-qudit operations. (d) An example of the qubit-based ancilla-free decomposition of the three-qubit $\mathsf{CZ}_{p0,p1,p2}$ gate according to Ref. [81] that can be used as a template to decompose multiqudit CZ-type gates at Steps (i), (iii), and (v). (e) Decomposition of a three-qudit gate used in Steps (i) and (v) into single-qudit and two-qudit gates according to the template shown in (d). (f) Decomposition of a three-qudit gate used in Step (iii) into single-qudit and two-qudit gates according to the template shown in (d).

Figure 9. (a) Example of a qubit circuit that is an input to the developed qudit-based transpiler. The circuit acts on n = 6 qubits q1, . . ., q6; two additional qubits q7 and q8 are used for decomposing the five-qubit $\mathsf{CZ}_{q1,\ldots,q5}$ gate. (b) Decomposition of the five-qubit $\mathsf{CZ}_{q1,\ldots,q5}$ gate with two 'clean' ancillas down to Toffoli gates. For the decomposition of Toffoli gates, the circuit identity shown in (c) and then the scheme of Fig. 8(d) can be used. (d) Qubit-to-qudit mapping that is used for the realization of the given qubit circuit with m = 4 qudits of d = 4 levels. (e) Equivalence of the read-out results that are obtained with the qudit-based emulator and the post-processed outputs that can be interpreted as results of the qubit circuit implementation. (f) The result of the qudit-based transpilation of the input qubit circuit.
Appendix A: Intermediate circuit in the case of incomplete set of type-A, B, C qudits