Quantum pricing with a smile: implementation of local volatility model on quantum computer

Quantum algorithms for the pricing of financial derivatives have been discussed in recent papers. However, the pricing model discussed in those papers is too simple for practical purposes. It motivates us to consider how to implement more complex models used in financial institutions. In this paper, we consider the local volatility (LV) model, in which the volatility of the underlying asset price depends on the price and time. As in previous studies, we use the quantum amplitude estimation (QAE) as the main source of quantum speedup and discuss the state preparation step of the QAE, or equivalently, the implementation of the asset price evolution. We compare two types of state preparation: One is the amplitude encoding (AE) type, where the probability distribution of the derivative’s payoff is encoded to the probabilistic amplitude. The other is the pseudo-random number (PRN) type, where sequences of PRNs are used to simulate the asset price evolution as in classical Monte Carlo simulation. We present detailed circuit diagrams for implementing these preparation methods in fault-tolerant quantum computation and roughly estimate required resources such as the number of qubits and T-count.


I. INTRODUCTION
With recent advances of quantum computing technologies, researchers are beginning considering how to utilize them in industries.One major target is finance (see [1] for a review).Since financial institutions are performing enormous tasks of numerical calculation in their daily works, speed-up of such tasks by quantum computer can bring significant benefits to them.One of such tasks is pricing of financial derivatives 1 .Financial derivatives, or simply derivatives, are contracts in which payoffs are determined in reference to the prices of underlying assets at some fixed times.Large banks typically have a huge number of derivatives written on various types of assets such as stock price, foreign exchange rate, interest rate, commodity and so on.Therefore, pricing of derivatives is an important issue for them.
In derivative pricing, we represent random movements of underlying asset prices using stochastic processes and calculate a derivative price as a expected value of the sum of payoffs discounted by the risk-free interest rate under some specific probability measure.In order to calculate the expected value, Monte Carlo simulation is often used.There are quantum algorithms for Monte Carlo simulation [5,6], which bring quadratic speed-up compared with that on classical computers and there already exists some works which discuss application of such quantum algorithms to derivative pricing [7][8][9].However, in order to bring benefits to practice in finance, previous works have some room to be extended.That is, previous works consider the Black-Scholes (BS) model [10,11].Although the BS model is the pioneering model for derivative pricing and still used in many situations in today's financial firms, it is insufficient to consider only the BS model as an application target of Monte Carlo for practical business for some reasons.First, for various types of derivatives, market prices of derivatives are inconsistent with the BS model.This phenomenon is called volatility smile, which we will explain in Section II.In order to precisely price derivatives taking into account volatility smiles, financial firms often use models which have more degree of freedom than the BS models.Second, the BS model is so simple that analytic formulae are available for the price of some types of derivatives in the model.In such cases, Monte Carlo simulation is not necessary.Actually, banks use Monte Carlo simulation mainly for complex models which can take into account volatility smiles.Although it is the natural first step to consider Monte Carlo in the BS model, the above points motivate us to consider how to apply quantum algorithms for Monte Carlo to the advanced models.
In this paper, we will focus on one of the advanced models, that is, the local volatility (LV) model [12].The LV model, which we will describe later, is the model in which a volatility of an asset price depends on the price itself and time, so the BS model is included in this category as a special case.With degrees of freedom to adjust the function form of volatility, the LV model can make derivative prices consistent with volatility smiles.So this model is widely used to price derivatives, especially exotic derivatives, which have complex transaction terms such as early redemption, in many banks.
In order to price a derivative by Monte Carlo simulation, we generate paths, that is, random trajectories of time evolution of asset prices, then calculate the expectation value of the sum of discounted payoffs which arise in each path.Since we cannot generate continuous paths on computers, we usually consider evolutions on a discretized time grid, using a random number (RN) for each time step, which represents the stochastic evolution in the step.In this paper, we focus on how to implement such a time evolution in the LV model on quantum computers.
We can consider two ways to implement the time evolution.In this paper, we call them the register-per-RN way and the PRN-on-a-register way.The difference between them is how to generate RNs required to generate a path.The register-per-RN is adopted in previous papers [7][8][9].In this way, following the procedure described in, e.g., [13], one creates a superposition of bit strings which correspond to binary representations of possible values of a RN, where the probability amplitude of a each bit string is the square root of the possibility that the RN take the corresponding value.The point is that one register is used for one RN, so the required qubit number is proportional to the number of RNs used to generate a path.This can be problematic in terms of qubit number when many RNs are required.The number of RNs is equal to that of time steps times and that of underlying assets 2 .The number of time steps can be large for derivatives with long maturity.Maturity can be as long as 30 years, so if we take time grid points monthly, the total number of time steps is 360.The number of underlying assets can be multiple, and furthermore, there are situations where we must simultaneously consider assets concerning different derivative contracts in a portfolio, for example, XVA 3 .Assuming that the number of asset is O (10) and that of time steps is O(100), the total number of RNs becomes O(10 3 − 104 ).If we use a register with O(10) qubits for each RN, the total qubit number can be O(10 5 ) easily.The current state-of-art quantum computers have only O (10) qubits [15].Even if we obtain large-scale fault-tolerant machines in the future, the large qubit overhead might be required to make a logical bit (see [16] as a review and references therein).Therefore, calculations which require the large number of qubits as above might be prohibitive even in the future.This situation is similar to credit portfolio risk measurement, which is described in [17].
We are then motivated to consider the PRN-on-a-register way, which is proposed in [17].In this way, one does not create RNs on different register, but generates a sequence of pseudo-random number (PRN) on a register.At each time step, the PRN sequence is progressed and its value is used to evolve the asset price.Therefore, the required qubit number does not depend on the number of RNs and is largely reduced.The drawback is the circuit depth.Here, we define the circuit depth as the number of layers consisting of gates on distinct qubits that can be performed simultaneously, as T-depth used in [18,19].Since calculations to update the PRN is sequentially performed on a register, the circuit depth is now proportional to the number of RNs.Since in the fault-tolerant computation some kinds of gates, for example T-gates in the Clif-ford+T gate set, can take long time to be run [20,21], the sequential run of such gates might be also prohibitive in terms of calculation time [22].At any rate, in the current stage, where it is difficult to foresee the spec of future quantum computers, we believe that it is meaningful to consider the implementation which saves qubits but consumes depth as a limit.
When it comes to the LV model, the PRN-on-a-register way becomes more motivating, since its disadvantage on the circuit depth compared with the register-per-RN way is alleviated.In the LV model, the volatility varies over time steps depending on the asset price, so the calculation for the time evolution is necessarily stepwise 4 .Therefore, the PRN-on-aregister way and the register-per-RN way are equivalent with respect to this point, that is, the circuit depth is proportional to the time step number in both ways.This is different from the situation in credit portfolio risk management [22], where, in the register-per-RN way, a register is assigned to each random number which determine whether each obligor defaults or not and parallel processing on different registers reduces circuit depth.
In this paper, we design the quantum circuits in the above two way in the level of elementary arithmetic.In doing so, we follow the policies of the two ways to the extent possible.That is, not only with respect to RNs but also in other aspects, we try to reduce qubits accepting some additional procedures in the PRN-on-a-register way, and vice versa in the registerper-RN way.For example, in the PRN-on-a-register way, we have to intermediately output the information of the volatility used to evolve the asset price at each time step and clear it by the next step.Otherwise, we need a register to hold the information per step and the required qubit number becomes proportional to the number of time steps.It is nontrivial to implement such a procedure and we will present how to do this later.Note that such clearing procedure is unnecessary in the register-per-RN way.
We then estimate the resources to implement the proposed circuits.We focus on two metrics: qubit number and T-count.As mentioned above, we see that the qubit number in the PRNon-a-register way is independent from the time step number and much less than the register-per-RN way.The T-count is proportional to the time step number in the both ways.We see that in some specific setting the both ways yield the Tcounts of same order of magnitude, except that in the PRNon-a-register way is larger by some O(1) factor.
The rest of this paper is organized as follows.Section II and III are preliminary sections, the former and the latter briefly explain the LV model and the quantum algorithm for Monte Carlo simulation, respectively.In section IV, we present the circuit diagram in the two way.In section V, we estimate qubit number and T-count of the proposed circuits.Section VI gives a summary.

II. LV MODEL
In this paper, we consider only the single-asset case, since it is straightforward to extend the discussion in this paper to the multi-asset case.

A. pricing of derivatives
Consider a party A involved in a derivative contract written on some asset.We let S t denote a stochastic process which represents the asset price at time t, which is set as t = 0 at the present.We assume that the payoffs arise at the multiple times t pay i , i = 1, 2, ... and the i-th payoff is given by f pay i S t pay i , where f pay i is some function which maps R to R. The positive payoff means that A receives a money from the counterparty and the negative one means vice versa.For example, the case where A buys an European call option with the strike K corresponds to with a single payment date t pay 1 .Note that this type of derivative contract is too simple to cover all trades in financial markets.For example, callable contracts, in which either of the parties has a right to terminate the contract at some times, are widely dealt in markets.We leave studies for such exotic derivatives for future works and, in this paper, concider only those which can be expressed in the above form.
Following the theory of arbitrage-free pricing, the price V of the contract for A is given as follows [3,4]: where E[•] represents the expectation value under some probability measure, the so-called risk-neutral measure.Here and hereafter, we assume that the risk-free interest rate is 0 for simplicity.

B. LV model
In the LV model, the evolution of the asset price is modeled by the following stochastic differential equation (SDE) in the risk-neutral measure5 W t is the Wiener process which drives S t .dX t is the increment of a stochastic process X t over an infinitesimal time interval dt.The deterministic function is the local volatility.Note that the BS model corresponds to the case where where σ BS is a positive constant, which we hereafter call a BS volatility.
The LV model was proposed to explain volatility smile.In order to describe this, let us define implied volatility first.In the BS model, a price of a European call option with strike K and maturity T at t = 0 is given by the following formula: where Φ SN is the cumulative distribution function of the standard normal distribution.We can price the option if we determine the BS volatility.Conversely, given the market price of the option V call,mkt (T, K), we can reversely calculate the BS volatility.That is, we can define the following function of K and T : We call BS volatilities drawn back from the market option prices by ( 6) as implied volatilities.If the market is described well by the BS model, implied volatilities σ IV (T, K) take a same value for any K and T .Although this is the case for some markets, σ IV (T, K) varies depending on K and T in many markets.Especially, if σ IV (T, K) depends on K, it is said that we observe the volatility smile for the market.
Volatility smiles mean that possible scenarios of asset price evolution in the BS model do not match those which market participants consider.For example, if market participants think that extreme scenarios, big crashes or sharp rises, are more possible than the BS model predicts, the volatility smile arises.In fact, it is often said that the Black Monday, the big crash in the stock markets at 1987, was one of triggers of appearance of volatility smiles.
With the LV model, we can make European option prices given by the model consistent with any market prices, as long as there is no arbitrage in the market.This is intuitively apparent since we can expect that the degree of freedom of the local volatility σ(t, S ) as a two-dimensional function is available to reproduce the two-dimensional function V call,mkt (T, K).In fact, if we can get V call,mkt (T, K) as a function, that is, the market option prices for continuously infinite strikes and maturities, we can determine σ(T, K) which reproduces V call,mkt (T, K) as follows [12]: In reality, the market option prices are available only for several strikes and maturities.Therefore, in the practical business, we usually use a specific functional form of σ(t, S ) with degrees of freedom sufficient to reproduce several available market option prices.In this paper, we use the following form.First, we set the n t grid points in the time axis, t 0 = 0 < t 1 < t 2 < ... < t n t .Second, we set the n S grid points in the asset price axis for each time grid point, that is, s i,1 , ..., s i,n S for t i .Then, σ(t, S ) is set as follows: σ(t, S ) = a i, j S +b i, j ; for s i, j−1 ≤ S < s i, j , j = 1, ..., n S +1 (8) for t i−1 ≤ t < t i , where a i, j , b i, j are constants satisfying σ(t, S ) > 0 for any t and S and s i,0 = −∞, s i,n S +1 = +∞.In other words, the two-dimensional space of (t, S ) is divided in the direction of t and in each region σ(t, S ) is set to a function which is piecewise-linear with respect to S .In this paper, we assume that a i, j , b i, j are preset to the value for which the option prices in the LV model, which we here write as V call,LV (T, K, S 0 , {a i, j }, {b i, j }), match the market prices by some standard.For example, they can be set to where (T I , K I )'s are several sets of maturity and strike for which the market option prices are available.

C. Monte Carlo simulation in the LV model
We here describe how to calculate the derivative price (2) in the LV model by Monte Carlo simulation.
First, we have to discretize the time into sufficiently small meshes, since we can deal with the continuous time on neither classical nor quantum computers.For simplicity, we set the time grid points to the above t i 's, those for the LV function.Then, the time evolution given by ( 3) is approximated as follows: where w 1 , ..., w n t are independent standard normal random numbers (SNRNs).Among various ways to discretize the SDE, we here adopt the Euler-Maruyama method [23].
Even after time discretization, we cannot consider all of continuously infinite patterns of SNRNs.One solution for this is discretized approximation of SNRNs.We can choose the finite numbers of the grid points and assign probability to each point so that standard normal distribution is approximately reproduced.Now, the patterns of discretized SNRNs are finite, so we can approximate (2) as where p n is the probability that the n-th pattern of values of SNRNs are realized and S (n) t is the asset price at time t in the n-th pattern.
There are some possible ways to take petterns considered in (11).In the register-per-RN way, we take all patterns.If we take N grids to discretize each of n t SNRNs, the number of possible patterns of SNRNs is N n t .Although this is exponentially large, quantum computers can take into account all patterns with a polynomial number of qubits by quantum superposition.
On the other hand, this cannot be adopted on classical computers, since the number of the SNRN patterns are exponentially large.Usually, Monte Carlo pricing on classical computers is done in the following way, which the PRN-on-aregister way is also based on.We consider sampled patterns of SNRNs.That is, we generate finite but sufficiently many sample sets of (w 1 , ..., w n t ) and use them to generate sample paths of the asset price which evolves according to (10).We then approximate (2) by the average of sums of payoffs in sample paths, where S (n) t is the value of the asset price at time t on the n-th sample path and N path is the number of sample paths.

III. QUANTUM ALGORITHM FOR MONTE CARLO SIMULATION A. outline of the algorithm
We here review the quantum algorithm for Monte Carlo simulation [5,6].It can be divided into the following steps.First, we create a superposition of possible values of a random number used to calculate a sample value of the integrand on a register.If multiple random numbers are necessary to calculate the integrand, one register is assigned per random number.As mentioned above, continuous random numbers must be approximated in some discretized way.Second, we calculate the integrand into another register, using the random numbers.Note that the results for many patterns of random numbers are simultaneously calculated in quantum parallelism.Third, by controlled rotation, the integrand value is reflected into the amplitude of the ancilla.Finally, amplitude estimation [6,24,25] on the ancilla gives the expectation value of the integrand.
The quantum state is transformed as follows: Here, the first, second and third kets correspond to the random number registers, the integrand register and the ancilla, respectively.x i represents the binary representation of values of random numbers in the i-th pattern and p i is the probability that it realizes.f is the integrand and f (x i ) is its value for x i .Note that the probability to observe 1 on the ancilla is i p i f (x i ), the integral value which we want.Although we do not explain how to estimate the probability amplitude in this paper, it is studied in many papers.For example, see [6,24,25].Using such methods, we can estimate the integral with the statistical error which decays as O(N −1 ), where N is the number of oracle calls.This decay rate is quadratically faster than that in the classical algorithm, O(N −1/2 ).
B. the scheme using the PRN generator We here briefly review the quantum way for Monte Calro simulation using the PRN generator.The calculation flow for the current problem, the time evolution of asset price in the LV model, based on this way is described in Section IV B.
It is proposed in [17] in order to reduce the required qubits to generate RNs in the application of the quantum algorithm for Monte Carlo to extremely high-dimensional integrations.When it is neccesary to generate many RNs to compute the integrand, the naive way, in which we assign a register to each RN and create a superposition of possible values, leads to the increase of qubit numbers in proportion to the number of RNs.In order to aviod this, we can adopt the following way.First, we prepare two registers, R samp and R PRN .Then, we create a superposition of integers, which specify the start point of the PRN sequence, on R samp .With the start point, we sequentially generate PRNs on R PRN .This is possible because a PRN sequence is a deterministic sequence whose recursion equation is explicitely given, and in [17] we gave the implementation of one of PRN generators on quantum circuits.Using the PRNs, we compute the integrand step by step, which corresponds to time evolution of the asset price and calculation of payoffs in this paper.Finally, the expectation value of the integrand is estimated by quantum amplitude estimation.In this way, since we need only R samp and R PRN to generate PRNs, the required qubit number is now independent from the number of RNs and much smaller than the naive way.The drawback is the increase of the circuit depth.

IV. CIRCUIT DESIGN
Now, we present quantum circuits for time evolution of an asset price in the LV model in the two ways: PRN-on-aregister and register-per-RN.

A. elementary gate
Before we present circuits, we here list up elementary gates we use.We here simply assume their existence.Actually, implementation of such elementary arithmetics are widely studied in previous works: see, for example, .We will discuss the implementation in Section V A.
With these, we can construct other types of arithmetic we use.For example, subtraction |x |y → |x − y |y can be done as addition by the method of 2's complement.Comparison |x |y |z → |x |y |z ⊕ (x > y) can be done as subtraction in 2's complement method, since the most significant bit represents whether the result of subtraction is positive or negative.So a comparator is constructed as two adder, including uncomputation.
Note that the above multiplier uses two registers as operands and outputs the product into another register.Most of previously proposed multipliers are of this output-to-other type.On the other hand, we also need the self-update type of multiplier which updates either of input registers with the product, otherwise we have to add a register for each multiplication and qubit number explodes.Such a operation is realized by the following trick.When we want to multiply x by y, given the two registers holding x and y and an ancilla register, we can do: Here, the first step is output-to-other multiplication.The second step is swap between the first and third registers, which is not necessary if we change our recognition on which of two register is ancillary at every multiplication.The third step is the inverse operation of division.
B. the PRN-on-a-register way

calculation flow
We first present the calculation flow in the PRN-on-aregister way.We consider the flow until calculation of the sum of payoffs, which corresponds to from the first to the third line in (13), since the controlled rotation in the fourth line does not depend on the problem.
In the PRN-on-a-register way, PRNs are used for evolution of the asset price (10).More concretely, we preselect some sequence of pseudo standard normal random numbers (PSNRNs) and divide it into subsequences, then evolve the asset price using them.
Before we present the calculation flow, we explain some setups.We prepare the following register: • R samp This is a register where a superposition of integers which determine the start point of the PSNRN sequence.We write its qubit number as n samp .N samp = 2 n samp is the number of sample paths we generate.
• R W This is a register where we sequentially generate PSNRNs.
• R S This is a register where the value of the asset price is stored and which we update for each time step of the evolution, using R W .
• R payoff This is a register into which the payoffs determined by R S are added.
Note that we need some ancillary registers in addition to the above registers.We assume that the required calculation precision is n dig -bit accuracy and R W , R S , R payoff and ancillary registers necessary to update them have n dig qubits.
We assume that the following gates are available to generate a sequence of PSNRNs.

• P W
This progresses a PSNRN sequence by one step.In other words, it acts on R W and updates x i to x i+1 , where x i is the i-th element of the sequence: |x i → |x i+1 .
• J W This lets the PSNRN sequence jump to the starting point.That is, it is input an integer i on a register and outputs x in t +1 into another register which is initially set to |0 : |i |0 → |i |x in t +1 .Remember that n t , the number of time steps, is equal to the number of RNs used to generate one sample path.
The concrete implementation of these gates are discussed later.
Then, the calculation flow is as follows: 1. Initialize all registers to |0 except R S , which is initialized to |S t 0 .
3. Operate J W to set x in t +1 to R W , where i is determined by the state of R samp .These are the starting points of subsequences.
4. Perform the time evolution (10) using the value on R W . R S is updated from |S t 0 to |S t 1 .

5.
Calculate the payoff at time t 1 and add into R payoff .
6. Operate P PRN to update R W from x in t +1 to x in t +2 .
7. Iterate operations similar to 4-6 for each time steps until the time reaches t n t .
8. Finally we obtain a superposition of states in which the value on R payoff is the sum of payoffs in each sample path.Estimate the expectation value of R payoff by methods like [6,24].This is an estimate for (12).
Here and hereafter, we assume that a payoff arises at each time step, for simplicity.
The flow of state transformation is as follows: where the first, second, third and fourth kets correspond to R samp , R W , R S and R payoff , respectively.

overview of the circuit
Schematically, the circuit which realizes the flow ( 15) is as shown in Figure 1.In the figure, the gate U j corresponds to the j-th step of asset price evolution, that is, the j-th iteration of step 4-6 in the above calculation flow.
U j is implemented as shown in Figure 2. P W is already explained and the gate Payoff j calculates f pay j (S (i) t j ) using R S and adds it into R payoff .In addition to these, U j has gates V ( j) 1 , ..., V ( j) n S , which update R S .The detail of V ( j)  k is shown in Figure 3.This gate (i) checks whether the asset price is in the k-th interval [s j,k−1 , s j,k ), (ii) if so, update R S using the LV in the interval, (iii) clears the intermediate output.It requires ancillary registers R count , R S and R g .They have log 2 n t , n dig and 1 qubits respectively.At the start of V ( j)  k , R count takes | j or | j + 1 and the others take |0 .Then the detailed calculation flow is: using the value x in t + j on R W and add 1 to R count .

Calculate
S t j+1 − b j,k ∆t j x in t + j 1 + a j,k ∆t j x in t + j (17) into R S , using the value on R S as S t j+1 and that on R W as x in t + j .
4. If R count is j + 1 and R S is in [s j,k−1 , s j,k ), flip R g .This uncomputes R g .

Do the inverse operation of 3.
Let us explain the meaning of this flow.First, R count is necessary as an indicator of whether the j-th step of evolution has been already done or not.Without this, it is possible that the asset price is doubly updated in a time step.If and only if the j-th step has not been done, that is, R count is j and the asset price is in [s j,k−1 , s j,k ), the update of the asset price with the LV function a j,k S + b j,k is done.To do this conditional update, the check result is intermediately output to R g and the gate corresponding ( 16) is operated on R S under control by R g .Besides, the increment of R count controlled by R g is also done, so that R count indicates completion of the j-th step if so.Steps 3-5 is necessary to clear R g .If the asset price has been updated in Step 2, Step 3 draws back it to the value before the update.Conversely, we can determine whether the update has been done in Step 2 from the result of Step 3.That is, for the reason mentioned soon later, the condition that R count is j + 1 and R S is in [s j,k−1 , s j,k ) after Step 3 is equivalent to the condition that R count is j and R S is in [s j,k−1 , s j,k ) before Step 2. Therefore, Step 4 flip R g if and only if it is |1 , so it goes back to |0 .In summary, through the sequential operation of V ( j) 1 , ..., V ( j) n S +1 , R S is updated only once at the appropriate V ( j) k , R count is updated from | j to | j + 1 and all intermediate outputs on ancillary registers are cleared.
We here mention a restriction on the LV model so that it can be implemented in the PRN-on-a-register way.Note that through V ( j) 1 , ..., V ( j) n S +1 , the state is transformed from | j |S (i) , where the first and second kets correspond to R count and R S respectively and other registers are omitted since they are unchanged.This means that the map from S (i) t j to S (i) must be one-to-one correspondence, since unitarity is violated if not.Actually, this is not so strong restriction.As shown in Appendix, if we set a i, j , b i, j so that σ(t, S ) is continuous with respect to S and we set ∆t j is small enough that the increment ∆S t j is much smaller than S t j itself, the above condition is satisfied.This one-to-one correspondence lets Step 3 work.That is, since the map between S (i) t j and S (i) t j+1 is one-to-one correspondence, the result of Step 3 is in [s j,k−1 , s j,k ) if and only if the value on R S before Step 3 is in the image of [s j,k−1 , s j,k ) under the map.

implementation of respective gates
We now consider how to implement respective gates in the PRN-on-a-register way to the level of four arithmetic operations.
(i) V ( j) k Note that most parts of V ( j)  k consist of only arithmetic operations, addition, subtraction, multiplication, division and comparison, which mentioned in Sec IV A.
For example, the gate z ← z ⊕ (x = j and y ∈ I) can be divided to two parts.The first part is checking that the value on R count is equal to j and this can be done by the multiple control Toffoli gate, which is studied in [19,48,54].The second part is checking that the asset price is in a given interval, which can be constructed from two comparisons.Combining these, the gate z ← z ⊕ (x = j and y ∈ I) in Figure 3 is constructed as shown in Figure 4.Note that the bitwise flips X 1− j 0 ⊗ ... ⊗ X 1− j nx −1 are operated before the multi control Toffoli.Here, j a is the a-th digit of the binary representation of j, so the a-th qubit is flipped if and only if j a = 0.This convert |x to |1 ... |1 if and only if x = j.
The operation x ← x + (ax + b)y in Figure 3 where the third ket corresponds to an ancillary register.The first arrow is just setting a constant on a register.The second arrow is the multiplication by a constant a.The third arrow is the self-update multiplication, so it needs another ancillary register.The fourth arrow is again multiplication by a constant b and the final arrow is uncomputation of the first and second arrows.Note that this is done under control by R g .In order for this to be controlled, it is sufficient to control only the second, fourth and final arrows, since the third arrow becomes a multiplication by 1 without the second.Also note that multiplication by a n-bit constant a or b can be done by n adders, that is, n shift-and-add's: ax = n−1 i=0 a i 2 i x, where a i is the i-th bit of a.This saves qubits compared with the case where we use a multiplier, which needs holding a on an ancillary register.
The operation x ← (x − by)/(1 + ay) in Figure 3 where the first, second, third and fourth kets are R S , R W , another ancillary register and R S , respectively.Here, the first and second arrows are same as (18), the third is the multiplication by a constant −b and the final one is division.
Here, we do not have to uncompute R S and the ancillary register, since the whole of this operation is uncomputed soon later in V j,k .
(ii) J W , P W In [17], implementation of PRN on quantum circuits is presented, using the PRN generator called PCG [49].Note that this PRN generator generates uniform RNs.We now need PSNRNs, so we must transform uniform distribution to standard normal distribution.There are some method and we adopt the inverse transform sampling.Schematically, J W and P W are implemented as shown in Figure 5. Here, J PRN is the gate to let the PRN sequence jump to the in t + 1 and P PRN is the gate to progress the PRN sequence by a step.They sequentially generate uniform RNs on the ancillary register R PRN and they are transformed to PSNRN on R W .
Although we refer to [17] for the detail of implementation of the PRN generator, we here briefly explain.This generator is combination of linear congruential generator (LCG) and permutation of bit string.For LCG, update of the PRN sequence is done by where a, N are positive integers and c is a nonnegative integer, and the n-th element of the sequence is computed from n and the initial value x 0 by According to [26], we can construct the modular adder using 5 plain adders.Modular multiplication by a n-bit constant can be done as n modular shift-and-add's.Modular division by a constant a − 1 can be done as modular multiplication by a constant integer β such that β(a − 1) = 1 mod N, if exists.Modular exponentiation a x mod N is computed as a sequence controlled modular multiplication [26].So, to summarize, we can perform ( 20) and ( 21) using only controlled adders.In (20), the state is transformed as In the circuit we are considering, the first and second registers are R PRN and an ancillary register, which interchange their role at every step, and the third is another ancillary register.Each step corresponds to an elementary operation as follows.The first arrow is modular multiplication.The second arrow is the inverse modular multiplication by a integer α such that aα = 1 mod N and this clearing step is necessary to avoid increase of ancillas.The third is just loading c on an ancillary register, the fourth is modular addition and the last is unloading.(21) progresses as follows: where the first, third, second and fourth registers are respectively R samp , R PRN and two ancillary registers.The first arrow is modular exponentiation, the second is modular multiplication, the third is loading, the fourth is modular addition and the last is uncomputation of the first and third.We do not explain permutation: see [17] for the detail.We just make a comment that it is implemented by a simple circuit, for example, Xorshift is implemented as a sequence of CNOT.
We also need the gate to calculate Φ −1 SN , the inverse function of CDF of standard normal distribution.There are some ways to calculate this efficiently and we here adopt the method in [50].In the method, R is divided into some intervals and Φ −1 SN is approximated by a polynomial in each interval.We adopt the setting where the number of intervals is 109 6 and polynomials are cubic, which realizes the error smaller than 10 −6 .Here, we write the approximated inverse CDF as for n ICDF are the end points of the intervals and n ICDF is the number of the interval.Consider that x ICDF −1 = −∞, x ICDF n ICDF +1 = +∞.Such a piecewise cubic function can be implemented as Figure 6.We here explain how it works.First, the sequence of comparators and "Load c m,i s" gates load c m,0 , ..., c m,3 into the register R c 0 , ..., R c,3 respectively as follows.The comparators compare the value x of R PRN and the grid points x ICDF m and flip R g if x < x ICDF m .If R g is 1, the "Load c m,i s" gates are activated.They are actually collections of bitwise flips, that is, X gates.If x ≥ x ICDF n ICDF , only "Load c n ICDF +1,i s" gate is performed and it loads c n ICDF +1,0 , ..., c n ICDF +1,3 .If x ICDF n ICDF −1 ≤ x < x ICDF n ICDF , "Load c n ICDF ,i s" and "Load c n ICDF +1,i s" are performed.So, we set "Load c n ICDF ,i s" so that it compensates flips done by "Load c n ICDF +1,i s" and c n ICDF ,0 , ..., c n ICDF ,3 are successfully loaded.The case where x is in another interval is similar.The point is that if x ICDF M−1 ≤ x < x ICDF M , every other gates are activated.That is, the activated gates are "Load c m,i s" of m = M, M + 2, ..., n ICDF , n ICDF + 1 if n ICDF − M is even and m = M, M + 2, ..., n ICDF − 1, n ICDF + 1 if n ICDF − M is odd.This is because every comparator after the M-th one flips R g and R g takes 0 and 1 alternatingly.Considering this, the X gates in "Load c m,i s" are set as Figure 7, so that c m,0 , ..., c m,3 for appropriate m are loaded after the sequence of all activated gates.After load of coefficients, the cubic function is calculated in the Horner's method This is done by the sequence of adders and multipliers in the latter half of the circuit in Figure 6.

(iii) Payoff
In this paper, we do not consider gates to calculate payoffs in detail, since the resource the gates require is same in both 6 if we include (−∞, x ICDF 0 ) and [x ICDF n ICDF , ∞, ), it is 111 the PRN-on-a-register way and the register-per-RN way.We here make just a short comment.In many cases, a payoff can be expressed in the following form: where a i , b i , c i , f i are real constants, that is, a linear function of the asset price with the upper bound (cap) c i and the lower bound (floor) f i .For example, a payoff in an European call option (1) corresponds to Payoffs expressed as ( 26) can be calculated easily by combination of comparators, adders and multipliers.
C. the register-per-RN way

calculation flow
Also for the register-per-RN way, we start from presenting the calculation flow, which is somewhat simpler than the PRN-on-a-register way.Again, we consider the flow until calculation of the payoff sum.
Before we present the calculation flow, we explain the required registers.
This is a register for the i-th SNRN.We need such a register per random number, so the total number is n t .
• R S i , i = 0, 1, ..., n t This is a register where the value of the asset price at time t i is held.
• R payoff,i , i = 1, ..., n t This is a register where the value of the sum of payoffs by t i is held.
We again omit ancillary registers here and explain them later.Besides, we again assume that these and ancillary registers necessary to update them have n dig qubits.
We here concretely define a superposition of SNRN values as the following state.In advance, we set the equally spaced N SN + 1 points for discretization of the distribution x SN,0 < x SN,1 < ... < x SN,N SN , where x SN,0 and x SN,N SN are the upper and lower bounds of the distribution and set to, say, -4 and +4, respectively.We here assume N SN = 2 n dig for simplicity.Then, we define |SN as where p SN,i = x SN,i+1 x SN,i φ SN (x)dx and φ SN (x) is the probability density function of the standard normal distribution.We consider how to create such a state later.Since x SN,i can be easily calculated from the index i by a linear function, we identify i as x SN,i .
Then, the calculation flow is as follows: 1. Initialize all registers to |0 except R S 0 , which is initialized to |S t 0 .

Generate superpositions of SNRNs on
That is, set each of them to |SN .
3. Perform the time evolution (10) using the value on R W 1 as w 1 .The result is output to R S 1 as |S t 1 .
4. Calculate the payoff at time t 1 using R S 1 and output the sum of it and the previous payoffs to R payoff,i .
5. Iterate operations similar to 3-4 for each time step until the time reaches t n t .
6. Finally we obtain a superposition of states in which the value on R payoff,n t is the sum of payoffs for each pattern of values of SNRNs.Estimate the expectation value of R payoff,n t to get (11).
The flow of state transformation is as follows.Writing only R W 1 , ..., R W nt , R S 0 , R S 1 , ..., R S nt and R payoff,1 , ..., R payoff,n t , where is the value of the asset price at time t j evolved by w 1 = x SN,i 1 , ..., w j = x SN,i j .

overview of the circuit
The outline of the circuit in the register-per-RN is as shown in Figure 8. First, |SN is created on each R W j by the gate SN, which is considered in detail later.After that, the gate U j performs j-th step of asset price evolution and payoff calculation.For each evolution step, ancillary registers R flg, j and R LV, j , which have 1 and 2n dig qubits respectively, are necessary.U j is then implemented as Figure 9.In this gate, the sequence of comparators and "Load" gates set a j,k , b j,k in (8) into R LV, j by the trick similar to that in the circuit in Figure 6.Then, the operation x ← x + (ax + b)y updates the asset price according to (10).Since the register-per-RN way does not aim to uncompute ancillary qubits in order to reduce them, x ← x + (ax + b)y can be done as follows: where the first ket is R S j−1 , the second is R W j , the third and fourth are R LV, j , the fifth is an ancillary register and the last is R S j .So, this operation consists of copying a state and three multiplications.At the end of U j , the payoff is calculated using R S j .Here, the "Payoff j " gate performs the following operation ) , (30) where the first, second and third ket are R S j , R payoff, j−1 and R payoff, j .This is done as copying R payoff, j−1 to R payoff, j followed by calculation and addition of f ) to R payoff, j .

implementation of the SN gate
Let us now consider implementation of the SN gate, which creates a superposition of SNRN values.The outline was presented in [13].In addition to this, we here explain some details, which were not explicitely explained in [13].
We construct the state in the recursive way.We assume that we have already divided [x SN,0 , x SN,N SN ] by equally-spaced where p (m) SN,i = x (m) SN,i+1 x (m) SN,i φ SN (x)dx.We also assume that we have a gate to efficiently compute θ (m) Then, the following state transformation is possible: where we use the gate to compute θ (m) i at the first arrow and perform the controlled rotation at the second arrow.Repeating this until m = n dig − 1, we get |SN .
However, as far as the authors know, neither [13] nor other papers present the gate to compute θ (m)  i .Here, we propose a way to do this on the basis of a simple Taylor expansion.Consider We can show that by simple calculation.So, g(x, δ) is well-approximated by a linear function of x for small δ.We can use this to compute f (m) i , since Given x SN,N SN , x SN,0 and m large enough that ∆/2 m is sufficiently small, this can be approximately seen as a linear function of i, since x (m) SN,i = x SN,0 + ∆ 2 m i.Actually, we numerically confirmed that for m ≥ 7 the above approximation gives error smaller than 10 −5 .
We then reach the circuit in Figure 10 for calculation of f (m)  i .For m ≤ 6, as shown in Figure 10a, the sequence of comparators and "Load" gates set f (m) i to the output register R f (m) i , using the trick similar to the circuit in Figure 6 again.Note that comparators now check whether the input value i is equal to a given integer constant or not, so each of them is actually a combination of bitwise flips and a multi-controlled Toffoli gate, similar to that appeared in Figure 4. Also note that, if the input i is I, "Load f (m)  i " gates are activated for i ≥ I. Considering this point, each "Load" gate is set so that operations of it and the following gates are successfully compensated.For m ≥ 7, the f (m) i is just a linear transformation, which is implemented as bitwise flips followed by a constant multiplier.Of course, according to required accuracy, the value of m where the two types of f (m) i are switched should be adjusted and the degree of the Taylor approximation for g(x, δ) should be increased.
Using this f (m) i gate, the SN gate is constructed as Figure 11.First, we operate a Hadamard gate to the most significant bit in R W j to assign probability 1/2 to positive and negative halves of [x SN,0 , x SN,N SN ] respectively.Then, we operate the sequence of the gates U SN m , which corresponds to the m-th step of the above recursive calculation.U SN m is constructed as a combination of the f (m) i gate, the gates to calculate square root and arc cosine and the controlled rotation gate R(θ).
We here comment on implementation of arccos and square root.The implementation of the inverse trigonometric function based on the piecewise polynomial approximation is proposed in [51].Although [51] considers not arccos but arcsin, these can be easily related as arccos(x) = π 2 − arcsin(x).We adopt a setting with the polynomial degree 3 and 2 intervals, which leads to accuracy 10 −5 [51].The circuit for square root is given in [52].

V. ESTIMATION OF REQUIRED RESOURCES
Then, let us estimate the machine resources required for the implementation in the PRN-on-a-register way and the registerper-RN way.We consider the two metrics, qubit number and T-count, as many papers do.

A. elementary gates
We first summarize the resources of the elementary gates which are necessary to construct the LV circuit.We here consider fixed-point arithmetic.Among various implementations proposed previously, we adopt one for each of the elementary gates and summarize their resources when operands are n-bit in Table I.Since we aim to estimate the orders of the above metrics, we take only the leading term with respect to n for required resources.For example, we approximate an + b as an..
We make a comment on multiplication and the division.For these operation, we use modified versions of circuits proposed in [42] and [46].This is because of the following reason.In order to store the strict value of the product of two n-bit numbers, original circuits use 2n-bits.This causes a problem in the situation where we have to sequentially perform multiplication as in the LV circuit, since the required qubit number doubles at every multiplication.Therefore, we have to truncate lower bits of product and keep the digit number constant.In order to do this, we modify the multiplier in [42].We also construct the divider, which is dedicated to drawback of the modified multiplication.This is why the qubit number for divider in Table I is different from that in [42,46].We explain the details of the modified multiplier and divider in Appendix.

B. assumptions on registers
We assume the following points on qubit numbers of various registers, some of which have already been mentioned.
• Registers which store numerical numbers, R W , R S , R payoff , R LV, j etc., and ancillary registers concerning them have n dig qubits.This is determined according to the required accuracy.We hereafter set n dig = 16.
• R PRN in Figure 5 exceptionally has n PRN qubits, since this is set to so large a value that the PRN sequence has good statistical property, e.g.long period.[49] considers the PRN generators with 64 bits and we also use this value for n PRN hereafter.Ancillary registers necessary for calculation of PRN sequence have n PRN qubits too.Even if n PRN > n dig , we use only n dig qubits in R PRN for transformation to PSNRN.
• R samp has n samp qubits.
• Other registers, e.g.R count , have only several qubits and we neglect their contributions to the whole qubit number.
C. the PRN-on-a-register way Then, let us consider the required resources in the PRN-ona-register way.

qubit number
In Table II, we summarize qubits necessary in each step in the circuit.Registers which hold some values throughout the circuit are as follows: R samp , R S , R payoff ans R PRN .Except these, the following parts in the circuit can consume qubit number most heavily.
• J PRN and P PRN : 2n PRN qubits • Φ −1 SN : 7n dig qubits Therefore, the total qubit number required in the PRN-on-aregister way is roughly Let us comment on some technical points for obtaining Table II.We first make a supplementary explanation on the ancillary qubit number in V ( j)  k .There are two parts which require a Since we can construct the modular adder using 5 plain adders [26], we use 5 times the values of the adder for T-count.b The circuit given in [52] takes a n-bit input and returns the n/2-bit square root and the n/2-bit remainder.In order to keep n-bit accuracy, we add n 0's to the input and calculate the n-bit square root of the virtual 2n-bit input.We treat the n bits added to the input and the n bits where the remainder is output as ancillas.c Since the value shown in [51] is Toffoli count, we simply multiply it by 7 for converting it to T-count.
n dig ancilla 4n dig For x ← x + (ax + b)y and z ← z+x−by 1+ay ; see the comment in the body text.P PRN ancilla 2n PRN To hold intermediate outputs; see (22).
ancillas in V ( j)  k .First, x ← x + (ax + b)y needs the following ancillas: a n dig -bit register to which 1 + ay is output, a n dig -bit register to which the result is temporally output in the selfupdate multiplication and a 2n dig -bit register necessary for the inverse division to clear the input x.Second, z ← z+x−by 1+ay needs the following: a n dig -bit register to which 1 + ay is output and a 2n dig -bit register necessary for division.In total, 4n dig bits are sufficient 7 .
We also comment on the ancilla number in Φ −1 SN .As we can see from Figure 6, we need four registers to which coefficients are loaded and two registers for intermediate outputs.Therefore, 6n dig ancillas are necessary 8   7 Strictly speaking, comparisons between R S or R S and s j,k 's require loading s j,k 's into some register.This does not require another register, since at least one of ancillary registers used in x ← x + (ax + b)y and z ← z+x−by 1+ay is empty at loading. 8Although we also need a register to which x ICDF m 's are loaded at compar-

T-count
Since we are interested in only the leading contribution, we focus on multiplications, divisions and repeated additions.Besides, we do not consider the T-count of J W , which is used only once.For the parts in U j , which is used repeatedly, we specify T-counts as follows: k includes the following parts: As we can see in (18) As we can see in (19), this includes one division and 2n dig additions, which comes from two multiplications by constant.In total, the T-count is 63n 2  dig .• Uncomputation of z ← z+x−by 1+ay Similar to the above.
Therefore, the total T-count in one V ( j)  k is 245n 2 dig .Since V ( j)  k is used n S + 1 times, the total T-count in them is 245n 2  dig n S (only the leading term).

D. the register-per-RN way
Next, we consider the required resources in the register-per-RN way.

qubit number
In the register-per-RN way, registers shown in Table III are added per time step.Note that we do not uncompute ancillas.Summing up all registers, the qubit number necessary for one time step is roughly 3n 2 dig + 111n dig , and for the entire circuit it is Note that the dominant part comes from the iterative calculation in the SN gates, which prepare superpositions of the values of the SNRNs.

T-count
Again, we focus on operations with large T-count.For each part in the circuit, we estimate the T-count as follows:

E. comparison between two ways
We then compare resources necessary in two ways in Table IV.Naturally, qubit number is independent from n t in the PRN-on-a-register way but proportional to n t in the registerper-RN way and T-count is proportional to n t in the both ways.If we take a setting, which is typically necessary for practical use in derivative pricing 10 : the values in the PRN-on-a-register way by a factor 2 roughly.We here comment on the parts which consume T-count most heavily in each way.In the PRN-on-a-register way, there are two parts which contribute to T-count equally and dominantly.The first is the update of the asset price in V ( j) k .Note that additional operations for reduction of qubits, such as inverse division in self-update multiplication and drawing back the asset price to clear R g , increase T-count compared with the register-per-RN way.The second is modular multiplications in update of the PRN sequence.Since the PRN generator requires the large bit number, say n PRN = 64, in order to it keep good statistical properties such as long period, the T-count of operations for the PRN becomes large.On the other hand, in the registerper-RN way the dominant contribution to T-count comes from arccos's in preparing SNRNs.Because not only an arccos itself is T-count consuming but also it is used in each iteration in the SN gate, the total T-count piles up.

VI. SUMMARY
In this paper, we considered how to implement time evolution of the asset price in the LV model on quantum computers.Similar to other problems in finance, derivative pricing by Monte Carlo simulation requires a large number of random numbers, which is proportional to n t , the number of time steps for asset price evolution, and this may cause difficulty in implementation.We then considered two ways of implementation: the PRN-on-a-register way and the register-per-RN way.In the former we sequentially generate pseudo random numbers on a register and use them to evolve the asset price.In the latter, standard normal random numbers necessary to time evolution are created as superpositions on separate registers.For both ways, we present the concrete quantum circuits in detail.For not only random number generation but also other aspects, we try to save qubit numbers permitting some additional procedures in the PRN-on-a-register way and do opposite in the register-per-RN way.We then give estimations of qubit number and T-count required in each way.In the PRNon-a-register way, qubit number is kept constant against n t .On the other hand, in the register-per-RN way qubit number is proportional to n t .Each way has T-count consuming parts and the total T-counts for both ways are of same order of magnitude, expect the PRN-on-a-register way has the larger T-count by a factor about 2, in some specific setting.
Note that analyses of resources required for implementation of the LV model in this paper strongly depend on designs of elementary circuits for arithmetic.For example, in the register-per-RN way the dominant contribution to T-count comes from arccos's in preparing SNRNs.If more efficient circuits are proposed and we replace the current choice with them, required resources may change.
Finally, we would like to notice that this study is not enough for application of quantum algorithm for Monte Carlo simulation to pricing in the LV model.Although we assumed that the LV function is given, in practice we have to calibrate the LV so that the model prices of European options fit to the market prices.Besides, we have not considered how to evaluate terms in exotic derivatives, for example, early exercise.In future works, we will consider such things and aim to present how to apply quantum computers in the whole process of exotic derivative pricing.we use 5n.This is larger than the value in [46] by 2n, reflect-ing the addition of dummy qubits and the register to which the dividend is copied 11 . .k , which updates R S if the asset price is in the k-th grid of the LV function.Here and hereafter, the wire going over a gate means that the corresponding register is not used in the operation of the gate.A formula at the center of a gate represents the operation the gate performs and '-1' means the inverse operation.A formula beside a wire and in a gate means the input or the output of the gate.

Figure 1 :
Figure 1: The overview of the circuit for asset price evolution in the LV model in the PRN-on-a-register way.Here and hereafter, ancillary qubits are sometimes omitted for simple display.

Figure 2 :
Figure2: The overview of the U j , which performs the j-th step of asset price evolution, in the PRN-on-a-register way.

Figure 3 :
Figure 3: V ( j)k , which updates R S if the asset price is in the k-th grid of the LV function.Here and hereafter, the wire going over a gate means that the corresponding register is not used in the operation of the gate.A formula at the center of a gate represents the operation the gate performs and '-1' means the inverse operation.A formula beside a wire and in a gate means the input or the output of the gate.
J W , the gate to let the PSNRN sequence jump to the in t + 1 element.P W , the gate to progress the PSNRN sequence by a step.

RPRNyFigure 6 :
Figure 6: The gate to calculate the inverse CDF of standard normal distribution by piecewise polynomial approximation.

Load cm, 1 Load cm, 2 Load cm, 3 ( 2 ,
a) The detail of the "Load c m,i 's" gate.nLoad cm,i = . . .Load cm,i,0 Load cm,i,n−1 (b)The detail of the "Load c m,i " gate.Load c m,i,k = X ; if c m,i,k ⊕ c m+2,i,k ⊕ • • • ⊕ c m+2 n ICDF −m i,k ⊕ c n ICDF +1,i,k = 1 I ; otherwise (c)The detail of the "Load c m,i,k " gate.

Figure 7 :
Figure 7: Parts of the circuit in Figure 6.Here, c m,i,k means the k-th digit of c m,i .

Figure 8 :
Figure 8: The overview of the circuit for asset price evolution in the LV model in the register-per-RN way.

1 yFigure 9 :
Figure9: U j , which performs the j-th step of asset price evolution, in the register-per-RN way.

Figure
Figure Implementation of the SN gate.

Table I :
Resources for various elementary gates.We here assume that operands are n-bit.We omit subleading terms with respect to n.

Table II :
Qubits necessary in each step in the PRN-on-a-register circuit.We neglect registers with only several qubits.
, this includes one multiplication and one division, which come from one isons between them and R PRN , we can use R W or intermediate output registers, which are empty at comparisons.
This includes 2n S additions (n S comparisons) and three multiplications.So one U j gates has T-count of 63n2dig + 28n S n dig roughly.

Table III :
Table IV becomes as TableV.The total T-count is of same order of magnitude in the both way but larger for Qubits added in each time step in the register-per-RN circuit.We neglect registers with only several qubits.We only take the leading contributions.

Table IV :
Qubit number and T-count required in the PRN-on-a-register and register-per-RN ways.