A coherent perceptron for all-optical learning
EPJ Quantum Technology volume 2, Article number: 10 (2015)
Abstract
We present nonlinear photonic circuit models for constructing programmable linear transformations and use these to realize a coherent perceptron, i.e., an all-optical linear classifier capable of learning the classification boundary iteratively from training data through a coherent feedback rule. Through extensive semiclassical stochastic simulations we demonstrate that the device nearly attains the theoretical error bound for a model classification problem.
Introduction
Recent progress in integrated nanophotonic engineering [1–10] has motivated follow-up proposals [11, 12] of nanophotonic circuits for all-optical information processing. While most of these focus on implementations of digital logic, we present here an approach to all-optical analog, neuromorphic computation and propose design schemes for a set of devices to be used as building blocks for large-scale circuits.
Optical computation has been a long-standing goal [13, 14], with research interest surging regularly after new engineering capabilities are attained [15, 16], but so far the parallel progress and momentum of CMOS-based integrated electronics has outperformed all-optical devices.
In recent years we have seen rapid progress in the domain of machine learning, and artificial intelligence in general. Although most current ‘big data’ applications are realized on digital computing architectures, there is now an increasing amount of computation done in specialized hardware such as GPUs. Specialized analog computational devices for solving specific subproblems more efficiently than is possible with either GPUs or general-purpose computers are being considered or already implemented by companies such as IBM, Google and HP, as well as in academia [17–20]. Specifically in the field of neuromorphic computation, there has been impressive progress on CMOS-based analog computation platforms [21, 22].
Several neuromorphic approaches using complex nonlinear optical systems for machine learning applications have recently been proposed [23–26] and some initial schemes have been implemented [9, 27]. So far, however, all of these ‘optical reservoir computers’ have still required digital computers to prepare the inputs and process the outputs, with the optical systems only being employed as static nonlinear mappings for dimensional lifting to a high-dimensional feature space [28], in which one then applies straightforward linear regression or classification for learning an input-output map [29].
In this work, we address how the final stage of such a system, i.e., the linear classifier, could be realized all-optically. We provide a universal scheme, i.e., one independent of which particular kind of optical nonlinearity is employed, for constructing tunable all-optical, phase-sensitive amplifiers, and then outline how these can be combined with self-oscillating systems to realize an optical amplifier with programmable gain, i.e., where the gain can be set once and is then fixed subsequently.
Using these as building blocks we construct an all-optical perceptron [30, 31], a system that can classify multidimensional input data and, using pre-classified training data, learn the correct classification boundary ‘online’, i.e., incrementally. The perceptron can be seen as a highly simplified model of a neuron. While the idea of all-optical neural networks has been proposed before [32], and an impressive scheme using electronic, measurement-based feedback for spiking optical signals has been realized [33], to our knowledge we offer the first complete description of how the synaptic weights can be stored in an optical memory and programmed via feedback.
The physical models underlying the employed circuit components are high intrinsic-Q optical resonators with strong optical nonlinearities. For theoretical simplicity we assume resonators with either a \(\chi^{(2)}\) or a \(\chi^{(3)}\) nonlinearity, but the design can be adapted to depend on only one of these two, or on alternative nonlinearities such as those based on free-carrier effects or optomechanical interactions.
The strength of the optical nonlinearity and the achievable Q-factors of the optical resonators determine the overall power scale and rate at which a real physical device could operate. Both a stronger nonlinearity and a higher Q allow operating at lower overall power.
We present numerical simulations of the system dynamics based on the semiclassical Wigner approximation to the full coherent quantum dynamics presented in [10]. For photon numbers as low as ∼10–20 this approximation allows us to accurately model the effect of optical quantum shot noise even in large-scale circuits.
In the limit of both very high Q and very strong nonlinearity, we expect quantum effects to become significant as entanglement can arise between the field modes of physically separated resonators. In the Appendix, we provide full quantum models for all basic components of our circuit. The possibility of a quantum speedup is being addressed in ongoing work. Recently, D-Wave Systems has generated a lot of interest with their superconducting-qubit-based quantum annealer. Although the exact benefits of quantum dynamics in their machines have not been conclusively established [34], recent results analyzing the role of tunneling in a quantum annealer [35] are intriguing and suggest that quantum effects can be harnessed in computational devices that are not unitary quantum computers.
The perceptron algorithm
The perceptron is a machine learning algorithm that maps an input \(x\in \mathbb {R}^{n}\) to a single binary class label \(\hat{y}_{w}[x]\in\{0, 1\}\). Binary classifiers generally operate by dividing the input space into two disjoint sets and identifying these with the class labels. The perceptron is a linear classifier, meaning that the surface separating the two class label sets is a linear space, a hyperplane, and its output is computed simply by applying a step function \(\theta (u):={1}_{u \ge0}\) to the inner product of a single data point x with a fixed weight vector w:
\(\hat{y}_{w}[x] = \theta \bigl(w^{T} x \bigr)\).
Geometrically, the weight vector w parametrizes the hyperplane \(\{ z\in \mathbb {R}^{n}: w^{T} z=0\}\) that forms the decision boundary.
In the above parametrization the decision boundary always contains the origin \(z=0\), but the more general case of an affine decision boundary \(\{\tilde{z}\in \mathbb {R}^{n}: \tilde{w}^{T} \tilde{z} = b\}\) can be obtained by extending the input vector by a constant \(z = (\tilde {z}^{T}, 1)^{T}\in \mathbb {R}^{n+1}\) and similarly defining an extended weight vector \(w=(\tilde{w}^{T}, b)^{T}\).
The perceptron converges in a finite number of steps for all linearly separable problems [30] by randomly iterating over a set of preclassified training data \(\{ (y^{(j)},x^{(j)}) \in\{0, 1\} \otimes \mathbb {R}^{n}, j=1, 2,\dots, M\}\) and imparting a small weight correction \(w\to w + \Delta w\) for each falsely classified training example \(x^{(j)}\):
\(\Delta w = \tilde{\alpha} \bigl(y^{(j)} - \hat{y}_{w}\bigl[x^{(j)}\bigr] \bigr) x^{(j)}\).
The learning rate \(\tilde{\alpha}>0\) determines the magnitude of the correction applied for each training example. The expression in parentheses can only take on the values \(\{0, +1, -1\}\), with zero corresponding to a correctly classified example and the nonzero values corresponding to the two different possible classification errors.
Usually there exist many separating hyperplanes for a given linear binary classification problem. The standard perceptron is only guaranteed to find one that works for the training set. It is possible to introduce a notion of optimality by considering the minimal distance (‘margin’) of the training data to the found separating hyperplane. Maximizing this margin naturally leads to the ‘support vector machine’ (SVM) algorithm [36]. Although the SVM outperforms the perceptron in many classification tasks, it does not lend itself to a hardware implementation as readily because it cannot be trained incrementally. It is this incremental training that makes the perceptron algorithm especially suited for a hardware implementation: we can convert the discrete update rule (2) to a differential equation
\(\dot{w}(t) = \alpha \bigl(y(t) - \hat{y}_{w}\bigl[x(t)\bigr] \bigr) x(t)\)
and then construct a physical system that realizes these dynamics. In this continuous-time version the inputs are piecewise constant, \(x(t) = x^{(j_{t})}\), \(y(t) = y^{(j_{t})}\), and take on the same discrete values as above, indexed by \(j_{t} := \lceil\frac{t}{\Delta t} \rceil \in\{1,2,\dots, M = \frac{T}{\Delta t}\}\).
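The discrete update rule can be sketched in a few lines of code. The following toy Python implementation is our own illustration (unrelated to the paper's simulation code; all names are ours); it uses the affine extension \(z = (\tilde{z}^{T}, 1)^{T}\) described above to absorb the bias into the weight vector:

```python
import numpy as np

def perceptron_train(X, y, alpha=0.1, epochs=50, rng=None):
    """Iterate the perceptron rule w -> w + alpha * (y - theta(w.x)) * x.

    The affine decision boundary is handled by appending a constant 1
    entry to every input, as described in the text.
    """
    rng = rng or np.random.default_rng(0)
    Xe = np.hstack([X, np.ones((X.shape[0], 1))])  # z = (x, 1): absorbs the offset b
    w = np.zeros(Xe.shape[1])
    for _ in range(epochs):
        for j in rng.permutation(len(Xe)):
            y_hat = 1.0 if Xe[j] @ w >= 0 else 0.0  # theta(u) = 1_{u >= 0}
            w += alpha * (y[j] - y_hat) * Xe[j]     # correction is zero when correct
    return w

# Linearly separable toy problem (logical AND of two binary inputs)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])
w = perceptron_train(X, y)
preds = (np.hstack([X, np.ones((4, 1))]) @ w >= 0).astype(float)
```

By the perceptron convergence theorem, the loop terminates with a separating weight vector for any linearly separable training set, which is what makes the continuous-time hardware version below well behaved.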
The circuit modeling framework
Circuits are fully described via the Quantum Hardware Description Language (QHDL) [37] based on Gough and James’ SLH framework [38, 39]. To carry out numerical simulations for large-scale networks, we derive a system of semiclassical Langevin equations based on the Wigner transformation as described in [10]. Note that there is a perfect one-to-one correspondence between nonlinear cavity models expressed via SLH and the Wigner method as long as the nonlinearities involve only oscillator degrees of freedom. There is ongoing research in our group to establish similar results for more general nonlinearities [40].
Both the Wigner method and the more general SLH framework can be used to model networks of quantum systems where the interconnections are realized through bosonic quantum fields. The SLH framework describes a system interacting with n independent input fields in terms of a unitary scattering matrix S parametrizing direct field scattering, a coupling vector \(L=(L_{1}, L_{2}, \dots, L_{n})^{T}\) parametrizing how external fields couple into the system and how the system variables couple to the output, and a Hamilton operator H inducing the internal dynamics. We summarize these objects in a triplet \((S, L, H)\). L and H are sufficient to parametrize any Schrödinger picture simulation of the quantum dynamics; e.g., the master equation for a mixed system state ρ is given by
\(\dot{\rho} = -i[H, \rho] + \sum_{j=1}^{n} \bigl(L_{j} \rho L_{j}^{\dagger} - \frac{1}{2} \bigl\{L_{j}^{\dagger} L_{j}, \rho \bigr\} \bigr)\).
The scattering matrix S is important when composing components into a network. In particular, the input-output relation in the SLH framework is given by
\(dA_{\mathrm{out}} = S \, dA_{\mathrm{in}} + L \, dt\),
where the \(dA_{\mathrm{in}/\mathrm{out},j}\), \(j=1,2,\dots, n\) are to be understood as quantum stochastic processes whose differentials can be manipulated via a quantum Ito calculus [38]. The Wigner method provides a simplified, approximate description which is valid when all nonlinear resonator modes are in strongly displaced states [10]. The simulations presented here were carried out exclusively at energy scales for which the Wigner method is valid, allowing us to scale to much larger system sizes than we could in a full SLH-based quantum simulation. This is because the computational complexity of the Wigner method scales at most quadratically (and in sparsely interconnected systems nearly linearly) with the number of components, as opposed to the exponential state-space scaling of a quantum mechanical Hilbert space. We nonetheless provide our models in both Wigner-method form and SLH form in anticipation that our component models will also be extremely useful in the full quantum regime.
In the Wignerbased formalism, a system is described in terms of timedependent complex coherent amplitudes \(\alpha (t)=(\alpha _{1}(t), \alpha _{2}(t),\dots, \alpha _{m}(t))^{T}\) for the internal cavity modes and external inputs \(\beta _{\mathrm {in}}(t) = (\beta _{\mathrm {in},{1}}(t), \beta _{\mathrm {in},{2}}(t), \dots, \beta _{\mathrm {in},{n}}(t))^{T}\). These amplitudes relate to quantum mechanical expectations as \(\langle \alpha _{j} \rangle\approx\langle a_{j}\rangle_{\mathrm{QM}}\), where \(\langle\cdot\rangle\) denotes the expectation with respect to the Wigner quasi distribution and \(\langle\cdot\rangle_{\mathrm{QM}}\) a quantum mechanical expectation value. See [10] for the corresponding relations of higher order moments.
To simplify the analysis, we exclusively work in a rotating frame with respect to all driving fields. As in the SLH case, we define output modes \(\beta _{\mathrm {out}}(t)\) that are algebraically related to the inputs and the internal modes. The full dynamics of the internal and external modes are then governed by a multidimensional Langevin equation
\(\dot{\alpha }(t) = A \alpha (t) + a + A_{\mathrm{NL}}\bigl(\alpha (t), t\bigr) + B \beta _{\mathrm {in}}(t)\)
as well as a purely algebraic, linear input-output relationship
\(\beta _{\mathrm {out}}(t) = C \alpha (t) + c + D \beta _{\mathrm {in}}(t)\).
The complex matrices A, B, C, D as well as the constant bias input vectors a and c parametrize the linear dynamics, whereas the function \(A_{\mathrm{NL}}(\alpha ,t)\) gives the nonlinear contribution to the dynamics of the internal cavity modes.
Each input consists of a coherent, deterministic part and a stochastic contribution \(\beta _{\mathrm {in},{j}}(t)=\bar {\beta }_{\mathrm {in},j}(t) + \eta_{j}(t)\). The stochastic terms \(\eta_{j}(t) = \eta_{j,1}(t) + i \eta_{j, 2}(t)\) are assumed to be independent complex Gaussian white noise processes with correlation function \(\langle\eta_{j,s}(t)\eta_{k,r}(t')\rangle = \frac{1}{4}\delta_{jk}\delta_{sr}\delta(t-t')\).
The linearity of the input-output relationships (5) and (7) in the external degrees of freedom leads to algebraic rules for deriving reduced models for whole circuits of nonlinear optical resonators by concatenating component models and algebraically solving for their interconnections [10, 39]. The basic component models used in this work are given in the Appendix. Netlists for composite components and the whole circuit will be made available at [41].
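A Langevin model of this form can be integrated with a simple Euler-Maruyama scheme. The sketch below is our own illustration, not the paper's QHDLJ code; the toy coefficients model a single linear, resonantly driven empty cavity with the standard input-output sign convention \(\dot{\alpha} = -\frac{\kappa}{2}\alpha - \sqrt{\kappa}\,\beta_{\mathrm{in}}\), \(\beta_{\mathrm{out}} = \sqrt{\kappa}\,\alpha + \beta_{\mathrm{in}}\), which is an assumption of this example rather than a model taken from the Appendix:

```python
import numpy as np

def simulate_wigner(A, a, B, C, c, D, A_NL, beta_det, alpha0,
                    dt=1e-3, steps=5000, rng=None):
    """Euler-Maruyama integration of the semiclassical Wigner model
        d alpha/dt  = A alpha + a + A_NL(alpha, t) + B beta_in(t),
        beta_out(t) = C alpha + c + D beta_in(t),
    where each input carries complex Gaussian white noise with variance
    1/4 per quadrature, i.e. a Wiener increment of std sqrt(dt)/2.
    """
    rng = rng or np.random.default_rng(0)
    m = B.shape[1]
    alpha = np.array(alpha0, dtype=complex)
    outs = np.empty((steps, C.shape[0]), dtype=complex)
    for k in range(steps):
        t = k * dt
        # white-noise amplitude: Normal(0, 1/2) per quadrature, scaled by 1/sqrt(dt)
        eta = (rng.normal(0.0, 0.5, m) + 1j * rng.normal(0.0, 0.5, m)) / np.sqrt(dt)
        beta_in = beta_det(t) + eta
        outs[k] = C @ alpha + c + D @ beta_in
        alpha = alpha + (A @ alpha + a + A_NL(alpha, t) + B @ beta_in) * dt
    return alpha, outs

# Toy check: linear empty cavity, linewidth kappa, resonant drive beta = 3;
# the steady-state amplitude is alpha_ss = -2 beta / sqrt(kappa).
kappa = 2.0
A = np.array([[-kappa / 2]]); B = np.array([[-np.sqrt(kappa)]])
C = np.array([[np.sqrt(kappa)]]); D = np.array([[1.0]])
a = np.zeros(1); c = np.zeros(1)
alpha, outs = simulate_wigner(A, a, B, C, c, D,
                              A_NL=lambda al, t: 0.0 * al,
                              beta_det=lambda t: np.array([3.0 + 0j]),
                              alpha0=[0.0 + 0j])
```

After integrating for several cavity lifetimes, `alpha` fluctuates around the deterministic steady state with the residual vacuum noise prescribed by the Wigner correlation function.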
The coherent perceptron circuit
The full perceptron’s circuit is visualized in Figure 1. The input data x to the perceptron circuit is encoded in the real quadrature of N coherent optical inputs. Equation (3) informs us what circuit elements are required for a hardware implementation by decomposing the necessary operations:

1. Each input \(x_{j}\) is multiplied by a weight \(w_{j}\).
2. The weighted inputs are coherently added.
3. The sum drives a thresholding element to generate the estimated class label \(\hat{y}\).
4. In the training phase (input \(T=1\)), the estimated class label \(\hat{y}\) is compared with the true class label (input Y) and, based on the outcome, feedback is applied to modify the weights \(\{w_{j}\}\).
The most crucial element of this circuit is the system that multiplies an input \(x_{j}\) by a programmable weight \(w_{j}\). This not only requires a linear amplifier with tunable gain, but also a way to encode and store the continuous weights \(w_{j}\). In the following we outline one way such systems can be constructed from basic nonlinear optical cavity models: Section 2.1 presents an elegant way to construct a phase-sensitive linear optical amplifier whose gain can be tuned by changing the amplitude of a bias input. In Section 2.2 we propose using an above-threshold nondegenerate optical parametric amplifier to store a continuous variable in the output phase of the signal (or idler) mode. In Section 2.3 these systems are combined to realize an optical amplifier with programmable gain, i.e., a control input can program its gain, which then stays constant even after the control has been turned off. Finally, in Section 2.4 we present a simple model for all-optical switches based on a cavity with two modes that interact via a cross-Kerr effect. This element is used both for the feedback logic and for the thresholding function that generates the class label \(\hat{y}\).
Tunable-gain Kerr amplifier
A single-mode Kerr-nonlinear resonator driven by an appropriately detuned coherent drive ϵ can have a strongly nonlinear dependence of the intracavity energy on the drive power. When the drive of a single resonator is given by the sum of a constant large bias amplitude and a small signal \(\epsilon=\frac{1}{\sqrt{2}} (\epsilon_{0} +\delta\epsilon)\), the steady-state reflected amplitude is \(\epsilon'=\frac{1}{\sqrt{2}} (\eta\epsilon_{0} + g_{-}(\epsilon _{0}) \delta\epsilon+ g_{+}(\epsilon_{0}) \delta\epsilon^{\ast}) +O(\delta\epsilon^{2})\), where \(\eta\le1\) with equality in the ideal case of negligible intrinsic cavity losses. The small signal thus experiences phase-sensitive gain dependent on the bias amplitude and phase. We provide analytic expressions for the gain in Appendix A.2.1.
Placing two identical resonators in the arms of an interferometer allows for isolating the signal and bias outputs even if their amplitudes vary, by canceling the scattered bias in one output and the scattered signal in the other (cf. Figure 2). This highly symmetric construction, which generalizes to any other optical nonlinearity, ensures that the signal output is linear in δϵ up to third order.^{Footnote 1} If the system parameters are well-chosen, the amplifier gain depends very strongly on small variations of the bias amplitude. This allows tuning the gain from close to unity to its maximum value, which, for a given waveguide coupling κ and Kerr coefficient χ, depends on the drive detuning from the cavity resonance. For Kerr-nonlinear resonators there exists a critical detuning beyond which the system becomes bistable and exhibits hysteresis. This can be used for thresholding-type behavior, though, as shown in [42], in this case it may be advantageous to reduce the symmetry of the circuit. It is convenient to engineer the relative propagation phases such that at maximum gain, a real-quadrature input signal \(x\in \mathbb {R}\) leads to an amplified output signal \(x' = g_{rr}^{\mathrm{max}}x\) with no imaginary quadrature component (other than noise and higher order contributions). However, for different bias input amplitudes, and consequently lower gain values, the output will generally feature a linear imaginary quadrature component \(x' = [g_{rr}(\epsilon_{0}) + i g_{ir}(\epsilon_{0}) ]x\) as well. Figure 2(b) demonstrates this for a particular choice of maximal gain. We note that there exist previous proposals of using nonlinear resonator pairs inside interferometers to achieve desirable input-output behavior [42], but to our knowledge, no one has proposed using these for signal/bias isolation and tunable gain. To first order the linearized Kerr model is actually identical to a sub-threshold degenerate OPO model.
This implies that it can be used to generate squeezed light and also that one could replace the Kerr model by an OPO model.
An almost identical circuit, but featuring resonators with additional internal loss equal to the waveguide coupling^{Footnote 2} and constantly biased to dynamic resonance \(\vert\alpha\vert^{2}_{\mathrm{ss}} = \Delta/\chi\), can be used to realize a quadrature filter, i.e., an element that has unity gain for the real quadrature and zero for the imaginary one. The quadrature-filtered signal still has an imaginary component, but to linear order this only consists of transmitted noise from the additional internal loss. While it would be possible to add one of these downstream of every tunable Kerr amplifier, in our specific application it is more efficient to add just a single one downstream of where the individual amplifier outputs are summed (cf. Section 2.5). This also reduces the total amount of additional noise introduced into the system.
Encoding and storing the gain
In the preceding section we have seen how to realize a tunable gain amplifier, but for programming and storing this gain (or equivalently its bias amplitude) an additional component is needed. Although it is straightforward to design a multistable system capable of outputting a discrete set of different output powers to be used as the amplifier bias, such schemes would likely require multiple nonlinear resonators and it would be more cumbersome to drive transitions between the output states.
An alternative to such schemes is given by systems that have a continuous set of stable states. Recent analysis of continuous time recurrent neural network models trained for complex temporal information processing tasks has revealed multidimensional stable attractors in the internal network dynamics that are used to store information over time [43].
A simple semiclassical nonlinear resonator model exhibiting this is the nondegenerate optical parametric oscillator (NOPO) pumped above threshold; for low pump input powers this system allows for parametric amplification of a weak coherent signal (or idler) input. In this case vacuum inputs for the signal and idler lead to outputs with zero expected photon number. Above a critical threshold pump power, however, the system down-converts pump photons into pairs of signal and idler photons.
Due to an internal \(U(1)\) symmetry of the underlying Hamiltonian (cf. Appendix A.2.3), the signal and idler modes spontaneously select phases that are dependent on each other but independent of the pump phase. This implies that there exists a whole manifold of fixed points related to each other via the symmetry transformation \((\alpha_{s}, \alpha_{i})\to(\alpha_{s} e^{i\phi}, \alpha _{i} e^{i\phi})\), where \(\alpha_{s}\) and \(\alpha_{i}\) are the rotating frame signal and idler mode amplitudes, respectively. Consequently the signal output of an above-threshold NOPO lives on a circular manifold (cf. Figure 3).
Vacuum shot noise on the inputs leads to phase diffusion with a rate of \(\gamma_{\Phi}= \frac{\kappa}{8n_{0}}\), where κ is the signal and idler linewidth and \(n_{0}\) is the steady-state intracavity photon number in either mode. We point out that this diffusion rate does not directly depend on the strength of the nonlinearity, which only determines how strongly the system must be pumped to achieve a given intracavity photon number \(n_{0}\).
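The effect of this diffusion on the stored phase can be illustrated with a random-walk sketch. The values of κ and \(n_{0}\) below are example numbers of our choosing, and the convention \(\langle\Delta\Phi^{2}(t)\rangle = 2\gamma_{\Phi} t\) is a modeling assumption of this illustration (the paper only fixes the rate formula):

```python
import numpy as np

# Random-walk sketch of NOPO memory phase diffusion.
kappa = 1.0   # signal/idler linewidth (time measured in units of 1/kappa)
n0 = 100.0    # assumed steady-state intracavity photon number
gamma_phi = kappa / (8.0 * n0)

# 1D diffusion convention: <Delta Phi(t)^2> = 2 * gamma_phi * t.
rng = np.random.default_rng(1)
dt, steps, ntraj = 0.05, 2000, 2000
dphi = rng.normal(0.0, np.sqrt(2.0 * gamma_phi * dt), size=(ntraj, steps))
phi = dphi.cumsum(axis=1)           # phase trajectories, Phi(0) = 0
T = steps * dt
var_T = phi[:, -1].var()            # empirical spread of the stored phase at time T
```

The empirical variance grows linearly in time at the rate set by \(\gamma_{\Phi}\); a larger intracavity photon number \(n_{0}\) therefore directly lengthens the useful retention time of the weight memory.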
A weak external signal input breaks the symmetry and biases the signal output phase towards the external signal’s phase. This allows for changing the programmed phase value.
Finally, we note that parametric oscillators can also be realized in materials with vanishing \(\chi^{(2)}\) nonlinearity. They have been successfully realized via four-wave mixing (i.e., exploiting a \(\chi^{(3)}\) nonlinearity) in [1, 2, 44] and even in optomechanical systems [8], in which case the idler mode is given by a mechanical degree of freedom.
In principle any nonlinear optical system that has a stable limit cycle could be used to store and encode a continuous value in its oscillation phase. Nondegenerate parametric oscillators stand out because of their theoretical simplicity allowing for a ‘static’ analysis inside a rotating frame.
Programmable gain amplifier
Combining the circuits described in the preceding sections allows us to construct a fully programmable phase-sensitive amplifier. In Figure 2(b) we see that there exists a particular bias amplitude at which the real-to-real quadrature gain vanishes, \(g_{rr}(\epsilon_{0}^{\mathrm{min}}) = 0\). We combine the NOPO signal output \(\xi=r e^{i\Phi}\) with a constant phase bias input \(\xi_{0}\) (cf. Figure 3(a)) on a beamsplitter such that the outputs vary between the zero-gain and maximal-gain bias values \(\vert \frac{\xi_{0} \pm r e^{i\Phi}}{\sqrt{2}}\vert \in[\epsilon _{0}^{\mathrm{min}}, \epsilon_{0}^{\mathrm{max}}]\). To realize both positive and negative gain, we use the second output of that beamsplitter to bias another tunable amplifier. The two amplifiers are always biased oppositely, meaning that one has maximal gain when the other's gain vanishes and vice versa. The overall input signal is split, sent through both amplifiers, and then recombined with a relative π phase shift. This complementary setup leads to an overall effective gain tunable within \(G_{rr}(\Phi) \in[-\frac{g_{rr}^{\mathrm{max}}}{2}, \frac{g_{rr}^{\mathrm{max}}}{2}]\) (cf. Figure 3(b)).
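The complementary biasing of the two amplifiers follows from elementary beamsplitter algebra. The short sketch below (our own illustration; the values of \(\xi_{0}\) and r are toy numbers, not parameters from the paper) shows that the two beamsplitter outputs always sit at opposite ends of the bias range and that their total power is conserved:

```python
import numpy as np

# Mix the NOPO output r*exp(i*Phi) with the phase reference xi0 on a
# 50/50 beamsplitter; the two output amplitudes bias the two amplifiers.
xi0, r = 1.0, 0.4                       # toy values of our choosing
Phi = np.linspace(-np.pi, np.pi, 361)   # programmable NOPO output phase
bias_a = np.abs(xi0 + r * np.exp(1j * Phi)) / np.sqrt(2)
bias_b = np.abs(xi0 - r * np.exp(1j * Phi)) / np.sqrt(2)
# Complementarity: at Phi = 0 one bias is maximal, (xi0 + r)/sqrt(2),
# while the other is minimal, (xi0 - r)/sqrt(2); and for every Phi the
# beamsplitter conserves power: bias_a**2 + bias_b**2 == xi0**2 + r**2.
```

Mapping these two bias amplitudes through the Kerr amplifier's gain curve \(g_{rr}(\epsilon_{0})\) then yields the signed effective gain \(G_{rr}(\Phi)\) described in the text.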
In Figure 4 we present both the complementary pair of amplifiers and the NOPO used for storing the bias as well as some logic elements (described in Section 2.4) used for implementing conditional training feedback. We call the full circuit a synapse because it features programmable gain and implements the perceptron’s conditional weight update rule.
The resulting synapse model is quite complex and certainly not optimized for a minimal component count but rather for ease of theoretical analysis. A more resource-efficient programmable amplifier could easily be implemented using just two or three nonlinear resonators. For example, inspecting the real-to-imaginary quadrature gain \(g_{ir}(\epsilon_{0})\) in Figure 2(b), we see that close to \(\epsilon _{0}^{\mathrm{max}}\) it passes through zero fairly linearly and with an almost symmetric range. This indicates that we could use a single tunable amplifier to realize both positive and negative gain. Using only a single resonator for the tunable amplifier could work as well, but it would require careful interferometric bias cancellation and more tedious upfront analysis. We do not think it is feasible to use just a single resonator for both the parametric oscillator and the amplifier because any amplified input signal would have an undesirable back-action on the oscillator phase.
Optical switches
The feedback to the perceptron weights (cf. Equation (3)) is conditional on the binary values of the given and estimated class labels y and \(\hat{y}\), respectively. The logic necessary for implementing this can be realized by means of alloptical switches. There have been various proposals and demonstrations [7, 45] of alloptical gates/switches and quantum optical switches [46].
The model that we assume here (cf. Figure 5) uses two different modes of a resonator that interact via a cross-Kerr effect, i.e., power in the control mode leads to a refractive index shift (or detuning) for the signal mode. The index shift translates to a control-mode-dependent phase shift of a scattered signal field, yielding a controlled optical phase modulator. Wrapping this phase modulator in a Mach-Zehnder interferometer then realizes a controlled switch: if the control mode input is in one of two different states \(\xi \in\{0, \xi_{0}\}\), the signal inputs are either passed through or switched. This operation is often referred to as a controlled swap or Fredkin gate [47], which was originally proposed for realizing reversible computation. This dispersive model has the advantage that the control input signal can be reused.
Note that at control input amplitudes significantly different from the two control levels the outputs are coherent mixtures of the inputs, i.e., the switch then realizes a tunable beamsplitter.
Finally, we point out that using two different (frequency non-degenerate) resonator modes has the advantage that the interaction between control and signal inputs is phase-insensitive, which greatly simplifies the design and analysis of cascaded networks of such switches.
Generation of the estimated label
The estimated classifier label \(\hat{y}\) should be a step function applied to the inner product of the weight vector and the input. In the preceding sections we have shown how individual inputs \(x_{j}\) can be amplified with programmable gain to give \(\tilde{s}_{j} = \tilde {G}(\Phi_{j})x_{j}\), thus realizing the individual contributions to the inner product. These are then summed on an n-port beamsplitter that has an output giving the uniformly weighted sum \(\tilde{s} := \frac{1}{\sqrt {N}}\sum_{k=1}^{N} \tilde{G}(\Phi_{k})x_{k}\).
The gain factors \(\tilde{G}(\Phi_{k}) = G_{rr}(\Phi_{k}) + i G_{ir}(\Phi _{k})\) generally have an unwanted imaginary part, which we subtract by passing the summed output through a quadrature filter circuit (cf. the last paragraph of Section 2.1) that has unit gain for the real quadrature and zero gain for the imaginary quadrature, leading to an overall output \(s = \operatorname{Re} \tilde{s} = \frac{1}{\sqrt{N}}\sum_{k=1}^{N} G_{rr}(\Phi_{k})x_{k}\). The thresholding circuit should now produce a high output if \(s>0\) and a zero output if \(s \le0\).
It turns out that the optical Fredkin gate described in the previous section already works almost as a two-mode thresholder, where the control input leads to a step-like response in the signal outputs: a constant signal input amplitude encoding the logical ‘1’ state is applied to one of the signal inputs. When the control input amplitude is varied from zero to \(\xi_{0}\), the signal output turns on fairly abruptly at some threshold \(\xi_{\mathrm{th}} < \xi_{0}\). To make the thresholding phase-sensitive, the control input is given by the sum of s and a constant offset \(s_{0}\) that provides a phase reference: \(c = \frac{1}{\sqrt{2}}(s + s_{0})\).
For a Fredkin gate operated with continuous control inputs, the signal output is almost zero for a considerable range of small control inputs. However, for very high control inputs, i.e., significantly above \(\xi _{0}\), the signal output decreases instead of staying constant, as would be desirable for a step-function-like profile. We found that this issue can be addressed by transmitting the control input through a single-mode Kerr-nonlinear cavity, with resonance frequency chosen such that the transmission gain \(c'/c\) is peaked close to \(c'=\xi_{0}\). For larger input amplitudes the transmission gain is lower (although \(c'\) still grows monotonically with \(c\)), which extends the input range over which the subsequent Fredkin gate stays in the on-state.
Results
The perceptron’s SDEs were simulated using a newly developed custom software package named QHDLJ [48], implemented in Julia [49], which allows for dynamic compilation of circuit models to LLVM [50] bytecode that runs at speeds comparable to C/C++. All individual simulations can be carried out on a laptop, but the results in Figure 8 were obtained by averaging over the results of 100 stochastic simulations run on an HP ProLiant server with 80 cores. The current version of QHDLJ uses one process per trajectory, but the code could easily be vectorized.
In Figure 6 we present an example of a single application of an \(N=8\) perceptron including both a learning stage with pre-labeled training data and a classification testing stage in which the perceptron’s estimated class labels are compared with their correct values. The data to be classified here are sampled from a different 8-dimensional Gaussian distribution for each class label, with their mean vectors separated by a distance \(\Vert \mu_{1} - \mu_{0} \Vert_{2} / \sigma= 2\) relative to the standard deviation of the individual clusters. For each sample the input was held constant for a duration \(\Delta t = 2 \kappa^{-1}\), where κ is the NOPO signal and idler linewidth. The perceptron was first trained with \(M_{\mathrm{train}}=100\) training examples and subsequently tested on \(M_{\mathrm{test}}=100\) test examples with the learning feedback turned off.
In Figure 7 we visualize linear projections of the testing data as well as the estimated classification boundaries. We can see that the classifier performs very well far away from the decision boundary; close to the decision boundary there are some misclassified examples. We proceed to compare the performance of the classifier with the theoretically optimal performance achievable by any classifier and with the optimal classifier for this scenario, Gaussian Discriminant Analysis (GDA) [51, 52], implemented in software. Using the identical perceptron model as above and an identical training/testing procedure, we estimate the error rate \(p_{\mathrm{err}} = \mathbb{P}[y\ne\hat{y}]\) of the trained perceptron as a function of the cluster separation \(\Vert \mu_{1} - \mu_{0} \Vert_{2} / \sigma \). The results are presented in Figure 8(a). Identically distributed training and testing data were used to evaluate the performance of the GDA algorithm, and both results are compared to the theoretically optimal error rate for this discrimination task, which can be computed analytically to be \(p_{\mathrm{err,\ optim.}} = \frac {1}{2} \operatorname{erfc} (\frac{\Vert \mu_{1} - \mu_{0} \Vert_{2}}{\sqrt{8}\sigma } )\), where \(\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty}e^{-u^{2}} \,du\) is the complementary error function. We see that the all-optical perceptron’s performance is comparable to GDA’s for this problem, and both algorithms attain performance close to the theoretical optimum.
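The analytic optimum above is easy to evaluate numerically; the short helper below (our own, using only the standard library) makes the comparison point concrete:

```python
import math

def optimal_error_rate(separation):
    """Bayes-optimal error rate for two equally likely isotropic Gaussian
    clusters with mean separation ||mu1 - mu0||_2 / sigma = separation:
        p_err = (1/2) * erfc(separation / sqrt(8)).
    """
    return 0.5 * math.erfc(separation / math.sqrt(8.0))

# At the separation of 2 used in the example run, even a perfect classifier
# errs on roughly 16% of samples, because the clusters overlap substantially.
p2 = optimal_error_rate(2.0)
```

A separation of 0 gives the chance level of 0.5, and the error rate decays rapidly with increasing separation, matching the shape of the curves in Figure 8(a).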
The learning rate of the perceptron is determined by two factors: the overall strength of the learning feedback and the time for which each example is presented to the circuit. In Figure 8(b) we plot the estimated error rate for varying feedback strength and duration. As can be expected intuitively, we find that there are trade-offs between speed (smaller Δt preferable) and energy consumption (smaller α preferable).
Time scales and power budget
Here we roughly estimate the power consumption of the whole device and discuss how to scale it up to a higher input dimension.
Any real-world implementation will depend strongly on the engineering paradigm, i.e., the choice of material/nonlinearity as well as the engineering precision, but based on recently achieved progress in nonlinear optics we can estimate an order of magnitude range for the input power.
The signal and feedback input power to the circuit will scale linearly in the number of synapses N.
The bias inputs for the amplifiers have to be larger than the signal to ensure linear operation, but it should be expected that some of the scattered bias amplitude can be reused to power multiple synapses.
In our models we have defined all rates relative to the line width of the signal and idler mode of the NOPO, because this is the component that should necessarily have the smallest decay rate to ensure a long lifetime for the memory.
All other resonators are employed as nonlinear input-output transformation devices, and therefore a high bandwidth (corresponding to a much lower loaded quality factor) is necessary for achieving a high bit rate. For our simulations we typically assumed quality factors that were lower than the NOPO’s by 1-2 orders of magnitude. Based on the self-oscillation threshold powers reported in [1, 2, 4, 53] and the switching powers of [7], we estimate the necessary power per synapse to be in the range of ∼10-100 μW. By reusing the scattered pump and bias fields it should be possible to reduce the power consumption per amplifier even further. Even for the continuous wave signal paradigm we have assumed (as opposed to pulsed/spiking signals such as considered in [25]), the devices proposed here could be competitive with current state-of-the-art CMOS-based neuromorphic electrical circuits [22].
In the simulations for the 8-dimensional perceptron our input rate for training data was set to \(\Delta t^{-1} = \frac{\kappa}{2}\). This value corresponds to roughly ten times the average feedback delay between the arrival of an input pattern and the conditional switching of the feedback logic upon arrival of the generated estimated state label \(\hat{y}\). This delay can be estimated as \(\tau_{\mathrm{fb}}(n) \approx G_{\mathrm{max}}\kappa_{A}^{-1} + \kappa_{QF}^{-1} + \kappa_{\mathrm{thresh}}^{-1} + n \kappa_{F}^{-1}\), where n is the index of the synaptic weight, \(G_{\mathrm{max}}\) is the amplifier gain range and \(\kappa_{A}\), \(\kappa_{QF}\), \(\kappa_{\mathrm{thresh}}\) and \(\kappa_{F}\) are the line widths of the amplifier, the quadrature filter, the combined thresholding circuit (cf. Figure 5) and the feedback Fredkin gates, respectively. There is a contribution scaling with n because the feedback traverses the individual weights sequentially to save power.
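As a sketch, the delay estimate can be written out directly; the parameter names mirror the rates above, and the numeric values in the usage line are arbitrary placeholders:

```python
def tau_fb(n, G_max, kappa_A, kappa_QF, kappa_thresh, kappa_F):
    """Estimated feedback delay for synapse index n:
    tau_fb(n) ~ G_max/kappa_A + 1/kappa_QF + 1/kappa_thresh + n/kappa_F."""
    return G_max / kappa_A + 1.0 / kappa_QF + 1.0 / kappa_thresh + n / kappa_F

# The n/kappa_F term grows linearly because the feedback traverses the
# synaptic weights sequentially.
delay = tau_fb(7, G_max=10.0, kappa_A=100.0, kappa_QF=50.0,
               kappa_thresh=50.0, kappa_F=25.0)
```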
When scaling up the perceptron to a higher dimension while retaining approximately the same input signal powers, it is intuitively clear that the combined ‘inner product’ signal amplitude s scales as \(s\propto\sqrt{N}s_{1}\), where \(s_{1}\) is the signal amplitude for a single input. This allows us to similarly scale up the amplitude \(\zeta_{0}\) of the signal encoding the generated estimated state label \(\hat{y}\) and consequently the bandwidth of the feedback Fredkin gates that it drives. A detailed analysis reveals that the Fredkin gate threshold scales as \(\sqrt{N}\); in particular we find that \(\sqrt{\chi}\zeta_{0} \propto \kappa_{F} \propto \sqrt{\chi}\xi_{0} \propto \kappa_{\mathrm{thresh}} \propto \sqrt{\chi} s \propto \sqrt{N\chi}s_{1}\). The first two scaling relationships are due to the constraints on the Fredkin gate construction (cf. Appendix A.2.2), the next two follow from demanding that the additional thresholding resonator be approximately dynamically resonant at the highest input level (cf. Appendices A.2.1 and A.2.2). The last proportionality is simply due to the amplitude summation at the N-port beamsplitter.
This reveals that when increasing N the perceptron as constructed here would have to be driven at a lower input bit rate scaling as \(\Delta t^{-1} \propto N^{-\frac{1}{2}}\), or alternatively be driven with higher signal input powers. A possible way to greatly reduce the difference in arrival time \(\sim\kappa_{F}^{-1}\) at each synapse would be to increase the waveguide coupling to the control signal and thus decrease the delay per synapse. The resulting increase in the required control amplitude \(\zeta_{0}\) can be counteracted with feedback, i.e., by effectively creating a large cavity around the control loop. When even this strategy fails, one could add fan-out stages for \(\hat{y}\), which introduce a delay that grows only logarithmically with N.
Finally, we note that the bias power of all the Kerreffect based models considered here scales inversely with the respective nonlinear coefficient \(\{\zeta_{0}^{2}, s^{2}\} \times\chi \sim\mathrm{const}\) when keeping the bandwidth fixed. This implies that improvements in the nonlinear coefficient translate to lower power requirements or alternatively a faster speed of operation.
Conclusion and outlook
In conclusion, we have shown how to design an all-optical device that is capable of supervised learning from input data, by describing how tunable gain amplifiers with signal/bias isolation can be constructed from nonlinear resonators and subsequently combined with self-oscillating resonators that encode the programmed amplifier gain in their oscillation phase. By considering a few additional nonlinear devices for thresholding and all-optical switching, we then showed how to construct a perceptron, including the perceptron feedback rule. To our knowledge this is the first end-to-end description of an all-optical circuit capable of learning from data. We have furthermore demonstrated that despite optical shot noise it nearly attains the performance of the optimal software algorithm for the classification task that we considered. Finally, we have discussed the relevant time scales and pointed out how to scale the circuit up to large input dimensions while retaining the signal processing bandwidth and a low power consumption per input.
Possible applications of an all-optical perceptron include serving as the trainable output filter of an optical reservoir computer or as a building block in a multilayer all-optical neural network.
The programmable amplifier could be used as a building block to construct other learning models that rely on continuously tunable gain such as Boltzmann machines and hardware implementations of message passing algorithms.
An interesting next step would be to design a perceptron that can handle inputs at different carrier frequencies. In this case wavelength division multiplexing (WDM) might allow a significant reduction of the physical footprint of the device.
A simple modification of the perceptron circuit could autonomously learn to invert linear transformations that were applied to its input signals. This could be used to implement a circuit capable of solving linear regression problems. In combination with a multimode optical fiber, such a device could also have applications in all-optical sensing.
Finally, an extremely interesting question is whether harnessing quantum dynamics could lead to a performance increase. We hope to address these ideas in future work.
Notes
1. One can easily convince oneself that all even order contributions are scattered into the bias output.
2. In the photonics community this is referred to as critically coupled, whereas the amplifier circuit would ideally be strongly overcoupled such that additional internal losses are negligible.
3. In this appendix we denote expectations with respect to the Wigner function as \(\langle\cdot\rangle_{\mathrm{W}}\) and quantum mechanical expectations as \(\langle\cdot\rangle\).
4. It is possible to drop this resonance assumption for the pump.
References
1. Kippenberg T, Spillane S, Vahala K. Kerr-nonlinearity optical parametric oscillation in an ultrahigh-Q toroid microcavity. Phys Rev Lett. 2004;93(8):083904.
2. Del’Haye P, Schliesser A, Arcizet O, Wilken T, Holzwarth R, Kippenberg TJ. Optical frequency comb generation from a monolithic microresonator. Nature. 2007;450(7173):1214-7.
3. Levy M. Nanomagnetic route to bias-magnet-free, on-chip Faraday rotators. J Opt Soc Am B. 2005;22(1):254-60.
4. Razzari L, Duchesne D, Ferrera M, Morandotti R, Chu S, Little BE, Moss DJ. CMOS-compatible integrated optical hyper-parametric oscillator. Nat Photonics. 2009;4(1):41-5.
5. Englund D, Faraon A, Fushman I, Stoltz N, Petroff P, Vucković J. Controlling cavity reflectivity with a single quantum dot. Nature. 2007;450(7171):857-61.
6. Fushman I, Englund D, Faraon A, Stoltz N, Petroff P, Vuckovic J. Controlled phase shifts with a single quantum dot. Science. 2008;320(5877):769-72.
7. Nozaki K, Tanabe T, Shinya A, Matsuo S. Sub-femtojoule all-optical switching using a photonic-crystal nanocavity. Nat Photonics. 2010;4:477-83.
8. Cohen JD, Meenehan SM, MacCabe GS, Gröblacher S, Safavi-Naeini AH, Marsili F, Shaw MD, Painter O. Phonon counting and intensity interferometry of a nanomechanical resonator. p. 1-10. arXiv preprint, arXiv:1410.1047 (2014).
9. Vandoorne K, Mechet P, Van Vaerenbergh T, Fiers M, Morthier G, Verstraeten D, Schrauwen B, Dambre J, Bienstman P. Experimental demonstration of reservoir computing on a silicon photonics chip. Nat Commun. 2014;5:1-6.
10. Santori C, Pelc JS, Beausoleil RG, Tezak N, Hamerly R, Mabuchi H. Quantum noise in large-scale coherent nonlinear photonic circuits. Phys Rev Appl. 2014;1:054005.
11. Mabuchi H. Nonlinear interferometry approach to photonic sequential logic. Appl Phys Lett. 2011;99(15):153103.
12. Pavlichin DS, Mabuchi H. Photonic circuits for iterative decoding of a class of low-density parity-check codes. New J Phys. 2013;16:105017. doi:10.1088/1367-2630/16/10/105017.
13. Abraham E, Smith SD. Optical bistability and related devices. Rep Prog Phys. 1982;45:815.
14. Smith SD. Optical bistability: towards the optical computer. Nature. 1984;307(5949):315-6.
15. Miller DAB. Physical reasons for optical interconnection. Int J Optoelectron. 1997;11(3):155-68.
16. Miller DAB. Are optical transistors the logical next step? Nat Photonics. 2010;4(1):3-5.
17. Ananthanarayanan R, Esser SK, Simon HD, Modha DS. The cat is out of the bag. In: Proceedings of the conference on high performance computing networking, storage and analysis - SC ’09; 2009.
18. Neven H. Hardware initiative at quantum artificial intelligence lab. 2014. http://googleresearch.blogspot.com/2014/09/ucsbpartnerswithgoogleonhardware.html.
19. Strukov DB, Snider GS, Stewart DR, Williams RS. The missing memristor found. Nature. 2008;453(7191):80-3.
20. Wang Z, Marandi A, Wen K, Byer RL, Yamamoto Y. Coherent Ising machine based on degenerate optical parametric oscillators. Phys Rev A. 2013;88(6):063853.
21. Choudhary S, Sloan S, Fok S, Neckar A, Trautmann E, Gao P, Stewart T, Eliasmith C, Boahen K. Silicon neurons that compute. In: Artificial neural networks and machine learning - ICANN 2012. Lecture notes in computer science, vol. 7552. Berlin: Springer; 2012. p. 121-8.
22. Cassidy AS, Alvarez-Icaza R, Akopyan F, Sawada J, Arthur JV, Merolla PA, Datta P, Tallada MG, Taba B, Andreopoulos A, Amir A, Esser SK, Kusnitz J, Appuswamy R, Haymes C, Brezzo B, Moussalli R, Bellofatto R, Baks C, Mastro M, Schleupen K, Cox CE, Inoue K, Millman S, Imam N, Mcquinn E, Nakamura YY, Vo I, Guok C, Nguyen D, Lekuch S, Asaad S, Friedman D, Jackson BL, Flickner MD, Risk WP, Manohar R, Modha DS. Real-time scalable cortical computing at 46 giga-synaptic OPS/Watt with \({\sim}100\times\) speedup in time-to-solution and \({\sim}100\mbox{,}000\times\) reduction in energy-to-solution. In: Proceedings of the international conference for high performance computing, networking, storage and analysis; 2014. p. 27-38.
23. Duport F, Schneider B, Smerieri A, Haelterman M, Massar S. All-optical reservoir computing. Opt Express. 2012;20(20):22783.
24. Vandoorne K, Dambre J, Verstraeten D, Schrauwen B, Bienstman P. Parallel reservoir computing using optical amplifiers. IEEE Trans Neural Netw. 2011;22(9):1469-81.
25. Van Vaerenbergh T, Fiers M, Mechet P, Spuesens T, Kumar R, Morthier G, Schrauwen B, Dambre J, Bienstman P. Cascadable excitability in microrings. Opt Express. 2012;20(18):20292.
26. Dejonckheere A, Duport F, Smerieri A, Fang L, Oudar JL, Haelterman M, Massar S. All-optical reservoir computer based on saturation of absorption. Opt Express. 2014;22(9):10868.
27. Larger L, Soriano MC, Brunner D, Appeltant L, Gutierrez JM, Pesquera L, Mirasso CR, Fischer I. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Opt Express. 2012;20(3):3241-9.
28. Cover TM. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput. 1965;EC-14(3):326-34.
29. Verstraeten D. Reservoir computing: computation with dynamical systems. PhD thesis; 2010.
30. Rosenblatt F. The perceptron - a perceiving and recognizing automaton. Report 85, Cornell Aeronautical Laboratory; 1957.
31. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386-408.
32. Miller DAB. Novel analog self-electrooptic-effect devices. IEEE J Quantum Electron. 1993;29:678-98.
33. Fok MP, Tian Y, Rosenbluth D, Prucnal PR. Pulse lead/lag timing detection for adaptive feedback and control based on optical spike-timing-dependent plasticity. Opt Lett. 2013;38(4):419-21.
34. Boixo S, Rønnow TF, Isakov SV, Wang Z, Wecker D, Lidar DA, Martinis JM, Troyer M. Evidence for quantum annealing with more than one hundred qubits. Nat Phys. 2014;10(3):218-24.
35. Boixo S, Smelyanskiy VN, Shabani A, Isakov SV, Dykman M, Denchev VS, Amin M, Smirnov A, Mohseni M, Neven H. Computational role of collective tunneling in a quantum annealer. arXiv:1411.4036v1 (2014).
36. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273-97.
37. Tezak N, Niederberger A, Pavlichin DS, Sarma G, Mabuchi H. Specification of photonic circuits using Quantum Hardware Description Language. Philos Trans R Soc A, Math Phys Eng Sci. 2012;370(1979):5270-90.
38. Gough J, James MR. The series product and its application to quantum feedforward and feedback networks. IEEE Trans Autom Control. 2009;54(11):2530-44.
39. Gough J, James MR. Quantum feedback networks: Hamiltonian formulation. Commun Math Phys. 2008;287(3):1109-32.
40. Hamerly R, Mabuchi H. Quantum noise of free-carrier dispersion in semiconductor optical cavities. arXiv:1504.04409 (2015).
41. Tezak N. Perceptron-files. https://github.com/ntezak/perceptronfiles (2014).
42. Tait AN, Shastri BJ, Fok MP, Nahmias MA, Prucnal PR. The DREAM: an integrated photonic thresholder. J Lightwave Technol. 2013;31(8):1263-72.
43. Sussillo D, Barak O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 2013;25(3):626-49.
44. Savchenkov A, Matsko A, Strekalov D, Mohageg M, Ilchenko V, Maleki L. Low threshold optical oscillations in a whispering gallery mode CaF_{2} resonator. Phys Rev Lett. 2004;93(24):243905.
45. Poustie AJ, Blow KJ. Demonstration of an all-optical Fredkin gate. Opt Commun. 2000;174(1-4):317-20.
46. Milburn G. Quantum optical Fredkin gate. Phys Rev Lett. 1989;62(18):2124-7.
47. Fredkin E, Toffoli T. Conservative logic. Int J Theor Phys. 1982;21(3/4):219-53.
48. Tezak N. QHDLJ. https://bitbucket.org/ntezak/qhdlj.jl (2014).
49. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: a fresh approach to numerical computing. arXiv:1411.1607 (2014).
50. Lattner C, Adve V. LLVM: a compilation framework for lifelong program analysis and transformation. In: International symposium on code generation and optimization, 2004. CGO 2004; 2004. p. 75-86.
51. Fisher RA. The use of multiple measurements in taxonomic problems. Ann Eugen. 1936;7(2):179-88.
52. McLachlan GJ. Discriminant analysis and statistical pattern recognition. Wiley series in probability and statistics. Hoboken: Wiley; 1992.
53. Levy JS, Gondarenko A, Foster MA, Turner-Foster AC, Gaeta AL, Lipson M. CMOS-compatible multiple-wavelength oscillator for on-chip optical interconnects. Nat Photonics. 2009;4(1):37-40.
54. Graham R, Haken H. The quantum-fluctuations of the optical parametric oscillator. I. Z Phys. 1968;210(3):276-302.
Acknowledgements
This work is supported by DARPAMTO under award no. N660011114106. NT acknowledges support from a Stanford Graduate Fellowship. We would also like to thank Ryan Hamerly, Jeff Hill, Peter McMahon and Amir SafaviNaeini for helpful discussion.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Appendix: Basic component models
Here we present the component models used to build the perceptron circuit. We first describe the static components such as beamsplitters, phase shifters and coherent displacements, then proceed to describe the different Kerr-nonlinear models and finally the NOPO model.
A.1 Static, linear circuit components
All of these components have in common that they have no internal dynamics, implying that the A, B and C matrices and the a-vector have zero elements, and \(A_{\mathrm{NL}}\) is not defined.
A.1.1 Constant laser source
The simplest possible static component is given by single input/output coherent displacement with coherent amplitude η. This model is employed to realize static coherent input amplitudes. The D matrix is trivially given by \(D=(1)\) and the coherent amplitude is encoded in \(c=(\eta)\). This leads to the desired inputoutput relationship \(\beta_{\mathrm{out}} = \eta+ \beta_{\mathrm{in}}\). For completeness we also provide the SLH [38] model \(((1), (\eta), 0)\).
A.1.2 Static phase shifter
The static single input/output phase shifter has \(D=(e^{i\phi})\) and \(c = (0)\), leading to an input-output relationship \(\beta_{\mathrm{out}} = e^{i\phi} \beta_{\mathrm{in}}\). Its SLH model is \(((e^{i\phi}), (0), 0)\).
A.1.3 Beamsplitter
The static beamsplitter mixes (at least) two input fields and can be parametrized by a mixing angle θ. It has \(D = \bigl({\scriptsize\begin{matrix} \cos\theta & \sin\theta\cr -\sin\theta & \cos\theta \end{matrix}}\bigr)\) and \(c = (0,0)^{T}\). This leads to the input-output relationship \((\beta_{1,\mathrm{out}}, \beta_{2,\mathrm{out}})^{T} = D (\beta_{1,\mathrm{in}}, \beta_{2,\mathrm{in}})^{T}\).
Its SLH model is \(\bigl( \bigl({\scriptsize\begin{matrix} \cos\theta & \sin\theta\cr -\sin\theta & \cos\theta \end{matrix}}\bigr), \bigl({\scriptsize\begin{matrix} 0\cr 0 \end{matrix}}\bigr), 0 \bigr)\).
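As a minimal numerical sketch (the helper name is ours; the sign convention, with the minus sign in the lower-left entry, is one common choice for an orthogonal mixing matrix):

```python
import numpy as np

def beamsplitter(theta):
    """Static beamsplitter scattering matrix D for mixing angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

# D is orthogonal, so the total flux |beta_1|^2 + |beta_2|^2 is conserved.
D = beamsplitter(np.pi / 4)          # 50/50 splitter
beta_out = D @ np.array([1.0, 0.0])  # -> [1/sqrt(2), -1/sqrt(2)]
```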
A.2 Resonator models
We consider resonator models with m internal modes and n external inputs and outputs. We assume for simplicity that \(a = \mathbf{0}\) and \(c = \mathbf{0}\), meaning that we will model all coherent displacements explicitly in the fashion described above. We also assume that their scattering matrices are trivially given by \(D = \mathbf{1}_{n}\), which means that far off-resonant input fields are simply reflected without a phase shift. Furthermore, none of our assumed models feature linear coupling between the internal cavity modes. This implies that the A-matrix is always diagonal. We are always working in a rotating frame.
A.2.1 Single mode Kerrnonlinear resonator
A Kerr nonlinearity is modeled by the nonlinear term \(A_{\mathrm{NL}}^{\mathrm{Kerr}}(\alpha) = -i \chi\vert\alpha\vert^{2} \alpha\), which can be understood as an intensity dependent detuning. The A-matrix is given by \((-\frac{\kappa_{T}}{2}-i\Delta)\), the B-matrix is \((-\sqrt{\kappa_{1}}, -\sqrt{\kappa_{2}}, \dots, -\sqrt{\kappa_{n}})\), where the total line width is given by \(\sum_{j=1}^{n}\kappa_{j} = \kappa_{T}\) and the cavity detuning from any external drive is given by Δ. The C-matrix is given by \(C=-B^{T}\). The corresponding SLH model is
where the detuning differs slightly, \(\tilde{\Delta} = \Delta + \chi\), as can be shown in the derivation of the Wigner formalism [10].
The special case of a single mirror with coupling rate κ and negligible internal losses is of interest for constructing the phase sensitive amplifier described in Section 2.1. Considering again an input given by a large static bias and a small signal \(\epsilon=\frac{1}{\sqrt{2}} (\epsilon_{0} +\delta\epsilon)\), the steady state reflected amplitude is to first order
For negligible internal losses we can provide exact expressions for η, \(g_{+}\) and \(g_{-}\). Rather than parametrizing these by the bias \(\epsilon_{0}\), we parametrize them by the mean coherent intracavity amplitude \(\alpha_{0}\). When the system is not bistable (see below), relationship (14) defines a one-to-one map between \(\epsilon_{0}\) and \(\alpha_{0}\).
The Kerr cavity exhibits bistability for a particular interval of bias amplitudes if and only if \(\Delta/\chi < 0\) and \(\vert\Delta\vert \ge \frac{\sqrt{3}\kappa}{2} = \Delta_{\mathrm{th}}\).
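The steady-state condition behind this criterion can be checked numerically; the following is a sketch with our own function name, assuming the semiclassical convention stated in the docstring (which matches the bistability condition above):

```python
import numpy as np

def kerr_intensity(eps, kappa, Delta, chi):
    """Steady-state intracavity intensities n = |alpha|^2 of a single-mode
    Kerr cavity driven through one mirror with amplitude eps. Assumes
    d(alpha)/dt = -(kappa/2 + i*(Delta + chi*n))*alpha - sqrt(kappa)*eps,
    whose modulus-squared steady state gives the cubic
    kappa*|eps|^2 = n*((kappa/2)**2 + (Delta + chi*n)**2)."""
    coeffs = [chi**2, 2.0 * Delta * chi,
              (kappa / 2.0)**2 + Delta**2, -kappa * abs(eps)**2]
    roots = np.roots(coeffs)
    real = roots[np.isclose(roots.imag, 0.0, atol=1e-6)].real
    return np.sort(real[real >= 0.0])  # 1 branch normally, 3 when bistable

# Monostable: Delta/chi > 0 gives a single branch for any drive.
n_mono = kerr_intensity(0.1, kappa=1.0, Delta=0.1, chi=0.01)
# Bistable: Delta/chi < 0 and |Delta| >= sqrt(3)*kappa/2, drive in window.
n_bi = kerr_intensity(3.0, kappa=1.0, Delta=2.0, chi=-0.1)
```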
At any fixed bias amplitude and corresponding internal steady state mode amplitude, the maximal gain experienced by a small signal is given by \(g^{\mathrm{max}} = g_{-}+g_{+}\). Here maximal means that we maximize over all possible signal input phases relative to the bias input. To experience this gain, the signal has to be in an appropriate quadrature defined by \(\arg{\delta\epsilon} = \frac{\arg g_{-} - \arg g_{+}}{2}\). The orthogonal quadrature is then maximally deamplified by a gain of \(\vert g_{-}-g_{+}\vert\), and it is possible to show that for negligible losses the perfect squeezing relationship \((g_{-}+g_{+})\vert g_{-}-g_{+}\vert = \vert g_{-}^{2} - g_{+}^{2}\vert = 1\) holds for any bias amplitude. Furthermore, for fixed cavity parameters \(g^{\mathrm{max}}\) is maximized at a particular nonzero intracavity photon amplitude
Note that the maximal gain does not depend on the strength of the nonlinearity. The relationship between \(g^{\mathrm{max}}\) and Δ can be inverted:
Using all this it is straightforward to construct a tunable Kerramplifier. The symmetric construction proposed in Section 2.1 provides the additional advantage that one does not have to cancel the scattered bias. It is also convenient to prepend and append phase shifters to the signal input and output that ensure \(g_{}, g_{+} \in \mathbb{R}_{> 0}\) at maximum gain.
The quadrature filter construction relies on the presence of additional cavity losses that are equal to the input coupler, \(\kappa_{2} = \kappa_{1} = \kappa\). In this case the gain coefficients for reflection off the first port are given by
and one may easily verify that at dynamic resonance, i.e., \(\chi\vert\alpha_{0}\vert^{2} = \Delta\), the gain coefficients are equal in magnitude, \(\vert g_{-}\vert = \vert g_{+}\vert\), which implies that there exists an input phase for which the reflected signal vanishes.
A.2.2 Two mode Kerrnonlinear resonator
We label the mode amplitudes as \(\alpha_{1}\) and \(\alpha_{2}\). In this case the nonlinearity includes a cross-mode induced detuning
The model matrices are
and the corresponding SLH model is
with \(\tilde{\Delta}_{a/b} = \Delta_{a/b} + \chi_{a/b} + \frac{\chi_{ab}}{2}\) and where the Wigner correspondence (Note 3) is \(\langle\alpha_{1}\rangle_{\mathrm{W}} = \langle a \rangle\), \(\langle\alpha_{2}\rangle_{\mathrm{W}} = \langle b \rangle\).
We briefly summarize how to construct a controlled phase shifter using an ideal two-mode Kerr cavity with a single input coupling to each mode and negligible additional internal losses. We exploit the fact that in this case the reflected steady state signal amplitude \(\zeta'\) is identical to the input amplitude ζ up to a power dependent phase shift
We assume that the control input amplitude takes on two discrete values \(\xi= 0\) or \(\xi= \xi_{0}\) and that variations of the signal input amplitude are small \(\zeta\approx\zeta_{0}\). In this case a good choice of detunings and coupling rates is given by
in addition to two inequality constraints
that ensure that the system is stable. This construction ensures that \(\frac{\zeta'\vert_{\xi=\xi_{0}}}{\zeta'\vert_{\xi=0}} = -1\), and in fact it can easily be generalized to the more realistic case of non-negligible internal losses.
Finally, note that the inequality constraints imply that the lower bounds for the input couplings scale as \(\kappa_{a}^{\mathrm{min}}, \kappa_{b}^{\mathrm{min}} \propto \zeta_{0}\), which is important for our power analysis in Section 3.1. This, in turn, implies that \(\xi_{0} \propto \zeta_{0}\), which is a fairly intuitive result.
The controlled phase shifter can now be included in one arm of a MachZehnder interferometer to create a Fredkin gate (cf. Section 2.4).
To realize a thresholder, the control mode input is prepended with a two-port Kerr cavity with parameters chosen such that it becomes dynamically resonant with maximal differential transmission gain close to where its output gives the correct high control input \(\xi_{0}\).
Overall, we remark that even when we account for the prepended cavity, the relationship \(c \propto \zeta_{0}\) still holds, where c is the input to the thresholder. To see how the total decay rate of the thresholding cavity \(\kappa_{\mathrm{thresh}}\) scales, consider first that to get maximum differential gain or contrast, we ought to pick a detuning right at or below the Kerr stability threshold, \(\Delta \approx \Delta_{\mathrm{th}} = \sqrt{3}\kappa_{\mathrm{thresh}}/2\).
We choose the maximum input amplitude such that it approximately achieves dynamic resonance within the prepended thresholding cavity. This occurs when \(\Delta = \chi\vert\alpha_{0}\vert^{2}\) (cf. Appendix A.2.1) and at an input amplitude of \(c \propto \sqrt{\kappa_{\mathrm{thresh}}\vert \frac{\Delta}{\chi} \vert} \propto \kappa_{\mathrm{thresh}}\).
A.2.3 NOPO model
The NOPO model consists of three modes: the signal and idler modes \(\alpha_{s}\), \(\alpha_{i}\) and the pump mode \(\alpha_{p}\). We assume a triply resonant model (Note 4) and that \(\omega_{s} + \omega_{i} = \omega_{p}\), allowing for resonant conversion of pump photons into pairs of signal and idler photons and vice versa. The nonlinearity is given by
and the model matrices are
Here, the SLH model is given by
where now a, b and c correspond to \(\alpha_{s}\), \(\alpha_{i}\) and \(\alpha_{p}\).
A steady state analysis of the system driven only by a pump input amplitude ϵ reveals that below a critical threshold, \(\epsilon < \epsilon_{\mathrm{th}} = \frac{\kappa\sqrt{\kappa_{p}}}{4 \chi}\), the system has a unique fixpoint with \(\alpha_{s}=\alpha_{i}=0\) and \(\alpha_{p} = \frac{2\epsilon}{\sqrt{\kappa_{p}}}\). Above threshold, \(\epsilon \ge \epsilon_{\mathrm{th}}\), the intracavity pump amplitude stays constant at the threshold value \(\alpha_{p} = \frac{2\epsilon_{\mathrm{th}}\,\epsilon/\vert\epsilon\vert}{\sqrt{\kappa_{p}}} = \frac{\kappa\,\epsilon/\vert\epsilon\vert}{2\chi}\) and the signal and idler modes obtain nonzero magnitude
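The threshold behavior can be summarized in a short sketch; the function name and sign conventions are ours, and the above-threshold signal intensity follows from the pump clamping at its threshold value under these assumptions:

```python
import numpy as np

def nopo_steady_state(eps, kappa, kappa_p, chi):
    """Semiclassical NOPO steady state for a real pump drive amplitude eps.
    Below threshold the signal/idler modes are empty; above it the
    intracavity pump clamps and |alpha_s|^2 = |alpha_i|^2 grows linearly
    in |eps| (conventions assumed, see lead-in)."""
    eps_th = kappa * np.sqrt(kappa_p) / (4.0 * chi)
    if abs(eps) < eps_th:
        n_s = 0.0                                   # no self-oscillation
        alpha_p = 2.0 * eps / np.sqrt(kappa_p)
    else:
        n_s = np.sqrt(kappa_p) * (abs(eps) - eps_th) / chi
        alpha_p = kappa * np.sign(eps) / (2.0 * chi)  # clamped at threshold
    return n_s, alpha_p
```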
As an interesting consequence of the model’s symmetry there exists not a single above threshold state but a whole manifold of fixpoints parametrized by a correlated signal and idler phase
where the common phase \(\phi_{0}\) is fixed by the pump input phase via
In particular, for \(\epsilon < 0\) we have \(\alpha_{i} = \alpha_{s}^{\ast}\). Above threshold the system will rapidly converge to a fixpoint of well-defined phase ϕ. Without quantum shot noise ϕ would remain constant. With noise, however, the system can freely diffuse along the manifold. When the pump bias input is sufficiently large compared to threshold, and consequently there are many signal and idler photons present in the cavity at any given time (\(\vert\alpha_{s/i}\vert^{2} \gg 1\)), one can analyze the dynamics along the manifold and of small orthogonal deviations from it. In the symmetric case considered here, where signal and idler have equal decay rates, the differential phase degree of freedom \(\phi = \frac{\arg\alpha_{i} - \arg\alpha_{s}}{2}\) decouples from all other variables and approximately obeys the SDE
It is relatively straightforward to generalize these results to a less symmetric model with different signal and idler couplings and even nonzero detunings, but for a given nonlinearity the model considered here provides the smallest phase diffusion and thus the best analog memory. For a very thorough analysis of this model we refer to [54].
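The phase SDE above predicts free diffusion of ϕ along the fixpoint manifold. A generic Euler-Maruyama sketch with a placeholder diffusion rate D_phi (our own parameter, standing in for the paper's coefficient) illustrates the linear variance growth that limits the memory lifetime:

```python
import numpy as np

def simulate_phase_diffusion(D_phi, dt, steps, trials, seed=0):
    """Euler-Maruyama integration of dphi = sqrt(2*D_phi) dW over many
    independent trials. D_phi is a placeholder diffusion rate."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(trials)
    for _ in range(steps):
        phi += np.sqrt(2.0 * D_phi * dt) * rng.standard_normal(trials)
    return phi

phi = simulate_phase_diffusion(D_phi=0.01, dt=0.01, steps=1000, trials=2000)
# Var[phi] grows as 2 * D_phi * T; here T = 10, so Var[phi] should be near 0.2.
```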
A.3 Composite component models
Given the scope of this article, we refrain from including the full netlists for the composite component models and instead publish them online at [41]. We remark that composing a photonic circuit from the nonlinear photonic models described above is often complicated by the fact that the steady state input-output relationships are hard or even impossible to invert analytically. A systematic approach to optimizing component parameters would be highly desirable.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Tezak, N., Mabuchi, H. A coherent perceptron for all-optical learning. EPJ Quantum Technol. 2, 10 (2015). doi:10.1140/epjqt/s40507-015-0023-3
Keywords
 optical information processing
 coherent feedback
 machine learning
 photonic circuits
 nonlinear optics
 perceptron