Mbit/s-range alkali vapour spin noise quantum random number generators

Spin noise based quantum random number generators first appeared in 2008 and have since garnered little further interest, in part because their bit rate is limited by the transverse relaxation time T_2 and, for coated alkali vapour cells, is typically in the kbit/s range. Here we present two advances. The first is an improved bit generation protocol that allows generating bits at rates exceeding 1/T_2 with only a minor increase of serial correlations. The second is a significant reduction of T_2 itself by removing the coating, increasing the vapour temperature and introducing a magnetic-field gradient. In this way we increased the bit generation rate to 1.04 Mbit/s. We analyse the quality of the generated random bits using entropy estimation and we discuss the extraction methods used to obtain high-entropy bitstreams. We accurately predict the entropy output of the device using a stochastic model backed by numerical simulations.


Introduction
The conceptual challenge in designing a random number generator (RNG) is to guarantee the true randomness of its output. Traditionally, the device was put through a number of tests [1][2][3] to ensure a satisfactory degree of randomness. Passing those tests is a necessary, but not a sufficient, condition for true randomness. In fact, there has recently been a strong shift in recommendations away from empirical testing and towards theoretical modelling of the entropy-generating physical process [4,5]. With a priori knowledge of the system behaviour, the device can be guaranteed to produce cryptographically secure random numbers with near-perfect entropy, so long as the model assumptions are shown to hold experimentally. Quantum systems, which are naturally probabilistic, are well suited to the construction of an RNG, because the physical origin of randomness (the measurement process) is usually well defined and thus suitable for modelling.
The first quantum random number generators (QRNGs) were based on the timing of radioactive decay [6], while nowadays photonic systems are more common [7]. Standard approaches are the 50/50 beam splitter configuration, the timing of photon detection events, or photon counting [7], with the fastest QRNGs, based on laser phase noise, surpassing bitrates of tens of Gbit/s [7]. There also exist a number of non-optical approaches, for example avalanche detection in semiconductors [8] or quantum tunneling of electrons across p-n junctions [9].
In this work, we focus on the implementation known as the spin noise QRNG [10]. This approach is rooted in the experimental technique of spin noise spectroscopy (SNS), which is used to study the relaxation properties of alkali metal [11] or semiconductor systems [12] in a nonperturbative way. When a sample is placed in a transverse magnetic field, a laser beam along the longitudinal axis can be used to probe fluctuations of the spin polarization of the sample, which are imprinted on the polarization of the laser beam through Faraday rotation. A spin noise spectrum with a width set by the transverse relaxation time T_2 [13] can be measured by averaging many spectra. Spin noise RNGs generate random numbers from the random spin polarization fluctuations of a sample [10]. Correlations present in the output bits depend on the T_2 of the measured system, so short relaxation times are required to achieve high bitrates. On the other hand, a shorter T_2 leads to a wider spin noise spectrum [13], which lowers the signal-to-noise ratio. Examples are semiconductor systems with typical values of T_2 ranging from ns to ps [14,15]; such systems are, however, currently challenging to use for a spin noise QRNG due to their poor signal-to-noise ratio.
Since it was first explored in Ref. [10], little further research has been done on spin noise QRNGs. In this work we propose to improve random number generation in a Cs gas cell by reducing T_2 using three modifications. First, an uncoated cell is used in order to increase the dephasing due to wall collisions. Second, heating the cell increases the number of collisions between atoms per unit time, leading to increased dephasing. Third, a magnetic-field gradient, which can be tuned continuously, is applied along the beam propagation axis to further increase the relaxation rate in a controlled manner. This leads to an improvement by a factor of 50 over previous works.
The original bit generation scheme presented in Ref. [10] works by checking when the random spin noise signal crosses a threshold. In our work we present an approach that looks at the timing of the signal's fluctuations. This alternative protocol generates a larger quantity of random bits per second, leading to bitrates that surpass 1/T_2. In this way, more entropy can be extracted from the quantum system, although some bits have to be rejected as part of the extraction phase to rid the bitstream of residual serial correlations. We also note that this is a general approach for generating random bits from a stochastic signal and is not limited to this specific type of QRNG.

Experimental work
The experimental setup is outlined in Fig. 1(a). It consists of a Toptica TA pro 852 nm laser source blue-detuned by 1 GHz from the Cs D2 line. The beam is Gaussian with a 1/e^2 width of approximately 1 mm. The laser light is linearly polarized before entering a magnetically shielded, cylindrical, uncoated Thorlabs Cs glass reference cell with a diameter of 19 mm and a length of 75 mm (GC19075-CS). The input polarization is set by a λ/2 wave-plate after the linear polarizer. The laser beam passes through the cell and is split on a polarizing beam splitter (PBS). We attenuate one of the beams by a factor of 10 using an OD1 attenuator and measure the intensities of both beams with photodiodes. Using the λ/2 wave-plate we balance the beam intensities after the PBS so that the intensity is equal on both photodiodes. Because one of the beam lines is attenuated, a balanced signal corresponds to one beam exiting the PBS with significantly higher power than the other. This allows us to use higher laser powers in the sample while avoiding non-linear effects of the photodetectors due to high incident intensity [16]. We use this setup to ensure SNR saturation of the spin noise signal, although higher powers lead to power broadening [17].
Figure 1 [start of caption not recovered] ... ν_L = 510 kHz in the absence of a magnetic field gradient (blue) and in a high gradient (red) (dB_z/dy ≈ 1.3 mT/m). (c) Polarimeter output signal; the red vertical dashed line separates the cases where the spin noise is absent (left) and present (right). The red horizontal lines correspond to ±5σ_bg, where σ_bg is the standard deviation of the photon shot noise and electronic noise of the measurement system. The QCNR is σ_s/σ_bg ≈ 3.16, where σ_s is the standard deviation of the signal with spin noise. LP, linear polarizer; λ/2, half-wave plate; B, magnetic field; PBS, polarizing beam splitter; A, attenuator; PD, photodiode
The Cs cell is placed in a magnetic field B = 0.129 mT perpendicular to the laser propagation axis, corresponding to a Larmor frequency of approximately ν_L = 450 kHz, which defines the centre of the spin noise spectrum. When the spin polarization has a non-zero component along the beam propagation axis, s_y ≠ 0, the polarization angle θ of the beam (vertical in the xz plane before entering the λ/2 wave-plate) changes according to θ = N θ_0 s_y; this is known as Faraday rotation (here N is the number of atoms in the volume of the beam and θ_0 is the rotation per atom). Since the fluctuations of s_y are random, θ also changes randomly with time. This is then mapped to amplitude fluctuations by the PBS and measured by a balanced polarimeter. The balanced polarimeter subtracts the two photodetector signals and feeds its output into an SR650 filter unit, which amplifies the polarimeter signal by 20-30 dB. The filter output is sampled using a Digilent Analog Discovery Pro 3000 Series at a sample rate of 100 MHz.
An example of a captured spin noise spectrum is shown in Fig. 1(b). The recorded spectrum comprises the Lorentzian spin noise signal and a flat background (-170 to -180 dBV/Hz), which is the sum of photon shot noise and electronic noise. The background seen in Fig. 1(b) consists primarily of photon shot noise, as in our experiments the electronic noise is orders of magnitude lower. The ratio σ_s/σ_bg ≈ 2 to 4 represents the quantum to classical noise ratio (QCNR), where σ_s and σ_bg are the standard deviations of the signal with and without spin noise present, respectively.
We note that it is crucial that, before any bit generation occurs, the data is band-pass filtered to eliminate noise arising from unwanted sources. For this purpose one may use, for example, a Butterworth filter of order 20 centred at ν_L. The width of the band-pass filter is chosen such that the entire spectrum above the photon shot noise floor is captured. This filtering step is necessary because other spectral components of the polarimeter signal arise from sources outside the measurement system (e.g. radio signals, 50 Hz power line hum, etc.).
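As a quick consistency check of the quoted field and Larmor frequency, the sketch below evaluates ν_L = g_F (μ_B/h) B. It assumes the F = 4 ground-state hyperfine level of Cs with g_F ≈ 1/4, which is not stated in the text; the function and constant names are illustrative.

```python
# Larmor frequency of Cs in a weak field.
# Assumption (not from the paper): F = 4 ground state with g_F ~ 1/4.
MU_B_OVER_H_HZ_PER_T = 13.996e9  # Bohr magneton over Planck constant, in Hz/T
G_F = 0.25                       # assumed Lande factor for Cs, F = 4


def larmor_frequency_hz(b_tesla):
    """Return the Larmor frequency in Hz for a field given in tesla."""
    return G_F * MU_B_OVER_H_HZ_PER_T * b_tesla


if __name__ == "__main__":
    # Field quoted in the text: B = 0.129 mT
    print(larmor_frequency_hz(0.129e-3) / 1e3, "kHz")
```

Under this assumption the result lies within about 1% of the 450 kHz quoted above.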

Transverse relaxation time tuning
The overarching idea of this section is to introduce improvements towards a faster dephasing rate 1/T_2 under different experimental conditions in order to reduce bit correlations. Relaxation time estimation was primarily done by computing the autocorrelation function
\[ C(t) = \frac{1}{Z} \sum_{i=1}^{Z} s(t_i)\, s(t_i + t), \]
where Z is the length of s(t). This can be calculated efficiently using the Wiener-Khinchin theorem:
\[ C(t) = \mathcal{F}^{-1}\!\left( \left| \mathcal{F}(s(t)) \right|^2 \right) \propto e^{-t/T_2} \cos(\omega_L t), \]
where ω_L = 2πν_L, F(f(t)) denotes the Fourier transform (FT) of f(t), and F^{-1}(f(ω)) its inverse. As can be seen in Fig. 1(b), the spectrum of spin noise is Lorentzian [18]. This implies that C(t), the inverse FT of a Lorentzian, is an exponentially decaying trigonometric function. The decay time of the autocorrelation function coincides with the transverse relaxation time T_2, which can consequently be extracted by fitting an exponentially decaying trigonometric function to a numerically computed C(t).
In coated Cs cells the largest contributor to spin dephasing seems to be wall collisions [20,21], which randomize the valence electron spin and, due to the hyperfine interaction, the nuclear spin. This relaxation is directly proportional to the number density n of atoms in the cell. The lack of a coating in our cell increases the dephasing from the usual rates of 20 Hz [20] or even 0.01 Hz [22] for coated cells to approximately 0.28 MHz at 75 °C, as seen from Fig. 2(a).
In this case, the spin noise spectrum is broadened to such an extent that the peak is below the shot noise level at room temperature. We therefore heat the cell in order to increase the number density of Cs atoms n, as the signal scales with √n (see the Appendix). We install our cell in a ceramic oven, which is additionally thermally insulated from the surrounding environment with a 5 mm layer of glass wool. This setup allows us to reach temperatures of up to 140 °C. The heating element, powered by a DC current, consists of a twisted wire to minimize stray magnetic fields. The heating is applied continuously throughout the experiments, keeping the Cs cell at a stable 140 °C; we observed no detrimental effect of the current on the spin noise spectrum.
The second largest contribution to dephasing is spin exchange, with a dephasing rate 1/T_2 = σ_SE v n [23], where σ_SE is the cross section for a spin-exchange collision and v is the relative velocity of the Cs atoms. This dephasing rate depends on the temperature through v and n, both of which increase with temperature, making the dephasing faster. This is clearly seen in Fig. 2.
In order to further increase 1/T_2 we install an additional gradient-generating coil on top of the heating oven. This allows us to continuously change T_2 by an additional factor.
When a large gradient is present in the sample, the time correlator of the system is no longer an exponentially decaying trigonometric function. The spin noise spectrum departs from a Lorentzian [24], as seen in Fig. 1(b), and can be thought of as a sum of peaks at varying Larmor frequencies. The time correlation function will then be
\[ C(t) \propto \int f(\omega)\, e^{-t/T_2} \cos(\omega t)\, \mathrm{d}\omega, \]
where f(ω) represents the spectral profile of the spin fluctuations. Due to this, the system decoheres at a faster effective rate 1/T_2^*, which can still be determined using the Wiener-Khinchin theorem. By assuming that the system precesses at an average frequency ω̄_L, we can perform the same fitting procedure with C(t) ∝ e^{-t/T_2^*} cos(ω̄_L t). The faster decay is shown in Fig. 2(b), where we observe a change by a factor of 2. A stronger gradient could be achieved by moving the coil closer to the cell, applying a higher current to the coils, or altering the coil geometry. This allows continuous changes of T_2, provided the entire spin noise spectrum remains in the positive frequency domain. Since increasing 1/T_2 spreads the signal in the frequency domain, it worsens the SNR, as shown in Fig. 1(b).
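The autocorrelation estimate described above can be sketched as a direct sum. This is only an illustration: the function name is hypothetical, and normalising each lag by the number of available sample pairs (rather than by Z) is an assumption. An FFT-based route via the Wiener-Khinchin theorem yields the same quantity faster for long records.

```python
def autocorrelation(s, max_lag):
    """Direct-sum estimate C[t] = mean of s[i] * s[i + t] for t = 0..max_lag."""
    z = len(s)
    if max_lag >= z:
        raise ValueError("max_lag must be smaller than the signal length")
    corr = []
    for t in range(max_lag + 1):
        terms = z - t  # number of sample pairs available at this lag
        corr.append(sum(s[i] * s[i + t] for i in range(terms)) / terms)
    return corr


if __name__ == "__main__":
    # A signal that flips sign every sample is perfectly anticorrelated at lag 1.
    print(autocorrelation([1, -1] * 50, 2))
```

T_2 would then be obtained by fitting e^{-t/T_2} cos(ω_L t) to the returned values, for example with a least-squares routine.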

Bit generation
We use the digitized polarimeter signal to generate a bitstream of N integers valued either 0 or 1 (bits). A perfect random bit has equal probability of being 0 or 1 and is not correlated with any other bit; such bits are independently and identically distributed (IID), as well as uniformly distributed. In this case, an entropy source has an unbiased output, B = 0, where the bias B is defined in terms of N(i), the number of bits valued i in the entire bitstream [defining equation missing]. We now present two different methods of generating bits from a digitized spin noise signal, here dubbed protocols. We first define a threshold that the signal has to cross before any random numbers can be generated. This is done to ensure robustness of the QRNG (i.e. that the generated noise comes from the quantum system in question and not from electronic or shot noise).
To fix a suitable threshold we must find the variance of the signal in the absence of spin noise fluctuations. To do this, we shift the spin noise spectrum outside of the working frequency range by applying a high magnetic field (ν_L > 5 MHz), so that only the electronic and photon shot noise remain. The time signal is then measured until 10^6 samples are collected, from which we calculate the variance σ^2. The threshold is chosen as a multiple of σ, usually 5σ. Finally, the spin noise is shifted back to ν_L ≈ 450 kHz and the bit generation can proceed. An example is shown in Fig. 1(c).
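The calibration step just described amounts to a standard deviation estimate followed by a multiplication. A minimal sketch, assuming the background record is available as a list of floats (the function name and the default multiple are illustrative):

```python
import statistics


def calibrate_threshold(background, multiple=5.0):
    """Return multiple * sigma, where sigma is the standard deviation of a
    background record taken with the spin noise shifted out of band."""
    sigma = statistics.pstdev(background)
    return multiple * sigma


if __name__ == "__main__":
    # Toy background with unit standard deviation
    print(calibrate_threshold([1.0, -1.0, 1.0, -1.0]))
```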

Protocol 1: threshold hitting
The threshold hitting protocol is a simple approach to bit generation that was explored in Ref. [10]. One generates bits according to whether a threshold was crossed in the positive or the negative direction. An additional waiting step is implemented. The generation algorithm is as follows:
1. Wait until the positive (negative) threshold is exceeded in the positive (negative) direction.
2. Record a bit whose value is set by the direction of the crossing.
3. Wait MT_2, where M ∈ R.
4. Go back to step 1.
The third step is crucial to avoid bit correlations, as the time correlator for this Ornstein-Uhlenbeck [25] process is C(t) ∝ e^{-t/T_2} cos(ω_L t). Waiting MT_2 between bit detections therefore exponentially removes correlations from the bitstream. Usually M = 10, which limits correlations to e^{-10} ≈ 10^{-5}. If we define an event as one execution of the algorithm above, one event generates close to 1 bit of entropy (this is discussed further below). Using this protocol we achieve bitrates of up to 50 kbit/s; however, this approach is heavily limited by T_2.
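The steps of protocol 1 can be sketched on a list of digitized samples as follows. The mapping of a positive crossing to 1 and a negative crossing to 0 is an assumption, as are the function names; the waiting time is expressed in samples.

```python
def wait_samples(m, t2, dt):
    """Number of samples to skip for a waiting time of m * T2."""
    return round(m * t2 / dt)


def threshold_hitting_bits(samples, threshold, n_wait):
    """Protocol 1 sketch: emit 1 (0) when the signal exceeds +threshold
    (falls below -threshold), then skip n_wait samples before looking again."""
    bits = []
    i = 0
    while i < len(samples):
        x = samples[i]
        if x > threshold:
            bits.append(1)       # assumed mapping: positive crossing -> 1
            i += n_wait + 1
        elif x < -threshold:
            bits.append(0)       # assumed mapping: negative crossing -> 0
            i += n_wait + 1
        else:
            i += 1
    return bits


if __name__ == "__main__":
    signal = [0, 2, 2, 0, -3, 0, 0, 4]
    print(threshold_hitting_bits(signal, threshold=1, n_wait=2))
    # M = 10, T2 = 2 us, dt = 10 ns, values taken from the text
    print(wait_samples(10, 2e-6, 10e-9))
```

With M = 10, T_2 = 2 μs and a 10 ns sample period, the waiting step skips 2000 samples per generated bit, which illustrates why the rate of this protocol is tied so closely to T_2.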

Protocol 2: times above threshold
Here the random variable in question is the time the signal spends above the positive threshold (or below the negative threshold). This time is given by two crossings (events), and we will show that it gives more bits of entropy per event than protocol 1. The protocol is as follows:
1. Wait until the positive or negative threshold is exceeded at time t_1.
2. Wait until the same threshold is crossed a second time at time t_2.
3. Record the time above (below) threshold as t = t_2 - t_1.
4. Go back to step 1.
Instead of the variance of the signal in time, the entropy here comes from the phase jitter of the signal, analogous to a temporal mode optical QRNG [7]. This approach is limited in principle by the sample period δt = 10 ns of our digitizer, since phase information is lost between the samples. This means that the times above threshold have an uncertainty of up to one sample period (10 ns) that is not produced by the quantum system. This non-quantum source of randomness should be excluded in the extraction process.
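The steps of protocol 2 can be sketched as a run-length measurement on the digitized samples. Names are illustrative; durations are returned in units of the sample period, and a run that is still open at the end of the record is discarded, which is a choice made here rather than something stated in the text.

```python
def times_above_threshold(samples, threshold):
    """Protocol 2 sketch: lengths (in samples) of runs spent above +threshold
    or below -threshold."""
    durations = []
    run_sign = 0  # +1 above +threshold, -1 below -threshold, 0 in between
    run_len = 0
    for x in samples:
        sign = 1 if x > threshold else (-1 if x < -threshold else 0)
        if sign == run_sign:
            if sign != 0:
                run_len += 1
        else:
            if run_sign != 0:
                durations.append(run_len)  # the previous run has just ended
            run_sign = sign
            run_len = 1 if sign != 0 else 0
    # A run still open here is dropped: its second crossing was not observed.
    return durations


if __name__ == "__main__":
    signal = [0, 2, 3, 2, 0, -2, -2, 0, 5, 0]
    print(times_above_threshold(signal, threshold=1))
```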
This protocol generates a non-uniform distribution, as shown in Fig. 3(a). To generate bits from this distribution we represent each integer value using 8 bits and take the L least significant bits (LSB). This is because each successive, more significant bit is distributed more unevenly, as shown in Fig. 3(b). Major imbalances in the output bitstream should be avoided in order to retain an acceptably low bias; therefore the three most significant bits are discarded (additionally, the first LSB should be discarded due to its non-quantum nature, as noted in the previous paragraph). If we now define an event as one execution of the second algorithm, an event generates at most L bits of entropy. The blocks of length L - 1 are joined into a bitstream. However, the blocks can be expected to remain correlated and therefore produce a sequence that is not ideally random. The next section deals with tackling this issue.
Figure 3 [start of caption not recovered] ... The discerning features are a zero probability at t = 0 due to band-pass filtering, a peak at t > 0 due to Larmor precession, and the short-time tail that evolves from concave to convex for increasing threshold values. (b) Distribution of bits generated from random times above a threshold of 1.44σ. The dark bars represent the probability for a bit to equal 0, and the light bars the probability for a bit to equal 1. One can see that each successive bit has a higher bias, so the L least significant bits are taken from each integer in order to avoid large biases in the output bitstream
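The bit selection described above can be sketched as follows: each integer is reduced to its 8-bit representation, the L least significant bits are kept, and the lowest of them is dropped, leaving blocks of L - 1 bits. The bit ordering within a block (most significant kept bit first) and all names are assumptions.

```python
def lsb_bits(values, n_lsb, drop_lowest=1, width=8):
    """Keep the n_lsb least significant bits of each value and drop the
    lowest drop_lowest of them; returns a flat list of bits."""
    mask = (1 << width) - 1
    bits = []
    for v in values:
        v &= mask  # restrict to the width-bit representation
        # From the highest kept bit down to bit index drop_lowest
        for k in range(n_lsb - 1, drop_lowest - 1, -1):
            bits.append((v >> k) & 1)
    return bits


if __name__ == "__main__":
    # 181 = 0b10110101; with L = 5, bits 4..1 are kept
    print(lsb_bits([181], n_lsb=5))
```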

Entropy
The randomness of an RNG is quantified by its entropy output. Generally, instead of the Shannon entropy, the more conservative min-entropy is used, which gives a lower bound on the randomness. In this use case, we are interested in maximizing the min-entropy generated per bit,
\[ \frac{H_{\min}}{N} = -\frac{1}{N} \log_2 \left( \max_{x \in \{0,1\}^N} P(X = x) \right), \]
where N is the length of the bitstream. Empirically, this can be estimated using a variety of tests. In our case, we use the methods given in the NIST SP800-90B special publication [26], which defines entropy estimation procedures for IID (independently and identically distributed) and non-IID sources. Entropy estimation for IID sources of random numbers is much simpler than for non-IID sources, and the claim that the source is IID can also be tested, as defined in Ref. [26]. Using the IID entropy estimation techniques on a non-IID source will overestimate the entropy, while non-IID testing on IID sources will underestimate it. This is hinted at in Table 1. In any case, if the estimated min-entropy is not satisfactory (sufficiently close to 1 bit of entropy per bit), an extraction algorithm must be applied to the output of the entropy source.
For protocol 1 with long waiting times MT_2 between bit generations, the resulting bitstreams can have sufficiently low correlation, and when extra care is put into the balancing the bias can be B ≈ 10^{-5} to 10^{-3}, which makes it possible to use protocol 1 with no extraction. Protocol 2 typically has a larger bias, B ≈ 10^{-2}, in the output bitstream than protocol 1. Furthermore, converting non-uniformly distributed integers into series of bits introduces short-range correlations on the order of L (this is discussed later).
The extraction method we use is the Toeplitz hashing algorithm, which works in the following way. Take a randomly generated Toeplitz matrix T with dimensions dim(T) = (a, b). Now take b bits from the bitstream to form a vector x of length b and multiply, y = Tx, to generate a new vector y of length a. This can be repeated over all the bits of the bitstream to generate a compressed bitstream with more entropy per bit if b > a.
The dimensions of T are chosen according to how much compression b/a is needed. In our case, we generate random bits for 10 s using protocol 2 (L = 7) at a rate of 1.41 Mbit/s. This produces approximately 14.1 Mbit of random bits with an estimated min-entropy per bit H_min/N = 0.703 ± 0.001, using non-IID tests on 10 samples of 10^6 bits. We additionally remove the first least significant bit (LSB) due to possible classical contributions, which can significantly impact the min-entropy of the bitstream. To minimize this effect we find that it is also more suitable to take L = 6 least significant bits. This leads to an estimated H_min/N = 0.750 ± 0.005. Since blocks of L bits are correlated, we choose b = 1024 ≫ L.
The second dimension is then given by a = b H_min/N = 768. A Toeplitz matrix of this size will guarantee that the output bitstream is practically IID and unbiased. An example of this claim is shown in Table 2. We compress the above-mentioned bitstream with differently sized Toeplitz matrices. Using insufficient compression ratios leads to failure of the IID tests. Since the compression ratio for a = 682 exceeds approximately 1/0.75 = 1.333, the bitstream passes all IID tests and the output entropy per bit is estimated as H_min = 0.995 ± 0.001. Performing the extraction with a slightly lower compression ratio of 1.35 makes it possible to reach the final bitrate of 1.04 Mbit/s of highly entropic random numbers. We note that this bitrate is higher than 1/T_2 = 500 kHz in this particular test, where no magnetic field gradient was applied.
On top of entropy estimation, batteries of statistical tests are usually run on the output bitstreams in order to find statistical shortcomings of the entropy source. The standard testing suites we used are dieharder [1], TestU01 [2], and the NIST [3] tests, all of which the output passed. We do not put much emphasis on such statistical testing, as it is always possible to pass all standard tests using sufficient compression, even with badly flawed generators.
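For a binary source that is assumed to be IID, the min-entropy per bit reduces to -log2 of the probability of the more frequent value. The sketch below is a plain plug-in estimate under that assumption; it is not the SP800-90B procedure, which additionally applies a confidence-interval correction, and the function name is illustrative.

```python
import math
from collections import Counter


def min_entropy_per_bit(bits):
    """Plug-in min-entropy per bit of an (assumed IID) bit sequence."""
    if not bits:
        raise ValueError("empty bit sequence")
    counts = Counter(bits)
    p_max = max(counts.values()) / len(bits)  # frequency of the commonest value
    return -math.log2(p_max)


if __name__ == "__main__":
    print(min_entropy_per_bit([0, 1, 0, 1]))  # balanced sequence
    print(min_entropy_per_bit([1, 1, 1, 0]))  # biased sequence
```

Because it ignores serial correlations, an estimate of this kind overstates the entropy of a non-IID bitstream, in line with the remark above.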
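The Toeplitz extraction step can be sketched in pure Python as below. It assumes the product y = Tx is taken modulo 2, as is usual for Toeplitz hashing, and defines the a × b matrix through a + b - 1 seed bits via T[i][j] = seed[i - j + b - 1]; the function name and this indexing convention are illustrative rather than taken from the paper.

```python
def toeplitz_extract(bits, a, b, seed_bits):
    """Compress consecutive blocks of b input bits into a output bits each,
    using y = T x (mod 2) with the Toeplitz matrix T[i][j] = seed_bits[i - j + b - 1].
    Trailing bits that do not fill a whole block are dropped."""
    if len(seed_bits) != a + b - 1:
        raise ValueError("seed_bits must contain a + b - 1 bits")
    out = []
    for start in range(0, len(bits) - b + 1, b):
        x = bits[start:start + b]
        for i in range(a):
            acc = 0
            for j in range(b):
                acc ^= seed_bits[i - j + b - 1] & x[j]  # multiply-add in GF(2)
            out.append(acc)
    return out


if __name__ == "__main__":
    # Rows of T for this seed: [1, 0, 1] and [1, 1, 0]
    print(toeplitz_extract([1, 1, 0, 0, 1, 1, 1], a=2, b=3, seed_bits=[1, 0, 1, 1]))
```

In practice the seed bits would have to come from a source independent of the bitstream being compressed.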
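The sizing arithmetic in this paragraph can be reproduced with a few lines. All numbers are those quoted in the text; the helper name is hypothetical.

```python
def toeplitz_output_length(b, min_entropy_per_bit):
    """Largest output block length a = floor(b * H_min / N)."""
    return int(b * min_entropy_per_bit)


a_max = toeplitz_output_length(1024, 0.75)  # largest admissible a
ratio_682 = 1024 / 682                      # compression ratio b/a for a = 682
final_rate_mbit = 1.41 / 1.35               # raw rate over compression ratio

if __name__ == "__main__":
    print(a_max, round(ratio_682, 3), round(final_rate_mbit, 2))
```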

Stochastic modelling
We describe the spin state of the alkali vapour using a density matrix ρ which evolves randomly in time. In the absence of coherent excitations, ρ is diagonal, since all off-diagonal elements decay due to decoherence effects. In the magnetic fields typically used, the alkali gas remains close to unpolarized due to thermal effects (kT ≫ gμ_B B, where g is the g-factor and μ_B is the Bohr magneton) and, to a good approximation, the density matrix is maximally entropic, ρ ∝ 1.
The time evolution of ρ, however, is random. The mechanism of randomness generation is collisions. Two processes can be distinguished. The first is spin exchange when two Cs atoms collide, where the total spin S^2 is conserved. Atoms with an initial angular momentum state |m⟩ evolve to |m + j⟩ with j = 0, ±1 during a collision [10,27]. Collisions with j = 0 cause fluctuations of the diagonal elements of ρ, whereas collisions with j = ±1 cause relaxation. There exist other kinds of collisions between atoms, for example spin-destruction collisions, but their cross sections are typically orders of magnitude lower than those of spin exchange [28].
The second major randomness-generating process is wall collisions. Reference vapour cells are coated internally (e.g. with paraffin) so that the polarization of an atom persists through as many collisions as possible. However, due to the lack of a cell coating in our experiment, an atom's polarization is essentially randomized upon collision with the cell wall surface [20]. The interaction between the atoms of the wall and the valence alkali-metal spin comprises a dipole-dipole interaction and a spin-orbit-type coupling, and it is a well-understood quantum process [29]. Because of the large reduction in T_2 after the coating is removed, we believe that this randomness-generating process dominates over Cs-Cs atomic collisions.
To model the evolution of the system it is easier to consider a stochastic picture than to perform quantum-mechanical calculations. Consider a system of alkali metal atoms with a transverse relaxation time T_2 in a magnetic field along the z axis with a Larmor frequency ω_L = 2πν_L. Then the time evolution of the spin expectation value ⟨s⟩ = (⟨s_x⟩, ⟨s_y⟩)^T is given by a stochastic differential equation [23], Eq. (5) [equation missing], where the matrices D and F are defined as [equation missing] and dη = (dη_x, dη_y)^T is a two-dimensional Wiener process. This alternative model is exact when the alkali gas is unpolarized, which is only approximately fulfilled given that kT/gμ_B B ≈ 10^4 for the typical values B = 10^{-2} T and T = 140 °C in our experiments. Knowing all the system parameters, we can calculate the entropy per generated bit. To do this we calculate the stochastic properties of ⟨s⟩, apply the appropriate protocol, and then calculate what is expected at the output of the polarimeter.
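To illustrate the kind of stochastic model meant here, the sketch below simulates a two-dimensional Ornstein-Uhlenbeck process that precesses at ω_L and relaxes at a rate 1/T_2. The specific form is an assumption, since the matrices D and F are not reproduced in this text: the drift is taken as relaxation plus rotation, the noise as isotropic, and the stationary variance per component as a free parameter `var`. The update is the exact one-step solution of this linear model, which is a different scheme from the Milstein integrator used for the paper's simulations.

```python
import math
import random


def simulate_spin_noise(n_steps, dt, t2, nu_l, var=0.25, seed=0):
    """Return samples of s_y for an assumed 2D Ornstein-Uhlenbeck model
    (precession at nu_l, relaxation time t2, stationary variance var)."""
    rng = random.Random(seed)
    decay = math.exp(-dt / t2)
    phase = 2.0 * math.pi * nu_l * dt
    c, s = decay * math.cos(phase), decay * math.sin(phase)
    # Noise amplitude chosen so that the variance stays at var after each step
    kick = math.sqrt(var * (1.0 - decay * decay))
    sx = rng.gauss(0.0, math.sqrt(var))
    sy = rng.gauss(0.0, math.sqrt(var))
    trace = []
    for _ in range(n_steps):
        sx, sy = (c * sx - s * sy + kick * rng.gauss(0.0, 1.0),
                  s * sx + c * sy + kick * rng.gauss(0.0, 1.0))
        trace.append(sy)
    return trace


if __name__ == "__main__":
    # dt = 10 ns, T2 = 2 us, nu_L = 450 kHz, as quoted in the text
    y = simulate_spin_noise(200_000, 10e-9, 2e-6, 450e3)
    print(sum(v * v for v in y) / len(y))  # sample variance, expected near var
```

A trace produced this way could be fed to either bit generation protocol after band-pass filtering, which this sketch omits.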

Protocol 1
To find the entropy per bit in protocol 1 we must know the amplitude distribution of the signal. We rewrite Eq. (5) as a Langevin equation for the variable s = s_x + i s_y [equation missing], with the fluctuations f(t) defined by ⟨f(t)⟩ = 0 and ⟨f(t) f(t')⟩ = δ(t - t')/(4T_2). From this, the time evolution of the variances of s_x and s_y is found to be the same as that of a univariate Ornstein-Uhlenbeck process [equation missing]. We consider the stationary case t → ∞, as the sample is approximately unpolarized. Alternatively, one can solve the stationary Fokker-Planck equation to obtain the same result.
In order to predict the min-entropy we must know the variance (σ')^2 = 4Nθ_0 σ_{s_y}^2 (see the Appendix for the derivation) of the signal at the output of the polarimeter.
If we suppose that there are no correlations within the bitstream (large waiting time M), then the min-entropy of the bitstream depends only on the bias μ of the signal, which is the mean of the signal's amplitude distribution. This bias is the result of imperfect balancing or drifting of the measurement system. One event generates H_min = -log_2(max_{x ∈ {0,1}} P(X = x)) bits of entropy, where the probabilities P(X = x) are as derived in the Appendix [equation missing] and (σ')^2 is the variance of the polarimeter signal. For example, if μ/σ' = 10^{-4} and the threshold is 1.4σ', then per event we generate a minimum of -log_2(P(X = 1)) ≈ 0.9990 bits of entropy. This holds when successive bits are uncorrelated, i.e., as M approaches infinity. In reality, serial correlations are present to some small degree.

Protocol 2
The problem of calculating the distribution q of times above a threshold for a Markov process has been studied since the early 1970s [30]. The difficulty of evaluating q depends on the complexity of the infinitesimal propagator A that drives the stochastic process [31]. Although the problem is solved in Ref. [30] for the univariate Ornstein-Uhlenbeck process, the bivariate case presented in this paper leads to a non-trivial differential equation. We opted to numerically simulate this distribution using the stochastic model in Eq. (5) to generate times above the different thresholds, as shown in Fig. 4(a) (see footnote 3). The generated distributions can be used to estimate the entropy production of the experimental system. We start with an experimentally observed distribution q, for a given threshold, binned in n bins. We then compute a distribution q' from a simulated signal using the same experimental parameters ω_L = 2πν_L, T_2 and threshold, again using n bins. Because of uncertainties in the experimental parameters, imperfections of the integrator, and deviations from the quantum model, the two distributions might not match perfectly. An example of the overlap between the experimental and simulated histograms is shown in Fig. 4(b). Now, the min-entropies H_min(q) = -log_2 max_t q(t) are computed. The simulated curve has a min-entropy of H_min = 5.56 bits, while the experimental one has H_min = 5.55 bits. In our case we generate L(H_min/N) bits of entropy from each block of L bits. If we consider the entirety of the experimental distribution we must take all L = 8 LSB, and the estimated min-entropy is found to be H_min/N = 0.4735 ± 0.0007 (using 10 samples of 2.2 × 10^6 bits). One event thus generates L(H_min/N) = 3.788 bits of entropy; the rest is lost due to correlations. This is clearly the first advantage over protocol 1, where each event (i.e., one execution of the protocol 1 algorithm) generates at most 1 bit of entropy. By using a more sophisticated extraction protocol, we can extract more entropy from the system per event, as established above.
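The min-entropy of a binned distribution, H_min(q) = -log_2 max_t q(t), and the entropy per event quoted above can be checked with a short sketch. The histogram is passed as a list of bin counts; the function name is illustrative.

```python
import math


def min_entropy_of_histogram(counts):
    """H_min = -log2 of the largest bin probability of a histogram."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return -math.log2(max(counts) / total)


# Entropy per event for L = 8 bits per block and H_min/N = 0.4735 (from the text)
entropy_per_event = 8 * 0.4735

if __name__ == "__main__":
    print(min_entropy_of_histogram([1, 1, 1, 1]))  # flat over 4 bins
    print(round(entropy_per_event, 3))
```

Applying the same function to the experimental and the simulated histogram gives the two values that are compared in the text.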
Another way to improve the bitrate is to increase the Larmor frequency ν_L while keeping ν_L δt constant. This makes for more events per second, with adequate compensation by sampling faster, which is required to keep the distribution as flat (maximally entropic) as possible. However, these events would be more correlated, so a higher compression ratio would have to be used in the extraction process; it is thus hard to predict the limit of this approach.
Footnote 3: The numerical integration was done by a fixed step-size explicit 3-stage Milstein method for an Ito problem with strong and weak order 1.0. The parameters used were dt = 10 ns, ω_L = 2π × 450 kHz, and T_2 = 2 μs. The 10^8 samples of the time signal are computed according to Eq. (5) and then band-pass filtered around ω_L.
Figure 4 [start of caption not recovered] ... (see footnote 3). This captures the experimental results, although the experimental and simulated distributions do not overlap completely due to uncertainty in the experimental parameters. (b) The simulated signal can be used to calculate the distribution q (using the same ω_L = 2π × 450 kHz, T_2 = 2 μs, and a threshold of 5σ), which can be used to estimate the entropy production of the experimental system
Figure 5 Decay of serial bit correlations for both protocols. The dashed green line represents the case where C(t) = (1/Z) Σ_{i=1}^{Z} X_i X_{i+t} = 0.5. Protocol 1 (with M = 0) has exponentially decaying correlations in the output bitstream that decay close to 0.5, depending on the bias. Protocol 2 has a short-term correlation of length L (here L = 5). An uptick at the 5th serial bit is seen for protocol 2, as the times above threshold remain correlated, although, as shown, this correlation decays faster than that of protocol 1. Due to the larger biases present in protocol 2, it can also be seen that the autocorrelation does not decay as close to 0.5 as that of protocol 1
The second reason why protocol 2 is superior to protocol 1 can be seen from the serial correlation of the generated bits. This is shown in Fig. 5, where the bit autocorrelations C(t) were computed using the Wiener-Khinchin theorem for bits generated using both protocols (for a fair comparison, M = 0 has to be taken) on the same experimental run.
For an unbiased IID source, each bit would have an autocorrelation C(t) of exactly 0.5 to the previous bit.In this case, this is not true until we perform extraction, where the extraction ratio required is a direct consequence of the magnitude of correlations present.What Fig. 5 suggests is that the correlation between the bits generated using protocol 2 decays faster than that of the bits generated using protocol 1 (with M = 0).Consequently, a lower extraction ratio is required in order to have a satisfactory min-entropy using protocol 2.

Conclusion
We show an improvement of three orders of magnitude in the bit rate of spin noise QRNGs by using several advances. First, we reduced T_2 by using a heated uncoated cell. The relaxation rate is further continuously tunable using a gradient-inducing coil. This allows a bit generation rate of up to 50 kbit/s using previously known methods (protocol 1). Second, we used a more efficient entropy extraction scheme (protocol 2). We show that bits generated using different protocols have correlations that decay at different rates, in this case making protocol 2 superior to protocol 1 and achieving bitrates of 1.04 Mbit/s.
A spin noise QRNG could also be realized in solid-state systems. We propose two possible candidates for further research. The first is a heavily doped (close to the metal-insulator transition) GaAs heterostructure, because data suggests a relatively high T_2 ≈ ns at room temperature [14], while the band gap can be engineered by changing the heterostructure geometry. Another alternative could be an n-doped CdTe quantum well, which can have up to an order of magnitude slower relaxation rates at room temperature [14].
When T_2 is lowered to increase the bit rate or min-entropy, the signal-to-noise ratio (SNR) worsens as the spectrum broadens. The SNR can be improved using more involved measurement systems (e.g., heterodyne detection improves the SNR in low-shot-noise scenarios [32]). The limiting factor in spin noise QRNG type experiments is photon shot noise, as the electronic noise of the measurement electronics is orders of magnitude lower. Shot noise squeezing could also improve the SNR [33] by tackling this issue, although there are known limits to this approach [34]. It seems that improvements in experimental techniques are still required to capture live spin noise signals at faster relaxation rates without lengthy time averaging. If, one day, semiconductor-based spin noise QRNGs are realised, the advances in bit generation and T_2 tuning presented in this work will apply directly.

Appendix: Protocol 1 entropy estimate
To model the entropy exiting the polarimeter we have to propagate the probability distribution function of the amplitudes, as given by Eq. (8), through the measurement system. We take the laser propagation axis to be y. At any given time, the Faraday rotation $\theta$ of the laser is proportional to the polarization $s_y$ along the axis of propagation where N is the number of atoms in the laser beam volume in the cell, $\theta_0$ the Faraday rotation due to one spin, $\lambda$ the wavelength of the light, A the laser cross section, $\gamma$ the radiative width of the transition, and $\delta$ the detuning from the transition line. Upon exiting the glass cell the laser light, imprinted with randomly fluctuating polarization, is split on the PBS into two components given by the vector

$$\mathbf{J} = \begin{pmatrix} \cos(\theta + \pi/4) \\ \sin(\theta + \pi/4) \end{pmatrix}.$$

The phase shift of $\pi/4$ is due to the balancing of the polarimeter. Although $N = 10^{12}$ and $\theta_0 \sim 10$ nrad in our experiments, $\langle s_y \rangle$ remains 0. On the other hand, fluctuations of the Faraday rotation angle are given by $\sigma_\theta = \sqrt{N}\,\theta_0\,\sigma_{s_y}$, where $\sigma_{s_y} = 1/8$ are the fluctuations of the spin polarization in the y direction, as shown in Eq. (8). As $\sqrt{N}\,\theta_0 \approx 10^{-3}$, the fluctuations $\sigma_\theta$ remain small, which allows us to calculate the polarimeter output signal S by Taylor expansion,

$$S = \cos^2(\theta + \pi/4) - \sin^2(\theta + \pi/4) \approx -2\theta + O(\theta^3). \quad (12)$$

This shows that the variance of the polarimeter signal, $\sigma^2$, is the sum of two completely correlated processes, therefore $\sigma^2 = 4 N \theta_0^2 \sigma_{s_y}^2$. It also shows that the distribution of amplitudes of the signal output by the polarimeter remains Gaussian, provided that $\theta$ is small. Any amplification of the signal (in the polarimeter, the SR650, or the Digilent Analog Discovery Pro) additionally spreads this Gaussian; however, we do not elaborate on this further.
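The linearisation in Eq. (12) can be written out step by step. The derivation below assumes that the polarimeter signal is the difference of the intensities at the two PBS outputs, i.e. the difference of the squared components of $\mathbf{J}$, which is the reading consistent with the factor 4 in the quoted variance.

```latex
\begin{align}
S &= \cos^2\!\left(\theta + \tfrac{\pi}{4}\right) - \sin^2\!\left(\theta + \tfrac{\pi}{4}\right)
   = \cos\!\left(2\theta + \tfrac{\pi}{2}\right) = -\sin 2\theta \\
  &= -2\theta + \tfrac{4}{3}\theta^3 + O(\theta^5), \\
\sigma^2 &= \operatorname{Var}(S) \approx 4\,\sigma_\theta^2 = 4 N \theta_0^2 \sigma_{s_y}^2 .
\end{align}
```

The cubic term is smaller than the linear one by a factor $\tfrac{2}{3}\theta^2$, which is of order $10^{-6}$ for rotation angles of order $10^{-3}$ rad, so the Gaussian shape of the amplitude distribution is preserved to a very good approximation.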
In protocol 1 with $M \to \infty$ the output bits are uncorrelated. The only deviation from ideal random numbers is then due to an imbalanced signal. Consider that the mean of the signal $\mu$ drifts in time away from 0. At any given time, the probability that the signal is above or below a threshold $S_{\mathrm{th}}$ is easily calculated,

$$P(S > S_{\mathrm{th}}) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{S_{\mathrm{th}} - \mu}{\sqrt{2}\,\sigma}\right),$$

and similarly

$$P(S < -S_{\mathrm{th}}) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{S_{\mathrm{th}} + \mu}{\sqrt{2}\,\sigma}\right).$$

These two probabilities correspond to the probability to generate bit $X \in \{0, 1\}$ at the output, given that $|S| > S_{\mathrm{th}}$. For example, $P(X = 1) = P(S > S_{\mathrm{th}} \mid |S| > S_{\mathrm{th}})$. This allows us to predict $\mu$ from the measured bias using Bayes' theorem,

$$B = \frac{P(X = 1) - P(X = 0)}{P(X = 1) + P(X = 0)} = \frac{P(S > S_{\mathrm{th}}) - P(S < -S_{\mathrm{th}})}{P(S > S_{\mathrm{th}}) + P(S < -S_{\mathrm{th}})}. \quad (14)$$

This also enables us to make a min-entropy claim for the output of the QRNG with protocol 1 (and a large M) by expanding $P(X = 1)$ or $P(X = 0)$ as a Taylor series around $S_{\mathrm{th}} \pm \mu$. To second order in $\mu$, the probability to generate a bit $X = 1$ is $P(X = 1) = P(S > S_{\mathrm{th}}) / P(|S| > S_{\mathrm{th}})$. Alternatively, for $P(X = 0)$ one has to make the substitution $\mu \to -\mu$. In this approximation the bias equals

It is important to note the shortcomings of this analytical method. The issue with such an expansion is that when $\sigma$ the approximations cease to hold well, as the derivative of $P(X = 1)$, D, becomes underestimated to the extreme case where it is the inverse of the correct value. This can partly be mended by taking the absolute value of the derivative of
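The relation between signal offset, bias, and min-entropy can be checked numerically. The sketch below evaluates the two tail probabilities and the bias of Eq. (14) exactly, assuming a Gaussian signal with mean `mu` and standard deviation `sigma`. The function names are hypothetical. The min-entropy is taken as $-\log_2 \max_x P(X = x)$, the standard single-bit definition for uncorrelated bits, and `linearised_bias` is a first-order expansion in `mu` derived here for comparison, not necessarily the form used by the authors.

```python
from math import erfc, exp, log2, pi, sqrt

def bit_probabilities(mu, sigma, threshold):
    """P(X=1) and P(X=0) for a Gaussian signal, conditioned on |S| > threshold."""
    p_above = 0.5 * erfc((threshold - mu) / (sqrt(2) * sigma))  # P(S > threshold)
    p_below = 0.5 * erfc((threshold + mu) / (sqrt(2) * sigma))  # P(S < -threshold)
    total = p_above + p_below                                   # P(|S| > threshold)
    return p_above / total, p_below / total

def bias(mu, sigma, threshold):
    """Exact bias B of Eq. (14)."""
    p1, p0 = bit_probabilities(mu, sigma, threshold)
    return (p1 - p0) / (p1 + p0)

def min_entropy_per_bit(mu, sigma, threshold):
    """Min-entropy per bit, assuming the output bits are uncorrelated."""
    return -log2(max(bit_probabilities(mu, sigma, threshold)))

def linearised_bias(mu, sigma, threshold):
    """First-order expansion of the bias in mu (illustrative derivation)."""
    a = threshold / (sqrt(2) * sigma)
    # Slope dP(X=1)/dmu evaluated at mu = 0.
    slope = exp(-a * a) / (sqrt(pi) * erfc(a)) / (sqrt(2) * sigma)
    return 2.0 * slope * mu
```

For a small offset such as `mu = 0.01 * sigma` with a threshold of one standard deviation, the exact and linearised bias agree closely. The slope grows as the threshold increases relative to `sigma`, so the same offset then produces a much larger bias and a low-order expansion becomes less reliable.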

Figure 1
Figure 1 (a) Schematic representation of the experimental configuration. A laser beam is linearly polarized and sent through a magnetically shielded, uncoated, heated Cs cell in a magnetic field. The beam is split on a PBS according to the orientation of the λ/2 plate and recorded by a balanced polarimeter, after which the signal is sent to post-processing. (b) Examples of recorded spin noise power spectral densities (PSD) at (a). It is unclear to what degree spin exchange plays a role in comparison to wall collisions.

Figure 2
Figure 2 Dephasing rate effects. (a) Dependence of the relaxation rate $1/T_2$ on the temperature and number density in the cell. Heating increases the dephasing rate due to increased wall and spin exchange collisions through the relative velocity and number density of Cs atoms (described in the text). The number density n at every temperature is determined from data in Ref. [19]. (b) Dependence of the relaxation rate $1/T_2^*$ on the magnetic field gradient. The added gradient effectively dephases the system at a faster rate $(T_2^*)^{-1}$. This was done at T = 140 °C. The solid lines correspond to linear fits; the error bars are inside the markers.

Figure 3
Figure 3 Protocol 2. (a) Examples of distributions of times above threshold for different thresholds. The distinguishing features are a zero probability at t = 0 due to band-pass filtering, a peak at t > 0 due to Larmor precession, and the short-time tail that evolves from concave to convex for increasing threshold values. (b) Distribution of bits generated from random times above a threshold of 1.44σ. The dark bars represent the probability for a bit to equal 0, and the light bars the probability for a bit to equal 1. One can see that each successive bit has a higher bias, so only the L least significant bits are taken from each integer in order to avoid large biases in the output bitstream.
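The bit generation step described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, times are measured in whole samples, only excursions above a positive threshold are counted, and the bits of each integer are emitted least significant first.

```python
import numpy as np

def times_above_threshold(signal, threshold):
    """Lengths (in samples) of consecutive runs where the signal exceeds threshold."""
    above = np.asarray(signal) > threshold
    # Pad with False so every run has a well-defined start and end.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts

def lsb_bits(durations, n_bits):
    """Keep the n_bits least significant bits of each duration, LSB first."""
    durations = np.asarray(durations, dtype=np.int64)
    shifts = np.arange(n_bits)
    return ((durations[:, None] >> shifts) & 1).ravel().astype(np.uint8)

# Usage on a toy signal: three excursions lasting 2, 1, and 3 samples.
toy_signal = [0, 2, 2, 0, 3, 0, 0, 4, 4, 4, 0]
toy_durations = times_above_threshold(toy_signal, threshold=1)
toy_bits = lsb_bits(toy_durations, n_bits=2)
```

Runs that touch the start or end of the record are truncated by the acquisition window, so in practice one would probably discard them before extracting bits.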

Figure 4
Figure 4 Simulations of protocol 2. (a) Distributions of times above threshold for a simulated time signal at different threshold values. Here, a sample denotes a timestep of the integrator (see footnote 3). This captures the experimental results, although the experimental and simulated distributions do not overlap completely due to uncertainty of the experimental parameters. (b) The simulated signal can be used to calculate the distribution q (using the same $\omega_L = 450$ kHz, $T_2 = 2$ μs, and a threshold of 5σ), which can be used to estimate the entropy production of the experimental system.

Table 1
Comparison of SP800-90B testing for protocols 1 and 2 (no extraction). From protocol 1 we see that the non-IID entropy is underestimated if the IID assumption is true. On the contrary, we can see from protocol 2 that the min-entropy is greatly overestimated in IID testing if the IID claim is false; non-IID testing has to be done to get an accurate initial min-entropy claim before extraction. Protocol 1 entropy is estimated from one sample of size $0.4 \times 10^6$ bits, whereas for protocol 2, 10 samples of $1.4 \times 10^6$ bits were used.

Table 2
The result of compressing a 1.41 Mbit/s bitstream with differently sized Toeplitz extractors T with compression ratios CR. The bitstream was generated using protocol 2 with L = 6 and with the first LSB removed, which gives a biased output with correlations present on the order of L bits. The min-entropy per bit was determined after extraction by splitting the bitstream into 10 equally sized samples of approximately $10^6$ bits and then using either non-IID or IID tests, depending on whether the IID assumption holds for the given bitstream.
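For readers unfamiliar with Toeplitz extraction, the sketch below shows the basic operation: a block of raw bits is multiplied over GF(2) by a binary Toeplitz matrix built from a seed. The function name, the seed indexing convention, and the dense matrix construction are assumptions made for clarity. The extractor sizes and seeds used for this table are not given in this section, and a practical implementation would use an FFT-based or streaming product for large blocks.

```python
import numpy as np

def toeplitz_extract(raw_bits, seed_bits, n_out):
    """Compress raw_bits to n_out bits with a seeded binary Toeplitz matrix."""
    raw = np.asarray(raw_bits, dtype=np.int64)
    seed = np.asarray(seed_bits, dtype=np.int64)
    n_in = raw.size
    if seed.size != n_in + n_out - 1:
        raise ValueError("seed must contain n_in + n_out - 1 bits")
    # T[i, j] depends only on i - j, so entries are constant along each diagonal.
    index = np.arange(n_out)[:, None] - np.arange(n_in)[None, :] + n_in - 1
    toeplitz = seed[index]
    # Matrix-vector product over GF(2).
    return (toeplitz @ raw) % 2

# Usage: compress 3 raw bits to 2 output bits with a 4-bit seed.
example_out = toeplitz_extract([1, 1, 0], seed_bits=[1, 0, 1, 1], n_out=2)
```

The seed must come from an independent source of randomness. The ratio of input to output block length is chosen according to the min-entropy estimate of the raw stream, so a more correlated or biased input needs stronger compression.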