Improving randomness characterization through Bayesian model selection

Random number generation plays an essential role in technology, with important applications ranging from cryptography to Monte Carlo methods and other probabilistic algorithms. All such applications require high-quality sources of random numbers, yet effective methods for assessing whether a source produces truly random sequences are still missing. Current methods either do not rely on a formal description of randomness (the NIST test suite), or are inapplicable in principle (the characterization derived from the Algorithmic Theory of Information), since they require testing all the possible computer programs that could produce the sequence under analysis. Here we present a rigorous method, based on Bayesian model selection, that overcomes these problems. We derive analytic expressions for a model's likelihood, which is then used to compute its posterior distribution. Our method proves more rigorous than NIST's suite and the Borel-normality criterion, and its implementation is straightforward. We applied it to an experimental device based on the process of spontaneous parametric down-conversion to confirm that it behaves as a genuine quantum random number generator. Because our approach relies on Bayesian inference, our scheme transcends individual-sequence analysis, leading to a characterization of the source itself.

Random numbers have acquired an essential role in our daily lives because of our close relationship with communication devices and technology. There are also numerous scientific techniques and applications that rely fundamentally on our ability to generate such numbers, and typically pseudo-random number generators (pRNGs) suffice for those purposes. A new alternative has been proposed that exploits the inherently probabilistic nature of quantum mechanical systems. These quantum random number generators (QRNGs) are in principle superior to their classical counterparts, and recent experiments have shown 4 that they can reach the same quality as commercial pRNGs. However, the natural question of how to assess whether a sequence is truly random is not yet fully settled. Pragmatically, the NIST test suite 1 has become the standard method for analysing sequences coming from an RNG. The suite is based on testing certain features of random sequences that are hard to reproduce algorithmically, such as the power spectrum, the longest run of consecutive 1's, and so on. Even though it constitutes an easily applicable procedure, recent findings show that its reliance on P-values is a drawback 5,6, while its lack of formality is a major disadvantage. On the other hand, although no definition of randomness is deemed absolute, a rigorous characterization is provided by the Algorithmic Theory of Information (ATI), but it is unfortunately inapplicable in real cases 2. An alternative that overcomes both the formal and the applicability issues is the Borel-normality (BN) criterion 3. Intuitively, this approach works by successively compressing a given dataset, e.g. ŝ = {0101010010101010101011010 • • •} of M bits, by taking strings of β consecutive bits and computing the frequency of occurrence γ_i^(β) of each of the i = 0, 1, . . ., 2^β − 1 possible strings. For example, β = 1 corresponds to looking for the frequencies of the strings {0, 1} in the dataset ŝ, while β = 2 corresponds to analysing the frequencies of the strings {00, 01, 10, 11}, and so on. The whole sequence is said to be Borel-normal if each frequency is bounded individually according to |γ_i^(β) − 2^(−β)| < √(log_2(M)/M), Eq. (1), with β an integer ranging from 1 to β_max = ⌊log_2 log_2 M⌋. It is important to mention that the BN criterion is a (nearly) necessary condition for a sequence to be considered random 2. Note that this test is restricted to single-sequence classification, so it cannot determine the random character of the generating source.
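The compression-and-counting procedure can be sketched in a few lines; the exact BN bound is not displayed in this extraction, so the form |γ_i^(β) − 2^(−β)| < √(log_2(M)/M) used below is our assumption based on the criterion's usual statement, and the function name is illustrative.

```python
import math
from collections import Counter

def borel_normal(bits: str) -> bool:
    """Check the assumed BN bounds |gamma_i - 2^-beta| < sqrt(log2(M)/M)
    for every beta = 1 .. floor(log2 log2 M)."""
    M = len(bits)
    bound = math.sqrt(math.log2(M) / M)
    beta_max = int(math.log2(math.log2(M)))
    for beta in range(1, beta_max + 1):
        # compress: non-overlapping strings of beta consecutive bits
        chunks = [bits[k:k + beta] for k in range(0, M - beta + 1, beta)]
        counts = Counter(chunks)
        for i in range(2 ** beta):
            gamma = counts[format(i, f"0{beta}b")] / len(chunks)
            if abs(gamma - 2.0 ** -beta) >= bound:
                return False
    return True

print(borel_normal("01" * 32768))   # periodic sequence: fails at beta = 2
```

Note that the strictly alternating sequence passes at β = 1 (perfectly balanced bits) but fails as soon as pairs are counted, which is exactly the point of scanning several values of β.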
In the present work we show that randomness characterization can also be addressed using a Bayesian inference approach for model selection 7, borrowing the compression scheme of BN. For simplicity, for a fixed β we denote each string by its decimal-base representation, so that the set of possible strings is Ξ_β = {0, 1, . . ., 2^β − 1}. The first step consists in identifying the models which could have generated a compressed dataset ŝ. For instance, if β = 1 we can describe it as M realizations of a Bernoulli process, leading to two possible models: with and without bias. Similarly, for β = 2, a model represents a way of constructing ŝ with bias in some of the 2^2 possible strings. A simple combinatorial counting reveals that all the possible bias assignations correspond to all partitions of the four strings of Ξ_2.
Thus, in general, given the set Ξ_β, let P_{Ξ_β} denote the family of its possible partitions 8; its cardinality is the Bell number B_{2^β}, the sum over K of the Stirling numbers of the second kind S(2^β, K), which count the different ways of grouping 2^β elements into K subsets. Formally, α_ℓ^(K) = {ω^(1), . . ., ω^(K)} ∈ P_{Ξ_β} refers to the ℓ-th partition into K subsets, but for notational simplicity we will omit the index ℓ henceforth. To each partition α^(K) there corresponds a unique model M_{α^(K)} which assigns a probability p_j to string j ∈ Ξ_β according to the rule p_j = θ_r/|ω^(r)| for j ∈ ω^(r), with Σ_r θ_r = 1. This means that all strings contained in a given subset ω^(r) are deemed equiprobable within the specified model. Thus, keeping β fixed, the likelihood of observing the given dataset ŝ in a model M_{α^(K)} is P(ŝ|M_{α^(K)}, {θ_r}) = Π_{r=1}^{K} (θ_r/|ω^(r)|)^{k_{ω^(r)}}, where k_j is the frequency of string j ∈ Ξ_β and k_{ω^(r)} = Σ_{j∈ω^(r)} k_j is the aggregate frequency of the strings in the subset ω^(r). (For further use, we also introduce the relative aggregate frequencies γ_{ω^(r)} = (β/M) k_{ω^(r)}.) From this perspective, only the model that is symmetric under any reordering of the possible strings is identified with a completely random source, because any other model entails bias assignations according to the strings' grouping represented by the corresponding partition. This symmetry only exists when the partition is the set Ξ_β itself, hence we denote M_{α^(1)} = M_sym.
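The model space for fixed β can be enumerated directly as the set partitions of Ξ_β; the recursive generator below is a textbook construction (our illustration, not the authors' code), and its counts reproduce the Bell numbers quoted above.

```python
def set_partitions(elements):
    """Yield every partition of `elements` as a list of subsets (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        # put `first` into each existing subset in turn ...
        for k in range(len(partial)):
            yield partial[:k] + [[first] + partial[k]] + partial[k + 1:]
        # ... or into a subset of its own
        yield [[first]] + partial

beta = 2
xi = list(range(2 ** beta))          # Xi_beta = {0, ..., 2^beta - 1}
models = list(set_partitions(xi))
print(len(models))                   # Bell number B_4 = 15
```

Each partition in `models` corresponds to one model M_α^(K), with K the number of subsets; the single-subset partition is M_sym.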
Consider now that when characterising randomness the only essential feature is whether a bias for or against some strings is present; the degree of bias is irrelevant. We can eliminate the dependence on the bias parameters by multiplying the likelihood by a prior for {θ_r}_{r=1}^{K} and integrating, which yields the so-called evidence P(ŝ|M_{α^(K)}) for a given model 9. Following 10, we use the Jeffreys prior, for it yields a model's probability distribution invariant under reparametrization and provides a measure of a model's complexity, thus giving a mathematical representation of Occam's razor principle [10][11][12]. After integrating over the parameter space, we arrive at the evidence formula of Eq. (4) (see Supplementary Information (SI), Sec. 2). Eq. (4) is our main result, for it lets us perform the model selection straightforwardly. For M_sym there are no free parameters and the evidence is fairly intuitive: P(ŝ|M_sym) = 2^(−M), the probability of any particular M-bit sequence under uniform string probabilities. Finally, we want to infer the model that best describes our source after a dataset ŝ is given.
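Since Eq. (4) itself is not reproduced in this extraction, the sketch below assumes the standard Dirichlet-multinomial evidence obtained from a Jeffreys (Dirichlet(1/2)) prior over the subset weights {θ_r}, with strings equiprobable within each subset; treat it as our reconstruction rather than the authors' exact expression.

```python
import math
from math import lgamma

def log_evidence(partition, counts):
    """log P(s_hat | M_alpha): `partition` is a list of subsets of string
    indices, `counts` maps string index -> observed frequency k_j.
    Assumes a Dirichlet(1/2) (Jeffreys) prior over the subset weights."""
    K = len(partition)
    N = sum(counts.values())                 # total strings, N = M / beta
    log_ev = lgamma(K / 2) - K * lgamma(0.5) - lgamma(N + K / 2)
    for subset in partition:
        k_r = sum(counts.get(j, 0) for j in subset)
        log_ev += lgamma(k_r + 0.5)
        # equiprobable strings inside the subset: factor |omega_r|^(-k_r)
        log_ev -= k_r * math.log(len(subset))
    return log_ev

# beta = 1: counts of 0s and 1s in a mildly biased 10^4-string sample
counts = {0: 5200, 1: 4800}
m_sym = [[0, 1]]           # completely random model
m_bias = [[0], [1]]        # biased Bernoulli model
print(log_evidence(m_sym, counts) - log_evidence(m_bias, counts))
```

With these counts the difference is negative, so the biased model wins; with perfectly balanced counts M_sym wins despite the equal maximum likelihoods, illustrating the Occam penalty carried by the extra parameter.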
Using Bayes' theorem, the posterior distribution reads P(M_{α^(K)}|ŝ) = P(ŝ|M_{α^(K)}) P_0(M_{α^(K)}) / Σ_γ P(ŝ|M_γ) P_0(M_γ), Eq. (6). Henceforth we will consider a uniform prior over models (which is justified in the SI), so a model's posterior is simply proportional to its evidence.
Suppose now we want to assess whether a source can be considered truly random. This is performed in two steps. First, we need a model-ranking procedure based on the posterior distribution. Second, we must quantify the goodness of our choice of model.
As a decision rule for the ranking process we adopt the Bayes factor 13, BF_{α,α′} = P(ŝ|M_α)/P(ŝ|M_{α′}), which under a uniform model prior equals the posterior ratio. Thus, we will choose M_α over M_{α′} whenever BF_{α,α′} > 1. It has been shown that BF_{α,α′} provides a measure of goodness of fit and that lim_{M→∞} BF_{α,α′} = ∞ if M_α is the true model 14.
To implement the second step, which is nothing more than a hypothesis-testing problem, we have two alternatives: either we check whether log_10 BF_{α,α′} ≥ 2, which is considered decisive evidence in favour of model M_α 13, or we compute the ratio between the posterior and the prior of a given model to assess how certain the posterior has become given the information provided by the dataset.
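The decisiveness threshold can be sketched with hypothetical numbers; the log_10 BF ≥ 2 rule follows the scale cited in the text, while the input log-evidences below are purely illustrative.

```python
import math

def decisive(log_ev_a: float, log_ev_b: float) -> bool:
    """Decisive evidence for model A over model B when log10 BF >= 2."""
    log10_bf = (log_ev_a - log_ev_b) / math.log(10)
    return log10_bf >= 2.0

# hypothetical natural-log evidences for two competing models
print(decisive(-10.0, -20.0))   # log10 BF ~ 4.3  -> True
print(decisive(-10.0, -12.0))   # log10 BF ~ 0.87 -> False
```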
From a computational point of view, notice that evaluating the posterior requires being able to compute the normalization factor Σ_γ P(ŝ|M_γ)P_0(M_γ) that appears in Eq. (6). When the number of models is very large, we can choose either to work with a subspace of models or to use the logarithm of the Bayes factor, in which case the normalisation factor cancels out.
It is clear that a full test of randomness requires different values of β to be used on the same dataset, while the strings should be short enough that the M bits allow each of the possible models to be sampled at least once. Thus, heuristically, B_{2^β_max} ∼ M, whence we recover the BN limit 3, β_max ∼ log_2 log_2 M, after using an asymptotic expansion for the Bell number.
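The heuristic B_{2^β_max} ∼ M can be checked numerically; the Bell-number recurrence below is the standard one, and M matches the experiment's 4 × 10^9 bits mentioned later.

```python
import math

def bell(n: int) -> int:
    """Bell numbers via the recurrence B_{m+1} = sum_k C(m, k) B_k."""
    B = [1]                              # B_0 = 1
    for m in range(n):
        B.append(sum(math.comb(m, k) * B[k] for k in range(m + 1)))
    return B[n]

M = 4_000_000_000                        # ~4e9 bits, as in the experiment
beta_max = int(math.log2(math.log2(M)))
print(beta_max, bell(2 ** beta_max))     # 4 10480142147
```

So for this dataset β_max = 4 and B_16 ≈ 1.05 × 10^10, of the same order as M, consistent with the heuristic.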
Note that by fixing β we have the set of parameters ({γ_j}_{j=0}^{2^β−1}, M), whose space can be divided into regions identifying the likeliest model according to Eq. (4). Illustrative cases are shown in Fig. 1, where the orange-filled area delimits the parameter values that render M_sym the likeliest model.
The top panel includes the bounds given by the BN criterion (green curves), Eq. (1), and shows that for any sequence length M our method allows for considerably smaller variations of γ_0. This is a significant improvement, since only necessary criteria exist for testing randomness.
The lower panel depicts the analogous regions for β = 2, for which there are fifteen models (listed in the SI) and we have fixed two frequencies: γ_1 = 1/6 and γ_2 = 1/4. The complete distribution of models can be deduced from the structure of this graph by distinguishing, a posteriori, the equiprobable strings for which the corresponding model is the likeliest. Thus more information than a complete-randomness classification can be readily obtained from our method.
Also in Fig. 1, the red curves of the β = 1 case are bounds obtained by comparing the likelihood of M_sym with models involving partitions into K = 2 subsets. Agreement with the region boundary is excellent. Our choice of K = 2 is justified because we would expect the models corresponding to partitions into two subsets to be the closest ones to M_sym. An explicit expression for these bounds is derived in SI, Sec. 3, and Extended Data Figures 2 and 3 show that they also bound considerably well the region in which M_sym is the likeliest for β = 2.
For further benchmarking, we have compared our method against the NIST test suite 1. The result is depicted in Fig. 2 as a function of the sequence length M and the bias b employed to generate a 0. The upper panel of Fig. 2 shows the average number of tests passed when employing the NIST suite, while the lower one shows the frequency with which M_sym is the likeliest, for β = 1, 2 and 3. We believe that our technique can contribute to testing the quality of RNGs in a more stringent form, since by applying a single test thrice (once for each value of β) we determined more precisely the random character of the sequences.

As an application, we have tested our method on a bit sequence obtained experimentally from the differences in detection times in the process of spontaneous parametric down-conversion (SPDC). Sequences generated via an SPDC photon-pair source have been shown to fulfil with ease the BN criterion and to pass comfortably the NIST suite 4. In the SPDC process, a laser pump beam illuminates a crystal with a χ(2) nonlinearity, leading to the annihilation of pump photons and the emission of photon pairs, typically referred to as signal and idler 15. Our experimental setup is shown in Extended Data Figure 1, and we explain how to construct a 0 or 1 symbol from the detection signals in Section 1 of the SI. We generated a sequence of 4 × 10^9 bits, so we used all the possible models in the comparison, except that, for computational ease, when β = 4 we restricted the model space to the 32,768 models corresponding to partitions into K = 1 and K = 2 subsets (consider that B_{2^4} ≈ 10^10). Our inference showed that M_sym was the likeliest model for every value of β.
As explained above, to achieve a full characterization of our QRNG as a random source, we need to go beyond the model ranking based on the Bayes factor and measure our certainty that M_sym is the true model governing the source. This (un)certainty quantification is the hallmark of Bayesian statistics, since P(M_sym|ŝ) represents the probability that modelling our QRNG as a random source is correct. Computing this posterior distribution directly from Bayes' theorem, Eq. (6), we arrive at the values shown in Table 1 for each β. The first three values are at least 0.95, but the one corresponding to β = 4 is about 0.32, considerably smaller. However, this represents an improvement of order 10^4 when compared with the initial value of the prior, P_0(M_sym) = 1/32,768 ≈ 3.1 × 10^−5. Alternatively, we computed log_10 BF_sym,α for each value of β. The values reported in Table 1 correspond to the comparison of M_sym with the second-likeliest model, hence the inequality for β > 2. These two criteria combined lead us to conclude that there is decisive evidence for our hypothesis that M_sym is the underlying model driving our source, thus verifying that the photonic RNG is strictly random in the sense described in this article. From a more general perspective, we propose that P(M_{α^(K)}|ŝ) quantifies our certainty in the hypothesis that a sequence ŝ was generated using the biases on strings associated with α^(K).
Because Bayesian methods entail a model's generalizability 9,10, the likeliest model provides a characterization of the source of ŝ. All partitions can be identified with standard computational packages, although this can be computationally demanding for sequences of ∼ 10^10 bits. In any case, once a partition is given, its model's likelihood is easily found using Eq. (4). A simplified analysis can be performed with the BN-type bounds given in Section 3 of the SI, which also leads to more stringent criteria than other approaches.
We have used a pump beam from a diode laser (DL407) centred at 407 nm with ∼ 60 mW power and, as the nonlinear medium, a β-barium borate (BBO) crystal of 1 mm length; see Extended Data Figure 1. The BBO crystal, which is negative uniaxial, was cut so that the angle subtended by the optic axis with respect to the pump beam axis is θ_pm = 29.2°, which yields phase matching for the generation of frequency-degenerate, non-collinear photon pairs. Signal and idler photons are emitted on diametrically opposed portions of an emission cone centred on the pump beam axis, with a 3.6° half-opening angle. Pump photons are suppressed by transmitting the signal and idler modes through a long-pass filter which transmits wavelengths λ > 488 nm (F1), followed by a bandpass filter centred at 800 nm with a 40 nm bandwidth (F2).
A half-wave plate (HWP2) and a polarising beam splitter (PBS) are placed in the signal arm so that the signal photon is transmitted or reflected with 50/50 probability. Each of the idler, reflected-signal and transmitted-signal collection modes is defined by an f = 8 mm focal-length aspheric lens (L1, L2 and L3) which focuses the incoming light into the core of a multi-mode fibre with a 50 µm diameter (MMF1, MMF2 and MMF3). The plane defined by the collection fibres is chosen, for convenience, to be parallel to the optical table. By monitoring coincidences between the reflected-signal and idler modes, on the one hand, and between the transmitted-signal and idler modes, on the other, we are able to probabilistically exclude double (and multiple) pair events.
Each of the three photon-collection fibres leads to a silicon-based avalanche photodiode (APD1, APD2 and APD3), which emits an electronic TTL pulse for each detection event. The times of arrival of these pulses are monitored with a time-to-digital converter (TDC; id800 from IDQuantique), or time-tagger, with a resolution of 81 ps. The TDC produces three time series containing the time-of-arrival data for the idler (i_n), transmitted-signal (s^t_n) and reflected-signal (s^r_n) channels. By post-processing we generate the two time series defined as c^t_n = s^t_n × i_n and c^r_n = s^r_n × i_n, corresponding to those bins for which there are coincident detection events between the (transmitted or reflected) signal and idler channels. A sequence of bits is generated by comparing the detection-time differences with a fully regular time series with the same number of events per second. A value of 1 is assigned if the time of detection is smaller than the corresponding time in the regular time series, and a value of 0 otherwise 4.
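The bit-assignment rule just described can be sketched as follows; comparing each coincidence time against an evenly spaced reference grid is our reading of the construction, and the function name and toy data are illustrative rather than the authors' acquisition code.

```python
def times_to_bits(times, duration):
    """Assign 1 if the k-th detection arrives before the k-th tick of a
    regular grid with the same number of events in `duration`, else 0."""
    n = len(times)
    regular = [(k + 1) * duration / n for k in range(n)]   # evenly spaced ticks
    return [1 if t < r else 0 for t, r in zip(sorted(times), regular)]

# toy example: 4 coincidence detections within a 1 s window
print(times_to_bits([0.30, 0.40, 0.80, 0.95], 1.0))   # [0, 1, 0, 1]
```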
We have checked the efficiency of our QRNG in our experimental setup. According to our data, the generation rate based on the SPDC is 240 kilocounts per second in each channel. If only those events in which the signal and the idler photon are detected in coincidence are registered, the rate of random number generation is reduced to 27 kilocounts per second. Moreover, our experimental setup is such that we are able to discriminate four-photon from two-photon events. This is achieved by noticing, first, that we have used a pump power such that the rate of four-photon generation is essentially negligible: less than 0.2% according to our data. Secondly, in one of the SPDC arms we have placed a beamsplitter so that, by discarding those events in which both APDs in that arm click, we can eliminate all the events in which detections occur in the same time bin in all three detectors.
Extended Data Figure 1: Experimental setup. A pump laser beam centred at 407 nm (DL407) is incident on a nonlinear BBO crystal. The signal and idler photons are emitted at diametrically opposed portions of an emission cone that yields phase matching for frequency-degenerate, non-collinear photon pairs. A half-wave plate (HWP2) and a polarising beam splitter (PBS) are placed in the signal portion of the cone so that this photon is transmitted or reflected with 50/50 probability. The reflected and transmitted signal photons and the idler photon are collected into multimode fibres that lead to avalanche photodiodes (APD1, APD2 and APD3), which emit a TTL pulse for each detection event.

2 Derivation of the Jeffreys prior and the model's evidence
The idea of the Jeffreys prior is to take into account model indistinguishability from the point of view of a statistical sample. Based on Sanov's theorem 17, we know that the volume of models which are indistinguishable is inversely proportional to the square root of the determinant of the Fisher information matrix. This idea of measuring relevant volumes across models, using a graining approach, has also been explored previously 10,11 in a rigorous geometric treatment. Note that in this case our parameters are the θ's, of which only the (say) first K − 1 are independent due to the normalization requirement. Then, considering a model M_{α^(K)} (again omitting the index ℓ of the partition, as in the main text), we write the minus log-likelihood for a string, from which we derive the Fisher information matrix J_ab for a, b = 1, . . ., K − 1. The proportionality constants cancel out once we normalize our expression for P_Jeff, which gives the Jeffreys prior, with the normalization factor obtained by integrating over the parameter simplex. Notice that in this case the Jeffreys prior always behaves as a proper one, that is, it is normalizable.
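The displayed equations of this derivation are missing from the extraction; under the parametrization p_j = θ_r/|ω^(r)| used in the main text, a plausible reconstruction of the prior is:

```latex
% Hedged reconstruction (the paper's own displayed equations are not
% reproduced in this extraction).  For a string j in subset \omega^{(r)},
%   -\log P(j \mid \{\theta_r\}) = -\log\bigl(\theta_r/|\omega^{(r)}|\bigr),
% and the Fisher matrix in the independent parameters is
%   J_{ab} = \delta_{ab}/\theta_a + 1/\theta_K, \quad a,b = 1,\dots,K-1,
% with \det J = \prod_{r=1}^{K}\theta_r^{-1}, so that
\[
  P_{\mathrm{Jeff}}(\{\theta_r\}) \;=\;
  \frac{\Gamma(K/2)}{\Gamma(1/2)^{K}}\,
  \prod_{r=1}^{K} \theta_r^{-1/2},
  \qquad \sum_{r=1}^{K} \theta_r = 1 ,
\]
```

i.e. a Dirichlet(1/2, . . ., 1/2) distribution, whose normalization Γ(1/2)^K/Γ(K/2) is indeed finite, consistent with the remark that the prior is proper.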
Finally, a similar integration shows that the model's evidence is given by Eq. (4) 9. This allows us to identify the terms involving the aggregate frequencies γ_{ω^(r)} as the maximum-likelihood estimators, and the ones involving the gamma functions as a measure of the relevant volume occupied in parameter space, related to the model's complexity 10.
3 Borel-normality-type (BN-type) bounds

Suppose we are interested in discerning whether a given sequence is completely random or not.
This means that we must look for the region in the parameter space ({γ_j}_{j∈Ξ_β}, M) in which the evidence of the symmetric model (corresponding to the partition of Ξ_β into one subset) is bigger than that of the rest of the models. As the empirical frequencies {γ_j}_{j∈Ξ_β} are grouped into K subsets for a given partition α^(K), the corresponding model has in effect K − 1 free parameters {γ_{ω^(r)}}_{r=2}^{K}. Recalling that we used the Bayes factor as a decision rule in the main text, we can explore the conditions under which M_sym is the likeliest through the behaviour of the log-likelihood ratio log[P(ŝ|M_sym)/P(ŝ|M_{α^(K)})].
To obtain a BN-type bound, we do the following: (i) look for the values {γ*_{ω^(r)}}_{r=2}^{K} which extremize the log-likelihood ratio; (ii) expand around those values up to second order. We eventually obtain the bound of Eq. (11), where the unknowns γ*_{ω^(r)} obey the set of equations (12), written in terms of ω^(1) and r = 2, . . ., K.
Here the function ψ_n(x) is the polygamma function of order n, with ψ(x) ≡ ψ_0(x). As the symmetric model is the one with no free parameters, one could reasonably assume that the models closest to M_sym are those with a single free parameter.
This, in turn, corresponds to subfamilies of partitions into two subsets of sizes {2^β − q, q} for q = 1, . . ., 2^β/2, which have aggregate frequencies 1 − γ_|q| and γ_|q|, respectively. This is also justified by the lower panel of Figure 1 in the main text, which shows that the transition from K = 1 to a larger value must necessarily pass through a region where a model with K = 2 is likelier than M_sym. Applying this to the set of Eqs. (11) and (12), we obtained the bound of Eq. (13), with the function W(γ*_|q|) defined in Eq. (14), where γ*_|q| is the aggregate frequency of a subset of size q satisfying the extremisation condition, for q = 1, . . ., 2^β/2.
In particular, for β = 1 there is only one model to compare with M_sym, which precisely corresponds to K = 2. Here the solution of (14) is exactly γ*_|1| = 1/2, which provides the bound we used to draw the red curves in the top panel of Figure 1 of the main text, together with the exact diagram. The agreement of this simple bound with the exact formulas is excellent, and rather different from that of BN. For the case β = 2, one must solve the set of equations numerically to evaluate the bounds. They work reasonably well in the parameter space, and much better than the BN bounds, as shown in Extended Data Figure 2. Notice that in these figures we depict only two regions of the parameter space: the orange one corresponds to the region in which the symmetric model is the likeliest, while the grey-filled area is that in which it is not.
These previous bounds have the disadvantage of requiring the system (14) to be solved numerically. However, looking at the set of Eqs. (12), we notice that there is a particular set of partitions for which the solution is particularly simple, namely when the system is solved using only equipartitions, that is, partitions into subsets of the same size. With this restriction it is possible to find simpler, less restrictive bounds, yet tighter than the ones derived from other methods. Suppose that we look at partitions into K subsets. Within this family (and of course for even K) we will have a subfamily of equipartitions. For them we have that |ω^(r)| = |ω^(1)| = 2^β/K, and therefore k_{ω^(r)} = M/(βK) and γ_{ω^(r)} = 1/K. In particular, for the model corresponding to a partition into K = 2^β subsets, the formula (11) becomes a bound which, unlike the one of Borel normality, couples all the empirical frequencies. Results of these broader bounds are plotted in Extended Data Figure 3.
4 Some examples for the evidence

In this section we illustrate, with some specific examples, the formula of Eq. (4) for the particular case β = 2. Because explicit reference to specific partitions is made, we will use the full notation α_ℓ^(K), although there is no natural order in which to assign the index ℓ. In this case we have the

Figure 1: Phase diagram of randomness characterisation. Division of the parameter space

Figure 2: Comparison with the NIST test suite. Comparison of the bias allowed on a given sequence

Extended Data Figure 2: BN-type bounds. Phase diagram of model selection for the 15 models for β = 2 and various fixed values of γ_1 and γ_2 (e.g. γ_1 = 1/6, γ_2 = 1/4). The orange-filled area represents the region in which model M_sym is the likeliest, while the grey-filled area represents the region in which any other model is the likeliest. Solid red lines represent the BN-type bounds of formula (13); we also compare with the BN bounds (green-filled region). Notice that for the second and third cases the BN bounds also provide a bound on M, given by the solutions of |1/4 − 1/5| = √(log_2(M)/M) and |1/4 − 1/6| = √(log_2(M)/M).

Extended Data Figure 3: BN-type bounds. Phase-type diagram of model selection for β = 2, with γ_1 = 1/6 and γ_2 = 1/4, and a comparison between the bounds given by the simple formula (13) (solid red line) and the Borel-normality bounds (solid green line).