Semi-device-independent random number generation with flexible assumptions

Our ability to trust that a random number is truly random is essential for fields as diverse as cryptography and fundamental tests of quantum mechanics. Existing solutions both come with drawbacks—device-independent quantum random number generators (QRNGs) are highly impractical and standard semi-device-independent QRNGs are limited to a specific physical implementation and level of trust. Here we propose a framework for semi-device-independent randomness certification, using a source of trusted vacuum in the form of a signal shutter. It employs a flexible set of assumptions and levels of trust, allowing it to be applied in a wide range of physical scenarios involving both quantum and classical entropy sources. We experimentally demonstrate our protocol with a photonic setup and generate secure random bits under three different assumptions with varying degrees of security and resulting data rates.


Introduction
Randomness is an important resource in modern information science. It has a great number of applications, including randomized sampling, simulations, randomized algorithms and, above all, cryptography. Many of these applications critically depend on the quality of random numbers, and therefore the design of high-quality random number generators (RNGs) is of utmost importance. There are many different sources of entropy that can be utilized for random number generator designs. These range from computer data that is simple to generate but hard to predict (such as the movement of a mouse cursor on a computer screen or the time between user keystrokes) to seemingly random physical phenomena (such as thermal noise or the breakdown in Zener diodes). In this regard, quantum mechanics offers the possibility of truly random events, such as nuclear decay or photons traveling through a semi-transparent mirror (see [1] for a review on quantum random number generators).
The quality of random number generators is traditionally assessed with the help of statistical tests, which can verify that the produced string is virtually indistinguishable from a truly random string. In essence, however, such an approach to analyzing random number generators is problematic, because the statistical tests do not assume anything about the origin of the data they test. As an example, take the binary expansion of the number e: although the string created in this manner would pass many of the conventionally used statistical tests, it is obviously not suitable for cryptographic purposes. This ignorance of the process used to generate the tested random string opens a window to various security risks. Aside from malicious attacks on the random number generator, such as inserting back-doors [2] or displaying a simple bias towards certain strings [3,4], its functioning can be compromised by a simple hardware malfunction, which is often hard to detect [5].
Considerations such as these have recently resulted in a different approach to random number generator designs based on quantum phenomena, where stronger forms of randomness certificates are possible [6]. Such quantum random number generators are called device-independent (DI-RNGs) [7][8][9], because they assume very little about the hardware they use. The security proof for these devices is usually based on Bell-type arguments: the random number generator is composed of several non-communicating parts and runs for a set of randomness-generation rounds, which involve a predetermined quantum measurement. In a small, randomly chosen fraction of the run-time, the device is tested. In these test rounds, the ability of the devices to violate Bell-type inequalities is verified [10,11]. The violation of local realism can be seen as a certificate that the devices use quantum measurements and that their outcomes are fundamentally unpredictable. Since Bell-type arguments do not assume anything about the devices used apart from space-like separation, this approach can truly be seen as device-independent. The disadvantage of DI-RNGs lies in their implementation: loophole-free Bell violations have been achieved only recently and under very strict laboratory conditions [12][13][14].
In this paper we present a new approach to semi-device-independent randomness certification that allows for flexible assumptions about the workings of an RNG. What sets our work apart from other semi-device-independent RNG (SDI-RNG) proposals is that our framework is formulated in a high-level abstract language of trusted randomness sources. This allows us to certify randomness in a large number of practical implementations utilizing both quantum and classical entropy sources.
Additionally, our framework can work with different levels of trust in particular parts of the RNG, without changing the protocol itself. This is in contrast to existing SDI-RNGs, where the protocol relies on a fixed set of assumptions about specific parts of the device. We showcase this flexibility using a photon source and a beam splitter as the source of entropy. Changing the assumptions on the photon source (whether it produces single photons, coherent/thermal states, or is an unknown source characterized only by its average photon production rate) is possible in our framework, at the cost of changes in the amount of certifiable entropy. Unlike previous SDI-RNG designs, our implementation therefore comes with a user-defined security/production rate trade-off.
The paper is organized as follows. In section 2 we introduce our general framework, which consists of three abstract models of entropy sources, and a general protocol to extract perfect randomness from them. We discuss methods to lower bound the entropy of strings obtained from our protocol in section 3. Section 4 is devoted to a particular experiment implementing the described entropy sources with the use of a photon source and a beam-splitter. Here we also discuss how different assumptions on the experimental setup change its description within our framework, which results in a trade-off between security and randomness production rate. Finally, in section 5 we experimentally implement the entropy source described in section 4 and post-process its outcomes with three different sets of assumptions, based on the amount of trust placed in the photon source.

General framework
In this section we introduce three different abstract models of randomness sources, with decreasing levels of trust, and describe a protocol that uses a trusted shutter to extract randomness from such sources.
Our basic assumption about the entropy source is that at regular time intervals, it produces a signal with probability p, and with probability 1 − p, no signal is produced. Such an assumption on the source is conceptually simple, very natural, and in fact many conventional entropy sources mentioned above, such as Geiger counters, thermal noise, or the breakdown in Zener diodes, can be modelled in this way. One might argue that such an assumption on the source is too strong, because if one also assumes perfect and trusted signal detectors, extracting randomness from such a source is trivial: click events can be interpreted as "1", and no-click events as "0". The entropy of such an output string is easily calculable and it can be post-processed into a perfectly random string. Indeed, early trusted commercial quantum random number generators can be described this way (e.g. IDQuantique [27], using a photon source and a beam splitter as an entropy source). The main result of this work is that the above assumption on the entropy source can be made sufficient even in the case of a partially untrusted measurement device.
In order to achieve this, we add an additional component to the setup: a movable shutter, which can block the signal being sent from the source to the measurement device (see Fig. 1). We call this scenario a simple scenario and the source of entropy a simple source.

Figure 1: The simple scenario consists of a simple entropy source, S, which emits a random signal with probability p. This signal is assumed to be unpredictable to any potential adversary. The signal can be blocked with the help of a movable shutter, A, controlled via a binary variable x. The measurement device, D, is assumed to be dishonest.
Taking the simple entropy source introduced above as a building block, we can generalize to a scenario referred to as a mixed source scenario, where the entropy source is a probabilistic mixture of multiple simple sources. Formally, we define a discrete (potentially infinite) probability distribution γ = {γ_i}, with γ_i ≥ 0 for all i and ∑_i γ_i = 1. We associate a simple source S_i with each γ_i.
In the mixed source scenario, the simple source S_i is chosen with probability γ_i and subsequently a signal is sent with probability p_i (see Fig. 2).

Figure 2: In the second scenario, the entropy source S (depicted by a dashed rectangle) is a probabilistic mixture of several (potentially infinitely many) simple sources S_1, ..., S_n. A random variable γ is used to choose a simple source S_i, which emits a random signal with probability p_i. The choice of source S_i in a given round is known to the measurement device D and the adversary, but not to the user. The random variable γ is constrained either by a fixed probability distribution, or in a more general scenario by a fixed expected value.
The value of the random variable γ in each round is assumed to be known to the adversary and the measurement device, but unknown to the user. In order to derive bounds on the entropy produced, the variable γ has to be at least partially characterized. This characterization takes the form of a (potentially infinite) sequence of constraints, {f_j(γ) = c_j}. The strongest such constraint set describes γ completely by specifying each γ_i. We study this special case separately; however, we show that the entropy of the RNG output can be lower bounded even for weaker characterizations of γ. In fact, this is possible already if the constraint set contains only a single (smooth) function (see section 3).
Before we describe our protocol for extracting perfect randomness from the sources described above, we list a number of required technical assumptions.
• The shutter (A) can be reliably controlled by the user through their inputs x.
• The user has access to a uniform random seed X uncorrelated to the devices. For example, this can be a private randomness source or a public randomness beacon [28].
• The entropy source (S) is a passive element which does not change from round to round.
• The measurement device (D) is memoryless, which together with the previous assumption implies that the rounds are independent and identically distributed.
• In the case of quantum entropy sources (S) in which the signal state is in a superposition with the no-signal state, the measurement device (D) is described by a projector onto a basis that contains the no-signal state (see section 4 for an example with coherent photon sources).
• There is no communication between the devices besides the signal channel, and the laboratory is shielded from external eavesdroppers. In particular, neither the measurement device nor the source receives direct information about the shutter settings x.
As in any cryptographic protocol, if any of these assumptions cannot be met, the security of the final string cannot be guaranteed. The assumptions imply that the measurement device is left mostly uncharacterized; in particular, it may still be classically correlated with an adversary.
Now we are ready to present the protocol, which consists of two parts: data collection and post-processing. For practical purposes, the protocol is run in large batches of N rounds. For the full description, see the box, Protocol 1.

Data Collection
(1) In each round i ∈ {1, ..., N}, decide at random whether the current round is a test round (Q_i = TEST) or a generation round (Q_i = GEN), with probabilities q and 1 − q, respectively.
(2) If Q_i = TEST, choose the shutter setting open or closed (x_i = 0 or x_i = 1), each with probability 1/2. Then record the setting and the measurement outcome (x_i, y_i), where y_i = 1 if the signal is detected and y_i = 0 otherwise.
(3) If Q_i = GEN, perform the measurement with the shutter open (x_i = 0) and record the measurement output y_i.

Post-Processing
(1) Use the rounds i with Q_i = TEST to estimate the test statistics S (see Eq. (2.1)).
(2) Estimate the min-entropy H_min(Y|E) of the random variable Y = {y_i | Q_i = GEN} (see section 3).
(3) Choose a security parameter ε and use a universal hash function on Y to obtain a string Z (see section 5).
Protocol 1: An outline of our randomness extraction protocol. For post-processing, note that the estimation depends on the assumptions made on the entropy source used. The output of the protocol, Z (of length H_min(Y|E)), is a random variable whose distribution deviates by at most ε in variational distance from a uniform random variable.

As is seen from the protocol, the user will use the testing rounds to obtain a statistical estimation of the workings of the device:
S = (P(click | x = 0), P(click | x = 1)). (2.1)
More precisely, the user will create a vector Ŝ as an estimate for S in Eq. (2.1), which will be filled out with observed experimental frequencies. This introduces an estimation error ε_e, which can be made arbitrarily small by increasing the number of rounds in a batch, N, and the testing rate q. To keep the main text easier to read, we assume that the experimentalist has access to the actual probabilities in Eq. (2.1), and elaborate on the sampling error in appendix E.
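The data-collection phase and the estimate Ŝ can be illustrated with a short simulation. This is only a sketch under stated assumptions: the detector is modelled as honest and ideal, the helper names `collect_rounds` and `estimate_statistics` are hypothetical, and the values q = 0.16 and p = 0.5 are chosen for illustration rather than taken from the experiment.

```python
import random

def collect_rounds(n_rounds, q, p, rng):
    """Simulate data collection with an honest, ideal detector (illustrative model).

    Returns a list of (is_test, x, y) tuples, where x = 1 means the shutter
    is closed and y = 1 means the detector clicked.
    """
    records = []
    for _ in range(n_rounds):
        is_test = rng.random() < q               # TEST round with probability q
        x = rng.randrange(2) if is_test else 0   # GEN rounds keep the shutter open
        signal = rng.random() < p                # simple source emits with probability p
        y = 1 if (signal and x == 0) else 0      # honest device: click iff a signal arrives
        records.append((is_test, x, y))
    return records

def estimate_statistics(records):
    """Observed click frequencies in TEST rounds for x = 0 and x = 1."""
    clicks = [0, 0]
    counts = [0, 0]
    for is_test, x, y in records:
        if is_test:
            counts[x] += 1
            clicks[x] += y
    if 0 in counts:
        raise ValueError("need test rounds with both shutter settings")
    return clicks[0] / counts[0], clicks[1] / counts[1]

rng = random.Random(2021)                        # fixed seed, for reproducibility only
records = collect_rounds(100_000, 0.16, 0.5, rng)
alpha_hat, beta_hat = estimate_statistics(records)
```

For an honest device the second component of the estimate is exactly zero, since a closed shutter never lets a signal through, while the first component fluctuates around p.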
In the following section we describe the post-processing procedure. The goal is to estimate the min-entropy H_min(Y|E) of the output string Y conditioned on the knowledge of the adversary E. The min-entropy H_min(Y|E) roughly describes the length of a perfectly random string obtainable from Y with the help of randomness extractors [29]. The lower bound on the min-entropy is obtained by upper bounding the probability of the adversary to guess the outcome of a single randomness generating round (shutter open), denoted g*. The obtained bound on g* depends on the type of the entropy source used: simple or mixed, with or without the full characterization of γ. Finally, the guessing probability is related to the min-entropy of the outcome Y as
H_min(Y|E) = −|Y| log_2 g*,
where |Y| is the number of randomness generating rounds.
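As an illustration of the extraction step, the sketch below hashes a raw bit string with a random binary Toeplitz matrix. Toeplitz hashing is one standard universal hash family and is used here as an assumption; the text only requires some universal hash function. The function name `toeplitz_hash` and the toy sizes (64 input bits, 16 output bits) are hypothetical, and the output length is left as a parameter to be set from the min-entropy estimate.

```python
import random

def toeplitz_hash(bits, seed_bits, out_len):
    """Multiply the input bits by an out_len x len(bits) Toeplitz matrix over GF(2).

    Entry (r, c) of the matrix is seed_bits[r - c + n - 1], so it depends
    only on r - c, which is the defining property of a Toeplitz matrix.
    """
    n = len(bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain len(bits) + out_len - 1 bits")
    out = []
    for r in range(out_len):
        acc = 0
        for c in range(n):
            acc ^= seed_bits[r - c + n - 1] & bits[c]
        out.append(acc)
    return out

rng = random.Random(7)                               # stand-in for the trusted uniform seed
raw = [rng.randrange(2) for _ in range(64)]          # toy raw string Y
seed = [rng.randrange(2) for _ in range(64 + 16 - 1)]
z = toeplitz_hash(raw, seed, 16)                     # toy output string Z
```

In a real run the seed would come from the uniform random seed assumed in the protocol, not from a pseudo-random generator.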

Entropy estimation
In this section we give a procedure to estimate the entropy of the data collected in the protocol described in the previous section. This is split into three parts based on the type of entropy source used.

Simple source entropy estimation
The simplest case uses a simple entropy source S which sends a signal with probability p. Based on the assumptions introduced earlier, the strategy of the measurement device to click (i.e. behave as if it detected the signal) in a given round can be based only on whether the signal arrived or not. There are only four possible deterministic measurement device strategies: "Never Click" (S_N), "Always Click" (S_Y), "Click Honestly" (S_H) and "Click Dishonestly" (S_¬H). We represent these strategies by the observable behaviours of the measurement device, which can be expressed as vectors (P(click | x = 0), P(click | x = 1)):
S_N = (0, 0), S_Y = (1, 1), S_H = (p, 0), S_¬H = (1 − p, 1). (3.1)
General non-deterministic strategies can be expressed as convex combinations of these four deterministic strategies, with a hidden variable λ that is shared between the measurement device and the adversary. We assume that the values (Y, N, H or ¬H) of the hidden variable λ are identically distributed throughout the rounds according to a probability distribution (λ_Y, λ_N, λ_H, λ_¬H), with λ_Y, λ_N, λ_H, λ_¬H ≥ 0 and λ_Y + λ_N + λ_H + λ_¬H = 1. The adversary tries to guess whether the measurement device clicked or not in each given round based on their knowledge of λ, and thus to guess the outcomes of the RNG.
Let us highlight the importance of the trusted movable shutter in our design. If the design did not contain it, then setting the random variable λ to be uniformly distributed over the strategies S_N and S_Y would lead to the output of the RNG being uniformly random as well. It would therefore pass any statistical test with high probability, even though the adversary would possess a perfect copy of it, rendering it useless for any cryptographic purpose.
In order to safely bound the amount of the entropy produced, the user must assume that any deviation from the idealized honest scenario S_H is correlated with information gained by the adversary. We measure the information gain by the adversary's optimal guessing probability, g*, which is related to the min-entropy via g* = 2^(−H_min(Y|E)). If in a given round the measurement device is following the strategies S_N or S_Y, then the adversary can be certain of the output, but if S_H or S_¬H is used, the guessing probability is reduced to g := max(1 − p, p).
Without loss of generality, let us assume that P(click | x = 0) > P(click | x = 1) (the other case can be treated similarly, due to the symmetry of the measurement device strategies). Then, S = (α, β) can be written as the convex combination (1/2)(2β, 2β) + (1/2)(2α − 2β, 0). Note that the (2β, 2β) part can be obtained by the measurement device by using only the strategies S_N or S_Y, i.e. without decreasing the adversary's guessing probability. On the other hand, the (2α − 2β, 0) part can be obtained by using strategy S_H only. In particular, this means that whenever our assumption holds, the adversary's optimal strategy is to set λ_¬H = 0, and their guessing probability can be obtained by solving the following optimization problem:
g* = max over λ_N, λ_Y, λ_H of (λ_N + λ_Y + g λ_H)
subject to λ_Y + p λ_H = α, λ_Y = β, λ_N + λ_Y + λ_H = 1, and λ_N, λ_Y, λ_H ≥ 0.
Since the first three constraints contain only three variables and take the form of equalities, we can directly solve them for λ_{N,Y,H} to obtain:
λ_Y = β, λ_H = (α − β)/p, λ_N = 1 − β − (α − β)/p.
In order to satisfy the last constraint, λ_{N,Y,H} ≥ 0, the following needs to hold:
0 ≤ β ≤ 1, 0 ≤ (α − β)/p ≤ 1, 0 ≤ 1 − β − (α − β)/p ≤ 1.
These six conditions are not independent, but they can be reduced to three conditions which are required for the existence of the solution (see appendix A for a geometric interpretation):
β ≥ 0, α ≥ β, α ≤ p + (1 − p)β. (3.3)
If these conditions are satisfied, the result of the optimization is
g* = 1 − (1 − g)(α − β)/p
and
H_min(Y|E) = −|Y| log_2 g*,
where |Y| is the size of the output string Y.
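The simple-source bound can be evaluated numerically along the following lines. This is a minimal sketch that assumes the adversary's optimum has λ_¬H = 0 and that the honest weight is λ_H = (α − β)/p, as follows from the constraints λ_Y + p λ_H = α and λ_Y = β; the function names and the tolerance parameter are hypothetical.

```python
import math

def simple_source_guessing_probability(alpha, beta, p, tol=1e-12):
    """Adversary's optimal single-round guessing probability for a simple source.

    alpha = P(click | shutter open), beta = P(click | shutter closed),
    p = signal probability of the source. Assumes alpha >= beta.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # Feasibility: beta >= 0, alpha >= beta, alpha <= p + (1 - p) * beta.
    if beta < -tol or alpha < beta - tol or alpha > p + (1.0 - p) * beta + tol:
        raise ValueError("statistics incompatible with a simple source")
    lam_h = (alpha - beta) / p          # weight of the honest strategy
    g = max(p, 1.0 - p)                 # guessing probability in an honest round
    return 1.0 - lam_h * (1.0 - g)

def min_entropy(g_star, n_gen):
    """Min-entropy (in bits) of n_gen i.i.d. generation rounds."""
    return -n_gen * math.log2(g_star)

honest = simple_source_guessing_probability(0.5, 0.0, 0.5)
attack = simple_source_guessing_probability(0.5, 0.5, 0.5)
```

The two calls contrast an honest device with the attack mixing "Never Click" and "Always Click" uniformly: the latter produces statistics (0.5, 0.5), so the closed-shutter test reveals it and no entropy is certified.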

Mixed source entropy estimation
Let us now turn to the more involved case of a probabilistic mixture of countably many simple sources. Recall that in this case the source S is a mixture of simple sources S_i, characterized by a known probability distribution γ. Since the measurement device knows which source S_i is being used in a given round, it can produce different statistics S_i for each source, and the overall observed statistics can be written as S = ∑_i γ_i S_i. Just like in the case of a single simple source, without loss of generality we assume that S = (α, β) satisfies α ≥ β. This assumption also implies (see appendix B) that in the optimal solution each S_i = (α_i, β_i) satisfies α_i ≥ β_i, as well as the full set of conditions in Eq. (3.3). Thus, for each source S_i the produced statistics can be written as S_i = λ_{i,Y} S_Y + λ_{i,N} S_N + λ_{i,H} S_H^i, with S_Y = (1, 1) and S_N = (0, 0) being the constant strategies and S_H^i = (p_i, 0) the honest strategy of the source S_i. Since each source S_i produces entropy according to g_i := max(p_i, 1 − p_i) and contributes to the overall guessing probability g* by g*_i = λ_{i,Y} + λ_{i,N} + λ_{i,H} ⋅ g_i weighted by γ_i, we have:
g* = ∑_i γ_i (λ_{i,Y} + λ_{i,N} + λ_{i,H} g_i).
Hence, the bound on the adversary's guessing probability is given by the solution to the following linear program:
g* = max over {λ} of ∑_i γ_i (λ_{i,Y} + λ_{i,N} + λ_{i,H} g_i)
subject to ∑_i γ_i (λ_{i,Y} + λ_{i,H} p_i) = α, ∑_i γ_i λ_{i,Y} = β, λ_{i,Y} + λ_{i,N} + λ_{i,H} = 1 for all i, and λ_{i,Y}, λ_{i,N}, λ_{i,H} ≥ 0 for all i. (3.6)
In order to formulate the solution to this optimization problem, let us introduce some notation. We start by dividing the set of all entropy sources S into two sets, S_+ and S_−. The source S_i belongs to S_+ if and only if p_i > 1/2, otherwise it belongs to S_−. Let us also define N_+ as the number of sources in the set S_+ (including the possibility that N_+ represents ∞). We will use positive integers i ≥ 1 to label the elements of S_+, and negative integers i ≤ −1 to label the elements of S_−. This allows us to define N_− = −|S_−|, where |S_−| is the cardinality of S_− (again, potentially infinite). Then, without loss of generality, we will use the ordering of the sources in the set S such that ∀i > j, p_i ≥ p_j. We use the convention that unless specified otherwise, ∑_i denotes the sum over all sources from S. Last but not least, note that we deliberately left out the index i = 0, as it is used in a formulation of the solution and its proof.
Using the above notation, the solution of the optimisation problem (3.6) reads (see appendix B for proof):
g* = 1 − ∑_{i=N+1}^{N_+} γ_i (1 − p_i) − ((1 − p_N)/p_N) (α − β − ∑_{i=N+1}^{N_+} γ_i p_i). (3.7)
Here, if ∑_{i∈S_+} γ_i p_i ≤ α − β, then N = 0 and p_N = 1/2, and otherwise N is defined to be the largest natural number such that
∑_{i=N}^{N_+} γ_i p_i ≥ α − β. (3.8)
Again, the guessing probability g* allows us to lower bound the min-entropy of the output string Y of length |Y| as
H_min(Y|E) ≥ −|Y| log_2 g*.
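For a finite (or truncated) list of sources, the structure of this solution can be sketched as a greedy allocation. The sketch assumes, as a reconstruction rather than the authors' stated algorithm, that the adversary assigns the honest strategy to sources in order of decreasing p_i until the honest rounds account for the click surplus α − β, since sources with larger p_i cost the adversary the least guessing probability per unit of surplus. The function name and tolerance are hypothetical.

```python
def mixed_source_guessing_probability(gammas, ps, alpha, beta, tol=1e-12):
    """Greedy evaluation of the guessing probability for a mixture of simple sources.

    gammas[i] is the probability of source i and ps[i] its signal probability;
    (alpha, beta) are the observed click probabilities with alpha >= beta.
    """
    delta = alpha - beta
    if beta < -tol or delta < -tol:
        raise ValueError("expected alpha >= beta >= 0")
    remaining = max(delta, 0.0)   # click surplus still to be explained honestly
    cost = 0.0                    # total reduction of the guessing probability
    honest_mass = 0.0             # probability mass of rounds played honestly
    ordered = sorted(zip(gammas, ps), key=lambda gp: gp[1], reverse=True)
    for gamma, p in ordered:
        if remaining <= tol or p <= 0.0:
            break
        take = min(gamma * p, remaining)
        cost += take * (1.0 - max(p, 1.0 - p)) / p
        honest_mass += take / p
        remaining -= take
    # The non-honest rounds must still be able to produce the closed-shutter clicks.
    if remaining > tol or 1.0 - honest_mass < beta - tol:
        raise ValueError("statistics cannot be produced by this source")
    return 1.0 - cost

single = mixed_source_guessing_probability([1.0], [0.5], 0.5, 0.0)
two_sources = mixed_source_guessing_probability([0.5, 0.5], [0.9, 0.6], 0.6, 0.0)
```

With a single source the function reduces to the simple-source bound. In the two-source call the source with p = 0.9 is used honestly in all of its rounds and the source with p = 0.6 in half of them, giving a guessing probability of about 0.85.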

Mixed source with partial information
In the more general case, the probability distribution γ, which chooses the simple source S_i to use in a given round, is not fully characterized, but is constrained by a set of functions, {f_j(γ) = c_j}. Formally, all the arguments from the previous case remain the same, except now the optimization needs to be done over the parameters γ_i as well. The maximization task can be stated as follows:
g* = max over {λ}, γ of ∑_i γ_i (λ_{i,Y} + λ_{i,N} + λ_{i,H} g_i)
subject to the constraints of Eq. (3.6), f_j(γ) = c_j for all j, ∑_i γ_i = 1, and γ_i ≥ 0 for all i. (3.9)
Since the functions f_j are in principle arbitrary, the constraints might not be linear anymore and thus it might not be possible to efficiently solve the problem, even numerically. However, if the functions f_j are smooth, for every fixed distribution γ, we are able to optimize over the variables {λ} according to the previous section. Therefore we can use the solution in Eq. (3.7) as the objective function, and the optimization problem becomes: maximize the right-hand side of Eq. (3.7) over γ, subject to f_j(γ) = c_j for all j, ∑_i γ_i = 1, and γ_i ≥ 0 for all i. Note that this is still not an easy optimization problem, because even if the functions f_j are smooth, a minor change in the distribution of γ_i might lead to a change in the starting point of the summation in Eq. (3.8), as N is implicitly dependent on γ_i via Eq. (3.8).
To address this problem, let us change the perspective on N. Instead of N being an implicitly defined value dependent on γ, we will interpret it as a free parameter. Additionally, it can be shown (see Appendix B) that the maximum is obtained when the condition in Eq. (3.8) for γ is satisfied with equality. In such a case the objective function can be written in a simpler form (see Eq. (B.36)) and the optimization problem becomes:
g* = max over γ of (1 − ∑_{i=N}^{N_+} γ_i (1 − p_i))
subject to ∑_{i=N}^{N_+} γ_i p_i = α − β, f_j(γ) = c_j for all j, ∑_i γ_i = 1, and γ_i ≥ 0 for all i. (3.11)
Note that if we fix the value of N, this maximization problem becomes much easier, because the target function is linear in γ. This yields a simple algorithm to find the solution of (3.11). One can simply solve the problem for each possible N ∈ {1, ..., N_+}, and take the overall maximum over the solutions as the final outcome.
This algorithm of course involves a potentially infinite number of optimization problems to solve, but for simple (e.g. linear) constraint functions f_j it can be shown that there is a threshold value N_max, such that it is not possible to satisfy both the conditions given by f_j and ∑_{i=N}^{N_+} γ_i p_i = α − β whenever N > N_max. Last but not least, note that if the functions f_j are linear, for each fixed value of N the optimization problem (3.11) is a linear program and thus can be solved efficiently. Additionally, in appendix C we show that in the case of a single linear constraint function f, feasible values of N are constrained to a small finite interval, which renders the optimization efficient. The solution to this optimization problem again yields the probability g* of the adversary to guess the outcome of a single generating round, which is related to the min-entropy of the output string Y as
H_min(Y|E) ≥ −|Y| log_2 g*.

Example: A photon through a beam-splitter
In this section we describe a simple optical setup for randomness generation and analyze it with the help of our framework. The entropy source S consists of a photon source PS emitting photons through a beam-splitter BS with reflection probability π. Transmitted photons are coupled to a photon detector D and their path can be blocked by a movable shutter A, which can be reliably controlled via a binary variable x. Reflected photons are discarded (see Fig. 3). We use this physical setup to showcase the flexibility of assumptions that our framework allows for. First of all, the model that describes the entropy source in this setup depends on the assumption we place on the photon source PS.

Figure 3: The photonic entropy source S (depicted by a dashed rectangle) consists of a photon source PS coupled to a beam-splitter BS with the probability of reflection π and the probability of transmission 1 − π. Transmitted photons are interpreted as a random signal emitted from the entropy source, while the reflected photons are discarded. In order to extract randomness from such a source, we use a mostly uncharacterized and untrusted measurement device and a trusted shutter controlled by a binary variable x. According to the assumptions we place on the photon source PS, this photonic setup is able to realize all three different general scenarios that we introduced in section 2.
Single photon. If the photon source PS produces a single photon on demand, the entropy source S is a simple entropy source with the probability p = 1 − π of sending a signal.
Known photon distribution. If the photon source PS produces i photons with known probability γ_i, the source S is a mixture of simple sources S_i with p_i = 1 − π^i and mixing probability γ = {γ_i}.
Known mean number of photons. If the photon source is characterized only by the mean photon number µ, the setup corresponds to a source S which is a mixture of simple sources S_i with p_i = 1 − π^i and the mixing probability γ is constrained by
∑_i i γ_i = µ.
While the single photon source case can be easily seen to be a simple source, the other two cases require further explanation. Assume that the source produces an n-photon event, where n ≥ 2. Since the number of photons transmitted through the beam-splitter can vary between 0 and n, the information available to the (photon-counting) measurement device is more complex than just binary information about receiving the signal or not. The response function of the measurement device is therefore potentially more complex than the four deterministic functions described in Eq. (3.1). In fact, there are 2^(n+1) different deterministic response functions assigning click/no-click measurement device events to the number of transmitted photons. In appendix D we show that in spite of this exponential increase, for each n there are only four response functions that yield the optimal guessing probability for the adversary. The first two are "Never Click" and "Always Click", which are fully deterministic and do not depend on the number of received photons. The third response function is labeled "Click Honestly". Using this response function, the measurement device clicks when a positive number of photons arrive and does not click when no photons arrive. The last response function is called "Click Dishonestly", and the measurement device clicks only when no photons arrive. These are exactly the four strategies that characterize a simple entropy source, since the measurement device decides only on the binary information whether it received a signal (i.e. a non-zero number of photons) or not. Therefore, the second case can be characterized as a mixed source with known mixing probability γ and the third case as a mixed source with mixing probability constrained by the mean photon number µ.
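The mapping from the photonic setup to a mixed source can be sketched as follows for a Poissonian photon-number distribution. The Poisson form of γ, the truncation at `n_max` photons and the function name are assumptions made for illustration; the numerical values µ = 1.06 and π = 1 − 0.5118 are taken from the experimental section and used only as an example. Here the index i is the photon number, so i = 0 is the vacuum component with p_0 = 0.

```python
import math

def poisson_photon_model(mu, pi_reflect, n_max=60):
    """Return (gammas, ps) for photon numbers 0..n_max.

    gammas[i] is the Poisson probability of emitting i photons with mean mu,
    ps[i] = 1 - pi_reflect**i is the probability that at least one of the
    i photons is transmitted through the beam-splitter.
    """
    gammas, ps = [], []
    gamma = math.exp(-mu)                 # probability of the vacuum component
    for i in range(n_max + 1):
        gammas.append(gamma)
        ps.append(1.0 - pi_reflect ** i)
        gamma *= mu / (i + 1)             # Poisson recurrence for the next term
    return gammas, ps

gammas, ps = poisson_photon_model(mu=1.06, pi_reflect=0.4882)
signal_probability = sum(g * p for g, p in zip(gammas, ps))
mean_photons = sum(i * g for i, g in enumerate(gammas))
```

For a Poissonian source the total signal probability has the closed form 1 − exp(−µ(1 − π)), which gives a quick consistency check on the truncation.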
Note that in the setup described above we do not assume anything about the coherence of the photon source PS. In fact, in order to be able to describe the strategies available to the measurement device as in the above paragraph, the setup needs to fulfill one out of two assumptions. Either PS produces states which are diagonal in the Fock basis (e.g. thermal states), or the measurement device is measuring in the Fock basis. In both cases, the mapping to the abstract mixed entropy sources is straightforward. Assuming that the state is diagonal in the Fock basis implies that the whole setup can be implemented with the use of classical sources of light. On the other hand, the Fock basis measurement assumption is very well motivated from the practical point of view and it allows us some leeway in the description of the PS. Namely, we do not require the source to produce a specific photonic state; it can be characterized solely by a probability distribution γ or its mean. Now we are ready to formulate the upper bounds on the guessing probability g* of a single generating round in the case of the known photon distribution and the known mean number of photons, which can be related to the min-entropy of the output string Y of the RNG protocol as
H_min(Y|E) = −|Y| log_2 g*.

Known distribution
According to the solution of the general case in Eq. (3 In this case, the optimal guessing probability is

Mean number of photons
In this section we assume that the photon source is characterized only by its mean photon number µ. This assumption requires us to solve an optimization problem of the form (3.11), where now the condition and In appendix C we show that the solution to this optimization problem contains only three non-zero probabilities γ_i: for each feasible value of N = i, i ∈ {1, ..., N_+}. After plugging these values of γ_0, γ_N, γ_{N+1} into the target function, we obtain the optimal guessing probability: The overall solution of (3.11) is max_N {g*_N}, where the maximization is done over all feasible values of N. In appendix C we show that in general there is only a finite number of feasible values of N, and therefore the maximum always exists. Although the number of these values can still be prohibitively large, in the analysis of the data obtained from the experiment we conducted (see section 5), only a single feasible value of N was encountered, making the analysis very efficient.

Other possible assumptions on the photonic setup
In order to emphasise the flexibility of our framework, in this subsection we discuss possible modifications of the optical setup described above. Notice that two probability distributions are characterized in the above setup. The first one is the photon number probability distribution γ and the second one is the beam-splitter reflection probability π, or, more generally, binary distributions with probability of success 1 − π^i associated with each photon number. The difference between them is that in our setup, the randomness resulting from the beam-splitter events is assumed to be private, unlike γ, which is available to the adversary. Essentially, our framework can be seen as a procedure to certify randomness originating from the trusted source (in this case the beam-splitter) in a noisy setup, where the noise is only partially characterized.
One can, however, assume that the photon emission is also a private random event characterized by γ. This is natural if the photon source PS is coherent, for example, since in this case it is impossible for the adversary to know the photon number in a given round before the measurement. In such a case, both the entropy originating from the beam-splitter and the entropy of the photon source can be combined into a simple source with the probability of signal p = ∑_i γ_i (1 − π^i). Or, even more interestingly, the beam-splitter can be left out from the setup altogether and, assuming Fock measurements, the setup can be analyzed as a simple source with signal probability 1 − γ_0. Note that a similar experiment was studied in two recent works [20,30], but was analyzed by different techniques.
In this section we implement the optical random number generation setup described in section 4. In the experimental implementation (see Fig. 4), the photon source PS is a source of weak coherent pulses, the shutter A(x) is implemented with an electro-optic modulator (EOM), and the detection D is performed by a superconducting nanowire single-photon detector (SNSPD) and a counting logic (time tagger) to extract the output bit string. In real-world network applications, clock synchronisation of optical pulse trains and detectors would be straightforward. However, for the purposes of this demonstration, all electrical signals are generated by a Swabian Instruments Pulse Streamer 8/2.
We drive a laser diode with 0.8 V analogue pulses of 8 ns duration (limited by the analogue output bandwidth) at 5 MHz (limited by the single-photon detection amplifier dead time of ∼150 ns) and attenuate the resulting weak coherent pulses to ∼1 photon per pulse. The pulses are then incident on a fiber beam-splitter with power transmission 0.5118 ± 0.0005. They are fast-switched via a fiber lithium niobate electro-optic intensity modulator driven by a digital output of the signal generator, which with probability $q = 8/100$ blocks the channel (i.e. TEST rounds with $x = 1$). A typical extinction ratio of 1/100 is observed, and a slight thermal drift is calibrated for every 100,000 rounds of the experiment. The pulses are then routed to the detectors through a channel with lumped efficiency from switch to detectors of 0.9339 ± 0.00005. Detection is made by superconducting nanowire single-photon detectors with efficiency 0.9231 ± 0.0007. A QuTools quTAG counting logic records time-tags from the detectors along with a clock signal from the signal generator, allowing coincidences for each pulse to be extracted using a 10 ns coincidence window, and the output bit string to be recovered.

Experiment and data analysis
Once the data is collected, we begin the post-processing on batches of size $N = 100{,}000$. With probability 8/92 we randomly select some of the non-blocked rounds to be TEST rounds with $x = 0$ (such that the expected number of test rounds with $x = 0$ is the same as the expected number of test rounds with $x = 1$). Given the large number of total test rounds (∼16,000), we use the Chernoff-Hoeffding bound to calculate the test statistic $\hat{S}$ (2.1) with sampling error $\varepsilon = 10^{-6}$. In particular, we give a conservative estimate of $S$, and the probability that either $\alpha$ or $\beta$ falls outside of the desired interval is $2\varepsilon$ (see appendix E for details). For each batch, we used $\hat{S}$ to calculate an upper bound on the adversary's guessing probability $g^*$. We have performed separate estimations for the three scenarios: (i) a single photon source, (ii) the photon number distributed according to a Poisson probability distribution with mean $\mu$, (iii) the photon number being $\mu$ on average. For cases (ii) and (iii), we used $\mu = 1.06$, since it is an upper bound on the observed average photon number per pulse and therefore yields the least amount of entropy.
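The round bookkeeping described above can be checked with a few lines of exact arithmetic. This is only an illustrative sketch: the function name `expected_round_counts` is ours, and it computes expected counts rather than simulating the random selection of rounds.

```python
from fractions import Fraction

def expected_round_counts(batch_size, q_block, q_test_open):
    """Expected number of rounds of each type in one batch.

    q_block:     probability that the shutter blocks a round (TEST, x = 1)
    q_test_open: probability that a non-blocked round is used as TEST, x = 0
    """
    blocked = batch_size * q_block            # TEST rounds with x = 1
    open_rounds = batch_size - blocked        # rounds with an open shutter
    test_open = open_rounds * q_test_open     # TEST rounds with x = 0
    generation = open_rounds - test_open      # rounds left for the raw string Y
    return blocked, test_open, generation

# Values quoted in the text: N = 100,000, q = 8/100, selection probability 8/92.
counts = expected_round_counts(100_000, Fraction(8, 100), Fraction(8, 92))
print([int(c) for c in counts])
```

Under these expectations the two kinds of test rounds are balanced and roughly 16,000 rounds per batch are spent on testing, which is consistent with the number quoted above and leaves slightly more than the 83,000 raw bits that are kept per batch.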
For simplicity, the length of the output string $Y$ was chopped to a constant size of $|Y| = 83{,}000$ per batch, which leads to a final lower bound on the entropy of the string $Y$, expressed as $H_{\min}(Y|E) \geq -83{,}000 \cdot \log_2(g^*)$. To extract the final random string $Z$ from $Y$ (see the protocol description in Fig. 3), a hashing function, in our case a random binary Toeplitz matrix, has been applied to $Y$. To keep the discussion clear, we shall focus on case (ii) of a known (Poisson) distribution. The remaining cases follow an analogous post-processing strategy. In order to reduce the amount of randomness needed, we only generated one Toeplitz matrix and re-used it for every batch. Indeed, the Leftover Hashing Lemma guarantees that when using a Toeplitz matrix of dimensions $|Y| \times (-|Y| \log_2(g^*) + 2 \log_2 \delta)$, the output string $Z$ is at most $\delta$-far in $\ell_1$-distance from being uniformly distributed [31]. We take $\delta = 2^{-100} \approx 10^{-31}$, which implies that we can re-use the Toeplitz matrix ∼$10^{20}$ times and still maintain an $\ell_1$-distance from the uniform distribution of no more than $10^{-10}$.
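To make the hashing step concrete, the following minimal sketch applies a binary Toeplitz matrix to a bit string over GF(2). It is not the implementation used in the experiment: the function name `toeplitz_hash`, the indexing convention for the seed, and the toy sizes are our assumptions, and the pseudo-random seed below only stands in for the truly random seed that the protocol requires.

```python
import random

def toeplitz_hash(y_bits, seed_bits, out_len):
    """Multiply y (length n) by an out_len x n binary Toeplitz matrix over GF(2).

    Entry (i, j) of the matrix is seed_bits[i - j + n - 1], so it is constant
    along every diagonal and the matrix is fixed by n + out_len - 1 seed bits.
    """
    n = len(y_bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    z = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & y_bits[j]
        z.append(acc)
    return z

rng = random.Random(0)   # illustration only; not a secure source of seed bits
n, m = 16, 4             # toy sizes; the experiment uses 83,000 input bits
seed = [rng.randrange(2) for _ in range(n + m - 1)]
y = [rng.randrange(2) for _ in range(n)]
z = toeplitz_hash(y, seed, m)
```

The double loop is quadratic in the string length; for strings of the size used in the experiment one would normally replace it with a vectorised or FFT-based product, which does not change the output.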
Note that the estimates for different batches of data differ. However, in order to re-apply the same hash function to each batch, it is important to guarantee that all batches have at least the same amount of min-entropy. If all batches were used, the total min-entropy would be bounded by the min-entropy of the worst batch, which can be rather low. Therefore it is advantageous to discard a (small) number of batches with low certified min-entropy, which increases the amount of min-entropy per batch, but decreases the number of batches. The experimental distribution of estimated min-entropies (per bit) can be seen in Fig. 5. One can optimize the cutoff threshold min-entropy for each case in order to extract the maximum amount of randomness.
In the known photon-number distribution scenario (ii), we chose a cutoff of 0.167 bits of entropy per physical bit, i.e. any batch whose estimated min-entropy was lower was simply discarded. Therefore, the Toeplitz matrix generated was of size 83,000 × 13,661. For demonstration purposes, we collected data from 1,000 batches, with each batch on average having an estimated 0.185 bits of entropy per physical bit. 97.5% of the batches were calculated to be above the set threshold, resulting in a total of 13.2 Mbits of extracted randomness. The results for the other cases can be found in Table 1.
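The size of the Toeplitz matrix and the trade-off behind the cutoff can be reproduced with a short calculation. The sketch below assumes that the output length per batch is the certified min-entropy at the cutoff minus the $2\log_2(1/\delta)$ penalty of the Leftover Hashing Lemma; the function names and the toy list of batch entropies are hypothetical.

```python
import math

def output_length(n_bits, h_cutoff, log2_inv_delta=100):
    """Extractable bits per batch: n * h - 2 * log2(1/delta), rounded down."""
    # The small offset guards against floating-point round-off just below an integer.
    return math.floor(n_bits * h_cutoff - 2 * log2_inv_delta + 1e-9)

def best_cutoff(batch_entropies, n_bits, log2_inv_delta=100):
    """Return (total extractable bits, cutoff) maximising the extracted randomness.

    Batches with estimated min-entropy per bit below the cutoff are discarded.
    Only observed values need to be tried as cutoffs.
    """
    best = (0, None)
    for h in sorted(set(batch_entropies)):
        kept = sum(1 for e in batch_entropies if e >= h)
        total = kept * max(output_length(n_bits, h, log2_inv_delta), 0)
        if total > best[0]:
            best = (total, h)
    return best

# Matrix width for the cutoff quoted in the text (0.167 bits per physical bit).
width = output_length(83_000, 0.167)

# Toy data (hypothetical), only to show the trade-off between cutoff and batch count.
toy = best_cutoff([0.10, 0.18, 0.19, 0.20], n_bits=1000, log2_inv_delta=10)
```

With the parameters from the text, `width` evaluates to the 13,661 columns quoted above. In the toy data, discarding the single worst batch gives more total randomness than keeping all four.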
Finally, we carried out the industry-standard NIST randomness tests using an improved implementation presented in [32]. As expected, the processed output performed well in all of these tests.

Table 1 (column headers: #, Assumptions, Cutoff, ...): The same raw data (1,000 batches consisting of 100,000 rounds each) was used in each case. For comparison, a hypothetically ideal experiment, with beam-splitter transmittance $p = 1/2$, single photon source PS, and perfect observed statistics $S = (1/2, 0)$, would produce 69.5 Mbits from the same data. This takes into account that the data still has to be tested, and the estimator $\hat{S}$ is always taken in a conservative way.

Conclusions
In this paper, we have presented a novel framework to design and analyze semi-device-independent random number generators. In contrast with previous approaches, our framework does not require any fixed assumptions to be made on the workings of an RNG, can be applied in a very broad family of physical implementations, and can cover very different levels of trust placed on different parts of the random number generator. The centerpiece of our approach consists of a shutter that can be trusted to block the transmitted signal. This, in connection with some limited trust in the source and/or measurement devices, proves to be enough to certify randomness. During the certification protocol, sample data is collected in order to characterize the behaviour of the measurement devices during both shutter settings (open or closed). This sample data is subsequently used to calculate the probability that the adversary, who is classically correlated with the measurement devices, is able to guess the outcome of the measurements with open shutter. This calculation involves solving a number of optimization problems expressed as linear programs. The exact formulation and number of these linear programs depends on the level of trust we place in the entropy source. Three different trust levels are possible: (i) a simple source emits a signal with probability $p$; or the entropy source is a mixture of multiple such simple sources governed by a probability distribution $\gamma$, which is either (ii) fully characterized; or (iii) partially characterized. The main benefit of our framework is that all three characterizations can be used with the same physical setup and can be seen as different levels of trust placed in the entropy source.
We showcase the applicability of the framework by implementing a random number generator using a weak coherent optical source and a beam-splitter. This implementation allowed us to demonstrate the novel property of our framework: flexibility in the assumptions made about specific parts of the device. We have analyzed data from a single experiment under three very different sets of assumptions on the source: true single photons, coherent states, and an unknown source characterized only by its average photon production rate. In all cases, we were able to extract high-quality random strings, but with significantly different rates. This is natural, as stronger assumptions on the source allow for better extraction rates at the cost of giving the adversary more possibilities to attack.
Our approach provides significant practical benefits for secure randomness generation. Using the same simple device, a user can make their own choice of the level of secrecy or production rate, just by choosing the appropriate post-processing strategy. Very interestingly, our results show that even for the most adversarial assumption on the source, i.e. trust only in the mean number of photons, rates of the same order of magnitude were achieved as with the rather strong assumption of a coherent source. The average number of photons produced by a source is testable in principle via its energy consumption, which provides a possible means to further strengthen the security of our framework. Our results pave the way towards practical and experimentally feasible semi-device-independent random number generators, which play a crucial role in the ongoing quantum information revolution.
optimal solution in the case of mixed entropy sources looks like, is that when $p \leq 1/2$, the optimal guessing probability $g^*$ in Eq. (A.1) does not depend on $p$ at all, as in this case $g = 1 - p$. Then we have $g^* = 1 - (\alpha - \beta)$, that is, the optimal guessing probability decreases at a rate proportional to the distance from the diagonal. In other words, an observed point $(\alpha, \beta)$ certifies the same amount of entropy for all simple sources with $p \leq 1/2$. On the other hand, if $p > 1/2$, the rate at which the guessing probability decreases with the distance from the $\alpha = \beta$ line depends on $p$ as $(1 - p)/p$, which is a decreasing function of $p$. That is, an observed point $(\alpha, \beta)$ certifies more entropy for sources with a smaller parameter $p$. Additionally, it is important to note that the conditions for the solution, $0 \leq \beta \leq \alpha \leq p + \beta(1 - p)$, correspond to $(\alpha, \beta)$ being in the blue polytope, below the $\alpha = \beta$ line (that is, the observed statistics is feasible and $\beta \leq \alpha$).
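The dependence of the guessing probability on $p$ for a simple source can be written down as a small function. This is a sketch under our reading of the paragraph above: for $p \leq 1/2$ the bound is $1 - (\alpha - \beta)$, and for $p > 1/2$ the distance from the diagonal is weighted by $(1 - p)/p$, which follows if the honest strategy is guessed with probability $\max(p, 1 - p)$. The function name and the tolerance are our choices.

```python
def guessing_probability_simple(alpha, beta, p, tol=1e-12):
    """Adversary's optimal guessing probability for a simple source.

    alpha, beta: click probabilities with open and closed shutter
    p:           signal probability of the simple source
    """
    # Feasibility region below the diagonal: 0 <= beta <= alpha <= p + beta * (1 - p).
    if not (-tol <= beta <= alpha + tol and alpha <= p + beta * (1 - p) + tol):
        raise ValueError("observed point is outside the feasible region")
    distance = alpha - beta
    if p <= 0.5:
        return 1 - distance                # independent of p
    return 1 - distance * (1 - p) / p      # smaller decrease for larger p

g_fair = guessing_probability_simple(0.5, 0.0, 0.5)     # ideal balanced source
g_biased = guessing_probability_simple(0.8, 0.0, 0.8)   # honest but biased source
```

Both examples lie on the honest-strategy vertex $(p, 0)$, where the function returns $\max(p, 1 - p)$, as one would expect for a device that simply reports whether a signal arrived.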
The geometric interpretation of a mixed source is slightly more involved. Feasible observed points $S = (\alpha, \beta) = \sum_i \gamma_i S_i$ are mixtures (according to a probability distribution $\gamma$) of feasible points $S_i$ of the simple sources in the mixture. The feasibility of $S_i$ means that it is constrained to its corresponding polytope defined by the strategies $S_{\neg H_i}$ and $S_{H_i}$ (see Fig. 7 for an example with a mixture of four simple sources). This implies that all feasible points $(\alpha, \beta)$ are constrained to a polytope defined by the deterministic strategies $S_N$, $S_Y$ and the weighted averages of the non-deterministic strategies (the cyan polytope in the left subfigure of Fig. 7). Every non-extremal point of the feasible polytope can be decomposed into convex combinations of statistics of the simple sources, $S_i$, in infinitely many ways. Let us now argue that the decompositions which lead to the highest guessing probability (and therefore are optimal) have a specific form. There are essentially three different cases.
Let us first deal with the easiest case, in which all the simple sources in the mixture belong to $S_-$, i.e. $p_i \leq 1/2$ for all $i$. As we have discussed above, in such a case the contribution of each simple source to the total guessing probability depends only on the distance of its point $S_i$ from the diagonal, and not on the value of $p_i$. Note that for any linear decomposition of $(\alpha, \beta)$ into $S_i = (\alpha_i, \beta_i)$ with weights $\gamma_i$, the weighted distance of the points $S_i$ from the diagonal is equal to $\sum_i \gamma_i (\alpha_i - \beta_i) = \alpha - \beta$. Therefore, each decomposition leads to the same guessing probability $1 - (\alpha - \beta)$.
Whenever the mixed source also contains sources with $p_i > 1/2$ (i.e. $S_+$ is non-empty), the problem becomes more interesting. Again, the weighted distance of all the points $S_i$ from the diagonal is constant; however, the guessing probability contribution of the sources in $S_+$ depends both on the distance from the diagonal and on their parameter $p_i$: the larger the parameter $p_i$, the smaller the contribution per distance from the diagonal. This simple observation can be used to describe the optimal decomposition of $(\alpha, \beta)$ into the points $S_i$. Sources in $S_+$, starting from the source with the highest $p_i$, must contribute to the distance of $(\alpha, \beta)$ from the diagonal as much as possible. Therefore, starting from the source with the highest $p_i$ and working downwards, we want to set as many $S_i = S_{H_i}$ as possible. This situation splits into two more cases. In the first one, we run out of sources in $S_+$ before reaching the desired distance. This means that for all sources in $S_+$ we have $S_i = S_{H_i}$, and the rest of the distance needs to be covered by sources in $S_-$. This can be done arbitrarily, as we have argued before. In the last case, the distance from the diagonal can be reached with the sources in $S_+$ only. This leads to a situation where for the $k < |S_+|$ sources with the highest $p_i$ we have that $S_i = S_{H_i}$, the point of the source with the $(k + 1)$-st largest value of $p_i$ in $S_+$ has a non-zero distance from the diagonal, and the rest of the points (of both $S_+$ and $S_-$) lie on the diagonal. An example of the last case is depicted in the rightmost subfigure of Fig. 7. Note that while this argument helps to build an intuition, in order to fully solve the problem one must calculate the exact contributions of the sources in $S_+$ to the distance of $(\alpha, \beta)$ from the diagonal (and thus their guessing probability contributions) for an arbitrary feasible point of an arbitrary mixed source $S$, in all three possible cases. This is formally done in appendix B.
Figure 7: Here we provide an example with a mixed source $S$ being a mixture of four sources. Note that simple sources with $p_i \leq 1/2$ have negative index and belong to $S_-$, and sources with $p_i > 1/2$ have positive index and belong to $S_+$. In the left figure we depict a (cyan) polytope to which all possible observed points $S$ are constrained. It can be seen as a weighted average of the polytopes constraining each simple source in the mixture. In the middle figure we show a feasible (sub-optimal) solution of the optimization problem to maximize the adversary's guessing probability. The observed point $S = (\alpha, \beta)$ is a weighted mixture of the points $S_{-2}, S_{-1}, S_1, S_2$, which are constrained to their corresponding polytopes defined by the strategies $S_{\neg H_i}$ and $S_{H_i}$. In the right figure we show an optimal solution with $S_{-1}$ and $S_{-2}$ on the zero-entropy diagonal (i.e. $\lambda_{-2,H} = \lambda_{-1,H} = 0$), $S_2 = S_{H_2}$ (i.e. $\lambda_{2,H} = 1$) and $S_1$ an intermediate point. Note that only $S_1$ and $S_2$ contribute to the distance of $(\alpha, \beta)$ from the diagonal.
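The greedy argument for mixed sources can be sketched as a short routine. This is an illustration of the intuition only, not the formal solution of appendix B: it assumes that a source $i$ with honest weight $\lambda_{i,H}$ covers a distance $\gamma_i \lambda_{i,H} p_i$ from the diagonal and lowers the guessing probability by $\gamma_i \lambda_{i,H} \min(p_i, 1 - p_i)$, and it does not check the remaining feasibility constraints on $\beta$. The function name is hypothetical.

```python
def guessing_probability_mixed(alpha, beta, gammas, ps, tol=1e-12):
    """Greedy bound on the guessing probability for a known mixture of simple sources.

    gammas: weights of the simple sources (summing to one)
    ps:     signal probabilities p_i of the simple sources
    """
    remaining = alpha - beta          # distance from the diagonal still to be covered
    if remaining < -tol:
        raise ValueError("expected alpha >= beta")
    g = 1.0
    # Sources with the largest p_i are cheapest per unit distance, so use them first.
    for gamma, p in sorted(zip(gammas, ps), key=lambda s: s[1], reverse=True):
        if remaining <= tol or p <= 0:
            break
        capacity = gamma * p                       # distance covered if lambda_H = 1
        used = min(capacity, remaining)
        cost = (1 - p) / p if p > 0.5 else 1.0     # decrease of g per unit distance
        g -= used * cost
        remaining -= used
    if remaining > tol:
        raise ValueError("observed point cannot be reached by this mixture")
    return g

# Two equally weighted sources, one in S_+ (p = 0.9) and one in S_- (p = 0.3).
g = guessing_probability_mixed(0.5, 0.0, [0.5, 0.5], [0.9, 0.3])
```

In this example the source with $p = 0.9$ is used fully and the remaining distance is covered by the source with $p = 0.3$, which corresponds to the first of the two cases described above for a non-empty $S_+$.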

B Known distribution analysis
Before solving the optimization problem (B.1), we argue that $\alpha \geq \beta$ implies that the optimal solution fulfills $\alpha_i \geq \beta_i$ for all partial solutions $S_i$ associated with the simple sources in the mixture. This fact in turn implies that in the formulation of the optimization problem we can disregard the strategies $\lambda_{i,\neg H}$.
We prove the above by contradiction: assume that $\alpha \geq \beta$, but in the optimal solution some of the boxes $S_\uparrow \subset S$ have $\alpha_i < \beta_i$. The observed statistics can now be written as a convex combination of the statistics of the boxes in $S_\uparrow$ and of the remaining boxes, which we denote by $S_\downarrow$. Geometrically (see appendix A) this means that the point $S$ lies between $S_\uparrow$ and $S_\downarrow$. This allows us to construct a new solution, in which both $S_\uparrow$ and $S_\downarrow$ are moved proportionally to $\gamma$ in the direction towards $S$. This decreases both $\alpha_\downarrow - \beta_\downarrow$ and $\beta_\uparrow - \alpha_\uparrow$. Since the guessing probability of both partial solutions $S_\downarrow$ and $S_\uparrow$ directly depends on the distance from the diagonal, and decreases with it, the new decomposition of $S$ has a higher guessing probability, which contradicts the optimality of the original solution.
In what follows, we show how to find the solution to the optimization problem (B.1), using the notation recalled in (B.2). We start by dividing the set of all sources of entropy $S$ into two sets, $S_+$ and $S_-$. The source $S_i$ belongs to $S_+$ if and only if $p_i > 1/2$, otherwise it belongs to $S_-$. We define $N_+$ as the number of sources in the set $S_+$ (including the possibility that $N_+$ represents $\infty$). We use positive integers $i \geq 1$ for indexing the elements of $S_+$, and negative integers $i \leq -1$ for indexing the elements of $S_-$. This allows us to define $N_- = -|S_-|$, where $|S_-|$ is the cardinality of $S_-$ (again, potentially infinite). Then, without loss of generality, we order the sources in the set $S$ such that $p_i \geq p_j$ for all $i > j$. We use the convention that, unless specified otherwise, $\sum_i$ denotes the sum over all sources from $S$. Last but not least, note that we deliberately left out the index $i = 0$, as it is used later in the proof.
In order to make the appendix self-contained, also recall that for the measured parameters to be physical, we require condition (B.3). Substituting the equality constraints of Eq. (B.1) together with Eq. (B.2), we can further simplify the optimization problem to the objective function (B.4) with the constraints (B.5) - (B.8). It is now easy to see that in order to find the maximum of (B.4), we need to set as many $\lambda_{i,H} = 1$ as possible, starting with the ones with the highest parameter $p_i$. Of course, this needs to be done with the constraints (B.5) - (B.8) in mind.
Let us start with the simpler of the two possibilities, which contains the first two forms of the optimal solution described in appendix A. If $\sum_{i \in S_+} \gamma_i p_i \leq \alpha - \beta$ (B.9), then we can set $\lambda_{i,H} = 1$ for all $i \in S_+$ (and therefore $\lambda_{i,N} = \lambda_{i,Y} = 0$ for all $i \in S_+$). It remains to show that we can find values for the other $\lambda$ variables such that the solution fulfills all the constraints. Let us first treat the case of both sets $S_-$ and $S_+$ being non-empty. The other two special cases will be treated separately later. First, let us set $\lambda_{i,H} = \Delta$ for all $i \in S_-$. Let us now show that this is indeed a valid assignment, i.e. $0 \leq \Delta \leq 1$. Although the positivity of $\Delta$ follows trivially from (B.9), the second inequality is a little more involved. In order to show that $\Delta \leq 1$, let us note that since (B.3) holds for each $\alpha_i$, $\beta_i$ and $p_i$ (this is a necessary condition for the statistics produced by $S_i$ to be physical, see the middle subfigure of Fig. 7), due to its linearity it also holds for $\alpha = \sum_i \alpha_i$, $\beta = \sum_i \beta_i$ and $p = \sum_i \gamma_i p_i$ (see the left subfigure of Fig. 7). Therefore, from (B.3) we get (B.11), which proves $\Delta \leq 1$. Using the values $\lambda_{i,H} = 1$ for $i \in S_+$ and $\lambda_{i,H} = \Delta$ for $i \in S_-$, it is straightforward to verify that the constraint (B.5) is satisfied.
In order to satisfy (B.6) we need to show that inequality (B.12) holds. Note that because of (B.7), we have that for all $i \in S_-$, $(1 - \Delta) = \lambda_{i,Y} + \lambda_{i,N}$. Therefore, once (B.12) is shown, all the constraints of our optimization problem are satisfied. The first step to prove (B.12) is to show (B.14), where $p_- = \sum_{i \in S_-} \gamma_i p_i / \sum_{i \in S_-} \gamma_i$ and $p_+$ is defined analogously over $S_+$. Notice that $p$ is a convex combination of $p_-$ and $p_+$ with $p_- \leq p_+$, and therefore $p_- \leq p \leq p_+$. Now using (B.14) and (B.3) again, together with the fact that $\sum_{i \in S_-} \gamma_i p_i \neq 0$ since $S_-$ is non-empty, we find that Eq. (B.12) holds, which proves that it is possible to satisfy all conditions (B.5) to (B.8) while maximizing the guessing probability by a suitable choice of $\lambda$'s. Setting $\lambda_{i,H} = 1$ for all $i \in S_+$ yields the solution (B.18).
Let us now return to the two special cases. First, assume that $S_+$ is empty (this is in fact the first form of the solution described in appendix A). In such a case (B.15) is not well-defined (because in the definition of $p_+$ we divide by 0). However, the goal of (B.15) is to prove (B.14), which in this case holds trivially, since $p = \sum_{i \in S_-} \gamma_i p_i$ and $\sum_{i \in S_-} \gamma_i = 1$.
It remains to solve the case of $S_-$ being empty. Then from (B.9) we have that $p \leq \alpha - \beta$. Simultaneously, from (B.11) we have that $p \geq \alpha - \beta$. Therefore, $\alpha - \beta = p$, and in order to fulfill (B.5), we require $\lambda_{i,H} = 1$ for all $i$. Also, now we can use the identity $\alpha = p + \beta$ and (B.3) again to derive $p + \beta \leq p + \beta(1 - p)$, which allows a solution only for $\beta = 0$ (otherwise the observed point $(\alpha, \beta)$ is non-physical). With $\beta = 0$ it is easy to see that the other constraints are satisfied as well and the maximum is equal to $p$. Note that this is in some sense an extreme case, since an observed point such as this can be obtained only with perfectly error-less devices in the limit of an infinite number of rounds (otherwise the sampling error bars would move the observed point away from the perfect case; see appendix E).
Now we deal with the more interesting case of $\sum_{i \in S_+} \gamma_i p_i > \alpha - \beta$ (B.19), which represents the last possible form of the solution described in appendix A. In this case we cannot set $\lambda_{i,H} = 1$ for all $i \in S_+$, as this would violate condition (B.5). If we are only concerned with the variables $\lambda_{i,H}$, it is clear that in the optimal case we could set $\lambda_{i,H} = 0$ for all $i \in S_-$, as the sources in $S_-$ do not contribute to the objective function in Eq. (B.4). Using this we can rewrite (B.4) into the form (B.21). Next, still only being concerned with $\lambda_{i,H}$, we argue that (B.21) is maximized by choosing $\lambda_{i,H} = 1$ for the largest $p_i$ (large $i$) and zero elsewhere, except for a single element with $0 < \lambda_{i,H} < 1$ (see the rightmost subfigure of Fig. 7). This can be seen from the fact that keeping the sum $\sum_i \gamma_i \lambda_{i,H} \cdot p_i$ constant while minimizing $\sum_i \gamma_i \lambda_{i,H}$ (as it enters the maximization function with a negative sign) is the same as maximizing $\sum_i \gamma_i \lambda_{i,H} \cdot p_i$ while keeping $\sum_i \gamma_i \lambda_{i,H}$ constant, which is clearly achieved by choosing $\gamma_i \lambda_{i,H}$ large for $p_i$ large and vice versa. In the following, we show that it is indeed possible to choose the values of $\lambda_{i,H}$ according to this procedure and satisfy all the constraints (B.5) - (B.8), by providing an explicit assignment of all the $\lambda$ variables in (B.27) - (B.29), with the natural number $N$ and the modified weights $\tilde{\gamma}_N$ and $\tilde{\gamma}_{N-1}$ defined in (B.22) - (B.25). Note that both values are well defined, because the box labeled $N$ belongs to $S_+$, and thus $p_N > 1/2$. All the boxes labeled by $i < N$ are re-labeled $i \to i - 1$, utilizing the so-far unused index $i = 0$. This new set of boxes has the same properties as the old one; it is a mere change of mathematical description. For this new set it holds that $\sum_{i=N}^{N_+} \tilde{\gamma}_i p_i = \alpha - \beta$ (B.26). Now we are ready to state the values of the parameters $\lambda$ that maximize $g^*$, given in (B.27) - (B.29), with all other parameters given by the condition for their sum. In these formulas, we use the definition of $\omega$ in (B.31). Note that this is well-defined, as $\sum_{i=N_-}^{N-1} \tilde{\gamma}_i = 0$ would imply that $S_-$ is empty, as well as $N = 1$ and $\tilde{\gamma}_0 = 0$ (i.e. the first non-zero $\gamma_i$ is $\gamma_N$, but we can always start the indexing from the first non-zero element, that is, $\gamma_N = \gamma_1$). This in turn implies that $\gamma_1 = \tilde{\gamma}_1$ and (B.26) becomes $\sum_{i \in S_+} \gamma_i p_i = \alpha - \beta$, which contradicts Eq. (B.19).
Now the maximum guessing probability is given by (B.32). The only thing that needs to be shown is that $\omega \leq 1$, as its positivity is obvious from its definition (B.31). This comes from the facts that N ≥ 1, p ∑ Note that if $\sum_{i \in S_+} \gamma_i p_i > \alpha - \beta$, one needs to calculate $N$ from (B.22) and (B.23). In case $\sum_{i \in S_+} \gamma_i p_i \leq \alpha - \beta$, we simply set $N = 0$ and $p_N = 1/2$ and obtain the solution (B.18). Last but not least, using the modified parameters $\tilde{\gamma}_i$, the solution can take a simple form obtained by plugging (B.26) into (B.32), with $N$ explicitly defined by (B.26). This form is particularly useful in the derivation of the case with mixed sources with partially characterized $\gamma$ (see Section 3.3), where we can show that in the optimal solution (B.22) holds with equality and therefore $\tilde{\gamma}_i = \gamma_i$ for all $i$.

C Mean photon number analysis
In section 3 we have shown that the optimization problem associated with the scenario with partial information about the mixed entropy source can be stated as the problem (C.1). We have also argued that the solution to this problem can be obtained by finding the maximum for each fixed value of $N$ in the range $\{1, \dots, N_+\}$, and the overall solution is the largest of these maxima. In order to proceed with the analytical solution of this problem, let us restrict ourselves to a single linear constraint function, $c_\gamma = \sum_i a_i \gamma_i$, and reformulate the optimization problem using the Lagrange function (C.2) for each fixed $N$, with the half-plane conditions $\gamma_i \geq 0$ for all $i$.
In order to find the maximum, we need to examine the partial derivatives of (C.2) with respect to all variables $\gamma_i$, $\tau_f$, $\tau_N$, $\tau_{\mathrm{norm}}$. While the partial derivatives with respect to $\tau_f$, $\tau_N$, $\tau_{\mathrm{norm}}$ are the required equality constraints of (C.1) for $\gamma$, $f$ and $N$, the partial derivatives with respect to the $\gamma_i$ have the form (C.3) for $i < N$ and (C.4) for $i \geq N$. Now we need to examine all the stationary points of (C.2). We will argue that at the stationary points, for each variable $\gamma_i$, the corresponding partial derivative is either equal to 0, or $\gamma_i = 0$ (so that the variable $\gamma_i$ is actually on the boundary of its allowed interval). We first divide the $\{\gamma_i\}$ into two sets: $\Gamma_0 = \{\gamma_i \mid \partial_{\gamma_i} L_N = 0\}$ (note that this set cannot contain all the variables, because it is impossible to find values of $\tau_{\mathrm{norm}}$, $\tau_f$, $\tau_N$ for which all partial derivatives (C.3) and (C.4) vanish), and $\Gamma_b = \{\gamma_i\} \setminus \Gamma_0$. Since by construction the variables in $\Gamma_b$ have non-zero derivatives, by the extreme value theorem the maximum of (C.2) must be attained when these variables are on their boundary. This is when $\gamma_i = 0$, since all other constraints are taken care of with the derivatives $\partial_{\tau_f} L_N = \partial_{\tau_N} L_N = \partial_{\tau_{\mathrm{norm}}} L_N = 0$. A maximum may therefore be found if for all $\gamma_i \in \Gamma_b$ we have $\partial_{\gamma_i} L_N < 0$ (i.e. the value of $L_N$ is increasing towards the boundary of all $\gamma_i \in \Gamma_b$), and since $L_N$ is linear in all $\gamma_i$ this maximum would be a global one. The remaining issue is therefore to find the optimal set $\Gamma_0$ for which the derivatives (C.3) and (C.4) vanish. We proceed to construct this optimal set by showing how alternative choices cannot be the optimal solution.
Further analysis now depends on the exact values of $a_i$ and $p_i$. In section 5 we have shown that in our experiment, if we characterize the photon source with the mean number of photons only, the constraint function is $\mu = \sum_i i \gamma_i$. Therefore, we will focus on the case where $\{a_i\}$ is a non-negative, unbounded, and strictly increasing sequence, with our prime example being $\{a_i = i\}$. Likewise, $\{p_i = 1 - \pi^i\}$ in our experimental section, so we require $\{p_i\}$ to be a non-negative strictly increasing sequence such that $\{p_i / a_i\}$ is strictly decreasing. Since the sequence $\{a_i\}$ is unbounded, we need to have $\tau_f \geq 0$, otherwise both (C.3) and (C.4) will become positive for some (large enough) value of $i$. One (trivial) solution is choosing $\tau_f = 0$, which is only possible for $\tau_{\mathrm{norm}} \geq 0$ by (C.3). This would allow all $\gamma_i$ for $i < N$ to be potentially non-zero. But then (C.4) simplifies further. If $\tau_N \geq 2$, then the resulting expression is always negative, leading to $\gamma_i = 0$ for $i \geq N$, which would violate the last constraint of (C.1). For smaller $\tau_N$, assume $\partial_{\gamma_i} L_N = 0$ for some $i$. Then, since the $\{p_i\}$ are increasing, $\partial_{\gamma_{i+1}} L_N > 0$. As we have argued above, that would not lead to a maximum, since $\gamma_i \geq 0$, and the derivatives of the $\Gamma_b$ variables should be negative. Therefore we conclude that $\tau_f > 0$.
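The monotonicity requirements on the sequences can be checked numerically for the prime example. The sketch below uses exact rational arithmetic and assumes, for illustration only, a reflection probability $\pi = 1 - 0.5118$ taken from the measured beam-splitter transmission (channel and detector efficiencies are ignored); the function names are ours.

```python
from fractions import Fraction

def source_sequences(pi, n_max):
    """Return a_i = i, p_i = 1 - pi**i and p_i / a_i for i = 1, ..., n_max."""
    a = list(range(1, n_max + 1))
    p = [1 - pi ** i for i in a]
    ratio = [p_i / a_i for p_i, a_i in zip(p, a)]
    return a, p, ratio

def strictly_increasing(seq):
    return all(x < y for x, y in zip(seq, seq[1:]))

def strictly_decreasing(seq):
    return all(x > y for x, y in zip(seq, seq[1:]))

pi = 1 - Fraction(5118, 10000)      # assumed reflection probability
a, p, ratio = source_sequences(pi, 60)
checks = (strictly_increasing(a), strictly_increasing(p), strictly_decreasing(ratio))
```

All three checks hold for this value of $\pi$, in line with the conditions stated above. Exact fractions are used because in floating point $1 - \pi^i$ rounds to 1 for large $i$, which would hide the strict increase of $p_i$.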
For $i < N$, Eq. (C.3) now reads $\tau_{\mathrm{norm}} = -\tau_f a_i$. Since all of the $\{a_i\}$ are different, this equation can only be satisfied for a single variable $\gamma_i$. Notice however that (C.3) is a decreasing function of $i$; therefore, in order to guarantee that all the non-zero partial derivatives $\partial_{\gamma_i} L_N$ are negative, we must have $\tau_{\mathrm{norm}} = -a_0 \tau_f$. That is, $\gamma_0 \in \Gamma_0$ and $\gamma_i \in \Gamma_b$ for $0 < i < N$, i.e. $\gamma_i = 0$ for $0 < i < N$. Eq. (C.4) now takes the form (C.6). Since we have two free parameters available ($\tau_f$ and $\tau_N$), it is possible to achieve $\partial_{\gamma_i} L_N = 0$ for at most two different values of $i$. Notice that $(a_i - a_0) > 0$, and we have shown that $\tau_f > 0$. Therefore $(2 - \tau_N)$ must be positive, or else $\partial_{\gamma_i} L_N < 0$ for all $i \geq N$. Furthermore, since $\{p_i / a_i\}$ is strictly decreasing, (C.6) is also strictly decreasing. Therefore, in order to satisfy (C.4) for two different $\gamma_i$ and to have all the rest of the partial derivatives negative, it must hold that $\partial_{\gamma_N} L_N = 0$ and $\partial_{\gamma_{N+1}} L_N = 0$. The conditions cannot be satisfied for the remaining variables, so $\gamma_i = 0$ for $i > N + 1$.
Now we know that the variables in $\Gamma_0 = \{\gamma_0, \gamma_N, \gamma_{N+1}\}$ are the only non-zero ones. We can therefore use the original problem constraints to solve for the unknowns. Namely, the three equality constraints of (C.1) form a linear system of equations in these three unknowns, which is then solved. The only difficulty remaining is that, depending on the values of $\{a_i\}$ and $\{p_i\}$, it is not at all clear that the solutions satisfy $\gamma_i \geq 0$ for a given $N$. In fact, we will show that in our prime example, $\{a_i = i\}$, $c_\gamma = \mu$, and $\{p_i = 1 - \pi^i\}$, only a finite number of values of $N$ can satisfy the positivity constraints for $\gamma$. We therefore switch to this concrete example to finish this section. Note that in the solution to the linear system of equations, $\gamma_N$ approaches infinity with increasing $N$. This means that only a finite number of values of $N$ need to be tested, as for sufficiently large $N$ we have $\gamma_N > 1$ and the positivity constraints for $\gamma_{N+1}$ and $\gamma_0$ cannot be satisfied. Therefore, the final guessing probability will be the maximum of the finite number of guessing probabilities obtained in this way.

D Multiphoton events
In this appendix we show that $n$-photon events in the setting described in section 4 can be interpreted as simple sources with $p = 1 - \pi^n$. Recall that in case the photon source emits $n$ photons, the number of transmitted photons can vary between 0 and $n$. The response function of a measurement device needs to assign a click/no-click event to each received photon number. This can be done in $2^{n+1}$ different ways.
Let us start by characterizing all the possible $2^{n+1}$ response functions. Each response function can be characterized by specifying for which numbers of transmitted photons the measurement device clicks, i.e. by a subset $C \subseteq \{0, \dots, n\}$. If we denote by $T$ the classical event of counting the transmitted photons, we have that $S_C = (\alpha_C, \beta_C)$ with $\alpha_C = p_C$, where $p_C = \sum_{i \in C} P(T = i)$, and $\beta_C = 1$ if $0 \in C$ and $\beta_C = 0$ otherwise (with a closed shutter no photons are transmitted). For each of these response functions, the adversary tries to guess whether the measurement device clicks or does not click in a given round. The guessing probability corresponding to the response function $C$ is $g_C = \max\{p_C, 1 - p_C\}$. In what follows, we show that every strategy can be written as a convex combination of the four response functions corresponding to $C = \emptyset$ ("Never Click"), $C = \{0, \dots, n\}$ ("Always Click"), $C = \{1, \dots, n\}$ ("Honest Strategy") and $C = \{0\}$ ("Opposite of Honest Strategy"). Their corresponding $S_C$ vectors are $S_N = (0, 0)$, $S_Y = (1, 1)$, $S_H = (1 - \pi^n, 0)$ and $S_{\neg H} = (\pi^n, 1)$. Clearly, each response function with $0 \notin C$ has $p_C \leq 1 - \pi^n$ and therefore it can be expressed as a convex combination of $S_N$ and $S_H$. Similarly, each response function with $0 \in C$ has $p_C \geq \pi^n$ and thus can be expressed as a convex combination of $S_Y$ and $S_{\neg H}$.
In order to finish the argument, it remains to show that simulating each $S_C$ with the above-mentioned strategies does not decrease the adversary's guessing probability. Without loss of generality let us examine response functions with $0 \notin C$. The argument for response functions with $0 \in C$ is analogous. As argued before, any such strategy can be written as $S_C = \lambda S_H + (1 - \lambda) S_N$ with $\lambda = p_C / (1 - \pi^n)$. The guessing probability of this convex combination is $\lambda \max\{1 - \pi^n, \pi^n\} + (1 - \lambda)$, which is at least $\max\{p_C, 1 - p_C\}$ because $\max\{p, 1 - p\}$ is a convex function of $p$. This proves that expressing any strategy of the form $S_C$ as a convex combination of $S_H$ and $S_N$ is not only possible, but advantageous for the adversary. Therefore, the optimal adversary strategy in each $n$-photon event involves only response functions that decide on whether the measurement device received a signal (a positive number of photons) or not. Thus each $n$-photon event can be described by a simple source $S_i$, which sends a signal with probability $1 - \pi^n$.
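The claim for response functions with $0 \notin C$ can be verified by brute force for small photon numbers. The sketch below assumes that each photon is reflected independently with probability $\pi$, so that the number of transmitted photons is binomially distributed, and that the guessing probability of a response function is $\max(p_C, 1 - p_C)$; both are our reading of the setting, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def transmitted_distribution(n, pi):
    """P(T = k) for k = 0..n, assuming independent reflection with probability pi."""
    return [comb(n, k) * (1 - pi) ** k * pi ** (n - k) for k in range(n + 1)]

def honest_mixture_dominates(n, pi, tol=1e-12):
    """Check every response function C with 0 not in C against its simulation
    by a convex combination of the honest and never-click strategies."""
    probs = transmitted_distribution(n, pi)
    p_h = 1 - pi ** n                    # click probability of the honest strategy
    g_h = max(p_h, 1 - p_h)              # guessing probability of the honest strategy
    for size in range(n + 1):
        for c in combinations(range(1, n + 1), size):
            p_c = sum(probs[k] for k in c)
            g_c = max(p_c, 1 - p_c)      # guessing probability of C itself
            lam = p_c / p_h              # weight of the honest strategy
            g_sim = lam * g_h + (1 - lam)
            if g_sim < g_c - tol:
                return False
    return True

results = [honest_mixture_dominates(n, pi) for n in range(1, 7) for pi in (0.1, 0.4882, 0.9)]
```

For all tested photon numbers and reflection probabilities the simulated strategy gives the adversary a guessing probability at least as large as the original response function, in agreement with the argument above.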

E Sampling error
In this section we address the inevitable uncertainty associated with estimating parameters of probability distributions. Both $\alpha = P(\mathrm{click} \mid x = 0)$ and $\beta = P(\mathrm{click} \mid x = 1)$ describe the parameters of Bernoulli random variables. Therefore, the most conservative approach is to use one-sided $(1 - \varepsilon)$ confidence intervals to bound these parameters. Overestimating $\beta$ or underestimating $\alpha$ would lead to overestimating the entropy of the data.
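To see how this is done, first let $X_1, X_2, \dots, X_n$ be a sequence of $n$ observed i.i.d. trials, with $X_i \sim \mathrm{Bernoulli}(\theta)$, and $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$. The desired error can be obtained by using the Chernoff-Hoeffding inequality [33]:
$$P(\bar{X} - \theta \geq t) \leq e^{-2nt^2}, \qquad \text{(E.1)}$$
$$P(\theta - \bar{X} \geq t) \leq e^{-2nt^2}. \qquad \text{(E.2)}$$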
Note that since $E[\bar{X}] = \theta$, these inequalities bound the probability that the estimated (observed) probability of success $\bar{X}$ is larger (E.1) or smaller (E.2) than the true value $\theta$ by more than $t$. By setting $\varepsilon = e^{-2nt^2}$ and solving for $t$, we obtain $t = \sqrt{\ln(1/\varepsilon) / (2n)}$. Equipped with this, we construct bounded estimators $\hat{\alpha}$, $\hat{\beta}$, such that $P(\alpha \geq \hat{\alpha}) \geq 1 - \varepsilon$ and $P(\beta \leq \hat{\beta}) \geq 1 - \varepsilon$ (E.3). Since the choices of shutter settings are independent, and the trials themselves are i.i.d., the confidence of both estimators being in their respective intervals is the product of the individual confidences. During the experiment we observe $t_\alpha$ ($t_\beta$) clicks in $n_\alpha$ ($n_\beta$) test rounds, from which we construct the estimators
$$\hat{\alpha} = \frac{t_\alpha}{n_\alpha} - \sqrt{\frac{\ln(1/\varepsilon)}{2 n_\alpha}}, \qquad \text{(E.4)}$$
$$\hat{\beta} = \frac{t_\beta}{n_\beta} + \sqrt{\frac{\ln(1/\varepsilon)}{2 n_\beta}}. \qquad \text{(E.5)}$$
The numbers of test rounds $n_\alpha$ and $n_\beta$ can be optimized so as to increase the output entropy per batch of size $N$, by solving the optimization problem (E.6), where $H^*_{\min}$ is the calculated min-entropy per bit, depending on the assumed scenario. In practice, however, such an optimization requires the knowledge of the ratios $t_\alpha / n_\alpha$ and $t_\beta / n_\beta$, which are precisely the values that are being estimated. To break this cycle, the optimization can be done iteratively, setting some original guesses (say $n = n_{\alpha,0} = n_{\beta,0}$) and assuming on each subsequent batch of size $N$ that the clicking probabilities are the same as in the previous one, which allows one to solve (E.6). It is important to highlight, however, that this optimization process only serves to increase the amount of extractable entropy, and the security of the protocol does not depend on finding an optimal solution. Indeed, in our proof-of-principle experiment we only solved (E.6) approximately.
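The conservative estimators can be computed directly from the click counts. The sketch below follows the one-sided Hoeffding margin $t = \sqrt{\ln(1/\varepsilon)/(2n)}$; the clamping of the estimates to $[0, 1]$ is our addition, and the click counts in the example are hypothetical.

```python
import math

def hoeffding_margin(n, eps):
    """One-sided Hoeffding margin t such that exp(-2 * n * t**2) = eps."""
    return math.sqrt(math.log(1 / eps) / (2 * n))

def conservative_estimates(clicks_open, n_open, clicks_closed, n_closed, eps):
    """Lower estimate of alpha (open shutter) and upper estimate of beta (closed shutter)."""
    alpha_hat = clicks_open / n_open - hoeffding_margin(n_open, eps)
    beta_hat = clicks_closed / n_closed + hoeffding_margin(n_closed, eps)
    # Clamp to valid probabilities (our addition, not part of the text).
    return max(alpha_hat, 0.0), min(beta_hat, 1.0)

# About 8,000 test rounds per shutter setting and eps = 1e-6, as in the experiment;
# the click counts below are made up for illustration.
margin = hoeffding_margin(8000, 1e-6)
alpha_hat, beta_hat = conservative_estimates(3200, 8000, 40, 8000, 1e-6)
```

With 8,000 test rounds per setting, the margin is close to 0.03, which shows how strongly the finite number of test rounds per batch limits the certified entropy.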

Figure 4: Experimental implementation: The proposed protocols are tested in an optical setup where weak coherent pulses of 8 ns duration at 5 MHz are generated by driving a laser diode with a signal generator (Pulse Streamer) and attenuating its power to ∼1 photon per pulse. The photons (represented by the solid red line) are incident on a fiber beam-splitter (BS) that discards the reflected photons. The transmitted photons are fast-switched via a fiber electro-optic intensity modulator (EOM) that is driven by the digital output of the Pulse Streamer. The pulses at the output of the shutter (composed of the BS and EOM) are sent to a superconducting nanowire single photon detector (SNSPD). The counts of the detector and a clock signal from the Pulse Streamer are recorded with a counting logic (quTAG time-tagger), which allows one to extract coincidences for each pulse and generate the bit string.

Figure 5: Distribution of min-entropy estimates depending on the assumptions on the photon source used for 1,000 experimental batches. From left to right, the assumptions are: (iii) mean photon number µ = 1.06, (ii) Poisson probability distribution with µ = 1.06, and (i) single photon source.

Figure 6: Since the observed probabilities $S = (\alpha, \beta)$ are mixtures of deterministic strategies, the space of possible values of $S$ can be depicted as the blue polytope. With the assumption that $\alpha \geq \beta$, i.e. the measurement devices click more often when the shutter $A(x)$ is open, we can restrict the adversary to the use of three strategies only: $S_Y$, $S_N$ and $S_H$. In such a case, the decomposition of the observed probabilities $S = (\alpha, \beta)$ into deterministic strategies is unique.
i,H according to this procedure, and satisfy all the constraints Eqs. (B.5)-(B.8), by providing an explicit assignment of all the λ variables in Eqs. (B.27)-(B.29). To obtain the explicit assignment, let us first define the natural number $N$ in the following implicit way:
$$\sum_{i=N}^{N_+} \gamma_i p_i \geq \alpha - \beta, \tag{B.22}$$
$$\sum_{i=N+1}^{N_+} \gamma_i p_i < \alpha - \beta. \tag{B.23}$$
In case we have that $\sum_{i=N}^{N_+} \gamma_i p_i > \alpha - \beta$, we perform the following trick: we formally divide the box labeled $N$ into two boxes, labeled $N - 1$ and $N$, both having the same $p_N$. The new parameters $\tilde{\gamma}_N$ and $\tilde{\gamma}_{N-1}$ will be defined in the following way:
$$\tilde{\gamma}_N = \frac{\alpha - \beta - \sum_{i=N+1}^{N_+} \gamma_i p_i}{p_N}, \tag{B.24}$$
$$\tilde{\gamma}_{N-1} = \gamma_N - \tilde{\gamma}_N. \tag{B.25}$$
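The box-splitting step can be sketched in a few lines of Python. This is a hedged illustration, not the authors' procedure in full: it assumes the weights $\gamma_i$ and probabilities $p_i$ are given as non-negative lists indexed from 0 up to the largest index $N_+$, and that `target` stands for $\alpha - \beta > 0$. The function name `split_box` and the example numbers are hypothetical.

```python
def split_box(gamma, p, target):
    """Find N satisfying (B.22)-(B.23) and split box N as in (B.24)-(B.25).

    gamma, p : sequences of non-negative floats, indexed 0 .. N_plus
    target   : the value alpha - beta, assumed strictly positive
    Returns (N, gamma_tilde_N, gamma_tilde_N_minus_1).
    """
    if len(gamma) != len(p):
        raise ValueError("gamma and p must have the same length")
    if target <= 0:
        raise ValueError("target (alpha - beta) must be positive")
    tail = 0.0  # running value of sum_{i=N+1}^{N_plus} gamma_i * p_i
    for N in range(len(gamma) - 1, -1, -1):
        # tail < target holds here (B.23); check whether adding box N gives (B.22).
        if tail + gamma[N] * p[N] >= target:
            gamma_tilde_N = (target - tail) / p[N]       # (B.24)
            gamma_tilde_prev = gamma[N] - gamma_tilde_N  # (B.25)
            return N, gamma_tilde_N, gamma_tilde_prev
        tail += gamma[N] * p[N]
    raise ValueError("total weight is smaller than target; no valid N exists")


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    N, g_new, g_rest = split_box([1.0, 1.0, 1.0], [0.5, 0.3, 0.2], 0.35)
    print(N, g_new, g_rest)
```

By construction, the split weight $\tilde{\gamma}_N$ together with the boxes above $N$ accounts for exactly $\alpha - \beta$, and the remainder $\tilde{\gamma}_{N-1}$ carries the rest of the original $\gamma_N$. When (B.22) holds with equality the remainder is zero and no real split takes place.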

Table 1: Amount of experimental randomness extracted from different scenarios. In our experiment we have used µ = 1.06.
Columns: $H_{\min}$ | Batches Used | Extracted Randomness
Rows: (i) Single photon source, $p_n \sim \delta_{1,n}$