Realistic noise-tolerant randomness amplification using finite number of devices

Randomness is a fundamental concept, with implications from security of modern data systems, to fundamental laws of nature and even the philosophy of science. Randomness is called certified if it describes events that cannot be pre-determined by an external adversary. It is known that weak certified randomness can be amplified to nearly ideal randomness using quantum-mechanical systems. However, so far, it was unclear whether randomness amplification is a realistic task, as the existing proposals either do not tolerate noise or require an unbounded number of different devices. Here we provide an error-tolerant protocol using a finite number of devices for amplifying arbitrary weak randomness into nearly perfect random bits, which are secure against a no-signalling adversary. The correctness of the protocol is assessed by violating a Bell inequality, with the degree of violation determining the noise tolerance threshold. An experimental realization of the protocol is within reach of current technology.


I. INTRODUCTION
Randomness is a useful resource in a variety of applications, ranging from numerical simulations to cryptography. However, one almost always needs ideal, or close to ideal, randomness, while typical random sources are far from ideal. Randomness amplification, defined as a process that maps a source of randomness into another that is closer to an ideal source, is a potential solution to this problem. But is randomness amplification possible at all?
We model a source of randomness as an ε-Santha-Vazirani source (ε-SV source), given by a probability distribution p(x_1, ..., x_n) over bit strings such that for every i ≤ n,
$$\frac{1}{2} - \varepsilon \le p(x_i \mid x_1, \ldots, x_{i-1}) \le \frac{1}{2} + \varepsilon .$$
For example, when ε = 0 the source is fully random, while when ε = 1/2 it may be fully deterministic. The previous equation is the only assumption on the source, which otherwise is completely unknown. Given an SV source, is it possible to process the bits so that the quality of the randomness is improved? In particular, can we obtain a fully random bit by processing an arbitrarily large number of bits from an SV source? In [1], Santha and Vazirani answered the question in the negative: it is impossible to improve the quality of the randomness of SV sources. However, their argument only applies to classical protocols and leaves open the possibility that the situation might be different once quantum resources are considered. Indeed, it is trivial to generate randomness in quantum mechanics, e.g. by preparing a state and measuring it in a complementary basis. However, this assumes that one has full control of the state preparation and the measurement. A more demanding task is to generate randomness in a device-independent manner, treating the quantum system as a black box and obtaining randomness only as a consequence of the correlations in measurement outcomes and fundamental physical principles such as the no-signaling principle. The existence of non-local quantum correlations violating Bell inequalities already suggests that device-independent randomness amplification could be achieved. However, the violation of Bell inequalities requires that the measurements be performed in a random manner, independent of the system upon which they are performed [3,4].
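As an illustration, the SV condition can be checked numerically for a small, explicitly given distribution. The following is a minimal Python sketch, assuming the standard SV condition 1/2 − ε ≤ p(x_i | x_1, ..., x_{i−1}) ≤ 1/2 + ε; the function names and the dictionary representation of the distribution are hypothetical choices made for this example only.

```python
from itertools import product

def conditional_prob(p, prefix, bit):
    """P(x_i = bit | x_1..x_{i-1} = prefix) for p mapping n-bit tuples to probabilities.

    Returns None when the prefix has zero probability (the condition is then vacuous).
    """
    i = len(prefix)
    den = sum(q for x, q in p.items() if x[:i] == prefix)
    if den == 0:
        return None
    num = sum(q for x, q in p.items() if x[:i] == prefix and x[i] == bit)
    return num / den

def is_sv_source(p, n, eps, tol=1e-12):
    """Check the assumed SV condition for every position i and every prefix."""
    for i in range(n):
        for prefix in product((0, 1), repeat=i):
            for bit in (0, 1):
                c = conditional_prob(p, prefix, bit)
                if c is None:
                    continue
                if not (0.5 - eps - tol <= c <= 0.5 + eps + tol):
                    return False
    return True

n = 3
uniform = {x: 1 / 2**n for x in product((0, 1), repeat=n)}
deterministic = {(0, 0, 0): 1.0}

print(is_sv_source(uniform, n, 0.0))        # uniform bits satisfy eps = 0
print(is_sv_source(deterministic, n, 0.5))  # a fixed string is allowed at eps = 1/2
print(is_sv_source(deterministic, n, 0.4))  # but not for any eps < 1/2
```

The check enumerates all prefixes, so it is only meant for very small n; it illustrates the two extreme cases mentioned in the text.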
In a seminal work [5], Colbeck and Renner showed that, despite this difficulty, non-local quantum correlations can be used to amplify the randomness of Santha-Vazirani sources that are sufficiently close to fully random. This result was later improved by Gallego et al. [7], who gave a protocol using quantum non-local correlations to amplify general SV sources, as long as they are not deterministic. Neither of the two protocols tolerates noise, however. In [8], we gave a different protocol that is robust to noise and can transform any non-deterministic SV source into a fully random one. A major drawback of the protocols in [7,8] is that they require an infinite number of space-like separated devices. Therefore a natural open question is the existence of a randomness amplification protocol that uses a fixed number of devices, on the one hand, and allows for the amplification of arbitrary non-deterministic sources, on the other hand. See also [10-15] for more recent work in the area.
A related, but distinct, task to randomness amplification is (device-independent) randomness expansion, where one assumes that an input seed of perfect random bits is available and the goal is to expand it into a larger random bit string. Quantum non-locality has found application also in this latter task [16, 19-23, 27, 29], as well as in device-independent quantum key distribution (see e.g. [24-26, 28]).

A. Results
In this paper we overcome the shortcomings of previous protocols and obtain:
Theorem 1. For every ε > 0, there is a protocol using an ε-SV source and eight non-signalling devices with the following properties:
• Using the devices poly(n, 1/δ) times, the protocol either aborts or produces n bits which are δ-close to uniform and independent of any side information (e.g. held by an adversary).
• Local measurements on many copies of a four-partite entangled state, with poly(1 − 2ε) error rate, give rise to devices that do not abort the protocol with probability larger than 1 − 2^{−Ω(n)}.
See Proposition 7 in Section III for a precise formulation of the theorem.

B. Protocol and Outline of its Correctness Proof
The protocol we use for randomness amplification is in fact simpler than previous protocols [5,7,8]. It is given precisely in Fig. 2 and illustrated in Fig. 1, but its rough structure is the following. First, one uses several bits from the Santha-Vazirani source in order to choose inputs for eight non-signalling boxes (each of which is reused many times). Then one collects the outputs of the boxes and, using the empirical data on inputs and outputs, decides whether to abort. If the protocol is not aborted, one applies a randomness extractor (see Section II A) to all output bits of the eight devices. The output bits of the protocol, which by Theorem 1 are close to fully random, are just the output bits of the extractor.
The proof of correctness of the protocol consists of three main steps.
(i) We show that we can effectively work with two independent devices, each consisting of four boxes; by independent we mean devices that given any fixed inputs produce uncorrelated outputs.
(ii) We show that conditioned on passing a certain test on the inputs and outputs of the boxes, for a single device and a given input, with high probability, the outputs are a source of linear min-entropy.
(iii) We use existing results on extracting randomness from uncorrelated min-entropy sources.
Step (i) is presented in Section IV D. The idea is to adapt recent de Finetti theorems for non-signalling devices [30,31] (based on information-theoretical methods) to the situation in which subsystems are selected from a Santha-Vazirani source, instead of being selected uniformly at random. In Lemma 15 we show that, given two non-signalling boxes, if we select at random (using an SV source) a block of boxes from the second device among a sufficiently large number of blocks, this block of boxes will be approximately uncorrelated with the first device.
Step (ii) is established by a sequence of implications (which have a similar flavour to the estimation of [19-21] for the related task of randomness expansion) as follows:
• In Lemma 8 we present an estimation procedure which ensures that, with high probability, the value of the Bell expression with settings chosen from an SV source is small for a linear fraction of boxes conditioned on previous inputs and outputs. This will follow from an application of Azuma's inequality.
• In Lemma 11 we show that by the very definition of SV sources, the value of the Bell expression with uniformly random settings is also small.
• In Lemma 10 we show that a small uniform Bell value implies that for any setting appearing in the Bell expression, the probability of any output is bounded away from one.This is achieved by linear-programming, analogously to the approach of [7,8].
• In Lemma 14, in turn, we show that if a linear fraction of conditional boxes have probability of outputs bounded away from one (which is ensured by Lemmas 8,11 and 10), then given the input, the output has linear min-entropy with high probability.
Finally, Step (iii) is an application of known results regarding extracting randomness from two independent min-entropy sources (see Section II A).
It is instructive to point out already here where the non-local nature of our protocol plays a role. Clearly the first step is entirely classical, and it does not use any feature of non-local correlations. It only exploits the fact that the devices do not signal to each other and are time-ordered (the remarkable fact here is that the de Finetti bound of Lemma 15 also applies to general non-signaling correlations). This step ensures that, given a particular input, the outputs of the selected boxes in each device are independent of each other. Note that this is the case both for non-local and for classical boxes. It is also clear that the third step is independent of any non-local feature of the devices.
It is in the second step that the non-local nature of the correlations is exploited. There, by looking at the inputs and outputs obtained from the devices, one can verify in a device-independent manner that the outputs must have been somewhat random. This is impossible classically without making further assumptions.

C. Extensions
Theorem 1 has the drawback that the extractor used in Protocol 2 is non-explicit (we only know it exists by the probabilistic method). We can improve on this aspect if we are willing to increase the number of non-signalling devices.
Theorem 2. For every ε > 0, there is a protocol using an ε-SV source and twelve non-signalling devices with the following properties:
• Using the devices poly(n, 1/δ) times, the protocol either aborts or produces n bits which are δ-close to uniform and independent of any side information (e.g. held by an adversary).
• Local measurements on many copies of a four-partite entangled state, with poly(1 − 2ε) error rate, give rise to devices that do not abort the protocol with probability larger than 1 − 2^{−Ω(n)}.
Moreover the protocol is fully explicit.
The protocol is given in Fig. 3 (for the theorem above we use it with k = 3 and the four-partite Bell inequality given in Section II C). Its proof of correctness is completely analogous to the proof of Theorem 1, the only differences being that we use a version of the de Finetti bound for more devices (given in Lemma 16) and the explicit extractor from part (ii) of Lemma 5.

A. Randomness Extractors
In the protocol, we use randomness extractors to obtain nearly uniform bits from a finite number of randomness sources. We use the min-entropy to measure the amount of randomness in a source S.
Definition 3. The min-entropy of a random variable S is given by
$$H_{\min}(S) = \min_{s} \log \frac{1}{P(S = s)} = -\log \max_{s} P(S = s).$$
For S ∈ {0, 1}^n, S is called an (n, H_min(S)) source with entropy rate H_min(S)/n.
From multiple weak sources of randomness, we use independent-source extractors [6] to extract nearly uniform random bits.
Definition 4 (Independent-source extractor). An independent-source extractor is a function Ext : ({0, 1}^n)^k → {0, 1}^m that, acting on k independent (n, H_min(S)) sources, outputs m bits with error ξ, i.e. for k independent (n, H_min(S)) sources S_1, ..., S_k we have
$$\left| \mathrm{Ext}(S_1, \ldots, S_k) - U_m \right| \le \xi,$$
where |.| is the distance between the two distributions and U_m denotes the uniform distribution on the m bits.
The results about extractors that we will use are summarized in the following lemma.
Lemma 5 (Extractor constructions).
(i) [2] There exists a (non-explicit) deterministic extractor that, given two independent sources of min-entropy larger than h, outputs Ω(h) bits 2^{−Ω(h)}-close to uniform.
(ii) [18] There exists an explicit extractor that, given three independent sources, one having min-entropy larger than τn (for any τ > 0) and the other two larger than h ≥ polylog(n), outputs Ω(h) bits 2^{−h^{Ω(1)}}-close to uniform.
Theorem 1 uses the non-explicit extractor for two sources from part (i), while Theorem 2 uses Rao's extractor [18] from part (ii).
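To make the notion of an (n, H_min(S)) source concrete, the following minimal Python sketch computes the min-entropy and the entropy rate of an explicitly given distribution, assuming the standard definition H_min(S) = −log_2 max_s P(S = s). The function names and the dictionary representation are hypothetical and serve only as an illustration; no extractor is implemented here.

```python
import math

def min_entropy(dist):
    """Min-entropy in bits of a distribution given as {outcome: probability}."""
    return -math.log2(max(dist.values()))

def entropy_rate(dist, n):
    """Entropy rate H_min(S) / n of a source over n-bit strings."""
    return min_entropy(dist) / n

# Uniform distribution on 2-bit strings: 2 bits of min-entropy, rate 1.
uniform_2bit = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
# A biased source: the most likely string dominates the min-entropy.
biased_2bit = {"00": 0.75, "01": 0.25}

print(min_entropy(uniform_2bit), entropy_rate(uniform_2bit, 2))
print(min_entropy(biased_2bit), entropy_rate(biased_2bit, 2))
```

The biased example shows why min-entropy is the relevant measure for extraction: it depends only on the most probable outcome, not on the spread of the remaining probability mass.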

B. Assumptions on the Devices
The protocol presented in this paper can be applied to any Bell inequality involving t parties with the following properties:
• The maximal no-signaling value of the Bell expression is attainable by quantum correlations.
• For every measurement setting u = (u_1, ..., u_t), the probability of any output x = (x_1, ..., x_t) is bounded away from one for all no-signaling boxes that achieve maximum violation of the Bell inequality.
As an example of a Bell inequality with the above properties, we present a four-partite inequality in the next section. The assumptions needed for the protocol are now explained in this four-partite setting, but can be straightforwardly generalized to a Bell expression involving any other number of parties as well.
(ii) Non-signaling Structure: In the protocol, we consider 4k parties performing k Bell experiments in parallel, for some finite k (the simplest case for which the protocol works is k = 2). The parties act on k devices, each of which consists of four no-signaling apparatuses. Moreover, the k devices are also no-signaling between each other. Each device may be thought of as a channel that transforms inputs (settings) u into outputs x. An external party, for example an eavesdropper (Eve), acts on a system that is no-signaling with respect to the k devices. On the j-th device, for j ∈ [k], the parties conduct M_j runs of the Bell experiment sequentially with a time-ordered no-signaling structure, i.e. the outputs of the l-th run may depend on the inputs and outputs of the previous runs.
We describe the correlations in the k + 1 devices by the following joint probability distribution:
$$P(x^1_{\le M_1}, \ldots, x^k_{\le M_k}, Z \mid u^1_{\le M_1}, \ldots, u^k_{\le M_k}, W),$$
where e.g. $x^1_{\le M_1}$ denotes the vector $(x^1_1, \ldots, x^1_{M_1})$. For device j with 1 ≤ j ≤ k, the inputs and outputs of the l_j-th box (associated to the l_j-th use of the device) are given by $u^j_{l_j}$ and $x^j_{l_j}$, respectively. Likewise the input and output of the last device (held by the eavesdropper) are given by W and Z. One may think of the adversary as holding the set of random variables Z, W and supplying k devices which produce the conditional probability distribution P(·|Z, W), whose behaviour depends on Z and W.
We say such P is a (k; M_1, ..., M_k) time-ordered non-signaling box, meaning that it is non-signaling between each of the k devices and time-ordered non-signaling within each device.
(iii) Conditional Boxes: The settings are drawn from an SV source described by a distribution ν.
For each device j ∈ [k], this defines a probability space given by sequences of inputs and outputs $(u^j_1, x^j_1, \ldots, u^j_{M_j}, x^j_{M_j})$, with measure
$$P(u^j_1, x^j_1, \ldots, u^j_{M_j}, x^j_{M_j}) = \nu(u^j_1, \ldots, u^j_{M_j})\, P(x^j_1, \ldots, x^j_{M_j} \mid u^j_1, \ldots, u^j_{M_j}).$$
Finally, we also consider conditional boxes $P_{x^j_{<l}, u^j_{<l}}$, which are random variables on this space, given by
$$P_{x^j_{<l}, u^j_{<l}}(y_l \mid v_l) = P(y_l \mid v_l, x^j_{l-1}, u^j_{l-1}, \ldots, x^j_1, u^j_1).$$
(iv) Markov Condition: As in previous works on randomness amplification [5,7,8,10,11,13], we make the assumption that the ε-SV source and the boxes can be correlated with each other only through the no-signaling adversary Eve. In other words, we assume that the SV source, Eve's random variables (Z, W) and the box P constitute a Markov chain, so that, given (Z, W), the box P(·|Z, W) is independent of the bits produced by the source.
The device is formally a channel applied to the SV source. Its analogue in the Santha-Vazirani theorem is just a deterministic hash function. In our situation, the channel is not deterministic. However, the users of the device do not know this a priori. In the protocol we propose, they are able to verify that the device has randomness by looking at the statistics of inputs and outputs. Classically this cannot be achieved, since there is no way of certifying that a channel is not deterministic without extra assumptions.

C. The Bell inequality
The inequality we consider involves four spatially separated parties with measurement settings u = {u_1, u_2, u_3, u_4} and respective outcomes x = {x_1, x_2, x_3, x_4}. Each party chooses one of two measurement settings with two outcomes each, so that u_i ∈ {0, 1} and x_i ∈ {0, 1} for i ∈ {1, ..., 4}. The measurement settings for which non-trivial constraints are imposed by the inequality can be divided into two sets
$$U_0 = \{\{0001\}, \{0010\}, \{0100\}, \{1000\}\} \quad \text{and} \quad U_1 = \{\{0111\}, \{1011\}, \{1101\}, \{1110\}\}.$$
The inequality is then [9]
$$\sum_{x,u} \left( I_{\oplus_{i=1}^{4} x_i = 0}\, I_{u \in U_0} + I_{\oplus_{i=1}^{4} x_i = 1}\, I_{u \in U_1} \right) P(x|u) \ge 2,$$
where the indicator function I_L = 1 if L is true and 0 otherwise. The local hidden variable bound is 2, and there exist no-signaling distributions that reach the algebraic limit of 0. For any no-signaling box represented by a vector of probabilities {P(x|u)}, the Bell inequality may be written as
$$B \cdot \{P(x|u)\} \ge 2,$$
where B is an indicator vector for the Bell inequality with entries
$$B(x,u) = I_{\oplus_{i=1}^{4} x_i = 0}\, I_{u \in U_0} + I_{\oplus_{i=1}^{4} x_i = 1}\, I_{u \in U_1}.$$
Consider the quantum state |Ψ⟩ of Eq. (9), measured in the bases of Eqs. (10) and (11): one basis corresponds to u_i = 0 and measurements in the Z basis correspond to u_i = 1 for each of the four parties i ∈ {1, ..., 4}. These measurements on |Ψ⟩ lead to the algebraic violation of the inequality, i.e., the sum of the probabilities appearing in the inequality is zero.
The reason for the choice of this Bell inequality is twofold. Firstly, as we have seen, there exist quantum correlations achieving the maximal no-signaling violation of the inequality, which implies that free randomness amplification starting from any initial value of ε of the SV source may be possible. Secondly, we will show (in Lemma 10) that for any measurement setting u out of the 2^4 possible settings in the inequality, the probability of any of the 2^4 output bit strings x is bounded away from one (for any no-signaling box) by a linear function of the uniform value of the Bell expression.
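The local hidden variable bound quoted above can be checked by brute force over all deterministic local strategies. The sketch below is an illustration only: it assumes that the Bell indicator penalizes even output parity for settings in U_0 and odd output parity for settings in U_1 (the opposite assignment is equivalent up to relabeling one party's outputs), and all function names are hypothetical.

```python
from itertools import product

# Settings with non-trivial constraints (taken from the definition of U_0 and U_1).
U0 = {(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)}
U1 = {(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)}

def bell_indicator(x, u):
    """Assumed form of B(x, u): 1 on the penalized (output, setting) pairs, else 0."""
    parity = sum(x) % 2
    return int((u in U0 and parity == 0) or (u in U1 and parity == 1))

def bell_value_deterministic(strategy):
    """Bell value of a deterministic local box; strategy[i][u_i] is party i's output."""
    total = 0
    for u in U0 | U1:
        x = tuple(strategy[i][u[i]] for i in range(4))
        total += bell_indicator(x, u)
    return total

def local_minimum():
    """Minimize the Bell value over all 2^8 deterministic local strategies."""
    best = None
    for bits in product((0, 1), repeat=8):
        strategy = [(bits[2 * i], bits[2 * i + 1]) for i in range(4)]
        value = bell_value_deterministic(strategy)
        best = value if best is None else min(best, value)
    return best

print(local_minimum())
```

Since every local hidden variable model is a mixture of deterministic strategies, the minimum over deterministic strategies is the local bound under the stated assumption on B.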

D. Randomness Criterion
To quantify the quality of the output we will use the distance to uniform of a random variable S ∈ Σ, conditioned on Eve's input and output: Although this function is convenient to work with, it is not universally composable. A better definition of randomness relative to an eavesdropper is the following (see e.g. [26]): with |Σ| the size of Σ.
However, the following relation between them holds:
Lemma 6. For a random variable S ∈ Σ,
Proof.

FIG. 1: Illustration of the protocol for randomness amplification from two no-signaling devices, with N_1 = 1 block of n runs from the first device and N_2 blocks of n runs from the second. (Figure labels: SV, forward signaling, no-signaling, Ext(S1, S2).)

III. PROOF OF CORRECTNESS OF THE PROTOCOL
Let us consider the protocol in Fig. 2 performed on k = 2 devices, with the numbers of blocks (N_1, N_2) chosen as in Eq. (16) for a parameter t > 0. Then our main result is the following.
Proposition 7. Let (N_1, N_2) be given by Eq. (16). Then, conditioned on not aborting, the protocol (when applied to the four-partite Bell inequality given in Section II C) outputs Ω(n) bits S_n satisfying the bound of Eq. (17). Moreover, if the boxes are realized by performing measurements which are O(δ(1 − µ)(1 − 2ε)^4)-close to either one of the measurements of Eqs. (10, 11) on states which are O(δ(1 − µ)(1 − 2ε)^4)-close to the state of Eq. (9), then the protocol accepts with high probability.

Protocol I
1. The ε-SV source is used to choose the measurement settings $u^1_{\le M_1}, u^2_{\le M_2}$ for the 2 devices. The devices produce output bits $x^1_{\le M_1}, x^2_{\le M_2}$.
2. The measurements in device j (for j = 1, 2) are partitioned into N_j blocks of boxes, each containing n boxes (so that M_j = nN_j).
3. The parties choose at random one block of boxes of size n from each device, using bits from the ε-SV source.
4. The parties perform an estimation of the violation of the Bell inequality in the chosen block from each of the two devices by computing the empirical average of the Bell indicator over the block. The protocol is aborted unless the test of Eq. (44) is passed for each device (with fixed constants δ, µ > 0).
5. Conditioned on not aborting in the previous step, the parties apply the extractor from part (i) of Lemma 5 to the sequence of outputs from the chosen block in each device.
FIG. 2: Protocol for device-independent randomness amplification from two devices.
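The classical post-processing in steps 2 to 5 can be sketched as follows. This is only a schematic Python illustration under stated assumptions: the acceptance threshold is passed in as a free parameter (the precise test is the one of Eq. (44)), the Bell indicator and the extractor are supplied by the caller, and the XOR function used in the usage example is a placeholder with no extraction guarantee, not the extractor of Lemma 5. All names are hypothetical.

```python
def empirical_bell_average(inputs, outputs, indicator):
    """Average of the Bell indicator over one block of (input, output) pairs."""
    return sum(indicator(x, u) for x, u in zip(outputs, inputs)) / len(inputs)

def run_protocol(device_runs, block_choices, n, threshold, indicator, extractor):
    """Steps 2-5: pick one block per device, test it, then extract.

    device_runs   : list of (inputs, outputs), one pair of lists per device
    block_choices : chosen block index per device (drawn from the SV source)
    threshold     : hypothetical stand-in for the acceptance bound of the test
    Returns the extractor output, or None if the protocol aborts.
    """
    chosen_outputs = []
    for (inputs, outputs), a in zip(device_runs, block_choices):
        block_in = inputs[a * n:(a + 1) * n]
        block_out = outputs[a * n:(a + 1) * n]
        if empirical_bell_average(block_in, block_out, indicator) > threshold:
            return None  # abort: estimated Bell value too large
        chosen_outputs.append(block_out)
    return extractor(*chosen_outputs)

# Toy usage with single-bit inputs/outputs and a toy indicator (output != input).
def toy_indicator(x, u):
    return int(x != u)

def xor_blocks(a, b):
    # Placeholder only: NOT a two-source extractor with any security guarantee.
    return [p ^ q for p, q in zip(a, b)]

runs = [([0, 1, 0, 1], [0, 1, 0, 1]), ([1, 1, 0, 0], [1, 1, 0, 1])]
print(run_protocol(runs, (1, 0), 2, 0.25, toy_indicator, xor_blocks))
print(run_protocol(runs, (1, 1), 2, 0.25, toy_indicator, xor_blocks))
```

In the second call the chosen block of the second device fails the toy test, so the sketch returns None, mirroring the abort in step 4.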

Protocol II
1. The ε-SV source is used to choose the measurement settings $u^1_{\le M_1}, \ldots, u^k_{\le M_k}$ for the k devices. The devices produce output bits $x^1_{\le M_1}, \ldots, x^k_{\le M_k}$.
2. The measurements in device j are partitioned into N_j blocks of boxes, each containing n boxes (so that M_j = nN_j).
3. The parties choose at random one block of boxes of size n from each device, for 1 ≤ j ≤ k, using bits from the ε-SV source.
4. The parties perform an estimation of the violation of the Bell inequality in the chosen block from each device 1 ≤ j ≤ k by computing the empirical average of the Bell indicator over the block. The protocol is aborted unless the test of Eq. (44) is passed for each device (with fixed constants δ, µ > 0).
5. Conditioned on not aborting in the previous step, the parties apply a deterministic extractor from part (ii) of Lemma 5 to the sequence of outputs from the chosen block in each device.
FIG. 3: Protocol for device-independent randomness amplification from k devices for k > 2.
Proof. Recall that the superscripts denote the device and the subscripts denote the block of boxes, while capital letters indicate the set of boxes within the block, so that $U^j_{\le N_j}$ and $X^j_{\le N_j}$ denote the inputs and outputs of the blocks of boxes numbered 1 to N_j in the j-th device.
Under the Markov assumption that the SV source and the box are uncorrelated (see Section II B), Lemma 15 gives that the block of boxes from the first device is approximately uncorrelated with the block of boxes chosen from the second device with the SV source. The test for the Bell value from Section IV A is performed on the two blocks of boxes. When both blocks of boxes pass the test, Lemmas 14 and 10 give that each such block is a source with min-entropy linear in the size n of the block. Applying the independent-source extractors from Section II A, one obtains a string of nearly uniform random bits.
In more detail, assume the test performed on the chosen block of boxes from each device (j = 1, 2) accepts. Then, from Lemma 12, we know that for each device, with high probability (as quantified there), the fraction of the n boxes with uniform Bell value B_U ≥ δ/|U| is less than (1 − µ). Then Lemma 10 gives that for any no-signaling box with uniform Bell value less than δ/|U|, for any measurement setting u ∈ U and any output x, Pr(x|u, Z, W) ≤ (1 + 2δ)/3. Therefore, from Lemma 14, we have that for each device, for any chosen set of inputs (u_1, ..., u_n), the output distribution of the chosen n boxes gives, with probability exponentially close to one in n, a source with min-entropy linear in n. By Lemma 15 it follows that the distributions on outputs of the chosen boxes in the two devices are close to independent. We can then apply the results on independent randomness extractors for two independent (n, H_min(S)) sources in [2], where it was shown that a random hash function can be used to obtain Θ(n) bits S'_n with error 2^{−Ω(n)}. This only bounds the non-composable distance to uniform of Section II D. To obtain Eq. (17) we throw away a (1 − ν) fraction of the bits in S' (for ν sufficiently small), obtaining S, and apply Lemma 6.
It remains to estimate the error in the de Finetti bound in Lemma 15. Let us apply Lemma 15 with the number of blocks as specified in Eq. (16). Then we obtain the bound of Eq. (57), with T given by Eq. (58). Thus with probability larger than 1 − 1/t over A_2 we have T ≤ 1/t. By Markov's inequality, for such a good choice, a corresponding bound holds with a parameter η. Choosing η = 1/√t, we find that with probability larger than 1 − 1/t − 1/√t ≥ 1 − 2/√t, the boxes from the first device, the chosen block of boxes from the second device, and the chosen inputs will be such that T ≤ 1/√t. The robustness of the protocol follows from Lemma 17.

IV. TOOLS FOR RANDOMNESS AMPLIFICATION
In this section we give the tools we will employ in proving the correctness of Protocol 2.

A. Verification of the Bell value
The lemma below states that, with high probability, the arithmetic average of mean values for conditional boxes is close to the observed value. The variables W_i will later be interpreted as settings and outputs, i.e. W_i = (u_i, x_i), and the expectation value of the variable B_i as the Bell value with inputs taken from an SV source. It has a similar flavour to previous results [19-21] obtained in the context of the related problem of randomness expansion.
Lemma 8. Consider arbitrary random variables W_i for i = 1, ..., n, and binary random variables B_i that are functions of W_i, i.e. B_i = f_i(W_i) for some functions f_i. Let us denote $\bar{B}_i = E(B_i \mid W_{i-1}, \ldots, W_1)$ for i = 2, ..., n and $\bar{B}_1 = E(B_1)$ (i.e. the $\bar{B}_i$ are conditional means). Define, for k = 2, ..., n, the empirical average
$$Z_k = \frac{1}{k} \sum_{i=1}^{k} B_i$$
and the arithmetic average of conditional means
$$\bar{Z}_k = \frac{1}{k} \sum_{i=1}^{k} \bar{B}_i .$$
Then, for every s > 0, we have
$$\Pr\left( |Z_k - \bar{Z}_k| \ge s \right) \le 2 \exp\left( -\frac{k s^2}{2} \right).$$
Before we prove the lemma, we need to state the Azuma inequality. Let X_0, ..., X_k and W_0, ..., W_k be two sequences of random variables. Then X_0, ..., X_k is said to be a martingale with respect to W_0, ..., W_k if for all 0 ≤ l < k, E|X_l| < ∞ and $E(X_{l+1} \mid W_0, \ldots, W_l) = X_l$.
Lemma 9 (Azuma-Hoeffding). Suppose X_0, ..., X_k is a martingale with respect to W_0, ..., W_k, and that $|X_{l+1} - X_l| \le c_l$ for all 0 ≤ l < k. Then for all positive reals t,
$$\Pr\left( |X_k - X_0| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 \sum_{l=0}^{k-1} c_l^2} \right).$$
Now we can prove Lemma 8.
Proof. Define X_0 = 0, $X_l = l(Z_l - \bar{Z}_l)$ and W_0 = 0. Let us show that $\{X_i\}_{i=0}^{k}$ and $\{W_i\}_{i=0}^{k}$ satisfy the assumptions of Lemma 9. First,
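The concentration statement can be illustrated numerically. The following Python sketch assumes the bound takes the Azuma form Pr(|empirical average − average of conditional means| ≥ s) ≤ 2 exp(−k s²/2), which is what the Azuma-Hoeffding inequality gives for increments bounded by one; the adaptive process, its biases, and all names are hypothetical choices for this example.

```python
import math
import random

def azuma_bound(k, s):
    """Assumed Azuma-type bound on the deviation probability after k rounds."""
    return 2 * math.exp(-k * s * s / 2)

def simulate_deviation(k, rng):
    """One run of an adaptive binary process.

    The conditional mean of B_i depends on the previous outcome, mimicking
    boxes whose behaviour depends on the history. Returns the absolute
    difference between the empirical average and the average of conditional means.
    """
    empirical_sum = 0.0
    conditional_sum = 0.0
    prev = 0
    for _ in range(k):
        p = 0.2 if prev == 0 else 0.7  # conditional mean given the past
        b = 1 if rng.random() < p else 0
        empirical_sum += b
        conditional_sum += p
        prev = b
    return abs(empirical_sum - conditional_sum) / k

def deviation_frequency(k, s, trials, seed=0):
    """Fraction of simulated runs whose deviation is at least s."""
    rng = random.Random(seed)
    hits = sum(simulate_deviation(k, rng) >= s for _ in range(trials))
    return hits / trials

k, s = 200, 0.2
print(azuma_bound(k, s), deviation_frequency(k, s, trials=500))
```

The observed frequency is typically far below the bound, as expected for a worst-case martingale inequality.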

Bounding output probabilities by linear programming
Let us show that for the specific Bell inequality we consider, when the value of the Bell expression is small there is weak randomness. Consider a four-partite no-signaling box P(x|u) that obtains a value δ for the Bell expression in Eq. (6). The following lemma shows that for any measurement setting u, the probability of any outcome x is bounded from above by a function of δ.
Lemma 10. Consider a four-partite no-signaling box P(x|u) satisfying
$$B \cdot \{P(x|u)\} \le \delta$$
for some δ ≥ 0 (i.e. with uniform Bell value at most δ/|U|), with B the indicator vector for the Bell expression in Eq. (6) and |U| = 16 the number of settings in the Bell expression. For any measurement setting u* and any output x*, we have
$$\Pr(x^* \mid u^*, Z, W) \le \frac{1 + 2\delta}{3}.$$
Proof. Consider any measurement setting u* and any corresponding output x* for this setting.
Then Pr(x*|u*, Z, W) can be computed by the following linear program:
$$\max_{\{P(x|u)\}} \; M_{u^*,x^*}^{T} \cdot \{P(x|u)\} \quad \text{s.t.} \quad A \cdot \{P(x|u)\} \le c .$$
Here, the indicator vector $M_{u^*,x^*}$ is a 2^4 × 2^4 element vector with entries $M_{u^*,x^*}(x, u) = I_{u=u^*} I_{x=x^*}$. The constraint on the box {P(x|u)}, written as a vector with 2^4 × 2^4 entries, is given by the matrix A and the vector c. These encode the no-signaling constraints between the four parties, the normalization and the positivity constraints on the probabilities P(x|u). In addition, A and c also encode the condition that B·{P(x|u)} ≤ δ, with δ the bound on the Bell value for the box. Analogous programs can be formulated for each of the 2^4 measurement settings appearing in the Bell inequality in Eq. (6) and each of the 2^4 corresponding outputs.
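The solution to the primal linear program in Eq. (39) can be bounded by any feasible solution to the dual program, which is written as
$$\min_{\lambda_{u^*,x^*}} \; c^{T} \lambda_{u^*,x^*} \quad \text{s.t.} \quad A^{T} \lambda_{u^*,x^*} = M_{u^*,x^*}, \quad \lambda_{u^*,x^*} \ge 0 .$$
For each {u*, x*}, we find a feasible $\lambda_{u^*,x^*}$ satisfying the constraints of the dual program above that gives $c^{T} \lambda_{u^*,x^*} \le (1 + 2\delta)/3$. We therefore obtain by the duality theorem of linear programming that
$$\Pr(x^* \mid u^*, Z, W) \le \frac{1 + 2\delta}{3},$$
which is the required bound.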
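The duality step used above only needs weak duality, which can be spelled out in one line. The following is a sketch under the assumption that the primal program has the form max M^T P subject to A P ≤ c and the dual has the form min c^T λ subject to A^T λ = M, λ ≥ 0 (with P standing for the vector {P(x|u)} and M for the indicator vector of the chosen setting and output):

```latex
% Weak duality: any dual-feasible \lambda upper-bounds any primal-feasible P.
% Assumed forms: primal  max M^T P  s.t.  A P \le c ;
%                dual    min c^T \lambda  s.t.  A^T \lambda = M, \ \lambda \ge 0 .
\begin{align}
M^{T} P \;=\; (A^{T}\lambda)^{T} P \;=\; \lambda^{T} (A P) \;\le\; \lambda^{T} c \;=\; c^{T}\lambda ,
\end{align}
% where the inequality uses \lambda \ge 0 together with A P \le c (entrywise).
```

Hence exhibiting a single feasible λ with c^T λ ≤ (1 + 2δ)/3 suffices to bound the output probability for every box satisfying the constraints, without solving the primal program.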

Obtaining randomness of individual conditional boxes
Let us start with a simple lemma relating the SV Bell value with the true Bell value.
Proof. It follows directly from the definition of an SV source, since each four-bit setting u drawn from an ε-SV source has probability ν(u) ≥ (1/2 − ε)^4. Consider the test for the Bell value in Protocol 2, namely for each device j, one rejects unless the condition of Eq. (44) holds, for constants µ and δ. We now show that when the test accepts for device j, with high probability at most a (1 − µ) fraction of the n boxes have the uniform Bell value B_U ≥ δ/|U|.
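Lemma 12. Assume that the test given by Eq. (44) for the box P(x_1, ..., x_n|u_1, ..., u_n) accepts (for fixed constants µ and δ). Consider the set I of indices i for which the conditional box $P_{x_{<i},u_{<i}}$ has uniform Bell value B_U ≥ δ/|U|. Then, with high probability, |I| ≤ (1 − µ)n.
Proof. If the test accepts, the empirical average of the Bell indicator satisfies Eq. (44). Then, from Eq. (26) in Section IV A, the arithmetic average of the conditional means is, with high probability, close to this empirical average. Then, from Eq. (42), with high probability we have (|I|/n) δ ≤ (1 − µ)δ, and the fraction of boxes with average value B_U ≥ δ/|U| is at most (1 − µ).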

C. From randomness of conditional boxes to min-entropy source
In this section we show that if a device is such that a linear number of conditional boxes have randomness (in the weak sense that the probability of their outputs is bounded away from one), then the distribution on outputs constitutes a min-entropy source. The considerations in this section will be applicable to any of the devices j ∈ [k] and any chosen block. Therefore we will skip the indices for simplicity. We will show that if, with large probability over sequences x_1, u_1, ..., x_n, u_n, a linear fraction of those boxes have, for any setting, probability of any output bounded away from 1, then the total probability distribution is close in variational norm to a min-entropy source (see [19-21] for a similar result in the context of randomness expansion).
We first prove that if this happens for all sequences (i.e. with probability 1), then the total box is a min-entropy source itself, and subsequently consider the case when the probability is close to 1.
Lemma 13. Fix any measure P on the space of sequences (x_1, u_1, ..., x_n, u_n). Suppose that for a sequence (x_1, u_1, ..., x_n, u_n), there exists K ⊆ [n] of size larger than µn, such that for all l ∈ K the conditional boxes $P_{x_{<l},u_{<l}}(x_l|u_l)$ satisfy
$$P_{x_{<l},u_{<l}}(x_l \mid u_l) \le \gamma$$
for some 0 ≤ γ < 1. Then for any sequence (u_1, ..., u_n), P(x_1, ..., x_n|u_1, ..., u_n) satisfies
$$P(x_1, \ldots, x_n \mid u_1, \ldots, u_n) \le \gamma^{\mu n}.$$
Proof. The proof proceeds by successive application of the Bayes rule and the time-ordered non-signaling structure.
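The chain of steps behind this argument can be written out explicitly. The following is a sketch, assuming the hypothesis of the lemma is that the conditional boxes with index in K have output probability at most γ, and that time-ordered no-signaling means the l-th output does not depend on later inputs:

```latex
% Bayes rule, then time-ordered no-signaling (x_l is independent of u_{>l}
% given the past), then the assumed bound \gamma on the boxes in K.
\begin{align}
P(x_1,\ldots,x_n \mid u_1,\ldots,u_n)
  &= \prod_{l=1}^{n} P(x_l \mid x_{<l},\, u_1,\ldots,u_n) \\
  &= \prod_{l=1}^{n} P_{x_{<l},u_{<l}}(x_l \mid u_l) \\
  &\le \prod_{l \in K} P_{x_{<l},u_{<l}}(x_l \mid u_l)
   \;\le\; \gamma^{|K|} \;\le\; \gamma^{\mu n}.
\end{align}
% Consequently, for every fixed input sequence,
% H_{\min} \ge \mu n \log(1/\gamma), i.e. the min-entropy is linear in n.
```

The first inequality simply drops the factors with index outside K, each of which is at most one; the last uses |K| > µn and γ < 1.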

D. Imposing independence between devices: de Finetti bounds
Consider two devices, the first consisting of n boxes and the second consisting of N_2 blocks of n boxes each. In this section, we show that for a suitable choice of N_2, the boxes from the first device are close to being uncorrelated with the boxes in a block chosen from the second device using an ε-SV source. The lemma follows a similar formulation to Lemma 6 in [8], which considered the case n = 1 for an arbitrary number of devices, and is based on the information-theoretic approach of [30,31] for proving de Finetti theorems for quantum states and non-signaling distributions.
We denote the box by $P(X^1, X^2_{\le N_2} \mid U^1, U^2_{\le N_2})$, where the superscript denotes the device and the subscript denotes the block of boxes. Capital letters denote the inputs and outputs for a set of n boxes, so that $X^1 = (x^1_1, \ldots, x^1_n)$ and $X^2_{\le N_2} = (x^2_{1,1}, \ldots, x^2_{n,1}, \ldots, x^2_{1,N_2}, \ldots, x^2_{n,N_2})$, with the second subscript denoting the block.
Lemma 15. Let $P(X^1, X^2_{\le N_2} \mid U^1, U^2_{\le N_2})$ be a (2; n, N_2 n) time-ordered non-signaling distribution, with output and input alphabets Σ and Λ, respectively (i.e. $P : \Sigma^{\times (N_2+1)n} \times \Lambda^{\times (N_2+1)n} \to \mathbb{R}_+$). The distribution P represents two devices, with the first containing n boxes and the second N_2 blocks of n boxes each. Let A_2 ∈ [N_2] and $(U^1, U^2_{\le N_2})$ be chosen from an ε-SV source $\nu(A_2, U^1, U^2_{\le N_2})$. Then for every positive real t, the bound of Eq. (57) holds with T given by Eq. (58), where q is the conditional box given the inputs $U^2_{<A_2}$ and outputs $X^2_{<A_2}$ of all boxes prior to the ones in block A_2, and $\nu_{A_2}$ is the probability ν conditioned on the value of A_2.
Proof. Using the upper bound on mutual information I(A : B) ≤ min(log |A|, log |B|) and the chain rule I(A : BC) = I(A : B) + I(A : C|B), we have that for every We now use Pinsker's inequality relating the mutual information and trace distance as and the convexity of x → x^2 to obtain By Markov's inequality, with Finally, to obtain Eq. (57) we replace t by t 2 N log (1+2ε) 2 in the equation above.
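The two information-theoretic ingredients of this proof can be checked on a small example. The Python sketch below is an illustration only: it assumes the standard form of Pinsker's inequality for mutual information measured in bits, namely ‖P_AB − P_A ⊗ P_B‖_1² ≤ 2 ln 2 · I(A : B), which may differ by convention-dependent constants from the form used in the proof; all function names are hypothetical.

```python
import math

def marginals(joint):
    """Marginal distributions of a joint distribution {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0.0) + q
        pb[b] = pb.get(b, 0.0) + q
    return pa, pb

def mutual_information(joint):
    """I(A:B) in bits."""
    pa, pb = marginals(joint)
    return sum(q * math.log2(q / (pa[a] * pb[b]))
               for (a, b), q in joint.items() if q > 0)

def l1_distance_to_product(joint):
    """L1 distance between the joint distribution and the product of its marginals."""
    pa, pb = marginals(joint)
    return sum(abs(joint.get((a, b), 0.0) - pa[a] * pb[b]) for a in pa for b in pb)

def pinsker_holds(joint, tol=1e-12):
    """Check the assumed Pinsker bound: L1^2 <= 2 ln(2) * I(A:B)."""
    lhs = l1_distance_to_product(joint) ** 2
    rhs = 2 * math.log(2) * mutual_information(joint)
    return lhs <= rhs + tol

correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(mutual_information(correlated), l1_distance_to_product(correlated))
print(pinsker_holds(correlated), pinsker_holds(independent))
```

For binary variables the mutual information is at most log 2 = 1 bit, in line with the upper bound I(A : B) ≤ min(log |A|, log |B|) used at the start of the proof.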
The lemma above can be generalized to the scenario where one considers k > 2 devices, which will be used in Protocol II in Fig. 3. Consider k devices, with the j-th device consisting of N_j blocks of n boxes each, for j ∈ [k]. The generalization shows that, for suitable choices of N_j, the k blocks of boxes, chosen one from each device using an ε-SV source, are close to being uncorrelated with each other.
In the following, we consider the time-ordered non-signaling box $P(X^1_{\le N_1}, \ldots, X^k_{\le N_k} \mid U^1_{\le N_1}, \ldots, U^k_{\le N_k})$. Here, the superscripts denote the device and the subscripts denote the block of boxes, while capital letters indicate the set of boxes within the block. Therefore, $U^j_{\le N_j}$ and $X^j_{\le N_j}$ denote the inputs and outputs of the blocks of boxes numbered 1 to $N_j$ in the $j$-th device, with $X^j_{\le N_j}$ being short for the outputs $x^j_{1,1}, \ldots, x^j_{n,1}, x^j_{n+1,2}, \ldots, x^j_{2n,2}, \ldots, x^j_{(N_j-1)n+1,N_j}, \ldots, x^j_{N_j n,N_j}$, where the second subscript labels the block. The proof of the following lemma follows a route analogous to that of Lemma 15 above and of a similar lemma from our previous paper [8].
Lemma 16. Let $P(X^1_{\le N_1}, \ldots, X^k_{\le N_k} \mid U^1_{\le N_1}, \ldots, U^k_{\le N_k})$ be a $(k; N_1 n, \ldots, N_k n)$ time-ordered non-signaling distribution, with output and input alphabets $\Sigma$ and $\Lambda$, respectively (i.e. $P : \Sigma^{\times n \sum_j N_j} \times \Lambda^{\times n \sum_j N_j} \to \mathbb{R}_+$). The distribution $P$ represents $k$ devices, with the $j$-th device containing $n N_j$ boxes. Assume $(A_1, \ldots, A_k) \in [N_1] \times \ldots \times [N_k]$ are chosen using an ε-SV source $\nu$ that is uncorrelated with the box $P$. Then for any probability distribution $\nu(U^1_{\le N_1}, \ldots, U^k_{\le N_k})$ and for every set of positive reals $\{t_2, \ldots, t_k\}$, we have with where $q$ is the conditional box given the inputs $U^j_{<A_j}$ and outputs $X^j_{<A_j}$ of all prior boxes for all $1 \le j \le k$, and $\nu_{A_1,\ldots,A_k}$ is the probability $\nu$ conditioned on measuring $(A_1, \ldots, A_k)$. Moreover, there exists a suitable choice of block sizes $N_j$ and reals $\{t_i\}$ such that for some suitably large parameter $t$.
The proof is completely analogous to the proof of Lemma 6 of [8] and is given in the appendix for the convenience of the reader.

E. Robustness of the protocol
In the remainder of this section, we estimate the amount of noise that the protocol can tolerate. Suppose we are given a box $P(x_1, \ldots, x_n | u_1, \ldots, u_n)$ such that for every input and output and $\forall i \in [n]$, $B.\{P_{x_{<i},u_{<i}}(x_i|u_i)\} \le \delta$. This will be the case, for example, if all the entangled states and measurements used to produce a box are only O( δ)-close to the ones that would lead to a box maximally violating the Bell inequality (i.e. $n$ copies of the entangled state given by Eq. (9), each measured in the bases given by Eqs. (10) and (11)).
Lemma 17. Consider the verification procedure applied to a box $P(x_1, \ldots, x_n | u_1, \ldots, u_n)$ such that for every input and output and $\forall i \in [n]$, $B.\{P_{x_{<i},u_{<i}}(x_i|u_i)\} \le \delta$. Then the test accepts with probability Proof. We have that for all $1 \le i \le n$, Denoting as before, B SV i From Eq. (70) and the relation between the uniform and non-uniform Bell values in Eq. (42), we have

VI. APPENDIX: PROOF OF LEMMA 16
Proof. Again, the proof of the above lemma follows the same steps as that of Lemma 15, with the inputs $U^j_{<A_j}$ and outputs $X^j_{<A_j}$ now understood to have occurred earlier in time than $X^j_{A_j}$ for each $1 \le j \le k$.
We first analyze the case when the k blocks (A 1 , . . ., A k ) are chosen from the uniform distribution over [N 1 ] × . . .× [N k ] and then consider the scenario where they are chosen from an ε-SV source.With (A 1 , . . ., A k ) chosen uniformly, we first show that each of the j blocks is approximately in a product state with the previous j − 1 blocks of boxes.The product form of all blocks will then follow by application of the triangle inequality.
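The triangle-inequality step can be illustrated on a toy example with three classical variables. If $P_{ABC}$ is close to $P_{AB} \otimes P_C$ and $P_{AB}$ is close to $P_A \otimes P_B$, then $\|P_{ABC} - P_A \otimes P_B \otimes P_C\|_1 \le \|P_{ABC} - P_{AB} \otimes P_C\|_1 + \|P_{AB} - P_A \otimes P_B\|_1$, because tensoring with the fixed distribution $P_C$ does not change the 1-norm. The sketch below checks this on random joint distributions. It is an illustration under simplifying assumptions (ordinary probability distributions rather than conditional boxes, hypothetical helper names), not the argument of the lemma itself.

```python
import itertools
import random


def random_joint(dims, rng):
    """Random strictly positive joint distribution, keyed by index tuples."""
    keys = list(itertools.product(*(range(d) for d in dims)))
    weights = [rng.random() for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}


def marginal(p, axes):
    """Marginal of p on the given tuple of axes."""
    out = {}
    for key, prob in p.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + prob
    return out


def product_gaps(p):
    """For a joint p over (A, B, C), return the three 1-norm distances:

    full   = || P_ABC - P_A x P_B x P_C ||_1
    step_c = || P_ABC - P_AB x P_C ||_1
    step_b = || P_AB  - P_A x P_B ||_1
    """
    pa, pb, pc = (marginal(p, (i,)) for i in range(3))
    pab = marginal(p, (0, 1))
    full = sum(abs(v - pa[(a,)] * pb[(b,)] * pc[(c,)])
               for (a, b, c), v in p.items())
    step_c = sum(abs(v - pab[(a, b)] * pc[(c,)])
                 for (a, b, c), v in p.items())
    step_b = sum(abs(v - pa[(a,)] * pb[(b,)])
                 for (a, b), v in pab.items())
    return full, step_c, step_b


rng = random.Random(2)
results = [product_gaps(random_joint((2, 3, 2), rng)) for _ in range(20)]
for full, step_c, step_b in results[:3]:
    print(f"full = {full:.4f}, step_c + step_b = {step_c + step_b:.4f}")
```

Iterating the same step block by block is what turns the $k - 1$ pairwise statements into a single bound on the distance from the full product form, with the individual errors adding up.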
As in Lemma 15, applying the upper bound on mutual information $I(A : B) \le \min(\log |A|, \log |B|)$ and the chain rule $I(A : BC) = I(A : B) + I(A : C|B)$, we have When $A_k$ is chosen from the uniform distribution $U[N_k]$, we get Using Pinsker's inequality from Eq. (62) relating the mutual information and trace distance and the convexity of $x \mapsto x^2$, we obtain The above argument can also be applied to the box q(X 1 ≤N 1 , . . ., Using Eqs. (77) and (78), the triangle inequality and the monotonicity of the 1-norm under discarding subsystems, we obtain Following the above reasoning for the $(k-2)$-th box up to the second, we find Let us now show how to extend the argument to the case when $(A_1, \ldots, A_k)$ are chosen from an ε-SV source. Let From the chain-rule argument presented before we have Then by Pinsker's inequality and the convexity of $x \mapsto x^2$, By Markov's inequality, A 1 ,...,A i−1 ,A i+1 ,...,A k )∼ν A i E U 1 Then from the definition of an ε-SV source, E (A 1 ,...A k )∼ν E U 1 ∼P R i = E A i ∼ν E (A 1 ,...,A i−1 ,A i+1 ,...,A k )∼ν A i E U 1 E A i ∼U [N i ] E (A 1 ,...,A i−1 ,A i+1 ,...,A k )∼ν A i E U 1