Quantum-secure covert communication on bosonic channels

Computational encryption, information-theoretic secrecy and quantum cryptography offer progressively stronger security against unauthorized decoding of messages contained in communication transmissions. However, these approaches do not ensure stealth—that the mere presence of message-bearing transmissions be undetectable. We characterize the ultimate limit of how much data can be reliably and covertly communicated over the lossy thermal-noise bosonic channel (which models various practical communication channels). We show that whenever there is some channel noise that cannot in principle be controlled by an otherwise arbitrarily powerful adversary—for example, thermal noise from blackbody radiation—the number of reliably transmissible covert bits is at most proportional to the square root of the number of orthogonal modes (the time-bandwidth product) available in the transmission interval. We demonstrate this in a proof-of-principle experiment. Our result paves the way to realizing communications that are kept covert from an all-powerful quantum adversary.

Encryption prevents unauthorized access to transmitted information, a security need critical to modern-day electronic communication. Conventional computationally-secure encryption [1,2], information-theoretic secrecy [3,4], and quantum cryptography [5] offer progressively higher levels of security. Quantum key distribution (QKD) allows two distant parties to generate shared secret keys over a lossy-noisy channel that are secure from the most powerful adversary allowed by physics. This shared secret, when subsequently used to encrypt data using the one-time-pad cipher [6], yields the most powerful form of encryption. However, encryption does not mitigate the threat to the users' privacy from the discovery of the very existence of the message itself (e.g., seeking of "meta-data" as detailed in the recent Snowden disclosures [7]), nor does it provide the means to communicate when the adversary forbids it. Thus, low probability of detection (LPD) or covert communication systems are desirable that not only protect the message content, but also prevent the detection of the transmission attempt. Here we delineate, and experimentally demonstrate, the ultimate limit of covert communication that is secure against the most powerful adversary physically permissible, the same benchmark of security to which quantum cryptography adheres for encrypted communication.
Covert communication is an ancient discipline [8] revived by the communication revolution of the last century. Modern developments include spread-spectrum radio-frequency (RF) communication [9], where the signal power is suppressed below the noise floor by bandwidth expansion; and steganography [10], where messages are hidden in fixed-size, finite-alphabet covertext objects such as digital images. We recently characterized the information-theoretic limit of classical covert communication on an additive white Gaussian noise (AWGN) channel, the standard model for RF channels [11,12]. We showed that the sender Alice can reliably transmit O(√n) bits to the intended receiver Bob in n AWGN channel uses with arbitrarily low probability of detection by the adversary Willie. Thus, a non-trivial burst of covert bits can be transmitted when n is large. Our work was generalized to other channel settings [13-16]. Similar square-root laws were also found in steganography [17], where it was shown that Alice can modify O(√n) symbols in a covertext of size n, embedding O(√n log n) hidden bits [10,18-22].
Optical signaling [23,24] is particularly attractive for covert communication due to its narrow diffraction-limited beam spread in free space [25,26] and the ease of detecting fiber taps using time-domain reflectometry [27]. Our information-theoretic analysis of covert communication on the AWGN channel also applies to a lossy optical channel with additive Gaussian noise when Alice uses a laser-light transmitter and both Bob and Willie use coherent-detection receivers. However, modern high-sensitivity optical communication components are primarily limited by noise of quantum-mechanical origin. Thus, recent studies on the performance of physical optical communication have focused on this quantum-limited regime [28-30]. Here we establish the quantum limits of covert communication. We demonstrate that covert communication is impossible over a pure-loss channel. However, when the channel has any excess noise (e.g., the unavoidable thermal noise from the blackbody radiation at the operating temperature), Alice can reliably transmit O(√n) covert bits to Bob using n optical modes, even if Willie intercepts all the photons not reaching Bob and employs arbitrary quantum memory and measurements. This is achievable using standard laser-light modulation and homodyne detection (thus the Alice-Bob channel is still an AWGN channel). Thus, noise enables stealth. Indeed, if Willie's detector contributes excess noise (e.g., dark counts in photon-counting detectors), Alice can covertly communicate to Bob, even when the channel itself is pure-loss. We also show that the square-root limit cannot be outperformed. We corroborate our theoretical results with a proof-of-concept experiment, where the excess noise in Willie's detection is emulated by dark counts of his single photon detector. This is the first known implementation of a truly quantum-information-theoretically secure covert communication system that allows communication when all transmissions are prohibited.

INFORMATION-THEORETICALLY COVERT COMMUNICATION
Quantum and classical information-theoretic analyses of covert communication consider the reliability and detectability of a transmission. We introduce these concepts next.
Reliability: We consider a scenario where Alice attempts to transmit M bits to Bob using n optical modes while Willie attempts to detect her transmission attempt. Each of the $2^M$ possible M-bit messages maps to an n-mode codeword, and their collection forms a codebook. Since we consider single-spatial-mode fiber and free-space optical channels, each of the n modes in the codeword corresponds to a signaling interval carrying one modulation symbol. Desirable codebooks ensure that the codewords, when corrupted by the channel, are distinguishable from one another. This provides reliability: a guarantee that the probability of Bob's error in decoding Alice's message satisfies $P_e^{(b)} < \delta$ with arbitrarily small δ > 0 for n large enough. In practice, error-correction codes (ECCs) are used to enable reliability.
Detectability: Willie's detector reduces to a binary hypothesis test of Alice's transmission state given his observations of the channel. Denote by $P_{FA}$ the probability that Willie raises a false alarm when Alice does not transmit, and by $P_{MD}$ the probability that Willie misses the detection of Alice's transmission. Under the assumption of equal prior probabilities on Alice's transmission state (unequal prior probabilities do not affect the asymptotics [11]), Willie's detection error probability is $P_e^{(w)} = (P_{FA} + P_{MD})/2$. Alice desires a reliable signaling scheme that is covert, i.e., ensures $P_e^{(w)} \ge 1/2 - \epsilon$ for an arbitrarily small ε > 0 regardless of Willie's quantum measurement choice (since $P_e^{(w)} = 1/2$ for a random guess). By decreasing her transmission power, Alice can decrease the effectiveness of Willie's hypothesis test at the expense of the reliability of Bob's decoding. Information-theoretically secure covert communication is both reliable and covert. To achieve it, prior to transmission, Alice and Bob share a secret, the cost of which we assume to be substantially less than that of being detected by Willie. Secret-sharing is consistent with other information-hiding systems [10-12,18-22]; however, as evidenced by the recent results for a restricted class of channels [14,15], we believe that certain scenarios (e.g., Willie's channel from Alice being worse than Bob's) will allow secretless optical covert communication.
Here we outline the theoretical development of quantum-information-theoretically secure covert optical communication. Formal theorem statements are deferred to the Methods, with detailed proofs in the Supplementary Information.
Channel model: Consider a single-mode quasi-monochromatic lossy optical channel $\mathcal{E}_{\eta_b}^{\bar{n}_T}$ of transmissivity $\eta_b \in (0, 1]$ and thermal noise mean photon number per mode $\bar{n}_T \ge 0$, as depicted in Figure 1. Willie collects the entire $\eta_w = 1 - \eta_b$ fraction of Alice's photons that do not reach Bob but otherwise remains passive, not injecting any light into the channel. Later we argue that being active does not help Willie to detect Alice's transmissions. For a pure-loss channel ($\bar{n}_T = 0$), the environment input is in the vacuum state $\hat\rho_0^E = |0\rangle\langle 0|^E$, corresponding to the minimum noise [...] bits be transmitted in n optical modes. However, PPM performs well in the low photon number regime [32] and the symmetry of its symbols enables the use of many efficient ECCs.
To communicate covertly, Alice and Bob use a fraction $\zeta = O(\sqrt{Q/n})$ of the n/Q available PPM frames on average, effectively using $\bar{n} = O(1/\sqrt{n})$ photons per mode. By keeping secret which frames they use, Alice and Bob force Willie to examine all of them, increasing the likelihood of dark counts. An ECC that is known by Willie ensures reliability. However, the transmitted pulse positions are scrambled within the corresponding PPM frames via an operation resembling one-time pad encryption [6], preventing Willie's exploitation of the ECC's structure for detection (rather than protecting the message content). Theorem 4 demonstrates that, using this scheme, Alice reliably transmits $O(\sqrt{n/Q}\,\log Q)$ covert bits at the cost of pre-sharing $O(\sqrt{n/Q}\,\log n)$ secret bits.
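As a rough numerical illustration of the square-root scaling in Theorem 4, the following Python sketch tabulates how the expected number of occupied PPM frames and the raw number of PPM bits grow with n. It assumes a frame-selection probability of the form ζ = c√(Q/n); the constant c = 0.25 and the function name `covert_ppm_budget` are illustrative choices, and the bit count ignores the rate loss of the error-correction code.

```python
import math

def covert_ppm_budget(n, Q, c=0.25):
    """Expected PPM usage when each frame is used with probability zeta.

    Assumes zeta = c * sqrt(Q / n); c is an illustrative constant.
    Returns (zeta, expected_frames_used, raw_bits), where raw_bits counts
    log2(Q) bits per occupied frame and ignores ECC overhead.
    """
    num_frames = n // Q                      # n modes grouped into Q-ary frames
    zeta = c * math.sqrt(Q / n)              # fraction of frames actually used
    expected_frames_used = zeta * num_frames
    raw_bits = expected_frames_used * math.log2(Q)
    return zeta, expected_frames_used, raw_bits

if __name__ == "__main__":
    for n in (3_200_000, 12_800_000, 32_000_000):
        zeta, frames, bits = covert_ppm_budget(n, Q=32)
        print(f"n={n:>10d}  zeta={zeta:.3e}  frames={frames:8.1f}  raw bits={bits:8.1f}")
```

Under this assumption, quadrupling n doubles the raw bit count, which is the square-root behaviour the theorem describes.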

EXPERIMENTAL RESULTS
Objective and design: To demonstrate the square-root law of covert optical communication we realized a proof-of-concept test-bed implementation. Alice and Bob engage in an n-mode communication session consisting of n/Q Q-ary PPM frames, Q = 32. As described in the Methods, Alice transmits ζn/Q PPM symbols on average, using a first order Reed-Solomon (RS) code for error correction. RS codes perform well on channels dominated by erasures, which occur in low received-power scenarios, e.g., covert and deep space communication [33]. We defer the specifics of the generation of the transmitted signal to the Methods. We varied n from $3.2 \times 10^6$ to $3.2 \times 10^7$ in several communication [...]
[Figure 3 caption, partially recovered: "[...] I, where $C_s$ is the per-symbol Shannon capacity [34]. Given the low observed symbol error rate for $\zeta = 0.25\sqrt{Q/n}$, we note that a square root scaling is achievable even using a relatively short RS code; Figure 4 demonstrates that this is achieved covertly."]
Implementation: The experiment was conducted using a mixture of fiber-based and free-space optical elements implementing channels from Alice to both Bob and Willie (see Figure 2 for a schematic). Due to the low intensity of Alice's pulses, direct detection using single photon detectors (SPDs), rather than PNR receivers, was sufficient. Several configurations were considered for implementing the background noise at the receivers. We provided noise only during the gating period of the detectors since continuous wave light irradiating Geiger-mode avalanche photodiodes (APDs) suppresses detection efficiency [35]. Instead of providing extraneous optical pulses during the gating window of the APD, we emulated optical noise at the detectors by increasing the detector gate voltage, thus increasing the detector's dark click probability. While the APD dark counts are Poisson-distributed with mean rate $\bar{n}_N$ photons per mode, when $\bar{n}_N \ll 1$, the dark click probability $1 - e^{-\bar{n}_N}$ is close to $\frac{\bar{n}_N}{1+\bar{n}_N}$, the probability that an incoherent thermal background with mean photon number per mode $\bar{n}_N$ produces a click. In Table I we report the experimentally-observed and targeted values of dark click probabilities p [...] are the quantum efficiencies of Bob's and Willie's detectors, which we do not explicitly calculate. However, quantum efficiency is strongly correlated with the detector's dark click probability [36].
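The closeness of the Poisson dark-click probability to the thermal-background click probability at low mean photon number can be checked numerically. The short Python sketch below compares the two expressions; the function names and the sample values of the mean photon number are illustrative assumptions, not values from Table I.

```python
import math

def poisson_click_prob(n_bar):
    """Probability of at least one count for Poisson counts with mean n_bar."""
    return 1.0 - math.exp(-n_bar)

def thermal_click_prob(n_bar):
    """Probability of at least one photon in a thermal state with mean n_bar."""
    return n_bar / (1.0 + n_bar)

if __name__ == "__main__":
    for n_bar in (1e-1, 1e-2, 1e-3):
        p_poisson = poisson_click_prob(n_bar)
        p_thermal = thermal_click_prob(n_bar)
        rel = abs(p_poisson - p_thermal) / p_thermal
        print(f"n_bar={n_bar:.0e}  Poisson={p_poisson:.6e}  "
              f"thermal={p_thermal:.6e}  relative gap={rel:.2e}")
```

The two expressions agree to first order in the mean photon number, so their gap shrinks quadratically as the mean decreases.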
The amount of transmitted information, with other parameters fixed, is proportional to $\bar{n}^{(b)}_{\rm det}/\bar{n}$ [...]
Our choice of $\bar{n}^{(b)}_{\rm det} \gg \bar{n}^{(w)}_{\rm det}$ allowed the experiment to gather a statistically meaningful data sample in a reasonable duration. In an operational free-space laser communication system, a directional transmitter will likely yield just such an asymmetry in coupling between Bob and Willie; however, we note that the only fundamental requirement for implementing information-theoretically secure covert communication is p [...]
Results: Alice and Bob use a (31, 15) RS code. Figure 3 reports the number of bits received by Bob with the corresponding symbol error rate in our experiments, and his maximum throughput from Alice (calculated for each regime using the experimentally-observed values from Table I). The details of our analysis are in the Methods. Our relatively short RS code achieves between 45% and 60% of the maximum throughput in the "careful Alice" regime and between 55% and 75% of the maximum in other regimes at reasonable error rates, showing that even a basic code demonstrates our theoretical scaling.
Willie's detection problem can be reduced to a test between two simple hypotheses where the log-likelihood ratio test minimizes $P_e^{(w)}$ [37]. Figure 4 reports Willie's probability of error estimated from the experiments and the Monte-Carlo study, as well as its analytical Gaussian approximation, with the implementation details deferred to the Methods. Monte-Carlo simulations show that the Gaussian approximation is accurate. More importantly, Figure 4 highlights Alice's safety when she obeys the square root law and her peril when she does not. When ζ scales according to the square root law, $P_e^{(w)}$ remains constant as n increases. However, for asymptotically larger ζ, $P_e^{(w)}$ drops at a rate that depends on Alice's carelessness. The drop at ζ = 0.008 vividly demonstrates our converse.

DISCUSSION
We determined that covert communication is achievable provided that the adversary's measurement is subject to non-adversarial excess noise. Excess noise is crucial, as pure loss alone does not allow covert communication, starkly contrasting the QKD scenario. However, the existence of excess noise in practical systems (e.g., blackbody radiation and dark counts) allows covert communication, as demonstrated for the first time in our proof-of-concept optical covert communication system. Even though our results are for an optical channel, they are relevant to RF communication due to the recent advances in quantum-noise-limited microwave-frequency amplifiers and detectors [38]. Finally, our work provides a significant impetus towards the development of covert optical networks, eventually scaling privacy to large interconnected systems.

Covert Optical Communication Theorems
Here we state our theorems, with proofs deferred to the Supplementary Information. Each theorem can be classified as either an "achievability" or a "converse". Achievability theorems (2, 3, and 4) establish the lower limit on the amount of information that can be covertly transmitted from Alice to Bob, while the converse theorems (1 and 5) demonstrate the upper limit. In essence, the achievability results are obtained by 1. fixing Alice's and Bob's communication system, revealing its construction in entirety (except the shared secret) to Willie; 2. showing that, even with such information, any detector Willie can choose within some natural constraints is ineffective at discriminating Alice's transmission state; and 3. demonstrating that the transmission can be reliably decoded by Bob using the shared secret.
On the other hand, converses are established by 1. fixing Willie's detection scheme (and revealing it to Alice and Bob); and 2. demonstrating that no amount of resources allows Alice to both remain undetected by Willie and exceed the upper limit on the amount of information that is reliably transmitted to Bob.
We start by claiming the inability to instantiate covert communication in the absence of excess noise.
Theorem 1 (Insufficiency of pure-loss for covert communication) Suppose Willie has a pure-loss channel from Alice and is limited only by the laws of physics in his receiver measurement choice. Then Alice cannot communicate to Bob reliably and covertly even if Alice and Bob have access to a pre-shared secret of unbounded size, an unattenuated observation of the transmission, and a quantum-optimal receiver.
Next we claim the achievability of the square root law when Willie's channel is subject to excess noise.We first consider a lossy optical channel with additive thermal noise, and claim achievability even when Willie has arbitrary resources such as any quantum-limited measurement on the isometric extension of the Alice-to-Bob quantum channel (i.e., access to all signaling photons not captured by Bob).
Theorem 2 (Square root law for the thermal noise channel) Suppose Willie has access to an arbitrarily complex receiver measurement as permitted by the laws of quantum physics and can capture all the photons transmitted by Alice that do not reach Bob. Let Willie's channel from Alice be subject to noise from a thermal environment that injects $\bar{n}_T > 0$ photons per optical mode on average, and let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability $P_e^{(w)} \ge \frac{1}{2} - \epsilon$ for any ε > 0 while reliably transmitting O(√n) bits to Bob in n optical modes even if Bob only has access to a (sub-optimal) coherent detection receiver, such as an optical homodyne detector.
In the remaining theorems Willie's detector is a noisy photon number resolving (PNR) receiver. An ideal PNR receiver is an asymptotically optimal detector for Willie in the pure-loss regime (as discussed in the remark following the proof of Theorem 1 in the Supplementary Information). However, any practical implementation of a PNR receiver has a non-zero dark current. Theorems 3 and 4 show that noise from the resulting dark counts enables covert communication even over a pure-loss channel. We model the dark counts per mode in Willie's PNR detector as a Poisson process with average number of dark counts per mode $\lambda_w$.
Theorem 3 (Dark counts yield square root law) Suppose that Willie has a pure-loss channel from Alice, captures all photons transmitted by Alice that do not reach Bob, but is limited to a receiver with a non-zero dark current. Let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability $P_e^{(w)} \ge \frac{1}{2} - \epsilon$ for any ε > 0 while reliably transmitting O(√n) bits to Bob in n optical modes.
The proof of Theorem 3 demonstrates that O(√n) covert bits can be reliably transmitted using on-off keying (OOK) coherent state modulation where Alice transmits the on symbol $|\alpha\rangle$ with probability $q = O(1/\sqrt{n})$ and the off symbol $|0\rangle$ with probability 1 − q. However, the skewed on-off duty cycle of OOK modulation makes construction of efficient error correction codes (ECCs) challenging. We thus consider pulse position modulation (PPM) which constrains the OOK signaling scheme, enabling the use of many efficient ECCs by sacrificing a constant fraction of throughput. Each PPM symbol uses a PPM frame to transmit a sequence of Q coherent state pulses, $|0\rangle \ldots |\alpha\rangle \ldots |0\rangle$, encoding message $i \in \{1, 2, \ldots, Q\}$ by transmitting $|\alpha\rangle$ in the i-th mode of the PPM frame. Next we claim that the square root scaling is achievable under this structural constraint.
Theorem 4 (Dark counts yield square root law under structured modulation) Suppose that Willie has a pure-loss channel from Alice, can capture all photons transmitted by Alice that do not reach Bob, but is limited to a PNR receiver with a non-zero dark current. Let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability $P_e^{(w)} \ge \frac{1}{2} - \epsilon$ for any ε > 0 while reliably transmitting $O(\sqrt{n/Q}\,\log Q)$ bits to Bob using n optical modes and a Q-ary PPM constellation.
Finally, we claim the insurmountability of the square root law. We assume non-zero thermal noise ($\bar{n}_T > 0$) in the channel and non-zero dark count rate ($\lambda_w > 0$) in Willie's detector. Setting $\lambda_w = 0$ yields the converse for Theorem 2, and setting $\bar{n}_T = 0$ yields the converse for Theorems 3 and 4. Setting $\lambda_w = 0$ and $\bar{n}_T = 0$ yields the conditions for Theorem 1. To state the theorem, we use the following asymptotic notation [39]: we say f(n) = ω(g(n)) when g(n) is a lower bound that is not asymptotically tight.
Theorem 5 (Converse of the square root law) Suppose Alice only uses n-mode codewords with total photon number variance $\sigma_x^2 = O(n)$. Then, if she attempts to transmit ω(√n) bits in n modes, as n → ∞, she is either detected by Willie with arbitrarily low detection error probability, or Bob cannot decode with arbitrarily low decoding error probability.
The restriction on the photon number variance of Alice's input states is not onerous since it subsumes all well-known quantum states of a bosonic mode.However, proving this theorem for input states with unbounded photon number variance per mode remains an open problem.
Next we provide details of the experimental methodology.

Alice's encoder
Prior to communication, Alice and Bob secretly select a random subset S of PPM frames to use for transmission: each of the n/Q available PPM frames is selected independently with probability ζ.Alice and Bob then secretly generate a vector k containing |S| numbers selected independently uniformly at random from {0, 1, . . ., Q − 1}, where |S| denotes the cardinality of S. Alice encodes a message into a codeword of size |S| using a Reed-Solomon (RS) code.She adds k modulo Q to this message and transmits it on the PPM frames in S. We note that this is almost identical to the construction of the coding scheme in the proof of Theorem 4 (see the Supplementary Information), with the exception of the use of an RS code for error correction.
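The secret-sharing and scrambling steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the experimental code: the function names `share_secret` and `scramble` are hypothetical, Python's `random` module stands in for a proper source of shared randomness, and the RS encoder is not implemented (the codeword passed to `scramble` is assumed to be already RS-encoded).

```python
import random

def share_secret(num_frames, zeta, Q, rng):
    """Pick the secret frame subset S and the scrambling vector k.

    Each of the num_frames PPM frames is selected independently with
    probability zeta; k holds one uniform value in {0, ..., Q-1} per
    selected frame.
    """
    S = [i for i in range(num_frames) if rng.random() < zeta]
    k = [rng.randrange(Q) for _ in S]
    return S, k

def scramble(codeword, k, Q):
    """Add the pad k to the codeword symbol-wise, modulo Q."""
    if len(codeword) != len(k):
        raise ValueError("codeword and pad must have the same length")
    return [(c + ki) % Q for c, ki in zip(codeword, k)]

if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed only for reproducibility
    S, k = share_secret(num_frames=1000, zeta=0.05, Q=32, rng=rng)
    # Stand-in for an RS-encoded codeword of length |S| (assumption).
    codeword = [rng.randrange(32) for _ in S]
    transmitted = scramble(codeword, k, Q=32)
    print(len(S), transmitted[:5])
```

Because k is uniform and independent of the codeword, each transmitted pulse position is uniform within its frame, which is the property that hides the ECC structure from Willie.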

Generation of transmitted symbols
Alice generates the length-n binary sequence describing the transmitted signal, with a "1" at a given location indicating a pulse in that mode, and a "0" indicating the absence of a pulse. First, Alice encodes random data, organized into Q-ary symbols, with an RS code and modulo-Q addition of k to produce a coded sequence of Q-ary symbols. The value of the i-th symbol in this sequence indicates which mode in the i-th PPM symbol in the set S contains a pulse, whereas all modes of the PPM frames not in S remain empty. Mapping occupied modes to "1" and unoccupied modes to "0" results in the desired length-n binary sequence.
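The mapping from coded Q-ary symbols to the length-n binary sequence can be sketched as below. The function name `ppm_binary_sequence` is hypothetical, and the sketch assumes frames are laid out contiguously (frame f occupies modes fQ through fQ + Q − 1) with pulse positions indexed from 0.

```python
def ppm_binary_sequence(n, Q, S, symbols):
    """Build the length-n 0/1 sequence for the selected PPM frames.

    S lists the indices of the frames in use; symbols[i] in {0, ..., Q-1}
    is the pulse position inside frame S[i]. Frames not in S stay empty.
    """
    if n % Q != 0:
        raise ValueError("n must be a multiple of Q")
    if len(S) != len(symbols):
        raise ValueError("one symbol is required per selected frame")
    sequence = [0] * n
    for frame, symbol in zip(S, symbols):
        if not 0 <= symbol < Q:
            raise ValueError("symbol out of range")
        sequence[frame * Q + symbol] = 1   # single pulse in this frame
    return sequence

if __name__ == "__main__":
    # Three frames of Q = 4 modes; frames 0 and 2 are in S.
    print(ppm_binary_sequence(n=12, Q=4, S=[0, 2], symbols=[3, 1]))
```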
To accurately estimate Willie's detection error probability in the face of optical power fluctuations, the length-n binary sequence from above is alternated with a length-n sequence of all "0"s, to produce the final length-2n sequence that is passed to the experimental setup.Willie gets a "clean" look at the channel when Alice is silent using these interleaved "0"s, thus allowing the estimation of both the false alarm and the missed detection probabilities under the same conditions.Bob simply discards the interleaved "0"s.

Bob's decoder
Bob examines only the PPM frames in S. If two or more pulses are detected in a PPM frame, one of them is selected uniformly at random. If no pulses are detected, it is labeled as an erasure. After subtracting k modulo Q from this vector of PPM symbols (subtraction is not performed on erasures), the resultant vector is passed to the RS decoder.
For each experiment we record the total number of bits in the successfully-decoded codewords; the undecoded codewords are discarded. For each pair of parameters (ζ, n) we report the mean of the total number of decoded bits over 100 experiments. The reported symbol error rate is the total number of lost data symbols during all the experiments at the specified communication regime divided by the total number of data symbols transmitted during these experiments. The calculation of the theoretical channel capacity is presented in the Supplementary Information.
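Bob's front-end processing, up to the point where symbols are handed to the RS decoder, can be sketched as follows. The names `demodulate_frame`, `descramble`, and the use of `None` as the erasure marker are illustrative assumptions; the RS decoder itself is not shown.

```python
import random

ERASURE = None  # marker for frames in which Bob saw no click (assumption)

def demodulate_frame(clicks, rng):
    """Return the pulse position in one PPM frame, or ERASURE.

    clicks is a length-Q list of 0/1 click indicators. With several
    clicks, one position is chosen uniformly at random.
    """
    positions = [i for i, c in enumerate(clicks) if c]
    if not positions:
        return ERASURE
    return rng.choice(positions)

def descramble(symbols, k, Q):
    """Subtract the pad k modulo Q, leaving erasures untouched."""
    return [s if s is ERASURE else (s - ki) % Q
            for s, ki in zip(symbols, k)]

if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
    received = [demodulate_frame(f, rng) for f in frames]
    print(descramble(received, k=[3, 2, 1], Q=4))
```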

Estimation of $P_e^{(w)}$: The test statistic for the log-likelihood ratio test is defined as:
$$L = \log\frac{f_1(\mathbf{x}_w)}{f_0(\mathbf{x}_w)} = [...] \qquad (1)$$
where $f_0(\mathbf{x}_w)$ and $f_1(\mathbf{x}_w)$ are the likelihood functions of the click record $\mathbf{x}_w$ corresponding to Alice being quiet and transmitting, $y_i$ is the number of clicks Willie observes in the i-th PPM frame, and $p_r^{(w)} = 1 - e^{-\eta_w \bar{n}}$ is the probability of Willie observing a click stemming from Alice's transmission. Equation (1) is derived in the Supplementary Information. Willie calculates L using equation (1) and compares it to a threshold S, accusing Alice if L ≥ S. Willie chooses the value of S that minimizes Willie's detection error probability $P_e^{(w)}$. For each pair of parameters (n, ζ) as well as Alice's transmission state, we perform m experiments, obtaining a sample vector $\mathbf{y}_w$ from each experiment and calculating the log-likelihood ratio L using (1). We denote by [...] the vectors of experimentally observed log-likelihood ratios when Alice does not transmit and transmits, respectively.
To estimate Willie's probability of error $P_e^{(w)}$, we construct empirical distribution functions [...] where $1_{x \in A}$ denotes the indicator function. The estimated probability of error is then [...] (2)
Monte-Carlo simulation and Gaussian approximation: We perform a Monte-Carlo study using $10^5$ simulations per (n, ζ) pair. We generate, encode, and detect the messages as in the physical experiment, and use equation (2) to estimate Willie's probability of error, but simulate the optical channel induced by our choice of a laser-light transmitter and an SPD using its measured characteristics reported in Table I. Similarly, we use the values in Table I for our analytical Gaussian approximation of $P_e^{(w)}$ described in the Supplementary Information.
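One plausible way to turn the two vectors of observed log-likelihood ratios into an estimate of Willie's error probability is to sweep the threshold over the observed values and keep the smallest average of the empirical false-alarm and missed-detection rates. The Python sketch below does this; it is an assumption about the form of the estimator rather than a transcription of equation (2), and the function name `estimate_detection_error` is hypothetical.

```python
def estimate_detection_error(llr_quiet, llr_transmit):
    """Empirical minimum of (P_FA + P_MD) / 2 over the threshold S.

    llr_quiet:    log-likelihood ratios observed when Alice is silent.
    llr_transmit: log-likelihood ratios observed when Alice transmits.
    Willie accuses Alice when L >= S.
    """
    candidates = sorted(set(llr_quiet) | set(llr_transmit))
    # One extra threshold above every sample models "never accuse".
    candidates.append(candidates[-1] + 1.0)
    best = 1.0
    for s in candidates:
        p_fa = sum(l >= s for l in llr_quiet) / len(llr_quiet)
        p_md = sum(l < s for l in llr_transmit) / len(llr_transmit)
        best = min(best, 0.5 * (p_fa + p_md))
    return best

if __name__ == "__main__":
    quiet = [-1.2, -0.4, 0.1, -0.9, 0.3]
    transmit = [0.2, 1.1, -0.3, 0.8, 1.5]
    print(estimate_detection_error(quiet, transmit))
```

The sweep is quadratic in the number of samples, which is acceptable for a sketch; sorting both vectors once would make it linear after the sort.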
Confidence intervals: We compute the confidence intervals for the estimate in equation (2) using the Dvoretzky-Kiefer-Wolfowitz inequality [40,41], which relates the distribution function $F_X(x)$ of random variable X to the empirical distribution function $\hat{F}_m(x) = \frac{1}{m}\sum_{i=1}^{m} 1_{X_i \le x}(x)$ associated with a sequence $\{X_i\}_{i=1}^{m}$ of m i.i.d. draws of the random variable X as follows:
$$P\left(\sup_x \left|\hat{F}_m(x) - F_X(x)\right| > \xi\right) \le 2e^{-2m\xi^2},$$
where ξ > 0. For $x_0$, the (1 − α) confidence interval for the empirical estimate of $F(x_0)$ is given by $[\max\{\hat{F}_m(x_0) - \xi, 0\}, \min\{\hat{F}_m(x_0) + \xi, 1\}]$ where $\xi = \sqrt{\frac{\ln(2/\alpha)}{2m}}$. Thus, ±ξ is used for reporting the confidence intervals in Figure 4.
[...] Before proceeding with the proof, we prove the following lemma:
Lemma 2 Given the input of n-mode vacuum state $|\mathbf{0}\rangle^{E^n}$ on the "environment" port and an n-mode entangled state $|\psi\rangle$ [...] on the "Alice" port of a beamsplitter with transmissivity $\eta_b = 1 - \eta_w$, the diagonal elements of the output state $\hat\rho^{W^n}$ on the "Willie" port can be expressed in the n-fold Fock state basis as follows: [...]
Proof. A beamsplitter can be described as a unitary transformation $U_{BS}$ from the two input modes (Alice's and the environment's ports) to the two output modes (Bob's and Willie's ports). Given a Fock state input $|t\rangle^A$ on Alice's port and vacuum input $|0\rangle^E$ on the environment's port, the output at Bob's and Willie's ports is described as follows [44] (Section IV.D): [...] Thus, [...] which implies [...]. Now, the partial trace of the output state $\hat\rho^{B^nW^n} = |\phi\rangle\langle\phi|^{B^nW^n}$ over Bob's system reveals Willie's output state: [...] where equation (S2) is due to the orthogonality of the Fock states. Thus, [...] where [...] and substituting the right-hand side (RHS) of (S4) into equation (S3) yields [...] where equation (S5) is due to $\eta_w \in [0, 1)$.
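The single-mode fact underlying the beamsplitter step in this proof is that, for a Fock state with t photons on Alice's port and vacuum on the environment port, each photon independently exits at Willie's port with probability $\eta_w$, so Willie's photon count is binomially distributed. The Python sketch below tabulates this distribution as a sanity check; the function name `willie_photon_pmf` is hypothetical and the sketch covers only a single mode, not the n-mode entangled input of the lemma.

```python
from math import comb

def willie_photon_pmf(t, eta_w):
    """Photon-number distribution at Willie's port for a t-photon Fock input.

    Assumes a lossless beamsplitter with vacuum on the environment port,
    so each photon reaches Willie independently with probability eta_w.
    Returns a list p where p[k] = P(Willie receives k photons).
    """
    if not 0.0 <= eta_w <= 1.0:
        raise ValueError("eta_w must lie in [0, 1]")
    return [comb(t, k) * eta_w**k * (1.0 - eta_w)**(t - k)
            for k in range(t + 1)]

if __name__ == "__main__":
    pmf = willie_photon_pmf(t=3, eta_w=0.5)
    print(pmf)                                   # binomial weights for t = 3
    print("P(no photon at Willie) =", pmf[0])    # equals eta_b ** t
```

In particular, Willie sees no photon with probability $(1 - \eta_w)^t = \eta_b^t$, which is strictly less than one whenever t ≥ 1 and $\eta_w > 0$.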
Proof. (Theorem 1) Alice sends one of $2^M$ (equally likely) M-bit messages by choosing an element from an arbitrary codebook $\{\hat\rho_x^{A^n}, x = 1, \ldots, 2^M\}$, where a state $\hat\rho^{A^n}$ [...] is a general n-mode pure state, where $|\mathbf{k}\rangle \equiv |k_1\rangle \otimes |k_2\rangle \otimes \cdots \otimes |k_n\rangle$ is a tensor product of n Fock states. We limit our analysis to pure input states since, by convexity, using mixed states as inputs can only degrade the performance (since that is equivalent to transmitting a randomly chosen pure state from an ensemble and discarding the knowledge of that choice).
Let Willie use an ideal SPD on all n modes, given by positive operator-valued measure (POVM) [...]. When $W_u$ is transmitted, Willie's hypothesis test reduces to discriminating between the states [...] where $\hat\rho_u^{W^n}$ is the output state of a pure-loss channel with transmissivity $\eta_w$ corresponding to an input state $\hat\rho_u^{A^n}$. Thus, Willie's average error probability is: [...] since messages are sent equiprobably. Note that the error is entirely due to missed codeword detections, as Willie's receiver never raises a false alarm. By Lemma 2, [...] is Bob's average probability of error when Alice only sends messages $W_u$ and $W_{g(u)}$ equiprobably. We thus reduce the analytically intractable problem of discriminating between many states in equation (S12) to a quantum binary hypothesis test.
The lower bound on the probability of error in discriminating two received codewords is obtained by lower-bounding the probability of error in discriminating two codewords before they are sent (this is equivalent to Bob having an unattenuated unity-transmissivity channel from Alice). Recalling that $\hat\rho_u^{A^n} = |\psi_u\rangle\langle\psi_u|^{A^n}$ and $\hat\rho_{g(u)}^{A^n} = |\psi_{g(u)}\rangle\langle\psi_{g(u)}|^{A^n}$ are pure states, the lower bound on the probability of error in discriminating between $|\psi_u\rangle^{A^n}$ and $|\psi_{g(u)}\rangle^{A^n}$ is [42] (Chapter IV.2 (c), Equation (2.34)): [...] (S17) where [...] is the fidelity between the pure states $|\psi\rangle$ and $|\phi\rangle$. Lower-bounding [...] lower-bounds the RHS of equation (S17). For pure states $|\psi\rangle$ and $|\phi\rangle$, [...], where $\|\rho - \sigma\|_1$ is the trace distance [45] (Equation (9.134)). Thus, [...] (S18) where the inequality is due to the triangle inequality for trace distance. Substituting (S18) into (S17) yields: [...] (S19)
Since [...] $= |a_0(u)|^2$ and, by the construction of $\mathcal{A}$, $1 - |a_0(u)|^2 \le \frac{4\epsilon}{\eta_w}$ and $1 - |a_0(g(u))|^2 \le \frac{4\epsilon}{\eta_w}$, we have: [...] (S20) Recalling the definition of $P_e(u)$ in equation (S16), we substitute (S20) into (S15) to obtain: [...] (S21) Now, re-stating the condition for covert communication (S11) yields: [...] with (S22) due to $1 - |a_0(u)|^2 > \frac{4\epsilon}{\eta_w}$ for all codewords in the complement $\bar{\mathcal{A}}$ by the construction of $\mathcal{A}$. Solving the inequality in (S22) for $|\mathcal{A}|/2^M$ yields the lower bound on the fraction of the codewords in $\mathcal{A}$, [...] (S23) Combining equations (S21) and (S23) results in a positive lower bound on Bob's probability of decoding error $P_e \ge \frac{1}{4} - \sqrt{\frac{\epsilon}{\eta_w}}$ for $\epsilon \in \left(0, \frac{\eta_w}{16}\right)$ and any n, and demonstrates that reliable covert communication over a pure-loss channel is impossible.
Remark: The minimum probability of discrimination error between the states given by equations (S6) and (S7) satisfies [46] (Section III): [...], the error probability for the SPD is at most twice that of an optimal discriminator. Thus, the SPD is an asymptotically optimal detector when the channel from Alice is pure-loss. Since the photon number resolving (PNR) receiver, given by the POVM elements $\{|0\rangle\langle 0|, |1\rangle\langle 1|, |2\rangle\langle 2|, \ldots\}^{\otimes n}$, could be used to mimic the SPD with the detection event threshold set at one photon, the PNR receiver is also asymptotically optimal in this scenario.
Theorem 3 (Square root law for the thermal noise channel) Suppose Willie has access to an arbitrarily complex receiver measurement as permitted by the laws of quantum physics and can capture all the photons transmitted by Alice that do not reach Bob. Let Willie's channel from Alice be subject to noise from a thermal environment that injects $\bar{n}_T > 0$ photons per optical mode on average, and let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability [...] where (S25) is due to the geometric series [...] $|\alpha_1\rangle \otimes \cdots \otimes |\alpha_n\rangle$ is an n-mode tensor-product coherent state. The codebook is used only once to send a single message and is kept secret from Willie, though he knows how it is constructed.
Analysis (Willie): Since Willie does not have access to Alice's codebook, Willie has to discriminate between the following n-copy quantum states: [...] Willie's average probability of error in discriminating between $\hat\rho_0^{\otimes n}$ and $\hat\rho_1^{\otimes n}$ is [45] (Section 9.1.4): [...] (S26) where the minimum in this case is attained by a PNR detection. The trace distance $\|\hat\rho_0 - \hat\rho_1\|_1$ between states $\hat\rho_0$ and $\hat\rho_1$ is upper-bounded by the quantum relative entropy (QRE) using the quantum Pinsker's inequality [45] (Theorem 11.9.2) as follows: $\|\hat\rho_0 - \hat\rho_1\|_1 \le \sqrt{2D(\hat\rho_0\|\hat\rho_1)}$. Thus, [...] (S27) QRE is additive for tensor product states: [...] By Lemma 5, [...] (S28) The first two terms of the Taylor series expansion of the RHS of (S28) with respect to $\bar{n}$ at $\bar{n} = 0$ are zero and the fourth term is negative. Thus, using Taylor's Theorem with the remainder, we can upper-bound equation (S28) by the third term as follows: [...] (S29) Combining equations (S26), (S27), and (S29) yields: [...] ensures that Willie's error probability is lower-bounded by $P_e^{(w)} \ge \frac{1}{2} - \epsilon$ over n optical modes.
Analysis (Bob): Suppose Bob uses a coherent detection receiver. A homodyne receiver, which is more efficient than a heterodyne receiver in the low photon number regime [28], induces an AWGN channel with noise power [...] $P_e^{(b)}$ by [11] (Equation (9)): [...] (S32) where B is the number of transmitted bits. Substitution of $\bar{n}$ from (S31) into (S32) shows that O(√n) bits can be covertly transmitted from Alice to Bob with $P_e^{(b)} < \delta$ for arbitrary δ > 0 given large enough n.
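For reference, the chain of bounds used in Willie's analysis can be written out as follows. This is a sketch assembled from the standard minimum-error expression for equal priors, the quantum Pinsker inequality, and the additivity of QRE; it does not reproduce the original numbered equations, and the constant c is a placeholder for the coefficient of the quadratic term in the Taylor-series bound on $D(\hat\rho_0\|\hat\rho_1)$.

```latex
\begin{align}
P_e^{(w)} &\ge \frac{1}{2}\left[1 - \frac{1}{2}
   \left\|\hat\rho_0^{\otimes n} - \hat\rho_1^{\otimes n}\right\|_1\right]
   && \text{(optimal discrimination, equal priors)} \\
 &\ge \frac{1}{2} - \frac{1}{4}
   \sqrt{2\,D\big(\hat\rho_0^{\otimes n}\,\big\|\,\hat\rho_1^{\otimes n}\big)}
   && \text{(quantum Pinsker inequality)} \\
 &= \frac{1}{2} - \sqrt{\frac{n}{8}\,D(\hat\rho_0\|\hat\rho_1)}
   && \text{(additivity of QRE)} \\
 &\ge \frac{1}{2} - \bar{n}\sqrt{\frac{c\,n}{8}}
   && \text{(assuming } D(\hat\rho_0\|\hat\rho_1) \le c\,\bar{n}^2\text{)}
\end{align}
```

Under that assumption, any choice $\bar{n} \le \epsilon\sqrt{8/(c\,n)}$ gives $P_e^{(w)} \ge \frac{1}{2} - \epsilon$, which is the $\bar{n} = O(1/\sqrt{n})$ scaling behind the square root law.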
Before proving Theorems 7 and 8, we state a lemma that is used in their proofs.
Lemma 6 (Classical relative entropy bound on $P_e$ of binary hypothesis test) Denote by $P_0$ and $P_1$ the respective probability distributions of observations when $H_0$ and $H_1$ are true. Assuming equal prior probabilities for each hypothesis, the probability of discrimination error is $P_e \geq \frac{1}{2} - \sqrt{\frac{1}{8}D(P_0\|P_1)}$, where $D(P_0\|P_1) = -\sum_x p_0(x)\ln\frac{p_1(x)}{p_0(x)}$ is the classical relative entropy between $P_0$ and $P_1$, and $p_0(x)$ and $p_1(x)$ are the respective probability mass functions of $P_0$ and $P_1$.
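As a numerical sanity check of the bound in Lemma 6, the sketch below compares the exact minimum error probability of an equal-prior binary test, $\frac{1}{2} - \frac{1}{4}\|P_0 - P_1\|_1$, with the relative entropy lower bound $\frac{1}{2} - \sqrt{\frac{1}{8}D(P_0\|P_1)}$. The function names and the two example distributions are illustrative assumptions and are not taken from the experiment.

```python
import math

def relative_entropy(p0, p1):
    """Classical relative entropy D(P0||P1) in nats for p.m.f.s on a common support."""
    return sum(a * math.log(a / b) for a, b in zip(p0, p1) if a > 0)

def optimal_error_probability(p0, p1):
    """Minimum error probability with equal priors: 1/2 - (1/4) * ||P0 - P1||_1."""
    l1_distance = sum(abs(a - b) for a, b in zip(p0, p1))
    return 0.5 - 0.25 * l1_distance

def lemma6_lower_bound(p0, p1):
    """Relative entropy lower bound on the error probability: 1/2 - sqrt(D/8)."""
    return 0.5 - math.sqrt(relative_entropy(p0, p1) / 8.0)

# Hypothetical distributions under H0 and H1 (chosen only for illustration).
p0 = [0.50, 0.30, 0.20]
p1 = [0.45, 0.32, 0.23]

exact = optimal_error_probability(p0, p1)
bound = lemma6_lower_bound(p0, p1)
print(f"exact P_e = {exact:.4f}, Lemma 6 bound = {bound:.4f}")
```

The bound is loose by construction (it follows from Pinsker's inequality), but it is convenient because relative entropy is additive over independent observations, which is what the proofs below exploit.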
Theorem 7 (Dark counts yield square root law) Suppose that Willie has a pure-loss channel from Alice, captures all photons transmitted by Alice that do not reach Bob, but is limited to a receiver with a non-zero dark current. Let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability $P_e^{(w)}$ [...]

Proof. Construction: Let Alice use a coherent state on-off keying (OOK) modulation $\{\pi_i, |\psi_i\rangle\langle\psi_i|\}$, $i = 1, 2$, where $\pi_1 = 1 - q$, $\pi_2 = q$, $|\psi_1\rangle = |0\rangle$, $|\psi_2\rangle = |\alpha\rangle$. Alice and Bob generate a random codebook with each codeword symbol chosen i.i.d. from the above binary OOK constellation.

Analysis (Willie): Willie records vector $y_w = [y_1, \ldots, y_n]$, where $y_i$ is the number of photons observed in the $i$th mode. Denote by $P_0$ the distribution of $y_w$ when Alice does not transmit and by $P_1$ the distribution when she transmits. When Alice does not transmit, Willie's receiver observes a Poisson dark count process with rate $\lambda_w$ photons per mode. Thus, $\{y_i\}$ is an independent and identically distributed (i.i.d.) sequence of Poisson random variables with rate $\lambda_w$, and $P_0 = P_w^n$ where $P_w = \mathrm{Poisson}(\lambda_w)$. When Alice transmits, although Willie captures all of her transmitted energy that does not reach Bob, he does not have access to Alice's and Bob's codebook. Since the dark counts are independent of the transmitted pulses, each observation is a mixture of two independent Poisson random variables. Thus, each $y_i \sim P_s$ is i.i.d., with $P_s = (1 - q)\mathrm{Poisson}(\lambda_w) + q\,\mathrm{Poisson}(\lambda_w + \eta_w|\alpha|^2)$ and $P_1 = P_s^n$. By Lemma 6, $P_e^{(w)} \geq \frac{1}{2} - \sqrt{\frac{1}{8}D(P_0\|P_1)}$. Since the classical relative entropy is additive for product distributions, $D(P_0\|P_1) = nD(P_w\|P_s)$. Now,

$D(P_w\|P_s) = -\sum_{y=0}^{\infty} p_w(y)\ln\frac{p_s(y)}{p_w(y)} \leq \frac{q^2}{2}\left(e^{(\eta_w|\alpha|^2)^2/\lambda_w} - 1\right)$, (S33)

where the inequality is due to Taylor's Theorem with the remainder applied to the Taylor series expansion of equation (S33) with respect to $q$ at $q = 0$.
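The relative entropy step in Willie's analysis can be checked numerically. The sketch below sums $D(P_w\|P_s)$ directly for the Poisson dark-count distribution and the OOK mixture, and compares it with the second-order term $\frac{q^2}{2}\left(e^{(\eta_w|\alpha|^2)^2/\lambda_w} - 1\right)$, which is our reading of the bound implied by (S35). The parameter values (`lam`, `signal`, `q`, `eps`, `n`) and the truncation `kmax` are hypothetical choices for illustration only; `signal` stands for $\eta_w|\alpha|^2$.

```python
import math

def poisson_pmf(k, rate):
    """Poisson p.m.f. evaluated in the log domain for numerical stability."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def relative_entropy_dark_vs_mixture(lam, signal, q, kmax=200):
    """D(P_w || P_s) in nats, truncating the photon-count sum at kmax.

    P_w = Poisson(lam); P_s = (1 - q) Poisson(lam) + q Poisson(lam + signal).
    """
    total = 0.0
    for k in range(kmax + 1):
        pw = poisson_pmf(k, lam)
        if pw > 0.0:
            ps = (1.0 - q) * pw + q * poisson_pmf(k, lam + signal)
            total += pw * math.log(pw / ps)
    return total

def second_order_bound(lam, signal, q):
    """(q^2 / 2) * (exp(signal^2 / lam) - 1), the assumed form of the bound."""
    return 0.5 * q * q * math.expm1(signal ** 2 / lam)

def ook_probability(eps, n, lam, signal):
    """Pulse probability q = 4 eps / sqrt(n (exp(signal^2 / lam) - 1)), as in (S35)."""
    return 4.0 * eps / math.sqrt(n * math.expm1(signal ** 2 / lam))

# Hypothetical parameters: dark count rate, detected signal photons, pulse probability.
lam, signal, q = 1.0, 0.5, 0.1
print(relative_entropy_dark_vs_mixture(lam, signal, q), second_order_bound(lam, signal, q))

# With q chosen as in (S35), sqrt(n * bound / 8) equals eps, so Lemma 6 gives P_e >= 1/2 - eps.
eps, n = 0.05, 10 ** 6
q_covert = ook_probability(eps, n, lam, signal)
print(q_covert, math.sqrt(n * second_order_bound(lam, signal, q_covert) / 8.0))
```

Because $q$ scales as $1/\sqrt{n}$, the expected number of transmitted pulses $qn$ grows as $\sqrt{n}$, which is the origin of the square root law in this construction.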
Thus, [...] Therefore, to ensure that [...] $P_e^{(b)} \leq e^{B - nE_0}$, where $B$ is the number of transmitted bits, and $E_0$ is: [...] The Taylor series expansion of $E_0$ with respect to $q$ at $q = 0$ yields $E_0 = qC + O(q^2)$, where [...]

[...] $y^{(w)}_{i,j}$ is the number of photons observed in the $j$th mode of the $i$th PPM frame. Denote by $P_0$ the distribution of $y_w$ when Alice does not transmit and by $P_1$ the distribution when she transmits. When Alice does not transmit, Willie's receiver observes a Poisson dark count process with rate $\lambda_w$ photons per mode, implying that $y_w$ is a vector of $nQ$ i.i.d. $\mathrm{Poisson}(\lambda_w)$ random variables. Therefore, $\{y^{(w)}_i\}$ is i.i.d. with $y^{(w)}_i \sim P_w$ and $P_0 = P_w^n$, where $P_w$ is the distribution of $Q$ i.i.d. $\mathrm{Poisson}(\lambda_w)$ random variables with p.m.f.: [...] When Alice transmits, by construction, each PPM frame is randomly selected for transmission with probability $\zeta$. In each selected PPM frame, a pulse is transmitted using one of $Q$ modes chosen equiprobably. Therefore, in this case $\{y^{(w)}_i\}$ is also i.i.d. with $y^{(w)}_i \sim P_s$ and $P_1 = P_s^n$, where the p.m.f. of $P_s$ is: [...]

The symmetry of this channel yields the Shannon capacity 34 $C_s = I(X; Y)$, where $P(X = x) = \frac{1}{Q}$ for $x = 1, \ldots, Q$ and $I(X; Y)$ is the mutual information between $X$ and $Y$. We use the experimentally-observed values from Table I to compute $C_s$ for each regime, and plot $C_s\zeta\frac{n}{Q}$ since $\zeta\frac{n}{Q}$ is the expected number of PPM frames selected for transmission.
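The capacity calculation can be sketched as follows for a symmetric $Q$-ary PPM channel with erasures and a uniform input. The sketch takes the probability of a correct symbol and the probability of an erasure as inputs, and spreads the remaining probability evenly over the $Q - 1$ wrong symbols. The function names and the example numbers are assumptions for illustration; they are not the values of Table I.

```python
import math

def _h(p):
    """Entropy contribution -p log2 p, with the convention 0 log 0 = 0."""
    return -p * math.log2(p) if p > 0.0 else 0.0

def ppm_capacity_bits(Q, p_correct, p_erasure):
    """I(X;Y) in bits per PPM symbol for uniform X on a symmetric Q-ary erasure channel."""
    p_wrong_total = 1.0 - p_correct - p_erasure
    if p_wrong_total < -1e-12:
        raise ValueError("p_correct + p_erasure must not exceed 1")
    p_wrong_total = max(p_wrong_total, 0.0)

    # H(Y|X): the same for every input symbol because the channel is symmetric.
    h_cond = _h(p_correct) + _h(p_erasure)
    if Q > 1:
        h_cond += (Q - 1) * _h(p_wrong_total / (Q - 1))

    # H(Y): erasure with probability p_erasure, otherwise uniform over the Q symbols.
    h_out = _h(p_erasure) + Q * _h((1.0 - p_erasure) / Q)
    return h_out - h_cond

def max_throughput_bits(Q, n, zeta, p_correct, p_erasure):
    """C_s * zeta * n / Q: capacity times the expected number of selected PPM frames."""
    return ppm_capacity_bits(Q, p_correct, p_erasure) * zeta * n / Q

# Hypothetical example: 32-ary PPM, 75% correct symbols, 24% erasures, 1% wrong symbols.
print(ppm_capacity_bits(32, 0.75, 0.24))
```

Two limiting cases are useful checks: a noiseless channel gives $\log_2 Q$ bits per symbol, and a pure erasure channel gives $(1 - p_E)\log_2 Q$.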

Derivation of the log-likelihood ratio test statistic
The log-likelihood ratio test statistic is given by $L = \log\frac{f_1(x_w)}{f_0(x_w)}$, where $f_0(x_w)$ and $f_1(x_w)$ are the likelihood functions of the click record $x_w$ corresponding to Alice being quiet and transmitting, respectively. Click record $x_w$ contains Willie's observations of each PPM frame on his channel from Alice, $x_w = [\ldots]$ with entries $x_{i,j} \in \{0, 1\}$, where "0" and "1" indicate the absence and the presence of a click, respectively. When Alice does not transmit, Willie only observes dark clicks. Thus, each $x_w$ is a vector of i.i.d. Bernoulli($p_D^{(w)}$) random variables. The likelihood function of $x_w$ under $H_0$ is then: [...]

Now consider the scenario when Alice transmits. The secret shared between Alice and Bob identifies the random subset $S$ of the PPM frames used for transmission, and a random vector $k$ which is modulo-added to the codeword. Modulo addition of $k$ effectively selects a random pulse location within each PPM frame. Note that, while both the construction in the proof of Theorem 4 and Alice's encoder described in the Methods generate $S$ first and then $k$, the order of these operations can be reversed: we can first fix a random location of a pulse in each of $n/Q$ PPM frames, and then select a random subset of these frames. Consider Willie's observation of the $i$th PPM frame, and suppose that the $l$th mode is used if the frame is selected for transmission. Denote the probability of Willie's detector observing Alice's pulse by $p_r^{(w)} = 1 - e^{-\bar{n}_{det}^{(w)}}$. By construction, frames are selected for transmission independent of each other with probability $\zeta$. Willie's detector registers a click on this mode when one of the following disjoint events occurs:
• PPM frame is selected and pulse is detected by Willie (probability $\zeta p_r^{(w)}$);
[...]

$\zeta = 0.25\sqrt{Q/n}$: $9.15 \times 10^{-5}$, 0.036, $2.99 \times 10^{-6}$, 1.52
$\zeta = 0.03\sqrt[4]{Q/n}$: $9.11 \times 10^{-5}$, 0.032, $2.55 \times 10^{-6}$, 1.14
$\zeta = 0.003$: $9.29 \times 10^{-5}$, 0.032, $2.65 \times 10^{-6}$, 1.07
$\zeta = 0.008$: $9.27 \times 10^{-5}$, 0.028, $2.68 \times 10^{-6}$, 1.05
Target: $9 \times 10^{-5}$, 0.03, $3 \times 10^{-6}$, 1.4

[...] regimes: "careful Alice" ($\zeta = 0.25\sqrt{Q/n}$), "careless Alice" ($\zeta = 0.03\sqrt[4]{Q/n}$), and "dangerously careless Alice" ($\zeta = 0.003$ and $\zeta = 0.008$). For each $(n, \zeta)$ pair we conducted 100 experiments and $10^5$ Monte-Carlo simulations, measuring Bob's total number of bits received and Willie's detection error probability.
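To make the difference between the regimes concrete, the sketch below computes the frame selection probability $\zeta$ and the expected number of selected PPM frames $\zeta n/Q$, reading the regimes as $\zeta = 0.25\sqrt{Q/n}$ ("careful"), $\zeta = 0.03\sqrt[4]{Q/n}$ ("careless"), and constant $\zeta$ ("dangerously careless"). The function names and the values of $n$ and $Q$ used in the loop are hypothetical and serve only to show the scaling.

```python
import math

def zeta_careful(n, Q):
    """Square-root-law regime: zeta = 0.25 * sqrt(Q / n)."""
    return 0.25 * math.sqrt(Q / n)

def zeta_careless(n, Q):
    """Fourth-root regime: zeta = 0.03 * (Q / n) ** (1 / 4)."""
    return 0.03 * (Q / n) ** 0.25

def expected_frames(n, Q, zeta):
    """Expected number of PPM frames selected for transmission: zeta * n / Q."""
    return zeta * n / Q

Q = 32  # hypothetical PPM order, used only for this illustration
for n in (32 * 10 ** 4, 32 * 10 ** 6):  # hypothetical numbers of modes
    print(
        n,
        expected_frames(n, Q, zeta_careful(n, Q)),   # grows as sqrt(n / Q)
        expected_frames(n, Q, zeta_careless(n, Q)),  # grows as (n / Q) ** 0.75
        expected_frames(n, Q, 0.003),                # grows linearly in n / Q
    )
```

Only the first regime keeps the expected number of pulses proportional to $\sqrt{n/Q}$; the other two send asymptotically more pulses, which is why Willie's error probability drops as $n$ grows in those cases.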

FIG. 3. Bits decoded by Bob. Each data point is an average from 100 experiments, with negligibly small 95% confidence intervals. The symbol error rates are: $1.1 \times 10^{-4}$ for $\zeta = 0.25\sqrt{Q/n}$, $8.3 \times 10^{-3}$ for $\zeta = 0.03\sqrt[4]{Q/n}$, $4.5 \times 10^{-3}$ for $\zeta = 0.003$, and $1.8 \times 10^{-2}$ for $\zeta = 0.008$. We also report the maximum throughput $C_s\zeta\frac{n}{Q}$ computed in the Methods using the experimentally-observed values from Table I, where $C_s$ is the per-symbol Shannon capacity 34. Given the low observed symbol error rate for $\zeta = 0.25\sqrt{Q/n}$, we note that a square root scaling is achievable even using a relatively short RS code; Figure 4 demonstrates that this is achieved covertly.
[...]s and Willie's detectors, as well as the mean number of photons detected by Bob $\bar{n}_{det}^{(b)} = \eta_b\eta$ [...] $\bar{n}_{det}^{(w)} = (1 - \eta_b)\eta_{QE}^{(w)}\bar{n}$, where $\bar{n} = 5$ is the mean photon number of Alice's pulses, $\eta_b = 0.97$ is the fraction of light sent to Bob, and $\eta$ [...]

FIG. 4. Willie's error probability. Estimates from 100 experiments have solid fill; estimates from $10^5$ Monte-Carlo simulations have clear fill; and Gaussian approximations are lines. The 95% confidence intervals (computed in the Methods) for the experimental estimates are ±0.136; for the Monte-Carlo simulations they are ±0.014. Alice transmits $\zeta n/Q$ PPM symbols on average and Willie's error probability remains constant when Alice obeys the square root law and uses $\zeta = O(\sqrt{Q/n})$; it drops as $n$ increases if Alice breaks the square root law by using an asymptotically larger $\zeta$.

[...] $4\eta_b$. Since Alice uses Gaussian modulation with symbol power $\bar{n}$ defined in equation (S31), we can upper-bound $P_e^{(b)}$ [...] for any $\epsilon > 0$ while reliably transmitting $O(\sqrt{n})$ bits to Bob in $n$ optical modes.
[...] $\epsilon$, Alice sets

$q = \frac{4\epsilon}{\sqrt{n\left(e^{(\eta_w|\alpha|^2)^2/\lambda_w} - 1\right)}}$. (S35)

Analysis (Bob): Suppose Bob uses a practical single photon detector (SPD) receiver with probability of a dark click per mode $p_D^{(b)}$. This induces a binary asymmetric channel between Alice and Bob, where the click probabilities, conditional on the input, are $P(\text{click} \mid \text{input } |0\rangle) = p_D^{(b)}$ and $P(\text{click} \mid \text{input } |\alpha\rangle) = 1 - e^{-\eta_b|\alpha|^2}(1 - p_D^{(b)})$, with the corresponding no-click probabilities $P(\text{no-click} \mid \text{input } |0\rangle) = 1 - p_D^{(b)}$ and $P(\text{no-click} \mid \text{input } |\alpha\rangle) = e^{-\eta_b|\alpha|^2}(1 - p_D^{(b)})$. At each mode, a click corresponds to "1" and no-click to "0". Let Bob use a maximum likelihood decoder on this sequence. Then the standard upper bound on Bob's average decoding error probability is 48 $P_e^{(b)}$ [...]

[...] relative entropy is additive for product distributions, $D(P_0\|P_1) = \frac{n}{Q}D(P_w\|P_s)$. Now, denoting by $x = [x_1, \cdots, x_Q]$ where $x_j \in \mathbb{N}_0$, we have:

$D(P_w\|P_s) = -\sum_{x} p_w(x)\ln\frac{p_s(x)}{p_w(x)} \leq \frac{\zeta^2\left(e^{(\eta_w|\alpha|^2)^2/\lambda_w} - 1\right)}{2Q}$, (S38)

where the inequality is due to Taylor's Theorem with the remainder applied to the Taylor series expansion of equation (S38) with respect to $\zeta$ at $\zeta = 0$. By Lemma 6, $\zeta = \frac{4\epsilon Q}{\sqrt{n\left(e^{(\eta_w|\alpha|^2)^2/\lambda_w} - 1\right)}}$ ensures that Willie's error probability is lower-bounded by $P_e^{(w)} \geq \frac{1}{2} - \epsilon$.

Analysis (Bob): As in the proof of Theorem 7, Bob uses a practical SPD receiver with probability of a dark click $p_D^{(b)}$. Bob examines only the PPM frames in $S$.
If two or more clicks are detected in a PPM frame, a PPM symbol is assigned by selecting one of the clicks uniformly at random. If no clicks are detected, the PPM frame is labeled as an erasure. After subtracting $k$ modulo $Q$ from this vector of PPM symbols (subtraction is not performed on erasures), the resultant vector is passed to the decoder. A random coding argument 43 (Theorem 5.6.2) yields reliable transmission of $O\left(\sqrt{\frac{n}{Q}}\log Q\right)$ covert bits.

Theorem 9 (Converse of the square root law) Suppose Alice only uses $n$-mode codewords with total photon number variance $\sigma_x^2 = O(n)$. Then, if she attempts to transmit $\omega(\sqrt{n})$ bits in $n$ modes, as $n \to \infty$, she is either detected by Willie with arbitrarily low detection error probability, or Bob cannot decode with arbitrarily low decoding error probability.

Proof. As in the proof of Theorem 1, Alice sends one of $2^M$ (equally likely) $M$-bit messages by choosing an element from an arbitrary codebook $\{\rho_x^{A^n}, x = 1, \ldots, 2^M\}$, where a state $\rho_x^{A^n} = |\psi_x\rangle^{A^n}\,{}^{A^n}\langle\psi_x|$ encodes an $M$-bit message $W_x$. $|\psi_x\rangle^{A^n} = \sum_{k \in \mathbb{N}_0^n} a_k(x)|k\rangle$ is a general $n$-mode pure state, where $|k\rangle \equiv |k_1\rangle \otimes$ [...]

II. DETAILS OF THE EXPERIMENTAL METHODS

Here we provide the mathematical details of the experimental methods.

Calculation of Bob's maximum throughput

The $Q$-ary PPM signaling combined with Bob's device for assigning symbols to received PPM frames induces a discrete memoryless channel described by a conditional distribution $P(Y|X)$, where $X \in \{1, \ldots, Q\}$ is Alice's input symbol and $Y \in \{1, \ldots, Q, E\}$ is Bob's output symbol with $E$ indicating an erasure. Since Bob observes Alice's pulse with probability $1 - e^{-\bar{n}_{det}^{(b)}}$, $P(Y|X)$ is characterized as follows:

$P(Y = x|X = x) = 1 - e^{-\bar{n}_{det}^{(b)}}$ [...]
$P(Y = E|X = x) = e^{-\bar{n}_{det}^{(b)}}\left(1 - p_D^{(b)}\right)^Q$
$P(Y = y, y \notin \{x, E\}|X = x) = \frac{1 - P(Y = x|x) - P(Y = E|x)}{Q - 1}$

[...]
[...] $x^{(w)}_{i,Q}]$ contains the observation of the $i$th PPM frame with $x^{(w)}$ [...]

Channel model. The input-output relationship is captured by a beamsplitter of transmissivity $\eta_b$, with the transmitter Alice at one of the input ports and the intended receiver Bob at one of the output ports, and $\eta_b$ being the fraction of Alice's signaling photons that reach Bob. The other input and output ports of the beamsplitter correspond to the environment and the adversary Willie. Willie collects the entire $\eta_w = 1 - \eta_b$ fraction of Alice's photons that do not reach Bob. This models single-spatial-mode free-space and single-mode fiber optical channels. Alice and Bob share a secret before the transmission.

[...] the channel must inject to preserve the Heisenberg inequality of quantum mechanics.

Pure loss insufficient for covert communication. Regardless of Alice's strategy, reliable and covert communication over a pure-loss channel to Bob is impossible. Theorem 1 in the Methods demonstrates that Willie can effectively use an ideal single photon detector (SPD) on each mode to discriminate between an $n$-mode vacuum state and any non-vacuum state in Alice's codebook. Willie avoids false alarms since no photons impinge on his SPD when Alice is silent. However, a single click (detection of one or more photons) [...]

In the proof of this theorem we denote a tensor product of $n$ Fock (or photon number) states by $|u\rangle \equiv |u_1\rangle \otimes |u_2\rangle \otimes \cdots \otimes |u_n\rangle$, where vector $u \in \mathbb{N}_0^n$ and $\mathbb{N}_0$ is the set of non-negative integers. Specifically, $|0\rangle \equiv |0\rangle$ [...]

Theorem 8 (Dark counts yield square root law under structured modulation) Suppose that Willie has a pure-loss channel from Alice, can capture all photons transmitted by Alice that do not reach Bob, but is limited to a PNR receiver with a non-zero dark current. Let Alice and Bob share a secret of sufficient length before communicating. Then Alice can lower-bound Willie's detection error probability $P_e^{(w)}$ [...] $P_e^{(b)} < \delta$ for arbitrary $\delta > 0$ given large enough $n$.

Proof. Construction: Prior to communication, Alice and Bob secretly choose a random subset $S$ of PPM frames to use for transmission by selecting each of $n/Q$ available PPM frames independently with probability $\zeta$. Alice and Bob then secretly generate a vector $k$ containing $|S|$ numbers selected independently uniformly at random from $\{0, 1, \ldots, Q - 1\}$, where $|S|$ denotes the cardinality of $S$. Alice encodes a message into a codeword of size $|S|$ using an ECC that may be known to Willie. She adds $k$ modulo $Q$ to this message and transmits it on the PPM frames in $S$.

Analysis (Willie): Willie detects each PPM frame received from Alice, recording the photon counts in $y_w = [y$ [...]