Optimized communication strategies with binary coherent states over phase noise channels

The achievable rate of information transfer in optical communications is determined by the physical properties of the communication channel, such as the intrinsic channel noise. Bosonic phase-noise channels, a class of non-Gaussian channels, have emerged as a relevant noise model in quantum information and optical communication. However, while the fundamental limits for communication over Gaussian channels have been extensively studied, the properties of communication over Bosonic phase-noise channels are not well understood. Here we propose and demonstrate experimentally the concept of optimized communication strategies for communication over phase-noise channels to enhance information transfer beyond what is possible with conventional methods of modulation and detection. Two key ingredients are generalized constellations of coherent states that interpolate between standard on-off keying and binary phase shift keying formats, and non-Gaussian measurements based on photon number resolving detection of the coherently displaced signal. For a given power constraint and channel noise strength, these novel strategies rely on joint optimization of the input alphabet and the measurement to provide enhanced communication capability over a non-Gaussian channel characterized in terms of the error rate as well as mutual information.


INTRODUCTION
The amount of information that can be transmitted through a physical channel depends on the fundamental properties of the channel 1,2 and the physical states used as information carriers. Recent work has shown that coherent states of light, routinely produced by lasers, can achieve the ultimate limits of information transfer, classical capacity, in communication channels with loss 3 , and phase-insensitive Gaussian noise 4,5 . These results provide strong support for using coherent states as the centerpiece for current and future developments of optical communication networks [6][7][8] . Moreover, beyond the realm of classical communications, coherent states have shown to be of great practical use for quantum communications 9,10 , including quantum key distribution [11][12][13][14][15][16][17][18][19][20] , quantum digital signatures 21 , and quantum fingerprinting 22,23 . However, despite the theoretical breakthroughs in identifying the capacities for phase-insensitive Gaussian channels, finding the ultimate information rates for other channels, such as noisy channels with a specific non-Gaussian noise that may be encountered in different situations, is still an open problem. Moreover, even in channels for which capacity is known, reaching this ultimate rate for reliable communications requires finding the optimal encoding schemes and optimal measurements over the physical information carriers 24,25 . Furthermore, finding optimal encodings and measurements to maximize information transfer in a specific channel with fundamental noise, in addition to technical noise in real devices, would represent a large advance in our understanding of the limits in realistic optical communications.
Quantum mechanics in principle allows for constructing measurements for coherent states surpassing the classical limits of sensitivity and information transfer 2,26 . Discrimination strategies for coherent states based on optimized measurements with photon counting have been proposed 18,[27][28][29][30][31][32][33][34] and demonstrated 1,[35][36][37][38][39][40][41]43,44 to surpass the conventional limits of detection, the quantum noise limit (QNL), and approach the ultimate quantum limit, the Helstrom bound 26 . These nonconventional measurements can enhance information transfer in optical communications 43,45 and surpass the classical limits of information transfer using joint measurements over sequences of coherent states 24 . Furthermore, photon counting measurements can be optimized to provide inherent robustness against noise and imperfections of realistic systems in communications 1,2 . While these optimized measurements can enhance sensitivities and information transfer with coherent states, the fundamental noise intrinsic in the channel can severely degrade the information encoded in these states. This in turn compromises the potential benefits of these optimized measurements for optical communications 47,48 .
In this work, we investigate a new approach for optimizing communications in a channel with specific intrinsic noise in addition to unavoidable technical noise, with the goal of maximizing sensitivities and information transfer based on non-Gaussian measurements and coherent states. The central concept of this approach consists of finding optimized communication strategies where measurements and coherent state encodings are jointly optimized to become more robust to the specific noise in the channel, and ultimately maximize sensitivities and information transfer over the noisy channel. As a proof-of-concept demonstration, we investigate optimized communication strategies for communications over a noisy channel with phase diffusion [49][50][51] , based on optimized single-shot photon-counting measurements and binary coherent state encodings. Phase diffusion is the most detrimental noise for states of light carrying information in the phase, since it destroys the coherence of the quantum states [52][53][54] . We show that an optimized strategy that simultaneously optimizes the non-Gaussian measurement and the binary state alphabet allows for surpassing the limits in performance of an ideal conventional measurement in terms of probability of error and information transfer per channel use over the non-Gaussian channel.

RESULTS
Optimized strategy for a phase diffusion channel
Phase diffusion noise has been extensively investigated in quantum metrology, measurements and communications for phase estimation 49 , interferometry 50 , state discrimination, and information transfer in communication [55][56][57] . This noise is most damaging when information is contained in the coherent properties of the states used as information carriers. In particular, Gaussian phase diffusion makes the task of extracting information more difficult [47][48][49][50][51][52][53][54]58 , degrading measurement sensitivities and lowering the achievable information transfer in coherent communications. As a first step for constructing an optimized communication strategy with binary encoding over a channel with phase diffusion, we consider the optimization of the input alphabet to provide robustness to phase diffusion and to other sources of noise and imperfections. This optimization consists of finding the optimal energy distribution in the alphabet to minimize the detrimental effects of phase diffusion, while allowing for measurements to provide high sensitivity. Figure 1 shows the effect of phase diffusion on three different binary alphabets with coherent states with the same average energy ⟨n⟩ = n̄: (a) binary phase-shift keying (BPSK) {|−α⟩, |+α⟩}, with α real and positive; (b) on-off keyed (OOK) alphabet {|0⟩, |√2 α⟩}; and (c) a general binary coherent state alphabet {|α₁⟩, |α₂⟩}. We observe that phase diffusion affects equally the states {|−α⟩, |+α⟩} in the BPSK alphabet, and dramatically reduces their distinguishability, which causes discrimination errors to become very high. On the other hand, when considering the OOK alphabet {|0⟩, |√2 α⟩}, phase diffusion impacts only the state |√2 α⟩, and leaves the vacuum state |0⟩ unaffected. In this case, their distinguishability weakly depends on the phase noise, highlighting the robustness of this alphabet to phase diffusion noise. Therefore, while BPSK has a smaller overlap and better distinguishability than OOK encoding in the absence of phase noise, OOK states have an overlap independent of the level of phase diffusion. The optimized alphabet {|α₁⟩, |α₂⟩} in Fig. 1(c) represents a smooth transition and a tradeoff between BPSK, with a high degree of distinguishability for low levels of noise, and OOK, which is immune to phase diffusion. Figure 1(c) shows an example of an alphabet {|α₁⟩, |α₂⟩} which is optimized under the average energy constraint n̄ = (|α₁|² + |α₂|²)/2 for a given level of the phase noise. The result of this optimization is an alphabet that combines the robustness of OOK with the distinguishability of BPSK.
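As a rough illustration of the average energy constraint, the one-parameter family of alphabets between BPSK and OOK can be written in a few lines of Python. The function name `binary_alphabet`, the choice of real amplitudes with α₁ ≤ 0 ≤ α₂, and the parametrization by |α₁|² are assumptions made for this sketch only.

```python
import math

def binary_alphabet(nbar, p1):
    """Return real amplitudes (alpha1, alpha2) with |alpha1|^2 = p1.

    Enforces nbar = (|alpha1|^2 + |alpha2|^2) / 2.
    Sign convention (alpha1 <= 0 <= alpha2) is an assumption of this sketch.
    """
    if not 0.0 <= p1 <= nbar:
        raise ValueError("p1 must lie between 0 (OOK) and nbar (BPSK)")
    p2 = 2.0 * nbar - p1  # remaining energy goes to the second state
    return -math.sqrt(p1), math.sqrt(p2)

bpsk = binary_alphabet(0.5, 0.5)      # p1 = nbar: equal energies, opposite signs
ook = binary_alphabet(0.5, 0.0)       # p1 = 0: vacuum plus one bright state
between = binary_alphabet(1.0, 0.4)   # an intermediate alphabet
```

Sweeping `p1` from `nbar` down to 0 moves the alphabet continuously from BPSK to OOK while keeping the mean photon number fixed.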
Optimized non-Gaussian measurements based on photon number resolution (PNR) 2 provide robustness against technical noise and imperfections for the discrimination of BPSK states surpassing the QNL. Optimized communication strategies in a non-Gaussian channel with phase diffusion can combine these measurements with an optimized input alphabet in order to minimize the probability of error in the channel. This strategy then optimizes simultaneously the measurement and the alphabet, resulting in a high degree of robustness to phase diffusion while maintaining the benefits of non-Gaussian measurements for surpassing the limits of conventional measurements. Figure 2(a) shows the concept of an optimized communication strategy for a binary channel with phase diffusion. The sender (Alice) prepares an input state from a coherent state alphabet {|α₁⟩, |α₂⟩}, and sends it to the receiver (Bob) through a non-Gaussian noisy channel. Phase diffusion causes the input states {|α_k⟩} (k = 1, 2) to become phase diffused mixed states 54 :
$$\hat{\rho}_k(\sigma) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\phi^{2}/(2\sigma^{2})}\, |\alpha_k e^{i\phi}\rangle\langle\alpha_k e^{i\phi}|\, d\phi \qquad (1)$$
where the strength of the phase diffusion noise is quantified by the width σ of the Gaussian phase distribution.
Figure 2: (a) For a given level of phase diffusion σ, the strategy simultaneously optimizes the transmitter's alphabet {|α₁⟩, |α₂⟩} and the receiver's discrimination measurement to enhance sensitivities and information transfer through the noisy phase-diffusion channel. (b) Optimized strategy for state discrimination to minimize the probability of error (PE) with PNR(1) for an input alphabet with average power n̄ = 0.5. Probability of error for the optimized strategy (solid blue); for a strategy without input alphabet optimization using BPSK (dashed red); a conventional measurement (CM) with its own optimized alphabet (solid grey); and for the Helstrom measurement with an optimal input alphabet (solid black). (c) Optimized alphabet (solid blue) and displacement (dashed green) for the optimized communication strategy. Note that the optimized alphabet interpolates from BPSK to OOK as the level of phase diffusion σ increases. Parameters for the plots: ideal detection efficiency, no dark counts, and an interference visibility of ξ = 0.998.
At the channel output, the receiver implements an optimized single-shot measurement based on photon counting to discriminate these states with high sensitivity 2 . In this strategy, the input state ρ̂_k is displaced in phase space to D̂(β)ρ̂_k(σ)D̂†(β), where the displacement operation D̂(β) = exp(βâ† − β*â), with â (â†) as the lowering (raising) operator, is implemented by interference of the input state with a displacement field β in a high transmittance beam splitter 59 . Subsequently, the photons in the displaced state are detected by a photon-number-resolving (PNR) detector with photon number resolution PNR(m). Here, m represents the maximum number of photons that a detector can resolve before becoming a threshold detector 2 . This measurement strategy uses a maximum a posteriori (MAP) decision rule to infer the input state based on the photon detection outcome k given the mean photon number n̄, displacement field |β|, and photon number resolution PNR(m), for a level of phase noise σ.
The MAP strategy assumes that the correct state is the one with the highest conditional posterior probability P(ρ̂_{1,2}(σ)|β, k, m) obtained through Bayes' rule:
$$P(\hat{\rho}_{1,2}(\sigma)\,|\,\beta,k,m)\,P(k\,|\,m) = P(k\,|\,\hat{\rho}_{1,2}(\sigma),\beta,m)\,P(\hat{\rho}_{1,2}(\sigma)) \qquad (2)$$
Here, P(k|m) is the total probability of detecting k photons given a PNR(m) strategy, and P(k|ρ̂_{1,2}(σ), β, m) is the conditional probability of detecting k photons given |β| and m. We consider equiprobable input states, so that the prior probabilities become P(ρ̂_{1,2}(σ)) = 0.5. The probability of error in the discrimination of the input states for a strategy with PNR(m) is:
$$P_E(\bar{n},\{\hat{\rho}_i(\sigma)\},\beta,m) = 1 - \frac{1}{2}\sum_{k=0}^{m}\max_{i}\,P(k\,|\,\hat{\rho}_i(\sigma),\beta,\sigma,m) \qquad (3)$$
Here, P(k|ρ̂_i(σ), β, σ, m) is the conditional probability of detecting k photons for the input state given the displacement |β|, noise level σ, and PNR(m). The error probability P_E in Eq. (3) depends on the input alphabet, the intrinsic properties of the channel and the measurement performed by the receiver. This provides a way to find optimized strategies that simultaneously optimize the alphabet and the measurement to minimize the detrimental effects of the channel noise. The optimized strategies use an optimal displacement D̂(β) and an optimal input alphabet {|α₁⟩, |α₂⟩} for a given input power n̄, photon number resolution m, and channel noise level σ to minimize the probability of error P_E. Figure 2(b) shows the performance of a communication strategy optimized for state discrimination over a channel with phase diffusion, using PNR(1) with n̄ = 0.5, ideal detection efficiency η = 1.0, an interference visibility ξ = 0.998, which quantifies the technical noise and imperfections in the receiver 1,2 , and zero dark count rate ν = 0. To evaluate the performance of this strategy, we compare it with an ideal conventional measurement (CM) consisting of either homodyne or direct detection to minimize the discrimination error, with its own optimized alphabet (solid grey line).
We note that the optimized alphabet for the CM results in either BPSK or OOK for this binary coherent state channel, and that it changes abruptly from BPSK to OOK when the conventional measurement switches from homodyne to direct detection.
As shown in Fig. 2(b), while a PNR(1) strategy with BPSK (dashed red) can only outperform the ideal CM for small phase noise σ 54 , optimizing the input alphabet to interpolate between BPSK and OOK, shown in Fig. 2(c), allows the strategy to outperform the CM for all levels of noise. Moreover, for high levels of noise, the optimized communication strategy approaches the Helstrom measurement with its own optimized alphabet, showing that this optimized communication strategy asymptotically approaches the optimal quantum measurement.
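The displaced photon-counting measurement with a MAP decision can be sketched numerically as follows. This is a minimal illustration, not the analysis code of the experiment: it assumes an ideal detector (η = 1, ξ = 1, no dark counts), real amplitudes, equal priors, and a simple discretized Gaussian average over the phase. The names `pnr_probs` and `error_probability` are hypothetical.

```python
import math

def pnr_probs(alpha, beta, sigma, m, steps=801):
    """Outcome probabilities k = 0..m for a displaced, phase-diffused coherent state.

    Ideal detector assumed; outcome m collects all events with m or more photons.
    """
    if sigma > 0:
        # Discretize the Gaussian phase distribution over +/- 6 sigma.
        phis = [6 * sigma * (2 * i / (steps - 1) - 1) for i in range(steps)]
        weights = [math.exp(-phi ** 2 / (2 * sigma ** 2)) for phi in phis]
    else:
        phis, weights = [0.0], [1.0]
    norm = sum(weights)
    probs = [0.0] * (m + 1)
    for phi, w in zip(phis, weights):
        # The displacement maps |alpha e^{i phi}> to a coherent state alpha e^{i phi} + beta.
        amp = alpha * complex(math.cos(phi), math.sin(phi)) + beta
        mean = abs(amp) ** 2
        poisson = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(m)]
        for k in range(m):
            probs[k] += w / norm * poisson[k]
        probs[m] += w / norm * (1.0 - sum(poisson))
    return probs

def error_probability(alpha1, alpha2, beta, sigma, m):
    """MAP error probability for two equiprobable states."""
    p1 = pnr_probs(alpha1, beta, sigma, m)
    p2 = pnr_probs(alpha2, beta, sigma, m)
    return 1.0 - 0.5 * sum(max(a, b) for a, b in zip(p1, p2))

a = math.sqrt(0.5)                                # BPSK with mean photon number 0.5
pe_ideal = error_probability(-a, a, a, 0.0, 1)    # displacement nulls the state |-a>
pe_noisy = error_probability(-a, a, a, 0.3, 1)    # same receiver with phase diffusion
```

In the noiseless case with β = α and PNR(1), the error reduces to exp(−4n̄)/2, and adding phase diffusion raises it because the nulled state no longer maps exactly to vacuum.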
Optimized communication strategies can also be used to increase information transfer over a noisy channel. These strategies simultaneously optimize the measurement and the input alphabet to maximize mutual information, instead of minimizing probability of error, for a channel with intrinsic noise and technical noise from the devices. Optimized strategies for information transfer for a phase diffusion channel with binary state encoding are in general different from strategies designed for minimum error, as discussed in Section IIIC.

Experimental demonstration
The optimized communication strategies described above can be implemented with current technologies. We demonstrate these strategies in a proof-of-principle experiment for enhancing sensitivities and information transfer for the phase diffusion channel with a binary coherent-state encoding with a PNR non-Gaussian measurement, which provides robustness to technical noise and system imperfections 2 . The experimental realization uses an interferometric setup to implement the optimized strategies. Coherent-state pulses at 633 nm are displaced by interference on a highly transmissive beam splitter, and we use an avalanche photodiode (APD) as a photon number resolving detector. See Ref. 2 for a detailed description. To investigate the optimized communication strategies, a controlled level of the phase-diffusion noise is applied to the input state (see Supplementary Section 1). Our experiment achieves an overall detection efficiency η = 0.72, an interference visibility ξ = 0.998, and a dark count rate ν = 3.6 × 10⁻³. Technical noise in the experiment such as reduced visibility and dark counts affects the performance of the optimized strategy (see Supplementary Section 2). However, the levels of noise in our experiment only have a small effect on the strategy's performance.
We systematically investigate the optimized communication strategies for a channel with phase diffusion by first studying the performance of optimized PNR measurements with a BPSK alphabet 2 for this channel. Next we investigate the optimized communication strategies with an optimized measurement-alphabet method for enhancing measurement sensitivity. Finally, we investigate optimized communication strategies for maximizing the mutual information for a phase diffusion channel.
Discrimination with a BPSK alphabet under phase diffusion
Figure 3 shows the experimental error probabilities for the discrimination of states from a BPSK alphabet with an optimized PNR measurement 2 with photon number resolution PNR(m) of m = 1, 2, 3, for three mean photon numbers: (a) n̄ = 0.5, (b) n̄ = 1, and (c) n̄ = 2. We observe in all cases that while PNR(1) (red dots) outperforms an adjusted homodyne measurement up to a certain level of noise, as discussed in Ref. 54 , increasing photon number resolution to PNR(2) (green dots) and PNR(3) (blue dots) extends the level of noise σ where this optimized measurement 2 outperforms a homodyne measurement. The increase in robustness with PNR against phase diffusion becomes larger as the mean photon number increases. Fig. 3(b) and (c) show that PNR(3) extends the level of noise σ for which this measurement surpasses the homodyne limit by about 1.5 times for n̄ = 1 and about 4 times for n̄ = 2 compared to an on/off PNR(1) strategy.
Discrimination with an optimized alphabet under phase diffusion
Phase diffusion severely affects measurements for state discrimination in a BPSK alphabet. To reduce the effects of phase diffusion in the channel, a communication strategy can implement an encoding alphabet which is optimized for a particular level of phase noise. In conventional coherent communication with Gaussian measurements, constellation optimization has been used to mitigate some effects of phase noise [55][56][57] . However, in a more general optimized communication strategy using a non-Gaussian measurement, this alphabet can be optimized simultaneously with the displaced photon counting measurement to reduce errors and enhance information transfer. Figure 4 shows the performance of the optimized strategy for the discrimination of states from an optimized alphabet with an optimized PNR measurement 2 with PNR(1) and PNR(3), for mean photon numbers (a) n̄ = 0.5, (b) n̄ = 1.0, and (c) n̄ = 2.0. Experimental data is shown with red (green) dots for PNR(1) (PNR(3)), and expected performance is shown in dotted lines. Error bars represent one standard deviation over 5 experimental runs of over 10⁵ independent experiments. While a strategy with PNR(3) and a BPSK alphabet (solid green) can only outperform a CM for a limited range of noise levels σ, optimized strategies with optimal alphabets and measurements allow for outperforming the CM over larger ranges of noise σ. Moreover, optimized strategies with PNR(1) surpass the CM for all levels of noise for n̄ = 0.5 and n̄ = 1.0. For higher n̄, increasing number resolution m is expected to enable discrimination below the CM at any noise level, as can be inferred from the trend in Fig. 4(c). Figures 4(d), (e), and (f) show the optimal alphabet for n̄ = 0.5, 1.0, and 2.0, respectively. Discrete jumps in the optimized alphabets for different PNR strategies are the results of optimization of Eq. (3), which requires a global optimization over multiple minima 2 of P_E. This optimization searches for the values of |α₁| and |β| resulting in the global minimum of P_E for a given noise level σ for a PNR(m) strategy. There are levels of noise at which a small increase in σ causes the former global minimum of P_E as a function of |α₁| and |β| to become a local minimum, and a former local minimum to become the new global minimum (see Supplementary Section 3). These abrupt changes in the global minimum result in the sudden jumps of the optimal alphabet shown in Fig. 4(e) at σ ≈ 0.36 and σ ≈ 0.38, and in Fig. 4(f) at σ ≈ 0.20 and σ ≈ 0.42. We note that the optimized alphabets correspond to interpolations between BPSK and OOK alphabets for all n̄, and result in large improvements over BPSK. This shows that strategies with optimized alphabets are essential for surpassing the sensitivity limits of conventional measurements in channels with phase noise.
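Because P_E can have several minima, a local optimizer may settle in the wrong one. One simple way to illustrate the global search is an exhaustive grid over |α₁|² and |β|, sketched below. The grid sizes, the range of |β|, the ideal-detector model, and the names `pnr_probs`, `error_probability`, and `grid_search` are all assumptions of this sketch; the optimization method actually used for the experiment is not specified here.

```python
import math

def pnr_probs(alpha, beta, sigma, m, steps=101):
    """Outcome probabilities k = 0..m (ideal detector, outcome m means m or more)."""
    if sigma > 0:
        phis = [6 * sigma * (2 * i / (steps - 1) - 1) for i in range(steps)]
        weights = [math.exp(-phi ** 2 / (2 * sigma ** 2)) for phi in phis]
    else:
        phis, weights = [0.0], [1.0]
    norm = sum(weights)
    probs = [0.0] * (m + 1)
    for phi, w in zip(phis, weights):
        mean = abs(alpha * complex(math.cos(phi), math.sin(phi)) + beta) ** 2
        poisson = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(m)]
        for k in range(m):
            probs[k] += w / norm * poisson[k]
        probs[m] += w / norm * (1.0 - sum(poisson))
    return probs

def error_probability(p1, nbar, beta, sigma, m):
    """MAP error for the alphabet with |alpha1|^2 = p1 and |alpha2|^2 = 2*nbar - p1."""
    alpha1, alpha2 = -math.sqrt(p1), math.sqrt(2 * nbar - p1)
    q1 = pnr_probs(alpha1, beta, sigma, m)
    q2 = pnr_probs(alpha2, beta, sigma, m)
    return 1.0 - 0.5 * sum(max(a, b) for a, b in zip(q1, q2))

def grid_search(nbar, sigma, m, n_p=21, n_b=41, beta_max=2.0):
    """Return (P_E, |alpha1|^2, |beta|) at the lowest P_E found on the grid."""
    best = (float("inf"), None, None)
    for i in range(n_p):
        p1 = nbar * i / (n_p - 1)            # 0 is OOK, nbar is BPSK
        for j in range(n_b):
            beta = beta_max * j / (n_b - 1)
            pe = error_probability(p1, nbar, beta, sigma, m)
            if pe < best[0]:
                best = (pe, p1, beta)
    return best

best_pe, best_p1, best_beta = grid_search(nbar=0.5, sigma=0.4, m=1)
# BPSK-only baseline over the same displacement grid, for comparison.
bpsk_pe = min(error_probability(0.5, 0.5, 2.0 * j / 40, 0.4, 1) for j in range(41))
```

Since the grid contains the BPSK point, the jointly optimized result can never be worse than the BPSK baseline on the same grid; a finer grid or a local refinement around the best grid point would be needed for quantitative work.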

Mutual information under phase diffusion
Optimized communication strategies can also be designed to maximize information transfer over a non-Gaussian noisy channel, for which optimal encoding and decoding are unknown. An optimized communication strategy which minimizes probability of error will provide some advantage for increasing mutual information. However, in a noisy channel, the measurement and the alphabet can be optimized in order to maximize mutual information I(X : Y) and will yield a different strategy than for minimum error. Mutual information quantifies the total amount of information between transmitter and receiver, and depends on the encoding alphabet and decoding measurement. For a displaced photon-counting measurement, I(X : Y) can be expressed according to a "soft" decision rule where the number of photons detected is used to infer the input symbol rather than the binary output from a binary decision rule 60 . The mutual information for a channel with phase diffusion with a binary coherent state encoding can be expressed as:
$$I(\bar{n},\{\hat{\rho}_i(\sigma)\},\beta,m) = \sum_{i=1}^{2}\sum_{k=0}^{m} P(\hat{\rho}_i(\sigma))\,P(k\,|\,\hat{\rho}_i(\sigma),\beta,m)\,\log_2\!\left[\frac{P(k\,|\,\hat{\rho}_i(\sigma),\beta,m)}{\sum_{j=1}^{2}P(\hat{\rho}_j(\sigma))\,P(k\,|\,\hat{\rho}_j(\sigma),\beta,m)}\right] \qquad (4)$$
where P(k|{ρ̂_i(σ)}, β, m) is the conditional probability of detecting k photons. In an optimized communication strategy over a noisy channel the input alphabet and measurement with PNR(m) are simultaneously optimized to maximize mutual information I(n̄, {ρ̂_i(σ)}, β, m) under the average energy constraint for a noise level σ. Figure 5 shows the experimental results for the mutual information with optimized strategies for mean photon numbers (a) n̄ = 1.0 and (b) n̄ = 2.0, and photon number resolutions PNR(m) with m = 1, 3, 5 in red, green, and blue dots, respectively. The theoretical predictions are shown with dashed colored lines. The mutual information for a conventional measurement (dashed grey), and for BPSK are shown adjusted for our total detection efficiency η = 0.72. Optimized communication strategies surpass the limit in mutual information for a CM at high levels of phase diffusion noise (σ ≥ 0.7), and for low noise (σ ≤ 0.1).
Moreover, optimized strategies with higher PNR detection resolution m provide higher mutual information for all levels of noise. Note that optimized communication strategies with optimized alphabets drastically outperform BPSK for all PNR(m) in terms of mutual information. Fig. 5(c,d) show the optimized alphabets for (c) n̄ = 1.0, and (d) n̄ = 2.0, respectively. We observe that the optimal alphabet interpolates from BPSK to OOK, similar to the case of error probability. However, this interpolation is continuous, because the mutual information is a convex function of σ for all PNR(m). In the intermediate level of noise (σ ≈ 0.5), there is a gap between the optimized strategies and the CM. This gap decreases as the photon number resolution PNR(m) of the optimized strategies increases. This suggests that optimized communication strategies with high-enough photon number resolution m should provide levels of mutual information at least as high as those that can be achieved with ideal conventional measurements for all levels of phase diffusion noise. Figure 5(e) shows the maximum percent difference R(m) between an optimized strategy with PNR(m) and a CM for n̄ from 0 to 2.0 for different PNR(m) from m = 1 to m = 20. This corresponds to the percent difference at the level of noise for which a PNR(m) strategy has the worst performance relative to a conventional measurement. R(m) is defined in terms of I_PNR(m)(σ), the mutual information for an optimized communication strategy with PNR(m), and I_CM(σ), the mutual information for the conventional measurement. We observe that as the number resolution increases, the percent difference asymptotically approaches zero for all mean photon numbers. The blue regions to the right of the white line correspond to R(m) < 1%, i.e. when a PNR(m) strategy is within 1% of the conventional measurement. Figure 5(f) shows R(m) on a log-log scale for n̄ = 0.5, 1.0, 1.5, and 2.0 in red, green, blue, and black lines, respectively.
The straight lines indicate power-law scaling in the convergence of the form a·m^b, with b ≈ 1.1 for all lines. This convergence suggests that for all mean photon numbers, optimized communication strategies with large enough photon resolution m will at worst provide the same mutual information as the ideal CM, which serves as a lower bound for the performance of optimized communication strategies. At the same time, these optimized strategies with moderate photon number resolution provide large advantages for increasing mutual information compared to CM at low noise and high noise levels.
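The soft-decision mutual information for a displaced PNR(m) measurement can be illustrated with a short sketch. As before, this assumes an ideal detector, equal priors, real amplitudes, and a discretized Gaussian phase average; the function names are hypothetical and the code is not the analysis used for Fig. 5.

```python
import math

def pnr_probs(alpha, beta, sigma, m, steps=401):
    """Outcome probabilities k = 0..m (ideal detector, outcome m means m or more)."""
    if sigma > 0:
        phis = [6 * sigma * (2 * i / (steps - 1) - 1) for i in range(steps)]
        weights = [math.exp(-phi ** 2 / (2 * sigma ** 2)) for phi in phis]
    else:
        phis, weights = [0.0], [1.0]
    norm = sum(weights)
    probs = [0.0] * (m + 1)
    for phi, w in zip(phis, weights):
        mean = abs(alpha * complex(math.cos(phi), math.sin(phi)) + beta) ** 2
        poisson = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(m)]
        for k in range(m):
            probs[k] += w / norm * poisson[k]
        probs[m] += w / norm * (1.0 - sum(poisson))
    return probs

def mutual_information(alpha1, alpha2, beta, sigma, m):
    """Mutual information in bits between the input symbol and the photon count."""
    p1 = pnr_probs(alpha1, beta, sigma, m)
    p2 = pnr_probs(alpha2, beta, sigma, m)
    info = 0.0
    for a, b in zip(p1, p2):
        pk = 0.5 * (a + b)              # total probability of outcome k
        for cond in (a, b):
            if cond > 0:                # 0 * log(0) contributes nothing
                info += 0.5 * cond * math.log2(cond / pk)
    return info

i_same = mutual_information(1.0, 1.0, 0.0, 0.3, 3)      # identical states
i_ook_clean = mutual_information(0.0, 1.0, 0.0, 0.0, 1)  # OOK, direct detection
i_ook_noisy = mutual_information(0.0, 1.0, 0.0, 0.3, 1)  # same with phase diffusion
```

With no displacement, the OOK value does not change with σ, which reflects the robustness of direct detection of OOK to phase diffusion discussed above; identical input states give zero information as expected.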

DISCUSSION
We proposed and demonstrated optimized communication strategies to maximize information transfer and measurement sensitivity over a non-Gaussian noisy channel. These optimized strategies are based on simultaneous optimization of the states used as information carriers with an optimized non-Gaussian photon counting measurement that surpasses the QNL for state discrimination. Simultaneous optimization of alphabet and measurement provides robustness to intrinsic channel noise, and allows for overcoming the sensitivity limits of conventional measurements and achieving higher information transfer in communications over noisy channels.
We demonstrated in a proof of principle experiment the concept of optimized strategies for communication over a channel with phase diffusion for binary coherent state alphabets and single-shot optimized measurements with photon number resolution.
These optimized communication strategies provide unexpected benefits to minimize the probability of decoding error and maximize the achievable mutual information in this noisy channel. Moreover, we observed that optimized communication strategies not only provide robustness to intrinsic channel noise, but also to technical noise and imperfections in the receiver.
We expect that optimized communication strategies can provide advantages for different problems in coherent communications extending to communication with multiple states and complex measurements. Moreover, optimized communication strategies can be applied to other channels utilizing practical optimized measurements and encodings to maximize information transfer in realistic noisy communication channels for which capacity limits are unknown, but that are encountered in optical communication networks.

ACKNOWLEDGEMENTS
This work was supported by the National Science Foundation (NSF) (PHY-1653670, PHY-1521016), and the project "Quantum Optical Communication Systems" carried out within the TEAM program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.

AUTHOR CONTRIBUTIONS
F.E.B. and K.B. conceived the idea and supervised the work. L.K. and M.T.D. conducted the theoretical study. M.T.D. designed the experimental implementation and performed the measurements. All authors contributed to the analysis of the theoretical and experimental results and contributed to writing the manuscript.

COMPETING INTERESTS
The authors declare that there are no competing interests.

DATA AVAILABILITY
The data that support the findings of this study are available from the authors upon request.

I. PHASE DIFFUSION PREPARATION AND CALIBRATION
Gaussian phase noise with controllable amplitude and bandwidth is prepared in the input state using an arbitrary function generator and a phase modulator, which modulates the phase of the input state. We use the interference of the input field with the local oscillator (LO) field with a given relative phase to estimate the strength of Gaussian phase noise by observing the photon number distributions with an avalanche photodiode. Figure S1(a) and (b) show examples of the photon number distributions for input states |−α⟩, |iα⟩, |−iα⟩, and |α⟩ with n̄ = 2.0, displaced by D̂(α) without (a) and with (b) phase diffusion with σ = 0.215. Phase diffusion modifies the photon number distribution for different input states, which can be used to estimate the level of induced noise σ. For example, while the input state ρ̂₁(0) = |−α⟩⟨−α| is ideally displaced to vacuum by D̂(α) when σ = 0, phase diffusion modifies the photon number distribution to show support over higher numbers of detected photons, see Fig. S1(b). The calibration of the phase noise of the input state consists of (1) applying a piecewise constant Gaussian waveform to the phase modulator and estimating the distribution of induced phases, and (2) using a Gaussian fit to estimate the standard deviation σ of the phase distribution, which quantifies the level of phase noise. Figure S1(c-f) shows an example of the calibration of phase diffusion with σ = 0.215 using states |±iα⟩ with mean photon number n̄ = 2.0. The interference of these states with the LO with phase 0 allows for calibrating the phase noise at relative phases of φ = π/2 and 3π/2, which are the points that provide the highest sensitivity. The piecewise constant phase noise applied to the input state allows for defining time bins over which the relative phase of the input state and the LO is constant. Each time bin has a length of T ≈ 43 ms and contains 500 shots of the experiment, all with the same relative phase. For each time bin we record the photon number detections, and extract the relative phase from the mean of the photon number distribution measured during that time bin:
$$\langle n \rangle_{\pm} = 2\eta\,\bar{n}\,\left(1 \pm \xi\sin\phi\right) \qquad (S1)$$
Figure S1(c) shows the reconstructed relative phase as a function of time with piecewise constant Gaussian phase noise for relative phase φ = π/2. A zoom in time in Fig. S1(d) shows time bins with constant phases over T ≈ 43 ms (500 pulses). Note that the vertical axis has been shifted with respect to φ = π/2 so that it shows deviations from π/2. These phases are expected to be Gaussian distributed, which can be used to estimate and calibrate the level of induced Gaussian phase noise. Figure S1(e) shows the histogram of extracted phase, combined for relative phases φ = π/2 and 3π/2, extracted from the photon number distributions with n̄ = 2.0, for a waveform with amplitude of 1 V from the function generator. The fit to a Gaussian distribution results in a standard deviation of σ = 0.215, which quantifies the level of phase noise for this voltage. We repeat this procedure for different voltage levels of the function generator to calibrate the induced phase noise level as a function of applied voltage. Fig. S1(f) shows the level of phase noise σ as a function of applied voltage from the function generator, showing a linear relationship. A fit to a straight line allows for determining the relation between applied voltage and induced phase, which can be used for precise preparation of phase noise of the input state in the experiment.
[Figure caption fragment] Note that for levels ν = 3.6 × 10⁻³ and ξ = 0.998 in our experiment there is only a slight degradation in the expected performance of the strategies. Here CM is an ideal conventional measurement.
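The per-bin phase extraction from Eq. (S1) can be sketched as a simple inversion followed by a spread estimate. This sketch assumes that φ in Eq. (S1) denotes the deviation from the nominal working point, neglects dark counts, and uses a plain standard deviation as a stand-in for the Gaussian fit described in the text. The function names and the synthetic data are hypothetical.

```python
import math
import statistics

def phase_deviation(mean_counts, nbar, eta, xi, sign=+1):
    """Invert <n>_± = 2*eta*nbar*(1 ± xi*sin(phi)) for phi, one time bin at a time."""
    s = sign * (mean_counts / (2.0 * eta * nbar) - 1.0) / xi
    s = max(-1.0, min(1.0, s))   # guard against shot noise pushing |s| above 1
    return math.asin(s)

def estimate_sigma(bin_means, nbar, eta, xi, sign=+1):
    """Spread of the extracted phases (standard deviation instead of a Gaussian fit)."""
    phases = [phase_deviation(n, nbar, eta, xi, sign) for n in bin_means]
    return statistics.pstdev(phases)

# Synthetic check: forward-model a few phase deviations, then recover them.
eta, xi, nbar = 0.72, 0.998, 2.0
true_phases = [-0.3, -0.1, 0.0, 0.1, 0.3]
means = [2 * eta * nbar * (1 + xi * math.sin(p)) for p in true_phases]
recovered = [phase_deviation(n, nbar, eta, xi) for n in means]
sigma_hat = estimate_sigma(means, nbar, eta, xi)
```

With real data each bin mean carries shot noise from the 500 pulses, so the extracted spread would slightly overestimate σ unless that contribution is accounted for.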

II. EFFECTS OF DARK COUNTS AND REDUCED VISIBILITY
The performance of a communication strategy depends on the noise and imperfections in any implementation. In our experiment, the main sources of noise and imperfections are detector dark counts and system imperfections resulting in mode mismatch between input state ρ̂_k and displacement field |β⟩, which can be accounted for by a reduced visibility 1 . Fig. S2 shows the expected performance of the optimized communication strategies in the phase diffusion channel for an average photon number n̄ = 1.0 for different levels of detector dark counts (ν) in Fig. S2(a) and (b) and reduced visibility (ξ) in Fig. S2(c) and (d), for probability of error and mutual information, respectively. An average energy of n̄ = 1.0 corresponds to an example of the input power in our experimental implementation. We observe that higher dark counts and lower visibility degrade the performance of the optimized strategy, increasing the probability of error and reducing mutual information. However, we observe that for the levels of noise and imperfections in our experiment with a dark count rate ν = 3.6 × 10⁻³ and a visibility ξ = 0.998, these imperfections only have a small effect on the strategy's performance. We also note that the detector's afterpulsing in our experiment (≈ 1%) has a very small effect on the optimized strategy, as has been observed for optimized PNR measurements for state discrimination in a pure-loss channel 2 .

III. OPTIMIZED ALPHABET
The optimal alphabets {|α_k⟩} (k = 1, 2) for the optimized strategies are obtained by minimizing the probability of error P_E(n̄, {ρ̂_i(σ)}, β, m) in Eq. (3) in the main manuscript. P_E depends on the input powers |α_k|², the level of noise σ, the optimized displacement field |β|, and the photon number resolution m of the detection strategy PNR(m). P_E as a function of the input signal power |α₁|² (with |α₂|² = 2n̄ − |α₁|² from the average energy constraint) and |β|² for a strategy with number resolution m, PNR(m), is a function with multiple minima, showing a total of m minima. Figure S3 shows an example of the logarithm of the probability of error log₁₀(P_E) as a function of |α₁|² and |β|² for average photon number n̄ = 2 for PNR(3), for noise levels (a) σ = 0.15, (b) σ = 0.25, (c) σ = 0.4, and (d) σ = 0.45. Note that for each value of σ there are three minima: two local minima (red circles), and one global minimum (white stars). The probability of error in Fig. S3 accounts for the dark counts ν = 3.6 × 10⁻³, visibility ξ = 0.998, and detection efficiency η = 0.72 in our experiment. This situation corresponds to the case shown in Fig. 4(c) and (f) in the main manuscript for n̄ = 2, which shows two discrete jumps of the optimized alphabet |α₁|² at σ ≈ 0.2 and at σ ≈ 0.42. We observe in Fig. S3 that by increasing the noise σ from 0.15 to 0.25, there is a change in which minimum is the global minimum. This sudden change causes the optimized alphabet {|α_k⟩} to show a discrete jump between these two values, as can be seen in Fig. 4(f) in the main manuscript around σ ≈ 0.2. In the same way, by increasing σ from 0.4 to 0.5, there is a change in the global minimum, causing a discrete jump of {|α_k⟩} around σ ≈ 0.42. Our numerical studies show that for n̄ = 1, we expect two discrete jumps in the optimized alphabet at σ ≈ 0.36 and σ ≈ 0.38 (see Fig. 4(e) in the main manuscript). However, for n̄ = 0.5 the global minimum of P_E for PNR(3) does not change, so no discrete jumps are expected in the optimized alphabet {|α_k⟩}, as can be seen in Fig. 4(d) in the main manuscript.
The probability of error is a highly nonlinear function of the alphabet and the displacement, since it is based on the maximum a posteriori probability criterion 2 . This causes discrete jumps in the optimized alphabet for PNR(m), m > 1. On the other hand, the mutual information I(n̄, {ρ̂_i(σ)}, β, m) in Eq. (4) in the main manuscript is a smooth function of |α_k|² and |β|² with a single maximum. As a result, the optimized alphabets {|α_k⟩} for maximizing mutual information in phase-noise channels do not show any discrete jumps.