Genetic-optimised aperiodic code for distributed optical fibre sensors

Distributed optical fibre sensors deliver a map of a physical quantity along an optical fibre, providing a unique solution for health monitoring of targeted structures. Considerable developments over recent years have pushed conventional distributed sensors towards their ultimate performance, so that any further significant improvement demands a substantial hardware overhead. Here, a technique is proposed that encodes the interrogating light signal with a single-sequence aperiodic code and spatially resolves the fibre information through fast post-processing. The code sequence is computed once and for all by a specifically developed genetic algorithm, enabling a performance enhancement with an unmodified conventional sensor configuration. The proposed approach is experimentally demonstrated in Brillouin- and Raman-based sensors, both outperforming the state of the art. This methodological breakthrough can be readily implemented in existing instruments by modifying only the software, offering a simple and cost-effective upgrade towards higher performance for distributed fibre sensing.

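In a single-pulse system, the measured response y(n) is modelled as the sum of the single-pulse response x(n) and the detection noise e(n):
$$ y(n) = x(n) + e(n) \tag{1} $$
where n is an integer variable linearly related to the time t through the sampling rate f_s, as n = t·f_s, and N is the total number of sampled points representing the data length of y(n), x(n) and e(n).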
Supplementary Fig. 1. Schematic illustration of the working principle of the single-pulse system.
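For a linear time-invariant (LTI) system, the single-pulse response x(n) can further be expressed by the linear convolution¹ between a single-pulse signal p(n) (see green dots in Supplementary Fig. 1) and the fibre impulse response h(n), so that Supplementary Eq. (1) can be rewritten as:
$$ y(n) = p(n) \otimes h(n) + e(n) \tag{2} $$
where the sign ⊗ denotes linear convolution. Defining N_p and N_h as the numbers of sampled points spanning the pulse duration and the fibre length, respectively, the linear convolution contains N = N_p + N_h − 1 points and can be computed in the discrete-frequency domain through the N-point discrete Fourier transform (DFT):
$$ Y(k) = P(k)H(k) + E(k) \tag{3} $$
Note that this process requires padding p(n) and h(n) with N − N_p and N − N_h zeros, respectively, in order to increase their length to N before performing the DFT.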
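The zero-padding requirement can be checked numerically. The following minimal NumPy sketch uses a hypothetical rectangular pulse and a hypothetical exponentially decaying impulse response (placeholders, not measured data); it computes p(n) ⊗ h(n) through an N-point DFT with N = N_p + N_h − 1 and compares the result with direct time-domain convolution.

```python
import numpy as np

def linear_convolution_via_dft(p, h):
    """Compute p (x) h through the N-point DFT, with N = N_p + N_h - 1."""
    n = len(p) + len(h) - 1
    # np.fft.fft(x, n) zero-pads x to length n (N - N_p and N - N_h zeros)
    P = np.fft.fft(p, n)
    H = np.fft.fft(h, n)
    return np.real(np.fft.ifft(P * H))

p = np.ones(20)                          # hypothetical rectangular pulse, N_p = 20 samples
h = np.exp(-np.arange(500) / 200.0)      # hypothetical fibre impulse response, N_h = 500 samples
x = linear_convolution_via_dft(p, h)
print(len(x), np.allclose(x, np.convolve(p, h)))   # 519 True
```

Without the zero padding (i.e. with a DFT length shorter than N_p + N_h − 1), the product P(k)H(k) would correspond to a circular convolution and the tail of the trace would wrap around onto its beginning.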
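Since e(n) represents a single realization of additive white Gaussian noise (AWGN), which is a wide-sense stationary real-valued random process¹, it can be characterized by its variance σ², expressed as:
$$ \sigma^2 = R_e(0) \approx \frac{1}{N}\sum_{n} e^2(n) = \frac{1}{N^2}\sum_{k} |E(k)|^2 \tag{4} $$
where R_e(0) represents the autocorrelation coefficient of e(n) at zero delay, and the last equality follows from Parseval's theorem.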
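As a numerical illustration of this variance characterisation, the sketch below draws one realization of zero-mean Gaussian noise with an arbitrary (hypothetical) standard deviation and estimates the zero-delay autocorrelation both in the time domain and, via Parseval's theorem, from the DFT. Both estimates coincide and approach σ².

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.05                      # hypothetical noise standard deviation
N = 100_000
e = rng.normal(0.0, sigma, N)     # one realization of zero-mean AWGN

# Zero-delay autocorrelation estimate in the time domain: (1/N) * sum e^2(n)
r0_time = np.dot(e, e) / N
# Same quantity from the DFT via Parseval's theorem: (1/N^2) * sum |E(k)|^2
E = np.fft.fft(e)
r0_freq = np.sum(np.abs(E) ** 2) / N**2

print(r0_time, r0_freq, sigma**2)
```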

Proposed pulse-coded system
Similar to the aforementioned single-pulse interrogation, for a given N-point coded optical pulse sequence c(n) (see Fig. 1 in Results) launched into the sensing fibre, the measured response r(n) can be described in the discrete-time and discrete-frequency domains, respectively, as:
$$ r(n) = c(n) \otimes h(n) + e_c(n), \qquad R(k) = C(k)H(k) + E_c(k) \tag{5} $$
where the detection noise e_c(n) is statistically equivalent to e(n) in Supplementary Eqs. (1), (2) and (4), so that it shares the same variance σ² as e(n). It must be mentioned that this is true only when thermal noise is dominant, so that the variance of the detection noise is not affected by the power of the signal reaching the photodetector, i.e. it does not depend on whether a single pulse or a code sequence is used for the interrogation.
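This condition can be realised by carefully designing the energy enhancement factor Q defined in Results, as elaborated in Supplementary notes 3 and 4. Knowing that c(n) can further be expressed by the linear convolution between the upsampled code u(n) (see Fig. 1 in Results) and the single pulse p(n), i.e. C(k) = U(k)P(k), Supplementary Eq. (5) can be rewritten as:
$$ R(k) = U(k)X(k) + E_c(k) $$
where X(k) = P(k)H(k) is the DFT of the single-pulse response x(n) and U(k) is the DFT of u(n); u(n) can be obtained by retrieving the energy of each optical pulse in c(n) measured at the fibre input. The following decoding process can then be performed:
$$ \hat{X}(k) = \frac{R(k)}{U(k)} = X(k) + \frac{E_c(k)}{U(k)} $$
resulting in a decoded single-pulse response x̂(n) that consists of the targeted single-pulse response x(n) and a noise term that is affected by U(k). Whether the decoder U(k) attenuates or magnifies the original input noise is measured by the variance of the noise after decoding, which can be characterized based on Supplementary Eq. (4):
$$ \sigma_d^2 = \frac{1}{N^2}\sum_{k}\frac{|E_c(k)|^2}{|U(k)|^2} \approx \sigma^2\,\frac{1}{N}\sum_{k}\frac{1}{|U(k)|^2} = \eta\,\sigma^2 $$
where the approximation uses the white-noise property that the expected value of |E_c(k)|² equals Nσ² for every k, and η is defined here as the noise scaling factor, which should be minimized to achieve a high SNR. Knowing that u(n) is the S-point upsampling of the L-point code signal a(l), as shown in Fig. 1, i.e. u(n) = a(n/S + 1) for n = 0, S, 2S, … and u(n) = 0 otherwise, its DFT satisfies:
$$ U(k) = \sum_{l=1}^{L} a(l)\,e^{-j2\pi k (l-1) S/N} = A\left(k \bmod \tfrac{N}{S}\right) $$
where A(k) is the (N/S)-point DFT of a(l). |U(k)| is therefore equivalent to S replicas of |A(k)|, as shown in Supplementary Fig. 2, so that the mean value of |1/U(k)|² is identical to that of |1/A(k)|², regardless of the number of upsampling points S.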
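The decoding chain and the role of the noise scaling factor can be sketched numerically. The minimal NumPy example below works under stated assumptions: a hypothetical 4-bit code [1, 1, 0, 1] (chosen because its polynomial 1 + z + z³ has no roots on the unit circle, so U(k) never vanishes), an arbitrary upsampling factor s and zero-padded length K, a synthetic exponential single-pulse response, and a circular-convolution model. It checks that the spectrum of the upsampled code is a tiling of the code spectrum, and that the decoded noise variance is close to η σ².

```python
import numpy as np

def upsample(a, s):
    """s-point upsampling: keep each code bit and insert s - 1 zeros after it."""
    u = np.zeros(len(a) * s)
    u[::s] = a
    return u

def noise_scaling_factor(U):
    """eta: mean value of |1/U(k)|^2 over all DFT bins."""
    return np.mean(1.0 / np.abs(U) ** 2)

rng = np.random.default_rng(3)
a = np.array([1.0, 1.0, 0.0, 1.0])   # hypothetical 4-bit unipolar code, Q = 3
K, s = 4096, 4                       # hypothetical zero-padded code length and upsampling factor
a_pad = np.concatenate([a, np.zeros(K - len(a))])
u = upsample(a_pad, s)               # upsampled code u(n), N = K * s samples
N = len(u)

A, U = np.fft.fft(a_pad), np.fft.fft(u)
# U(k) = A(k mod K): the spectrum of u(n) is s replicas of the code spectrum
replicas_match = np.allclose(U, np.tile(A, s))
eta = noise_scaling_factor(U)        # same value as the mean of |1/A(k)|^2

x = np.exp(-np.arange(N) / 500.0)    # hypothetical single-pulse response x(n)
sigma = 0.01
e_c = rng.normal(0.0, sigma, N)      # detection noise with variance sigma^2
R = U * np.fft.fft(x) + np.fft.fft(e_c)   # R(k) = U(k) X(k) + E_c(k), circular model
x_hat = np.real(np.fft.ifft(R / U))       # decoding: X_hat(k) = R(k) / U(k)

decoded_var = np.mean((x_hat - x) ** 2)
print(replicas_match, eta, decoded_var / sigma**2)   # last two values should be close
```

Changing s in this sketch leaves η unchanged, which is the numerical counterpart of the replica argument.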
Through the replica relation between |U(k)| and |A(k)|, the noise scaling factor can be rewritten as the mean value of |1/A(k)|²:
$$ \eta = \frac{S}{N}\sum_{k=0}^{N/S-1}\frac{1}{|A(k)|^2} $$
Given the energy enhancement factor Q = Σ_{l=1}^{L} a(l) and the code length L, |A(k)|² contains a fixed main lobe centred at k = 0 with a peak value |A(0)|² ≡ Q², covering a low-frequency interval whose width is inversely proportional to L, as shown in Supplementary Fig. 3(a). This property results in a fixed dip at the centre of |1/A(k)|², whose approximate power (integrated value over the mentioned frequency interval) depends only on Q and L, as shown by the shaded area in Supplementary Fig. 3(b). Since η equals the mean value of |1/A(k)|² (blue curve in Supplementary Fig. 3(b)), it can be decomposed into two terms (Supplementary Eq. (13)): the first term represents the mean value of the shaded area in Supplementary Fig. 3(b), which remains fixed for given Q and L (i.e. independent of the detailed code distribution); the second term represents the mean value outside the shaded area, which depends on the detailed code distribution and is more critical for determining the minimum value of η. Applying the Cauchy inequality to the second term of Supplementary Eq. (13) (Supplementary Eq. (14)), and making use of Parseval's theorem¹, which relates the total spectral energy of A(k) to the code energy Σ_l a²(l), a lower bound on η can be retrieved (Supplementary Eq. (18)). This approximation requires Q > 20, which is a reasonable condition for the implementation of coding techniques to be meaningful in distributed sensors (i.e. a sizeable SNR improvement). Note that this minimum value of η can be reached only if the off-peak spectral region of |A(k)|² (see Supplementary Fig. 3) is perfectly flat, as required for equality in the Cauchy inequality of Supplementary Eq. (14), which is theoretically impossible to obtain².
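A simpler and looser version of this Cauchy-Parseval argument can be checked numerically. Applied over all DFT bins, without separating out the main-lobe dip, it gives η ≥ 1/Q for a unipolar binary code, because Parseval's theorem makes the mean of |A(k)|² equal to Σ a²(l) = Q. This bound is weaker than the one discussed above, since it ignores the fixed main-lobe contribution. The sketch uses a random, hypothetical 120-bit code with Q = 40 and assumes a prime DFT length (1021): for an integer-valued code shorter than that length, no bin other than k = 0 can coincide with a root of the code polynomial, so A(k) never vanishes.

```python
import numpy as np

rng = np.random.default_rng(4)
L, Q = 120, 40
K = 1021                                   # prime DFT length (assumption): A(k) cannot vanish
a = np.zeros(L)
a[rng.choice(L, size=Q, replace=False)] = 1.0   # hypothetical random unipolar code
A = np.fft.fft(a, K)                       # zero-padded K-point DFT of the code

peak = np.abs(A[0]) ** 2                   # main-lobe peak |A(0)|^2 = Q^2
mean_power = np.mean(np.abs(A) ** 2)       # Parseval: equals sum a(l)^2 = Q
eta = np.mean(1.0 / np.abs(A) ** 2)        # noise scaling factor
# Cauchy-Schwarz over all bins: mean(1/|A|^2) >= 1/mean(|A|^2) = 1/Q
print(peak, mean_power, eta, 1.0 / Q)
```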
However, the flatter the off-peak spectrum of |A(k)|², the closer η gets to the right-hand term of Supplementary Eq. (18), for any given Q and L. Taking this into account and based on the expression of the coding gain (i.e. G_c = √(1/η), as defined in Results), an upper bound on G_c can be obtained (Supplementary Eq. (19)), in which the right-hand term represents the theoretical maximum of the coding gain G_c.
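The conversions between η, the coding gain and the dB figures quoted in the following notes can be summarised in a few lines. This sketch assumes, consistently with the numbers reported in this document, that the gains are linear SNR improvement factors quoted in dB as 10·log₁₀ of the linear value; the function names are illustrative only.

```python
import math

def coding_gain(eta):
    """G_c = sqrt(1/eta), as a linear SNR improvement factor."""
    return math.sqrt(1.0 / eta)

def reference_gain(Q):
    """Standard reference coding gain G_ref = sqrt(Q/2)."""
    return math.sqrt(Q / 2.0)

def to_db(gain):
    """SNR gains are quoted in dB as 10*log10 of the linear factor."""
    return 10.0 * math.log10(gain)

print(round(reference_gain(40), 2))                          # 4.47
print(round(to_db(reference_gain(200)), 1))                  # 10.0
print(round(to_db(4.35) - to_db(reference_gain(40)), 2))     # -0.12
```

The last line reproduces the 0.12 dB gap between the DGA result (G_c = 4.35) and G_ref for Q = 40 reported in Supplementary note 2.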

Supplementary note 2: Distributed genetic algorithm (DGA)
In order to evaluate the efficiency and effectiveness of the proposed DGA method, a simple brute-force method has been implemented for comparison, targeting a unipolar binary code sequence with a total bit number L = 120 and an energy enhancement factor Q = 40. More than 100 million sequences are randomly generated in a 10-hour brute-force search, and the corresponding probability distribution of the coding gain G_c = √(1/η) is shown in Supplementary Fig. 4, where the red vertical line represents the standard reference coding gain G_ref = √(Q/2) ≈ 4.47. Results show that most values of G_c are less than 3 and clustered around 1.9, while the largest G_c reaches only 3.84, which is still much smaller than G_ref. This indicates that the brute-force method is impractical for finding an acceptable sequence that approaches the coding gain provided by conventional codes (e.g. Simplex and Golay).
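A scaled-down version of such a brute-force search is sketched below, as an illustration only. It assumes that G_c is evaluated as √(1/η) with η the mean of |1/A(k)|² over a zero-padded DFT of prime length 1021 (an assumption; the exact DFT length used for Supplementary Fig. 4 is not stated here), and it draws only a few thousand random codes, so its statistics are not expected to reproduce the 100-million-sequence figures.

```python
import numpy as np

def coding_gain_of(a, K=1021):
    """G_c = sqrt(1/eta), eta = mean |1/A(k)|^2 over a K-point DFT (K prime, assumption)."""
    A = np.fft.fft(a, K)
    return np.sqrt(1.0 / np.mean(1.0 / np.abs(A) ** 2))

def brute_force_search(L, Q, trials, seed=0):
    """Draw random L-bit unipolar codes with exactly Q ones and keep the best G_c."""
    rng = np.random.default_rng(seed)
    best_gain, best_code = 0.0, None
    gains = np.empty(trials)
    for t in range(trials):
        a = np.zeros(L)
        a[rng.choice(L, size=Q, replace=False)] = 1.0
        gains[t] = coding_gain_of(a)
        if gains[t] > best_gain:
            best_gain, best_code = gains[t], a
    return best_gain, best_code, gains

best_gain, best_code, gains = brute_force_search(L=120, Q=40, trials=2000)
print(f"median G_c = {np.median(gains):.2f}, best G_c = {best_gain:.2f}, "
      f"G_ref = {np.sqrt(40 / 2):.2f}")
```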
Supplementary Fig. 4. a Probability distribution of G_c resulting from a 10-hour brute-force search and b probability of G_c between 3.6 and 5 (zoom-in of Supplementary Fig. 4a).
Compared with the brute-force method, DGA has proven to be an effective way to search for the optimal or sub-optimal sequence. Supplementary Fig. 5 shows the G_c distribution searched by DGA at the initial generation and at the 1st, 3rd, 16th, 74th and 144th iterations, where each panel contains an inset showing a zoom-in ranging from 2.972 to G_ref. The distribution in the initial generation is similar to the one shown in Supplementary Fig. 4, owing to the same random initialization process. This distribution changes during the evolutionary process and eventually converges to an optimal G_c = 4.35, which is 97% of G_ref (0.12 dB less than G_ref). Whilst reaching a much better G_c, the DGA search takes only ~135 minutes, making it far more time-efficient than the brute-force method, which did not approach G_ref even after 10 hours.
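To illustrate why an evolutionary search outperforms random sampling, the sketch below implements a plain single-population genetic algorithm over unipolar codes with a fixed number of ones. It is not the DGA used in this work: the tournament selection, uniform crossover with a repair step, swap mutation, elitism, population size and number of generations are all illustrative assumptions, and the fitness assumes the same DFT-based G_c evaluation (prime length 1021) as the brute-force sketch.

```python
import numpy as np

def coding_gain_of(a, K=1021):
    """Fitness: G_c = sqrt(1/eta) over a K-point DFT (K prime, assumption)."""
    A = np.fft.fft(a, K)
    return np.sqrt(1.0 / np.mean(1.0 / np.abs(A) ** 2))

def repair(child, Q, rng):
    """Force exactly Q ones by randomly switching surplus or missing bits."""
    ones, zeros = np.flatnonzero(child == 1.0), np.flatnonzero(child == 0.0)
    if len(ones) > Q:
        child[rng.choice(ones, len(ones) - Q, replace=False)] = 0.0
    elif len(ones) < Q:
        child[rng.choice(zeros, Q - len(ones), replace=False)] = 1.0
    return child

def tournament(pop, fit, rng):
    """Return the fitter of two randomly chosen individuals."""
    i, j = rng.choice(len(pop), 2, replace=False)
    return pop[i] if fit[i] > fit[j] else pop[j]

def genetic_search(L, Q, pop_size=60, generations=40, p_mut=0.2, seed=0):
    rng = np.random.default_rng(seed)
    pop = np.zeros((pop_size, L))
    for row in pop:                                    # random initial codes with Q ones
        row[rng.choice(L, Q, replace=False)] = 1.0
    fit = np.array([coding_gain_of(c) for c in pop])
    history = [fit.max()]
    for _ in range(generations):
        new_pop = [pop[np.argmax(fit)].copy()]         # elitism: keep the best code
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, fit, rng), tournament(pop, fit, rng)
            mask = rng.random(L) < 0.5                 # uniform crossover
            child = repair(np.where(mask, p1, p2), Q, rng)
            if rng.random() < p_mut:                   # swap mutation keeps Q ones
                on = rng.choice(np.flatnonzero(child == 1.0))
                off = rng.choice(np.flatnonzero(child == 0.0))
                child[on], child[off] = 0.0, 1.0
            new_pop.append(child)
        pop = np.array(new_pop)
        fit = np.array([coding_gain_of(c) for c in pop])
        history.append(fit.max())
    best = np.argmax(fit)
    return pop[best], fit[best], history

code, gain, history = genetic_search(L=120, Q=40)
print(f"best G_c = {gain:.2f} after {len(history) - 1} generations "
      f"(G_ref = {np.sqrt(40 / 2):.2f})")
```

Elitism makes the best fitness non-decreasing from one generation to the next, which mirrors the progressive shift of the distributions in Supplementary Fig. 5.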

Supplementary note 3: Noise analysis for unipolar coded BOTDA
In this note, the impact of detection noise on general unipolar coded BOTDA systems is analytically modelled, aiming to provide a quantitative guideline for the optimisation of the energy enhancement factor Q (defined in Results) under given experimental conditions. It turns out that some Brillouin-gain-dependent noises, which are negligible in long-range single-pulse BOTDA, may be significantly enhanced (even becoming dominant) in unipolar coded BOTDA due to the large cumulated Brillouin gain. This noise enhancement may substantially compromise the SNR improvement brought by the theoretical coding gain G_c, depending on the value of Q. In other words, Q must be properly designed to ensure that the system operates at the desired G_c.
The analysis is performed here based on the setup shown in Fig. 5, in which a polarization scrambler is employed to mitigate the impact of polarization pulling effects³,⁴ and a pre-amplifier (EDFA2) is used to enhance the SNR⁵. The noise model is established by assuming a uniform BFS profile, which corresponds to the worst-case scenario, i.e. the one in which the Brillouin-gain-dependent noises have the most detrimental impact. Considering both optical noise and detector noise, for the i-th single acquisition (i.e. non-averaged), the photocurrent of the detected optical signal (raw coded BOTDA trace) at the Brillouin resonance can be expressed³ as the sum of a signal term and three noise terms (Supplementary Eq. (22)). The signal term depends on the sampling time t, the photodiode responsivity ℛ ≈ 1 A/W, the EDFA gain G_EDFA, the power P_in of the DC probe entering the EDFA and the net Brillouin gain G_B(t, i); the three noise terms n_det, n_s-ASE and n_s-SpBS are the photo-detection current noise, the signal-ASE beating noise and the signal-SpBS beating noise, respectively. G_B(t, i) represents the net Brillouin gain at the Brillouin resonance provided by all pulses in the coding sequence (Supplementary Eq. (23)); it depends on the single-pulse Brillouin gain g_B(t) at the Brillouin resonance, on the code bit number k and the amplitude a_k of the k-th coded pulse, and on the relative polarization angle θ(k, i, t) between the k-th coded pulse and the probe wave, which is randomly distributed.

II) Photo-detection noise (second term in Supplementary Eq. (22))
This noise is attributed to thermal noise and shot noise. However, due to the low optical probe power reaching the receiver, the shot-noise contribution can usually be ignored. Thus, the noise standard deviation (STD) is only related to the thermal noise and is determined by the thermal noise variance σ_th², which can be readily characterized by measuring the photodetector output without input light⁵.

III) Signal-ASE beating noise (third term in Supplementary Eq. (22))
The probe signal reaching the receiver beats with the ASE noise introduced by the pre-amplifier, resulting in a noise standard deviation that depends on the noise figure F of the EDFA, the electron charge q and the noise-equivalent bandwidth B, which can be optimized to equal the signal bandwidth through digital filtering⁶.

IV) Signal-SpBS beating noise (fourth term in Supplementary Eq. (22))
This noise comes from the beating between the SpBS originating from the coded pulse sequence and the probe signal reaching the photodetector. To calculate the noise standard deviation, the electric fields of the probe signal (E⃗_pr) and of the pump-induced SpBS (E⃗_b) after the EDFA, in front of the detector, are first described as:
$$ \vec{E}_{pr}(t) = \hat{x}\sqrt{P_{pr}(t)}\exp[j\omega t] \tag{26} $$
$$ \vec{E}_b(t) = [\hat{x}\cos\psi + \hat{y}\sin\psi]\sqrt{P_b(t)}\exp[j(\omega+\Delta\omega)t + j\Phi] \tag{27} $$
where x̂ and ŷ stand for the local polarization direction of the probe and its orthogonal direction, respectively; P_pr(t) is the probe power in front of the detector; Δω is the relative (angular) optical frequency offset between the probe and the SpBS; the angle ψ is the relative local polarization rotation of the SpBS with respect to the local probe, which varies in the range [0, 2π]; and Φ is the random pump-probe phase difference. Thus the detected optical power is:
$$ P(t) \propto (\vec{E}_{pr} + \vec{E}_b)\cdot(\vec{E}_{pr}^{*} + \vec{E}_b^{*}) = P_{pr}(t) + P_b(t) + 2\sqrt{P_{pr}(t)P_b(t)}\cos\psi\cos(\Delta\omega t + \Phi) \tag{28} $$
The first right-hand term represents the probe signal reaching the photodetector. The second term is the SpBS component amplified by the EDFA, corresponding to a small deterministic signal. The third term, however, represents the signal-SpBS beating noise, whose normalized standard deviation is given by Supplementary Eq. (29), where P_b represents the optical power of the SpBS light.
As Q increases, the Brillouin-gain-dependent noises gradually increase and become non-negligible. Based on Supplementary Fig. 7(a), the ratio between the noise variances in the coded (σ_c²) and single-pulse (σ_s²) BOTDA schemes is calculated as a function of Q and shown in Supplementary Fig. 7(b). It can be found that Q = 40 is the optimum (maximum acceptable) value, which ensures a maximum 10% difference between the two noise variances (i.e. the noise remains almost unchanged between the single-pulse and coded BOTDA schemes).
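The magnitude of the beating term in Supplementary Eq. (28) can be estimated with a short Monte Carlo sketch. Assuming that the polarization angle ψ and the beat phase Δωt + Φ are independent and uniformly distributed over [0, 2π] (an assumption of this sketch), the mean of cos²ψ·cos²(Δωt + Φ) is 1/4, so the standard deviation of the beat term is √(P_pr·P_b), i.e. √(P_b/P_pr) once normalised to the probe power. This is an independent evaluation under those assumptions, not necessarily the exact form of Supplementary Eq. (29); the power values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
P_pr, P_b = 1.0e-6, 1.0e-9       # hypothetical probe and SpBS powers at the detector (W)
n = 1_000_000
psi = rng.uniform(0.0, 2 * np.pi, n)     # relative polarization rotation in [0, 2*pi]
phase = rng.uniform(0.0, 2 * np.pi, n)   # beat phase Delta_omega * t + Phi, assumed uniform
beat = 2 * np.sqrt(P_pr * P_b) * np.cos(psi) * np.cos(phase)

mc_std = beat.std() / P_pr               # Monte Carlo STD, normalised to the probe power
analytic_std = np.sqrt(P_b / P_pr)       # from E[cos^2 psi] = E[cos^2 phase] = 1/2
print(mc_std, analytic_std)
```

Since P_b grows with the number and amplitude of the coded pulses, this term scales with Q, which is why it becomes non-negligible in coded BOTDA.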
Supplementary Fig. 8. Theoretically predicted and measured noise standard deviations as a function of distance for a the single-pulse scheme and b the GO-coded scheme.
In such an ideal case (Q ≈ 40), the theoretically calculated standard deviations of the different noises and of the total noise as a function of fibre position are shown in Supplementary Fig. 8(b), in which the theoretical prediction (red curve) of the total noise standard deviation matches the experimental result (blue curve) well, verifying the proposed noise model. The results also demonstrate that all Brillouin-gain-dependent noises become negligible close to the fibre far-end (i.e. the total noise of coded BOTDA is approximately equal to that of single-pulse BOTDA shown in Supplementary Fig. 8(a)), thanks to the proper design of Q.
To demonstrate that this limitation on Q (imposed by the additional Brillouin-gain-dependent noises) is not exclusive to the GO-code proposed here, a comparative experiment has been performed with a 255-bit Simplex-coded BOTDA. The experimental parameters are the same as those in Results, with Q also set to 40.
Supplementary Fig. 9(a) shows the decoded Brillouin trace at the resonance peak frequency (averaged 1024 times), compared with single-pulse Brillouin traces averaged 1024 and 65536 times (representing a 9 dB SNR difference between them), exhibiting behaviour similar to that in Fig. 3(b) obtained with the proposed GO-code.
Supplementary Fig. 9(b) illustrates the SNR of each trace versus distance, showing a level of deterioration similar to that of the GO-code (see Fig. 3(c)). This demonstrates that the negative impact of Brillouin-gain-dependent noises on the SNR enhancement is inherent to coding techniques, regardless of the coding scheme used.
Supplementary Fig. 9. Experimental results of 255-bit Simplex-coded BOTDA. a Decoded and measured traces along the sensing fibre. b SNR profile of each trace.
It is worth mentioning that the Brillouin-gain-dependent noises are more detrimental in Cyclic coding. This is because the coded sequences spread all along the fibre, so that the noise induced by the strong signal originating from the fibre near-end impacts the SNR of the weak signal originating from the fibre far-end. This is experimentally demonstrated with Q ≈ 40, as shown in Supplementary Fig. 10(a): compared with the noise level in the single-pulse scheme (Supplementary Fig. 8(a)), the noise floor of the Cyclic coding response is globally higher. This means that the theoretical coding gain at the fibre far-end cannot be realised using Cyclic codes (i.e. there is nearly no actual coding gain), as shown in Supplementary Fig. 10(b).
Supplementary Fig. 10. a Theoretically predicted and measured noise standard deviations as a function of distance for Cyclic coding. SNR profile of each trace for Cyclic coding and single pulse with b optimised pulse peak power and c only 10 mW pulse peak power.
It is also worth mentioning that the theoretical coding gain of Cyclic codes reported in some literature (e.g. [58] cited in the main text) results from a non-optimised pulse peak power in both the single-pulse and Cyclic coding schemes (10 times below the MI threshold). In this case, the signal-dependent noise is much lower, so that the actual coding gain is close to the theoretical value; however, this is of limited relevance, since the absolute SNR with coding (red curve in Supplementary Fig. 10(c)) is even lower than that of an optimised single-pulse scheme (blue curve in Supplementary Fig. 10(b)).
Note also that, with optimised pulse power, neither aperiodic nor Cyclic codes bring a decisive SNR improvement in short-distance sensing either. Supplementary Figs. 11(a) and 11(b) show the noise profiles of the aperiodic code and the Cyclic code for a 25 km sensing range, respectively, while Supplementary Fig. 11(c) shows the SNR obtained over the entire sensing fibre for both codes, as well as for the single-pulse case. Neither type of code provides a decisive SNR improvement along the 25 km range, due to the additional signal-dependent noise induced by the code sequences themselves (noise that dominates the measurement). However, it is very relevant to notice that the total noise of the Cyclic code remains constant over the whole measured trace (Supplementary Fig. 11(b)), while for the aperiodic code the total noise decreases with distance (Supplementary Fig. 11(a)). This behaviour is essentially due to the pulse distribution over the sensing fibre, and it explains the higher SNR improvement provided by aperiodic codes at longer sensing ranges compared with Cyclic coding, as demonstrated in Supplementary Figs. 9 and 10.
Supplementary Fig. 11. Noise profiles for a the aperiodic code and b the Cyclic code. c SNR profile over distance for the single-pulse, aperiodic-code and Cyclic-code cases.
All results indicate that Cyclic coding brings no benefit compared with a fully optimised single-pulse scheme, for any sensing range. Note, however, that for a non-optimised pulse power Cyclic coding can be effective, as demonstrated by previous publications, although this non-optimised condition has not yet been proved to bring any global advantage in real conditions.

Supplementary note 4. Noise analysis of coded ROTDR system
In coded ROTDR systems, the pulse peak power is limited by amplified spontaneous Raman scattering (ASpRS), while the use of coding increases the spontaneous Raman backscattering power reaching the receiver, which may lead to a non-negligible shot-noise level in the photodetector, compromising the expected coding gain. This note describes the limitations imposed by these two phenomena and presents a model that can quantitatively define the optimal energy enhancement factor Q.
Considering the limitations imposed by ASpRS, it must be noted that each code pulse simultaneously generates forward and backward Raman scattering components. In the case of a single pulse, the forward Stokes and anti-Stokes components interact with the pump pulse over an effective fibre length determined by the walk-off distance and the optical losses. In the case of coding, a return-to-zero format is required to avoid interactions between code pulses and Raman components generated by adjacent pulses in the sequence, which may occur due to group velocity differences among spectral components. Under this condition, the threshold power for forward ASpRS turns out to be equivalent to that of the single-pulse case, i.e. around 1 W for metre-scale spatial resolutions.
Having fixed the peak pump power (e.g. to 1 W, as in this case), the optimal energy enhancement factor Q is primarily defined by the photo-detection noise. Indeed, a good balance between thermal noise and shot noise must be achieved to obtain the real benefits of coding, i.e. to enhance the SNR by the expected coding gain. While the thermal noise of the APD is constant at a given temperature, shot noise increases with the input optical power.
The variance of the shot noise σ_sh² in an APD detecting the coded anti-Stokes SpRS signal, ignoring dark current, is given by⁷:
$$ \sigma_{sh}^2(z) = 2\,q\,M^2 F\,\mathcal{R}\,Q\,P_{single}(z)\,B \tag{30} $$
where q is the elementary charge, M is the mean APD gain factor, F is the APD excess noise factor, ℛ is the APD responsivity, P_single(z) is the single-pulse power response of the anti-Stokes SpRS, and B is the electrical bandwidth. Supplementary Eq. (30) is experimentally validated for the case of 2 m SR, as shown in Supplementary Fig. 12(a), where the variance of the equivalent APD noise in the single-pulse case (Q = 1) is characterised as 1.6 × 10⁻³ V², corresponding to the contribution of thermal noise. Note that this effect is stronger for APDs than for PIN photodetectors, due to the avalanche amplification process and the induced excess noise (i.e. because M² ≫ 1).
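The growth of the detection noise with Q, and the contrast between APD and PIN receivers, can be illustrated with the standard shot-noise expression 2·q·M²·F·ℛ·P·B, assuming the coded anti-Stokes power is Q·P_single as in Supplementary Eq. (30). All operating-point values below are hypothetical and are not those of the reported setup; the same thermal floor is assumed for both receivers to isolate the shot-noise contribution.

```python
import numpy as np

Q_E = 1.602176634e-19  # elementary charge (C)

def shot_noise_variance(Q, P_single, M, F, resp, B):
    """Shot-noise current variance (A^2), assuming 2 q M^2 F R (Q * P_single) B."""
    return 2 * Q_E * M**2 * F * resp * Q * P_single * B

# Hypothetical operating point (not the values of the reported setup)
sigma_th2 = 1e-16      # thermal-noise current variance (A^2)
P_single = 1e-9        # single-pulse anti-Stokes power at the detector (W)
B = 50e6               # electrical bandwidth (Hz)
resp = 0.9             # unity-gain responsivity (A/W)

Qs = np.array([1, 10, 44, 100])
apd = sigma_th2 + shot_noise_variance(Qs, P_single, M=10.0, F=5.0, resp=resp, B=B)
pin = sigma_th2 + shot_noise_variance(Qs, P_single, M=1.0, F=1.0, resp=resp, B=B)
print(apd / sigma_th2)   # grows linearly with Q, scaled by M^2 F
print(pin / sigma_th2)   # nearly unchanged for a PIN photodiode
```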
Supplementary Fig. 12. a Measured and calculated noise variance as a function of the energy enhancement factor. b Measured and simulated noise of coded ROTDR trace as a function of distance.
Since the total anti-Stokes SpRS power reaching the detector decreases with distance along the sensing fibre, the additional noise introduced over the ROTDR traces also decreases with distance. To fully benefit from coding, the total noise at the fibre far-end should remain almost unchanged between the single-pulse and coded ROTDR schemes (i.e. the noise near the fibre end in both schemes should be dominated by thermal noise), which requires optimising the energy enhancement factor Q. Based on Supplementary Eq. (30) and considering the fibre attenuation, Q = 44 is calculated for the case of 2 m SR and the 39 km long sensing fibre used in Results, as shown in Supplementary Fig. 12(b). The noise near the far fibre-end (between 35 and 39 km) is clearly dominated by thermal noise, since the noise variance over this section is visually identical to that measured beyond the fibre end.
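The optimisation of Q described above can be sketched as a simple search: compute the noise variance along the fibre from the thermal floor plus the Q-scaled shot noise, and keep the largest Q for which the far-end variance stays close to the thermal floor. Everything in this sketch is an assumption for illustration: the exponential power decay with a combined pump and anti-Stokes loss, the device parameters, and the 10% tolerance (borrowed from the BOTDA criterion of Supplementary note 3). It is not expected to reproduce Q = 44.

```python
import numpy as np

Q_E = 1.602176634e-19  # elementary charge (C)

def coded_noise_profile(z_km, Q, P0, sigma_th2, loss_db_km=0.45,
                        M=10.0, F=5.0, resp=0.9, B=50e6):
    """Thermal + shot-noise variance (A^2) along the fibre for a coded ROTDR trace.
    P_single(z) = P0 * 10**(-loss_db_km * z / 10) is a hypothetical model of the
    single-pulse anti-Stokes power (loss_db_km: combined pump + anti-Stokes loss)."""
    P_single = P0 * 10 ** (-loss_db_km * np.asarray(z_km) / 10)
    return sigma_th2 + 2 * Q_E * M**2 * F * resp * Q * P_single * B

def max_energy_factor(z_end_km, P0, sigma_th2, tolerance=0.1, Q_max=1000):
    """Largest Q keeping the far-end variance within (1 + tolerance) of the thermal floor."""
    best = 1
    for Q in range(1, Q_max + 1):
        if coded_noise_profile(z_end_km, Q, P0, sigma_th2) <= (1 + tolerance) * sigma_th2:
            best = Q
        else:
            break  # the variance grows monotonically with Q
    return best

sigma_th2, P0 = 1e-16, 1e-9   # hypothetical thermal floor (A^2) and near-end power (W)
Q_opt = max_energy_factor(39.0, P0, sigma_th2)
z = np.linspace(0.0, 39.0, 5)
print(Q_opt, coded_noise_profile(z, Q_opt, P0, sigma_th2) / sigma_th2)
```

The printed profile shows the behaviour discussed above: excess noise near the fibre input that fades towards the far-end, where the thermal floor dominates.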

Supplementary note 5. Experimental results with 1 m spatial resolution
Similar to the demonstration of GO-coded BOTDA with 2 m spatial resolution (SR) shown in Fig. 3 of Results, experiments are here carried out at 1 m SR. This enables the use of a longer code sequence, so that a larger SNR improvement is expected. Due to the broadened Brillouin spectral response at 1 m SR, a wider frequency scan (300 frequency points) with a frequency step of 1 MHz is performed. Temporal Brillouin traces are averaged 1024 times, leading to a total measurement time of ~5.7 min. The optimised single-pulse Brillouin gain peak is 0.5%, which determines an optimal energy enhancement factor Q = 200. The decaying amplitude imposed by the EDFA gain saturation leads to an optimal 723-bit GO-code sequence that offers a theoretically evaluated coding gain of 9.3 dB, only 0.7 dB lower than the standard reference coding gain (10 dB, i.e. √(200/2) = 10 times in linear scale). As for the demonstration at 2 m SR, the relevant results are shown in Supplementary Fig. 13(a)-(c), where the reference curves (black) are obtained using the single-pulse scheme with 74182 averages, representing a reference SNR improvement equal to the expected coding gain (10·log₁₀√(74182/1024) = 9.3 dB). As anticipated, the additional Brillouin-induced noises still compromise the performance of the GO-code measurements at the fibre near-end, while having a negligible effect at the fibre far-end.
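The averaging arithmetic behind the reference curves can be reproduced as follows. The sketch only assumes that averaging n traces improves the linear SNR by √n and that, as elsewhere in this document, the improvement is quoted as 10·log₁₀ of the linear factor; the function names are illustrative.

```python
import math

def snr_gain_db_from_averages(n_ref, n_base):
    """SNR improvement (dB) from averaging n_ref instead of n_base traces:
    the linear SNR scales as sqrt(n), quoted in dB as 10*log10."""
    return 10 * math.log10(math.sqrt(n_ref / n_base))

def averages_for_gain(gain_db, n_base):
    """Number of averages giving the same SNR improvement as a coding gain of gain_db."""
    return n_base * (10 ** (gain_db / 10)) ** 2

print(round(snr_gain_db_from_averages(74182, 1024), 2))   # 9.3
print(round(snr_gain_db_from_averages(65536, 1024), 2))   # 9.03
print(averages_for_gain(9.3, 1024))
```

The second line corresponds to the 1024 versus 65536 averages used for the Simplex comparison in Supplementary note 3, and the last call shows why about 74,000 single-pulse averages are needed to emulate a 9.3 dB coding gain.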