Abstract
We combine nonlinear Fourier transform (NFT) signal processing with machine learning methods for solving the direct spectral problem associated with the nonlinear Schrödinger equation. The latter is one of the core nonlinear science models emerging in a range of applications. Our focus is on the unexplored problem of computing the continuous nonlinear Fourier spectrum associated with decaying profiles, using a specially-structured deep neural network which we coined NFTNet. Bayesian optimisation is utilised to find the optimal neural network architecture. The benefits of using the NFTNet as compared to the conventional numerical NFT methods become evident when we deal with noise-corrupted signals, where the neural-network-based processing results in effective noise suppression. This advantage becomes more pronounced when the noise level is sufficiently high and we train the neural network on the noise-corrupted field profiles. The maximum restoration quality corresponds to the case where the signal-to-noise ratio of the training data coincides with that of the validation signals. Finally, we also demonstrate that the NFT b-coefficient, important for optical communication applications, can be recovered with high accuracy and denoised by a neural network with the same architecture.
Introduction
Quite often, the evolution of nonlinear systems is well approximated by nonlinear partial differential equations (PDEs). There is no universal theory for the solution of nonlinear PDEs, but there exists a distinguished class of nonlinear equations that can be solved with mathematical rigour: the so-called integrable systems. The history of integrable PDEs started in the 1960s, when Gardner et al.^{1} discovered a method for finding infinite families of exact solutions of the Korteweg-de Vries equation. Their method, termed the inverse scattering transform, can be regarded as a generalisation of the conventional Fourier transform (FT) to nonlinear systems. Thus, the name nonlinear Fourier transform (NFT) is often used for it nowadays, especially in the signal processing literature^{2,3}. Shortly after the integration of the Korteweg-de Vries equation, Zakharov and Shabat developed the inverse scattering machinery (i.e. the NFT method) for yet another celebrated PDE: the nonlinear Schrödinger equation (NLSE)^{4}, which is the focus of our current study.
In a nutshell, for an integrable PDE there exists a canonical transform of dependent variables, converting the original nonlinear system into the so-called action-angle variables; the evolution of the latter is governed by a set of uncoupled trivial (linear) differential equations. Mathematically, this can be treated as an effective linearisation of a nonlinear integrable PDE^{5,6}. For our work, it is important that we know the explicit form of the NFT operations attributed to the NLSE.
The NLSE, being a generic model describing the interplay between dispersive and nonlinear effects, is applicable to the description of a vast number of physical phenomena, ranging from the dynamics of magneto-ordered systems^{7} to hydrodynamics^{8}. It also serves, under certain assumptions, as a principal master model governing the evolution of a single-polarisation slowly varying light envelope propagating along a single-mode fibre^{9,10}. In the dimensionless form, the NLSE is given by Eq. (1).
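For orientation, a commonly used dimensionless normalisation of the focusing NLSE is sketched below. The coefficients (the factor 1/2 in front of the dispersion term and the unit nonlinear coefficient) are an assumption about the normalisation behind Eq. (1); other conventions differ only by a rescaling of q, z and t.

```latex
i\,\frac{\partial q}{\partial z} + \frac{1}{2}\,\frac{\partial^{2} q}{\partial t^{2}} + |q|^{2} q = 0
```

The plus sign in front of the cubic term is what makes the equation focusing.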
In the fibre-optic context, q(z, t) is the electromagnetic field evolving down the fibre, z is the distance along the fibre, and t is the retarded time variable. Eq. (1) is explicitly written as the focusing NLSE, corresponding to the anomalous dispersion of the standard optical fibre. We note that our further results are general and can be used for various physical applications where NLSE (1) provides a good approximation. Nonetheless, without loss of generality, we will refer to the field q as “a signal” throughout the paper.
Within modern optical communications, the NFT is used not as a tool for solving the NLSE, but as a signal processing method^{2,3}. This concept originated from the work of Hasegawa and Nyu^{11}, who proposed to depart from the time-domain solitonic shapes^{10} and to use instead the nonlinear spectrum (the so-called eigenvalues) for data modulation and transmission. Over the last decade, the NFT-based optical transmission techniques have been resurrected and greatly extended^{3,12}. The most efficient NFT-based optical transmission method is the so-called nonlinear frequency division multiplexing (NFDM)^{2}, within which we directly modulate the parameters of the nonlinear modes that emerge from the nonlinear Fourier (NF) signal decomposition. When the optical field propagates down the fibre link, the evolution of the nonlinear modes inside the NF domain stays almost linear, in contrast to the truly nonlinear evolution of the signal in the space-time domain. Due to this property, we can theoretically get rid of the infamous nonlinear cross-talk degrading the transmission performance at high signal powers^{13}.
Generally, when considering the NF decomposition of an arbitrary rapidly decaying waveform, we can have two distinct coexisting parts of the NF spectrum: the continuous part, describing quasi-linear dispersive waves, and the discrete part, corresponding to solitonic modes^{2,3,5,6}. The continuous part of the NF spectrum is represented by the complex-valued function \(r(\xi ) \in {\mathbb {C}}\) of a real argument \(\xi \in {\mathbb {R}}\), where \(\xi\) is called the spectral parameter; \(r(\xi )\) is called the reflection coefficient, and \(\xi\) emerges as the nonlinear analogue of a conventional Fourier frequency. This part of the NF spectrum converges to the conventional FT of our signal in the low-power limit^{14}, see also the explicit expressions in Methods. The discrete part consists of the complex eigenvalues \(\xi _n \in {\mathbb {C}}^{+}\), located in the upper complex half-plane, and the associated norming constants \(r_n\) (spectral amplitudes)^{15}. The graphical summary of the general NF spectrum structure is given in Fig. 1. However, we point out that it is exactly the utilisation of the continuous NF spectrum part^{16,17,18,19,20,21,22,23} that resulted in the breakthrough in the NFDM technology: this idea, mentioned already in early NFT transmission-related works^{2,14}, is in stark contrast with the progenitor soliton-based transmission methods^{10}. In our current study we specifically address the continuous NF spectrum: our goal is to compute the profile \(r(\xi )\) given the localised q(t) shape. We also note that the continuous NF spectrum modulation using the special technique coined b-modulation^{24,25,26,27} has provided the highest NFDM data rates so far^{12,28}. Thus, in this paper we also address the recovery of the b-coefficient, \(b(\xi ) \in {\mathbb {C}}\), \(\xi \in {\mathbb {R}}\), given q(t).
When solitons are absent, as in the case considered, the full NF spectrum corresponding to a given finite-extent signal can be equivalently represented by either the reflection coefficient or the b-coefficient, see more in Methods. Finally, we note that for the NFDM based on the discrete NF spectrum^{29,30,31}, the achieved data rates have been noticeably lower than those for the modulation of the continuous NF spectrum, see the comparison in Fig. 1 of Ref.^{12}, and we do not address the computation of solitonic parameters in our research.
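The equivalence of the two representations can be sketched with the standard Zakharov-Shabat relations for real \(\xi\). The relations below assume the usual definitions \(r = b/a\) and the unimodularity of the scattering data for the focusing problem; the paper's own expressions are those referenced as Eqs. (8)-(11) in Methods.

```latex
r(\xi) = \frac{b(\xi)}{a(\xi)}, \qquad
|a(\xi)|^{2} + |b(\xi)|^{2} = 1
\quad\Longrightarrow\quad
|b(\xi)|^{2} = \frac{|r(\xi)|^{2}}{1 + |r(\xi)|^{2}}, \qquad
|a(\xi)|^{2} = \frac{1}{1 + |r(\xi)|^{2}} .
```

The modulus of \(a(\xi)\) is thus fixed by either coefficient on the real axis; in the soliton-free case \(a(\xi)\) has no zeros in the upper half-plane, which is what allows its phase to be recovered as well.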
The NFDM transmission method relies on the (approximate) integrability of our transmission channel, i.e. we inherently assume that Eq. (1) is a very accurate model describing the signal evolution down the fibre. However, aside from the second-order dispersion and Kerr nonlinearity present in (1), in realistic fibre-optic systems there are numerous other effects affecting the signal propagation. Optical noise, inevitably arising during the amplification process^{9}, is one of the key challenges in optical communications. The noise results in random NF spectrum disturbances^{32,33}, imposing limits on the NFDM transmission quality. Thus, in our current work, we analyse the capability of a neural network (NN) to denoise the NF spectra. Similarly to Ref.^{34}, in optical transmission applications the NFTNet that we consider in this work is supposed to be integrated into the receiver architecture: it takes in the corrupted signal and yields the “purified” nonlinear spectrum containing the modulated data. Another widespread deviation from the idealised model (1) is the non-zero non-uniform gain-loss profile occurring in realistic systems for both lumped^{18,22} and distributed^{19} amplification schemes. We also mention the effects of polarisation mode dispersion^{35,36}, higher-order chromatic dispersion^{35}, and component-induced impairments, to itemise just several important sources. All these effects bring about deviations of the true optical channel from the integrable NLSE (1), such that the NF spectrum of the signal at the end of our transmission system can be significantly distorted, which results in the appearance of errors in the transmitted data^{20,21,35}.
Given that, signal processing methods based on machine learning and artificial neural networks (NNs) have recently attracted much attention, as they can effectively render adaptive distortion-resilient signal processing tools; thus, using NNs we can mitigate the impact of the detrimental factors mentioned above^{37,38}.
The first direction in utilising NNs for NFDM systems consists in applying an additional NN-based processing unit at the receiver to compensate for the emerging line impairments and deviations from the ideal model^{39,40,41,42,43}. But, despite the ensuing transmission quality improvement, this type of NN usage brings about additional complexity of the receiver. In the other approach, the NFT operation at the receiver is entirely replaced by the NN element. It has been shown that this approach, indeed, results in a considerable improvement of the NFT-based transmission system functioning^{31,34,44}. But, despite the benefits rendered by such an NN utilisation, the NNs emulating the NFT operation have so far been mostly used in the NFDM systems operating with solitons only, and the NN structure used there was relatively simple. In the only work related to the continuous NF spectrum recovery^{45}, a standard “imageInputLayer” NN (developed originally for handwritten digit recognition) from the MATLAB 2019a deep learning toolbox was adapted to process signals of a special form. Such an approach, evidently, has limited applicability and flexibility and is optimal neither in terms of the quality of the result nor in the complexity of signal processing. In our current work, we demonstrate how this direction can be significantly extended and optimised, presenting and analysing the NN-based NFT modelling for the continuous NF spectrum, and using special optimisation tools for finding the best NN architecture. We believe that our current research can lay the basis for the development of high-efficiency channel-agnostic NFDM transmission systems. Moreover, in our study, we address the question of recovering not only the NF spectrum \(r(\xi )\), but also the b-coefficient, so it can be combined with the most efficient NFDM transmission method: the b-modulation.
Finally, we note that, recently, the interest in using the NFT as a signal-processing tool has risen in fields that are not directly relevant to optical transmission. In particular, the NFT was applied in the so-called integrable turbulence to monitor the appearance of coherent structures, such as breathers, solitons, and rogue waves^{46,47}, to the analysis of optical microresonator regimes^{48}, to the characterisation of optical frequency combs^{49}, and to the analysis of laser regimes and the emergence of dissipative coherent nonlinear structures^{50,51,52}. The analysis of the evolution of NFT modes for such systems often appears to be more informative and convenient than dealing with the conventional Fourier modes. The NFT is also an important tool for the design of fibre Bragg gratings^{53,54}. Thus, we believe that the technique presented in this work can have a much wider range of applications than simply being a processing tool in optical communications. Lastly, solving nonlinear differential equations by using NNs is itself a fast-growing area with a range of applications in science and engineering^{55,56,57}. We hope that our work will also advance knowledge in this emerging field.
Results
In this section, we describe the main results obtained in the process of finding a suitable NN architecture for computing the continuous NF spectrum of a given signal. First, we describe which type of signals we used in training and testing. Next, we discuss the application of Bayesian optimisation to find the best-performing NN architecture, and the respective training procedure. Then, we analyse the output accuracy for the proposed NN architecture and compare it with that produced by a deterministic NFT numerical algorithm. In this paper, for the data generation and “conventional” computation we use the Fast NFT (FNFT) library^{58}. At the end of the Results section, we show that the proposed NN architecture can predict not only the scattering coefficient \(r(\xi )\), but also the NF coefficient \(b(\xi )\), Eq. (8).
Training data generation
In this work, without loss of generality, we analyse the NF decomposition of signals having the wavelength division multiplexing (WDM) format with random modulation and return-to-zero carrier functions, considered in Refs.^{59,60}. In the time domain, one (normalised) WDM symbol to decompose is given as the sum of independent subcarriers, Eq. (2).
In Eq. (2), M is the number of WDM channels, \(\omega _k\) is the carrier frequency of the k-th channel, \(C_k\) corresponds to the digital data in the k-th channel, and T defines the symbol interval; f(t) is the carrier support waveform of our return-to-zero pulses. Q in (2) is the normalisation factor that we use to set the required energy for each signal (the total signal energy is calculated according to Eq. (3)). Each \(C_k\) in (2) is a complex number drawn from a constellation with a particular cardinality, i.e. it is chosen with equal probability from the finite set of allowed constellation points. For our NF decomposition analysis, each time we use a single signal of the form given in Eq. (2). To train the NN, we precomputed 94,035 such signals, with \(C_k\) for each carrier randomly drawn from the quadrature phase-shift keying (QPSK) constellation, i.e. a constellation with 4 possible points; the number of optical channels (carriers) in (2) is 15. Then we sampled our signal at equidistant points in time, \(t_m\), over the segment of length T, \(q(t_m)=q_m\): the number of sample points in each signal representation was \(2^{10}= 1024\). The normalised symbol interval T was set to unity, so that the time step size was \(\Delta t = 2^{-10}\) (for the explicit normalisations referring to single-mode fibre transmission see, e.g., Ref.^{3}). For each generated discretised profile, the reflection coefficient \(r(\xi )\) was computed at 1024 sample points in the \(\xi\) variable using the fast numerical NFT method^{58}. The parameter \(\xi\) in our computations ranged from \(-\pi / (4 \Delta t) \approx -804\) to \(\pi / (4 \Delta t) \approx 804\): this region corresponds to the conventional Fourier spectrum computational bandwidth for the given sampling step \(\Delta t\), up to the scaling factor 2 referring to the linear limit correspondence^{14}. Each signal in the dataset was eventually normalised so that its energy is \(E_{{{\text {signal}}}} = 39.0\).
Some of the signals in the initial dataset for this energy contained solitons, but such signals were singled out and removed from the training and validation datasets. The remaining 94,035 signals did not contain solitons, i.e. the discrete nonlinear spectrum of each of them is absent, and these are the signals used in our analysis. We note that although there are no solitons in the signals, we are still operating in the regime where the signal nonlinearity is not negligible, see Methods. A more straightforward way of generating datasets with the desired properties would be to use the inverse NFT routines, but these are much more time-consuming, so we decided to employ the data-generation approach described in this subsection: it also allows us to explicitly control the accuracy of the generation process.
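The generation of one training waveform can be sketched as follows. This is a minimal illustration, not the authors' generator: the carrier grid \(\omega_k = 2\pi k/T\), the raised-cosine return-to-zero envelope f(t), and the energy definition \(\Delta t \sum_m |q_m|^2\) are assumptions made for the example, and the function name `wdm_symbol` is hypothetical. Only the channel count (15), the QPSK alphabet, the 1024 samples, T = 1 and the target energy 39.0 come from the text.

```python
import numpy as np

def wdm_symbol(rng, n_channels=15, n_samples=1024, T=1.0, energy=39.0):
    """Generate one normalised WDM symbol with random QPSK data (illustrative)."""
    dt = T / n_samples
    t = -T / 2 + dt * np.arange(n_samples)
    # QPSK: four equiprobable points on the unit circle.
    qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    c = rng.choice(qpsk, size=n_channels)
    # Assumed carrier grid: channels spaced by 2*pi/T and centred on zero.
    k = np.arange(n_channels) - (n_channels - 1) / 2
    omega = 2 * np.pi * k / T
    # Assumed return-to-zero envelope: raised cosine vanishing at the symbol edges.
    f = 0.5 * (1 + np.cos(2 * np.pi * t / T))
    q = (c[:, None] * np.exp(1j * omega[:, None] * t[None, :])).sum(axis=0) * f
    # Normalisation factor Q chosen so that dt * sum |q_m|^2 equals the target energy.
    Q = np.sqrt(dt * np.sum(np.abs(q) ** 2) / energy)
    return t, q / Q

rng = np.random.default_rng(0)
t, q = wdm_symbol(rng)
dt = t[1] - t[0]
signal_energy = dt * np.sum(np.abs(q) ** 2)
```

A waveform produced this way would still have to be checked for solitons (for instance with a conventional NFT routine) before entering the dataset, as described above.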
Together with the set of deterministic signals, we generated signal sets with the addition of uncorrelated Gaussian noise, adding a random value to each sample point. In realistic applications, the source of this noise can be the instrumental imperfections of the transceiver or the effects relevant to in-line amplification^{9}. The signal-to-noise ratio (SNR) is a traditionally used characteristic for quantifying the level of noisy corruption, and is defined in Eq. (3).
In Eq. (3), \(E_{{{\text {signal}}}}\) and \(E_{{{\text {noise}}}}\) are the signal and noise energies, respectively; \(q_m\) is the m-th signal sample, N is the total number of sampling points, and \(\Delta t\) is the time sample size. For further training, in addition to the set without noise, which had 84,632 signals, we used 8 sets of 423,160 signals (5 different noise realisations per signal). Each set corresponds to one of the following SNR values: \(\{0, \, 5, \,10, \, 13, \, 17, \, 20, \, 25, \, 30\}\) dB. 9 sets of 9,403 signals with the corresponding noise levels were left to validate the network performance. Validation data sets were not used in the training process. We note that the NFT in optical communications is tailored for use in long-haul systems, meaning that high levels of noise (low SNR) are most interesting from the application perspective. However, we also include the results for high SNR levels to analyse the NN functioning peculiarities in detail.
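Corrupting a sampled signal at a prescribed SNR can be sketched as below. The example assumes the energy definition \(E = \Delta t \sum_m |q_m|^2\), the decibel form \(\mathrm{SNR} = 10\log_{10}(E_{\text{signal}}/E_{\text{noise}})\), and circular complex Gaussian noise; the function names are hypothetical.

```python
import numpy as np

def add_noise(q, target_snr_db, dt, rng):
    """Add uncorrelated complex Gaussian noise with an expected SNR of target_snr_db."""
    e_signal = dt * np.sum(np.abs(q) ** 2)
    e_noise = e_signal / 10 ** (target_snr_db / 10)
    # Per-sample variance of the complex noise, from E_noise = dt * N * sigma^2.
    sigma2 = e_noise / (dt * q.size)
    noise = np.sqrt(sigma2 / 2) * (
        rng.standard_normal(q.size) + 1j * rng.standard_normal(q.size)
    )
    return q + noise

def measured_snr_db(q_clean, q_noisy, dt):
    """SNR in dB computed from the clean signal and one noisy realisation."""
    e_signal = dt * np.sum(np.abs(q_clean) ** 2)
    e_noise = dt * np.sum(np.abs(q_noisy - q_clean) ** 2)
    return 10 * np.log10(e_signal / e_noise)

rng = np.random.default_rng(1)
q_clean = np.ones(1024, dtype=complex)
q_noisy = add_noise(q_clean, 10.0, 2.0 ** -10, rng)
snr_check = measured_snr_db(q_clean, q_noisy, 2.0 ** -10)
```

Because the noise is random, the SNR of a single realisation fluctuates slightly around the target; drawing several realisations per signal, as done for the training sets, averages over this.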
Neural network design and Bayesian optimisation
As mentioned above, the general NF spectrum attributed to a given localised waveform consists of two parts: the discrete spectrum, which we do not consider in our current study (our trial pulses do not contain any solitonic component, either in pure form or in the noisy case), and the continuous part, which is our subject here. The continuous part is retrieved by considering the special Jost solutions (7) to the Zakharov-Shabat problem (6), see Methods. The goal of our work is to demonstrate the fundamental possibility of replacing the direct calculation of the NF spectrum through the numerical solution of the Zakharov-Shabat problem (6) with computations employing specially-designed and trained NNs.
The latter task can be addressed using the encoder-decoder approach, where the encoder transforms the input signal into some intermediate vector representation and, later, the decoder converts this representation into the output signal. We notice that the input and output signals can belong to two different data domains. There are several advantages of this approach, e.g. it is quite flexible, so the encoder and decoder structures can differ to match exactly the “nature” of each signal’s domain. With this, we train such NNs in the end-to-end style, so the weights of the encoder and decoder are trained simultaneously and fit each other. Many highly efficient encoder-decoder architectures have been designed to date, some of which can demonstrate an efficiency higher than that of a human brain for specific tasks^{61}. For processing quite long sequences (typically more than 1000 data points), convolutional NNs (CNNs) are often more beneficial than recurrent NNs (RNNs). Also, a CNN allows us to parallelise the computations in an efficient way, which is important in our case. Thus, we argue that the encoder-decoder architectures based on CNNs are most suitable for our data and task, though other NN types may also deserve investigation in later studies.
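To make the "direct calculation" concrete, the sketch below integrates the focusing Zakharov-Shabat system with the classical piecewise-constant transfer-matrix scheme. It is not the fast algorithm of the FNFT library used in this work; the sign convention \(\Psi_t = [[-i\xi, q], [-q^*, i\xi]]\,\Psi\), the centring of the time window at t = 0, and the function name are assumptions, and the conventions of Eqs. (6)-(7) in Methods may differ.

```python
import numpy as np

def continuous_spectrum(q, dt, xi):
    """Piecewise-constant transfer-matrix sketch for a(xi), b(xi) on real xi."""
    xi = np.asarray(xi, dtype=complex)
    n = q.size
    t_start = -n * dt / 2   # assumed: signal support centred at t = 0
    t_end = n * dt / 2
    # Jost solution at the left edge: phi = (exp(-i xi t), 0).
    phi1 = np.exp(-1j * xi * t_start)
    phi2 = np.zeros_like(xi)
    for qm in q:
        k = np.sqrt(xi ** 2 + abs(qm) ** 2)
        c = np.cos(k * dt)
        s = dt * np.sinc(k * dt / np.pi)   # equals sin(k dt) / k, safe at k = 0
        # exp(P dt) for P = [[-i xi, q], [-conj(q), i xi]], using P^2 = -k^2 I.
        new1 = (c - 1j * xi * s) * phi1 + qm * s * phi2
        new2 = -np.conj(qm) * s * phi1 + (c + 1j * xi * s) * phi2
        phi1, phi2 = new1, new2
    a = phi1 * np.exp(1j * xi * t_end)
    b = phi2 * np.exp(-1j * xi * t_end)
    return a, b

# Rectangular pulse of unit amplitude and unit duration as a check case.
q_rect = np.ones(64, dtype=complex)
xi_grid = np.linspace(-5.0, 5.0, 11)
a, b = continuous_spectrum(q_rect, 1.0 / 64, xi_grid)
r = b / a
a0, b0 = continuous_spectrum(q_rect, 1.0 / 64, np.array([0.0]))
```

For a rectangular pulse the scheme is exact, which gives a convenient sanity check: with this convention \(b(0) = -\sin(AL)\) for amplitude A and duration L, and \(|a|^2 + |b|^2 = 1\) on the real axis.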
As a starting point, we took a WaveNet-based^{62} network, which extends the concept of deep CNNs. Models of this type have several advantages, among which we underline the reduction of the time required for training the network on long data sequences. However, a significant drawback of this architecture is the requirement to embed a large number of convolutional layers to increase the receptive field. In our work, to increase the effective size of that region, we used convolutions with dilation. This made it possible to increase the receptive field exponentially with the NN depth and, therefore, to capture a larger number of data points in the input signal.
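The effect of dilation on the receptive field can be checked with a few lines of arithmetic. The sketch below uses kernel size 2 and dilations doubling with depth, which is the WaveNet-style pattern chosen here purely for illustration; it is not the NFTNet configuration.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of a stack of 1-D convolutions.

    Each layer is given as (kernel_size, dilation, stride).
    """
    rf, jump = 1, 1
    for kernel, dilation, stride in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride   # spacing of neighbouring outputs in input samples
    return rf

plain = [(2, 1, 1)] * 8                        # eight undilated layers
dilated = [(2, 2 ** i, 1) for i in range(8)]   # dilation doubles with depth

rf_plain = receptive_field(plain)      # grows linearly with depth
rf_dilated = receptive_field(dilated)  # grows exponentially with depth
```

With the same eight layers and the same number of weights, the undilated stack sees 9 input samples while the dilated one sees 256.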
A crucial issue in using NNs to perform any nonlinear transformation is the choice of the optimal network architecture. One of the optimisation methods is to enumerate the possible combinations of NN parameters. But even in the case of a relatively small number of layers, the number of hyperparameters can reach several thousand, which makes the optimisation process very time-consuming, if realisable at all. Thus, the search for an optimisation algorithm for such computationally expensive problems can be extremely difficult. However, the Bayesian optimisation method^{63} is deemed to be one of the most efficient optimisation strategies, and so we employ it in this work to find the optimal hyperparameters for the NFTNet.
Bayesian optimisation builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set^{63,64}. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimisation aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. Thus, it tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected to bring us close to the optimum). An important aspect to note is that Bayesian optimisation often does not return one specific point in the parameter hyperspace for which the optimised function is minimal. The process converges to some subspace of parameters, where several points can locally minimise the function^{63}. A detailed description of hyperparameter tuning can be found in Ref.^{65}, where Bayesian optimisation is used to adapt parameters for the synthesis of a digital predistortion filter for optical transmitters.
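The explore-exploit loop described above can be illustrated on a one-dimensional toy problem. The sketch assumes a Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement acquisition function, which is one common choice; the text does not specify which surrogate or acquisition was used for NFTNet. `toy_loss` is a hypothetical stand-in for "train a network with these hyperparameters and return the validation MSE".

```python
import math
import numpy as np

def rbf(x1, x2, length=0.15):
    """Squared-exponential kernel between two 1-D point sets."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(x_obs.size)
    Ks = rbf(x_obs, x_new)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """EI for minimisation: rewards low predicted loss and high uncertainty."""
    z = (best - mean) / std
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_loss(x):
    # Hypothetical objective with several local minima.
    return (x - 0.7) ** 2 + 0.05 * np.sin(15 * x)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)       # candidate hyperparameter values
x_obs = rng.uniform(0.0, 1.0, size=3)   # initial random evaluations
y_obs = toy_loss(x_obs)
for _ in range(15):
    # Centre the observations so that a zero-mean GP is a reasonable surrogate.
    y_centred = y_obs - y_obs.mean()
    mean, std = gp_posterior(x_obs, y_centred, grid)
    ei = expected_improvement(mean, std, y_centred.min())
    x_next = grid[np.argmax(ei)]        # most promising candidate
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_loss(x_next))
best_x = x_obs[np.argmin(y_obs)]
```

In the architecture search each evaluation of the objective is a full training run, which is why an acquisition rule that spends evaluations sparingly matters far more than it does in this toy setting.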
We manipulate the following hyperparameters for the convolutional part of the neural network: the number of convolutional layers, the number of filters, the kernel size, stride, dilation, and the activation function for each layer. We used the activation functions “ReLU”, “tanh” and “sigmoid” in the hyperparameter optimisation. After the convolutional part, there are 2 fully connected layers, the second of which has a fixed size (1024, which corresponds to the size of the output vector). The size and activation function of the first fully connected layer were also hyperparameters for the optimisation. For the optimisation, we used a dataset without additional noise and employed only the real part of the continuous spectrum for the prediction. After that, the “optimal” architecture (but not the weights) is fixed, and is no longer changed when predicting the imaginary part of the continuous spectrum or when operating with the datasets with additional noise. The loss function was optimised for each architecture. We used the mean squared error (MSE) as the loss function, aiming to minimise the MSE between the network output and the target output computed with the conventional NFT method^{58}. In training, we employed the Adam (Adaptive Moment Estimation) optimisation algorithm^{66} with a learning rate of \(10^{-4}\). The learning process for each point in the parameter hyperspace was stopped if the value of the loss function did not decrease over 5000 epochs. We chose this large stopping-criterion number of epochs to neutralise the factor of randomness in the learning process, which appears due to the random choice of the initial weights. Additionally, we checked the value of the loss function on the validation set to prevent overfitting, but for the amount of training data used, overfitting was not observed. Figure 2 presents the dependence of the MSE value, and of its running minimum, on the Bayesian iteration number.
For architectures with more than 20 million training parameters, we set the value of the loss function to 1.0: this explains the upper cut-off limit in the figure. It is apparent from Fig. 2 that the optimisation has identified a subspace where many architectures have approximately the same value of the loss function at the level of \(10^{-5}\). However, there was a point where the value was at the level of \(10^{-7}\). Thus, we took this point (a set of hyperparameters) as the optimal one. After finding the optimal architecture, the NN weights were trained separately for each SNR, keeping the same optimal architecture parameters. On average, with the amount of data used, our learning process took 50,000 epochs to reach the minimum for each noise level.
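The two rules that shape each objective evaluation, the patience-based stopping criterion and the fixed loss assigned to oversized architectures, can be sketched as follows. The function names are hypothetical, and returning 1.0 without training the oversized network at all is an assumption; the text only states that the loss value was set to 1.0.

```python
def train_with_patience(loss_per_epoch, patience=5000):
    """Return (best_loss, epochs_run); stop after `patience` epochs without improvement."""
    best, since_best, epochs = float("inf"), 0, 0
    for loss in loss_per_epoch:
        epochs += 1
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best, epochs

def capped_objective(n_parameters, train_fn, max_parameters=20_000_000):
    """Objective seen by the optimiser: oversized architectures get a fixed loss of 1.0."""
    if n_parameters > max_parameters:
        return 1.0   # assumed: no training is run for such architectures
    return train_fn()

# Small patience used here only to keep the illustration short.
losses = [1.0, 0.5, 0.4] + [0.4] * 10
result = train_with_patience(iter(losses), patience=3)
too_big = capped_objective(25_000_000, lambda: 0.1)
acceptable = capped_objective(1_000, lambda: 0.1)
```

A large patience makes a single unlucky weight initialisation less likely to end a run prematurely, at the price of longer evaluations.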
The original signal and the NF data for the continuous spectrum are complex-valued functions. Therefore, two networks with the same architecture are used for the whole transformation; each identical part is responsible for the computation of either the real or the imaginary part of the resulting arrays, which contain the values of the continuous NF spectrum defined in Eq. (10). Figure 3 depicts the schematic of the entire optimised NFTNet architecture. The convolutional part consists of three layers with 10, 15 and 10 filters. The kernel sizes of the first and third convolutional layers are 10, and for the layer between them it is 18. As noted above, we took the dilation value for each layer as one of the sought hyperparameters. For NFTNet, the optimisation gave that the first two layers have dilation 2, stride 1 and the “tanh” activation function, and for the third layer the dilation is 1, with stride 3 and the “ReLU” activation. After the CNN part we put the flattening layer, not shown in the figure (but affecting the processing complexity), and two fully-connected layers with 4096 and 1024 neurons. An example of how the designed NN works on one signal is given in Fig. 4c. In this figure, we show the results of the NN-based NF spectrum computation for the noiseless case. Already from this figure, we can notice that the result produced by our NN and that obtained from the conventional NFT routine^{58} are very similar.
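The layer sizes quoted above can be turned into a rough count of feature-map lengths and trainable parameters. The sketch assumes unpadded ("valid") convolutions and a two-channel input carrying the real and imaginary parts of the 1024-sample signal; neither detail is stated in the text, so the resulting numbers are indicative only.

```python
def conv1d_out_len(n_in, kernel, dilation, stride):
    """Output length of an unpadded ("valid") 1-D convolution."""
    span = dilation * (kernel - 1) + 1
    return (n_in - span) // stride + 1

# (filters, kernel size, dilation, stride) of the three convolutional layers
conv_layers = [(10, 10, 2, 1), (15, 18, 2, 1), (10, 10, 1, 3)]
dense_layers = [4096, 1024]

length, channels = 1024, 2   # assumption: real and imaginary parts as two channels
n_params = 0
for filters, kernel, dilation, stride in conv_layers:
    n_params += channels * filters * kernel + filters   # weights + biases
    length = conv1d_out_len(length, kernel, dilation, stride)
    channels = filters
flat_size = length * channels   # size after the flattening layer
features = flat_size
for units in dense_layers:
    n_params += features * units + units
    features = units
```

Under these assumptions the flattened vector has 3210 entries and the network about 17.4 million parameters, nearly all of them in the first fully-connected layer, which is consistent with the 20-million cap used during the architecture search.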
Studying the NFTNet performance for computing NF spectra of noisy signals
In this section we analyse the NFTNet performance and the denoising property of the NN. We compare the deviations of the nonlinear spectrum calculated with the NFTNet from that calculated with the conventional NFT applied to the same signal without noise. To compare the performance of the NFTNet with that of conventional algorithms applied to noisy signals, we use the metric defined in Eq. (4).
In Eq. (4), S is the total number of signals in the validation set, \(\langle \cdot \rangle _{\xi }\) denotes the mean over the spectral interval, and \(\{r_{{{\text {predicted}}}}(\xi )\}_{i}\) and \(\{r_{{{\text {actual}}}}(\xi )\}_{i}\) correspond to the value of the reflection coefficient \(r(\xi )\) computed for the signal number i at point \(\xi\) (we compare the quantities for the validation data set). The label “predicted” refers to the result produced by the NFTNet on the noisy signal, and “actual” marks the \(r(\xi )\) value obtained using the conventional NFT algorithm^{58} for the noiseless signal. The relative error \(\eta (\xi )\) is determined at the point \(\xi\), so we use \(\langle \eta (\xi ) \rangle _{\xi }\) to estimate the overall mean of the error for one signal, and use Eq. (4) to evaluate the error for the entire validation dataset. We stress that the metric was chosen in such a way as to take into account even the regions where the value of the spectrum is much less than one.
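One way to realise a metric with these properties is sketched below. The normalisation is an assumption: the pointwise deviation is divided by the mean magnitude of the actual spectrum of the same signal rather than by \(|r_{\text{actual}}(\xi)|\) itself, so that regions where the spectrum is close to zero stay well defined. The exact form of Eq. (4) may differ, and the function name is hypothetical.

```python
import numpy as np

def relative_error(r_predicted, r_actual):
    """Mean relative error over a validation set.

    Both arrays have shape (S, N): S signals, N spectral points.
    """
    # Per-signal scale: mean magnitude of the actual spectrum over xi.
    scale = np.mean(np.abs(r_actual), axis=1, keepdims=True)
    # Pointwise relative error eta(xi) for every signal.
    eta = np.abs(r_predicted - r_actual) / scale
    # Mean over xi, then over the signals (equal weights, so one call suffices).
    return np.mean(eta)

r_true = np.ones((2, 4), dtype=complex)
err_10_percent = relative_error(1.1 * r_true, r_true)
err_zero = relative_error(r_true, r_true)
```

Multiplying the returned value by 100 gives the error in per cent.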
The results of our comparison for the \(r(\xi )\) computation using different SNR levels for NFTNet are presented in Table 1, and are arranged as follows. The first column of the table identifies the SNR value in dB for the validation signals, i.e. the level of noise for the signals which we analyse. The first row of the table displays the SNR values of the noisy signals from the training set, i.e. it shows the noise level of the signals on which the NFTNet was trained. We notice that the case \({{\text {SNR}}}=30\) dB corresponds to almost negligible noise, while for \({{\text {SNR}}}=0\) dB the noise energy is equal to that of our signal, which signifies a very intense noisy corruption. Thus, each column in the table corresponds to the results produced by the NN trained on the signals with the chosen level of noisy corruption. The number in each cell shows the averaged metric value, Eq. (4), where for the computation of \(\{r_{{{\text {predicted}}}}(\xi )\}_{i}\) we used the NFTNet trained on the signals with the SNR value shown in the first row and applied to the validation signals having the SNR value given in the first column. The “Conv. NFT” column shows the error value for the numerical result of the fast NFT method on the signals with added noise, where the respective SNR is presented, again, in the first column. The value of metric (4) corresponding to the conventional NFT method applied to noiseless signals is, obviously, zero: the results provided by the conventional NFT without noise are taken as the true ones. When the NFTNet produces a less accurate result compared to the conventional NFT applied to the noisy signal, the cell is marked with bold; when the NFTNet outperforms the conventional NFT method, i.e. it successfully purifies our signal from noise, the respective cell is not highlighted (white). Hence, the size of the white area in each table demonstrates how well the NFTNet retrieves the nonlinear spectrum for noise-corrupted signals.
Table 1 shows the error values for the restoration of the \(r(\xi )\) coefficient (10) of noiseless and noise-perturbed signals (2) by the NFTNet architecture given in Fig. 3. The first row in the table corresponds to the noiseless case. It is always marked with bold, which means that the NN cannot provide any better results than the benchmark ones rendered by the conventional fast NFT method used to generate the training data.
However, the values of the error for noise-corrupted signals reveal interesting tendencies. It follows from the table that for the low training noise level (up to 10 dB, columns three through nine), the NFTNet error is typically lowest for the noiseless validation dataset (second row). Thus, the addition of low noise to the training dataset only degrades the NFTNet restoration capability, even though this decrease is not significant. This NFTNet feature can be interpreted as the NN being “confused” by the weak noise during training when identifying the nonlinear transformation. For the most interesting case of high noise, the network works best for samples where the SNR value is the same for the validation and training sets. In such cases, the relative error is about 8–12%, while the error for the conventional NFT is at the level of 100–200%. Another fact is that with decreasing noise (rows from bottom to top) in the validation set, the error value remains at approximately the same level after the cell corresponding to the same training and validation noise values. These results confirm that the presented NN architecture is capable of performing the desired nonlinear transformation, the NFT, and, in addition, it can also work as an effective denoising element when the noise level becomes non-negligible.
Examples of original and noise-corrupted signals and the corresponding nonlinear continuous spectra are given in Fig. 4, where we used the NFTNet for the computations. Figures 4b and d show that when the additional noise distorts the signal, the conventional numerical algorithms naturally produce noise-distorted nonlinear spectra. Figures 4e and f show the relative error value \(\eta (\xi )\) (4) for the continuous spectrum prediction with NFTNet for the signal without noise (left) and the signal with noise (right), and the reflection coefficient computed for the original signal by the conventional NFT (marked as “Conv. NFT” in the panels’ legends). In Fig. 4c, e, the NFTNet is trained on the dataset without added noise, and in Fig. 4d, f, the NFTNet is trained on the dataset with additional noise for SNR = 5 dB. Figures 4c and d show that in the presence of noise, the fast NFT results begin to deviate noticeably from the original (noiseless) values, while the NFTNet tends to denoise the resulting nonlinear spectrum.
NFTNet performance for the restoration of NF coefficient \(b(\xi )\) attributed to noisy signals
In addition to the coefficient r, from the optical communications perspective it is instructive and important to check how the proposed architecture works for predicting the NF coefficient b, Eq. (8). We note that the optical transmission method coined b-modulation^{24,26,49}, where we operate with the modulation of the b-coefficient, has proven to be the most efficacious technique among the different NFDM methods proposed^{12,28}. Moreover, for the practical case when our signal has a finite extent, the continuous part of the NF spectrum can be completely described by the b-coefficient only, because the second NF coefficient \(a(\xi )\) can be calculated from the \(b(\xi )\) profile, see Eq. (11) in Methods. Our goal here is to demonstrate that the same NFTNet structure can be used for both \(r(\xi )\) and \(b(\xi )\) computation, when the NN is trained on the respective dataset. As the loss function, we now use the MSE built on the b-coefficient samples, and the MSE is also used as our quality metric in the respective tables:
The notation is the same as in Eq. (4): the labels “predicted” and “actual” correspond, respectively, to the result of the NFTNet applied to noisy signals and the result produced by the conventional NFT routine applied to noiseless signals.
We analysed the NFTNet performance for the restoration of the b-coefficient using the same approach as in the previous subsection for \(r(\xi )\). Our results for noisy pulses with different noise levels are summarised in Table 2. We checked that the NFTNet configurations, when applied to the computation and denoising of \(b(\xi )\), revealed the same tendencies for the quality of restoration as we observed in the previous subsection devoted to the reflection coefficient \(r(\xi )\).
The situation observed for the coefficient \(r(\xi )\) largely persists in this case. The error is minimal for a noiseless validation set; however, this trend now continues for high noise levels as well. A similar tendency is observed throughout the results: the values above the diagonal vary only slightly. The additional observations for the b-coefficient are as follows. An interesting difference from the case of \(r(\xi )\) is that the metric value (5) in the case of predicting \(b(\xi )\) is lower, and the bold region in Table 1 is larger compared to what we see in Table 2 for the b-coefficient. From the results, it is clear that the prediction accuracy is higher for the b-coefficient, i.e. our NN generally works more accurately for the restoration of the coefficient \(b(\xi )\) than for \(r(\xi )\). This result can be expected, as the noise-perturbed \(r(\xi )\) contains the noisy contributions from both \(a(\xi )\) and \(b(\xi )\), while the b-coefficient involves only its own noisy contribution, and thus gets corrupted less. So in the latter case, the NN has to remove less noise.
Figure 5 summarizes the above and shows the calculation errors (4) and (5) for the NFTNet architecture. The plot visualises the values and tendencies from Tables 1 and 2. For both the \(r(\xi )\) and \(b(\xi )\) coefficients, the NN outperforms the fast NFT results when the NFTNet is trained on data with additional noise.
Discussion
Our goal in this work was to demonstrate that the NN can be successfully used for performing the NFT operation, in particular, for computing the profile of the continuous NF spectrum. Note that our interest was not only in the computation of the continuous NF spectrum, i.e. the nonlinear transformation, but also in the possibility of denoising the signal using NNs. We started with the WaveNet-type architecture^{62}, which is effectively a deep CNN, and applied Bayesian optimization^{67} to find the optimal set of hyperparameters. Initially, we set the task of optimizing the entire architecture, so the hyperparameters were not only the parameters of the layers but also their number.
Once again, we emphasize that Bayesian optimization does not always give the “best” set of parameters. It provides a subspace of hyperparameters in which neural networks are best trained on the available dataset. Because neural networks are universal approximators, any sufficiently complex architecture can be trained for a specific task, so the optimization process may keep converging towards ever more complex networks. However, this is not suitable for our task, where we want to minimize the complexity of the network while improving its accuracy. Therefore, we simultaneously limited the number of trainable parameters in the NN during optimization. In our case, during the optimization process, we found an architecture that gives the best metric value (4), and we chose it as the desired architecture. With further optimization the process could converge to another subspace of hyperparameters, but we keep the point with the minimum value of the loss function.
We found that this NN, indeed, can perform the NFT operation and denoise the received NF spectrum: the denoising effect is pronounced at medium to high noise levels. To achieve this effect, several realisations of the noise are needed for the neural network to “understand” the influence of noise on the signal. As expected, denoising is typically best when the training and testing data noise levels coincide, though we observed some deviations from this rule for lower noise levels, where the quality of restoration of the NF spectrum also makes a noticeable contribution to the overall error value. When trained on different noise levels, the NFTNet was still able to denoise, thus demonstrating the design’s flexibility. We have shown that conventional NFT calculation methods give “distorted” results when working with added noise. In fact, the “distorted” results are correct from the nonlinear transformation point of view. From the application’s perspective, however, we are almost always interested in denoised signals, to reduce the embedded data corruption level. Here we note that the exemplary signals that we used for the NFTNet training/testing, Eq. (2), are evidently different from those used in r- or b-modulated NFDM systems. Moreover, the latter are subject to dispersive effects, as the NN has to process them at the receiver side after they have propagated over some distance. To adapt the NFTNet to different signals, two strategies can be used. The first one is straightforward: we retrain the NN from scratch using a different dataset. The second strategy makes use of the pre-trained NFTNet model and utilises domain randomisation and adaptation^{68,69}.
We believe that after the retraining procedure, the NFTNet (or some of its modifications, if we find that the capacity of the proposed NN architecture is insufficient to account for some complicated real-world effects) should be capable of accounting for the spurious soliton emergence and the involved noise properties arising in realistic optical transmission systems.
Finally, we note that the problem of recovering a few solitons from a given pulse using NNs has been studied in Refs.^{31,34,44,70}. However, the NN architectures used in those studies are much simpler, as one has to identify and filter only a few solitonic parameters, while in our work we recovered 1024 complex numbers representing the continuous NF spectrum. A larger number of solitary modes was considered in Ref.^{71}, where, however, only the total number of solitons in the pulse was studied. Potentially, it is interesting to combine the NN developed in our work with an additional module that can deal with the restoration of soliton parameters: such a hybrid tool would be able to perform the complete NFT decomposition of an arbitrary decaying pulse.
To sum up, we investigated the modelling of the NFT operation associated with the focusing NLSE, using an NN with a special structure, which we coined the NFTNet. We considered here an almost unexplored case dealing with the computation of the continuous part of the NF spectrum. It was demonstrated that the WaveNet-type NFTNet structure can satisfactorily perform the task of the NF spectrum computation, and the best-performing architecture was obtained by Bayesian hyperparameter optimisation. Moreover, we showed that the same NFTNet structure can be used to efficaciously retrieve both the reflection coefficient \(r(\xi )\) and the scattering coefficient \(b(\xi )\). The most practically important feature of the developed NN-based method is its capability to perform signal denoising. We demonstrated that the NN-based processing can bring about essential improvements in the quality of NF spectrum restoration attributed to noise-perturbed time-domain profiles, compared to the conventional high-accuracy NFT processing method. The advantage in denoising becomes most pronounced at high noise levels, with the maximum restoration quality typically occurring when the SNR of the training data is the same as that of the validation dataset.
Methods
Forward NFT operation for focusing NLSE
The NF spectrum associated with a given pulse q(t) (we drop the dependence of our quantities on z for simplicity) having a finite \(L_1\) norm is calculated using the solutions of the so-called Zakharov-Shabat spectral problem^{2,3,4,6}. The latter is represented by a set of coupled ordinary differential equations written for two auxiliary functions \(v_{1,2}\). The signal to be decomposed, q(t), enters this set as an effective potential. We write down the Zakharov-Shabat problem (the focusing NLSE case) as^{4}:
In Eq. (6), \(\xi\) is the (generally complex-valued) spectral parameter, which plays the role of the conventional Fourier frequency for integrable nonlinear PDEs. The overbar in Eq. (6) and below denotes complex conjugation of the corresponding quantities. To determine the NF spectrum associated with our profile q(t), we need to find the special solution \(\Phi (t,\xi )\) of Eq. (6), called the Jost function, by imposing the special asymptotic condition at the trailing end of the pulse:
The NF pulse decomposition consists in finding the continuous and discrete components of the NF spectrum associated with the localised signal q(t). The core part of NFT is the calculation of scattering coefficients, \(a(\xi ) \in {\mathbb {C}}\) and \(b(\xi ) \in {\mathbb {C}}\), defined through the Jost solution \(\Phi (t,\xi )\) as follows
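where \(\xi \in {\mathbb {R}}\). The scattering coefficients for the focusing NLSE satisfy:
\[ |a(\xi )|^2 + |b(\xi )|^2 = 1. \qquad (9) \]
The continuous part of the NF spectrum is generally defined by the ratio of the quantities b and a from (8):
\[ r(\xi ) = \frac{b(\xi )}{a(\xi )}, \qquad (10) \]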
where \(r(\xi )\) is often referred to as the reflection coefficient. \(r(\xi )\) plays the role of the ordinary Fourier spectrum for nonlinear integrable PDEs and converges to the FT of our signal in the low-power (linear) limit; see more direct expressions below.
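For concreteness, the chain of definitions above (spectral problem, Jost solution, scattering coefficients) can be written out in one commonly used convention. The signs and normalisation below are an assumption of this sketch, chosen so that the leading term of \(b(\xi )\), and hence of \(r(\xi )\), is the conjugated FT of the signal with the frequency scaled by a factor of 2; other sources use equivalent but differently signed forms.

```latex
\[
\frac{d v_1}{dt} = q(t)\, v_2 - i \xi\, v_1, \qquad
\frac{d v_2}{dt} = -\bar{q}(t)\, v_1 + i \xi\, v_2,
\]
\[
\Phi(t,\xi) = \begin{pmatrix} \phi_1(t,\xi) \\ \phi_2(t,\xi) \end{pmatrix}
\to \begin{pmatrix} e^{-i \xi t} \\ 0 \end{pmatrix}
\quad \text{as } t \to -\infty,
\]
\[
a(\xi) = \lim_{t \to +\infty} \phi_1(t,\xi)\, e^{i \xi t}, \qquad
b(\xi) = \lim_{t \to +\infty} \phi_2(t,\xi)\, e^{-i \xi t}.
\]
```

In this convention \(\Phi\) decays as \(t \to -\infty\) for \(\mathrm{Im}\, \xi > 0\), so \(a(\xi )\) extends analytically to the upper half-plane of \(\xi\).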
The discrete part of the NF spectrum (the solitonic degrees of freedom) consists of the set of complex-valued pairs \(\{ \xi _n, c_n\}\), where n enumerates the soliton modes, and each \(\xi _n\) is a (non-degenerate) solution of the equation \(a(\xi )=0\) lying in the upper complex half-plane of \(\xi\). The second quantities, the so-called norming constants \(c_n\), are given (for a sufficiently localised signal^{72}) by \(c_n = c(\xi _n) = b(\xi _n)/a'(\xi _n)\), with the prime denoting the derivative with respect to \(\xi\). The value of \(\xi _n\) determines the amplitude and frequency of each solitonic component, while \(c_n\) defines the phase and the “centre-of-mass” position of a solitary mode. However, the discrete part of the NF spectrum is not addressed in our study; see Refs.^{31,34,44}, where the solitonic parameters are computed using NNs.
More exact mathematical details regarding the NF spectrum definition and properties can be found in, e.g., monograph^{6}, see also Ref.^{72} for a brief mathematical review.
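To make the definitions of \(a(\xi )\) and \(b(\xi )\) concrete, the following minimal sketch integrates a Zakharov-Shabat system for a sampled pulse by treating q(t) as piecewise constant and multiplying the exact 2×2 transfer matrices of the individual steps. It is a simple second-order scheme, not the ES4 scheme from the FNFT package used in this work. The sign convention (\(v_1' = -i\xi v_1 + q v_2\), \(v_2' = -\bar{q} v_1 + i\xi v_2\), Jost condition \((e^{-i\xi t}, 0)\) at \(t \to -\infty\)) and the function name `zs_scattering` are assumptions made for illustration.

```python
import numpy as np

def zs_scattering(q, t, xi):
    """Scattering coefficients a(xi), b(xi) for real xi (assumed ZS convention).

    q  : complex samples of the pulse on the uniform grid t (cell midpoints)
    xi : array of real spectral parameters
    """
    q = np.asarray(q, dtype=complex)
    t = np.asarray(t, dtype=float)
    xi = np.asarray(xi, dtype=float)
    dt = t[1] - t[0]
    t_left, t_right = t[0] - dt / 2, t[-1] + dt / 2

    # Jost boundary condition at the left edge of the window.
    v1 = np.exp(-1j * xi * t_left)
    v2 = np.zeros_like(v1)

    for qn in q:
        # For constant q the system matrix P obeys P^2 = -k^2 I, so
        # exp(P dt) = cos(k dt) I + sin(k dt)/k P.
        k = np.sqrt(xi ** 2 + np.abs(qn) ** 2)
        c = np.cos(k * dt)
        s = dt * np.sinc(k * dt / np.pi)  # sin(k dt)/k, finite at k = 0
        v1, v2 = ((c - 1j * xi * s) * v1 + qn * s * v2,
                  -np.conj(qn) * s * v1 + (c + 1j * xi * s) * v2)

    a = v1 * np.exp(1j * xi * t_right)
    b = v2 * np.exp(-1j * xi * t_right)
    return a, b
```

Each step matrix is unitary for real \(\xi\), so the scheme preserves \(|a|^2 + |b|^2 = 1\) up to round-off. For the test profile \(q(t) = 0.4\,\mathrm{sech}(t)\), the computed \(|b(\xi )|\) can be compared with the known closed form \(\sin (0.4\pi )/\cosh (\pi \xi )\).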
NF spectrum associated with finite-extent signals
In practical applications, we do not typically deal with signals defined on the whole infinite t-axis, but rather operate with truncated waveforms, meaning that q(t) is nonzero only inside a finite interval of t. In this case, the NF spectrum of the signal is completely characterised by the coefficient \(b(\xi )\) from (8), which becomes band-limited, appended with the finite discrete set of solitonic parameters \(\{\xi _n,c_n\}\)^{25,26}. When, in addition, the discrete NF spectrum is absent, as it is in the case considered, the whole NF spectrum can be defined using just the \(b(\xi )\) profile^{24}, while the coefficient \(a(\xi )\) can be expressed through \(b(\xi )\) in the following way:
where the integral in the exponent is understood in the principal value sense. So, in practice, instead of \(r(\xi )\) (10), it is sufficient to compute the b-coefficient, and then find \(a(\xi )\) using Eq. (11). If needed, we can then use both computed quantities to find the reflection coefficient (10). In practice, the b-coefficient is preferable, since when calculating \(r(\xi )\), the numerical error increases greatly wherever \(a(\xi )\) is close to zero. We note that within the b-modulation concept, which has turned out to be the most efficacious NFDM method developed so far, we utilise the \(b(\xi )\) functions as information carriers^{24,25,26,27}.
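The recovery of \(a(\xi )\) from \(b(\xi )\) can be sketched numerically as follows. The sketch assumes that no solitons are present, that \(a(\xi )\) is analytic and zero-free in the upper half-plane with \(a \to 1\) at infinity, and therefore that \(\ln a = \ln |a| + i\,{\mathcal {H}}[\ln |a|]\) with \(\ln |a| = \tfrac{1}{2}\ln (1-|b|^2)\), where \({\mathcal {H}}\) is the Hilbert transform (a principal value integral). The FFT-based Hilbert transform below treats the grid as periodic, so the \(\xi\) window must be wide enough for \(|b|\) to be negligible at its edges; the function names are hypothetical.

```python
import numpy as np

def analytic_signal(u):
    """Return u + i*H[u] via the FFT, keeping only non-negative frequencies."""
    u = np.asarray(u, dtype=float)
    n = u.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(u) * weights)

def a_from_b(b):
    """Recover a(xi) on a uniform real grid from |b(xi)| (no solitons assumed)."""
    log_abs_a = 0.5 * np.log1p(-np.abs(b) ** 2)  # ln|a| = (1/2) ln(1 - |b|^2)
    return np.exp(analytic_signal(log_abs_a))
```

Only the modulus of \(b\) enters, so the same routine applies to noisy estimates of \(b(\xi )\) as long as \(|b| < 1\) on the grid.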
NF spectrum for the weakly nonlinear case and threshold for soliton nucleation
Let us assume that the amplitude of our signal is small, say \(q(t) \sim \varepsilon\), with \(\varepsilon \ll 1\). Then, we can derive the following expansions for the NF scattering coefficients^{14}:
up to \(\varepsilon ^2\) (the next expansion term \(\sim \varepsilon ^4\)), and
up to \(\varepsilon ^3\) (the next expansion term \(\sim \varepsilon ^5\)). With the accuracy up to \(\varepsilon ^4\), we have for the reflection coefficient:
So we see that the first (linear) term in the \(r(\xi )\) expansion is simply the conjugated FT of our signal, up to the frequency scaling factor. Then, \(r(\xi )\) from Eq. (14) differs from the expression for \(b(\xi )\), Eq. (13), only by terms \(\sim \varepsilon ^3\) and higher, but the structure of both expressions is the same, and so the NFTNet with the same structure can successfully recover both \(r(\xi )\) and \(b(\xi )\) if we explicitly train it for the recognition of the corresponding quantity. We believe that this also holds for any level of nonlinearity, maybe aside from the case when we are close to the soliton creation threshold and \(r(\xi )\) displays sharp peaks (see Ref.^{14}, Fig. 2). But, in such a special scenario, it appears more efficient to use the NN to recover the \(a(\xi )\) and \(b(\xi )\) profiles, as these do not typically display any singular behaviour.
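The leading terms of these expansions can be obtained by iterating the spectral problem once. The worked form below assumes the convention \(v_1' = -i\xi v_1 + q v_2\), \(v_2' = -\bar{q} v_1 + i\xi v_2\) with the Jost condition \((e^{-i\xi t}, 0)\) at \(t \to -\infty\); in a different convention the signs and the factor in the exponent change accordingly.

```latex
\[
b(\xi) = -\int_{-\infty}^{\infty} \bar{q}(t)\, e^{-2 i \xi t}\, dt + O(\varepsilon^3),
\]
\[
a(\xi) = 1 - \int_{-\infty}^{\infty} dt_1\, q(t_1)\, e^{2 i \xi t_1}
\int_{-\infty}^{t_1} dt_2\, \bar{q}(t_2)\, e^{-2 i \xi t_2} + O(\varepsilon^4),
\]
\[
r(\xi) = \frac{b(\xi)}{a(\xi)}
= -\int_{-\infty}^{\infty} \bar{q}(t)\, e^{-2 i \xi t}\, dt + O(\varepsilon^3).
\]
```

Since \(a = 1 + O(\varepsilon ^2)\), dividing by \(a\) changes \(b\) only at order \(\varepsilon ^3\), which is why \(r(\xi )\) and \(b(\xi )\) share the same linear term.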
Turning to the question of soliton appearance from a localised profile, the rigorous criterion for our having no embedded solitons can be formulated for single-lobe profiles as^{73}:
and the deterministic profiles used in our work have a much higher normalised energy. For more involved multi-lobe profiles, the soliton-creation threshold is typically higher, but we still had some profiles that contained solitary components, so we had to eliminate them. When we add noise to a signal that initially contains no solitons, a random modulation typically diminishes the probability of soliton appearance^{74,75}. Nevertheless, we verified that none of the randomly perturbed signals used in our study contained a solitonic component.
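A criterion of this kind is easy to evaluate numerically. The sketch below assumes that the bound in question is the \(L_1\)-norm condition \(\int |q(t)|\, dt < \pi /2\) associated with Ref.^{73}; this reading, and the function names, are assumptions. The condition is sufficient only: a profile exceeding the bound does not necessarily contain solitons, so a direct check of the discrete spectrum is still needed in that case.

```python
import numpy as np

def l1_norm(q, t):
    """Trapezoidal estimate of the integral of |q(t)| over the grid t."""
    abs_q = np.abs(np.asarray(q))
    dt = np.diff(np.asarray(t, dtype=float))
    return float(np.sum(0.5 * (abs_q[1:] + abs_q[:-1]) * dt))

def below_soliton_threshold(q, t, bound=np.pi / 2):
    """True if the L1 norm is below the assumed no-soliton bound."""
    return l1_norm(q, t) < bound
```

For example, \(q(t) = A\,\mathrm{sech}(t)\) has the \(L_1\) norm \(\pi A\), so the bound is satisfied for \(A < 1/2\).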
To demonstrate the difference between the continuous NFT spectrum and the linear FT spectrum, we calculated (taking into account the necessary transformations and frequency scaling) both spectra for an example signal of the type used in our analysis. As the measure showing the distinction between the conventional Fourier and NF spectra, we use the norm of the difference \(r(\xi ) - r_{FT}(\xi )\), where \(r_{FT}(\xi )\) is given by the first (linear) term in the expansion of \(r(\xi )\), Eq. (14). Figure 6a shows an example of a nonlinear and a conventional Fourier spectrum. The dependence of the difference on the spectral parameter \(\xi\) for a typical signal from our testing set is shown in Fig. 6b. The sharp decrease of the difference in the \(\xi\) regions below \(-100\) and above 100 occurs because the amplitude of the continuous spectrum in those regions also tends to zero. The average maximal difference parameter value over the entire spectrum for all signals from the test dataset is \(\approx 9\). This fact allows us to argue that the nonlinear effects are essential for the selected testing signals, despite their containing no solitons. Thus, the accuracy of the NFTNet allows us to perceive the truly nonlinear effects.
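A comparison of this type can be sketched as follows. The snippet assumes that the linear term has the form \(r_{FT}(\xi ) = -\int \bar{q}(t)\, e^{-2i\xi t}\, dt\), i.e. the conjugated FT with the frequency scaled by 2 (the sign and scaling depend on the chosen convention), and evaluates it by direct quadrature on a uniform grid. The nonlinear spectrum `r` is taken as given, for instance from a conventional NFT routine; both function names are hypothetical.

```python
import numpy as np

def linear_limit_spectrum(q, t, xi):
    """r_FT(xi) = -integral of conj(q(t)) exp(-2i xi t) dt (rectangle rule)."""
    q = np.asarray(q, dtype=complex)
    t = np.asarray(t, dtype=float)
    xi = np.asarray(xi, dtype=float)
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.outer(xi, t))  # shape (len(xi), len(t))
    return -(kernel @ np.conj(q)) * dt

def spectrum_difference(r, r_ft):
    """Pointwise modulus of the difference between NF and linear spectra."""
    return np.abs(np.asarray(r) - np.asarray(r_ft))
```

The direct quadrature costs \(O(N^2)\) operations; on a suitably matched grid the same quantity could be obtained with an FFT.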
Numerical NFT computation
In our work we used a conventional forward NFT numerical method to generate the training and testing dataset pairs: the signal and its respective NF spectrum. For the computation of the continuous NF spectrum associated with a given profile q(t) (containing no solitons) having the form of Eq. (2), we used the exponential scheme ES4 from the FNFT package^{58} (non-fast realisation). Its accuracy is proportional to the fourth power of the time step, \(\sim (\Delta t)^4\). We note that there exists a fast realisation of the NFT processing with \(\sim (\Delta t)^4\) accuracy^{76}, which can potentially be used for efficient NFTNet training.
Complexity analysis
One of the important metrics in the development of signal processing tools is the complexity of the processing device, i.e. the number of elementary arithmetic operations that the processing unit employs to reach its goal. Quite often we need to analyse the interplay between the complexity and accuracy of the processing unit. Thus, here we perform the complexity analysis for the NFTNet.
In our case, we concentrate only on the number of multiplications, since in practical implementation the computational complexity of addition operations is negligible. The number of real multiplications needed for the forward propagation of the model, as introduced in^{77} for several types of NN layers, is also used to calculate the computational complexity of the NFTNet in this paper.
The overall complexity C of the NFTNet can be presented as the sum of two constituents: the complexity of the densely connected block \(C_{{{\text {dense}}}}\) and the complexity of the convolutional block \(C_{{{\text {conv}}}}\). For the calculation of \(C_{{{\text {dense}}}}\) the same formula as in Ref.^{77} can be used, where we have \(n_i\) inputs, \(n_1\) neurons in the hidden layers, and \(n_o\) outputs, and the complexity is defined as:
In the case of the convolution layer, we can change the equation given in^{77} to measure the generalised convolutional layer complexity by taking into account the number of filters f and kernel size k, as well as the effect of padding p, stride s, and dilation d. The complexity \(C_{{{\text {conv, layer}}}}\) for one layer when the input shape is [\(L_{in},Q_{in}\)], is specified as follows:
where \(Q_{in}\) denotes the number of channels and \(L_{in}\) is the length of the signal sample sequence. Therefore, the total complexity of the NFTNet used in this paper in terms of real multiplications per output sequence (1024 complex-valued points) is:
where the factor 2 in front appears due to the use of two identical NNs to predict the real and imaginary parts of the continuous NF spectrum. For our optimised architecture, processing 1024 complex signal samples requires the following number of multiplication operations:
For comparison, processing a signal consisting of 1024 points using the FNFT methods from Ref.^{78} requires 3885572 FLOPs (note that this is not the number of multiplications, so a direct comparison with the number from Eq. (19) is somewhat difficult). Generally, for the computation of N points in the NF spectrum from N points in the t-domain, the non-fast NFT methods^{72} typically require \(N^2\) FLOPs, while the fast methods need \(N \log ^2N\) FLOPs^{58,78}. From this perspective, the complexity of the current NFTNet corresponds to that of non-fast NFT methods. However, some techniques can be further used to reduce the NN’s complexity^{79}.
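The multiplication counts described above can be tallied with a few helper functions. The sketch assumes the usual expressions for such counts: a dense block costs the sum of products of consecutive layer widths, and a one-dimensional convolutional layer costs \(Q_{in}\, f\, k\) multiplications per output position, with the output length set by the padding p, stride s and dilation d. These forms, the base-2 logarithm in the fast-NFT scaling, and all function names are assumptions rather than the exact expressions (16)-(19).

```python
import math

def dense_multiplications(layer_sizes):
    """Real multiplications of a dense block with widths [n_i, n_1, ..., n_o]."""
    return sum(m * n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

def conv1d_output_length(l_in, kernel, padding=0, stride=1, dilation=1):
    """Output length of a 1D convolution (standard formula)."""
    return (l_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def conv1d_multiplications(l_in, q_in, filters, kernel,
                           padding=0, stride=1, dilation=1):
    """Real multiplications of one 1D convolutional layer (assumed form)."""
    l_out = conv1d_output_length(l_in, kernel, padding, stride, dilation)
    return q_in * filters * kernel * l_out

def nft_flop_scalings(n):
    """Return (N^2, N log2(N)^2), the non-fast and fast NFT scalings."""
    return n ** 2, n * math.log2(n) ** 2
```

For N = 1024 the two scalings differ by roughly a factor of ten, which gives a sense of the gap between non-fast and fast NFT methods at the sequence length used here.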
References
Gardner, C. S., Greene, J. M., Kruskal, M. D. & Miura, R. M. Method for solving the Korteweg-deVries equation. Phys. Rev. Lett. 19, 1095 (1967).
Yousefi, M. & Kschischang, F. Information transmission using the nonlinear Fourier transform, Part I: Mathematical tools. IEEE Trans. Inf. Theory 60, 4312–4328 (2014).
Turitsyn, S. et al. Nonlinear Fourier transform for optical data processing and transmission: Advances and perspectives. Optica 4, 307–322 (2017).
Zakharov, V. & Shabat, A. Exact theory of two-dimensional self-focusing and one-dimensional self-modulation of waves in nonlinear media. Sov. Phys. JETP 34, 62 (1972).
Ablowitz, M. J., Kaup, D. J., Newell, A. C. & Segur, H. The inverse scattering transform-Fourier analysis for nonlinear problems. Stud. Appl. Math. 53, 249–315 (1974).
Novikov, S., Manakov, S., Pitaevskii, L. & Zakharov, V. E. Theory of Solitons: The Inverse Scattering Method (Springer Science & Business Media, 1984).
Kosevich, A. M., Ivanov, B. & Kovalev, A. Magnetic solitons. Phys. Rep. 194, 117–238 (1990).
Osborne, A. Nonlinear Ocean Waves and the Inverse Scattering Transform (Academic press, 2010).
Agrawal, G. P. Fiber-Optic Communication Systems Vol. 222 (John Wiley & Sons, 2012).
Mollenauer, L. F. & Gordon, J. P. Solitons in Optical Fibers: Fundamentals and Applications (Elsevier, 2006).
Hasegawa, A. & Nyu, T. Eigenvalue communication. J. Lightwave Technol. 11, 395–399 (1993).
Yangzhang, X. et al. Dual-polarization nonlinear frequency-division multiplexed transmission with \(b\)-modulation. J. Lightwave Technol. 37, 1570–1578 (2019).
Essiambre, R., Kramer, G., Winzer, P., Foschini, G. & Goebel, B. Capacity limits of optical fiber networks. J. Lightwave Technol. 28, 662–701 (2010).
Prilepsky, J. E., Derevyanko, S. A. & Turitsyn, S. K. Nonlinear spectral management: Linearization of the lossless fiber channel. Opt. Express 21, 24344–24367 (2013).
Aref, V. Control and detection of discrete spectral amplitudes in nonlinear fourier spectrum. arXiv preprint arXiv:1605.06328 (2016).
Prilepsky, J. E., Derevyanko, S. A., Blow, K. J., Gabitov, I. & Turitsyn, S. K. Nonlinear inverse synthesis and eigenvalue division multiplexing in optical fiber channels. Phys. Rev. Lett. 113, 013901 (2014).
Le, S., Prilepsky, J. E. & Turitsyn, S. K. Nonlinear inverse synthesis for high spectral efficiency transmission in optical fibers. Opt. Express 22, 26720–26741 (2014).
Le, S., Prilepsky, J. & Turitsyn, S. Nonlinear inverse synthesis technique for optical links with lumped amplification. Opt. Express 23, 8317–8328 (2015).
Le, S. T., Prilepsky, J. E., Rosa, P., Ania-Castañón, J. D. & Turitsyn, S. K. Nonlinear inverse synthesis for optical links with distributed Raman amplification. J. Lightwave Technol. 34, 1778–1786 (2015).
Le, S. et al. Demonstration of nonlinear inverse synthesis transmission over transoceanic distances. J. Lightwave Technol. 34, 2459–2466 (2016).
Le, S., Aref, V. & Buelow, H. Nonlinear signal multiplexing for communication beyond the Kerr nonlinearity limit. Nat. Photon. 11, 570 (2017).
Kamalian, M., Prilepsky, J., Le, S. & Turitsyn, S. On the design of NFTbased communication systems with lumped amplification. J. Lightwave Technol. 35, 5464–5472 (2017).
Yousefi, M. & Yangzhang, X. Linear and nonlinear frequency-division multiplexing. IEEE Trans. Inf. Theory 66, 478–495 (2019).
Wahls, S. Generation of time-limited signals in the nonlinear Fourier domain via b-modulation. In 2017 European Conference on Optical Communication (ECOC), 1–3 (IEEE, 2017).
Gui, T., Zhou, G., Lu, C., Lau, A. P. T. & Wahls, S. Nonlinear frequency division multiplexing with b-modulation: Shifting the energy barrier. Opt. Express 26, 27978–27990 (2018).
Shepelsky, D., Vasylchenkova, A., Prilepsky, J. E. & Karpenko, I. Nonlinear Fourier spectrum characterization of time-limited signals. IEEE Trans. Commun. 68, 3024–3032 (2020).
Chimmalgi, S. & Wahls, S. Bounds on the transmit power of b-modulated NFDM systems in anomalous dispersion fiber. Entropy 22, 639 (2020).
Yangzhang, X. et al. Experimental demonstration of dual-polarization NFDM transmission with \(b\)-modulation. IEEE Photon. Technol. Lett. 31, 885–888 (2019).
Hari, S., Yousefi, M. I. & Kschischang, F. R. Multieigenvalue communication. J. Lightwave Technol. 34, 3110–3117 (2016).
Buelow, H., Aref, V. & Idler, W. Transmission of waveforms determined by 7 eigenvalues with PSK-modulated spectral amplitudes. In ECOC 2016; 42nd European Conference on Optical Communication; Proceedings of, 1–3 (VDE, 2016).
Wu, Y. et al. Robust neural network receiver for multiple-eigenvalue modulated nonlinear frequency division multiplexing system. Opt. Express 28, 18304–18316 (2020).
Derevyanko, S., Prilepsky, J. & Turitsyn, S. Capacity estimates for optical transmission based on the nonlinear Fourier transform. Nat. Commun. 7, 12710 (2016).
Pankratova, M., Vasylchenkova, A., Derevyanko, S. A., Chichkov, N. B. & Prilepsky, J. E. Signal-noise interaction in optical-fiber communication systems employing nonlinear frequency-division multiplexing. Phys. Rev. Appl. 13, 054021 (2020).
Jones, R. T., Gaiarin, S., Yankov, M. P. & Zibar, D. Time-domain neural network receiver for nonlinear frequency division multiplexed systems. IEEE Photon. Technol. Lett. 30, 1079–1082 (2018).
Yangzhang, X., Lavery, D., Bayvel, P. & Yousefi, M. I. Impact of perturbations on nonlinear frequency-division multiplexing. J. Lightwave Technol. 36, 485–494 (2018).
Tavakkolnia, I. & Safari, M. The impact of PMD on single-polarization nonlinear frequency division multiplexing. J. Lightwave Technol. 37, 1264–1272 (2019).
Musumeci, F. et al. An overview on application of machine learning techniques in optical networks. IEEE Commun. Surv. Tutor. 21, 1383–1408 (2018).
Khan, F. N., Fan, Q., Lu, C. & Lau, A. P. T. An optical communications perspective on machine learning and its applications. J. Lightwave Technol. 37, 493–516 (2019).
Gaiarin, S., Da Ros, F., De Renzis, N., da Silva, E. P. & Zibar, D. Dual-polarization NFDM transmission using distributed Raman amplification and NFT-domain equalization. IEEE Photon. Technol. Lett. 30, 1983–1986 (2018).
Koch, J., Weixer, R. & Pachnicke, S. Equalization of soliton transmission based on nonlinear fourier transform using neural networks. In 45th European Conference on Optical Communication (ECOC), 1–3 (2019).
Kotlyar, O., Kopae, M. K., Prilepsky, J. E., Pankratova, M. & Turitsyn, S. K. Machine learning for performance improvement of periodic NFT-based communication system. In 2019 European Conference on Optical Communications (2019).
Kotlyar, O. et al. Combining nonlinear Fourier transform and neural network-based processing in optical communications. Opt. Lett. 45, 3462–3465 (2020).
Kotlyar, O. et al. Convolutional long short-term memory neural network equalizer for nonlinear Fourier transform-based optical transmission systems. Opt. Express 29, 11254–11267 (2021).
Yamamoto, S., Mishina, K. & Maruta, A. Demodulation of optical eigenvalue modulated signal using neural network. IEICE Commun. Express 8, 507–512 (2019).
Zhang, W. Q., Chan, T. H. & Afshar, S. Direct decoding of nonlinear OFDM-QAM signals using convolutional neural network. Opt. Express 29, 11591–11604 (2021).
Randoux, S., Suret, P., Chabchoub, A., Kibler, B. & El, G. Nonlinear spectral analysis of peregrine solitons observed in optics and in hydrodynamic experiments. Phys. Rev. E 98, 022219 (2018).
Soto-Crespo, J. M., Devine, N. & Akhmediev, N. Integrable turbulence and rogue waves: Breathers or solitons?. Phys. Rev. Lett. 116, 103901 (2016).
Turitsyn, S. K., Chekhovskoy, I. S. & Fedoruk, M. P. Nonlinear Fourier transform for characterization of the coherent structures in optical microresonators. Opt. Lett. 45, 3059–3062 (2020).
Wang, J., Sheng, A.-G., Huang, X., Li, R.-Y. & He, G.-Q. Eigenvalue spectrum analysis for temporal signals of Kerr optical frequency combs based on nonlinear Fourier transform. Chin. Phys. B 29, 034207 (2020).
Ryczkowski, P. et al. Real-time full-field characterization of transient dissipative soliton dynamics in a mode-locked laser. Nat. Photon. 12, 221 (2018).
Sugavanam, S., Kopae, M. K., Peng, J., Prilepsky, J. E. & Turitsyn, S. K. Analysis of laser radiation using the nonlinear Fourier transform. Nat. Commun. 10, 5663 (2019).
Chekhovskoy, I., Shtyrina, O., Fedoruk, M., Medvedev, S. & Turitsyn, S. Nonlinear Fourier transform for analysis of coherent structures in dissipative systems. Phys. Rev. Lett. 122, 153901 (2019).
Skaar, J., Wang, L. & Erdogan, T. On the synthesis of fiber Bragg gratings by layer peeling. IEEE J. Quantum Electron. 37, 165–173 (2001).
Turitsyna, G. E., Webb, S., Mezentsev, V. & Turitsyn, S. K. Novel design of FBG-based composite double notch VSB filter for DWDM systems. J. Lightwave Technol. 24, 3547–3552 (2006).
Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).
Lusch, B., Kutz, J. N. & Brunton, S. L. Deep learning for universal linear embeddings of nonlinear dynamics. Nat. Commun. 9, 4950 (2018).
Li, Z. et al. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895 (2020).
Wahls, S., Chimmalgi, S. & Prins, P. FNFT: A software library for computing nonlinear Fourier transforms. J. Open Source Softw. 3, 597 (2018).
Sedov, E. V. et al. Soliton content in the standard optical OFDM signal. Opt. Lett. 43, 5985–5988 (2018).
Turitsyn, S., Sedov, E., Redyuk, A. & Fedoruk, M. Nonlinear spectrum of conventional OFDM and WDM return-to-zero signals in nonlinear channel. J. Lightwave Technol. 38, 352–358 (2019).
Taigman, Y., Yang, M., Ranzato, M. & Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1701–1708 (2014).
Oord, A. V. D. et al. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).
Pelikan, M., Goldberg, D. E., Cantú-Paz, E. et al. Boa: The bayesian optimization algorithm. In Proceedings of the genetic and evolutionary computation conference GECCO-99, vol. 1, 525–532 (Citeseer, 1999).
Močkus, J. On bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, 400–404 (Springer, 1975).
Sena, M. et al. Bayesian optimization for nonlinear system identification and predistortion in cognitive transmitters. J. Lightwave Technol. 39, 5008–5020 (2021).
Spall, J. C. Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839–1853 (2000).
Freire, P. J. et al. Complex-valued neural network design for mitigation of signal distortions in optical links. J. Lightwave Technol. 39, 1696–1705 (2021).
Freire, P. J. et al. Transfer learning for neural networks-based equalizers in coherent optical systems. J. Lightwave Technol. https://doi.org/10.1109/JLT.2021.3108006 (2021).
Tobin, J. et al. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 23–30 (IEEE, 2017).
Mishina, K., Sato, S., Yoshida, Y., Hisano, D. & Maruta, A. Eigenvalue-domain neural network demodulator for eigenvalue-modulated signal. J. Lightwave Technol. https://doi.org/10.1109/JLT.2021.3074744 (2021).
Sedov, E. V., Chekhovskoy, I. S., Prilepsky, J. E. & Fedoruk, M. P. Application of neural networks to determine the discrete spectrum of the direct Zakharov–Shabat problem. Quantum Electron. 50, 1105 (2020).
Vasylchenkova, A., Prilepsky, J., Shepelsky, D. & Chattopadhyay, A. Direct nonlinear Fourier transform algorithms for the computation of Solitonic spectra in focusing nonlinear Schrödinger equation. Commun. Nonlinear Sci. Numer. Simul. 68, 347–371 (2019).
Klaus, M. & Shaw, J. On the eigenvalues of Zakharov–Shabat systems. SIAM J. Math. Anal. 34, 759–773 (2003).
Turitsyn, S. K. & Derevyanko, S. Soliton-based discriminator of noncoherent optical pulses. Phys. Rev. A 78, 063819 (2008).
Derevyanko, S. A. & Prilepsky, J. E. Soliton generation from randomly modulated return-to-zero pulses. Opt. Commun. 281, 5439–5443 (2008).
Medvedev, S., Vaseva, I., Chekhovskoy, I. & Fedoruk, M. Exponential fourth order schemes for direct Zakharov–Shabat problem. Opt. Express 28, 20–39 (2020).
Freire, P. J. et al. Performance versus complexity study of neural network equalizers in coherent optical systems. arXiv preprint arXiv:2103.082122 (2021).
Chimmalgi, S., Prins, P. J. & Wahls, S. Fast nonlinear Fourier transform algorithms using higher order exponential integrators. IEEE Access 7, 145161–145176 (2019).
Arguello, D. R. et al. Realization of neural network-based optical channel equalizer in restricted hardware. arXiv preprint arXiv:2109.07204 (2021).
Acknowledgements
JEP and SKT acknowledge the support of Leverhulme Trust project RPG-2018-063. SKT is supported by the EPSRC programme Grant TRANSNET, EP/R035342/1. PJF acknowledges the support from the EU Horizon 2020 program under the Marie Skłodowska-Curie Grant Agreement 813144 (REAL-NET). EVS acknowledges the support from the Russian Science Foundation under Grant 17-72-30006. The research of ISC was supported by the grant of the President of the Russian Federation (MK-677.2020.9). VAK and JEP acknowledge the Erasmus+ mobility scheme between National Technical University “Kharkiv Polytechnic Institute” and Aston University.
Author information
Contributions
J.E.P. and S.K.T. conceived the study. V.V.S., V.A.K., E.V.S., and J.E.P. proposed the neural network model type. E.V.S. and I.S.C. collected the data. P.J.F. and E.V.S. performed the architecture optimisation. E.V.S. performed the numerical simulations and designed the figures and tables. J.E.P. and E.V.S. wrote the manuscript, with the assistance of S.K.T. and M.K.K. All authors reviewed the manuscript. The work of P.J.F. and E.V.S. was supervised by J.E.P. and S.K.T. The work of V.V.S. was supervised by V.A.K.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sedov, E.V., Freire, P.J., Seredin, V.V. et al. Neural networks for computing and denoising the continuous nonlinear Fourier spectrum in focusing nonlinear Schrödinger equation. Sci Rep 11, 22857 (2021). https://doi.org/10.1038/s41598-021-02252-9