Deep learning enhanced Rydberg multifrequency microwave recognition

Recognition of multifrequency microwave (MW) electric fields is challenging because of the complex interference of multifrequency fields in practical applications. Rydberg atom-based measurements of multifrequency MW electric fields are promising for MW radar and MW communications. However, Rydberg atoms are sensitive not only to the MW signal but also to noise from atomic collisions and the environment, so solving the governing Lindblad master equation of light-atom interactions is complicated by the inclusion of noise and high-order terms. Here, we solve these problems by combining Rydberg atoms with a deep learning model, demonstrating that this model exploits the sensitivity of the Rydberg atoms while also reducing the impact of noise, without solving the master equation. As a proof-of-principle demonstration, the deep learning enhanced Rydberg receiver allows direct decoding of a frequency-division multiplexed signal. This type of sensing technology is expected to benefit Rydberg-based MW field sensing and communication.

The strong interaction between Rydberg atoms and microwave (MW) fields, which results from their high polarizability, makes the Rydberg atom a candidate medium for MW field measurement, e.g., using electromagnetically induced absorption 1 , electromagnetically induced transparency (EIT) 2,3 and the Autler-Townes effect [3][4][5][6] . The amplitudes 7-10 , phases 10,11 and frequencies 9,10 of MW fields can then be measured with high sensitivity. Based on this sensitivity, the Rydberg atom has been used as an atom-based radio receiver in communications 7,8,12,13 and radar 14 . In communications, the Rydberg atom replaces the traditional antenna with superior performance that includes sub-wavelength size, high sensitivity, International System of Units (SI) traceability to Planck's constant, high dynamic range, self-calibration and an operating range that spans from MHz to THz frequencies 7,9,10,15,16 . One application is analogue communications, e.g., real-time recording and reconstruction of audio signals 13 . Another is digital communications, e.g., phase-shift keying and quadrature amplitude modulation 7,8,12 . The channel capacity of MW-based communications is limited by the standard quantum limited phase uncertainty 7 . Furthermore, a continuously tunable radio-frequency carrier has been realized with Rydberg atoms 17 , paving the way for concurrent multichannel communications. Detection and decoding of multifrequency MW fields are highly important in communications for accelerating information transmission and improving bandwidth efficiency. Additionally, MW field recognition enables simultaneous detection of multiple targets with different velocities from the multifrequency spectrum induced by the Doppler effect. However, because Rydberg atoms are so sensitive, noise is superimposed on the message, so the message cannot be recovered efficiently.
Additionally, it is difficult to generalize and scale the band-pass filters to enable demultiplexing of multifrequency signals with more carriers 16 .
To solve these problems, we use a deep learning model for its accurate signal prediction and its outstanding ability to recognize complex information in noisy data without complex circuits. The deep learning model updates its weights via backpropagation and extracts features from massive data without human intervention or prior knowledge of the physics and the experimental system. Because of these advantages, physicists have constructed complex neural networks for numerous tasks, including far-field subwavelength acoustic imaging 18 , value estimation of a stochastic magnetic field 19 , vortex light recognition 20,21 , demultiplexing of orbital angular momentum beams 22,23 and automatic control of experiments [24][25][26][27][28][29] .
Here, we demonstrate a deep learning enhanced Rydberg receiver for frequency-division multiplexed digital communication. In our experiment, the Rydberg atoms act as a sensitive antenna and a mixer that receive multifrequency MW signals and extract information 9,11,12 . The interaction between the Rydberg atoms and the MWs reduces the modulated signal frequency from several gigahertz to several kilohertz, so the information can be extracted with simple apparatus. These interference signals are then fed into a well-trained deep learning model to retrieve the messages. The deep learning model extracts the multifrequency MW signal phases without any knowledge of the Lindblad master equation, which is the theoretical description of the interactions between atoms and light beams in an open system. Solving the master equation is often complicated because higher-order terms and noise, both from the environment and from the atoms themselves, must be taken into account. The deep learning model, in contrast, is robust to noise because of its generalization ability: it takes advantage of the sensitivity of the Rydberg atoms while reducing the impact of the noise that results from this sensitivity. Our deep learning model is scalable, allowing it to recognize the information carried by more than 20 MW bins. Additionally, once training is complete, the deep learning model extracts the phases faster than direct solution of the master equation.

Results
Setup. We adopt a two-photon Rydberg-EIT scheme to excite atoms from the ground state to a Rydberg state. In $^{85}$Rb, a probe field drives the atomic transition $|5S_{1/2}, F=2\rangle \rightarrow |5P_{1/2}, F'=3\rangle$ and a coupling field drives the transition $|5P_{1/2}, F'=3\rangle \rightarrow |51D_{3/2}\rangle$, as shown in Fig. 1a. Multifrequency MW fields drive a radio-frequency (RF) transition between the two Rydberg states $|51D_{3/2}\rangle$ and $|50F_{5/2}\rangle$, whose energy difference corresponds to 17.62 GHz. The multifrequency MW fields consist of multiple MW bins (more than three) whose frequencies differ from the resonance frequency by several kilohertz. The amplitudes, frequencies and phases of the MW bins can be adjusted individually (further details are provided in the "Methods" section). The detunings of the probe, coupling and MW fields are $\Delta_p$, $\Delta_c$ and $\Delta_s$, and their Rabi frequencies are $\Omega_p$, $\Omega_c$ and $\Omega_s$, respectively. The experimental setup is depicted in Fig. 1b. The MW fields continuously drive the Rydberg transition, producing modulated EIT spectra, i.e., probe transmission spectra, as shown in the inset of Fig. 1b. The phases of the MW fields are encoded in the modulated EIT spectra and can be recovered from them with the aid of deep learning. Specifically, the probe transmission spectra are fed into a well-trained deep learning model, consisting of a one-dimensional convolution (1D CNN) layer, a bidirectional long short-term memory (Bi-LSTM) layer and a dense layer, which extracts the phases of the MW fields. Figure 1c-e shows these components of the neural network (further details are presented in the "Methods" section). Finally, the bin phases are recovered and the data are read out.
Frequency-division multiplexed signal encoding and receiving.
In the experiments, we use a four-bin frequency-division multiplexing (FDM) MW signal for demonstration, where one of the four MW bins serves as the reference bin. The relative phase differences between the reference bin and the other bins are modulated by the message signal. Specifically, for the four-bin MW signal, $\omega_0$ is the resonant angular frequency and $\omega_{1,2,3,4}$ are the relative frequencies, so that the carrier frequencies are $(\omega_0 + \omega_1)/2\pi$ = 17.62 GHz − 3 kHz, $(\omega_0 + \omega_2)/2\pi$ = 17.62 GHz − 1 kHz, $(\omega_0 + \omega_3)/2\pi$ = 17.62 GHz + 1 kHz and $(\omega_0 + \omega_4)/2\pi$ = 17.62 GHz + 3 kHz. The frequency difference between two adjacent bins is Δf = 2 kHz. Each message phase $\varphi_{1,2,3}$ is 0 or π, encoding one bit (0 or 1), so three bits are carried in total, and the reference phase $\varphi_4 = 0$ remains unchanged. The phase list $(\varphi_1, \varphi_2, \varphi_3, \varphi_4)$ is the bit string at time $t_0$; varying $\varphi_{1,2,3}$ with time yields the FDM signal with binary phase-shift keying (2PSK). The amplitudes of the four bins are set to $A_{1,2,3} = 0.1A_4$ to solve a problem caused by the nonlinearity of the atomic response: without this amplitude weighting, the probe transmission spectra of two different bit strings, e.g., (0, 0, π, 0) and (0, π, 0, 0), are identical (further details are presented in the "Methods" section). Increasing the frequency difference Δf gives higher information transmission rates. For four bins with Δf = 2 kHz, the information transmission rate is $n_b \times \Delta f = (4 - 1) \times 2 \times 10^3$ bps = 6 kbps, where $n_b$ is the number of message bits. In the experiments, disturbances originate from the environment and from atomic collisions. Because Rydberg atoms are highly sensitive to MW fields, the resulting noise can submerge the signal. To exploit the sensitivity of the Rydberg atoms while minimizing the effects of noise, we use the deep learning model to extract the relative phases $(\varphi_1, \varphi_2, \varphi_3)$.
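The encoding can be illustrated with a short NumPy sketch. It assumes, as a simplification not stated in the text, that the probe transmission follows the magnitude of the summed MW field in a frame rotating at the 17.62 GHz resonance; `mw_envelope` and `info_rate_bps` are hypothetical helper names. In this simplified picture the ambiguity between (0, 0, π, 0) and (0, π, 0, 0) already appears in the field magnitude when all amplitudes are equal, and the weighting $A_{1,2,3} = 0.1A_4$ removes it.

```python
import numpy as np

# Bin offsets from the 17.62 GHz resonance in Hz, as listed in the text (reference bin last).
OFFSETS_HZ = np.array([-3e3, -1e3, 1e3, 3e3])


def mw_envelope(phases, amps, t, offsets_hz=OFFSETS_HZ):
    """|sum_k A_k exp(i(2*pi*f_k*t + phi_k))|: magnitude of the total MW field in a
    frame rotating at the resonance (simplifying assumption: the atoms follow it)."""
    phases = np.asarray(phases, dtype=float)
    amps = np.asarray(amps, dtype=float)
    phasors = amps[:, None] * np.exp(
        1j * (2 * np.pi * offsets_hz[:, None] * t[None, :] + phases[:, None]))
    return np.abs(phasors.sum(axis=0))


def info_rate_bps(n_bins, delta_f_hz):
    """n_b * delta_f, where one of the n_bins is reserved as the phase reference."""
    return (n_bins - 1) * delta_f_hz


t = np.arange(0.0, 1e-3, 1e-6)             # one 1 ms beat period, 1 us steps
a = (0.0, 0.0, np.pi, 0.0)                 # bit string (0, 0, pi, 0)
b = (0.0, np.pi, 0.0, 0.0)                 # bit string (0, pi, 0, 0)
equal = [1.0, 1.0, 1.0, 1.0]               # A1 = A2 = A3 = A4
weighted = [0.1, 0.1, 0.1, 1.0]            # A1,2,3 = 0.1 * A4

same_equal = np.allclose(mw_envelope(a, equal, t), mw_envelope(b, equal, t))
same_weighted = np.allclose(mw_envelope(a, weighted, t), mw_envelope(b, weighted, t))
print(same_equal, same_weighted)           # True False
print(info_rate_bps(4, 2e3))               # 6000.0
```

The same rate helper gives (20 − 1) × 2 kbps = 38 kbps for 20 bins and (4 − 1) × 200 kbps = 0.6 Mbps for Δf = 200 kHz.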
Deep learning. To improve the robustness and speed of our receiver, we use a deep learning model to decode the probe transmission signal. The complete encoding and decoding process is illustrated in Fig. 2a. The Rydberg antenna receives the FDM-2PSK signal and down-converts it into the probe transmission spectrum, from which the information is retrieved using the deep learning model. This requires that different bit strings correspond to distinct probe spectra, which is ensured by setting $A_{1,2,3} = 0.1A_4$, as discussed above.
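Why a strong reference bin helps can be sketched with a short derivation. It assumes, as a simplification, that the atomic response follows the magnitude of the total MW Rabi frequency $\Omega_s(t)$ in a frame rotating at $\omega_0$, with $\Omega_k \propto A_k$ the Rabi frequency of bin k; it is an illustration consistent with the text, not necessarily the exact form of Eq. (3) in the Methods.

```latex
\Omega_s(t) = \sum_{k=1}^{4} \Omega_k \, e^{i(\omega_k t + \varphi_k)}

|\Omega_s(t)|^2 = \sum_{k=1}^{4}\sum_{l=1}^{4} \Omega_k \Omega_l
    \cos\!\big[(\omega_k - \omega_l)t + \varphi_k - \varphi_l\big]
  = \Omega_4^2
    + 2\Omega_4 \sum_{k=1}^{3} \Omega_k \cos\!\big[(\omega_k - \omega_4)t + \varphi_k - \varphi_4\big]
    + \sum_{k=1}^{3}\sum_{l=1}^{3} \Omega_k \Omega_l \cos\!\big[(\omega_k - \omega_l)t + \varphi_k - \varphi_l\big]

|\Omega_s(t)| \approx \Omega_4
    + \sum_{k=1}^{3} \Omega_k \cos\!\big[(\omega_k - \omega_4)t + \varphi_k - \varphi_4\big],
  \qquad \Omega_4 \gg \Omega_{1,2,3}
```

With $\varphi_4 = 0$, each message phase $\varphi_k$ then enters linearly at its own beat frequency $\omega_k - \omega_4$ (corresponding to 6, 4 and 2 kHz for k = 1, 2, 3), whereas the message-message cross terms, which can make different bit strings indistinguishable, are smaller than the linear terms by a factor of order $\Omega_k/\Omega_4 \approx 0.1$.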
We then combine the 1D CNN layer, the Bi-LSTM layer and the dense layer to form the deep learning model (see the "Methods" section for further details) 30,31 . One reason for this combination is that predicting the phases $\varphi = (\varphi_1, \varphi_2, \varphi_3, 0)$ from the spectrum is a regression task on long data sequences, which requires the model to have long-term memory. Another reason is to combine the speed of the convolution layer with the sensitivity of the Bi-LSTM layer to sequential order 32 . The input sequence is first processed by the 1D CNN to extract features, converting a long sequence into a shorter sequence of higher-order features. This process is visualized in the Supplementary Materials to show how the deep learning model treats the transmission spectrum. The shorter sequence is then fed into the Bi-LSTM layer and resized by the dense layer to match the label size (see the "Methods" section for further details). Specifically, the probe spectra $T = \{T_0, T_\tau, T_{2\tau}, \ldots, T_{i\tau}, \ldots, T_{N\tau}\}$ and the corresponding phases are collected to form the data set, where $T_{i\tau}$ is the ith data point of a probe spectrum (sampled at time iτ) and the fourth bit $\varphi_4 = 0$ is the reference bit. The spectra and the phases are 1D vectors with dimensions N + 1 and 4, respectively. These independent, identically distributed data {{T}, {φ}} are fed into our model as a data set. After shuffling this data set and splitting it into a training set, a validation set and a test set, we train the model on the training set (feeding both the waveforms and the labels {{T}, {φ}}), and validate and test it on the validation and test sets, respectively (feeding waveforms without labels and comparing the predictions with the ground-truth labels). The validation set is used to detect overfitting or underfitting during training.
Finally, the performance (i.e., accuracy) of the model is estimated by predicting the test set.
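The shuffle-and-split procedure can be sketched in a few lines of NumPy. The 70/15/15 split fractions, the random seeds and the encoding of each label as φ/π (so that 0 and π map to 0 and 1) are assumptions for illustration, and `shuffle_split` is a hypothetical helper; the data below are random stand-ins, not measured spectra.

```python
import numpy as np


def shuffle_split(spectra, labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle paired (spectrum, label) samples and split them into train/val/test.

    spectra: (num_samples, N + 1) probe transmission traces T
    labels:  (num_samples, 4) phase labels (phi1, phi2, phi3, phi4 = 0) in units of pi
    """
    if len(spectra) != len(labels):
        raise ValueError("spectra and labels must have the same number of samples")
    order = np.random.default_rng(seed).permutation(len(spectra))  # one shared shuffle keeps pairs aligned
    n_train = int(fractions[0] * len(order))
    n_val = int(fractions[1] * len(order))
    parts = np.split(order, [n_train, n_train + n_val])
    return tuple((spectra[idx], labels[idx]) for idx in parts)


# Toy data: 200 fake spectra of length N + 1 = 500 with 3 random message bits each.
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(200, 3))
labels = np.hstack([bits, np.zeros((200, 1))])   # reference bit phi4 = 0
spectra = rng.normal(size=(200, 500))
train, val, test = shuffle_split(spectra, labels)
print(len(train[0]), len(val[0]), len(test[0]))  # 140 30 30
```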
The performance of our deep learning model depends on the number of training epochs and on the sizes of the training and validation sets. The training curves for different training and validation sets are shown in Fig. 2b, c. Initially, our model performs well on the training set only, indicating overfitting. The curves then converge (dashed line), and the model performs well on both the training set and the validation set. The sudden jump in the loss curve in Fig. 2c is caused by a change in the learning rate (further details are presented in the "Methods" section). Using more training and validation data makes the curves converge more quickly, and the model already performs well after training on these small data sets. Figure 2d shows a confusion matrix for prediction on a uniformly distributed test set, with an accuracy of 99.38%.
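The accuracy read from a confusion matrix such as the one in Fig. 2d is its trace divided by the total number of test samples. The following NumPy sketch assumes that the model outputs the phases scaled to [0, 1] (φ/π) and that they are decoded by thresholding at 0.5; the threshold, the bit-to-class ordering and the helper names are hypothetical.

```python
import numpy as np


def decode_bits(predictions, threshold=0.5):
    """Threshold the first three outputs (assumed to be phi/pi in [0, 1]) to bits."""
    return (np.asarray(predictions)[:, :3] > threshold).astype(int)


def bits_to_class(bits):
    """Map each row of message bits to a class index (first bit most significant)."""
    bits = np.asarray(bits, dtype=int)
    weights = 2 ** np.arange(bits.shape[1] - 1, -1, -1)   # [4, 2, 1] for three bits
    return bits @ weights


def confusion_matrix(true_cls, pred_cls, n_classes=8):
    """Rows index the ground truth, columns the prediction."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_cls, pred_cls), 1)
    return cm


true_bits = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]])
model_out = np.array([[0.1, 0.2, 0.05, 0.0],
                      [0.0, 0.1, 0.90, 0.0],
                      [0.8, 0.7, 0.60, 0.0],   # third bit misread as 1
                      [0.9, 0.1, 0.95, 0.0]])
cm = confusion_matrix(bits_to_class(true_bits), bits_to_class(decode_bits(model_out)))
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.75
```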
The "noise" shown in Fig. 2a refers to two kinds of noise. One comes from the atoms and the external environment (systematic noise); the other is noise added on purpose (additional noise). The systematic noise cannot be adjusted quantitatively; it is discussed together with its noise spectrum in the Supplementary Materials. Because the noise in the data set is independent and identically distributed (i.i.d.), i.e., the entire data set is shuffled before being split into the training and test sets, the systematic noise pattern is almost the same in the training set and the test set. The deep learning model has therefore already learned the systematic noise pattern during training, which is one of the major advantages of using deep learning against systematic noise.

Fig. 1 Illustration of the setup. a Overview of the experimental energy diagram. Probe and coupling laser beams excite the atoms from the ground state $|5S_{1/2}\rangle$ to the Rydberg state $|51D_{3/2}\rangle$. Multifrequency microwave (MW) electric fields couple the Rydberg states $|51D_{3/2}\rangle$ and $|50F_{5/2}\rangle$. b Schematic of the Rydberg atom-based antenna and mixer interacting with multifrequency signals. A 795 nm laser beam is split into two beams, which propagate in parallel through a heated Rb cell (length: 10 cm; temperature: 44.6 °C; atomic density: $9.0 \times 10^{10}$ cm$^{-3}$) 46 . One is the probe beam, which counterpropagates with the coupling laser beam that excites the atoms to Rydberg states; the counterpropagating geometry reduces Doppler broadening. The other is the reference beam, which does not counterpropagate with the coupling laser beam. The beams are detected using a differencing photodetector (DD) to obtain the probe transmission spectrum (inset). Multifrequency MW fields transmitted by a horn are applied to the atoms, radiating perpendicular to the laser beam propagation direction. The multifrequency MW fields are phase-modulated such that the phase differences between the reference bin and the other bins carry the messages. The probe transmission spectrum is fed into a well-trained neural network to retrieve the variations of the phases with time. c-e Schematics of the neural network. The network consists of c a one-dimensional convolution layer, d a bidirectional long short-term memory layer and e a dense layer; for further details about these layers, see the "Methods" section.

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-29686-7 ARTICLE
NATURE COMMUNICATIONS | (2022) 13:1997 | https://doi.org/10.1038/s41467-022-29686-7 | www.nature.com/naturecommunications
However, the noise may not be i.i.d., e.g., when a specific noise occurs only during testing. This problem can be addressed by online learning and by adding prior knowledge as new features to the data, e.g., data on the temperature, the weather and other factors 33 . Here, for simplicity, we consider only the i.i.d. case and add white noise with mean μ and standard deviation σ. We ignore 1/f noise because it falls off rapidly with increasing frequency, whereas the signal lies in the 2-200 kHz range. In Fig. 2e, additional noise is added to both the training set and the test set, which shows how the deep learning model performs when the noise in the two sets is unbalanced. The results below the red line show the performance of a model trained on a weaker-noise training set and tested on a stronger-noise test set, i.e., generalization to stronger noise. These results indicate that the deep learning model has the generalization ability required to adapt to stronger noise. In the area above the red line, the training set contains more noise than the test set. In principle, a small amount of additional noise in the training set increases the robustness of the deep learning model; however, as this noise increases further, the accuracy decays rapidly. Next, the well-trained model is used to reconstruct a QR code; Fig. 3a-c shows the results and the corresponding confusion matrices for different numbers of training epochs. First, the information is encoded into a QR code. After the code is transmitted, received and decoded using the Rydberg atoms and the deep learning model, the information is reconstructed successfully by the model trained for 35 epochs (Fig. 3c), but not by the models in Fig. 3a, b.
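The white-noise injection used for the train/test noise map in Fig. 2e can be sketched as follows. The toy spectra, the noise levels and the use of Gaussian noise are illustrative assumptions; the training and test copies are noised independently so that no noise realization is shared between them, and the model fitting itself is not shown.

```python
import numpy as np


def add_white_noise(spectra, mu, sigma, rng):
    """Return a copy of `spectra` with i.i.d. Gaussian white noise N(mu, sigma^2) added."""
    return spectra + rng.normal(loc=mu, scale=sigma, size=spectra.shape)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0 * np.pi, 500)
train_clean = np.tile(1.0 - 0.3 * np.cos(x), (80, 1))   # toy stand-ins for probe spectra
test_clean = np.tile(1.0 - 0.3 * np.cos(x), (20, 1))
sigma_levels = [0.0, 0.05, 0.1, 0.2]                    # hypothetical noise levels

# Each (train sigma, test sigma) pair is one cell of a Fig. 2e-style map.
noisy_pairs = {}
for s_train in sigma_levels:
    for s_test in sigma_levels:
        noisy_pairs[(s_train, s_test)] = (add_white_noise(train_clean, 0.0, s_train, rng),
                                          add_white_noise(test_clean, 0.0, s_test, rng))
# A model would then be fitted on the first array of each pair and scored on the second.
```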
The accuracy is defined as the number of correctly predicted bit strings divided by the total number of bit strings (147 bit strings). After 35 epochs, the accuracy reaches 99.32% and the message is reconstructed successfully from the received QR code.
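The bit-string accuracy and the reassembly of the received bits can be sketched as below. The layout is an assumption: 147 bit strings × 3 bits = 441 = 21 × 21 modules, which matches a version-1 QR symbol filled row by row, but the text does not state how the bits were arranged; the helper names are hypothetical and the data are random.

```python
import numpy as np


def bit_strings_to_qr(bit_strings, side=21):
    """Concatenate 3-bit strings and reshape them into a side x side module matrix
    (assumed row-by-row filling of a 21 x 21 QR symbol)."""
    flat = np.asarray(bit_strings, dtype=int).reshape(-1)
    if flat.size != side * side:
        raise ValueError(f"expected {side * side} bits, got {flat.size}")
    return flat.reshape(side, side)


def bit_string_accuracy(true_bits, pred_bits):
    """Fraction of bit strings whose 3 bits are all predicted correctly."""
    return np.mean(np.all(np.asarray(true_bits) == np.asarray(pred_bits), axis=1))


rng = np.random.default_rng(0)
sent = rng.integers(0, 2, size=(147, 3))
received = sent.copy()
received[10, 2] ^= 1                       # one corrupted bit string
print(bit_strings_to_qr(received).shape)   # (21, 21)
print(round(100 * bit_string_accuracy(sent, received), 2))  # 99.32
```

A single wrong bit string out of 147 gives 146/147 ≈ 99.32%, which is consistent with the accuracy reported above.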
Comparison between deep learning method and the master equation. The master equation we employ is the commonly used form, which does not include the noise spectrum. Figure 4 compares the accuracies of the deep learning model and of master-equation fitting on noisy data. The deep learning model is trained on a training set without additional noise and tested on a test set with additional white noise of standard deviation σ (the transmission spectra with noise are given in the Supplementary Materials). For simplicity, the data set contains the transmission spectra of only four MW bins (one of which is the reference bin), with a frequency difference of Δf = 2 kHz between adjacent bins. The master-equation results are obtained on the same test set as that of the deep learning model. The deep learning method outperforms master-equation fitting on the noisy data set.
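The fits above rely on the steady-state condition dρ/dt = 0 of the Lindblad master equation described in the Methods. As a minimal NumPy sketch (not the authors' Mathematica code), the equation can be vectorized and solved as a linear system for the four-level ladder |g⟩ = |5S1/2⟩, |e⟩ = |5P1/2⟩, |r⟩ = |51D3/2⟩, |s⟩ = |50F5/2⟩. The rotating-frame Hamiltonian is a commonly used form whose sign conventions are an assumption, units are arbitrary with ħ = 1, the parameter values are illustrative rather than experimental, and, as in the fits discussed here, noise and higher-order terms are ignored.

```python
import numpy as np


def ket_bra(i, j, n=4):
    op = np.zeros((n, n), dtype=complex)
    op[i, j] = 1.0
    return op


def steady_state(dp, dc, ds, wp, wc, ws, ge, gr, gs):
    """Solve d(rho)/dt = 0 for the ladder g, e, r, s (indices 0..3), hbar = 1.

    dp, dc, ds: probe/coupling/MW detunings; wp, wc, ws: Rabi frequencies;
    ge, gr, gs: decay rates e->g, r->e, s->r. Sign conventions are assumptions.
    """
    n = 4
    H = 0.5 * np.array([[0, wp, 0, 0],
                        [wp, -2 * dp, wc, 0],
                        [0, wc, -2 * (dp + dc), ws],
                        [0, 0, ws, -2 * (dp + dc + ds)]], dtype=complex)
    collapse = [np.sqrt(ge) * ket_bra(0, 1), np.sqrt(gr) * ket_bra(1, 2),
                np.sqrt(gs) * ket_bra(2, 3)]
    eye = np.eye(n)
    # Column-stacking vectorization: vec(A @ rho @ B) = kron(B.T, A) @ vec(rho)
    liou = -1j * (np.kron(eye, H) - np.kron(H.T, eye))
    for c in collapse:
        cdc = c.conj().T @ c
        liou += np.kron(c.conj(), c) - 0.5 * np.kron(eye, cdc) - 0.5 * np.kron(cdc.T, eye)
    # The rho_gg equation is redundant (trace is conserved); replace it by Tr(rho) = 1.
    liou[0, :] = eye.reshape(-1, order="F")
    rhs = np.zeros(n * n, dtype=complex)
    rhs[0] = 1.0
    return np.linalg.solve(liou, rhs).reshape(n, n, order="F")


def probe_absorption(rho):
    """Absorptive part of rho_eg; with this Hamiltonian's signs it is -Im(rho_eg)."""
    return -rho[1, 0].imag


params = dict(dp=0.0, dc=0.0, ds=0.0, wp=0.1, ge=6.0, gr=0.01, gs=0.01)
two_level = probe_absorption(steady_state(wc=0.0, ws=0.0, **params))
eit = probe_absorption(steady_state(wc=5.0, ws=0.0, **params))
at_split = probe_absorption(steady_state(wc=5.0, ws=10.0, **params))
print(eit < two_level, at_split > eit)   # True True
```

The sketch reproduces the qualitative physics: the coupling beam suppresses probe absorption on resonance (EIT), and a resonant MW field splits the dark resonance and restores absorption, which is how the MW bins imprint themselves on the probe transmission.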
Apart from robustness to noise, the deep learning model also performs well when the transmission rate is increased by increasing the number of MW bins or the frequency difference Δf, whereas it is difficult to retrieve the messages with high accuracy using the master equation. Specifically, to increase the bandwidth efficiency and the transmission rate, the number of MW bins carrying the messages must be increased; the information remains recognizable because of the scalability of the deep learning model. For 20 MW bins, there are (20 − 1) message bits plus one reference bit, giving a transmission rate of (20 − 1) × 2 kbps = 38 kbps. The number of combinations of these bits is $2^{19}$, which grows exponentially with the number of MW bins. Here, for demonstration purposes, only the first 3 of the 19 bits carry messages, and the other bits, including the reference, are set to 0. To show how well our model performs, we train, validate and test the model on this new data set without changing any parameters other than the number of training epochs. The loss curves for training and validation are shown in Fig. 5a, and a confusion matrix for epoch 78 is shown in Fig. 5b. The model performs well on this new test set, which was sampled uniformly from the eight categories, reaching an accuracy of 100%. Another way to increase the information transmission rate is to increase the frequency difference. In our case, the frequency difference is increased from Δf = 2 kHz to Δf = 200 kHz, and the transmission rate increases correspondingly from (4 − 1) × 2 kbps = 6 kbps to (4 − 1) × 200 kbps = 0.6 Mbps. To detect the high-speed signal, the DD bandwidth is increased, which inevitably increases the noise. After the model is trained on this new data set, the training and validation loss curves are as shown in Fig. 5c, and a confusion matrix for epoch 83 is shown in Fig. 5d.
Increasing the number of training epochs allows the model to perform well on this new data set, with an accuracy of 98.83% on a uniformly sampled test set.
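The labels for the 20-bin demonstration can be enumerated explicitly. This short sketch assumes that labels are stored as phases in units of π (0 or 1), with the three active bits first and the reference bin among the zeros; `phase_labels` is a hypothetical helper.

```python
import itertools

import numpy as np

N_BINS = 20     # 19 message bits + 1 reference bit
N_ACTIVE = 3    # only the first 3 bits carry data in this demonstration


def phase_labels(n_bins=N_BINS, n_active=N_ACTIVE):
    """All phase labels (units of pi) with the first n_active bits varied, the rest fixed to 0."""
    labels = []
    for bits in itertools.product((0, 1), repeat=n_active):
        label = np.zeros(n_bins, dtype=int)
        label[:n_active] = bits
        labels.append(label)
    return np.array(labels)


labels = phase_labels()
print(labels.shape)          # (8, 20)
print(2 ** (N_BINS - 1))     # 524288 possible bit strings if all 19 bits were used
```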
To compare the performances of the deep learning model and the master equation, we fitted the probe spectra for 20 bins with a frequency difference Δf = 2 kHz and for four bins with a frequency difference Δf = 200 kHz by solving the master equation without considering the higher-order terms and the effects of noise. In each case, 160 probe spectra, sampled uniformly from every category, were fitted. The prediction results are shown in Fig. 5e, f. The prediction accuracy of the master equation is lower than that of the deep learning model. In our case, increasing the number of bins degrades the fitting accuracy more than increasing the DD bandwidth for high-speed signals does. The prediction accuracy for a 20-bin carrier with frequency difference Δf = 2 kHz is 20.63%, which is close to the accuracy of random guessing, i.e., 1/8. This implies a disadvantage of the fitting method itself: it can easily become trapped in local minima. Some prior knowledge is required to overcome this disadvantage, e.g., providing initial values of the phases before fitting. In contrast, the deep learning model is data driven and does not require any prior knowledge. The local-minima problem of deep learning can be mitigated using well-known techniques, including learning-rate scheduling and the design of more effective optimizers 32 . Additionally, the accuracy difference for the 200-

The master equation is solved by the "FindFit" function in Mathematica 11.1 with both "AccuracyGoal" and "PrecisionGoal" at their default settings, while the deep learning code is written in Python 3.7.6. Both codes are run on the same computer with an NVIDIA GTX 1650 and an Intel Core i7-9750H. Another way to decode the signal uses an in-phase and quadrature (I-Q) demodulator or a lock-in amplifier 7,12 . However, the carrier frequency must then be known when decoding the signal.
Additionally, for multiple MW bins, numerous bandpass filters are required. The deep learning method is thus much more convenient.

Discussion
We have reported a Rydberg receiver enhanced by deep learning for detecting multifrequency MW fields. The results show that the deep learning enhanced Rydberg mixer efficiently receives and decodes multifrequency MW fields that are often difficult to decode using theoretical methods. With the deep learning model, the Rydberg receiver is robust to noise induced by the environment and atomic collisions and is immune to the distortion that results from the limited bandwidth of the Rydberg atoms (due to dipole-dipole interactions and the EIT pumping rate, as studied in ref. 7 ) for high-speed signals (Δf = 200 kHz). Beyond increasing the signal speed, the information transmission rate can be increased further by using more bins, which is feasible because of the scalability of our model. Besides the transmission rate, this deep learning enhanced Rydberg system is promising for studies of channel-capacity limits. Because spectra that are difficult for humans to recognize owing to noise and distortion are distinguishable by the deep learning model, Rydberg systems enhanced by deep learning could move toward the capacity limit proposed in ref. 34 . To obtain high performance (i.e., high signal-to-noise ratio, information transmission rate, channel capacity and accuracy), the number of training epochs must be increased and the training set enlarged.
In summary, we have demonstrated the advantages of receiving and decoding multifrequency signals using a deep learning enhanced Rydberg receiver. Instead of multiple band-pass filters, lock-in amplifiers 7,12 and other complex circuits, a multifrequency signal receiver can decode signals at high speed and with high accuracy using the extremely sensitive Rydberg atoms and the deep learning model, without solving the Lindblad master equation. One advantage of Rydberg atoms is that their sensitivity can approach the photon shot-noise limit 35 , so in principle a Rydberg atom-based receiver can outperform a classical antenna. Recent work based on the atomic superheterodyne method has achieved ultrahigh sensitivity 10 . However, in this proof-of-principle demonstration, there is considerable room for the optimization required to reach that limit (e.g., laser stabilization, narrowing the laser linewidth and temperature stabilization). The sensitivity of the Rydberg atoms is a double-edged sword because it also makes them susceptible to noise. The deep learning model restricts this side effect while taking full advantage of the Rydberg atoms' sensitivity to the signal. Using the automatic feature extraction of the neural network, the spectra are classified in a supervised manner. Alternatively, if features (e.g., mean value, variance, frequency spectrum) are extracted manually, the spectra can be clustered by unsupervised learning methods such as t-distributed stochastic neighbour embedding (t-SNE) or density-based spatial clustering of applications with noise (DBSCAN) 31 , without training on a labelled training set. Our work will be useful in fields including high-precision signal measurement and atomic sensors.
Additionally, this decoding ability can be generalized further to signals encoded with other protocols, e.g., frequency-division multiplexing amplitude-shift keying (FDM-ASK), frequency-division multiplexing quadrature amplitude modulation (FDM-QAM), and IEEE 802.11ac WLAN standard signals on a 5 GHz carrier. The carrier frequencies that can be decoded range from several hertz to terahertz, because for Rydberg atoms to receive MW fields of different wavelengths only the laser frequency needs to be tuned, whereas in classical receivers the wavelength of the received MW is limited by the size of the antenna [36][37][38][39] . In addition to communications, our receiver can be used to detect multiple targets from multifrequency signals caused by the Doppler effect.

Methods
Generation and calibration of MW fields. The MW fields used in our experiments were synthesized by a signal generator (1465F-V from Ceyear) and radiated by a horn. Each bin in the multifrequency MW field is tunable in frequency, amplitude and phase. The RF source operates from DC to 40 GHz. The horn is located close to the Rb cell. We used an antenna and a spectrum analyser (4024F from Ceyear) to receive the MW fields and then calibrated the amplitudes of the MW fields at the centre of the Rb cell. The probe transmission spectrum in the time domain when $\Delta_p = 0$, $\Delta_c = 0$ and $\Delta_s = 0$ reflects the interference among the multifrequency MW bins, i.e., the beat frequencies between the bins that arise through the interaction between the atoms and light. The Rydberg atoms receive the MW bins by acting as an antenna and a mixer 9,11,12 . After reception by the atoms, the frequency spectrum of the probe transmission contains the difference-frequency signal. In this way, the atoms reduce the modulated signal frequency (from terahertz to kilohertz magnitude), which allows the signal to be received and decoded using simple apparatus. In our experiment, more than 20 frequency bins can be applied to the atoms, with a dynamic range greater than 30 dB. The amplitudes, phases and frequencies of these bins can be tuned individually. When the detection bandwidth is increased to detect a signal with a larger frequency difference Δf, more noise is involved, but this noise is suppressed by the deep learning model. In other words, the signal can still be recognized by the deep learning model when the information transmission rate is increased by raising the frequency difference Δf. These bins are used to send the FDM-PSK signals described in the "Frequency-division multiplexed signal encoding and receiving" section of the main text.
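The down-conversion to kilohertz beat notes can be illustrated with a short NumPy sketch. It assumes, as a simplification, that the atomic mixer output follows the magnitude of the total MW field in a frame rotating at the resonance; the 100 kHz sampling rate, the record length and the example bit string are hypothetical. The bins at −3, −1, +1 and +3 kHz from the 17.62 GHz resonance then beat with the strong reference bin at 6, 4 and 2 kHz, which is where the information appears in the probe transmission.

```python
import numpy as np

FS = 100e3                                      # assumed sampling rate of the detector trace (Hz)
t = np.arange(1000) / FS                        # 10 ms record = 10 periods of the 1 ms beat pattern
offsets_hz = np.array([-3e3, -1e3, 1e3, 3e3])   # bins relative to the 17.62 GHz resonance
amps = np.array([0.1, 0.1, 0.1, 1.0])           # A1,2,3 = 0.1 * A4 (reference bin last)
phases = np.array([0.0, np.pi, 0.0, 0.0])       # example bit string

# Simplifying assumption: the atomic "mixer" output tracks the total field magnitude.
field = (amps[:, None] * np.exp(1j * (2 * np.pi * offsets_hz[:, None] * t
                                      + phases[:, None]))).sum(axis=0)
signal = np.abs(field)

spectrum = np.abs(np.fft.rfft(signal)) / len(t)
freqs = np.fft.rfftfreq(len(t), d=1 / FS)
top = freqs[1:][np.argsort(spectrum[1:])[-3:]]  # three strongest non-DC components
print(sorted(top.tolist()))                     # [2000.0, 4000.0, 6000.0]
```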
Master equation. The Lindblad master equation is given as follows:
$$\frac{d\rho}{dt} = -\frac{i}{\hbar}\left[H, \rho\right] + \mathcal{L}[\rho],$$
where ρ is the density matrix of the atomic ensemble and $H = \sum_k H[\rho^{(k)}]$ is the atom-light interaction Hamiltonian, summed over all single-atom Hamiltonians in the rotating-wave approximation. This Hamiltonian has the following matrix form: where for the MW signal The Rabi frequency can be derived as follows: where the second term (which resonates with the energy levels of the Rydberg atoms) induces the normal EIT spectrum and the first term modulates that spectrum. In the interaction between the atoms and the MW fields, the atoms act as a mixer, such that the output signal frequencies ($\omega_1$, $\omega_2$, $\omega_3$) are lower than the input signal frequencies ($\omega_0 + \omega_1$, $\omega_0 + \omega_2$, $\omega_0 + \omega_3$). The nonlinearity of the modulation signal is reduced by designating a reference bin and increasing its amplitude, as shown in Eq. (3); this is a precondition for recognizing the phases via deep learning. where the condition for the approximations on the second and third lines is The Lindblad superoperator $\mathcal{L} = \sum_k \mathcal{L}[\rho^{(k)}]$ is composed of single-atom superoperators, where $\mathcal{L}[\rho^{(k)}]$ represents the Lindbladian and has the following form:
$$\mathcal{L}[\rho] = \sum_{j=1}^{3}\left(C_j \rho C_j^{\dagger} - \frac{1}{2}\left\{C_j^{\dagger} C_j, \rho\right\}\right),$$
where $C_1 = \sqrt{\Gamma_e}\,|g\rangle\langle e|$, $C_2 = \sqrt{\Gamma_r}\,|e\rangle\langle r|$ and $C_3 = \sqrt{\Gamma_s}\,|r\rangle\langle s|$ are collapse operators that stand for the decays from state $|e\rangle$ to state $|g\rangle$, from state $|r\rangle$ to state $|e\rangle$ and from state $|s\rangle$ to state $|r\rangle$ with rates $\Gamma_e$, $\Gamma_r$ and $\Gamma_s$, respectively; here $|g\rangle$, $|e\rangle$, $|r\rangle$ and $|s\rangle$ denote $|5S_{1/2}\rangle$, $|5P_{1/2}\rangle$, $|51D_{3/2}\rangle$ and $|50F_{5/2}\rangle$. Because we are concerned only with the steady state, i.e., t → ∞, the Lindblad master equation can be solved by setting dρ/dt = 0. The complex susceptibility of the EIT medium has the form $\chi(v) = (|\mu_{ge}|^2/\epsilon_0\hbar)\rho_{eg}$, where $\rho_{eg}$ is the density-matrix element obtained from the master equation. The transmission spectrum of the EIT medium is obtained from the susceptibility using $T \propto e^{-\mathrm{Im}[\chi]}$.

Deep learning layers. Our deep learning model consists of a 1D CNN layer, a Bi-LSTM layer and a dense layer.
The mathematical sketches for these layers are given as follows.
The 1D CNN layer is illustrated in Fig. 1c. The input signal is convolved with the kernel in the following form: where f represents the input data, g is the convolution kernel, m is the input data index and n is the kernel index. The 1D CNN extracts higher-order features from the input data and thereby shortens the sequences fed into the Bi-LSTM layer. Before flowing into the Bi-LSTM layer, the data pass through a batch normalization layer, a ReLU activation layer and a max-pooling layer, in that order. For a mini-batch $\mathcal{B} = \{x_{1\ldots m}\}$, the output of the batch normalization layer is $y_i = \mathrm{BN}_{\gamma,\beta}(x_i)$, where γ and β are learnable parameters 40 . The batch normalization layer performs the following operations:
$$\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \tag{5}$$
$$\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^2, \tag{6}$$
$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \tag{7}$$
$$y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i), \tag{8}$$
where Eqs. (5) and (6) evaluate the mean and the variance of the mini-batch, respectively; the data are normalized using this mean and variance in Eq. (7), with a small constant ε added for numerical stability, and the results are then scaled and shifted in Eq. (8). Batch normalization accelerates training and also reduces overfitting. The output then passes through the ReLU activation layer, whose activation function is $f_{\mathrm{ReLU}}(x) = \max(x, 0)$; this function mitigates the vanishing-gradient problem. Next, the data are downsampled in a max-pooling layer 30 .
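The convolution, batch normalization (Eqs. (5)-(8)), ReLU and max-pooling steps can be sketched as a plain NumPy forward pass for a single-channel input. The kernel values, pooling size, single channel and ε = 1e-5 are assumptions, and the paper's model was not necessarily built this way. Note that `np.correlate` computes the cross-correlation Σ_n f[m + n] g[n], which is the operation most deep learning libraries call "convolution"; whether the paper flips the kernel is not stated.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Eqs. (5)-(8): normalize each feature over the mini-batch axis, then scale and shift."""
    mu = x.mean(axis=0)                          # Eq. (5): mini-batch mean
    var = x.var(axis=0)                          # Eq. (6): mini-batch variance (1/m)
    x_hat = (x - mu) / np.sqrt(var + eps)        # Eq. (7): normalize
    return gamma * x_hat + beta                  # Eq. (8): scale and shift


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the last axis (trailing remainder dropped)."""
    n = x.shape[-1] // size
    return x[..., :n * size].reshape(*x.shape[:-1], n, size).max(axis=-1)


def cnn_block(batch, kernel, gamma=1.0, beta=0.0, pool=2):
    """Conv -> batch norm -> ReLU -> max pool for a batch of 1D spectra (batch, length)."""
    conv = np.stack([np.correlate(s, kernel, mode="valid") for s in batch])
    return max_pool1d(relu(batch_norm(conv, gamma, beta)), pool)


rng = np.random.default_rng(0)
batch = rng.normal(size=(16, 500))               # 16 toy spectra of length 500
kernel = np.array([0.25, 0.5, 0.25, -0.5, 0.1])  # hypothetical 5-tap kernel
out = cnn_block(batch, kernel)
print(out.shape)                                 # (16, 248)
```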
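The LSTM layer and an LSTM cell are shown schematically in Figs. 1d and 6a, respectively. The LSTM is described by Eqs. (9)-(14) 32,41 . At time $t$, the input $x_t$ and two internal states, $C_{t-1}$ and $h_{t-1}$, are fed into the LSTM cell. First, the forget gate (Eq. (9)) decides how much of the previous cell state to keep; it outputs a number between 0 (forget) and 1 (retain). Next, the input gate (Eq. (10)) decides which values are to be updated from a vector of new candidate values created using Eq. (11). Eq. (12) then forms the new cell state by scaling the old state with the forget gate and adding the gated candidate values. Finally, the cell decides what to output using Eqs. (13) and (14). In these equations, $W$ and $b$ denote the trainable weight matrices and bias vectors of each gate, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $\odot$ denotes element-wise multiplication:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{9}$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{10}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{11}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{12}$$

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{13}$$

$$h_t = o_t \odot \tanh(C_t) \tag{14}$$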
where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function; the sigmoid and tanh functions are applied element-wise. The LSTM is followed by a time-reversed LSTM, and together they form a Bi-LSTM layer that improves the memory for long sequences. The dense layer and a neuron are drawn in Figs. 1e and 6b, respectively, and the corresponding equation is

$$y = g\left(\mathbf{w}^{\mathsf{T}} \mathbf{x} + b\right),$$

where $\mathbf{w}$ is the vector of weights, $b$ is the bias, $\mathbf{x}$ represents the input data, $g(a) = 1/(1 + e^{-a})$ is the sigmoid activation function, which limits the output values to between 0 and 1, and $y$ is the output. The dense layer maps the Bi-LSTM output onto a vector of the same size as the label. Training consists of forward and backward propagation. During forward propagation, a batch of probe spectra passes through the 1D CNN layer, the Bi-LSTM layer and the dense layer, after which the differentiable loss function is calculated. In our case, the loss function is the mean squared error (MSE) between the predictions and the ground truth, which is widely used for regression tasks 32 . The equation for the MSE is where $m$ is the number of data points in one spectrum, $n$ is the mini-batch size, $\varphi_i$ is the ground truth and $f(T_i)$ is the model prediction. In backpropagation, the trainable weights of each layer are updated using the learning rate and the derivative of the MSE loss with respect to the weights so as to minimize $L_{\mathrm{MSE}}$:

$$W \leftarrow W - \eta \frac{\partial L_{\mathrm{MSE}}}{\partial W},$$

where $\eta$ is the learning rate and $W$ is the trainable weight of each layer. In practice, the weights of each layer are updated by the RMSprop optimizer 42 .
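The recurrent and output stages can be sketched in NumPy as a forward pass. Assumptions, labelled as such: the gate equations follow the common formulation in Eqs. (9)-(14) with each gate acting on [h_{t−1}, x_t]; the Bi-LSTM output is taken as the concatenation of the final hidden states of the forward and time-reversed passes (the behaviour of a bidirectional wrapper that returns only the last step); the parameter layout and random initialisation in the hypothetical `init_lstm` are illustrative and not Keras' initialisers. The MSE helper averages over both label elements and mini-batch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(n_in, n_hidden, rng):
    """Hypothetical initialisation: one (weight matrix, bias) pair per gate f, i, c, o."""
    shape = (n_hidden, n_hidden + n_in)
    return {gate: (0.1 * rng.standard_normal(shape), np.zeros(n_hidden)) for gate in "fico"}


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell update following Eqs. (9)-(14)."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f = sigmoid(p["f"][0] @ z + p["f"][1])        # Eq. (9): forget gate
    i = sigmoid(p["i"][0] @ z + p["i"][1])        # Eq. (10): input gate
    c_tilde = np.tanh(p["c"][0] @ z + p["c"][1])  # Eq. (11): candidate values
    c = f * c_prev + i * c_tilde                  # Eq. (12): new cell state
    o = sigmoid(p["o"][0] @ z + p["o"][1])        # Eq. (13): output gate
    h = o * np.tanh(c)                            # Eq. (14): new hidden state
    return h, c


def lstm_last_state(seq, p):
    """Run the cell over a (time, features) sequence and return the final hidden state."""
    n_hidden = p["f"][1].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, p)
    return h


def bi_lstm(seq, p_forward, p_backward):
    """Forward LSTM plus a time-reversed LSTM, final states concatenated."""
    return np.concatenate([lstm_last_state(seq, p_forward), lstm_last_state(seq[::-1], p_backward)])


def dense_sigmoid(x, w, b):
    """y = g(W x + b) with sigmoid g; each row of W is one neuron's weight vector."""
    return sigmoid(w @ x + b)


def mse(y_true, y_pred):
    """Mean squared error averaged over label elements and mini-batch."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)


rng = np.random.default_rng(0)
seq = rng.random((50, 8))                       # 50 time steps, 8 features (toy CNN output)
fwd, bwd = init_lstm(8, 16, rng), init_lstm(8, 16, rng)
features = bi_lstm(seq, fwd, bwd)               # 32 values: 16 forward + 16 backward
w, b = 0.1 * rng.standard_normal((4, 32)), np.zeros(4)
prediction = dense_sigmoid(features, w, b)      # four values in (0, 1), one per phase bin
print(mse([[0, 1, 1, 0]], [prediction]))
```

Because the dense layer ends in a sigmoid, each of the four outputs can be read directly against the 0/1 phase label of its bin, which is why an MSE regression loss is workable here.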
The network is implemented using the Keras 2.3.1 framework on Python 3.6.11 (ref. 30). All weights are initialized with the Keras defaults. The hyper-parameters of the deep learning model (including the convolution kernel length, the number of hidden variables and the learning rate) are tuned using Optuna 43 .
Deep learning pipeline. To obtain better fitting results, the data are min-max scaled, i.e., $T'_i = (T_i - \min(T))/(\max(T) - \min(T))$. The labels are encoded as dense vectors with four elements rather than as one-hot vectors, to save space 32 . Each element is either 0 or 1, representing a relative phase of 0 or π, respectively, in the corresponding bin.
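The two preprocessing steps can be sketched as follows. Assumptions: each spectrum is scaled by its own minimum and maximum (the text does not say whether min(T) and max(T) are taken per spectrum or over the whole data set), the four labels are ordered by bin, and the one-hot comparison assumes the alternative would enumerate all 2⁴ = 16 phase patterns as classes. The function names are illustrative.

```python
import numpy as np


def min_max_scale(spectrum):
    """T'_i = (T_i - min T) / (max T - min T), mapping the spectrum onto [0, 1]."""
    t = np.asarray(spectrum, dtype=float)
    return (t - t.min()) / (t.max() - t.min())


def encode_phases(phases):
    """Dense 4-element label: 0 for a relative phase of 0, 1 for a relative phase of pi."""
    label = []
    for p in phases:
        if np.isclose(p, 0.0):
            label.append(0.0)
        elif np.isclose(p, np.pi):
            label.append(1.0)
        else:
            raise ValueError(f"expected a relative phase of 0 or pi, got {p}")
    return np.array(label, dtype=np.float32)


def one_hot_equivalent(label):
    """For comparison only: the same label as a one-hot vector over all 2**4 patterns."""
    index = int("".join(str(int(bit)) for bit in label), 2)
    one_hot = np.zeros(2 ** len(label), dtype=np.float32)
    one_hot[index] = 1.0
    return one_hot


label = encode_phases([0, np.pi, np.pi, 0])        # [0., 1., 1., 0.]
print(label.size, one_hot_equivalent(label).size)  # 4 16
```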
A one-dimensional convolution layer (1D CNN), a bidirectional long short-term memory layer (Bi-LSTM) and a dense layer are used in our deep learning model, whose structure is shown in Fig. 7. The data size for the input layer is given in the form (batch size, length of probe spectrum, number of features). The batch size is 64 in our case. Because each spectrum spans t = 0 to t = 0.999 ms with a sampling interval of τ = 1 μs, the spectrum length is 1000. For a 1D input, the number of features is 1. Therefore, the data size for the input layer is (64, 1000, 1).
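The input-shape arithmetic can be checked in a few lines. The random array is only a placeholder for a mini-batch of measured probe spectra.

```python
import numpy as np

t_end, tau = 0.999e-3, 1e-6                  # last sample time (s) and sampling interval (s)
n_samples = int(round(t_end / tau)) + 1      # 999 intervals -> 1000 points
t = np.arange(n_samples) * tau               # 0, 1 us, ..., 0.999 ms

batch_size, n_features = 64, 1
spectra = np.random.default_rng(0).random((batch_size, n_samples))  # placeholder data
inputs = spectra[..., np.newaxis]            # add the feature axis for a 1D input
print(inputs.shape)                          # (64, 1000, 1)
```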
During training, fourfold cross-validation is used to make efficient use of the available training data. The data set is split as shown in Fig. 8. First, the data set is split into two parts. The first is the test set (red), which remains untouched during training. The second (purple) is used to train the model. In the cross-validation process, this remaining data set (purple) is copied four times, and each copy is divided equally into four parts. In each copy, one part serves as the validation set (green) and the others as the training set (blue), with a different part used for validation in each copy. Four models are trained on these different training and validation sets. The best model, as judged on its validation set, is then evaluated on the test set. After splitting, the training, validation and test sets all remain unchanged: in every epoch, each model iterates over the same training set exactly once, and no new data are drawn. The computational graph is cleared before each training run to prevent leakage of the validation data. Gaussian noise with a mean of 0 and a standard deviation of 0.5 is added to the training data to increase the robustness of the model. In addition, the learning rate is adjusted during training to escape local minima, which produces the jump in Fig. 2c of the main text. The initial learning rate is 0.001; if the loss (mean squared error) on the validation set does not decrease for 10 epochs, the learning rate is multiplied by 0.1. The RMSprop optimizer is used to update the weights of each layer during training 42 .
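The partitioning, noise augmentation, learning-rate rule and optimizer step described above can be sketched as follows. Assumptions: the 20% test fraction and the seed are hypothetical, since the split ratio is not stated; `PlateauSchedule` implements the stated rule (multiply by 0.1 after 10 epochs without a decrease in validation MSE) in the spirit of a reduce-on-plateau callback, but it is not Keras' implementation; the RMSprop step uses the usual running mean of squared gradients with hypothetical ρ = 0.9 and ε = 1e-7.

```python
import numpy as np


def fourfold_split(n_items, test_fraction=0.2, seed=0):
    """Hold out a test set once, then build four (train, validation) index pairs."""
    idx = np.random.default_rng(seed).permutation(n_items)
    n_test = int(round(test_fraction * n_items))
    test, rest = idx[:n_test], idx[n_test:]
    folds = np.array_split(rest, 4)
    pairs = []
    for k in range(4):
        val = folds[k]  # a different quarter is used for validation in each copy
        train = np.concatenate([folds[j] for j in range(4) if j != k])
        pairs.append((train, val))
    return test, pairs


def add_training_noise(x, rng, std=0.5):
    """Gaussian noise (mean 0, standard deviation 0.5), applied to training data only."""
    return x + rng.normal(0.0, std, size=x.shape)


class PlateauSchedule:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-3, factor=0.1, patience=10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.wait = np.inf, 0

    def update(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr


def rmsprop_step(w, grad, cache, lr, rho=0.9, eps=1e-7):
    """One RMSprop update; `cache` is the running mean of squared gradients."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache


test_idx, folds = fourfold_split(100)
print(len(test_idx), [(len(tr), len(va)) for tr, va in folds])
```

Selecting the final model by validation loss and touching the test indices only once keeps the reported test error free of selection bias.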
The bidirectional LSTM layer could be replaced with the well-known self-attention layer to further improve the memory of our model 44 . However, this would require more training time and more GPU memory, and the current model already meets our requirements.

Data availability
The data are available on GitHub 45 (https://github.com/ZongkaiLiu/Deep-learningenhanced-Rydberg-multifrequency-microwave-recognition). The deep learning results are presented in the Jupyter notebook, and the master equation results are presented in the Mathematica notebooks.

Fig. 8 Data partition during training, validation and testing. First, the data are split into two sets. The first is the test set. The remaining data set is copied four times and each copy is then split into four parts, with different parts acting as the training and validation sets for the training of four deep learning models. Legend labels: Training set, Validation set, Training and Validation set, Test set.