Wireless signal modulation identification method based on RF I/Q data distribution

Electromagnetic spectrum detection is the basis of the next generation wireless communication technology. Wireless signal identification is an important part of electromagnetic spectrum detection and management activities. This paper proposes to extract the distribution features of different modulated signals from the signal I/Q data. A two-dimensional gradient matrix is used to describe the characteristics of the signal classification. The minimum gradient cumulative distance (GCD) estimate between the sample and the model is used as the decision criterion for the signal classification. According to the result of the confusion matrix, the weight of the model is adjusted. Experiments show that the recognition rate of the modulated signal mentioned in this paper can reach 82.75%. The I/Q data sample was extracted under actual engineering conditions involving random noise, and the recognition rate dropped to approximately 79%. Based on the initial model gradient matrix, a reasonable algorithm is set to adjust the weight of the model, which can effectively improve the recognition rate of the modulated signal.

Electromagnetic spectrum resources are non-renewable, and the use of spectrum resources is an important national development strategy for many countries. The electromagnetic spectrum detection system mainly realizes the perception and visualization of the spatial spectrum distribution, which is the basis for the safe and rational use of spectrum resources at the physical layer. In spectrum detection systems, most researchers focus on spatial spectrum distribution prediction algorithms, spectrum occupancy calculation, and spectrum interference detection algorithms 1. The study of interference maps helps manage spectrum resource utilization, facilitates the rapid location of interference sources, and supports the development of solutions 2,3. However, these spectrum detection methods only estimate the energy distribution of the spatial spectrum and do not estimate other signal characteristics. As the number of wireless devices increases, approaching electromagnetic spectrum detection from the perspective of source feature extraction can improve spectrum resource management and control more effectively, for example in the location, tracking and monitoring of unregistered signals. Describing spectrum resources by signal source parameters can also greatly reduce the capacity required of the spectrum database 4. Therefore, research on signal detection methods directly affects the accuracy of spectrum detection results. Researchers have proposed algorithms for signal identification problems specific to MIMO systems 5, with in-depth research on space-time block code identification and MIMO modulation identification. MIMO modulation identification methods fall mainly into two groups: those based on the maximum likelihood function of the received signal, and those based on specific modulation features of the received signal.
The data processed by the system is obtained after the Fast Fourier transform (FFT) block of the I/Q data at the Radio Frequency (RF) end.
Blind detection is a necessary condition for the spectrum detection system, because matching detection of a specific modulation method increases the complexity of the detection system. With the digitization of wireless signals, signal source identification can be achieved through data analysis methods. RF I/Q data contains all the information of the wireless signal source. We extract the phase information at different receiving antennas from the I/Q data to achieve signal source direction finding 6. The software defined radio (SDR) module collects I/Q data from the source. An FFT is performed on the I/Q data to obtain the energy distribution of the signal at each frequency, and real-time phase information can also be extracted from the I/Q data. By extracting real-time phase information from the I/Q data, we implemented low signal-to-noise ratio (SNR) signal detection 7. However, the data after the FFT discards many features. Therefore, ref. 8 establishes a convolutional neural network (CNN) based on three features, I/Q, Amplitude/Phase and FFT, to recognize the signal modulation. The authors of ref. 8, however, use the I/Q data taken after the modulation module and before the RF end, as shown in Fig. 1. At that point, different modulation signals can be easily distinguished according to the I/Q distribution, but this setting is not suitable for practical engineering applications.
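The two operations described above, an FFT to obtain the energy distribution over frequency and phase extraction from the raw I/Q samples, can be sketched as follows. This is a minimal illustration, not the processing chain used in the paper: the function names, the sample rate and the synthetic test tone are assumptions introduced here.

```python
import numpy as np

def power_spectrum_db(iq, sample_rate):
    """Return (frequencies in Hz, power in dB) for a block of complex I/Q samples."""
    spectrum = np.fft.fftshift(np.fft.fft(iq))
    freqs = np.fft.fftshift(np.fft.fftfreq(iq.size, d=1.0 / sample_rate))
    power = np.abs(spectrum) ** 2 / iq.size
    # A small floor avoids log10(0) for empty bins.
    return freqs, 10.0 * np.log10(power + 1e-12)

def instantaneous_phase(iq):
    """Phase of each complex sample in radians, in (-pi, pi]."""
    return np.angle(iq)

# Synthetic example (assumed values): a 100 kHz complex tone sampled at 1 MHz.
sample_rate = 1_000_000.0
n = np.arange(1000)
tone = np.exp(2j * np.pi * 100_000.0 * n / sample_rate)

freqs, power_db = power_spectrum_db(tone, sample_rate)
peak_hz = freqs[np.argmax(power_db)]   # frequency bin holding the most energy
phase = instantaneous_phase(tone)      # one phase value per I/Q sample
```

The spectrum keeps only the energy per frequency bin, whereas the phase sequence is derived sample by sample from the same I/Q data, which is the sense in which the FFT output carries fewer features than the raw I/Q stream.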
The main contribution of this paper is to study the application of I/Q distribution characteristics in modulation signal recognition. The received I/Q data contains the noise of the environment, and its I/Q distribution changes accordingly. Signals with different modulation modes have different I/Q distribution features. Compared with Amplitude/Phase and the FFT output, I/Q data better reflects the features of the signal. We extract the I/Q data of the wireless signal source directly from the source module at the receiving end. The I/Q data is the signal after the frequency spectrum is moved to baseband, and it contains the original information of the environmental noise. Neural networks can be applied to classification, but their learning time is long and their reasoning process and reasoning basis are difficult to explain. If a new category is added, the model must be re-learned, which affects the recognition accuracy of the old categories. Because the amount of I/Q sample data is large, the time needed to train a convolutional neural network increases greatly. We therefore propose to implement wireless signal modulation recognition based on the I/Q component probability distribution.

I/Q component distribution of different modulation signal sources
The SDR collects wireless signals in the form of I/Q data. Because the amount of I/Q data is large, the frequency distribution of the signal can be easily analyzed in the frequency domain after an FFT. The time-frequency spectrum distribution characteristics can be used to classify the signal modulation. However, the energy detection method usually has a high probability of missed detection ($P_m$) at low SNR. In practice, the I/Q data obtained by the RF receiver is mixed with noise, and its I/Q distribution takes different "ring" forms. We propose to describe the signal features by extracting the I/Q distribution of different modulated signals. The I/Q data contains all the features of the baseband signal. I/Q data appears in complex form. It can be expressed as

$$S(n) = I(n) + jQ(n).$$

In this paper, signal samples with different energy intensities are collected. The I/Q component distribution of signals with different SNR under the condition of no environmental interference is shown in Fig. 2. I is the abscissa, Q is the ordinate, and the histograms represent the distributions of the real and imaginary components. When the signal strength is strong, most of the I/Q components are distributed near the four vertices of the interval, and there are fewer I/Q components near the origin, as shown in Fig. 2a. As the signal strength decreases, the I/Q signal gradually converges to the origin, the number of I/Q components decreases, and the real and imaginary parts are approximately normally distributed, as shown in Fig. 2b. However, the number of I/Q components of the noise sample is close to zero, as shown in Fig. 2c. The weight $C^{w}_{i,j}$ of an I/Q component is the number of samples $S(n)$ with $i = I$ and $j = Q$. In the case of low SNR, 10,000 weak signal samples and 10,000 noise samples are collected, and the distribution of $E_W = \max(C^{w}_{i,j})$ over the samples is shown in Fig. 3. As the SNR decreases, the I/Q components approach the origin, and the weight of the largest I/Q component can reach $C^{w}_{i,j} \to 100{,}000$.
When the boundary between noise and signal is set to $E_W = 80{,}000$, the detection probability of weak signals can reach more than 95%. As the SNR increases, the I/Q component is concentrated toward the four coordinate boundary points (−1,
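The counting of I/Q components and the statistic E_W described in this section can be sketched as below. This is only an illustration under stated assumptions: the function names, the grid size `bins` and the normalised range `limit` are hypothetical, and the sketch does not encode the decision rule at E_W = 80,000, since the text does not state on which side of the boundary a signal falls.

```python
import numpy as np

def iq_distribution(iq, bins=256, limit=1.0):
    """Count how many complex samples fall into each (I, Q) cell of a square grid."""
    edges = np.linspace(-limit, limit, bins + 1)
    counts, _, _ = np.histogram2d(iq.real, iq.imag, bins=[edges, edges])
    return counts.astype(np.int64)

def max_component_weight(counts):
    """E_W: the largest cell count in the I/Q distribution."""
    return int(counts.max())

# Tiny hand-made example (assumed data): two samples share one cell.
demo = np.array([0.5 + 0.5j, 0.5 + 0.5j, -0.5 - 0.5j])
demo_counts = iq_distribution(demo, bins=4)
e_w = max_component_weight(demo_counts)   # 2, from the doubly occupied cell
```

Rows of `demo_counts` index the I axis and columns index the Q axis, so the result can be plotted directly as the kind of two-dimensional distribution shown in Fig. 2.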

Modulation identification method based on I/Q distribution
With the same SNR, the I/Q component distributions of a wireless signal follow a similar pattern. We propose a model that combines the gradients of the I/Q distribution to describe the different modulation signals. Figure 4a shows the distribution of I/Q samples of a complete FSK signal. The I/Q sample distribution of the modulated signal after propagation in free space shows the characteristic "ring". The gradient distance between the sample and each model is computed, and the model with the smallest distance gives the target category. To reduce the amount of data, we take only the data with $I, Q \in [0, 1]$, so the data volume of the model is one quarter of one I/Q data sample, as shown in Fig. 4b. The real part I of each signal S(n) is used as the X-axis coordinate, and the imaginary part Q as the Y-axis coordinate. The number of samples S(n) sharing the same (I, Q) value is the Z-axis coordinate, which gives the I/Q sample distribution of the modulated signal. The modulation signal identification method proposed in this paper depends mainly on the SNR of the signal: the lower the SNR, the more concentrated the distribution of the I/Q components is at the zero point. All I/Q samples used in this article are modulated signals with an SNR of 40 dB.
If the real and imaginary parts of the I/Q data are accurate to 0.001 ($I, Q \in [0.000, 0.999]$), a $1000 \times 1000$ matrix represents the I/Q component distribution. The two-dimensional I/Q distribution matrix can then be written as $C = [C^{w}_{i,j}]$, with $i, j = 0, 1, \ldots, 999$. We propose to describe different modulation signals by extracting a two-dimensional gradient matrix from the two-dimensional I/Q distribution matrix. The two-dimensional gradient matrix is obtained by calculating the gradient of adjacent $C^{w}_{i,j}$ in the two-dimensional I/Q distribution matrix, and it serves as the basic model that is later adjusted for signal recognition. The gradient is expressed by two equations, where the first is the column gradient and the second is the row gradient. The two-dimensional gradient matrix model of this paper is calculated by column gradient; its formula is shown in (5).
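A minimal sketch of the two steps described above, building the first-quadrant distribution matrix at a resolution of 0.001 and taking a column gradient, is given below. The exact gradient formula of the paper is not reproduced in this text, so the forward difference between vertically adjacent entries used here is an assumption, as are the function names and the mapping of I to rows and Q to columns.

```python
import numpy as np

def quadrant_distribution(iq, size=1000):
    """Count samples with I, Q in [0, 1) on a size x size grid (cell width 1/size)."""
    i, q = iq.real, iq.imag
    keep = (i >= 0) & (i < 1) & (q >= 0) & (q < 1)
    # floor() maps a value to its cell; the clip guards against rounding up to `size`.
    rows = np.minimum(np.floor(i[keep] * size).astype(np.intp), size - 1)
    cols = np.minimum(np.floor(q[keep] * size).astype(np.intp), size - 1)
    counts = np.zeros((size, size), dtype=np.int64)
    np.add.at(counts, (rows, cols), 1)   # accumulates repeated (row, col) pairs
    return counts

def column_gradient(counts):
    """Assumed form: forward difference between adjacent entries down each column."""
    return np.diff(counts, axis=0)

# Small example on a 4 x 4 grid (assumed data); the last sample lies outside the quadrant.
demo = np.array([0.1 + 0.1j, 0.1 + 0.1j, 0.6 + 0.3j, -0.2 + 0.5j])
demo_counts = quadrant_distribution(demo, size=4)
demo_grad = column_gradient(demo_counts)
```

With `size=1000` the distribution matrix has 1000 × 1000 entries, matching the matrix size given in the text, and the gradient matrix has one row fewer because of the differencing.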
For the same modulated signal, the two-dimensional I/Q distribution matrix changes slightly between samples, and the corresponding two-dimensional gradient matrix differs accordingly. For signals with the same SNR, we propose the GCD to identify the modulated signal, as expressed in (6). In practical applications, random noise influences the recognition rate of the modulated signal. We propose model weight parameters (W) to adjust the gradient distance of each category and improve the overall recognition rate, and (6) is modified accordingly. The value of W depends on the erroneous detection results of the modulation signal identification. When the models of individual categories are highly similar, random noise can make the GCD lower than its actual value, resulting in an erroneous detection. The initial model is the gradient matrix of a single sample, and the self-feedback adjustment of the model weight parameters can effectively improve the overall recognition rate.
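The decision rule described above, choosing the category whose model has the smallest gradient cumulative distance and optionally scaling each distance by a model weight, can be sketched as follows. The paper's equation (6) is not reproduced in this text, so the sum of absolute differences used here as the distance is an assumption, and the function names and the category labels "A" and "B" are hypothetical.

```python
import numpy as np

def gradient_cumulative_distance(sample_grad, model_grad):
    """Assumed GCD: sum of absolute element-wise differences of two gradient matrices."""
    return float(np.abs(sample_grad - model_grad).sum())

def classify(sample_grad, models, weights=None):
    """Return the label with the smallest (optionally weighted) distance, plus all distances."""
    names = list(models)
    distances = np.array(
        [gradient_cumulative_distance(sample_grad, models[name]) for name in names]
    )
    if weights is not None:
        # A larger weight inflates a category's distance and makes it harder to select.
        distances = distances * np.array([weights[name] for name in names])
    return names[int(np.argmin(distances))], dict(zip(names, distances))

# Toy gradient matrices (assumed data).
models = {
    "A": np.array([[0, 2], [1, 0]]),
    "B": np.array([[3, 0], [0, 0]]),
}
sample = np.array([[0, 2], [0, 0]])

label_plain, dist_plain = classify(sample, models)                      # distances 1 and 5
label_weighted, _ = classify(sample, models, {"A": 6.0, "B": 1.0})      # distances 6 and 5
```

The weighted call shows the intended effect of W: raising the weight of a category that attracts too many samples pushes borderline decisions toward the other categories.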

Results and analysis
The signal acquisition module is a HackRF. Its ADC/DAC resolution is 8 bits, that is, the I/Q components take values in the range 1-256. A single transmitting antenna transmits signals, and a single receiving antenna collects I/Q data at a distance of 20 cm. The transmit power of the signal source is directly measured as −70.07 dBm with the feeder. The SNR of the samples is about 40 dB. Based on the open source energy detection project (soapy_power), we modify the script to calculate the weight of the I/Q sample data and obtain a two-dimensional I/Q distribution matrix. The size of a single initial two-dimensional gradient matrix model is only 8.0 MB 9, so it can be deployed directly to the spectrum detection system. The confusion matrix for the wireless signal modulation identification is shown in Fig. 5. There are a total of eight types of modulated signals in this paper 10, which are collected and saved in I/Q data format by the SDR module. There are 100 test samples for each type of signal (800 samples in total). With the initial models, the recognition rate of the modulated signal reaches 76.13%.
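As a worked illustration of how a recognition rate is read off a confusion matrix, the sketch below computes the overall and per-category rates, with rows as true categories and columns as predicted categories. The 3 × 3 matrix is invented for illustration and is not the data of Fig. 5; the function names are likewise assumptions. For the figures reported above, 76.13% of 800 test samples corresponds to 609 correct decisions (609/800 = 76.125%).

```python
import numpy as np

def recognition_rate(confusion):
    """Overall rate: correctly classified samples (diagonal) over all samples."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())

def per_class_rate(confusion):
    """Rate per true category: diagonal entry divided by its row total."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)

# Illustrative matrix only (assumed data), 100 samples per true category.
toy = [
    [90, 10, 0],
    [20, 70, 10],
    [0, 5, 95],
]
overall = recognition_rate(toy)    # 255 / 300 = 0.85
by_class = per_class_rate(toy)     # 0.90, 0.70, 0.95

reported = 609 / 800               # 0.76125, reported as 76.13%
```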
In recent research, the modulated signal I/Q data sample taken after the modulation module and before the RF module is used 8. The sample in this article is the I/Q data sample from the RF module, so the signal transmission and reception modules introduce random noise. Even so, the recognition rate of the GCD method approaches that of the method in ref. 8. The I/Q distributions are all "ring" shaped, and OFDM, BPSK and 16QAM have similar I/Q distributions. Since most samples of OFDM and BPSK were misidentified as 16QAM, the recognition rates of OFDM and BPSK are low. According to the confusion matrix, the probability of a sample being misjudged as 16QAM or 64QAM is high, and the overall recognition rate can be improved by increasing their GCD coefficients. To improve the recognition rate, the model weights are adjusted. Suppose the initial model weight vector is [1, 1, 1, 1, 1, 1, 1, 1] and the model weight step value is 0.01. As shown in Fig. 5, the samples of OFDM and BPSK are mostly misclassified as 16QAM and 64QAM. Therefore, the model weights of 16QAM and 64QAM are increased first, step by step, to raise their relative GCD threshold and the recognition rate of OFDM and BPSK. In the adjustment process, the model weight of a single category is increased only while the overall recognition rate improves, until the overall recognition rate converges. The model weight vector after convergence is $D'_{GCD} = [1, 1, 1.05, 1, 1, 1, 1.05, 1.05]$. The recognition rate increases to 82.75%. The confusion matrix for the wireless signal modulation identification after adjusting the parameters is shown in Fig. 6. After adjusting the model parameters, the recognition rate of 16QAM is reduced, and the recognition rates of OFDM and BPSK are improved.
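The step-by-step weight adjustment described above can be sketched as a simple greedy search: starting from a weight of 1 for every category, one weight at a time is raised by the step value and the change is kept only if the overall recognition rate improves. The paper adjusts the weights largely by hand, so this loop is an assumed automation for illustration, not the authors' procedure; the function names, the `distances` layout (one row per sample, one column per category) and the toy data are hypothetical.

```python
import numpy as np

def accuracy(distances, labels, weights):
    """Fraction of samples whose smallest weighted distance matches the true label."""
    predicted = np.argmin(distances * weights, axis=1)
    return float(np.mean(predicted == labels))

def tune_weights(distances, labels, step=0.01, max_rounds=1000):
    """Greedy search: raise one category weight at a time while accuracy improves."""
    weights = np.ones(distances.shape[1])
    best = accuracy(distances, labels, weights)
    for _ in range(max_rounds):
        improved = False
        for k in range(weights.size):
            trial = weights.copy()
            trial[k] += step
            score = accuracy(distances, labels, trial)
            if score > best:
                weights, best, improved = trial, score, True
        if not improved:
            break   # no single step helps any more: treat as converged
    return weights, best

# Toy distances (assumed data): category 1 attracts the first sample by a small margin.
distances = np.array([
    [1.0, 0.995],   # true category 0, initially misclassified as 1
    [2.0, 1.0],     # true category 1
    [1.0, 3.0],     # true category 0
])
labels = np.array([0, 1, 0])
tuned_weights, tuned_rate = tune_weights(distances, labels)
```

A strict-improvement rule like this stops on plateaus where a single 0.01 step changes no decision, so in practice several steps may need to be tried before a gain appears; the sketch keeps the simpler rule for brevity.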
We mainly adjust the weight manually, so that the weight of the modulation signal with higher similarity is reduced, and the weight of the modulation signal with less obvious characteristics is increased. The I/Q data samples used in this paper are measured data, so environmental noise has already been introduced. In different SNR environments, the I/Q distribution of the modulation signal does not change much, but the weights need to be adjusted slightly. Finally, we conducted engineering tests, in which each sample is randomly selected to calculate the recognition rate. After an experiment with a large sample size, the recognition rate of the actual engineering tests converges to 79%, as shown in Fig. 7. In the future, we will study a dynamic model weight adjustment method based on error detection results to improve the recognition rate. In the confusion matrix, each column represents the predicted category, and each row represents the true category of the data.

Conclusion
We propose a modulation signal identification method based on the signal I/Q distribution and the two-dimensional gradient matrix, using an estimate of the gradient cumulative distance. The I/Q distribution matrix is extracted from a single I/Q data sample, and the gradient matrix is calculated from it. The gradient distance of a sample describes its degree of similarity to the model. The recognition rate of the modulated signal is then increased by adjusting the model weights. The measured results show that the proposed method maintains a recognition rate of about 79% under the influence of random noise.