Super resolution DOA estimation based on deep neural network

Recently, deep neural network (DNN) studies on direction-of-arrival (DOA) estimation have attracted increasing attention. This approach offers an alternative way to deal with the DOA problem and has shown its potential in applications. However, these works are often restricted to a previously known signal number, equal signal-to-noise ratio (SNR) or large intersignal angular distance, which hinders their generalization in real applications. In this paper, we present a novel DNN framework that achieves higher resolution and better generalization to random signal number and SNR. Simulation results outperform those of previous works and reach the state of the art.

which severely hinders its generalization and is also the general drawback in many previous related works. The more practical situation is that the signal number is unknown, so DOA estimation naturally becomes a direction classification problem. In this case, the observation space is separated into N subspaces, and the activation values of N neurons in the output layer represent the probability of a signal being located in each subspace. This situation has been studied in Liu et al.'s work 28, where they propose a two-stage DNN structure characterized by a multitask autoencoder and a group of parallel multilayer classifiers. The autoencoder denoises the DNN input and decomposes its components into P spatial subregions, whereas P parallel, disconnected MLPs concentrate on the direction classification in each subregion. The autoencoder and parallel classifiers are trained independently and give the DOA estimation collaboratively with an error of 0.5°. However, when signals are located at the edge of subregions, the corresponding DOA estimates degrade in precision or even disappear from the estimated spectrum. Moreover, only the two-signal scenario is considered in the training process, which results in unsatisfactory performance in scenarios with a different signal number.
Apart from the drawbacks mentioned above, most of the previous DNN studies locate sources on very coarse grids with a spacing of 5° 23,24 or even 10° 25. Such coarse segmentations do not meet the precision requirements of most general DOA estimation applications. Besides, in most traditional machine learning and deep learning approaches, source signals are restricted to large angular distances and equal signal-to-noise ratios (SNR) in the training and validation stages, which hinders the generalization performance in real applications of DOA estimation. Finally, the DNN structures in all previous DOA studies are not actually deep; they are more like MLP neural networks. Because results often get better as a DNN becomes deeper, we believe that a much deeper structure may boost the performance of DNN-based DOA estimation.
In this paper, we take advantage of deep learning techniques to boost the resolution and generalization of DNN-based DOA estimation. It is widely believed that as a DNN gets deeper, its representational capacity becomes stronger and a more precise mapping function can be trained. Meanwhile, the residual technique makes it possible to train a very deep neural network in a controllable manner 29. By considering DOA estimation as a direction classification problem, we propose a simple DNN architecture that turns out to perform much better than previous ones. The outline of this paper is as follows. Firstly, the data model, DNN model and training details are given in the "Methods" section. Then, we show the detailed simulation results under different situations and compare them with related studies in the "Results" section. Finally, we present the error analysis and conclude with some final remarks in the "Discussion" section.

Methods
Data model. Let us consider a linear array composed of L omnidirectional antenna elements. Assume that there are P narrow-band uncorrelated signals impinging on the array from directions $\{\theta_1, \theta_2, \ldots, \theta_P\}$. Under complex representation, the signal received by the l-th array element can be expressed as

$$x_l(t) = \sum_{p=1}^{P} a_l(\theta_p)\, s_p(t) + n_l(t), \tag{1}$$

where $s_p(t)$ is the complex envelope of the p-th signal reaching the antenna array at time t; $a_l(\theta_p) = e^{-i(l-1)2\pi d \sin(\theta_p)/\lambda}$ is the steering-vector component of the p-th signal at the l-th element, with d denoting the spacing between the elements of the array and $\lambda$ denoting the wavelength; $n_l(t)$ is the background noise of the l-th element, which follows a zero-mean complex Gaussian distribution. To avoid spatial aliasing, the spacing d is usually set to half of the wavelength, $d = \lambda/2$. Equation (1) can be written in matrix form

$$X(t) = A\,S(t) + N(t),$$

where X(t), S(t), N(t) are given by

$$X(t) = [x_1(t), \ldots, x_L(t)]^T, \quad S(t) = [s_1(t), \ldots, s_P(t)]^T, \quad N(t) = [n_1(t), \ldots, n_L(t)]^T,$$

and $A = [a(\theta_1), \ldots, a(\theta_P)]$, with columns $a(\theta_p) = [a_1(\theta_p), \ldots, a_L(\theta_p)]^T$, is the array manifold matrix.
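The data model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the signal envelopes are drawn as complex Gaussians, the noise has unit variance so that the SNR fixes the signal amplitude, and the function names are hypothetical. The last helper anticipates the spatial covariance matrix introduced in the next paragraph by averaging over snapshots.

```python
import numpy as np

def steering_matrix(angles_deg, num_elements=8, spacing_over_wavelength=0.5):
    """Array manifold A (L x P) of a uniform linear array, following Eq. (1)."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    element = np.arange(num_elements)[:, None]  # (l - 1) for l = 1..L
    phase = 2.0 * np.pi * spacing_over_wavelength * element * np.sin(angles)[None, :]
    return np.exp(-1j * phase)

def complex_gaussian(rng, shape):
    """Zero-mean, unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

def simulate_snapshots(angles_deg, snr_db, num_snapshots=500, num_elements=8, rng=None):
    """Return X = A S + N with shape (L, T).

    ASSUMPTIONS (not specified in the text): Gaussian signal envelopes and
    unit-variance noise, so each signal amplitude is 10**(SNR/20).
    """
    rng = np.random.default_rng() if rng is None else rng
    A = steering_matrix(angles_deg, num_elements)
    num_signals = A.shape[1]
    snr = np.broadcast_to(np.asarray(snr_db, dtype=float), (num_signals,))
    amplitude = 10.0 ** (snr / 20.0)
    S = amplitude[:, None] * complex_gaussian(rng, (num_signals, num_snapshots))
    N = complex_gaussian(rng, (num_elements, num_snapshots))
    return A @ S + N

def sample_covariance(X):
    """Snapshot average of X X^H, an estimate of the spatial covariance matrix."""
    return X @ X.conj().T / X.shape[1]

# Usage: two 10 dB sources seen by an 8-element array over 500 snapshots.
X = simulate_snapshots([-10.0, 20.0], snr_db=10.0, rng=np.random.default_rng(0))
R = sample_covariance(X)
```

With a finite number of snapshots, `R` is only an estimate of the expectation, which is one reason the number of snapshots matters for the quality of the DNN input.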
In order to perform DOA estimation, the spatial covariance matrix R has to be determined, which is computed as the expectation value over snapshots at different time instants,

$$R = E\left[X(t) X^H(t)\right] = A\,E\left[S(t) S^H(t)\right] A^H + E\left[N(t) N^H(t)\right],$$

where noises and signals are assumed to be mutually independent. Without loss of information, the upper right part of the L × L Hermitian matrix R can be organized into a vector Z, which serves as the input of the DNN.
DNN model. The architecture of the DNN used in the DOA estimation is shown in Fig. 1; it features three residual blocks of the same depth. In each residual block, a linear transformation layer is used to encode data into the desired dimension, and the following multilayer residual network (ResNet) is used to achieve good performance. The output layer consists of 180 neurons with sigmoid activation, whose values represent the existence probability of a signal at each angle from −90° to 90°. The reason for choosing larger and larger ResNet blocks is that we naively expect the DNN model to probe signals more clearly as the dimension increases. In fact, we have also trained a very deep ResNet block with a fixed dimension of 180, but its simulation results were not as good. For the DNN model proposed here, one can also use more neurons in the output layer to improve the resolution without changing the preceding body structure.
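To make the direction-classification output concrete, the following sketch maps a set of integer impinging angles to a 180-dimensional target vector and back. The index convention (neuron k corresponds to the angle k − 90°, so that +90° itself is excluded) and the 0.5 decision threshold are assumptions made for illustration; neither is stated in the text, and the smooth labels mentioned later are not reproduced here.

```python
import numpy as np

NUM_CLASSES = 180   # number of output neurons, as stated in the text
ANGLE_OFFSET = 90   # ASSUMPTION: neuron k <-> integer angle (k - 90) degrees

def encode_angles(angles_deg):
    """Multi-hot target: 1.0 at every neuron whose angle holds a signal."""
    target = np.zeros(NUM_CLASSES)
    for angle in angles_deg:
        index = int(round(angle)) + ANGLE_OFFSET
        if 0 <= index < NUM_CLASSES:
            target[index] = 1.0
    return target

def decode_angles(activations, threshold=0.5):
    """Angles whose sigmoid activation reaches the (assumed) threshold."""
    activations = np.asarray(activations, dtype=float)
    return [int(k) - ANGLE_OFFSET for k in np.flatnonzero(activations >= threshold)]

# Usage: an arbitrary number of signals needs no change to the output layer.
target = encode_angles([-60, -59, 0, 10])
print(decode_angles(target))
```

Because every neuron is decided independently, the number of signals does not have to be known in advance, which is the property the classification view is chosen for.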
In the three-stage pipeline, the input data Z is encoded into vectors of increasingly large dimension. To reduce the risk of overfitting and improve the generalization of our DNN model, we use the dropout technique between or after residual blocks. In each residual block, the linear transformation layer serves only for dimension matching. The following ResNet consists of N residual layers, each composed of two linear sub-layers with a hidden ReLU nonlinearity 30 in the middle,

$$\mathrm{Res}(X) = W_2\,\mathrm{ReLU}(W_1 X + b_1) + b_2,$$

where $W_1, W_2 \in \mathbb{R}^{d_x \times d_x}$ are trainable matrices and $b_1, b_2$ are bias vectors. We use the residual connections proposed by He et al. 29 to ease the training of our DNN, so the output Y of each residual layer is computed by the following equation:

$$Y = X + \mathrm{Res}(X).$$

We then apply layer normalization 31 after the residual connection to stabilize the activations of neurons.
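A forward pass through one residual layer, as described above (two linear sub-layers with a ReLU in between, a residual connection, then layer normalization), might look as follows. This is an inference-only sketch: dropout is omitted, inputs are stored as rows of a batch (so the weights multiply from the right), and the layer-normalization shift parameter and the `eps` constant are assumptions rather than details given in the text.

```python
import numpy as np

def layer_norm(x, gain, shift, eps=1e-6):
    """Normalize each row to zero mean and unit variance, then rescale."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mean) / np.sqrt(var + eps) + shift

def residual_layer(x, W1, b1, W2, b2, gain, shift):
    """One residual layer: linear -> ReLU -> linear, skip connection, layer norm.

    x has shape (batch, d); W1 and W2 have shape (d, d).
    """
    hidden = np.maximum(0.0, x @ W1 + b1)   # first sub-layer with ReLU
    res = hidden @ W2 + b2                  # second sub-layer, Res(X)
    return layer_norm(x + res, gain, shift)  # residual connection, then norm

# Usage with small random parameters (dimension 16 chosen only for the demo).
rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((4, d))
W1 = rng.normal(0.0, 1.0 / d, size=(d, d))
W2 = rng.normal(0.0, 1.0 / d, size=(d, d))
y = residual_layer(x, W1, np.zeros(d), W2, np.zeros(d), np.ones(d), np.zeros(d))
```

Stacking N such layers after a dimension-matching linear layer gives one residual block in the sense used here.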
In the output layer, we use the mean squared error as the loss function. We find this smooth label technique also helpful to ease the training. Then, we adopt the stochastic gradient descent (SGD) algorithm to optimize the loss function based on the proposed DNN framework.
Training details. As neural networks get very deep, vanishing or exploding gradients can easily occur during training. To prevent this problem, a suitable weight initialization is required, which also benefits the convergence rate and final performance. In our DNN model, biases are initialized to zero and weight matrices are drawn from a normal distribution $W_l \sim \mathcal{N}(0, 1/d_l^2)$, where $d_l$ is the input dimension of the l-th hidden layer. Lastly, the normalization factor is initialized to one in the layer normalization operations.
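The initialization described above can be sketched as follows. A variance of 1/d² corresponds to a standard deviation of 1/d, which is what the sampler takes. The dictionary layout and parameter names are hypothetical, and initializing the layer-normalization shift to zero is an assumption (the text only states that the normalization factor starts at one).

```python
import numpy as np

def init_residual_layer(dim, rng=None):
    """Initial parameters of one residual layer of width `dim`."""
    rng = np.random.default_rng() if rng is None else rng
    std = 1.0 / dim  # variance 1/d^2  <=>  standard deviation 1/d
    return {
        "W1": rng.normal(0.0, std, size=(dim, dim)),
        "b1": np.zeros(dim),        # biases start at zero
        "W2": rng.normal(0.0, std, size=(dim, dim)),
        "b2": np.zeros(dim),
        "gain": np.ones(dim),       # layer-norm factor starts at one
        "shift": np.zeros(dim),     # ASSUMPTION: layer-norm shift starts at zero
    }

params = init_residual_layer(180, rng=np.random.default_rng(0))
```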
The learning rate plays a key role in weight updates and convergence performance. If it is too small, the training process converges very slowly and often gets trapped in local minima. Conversely, if the learning rate is too large, the model easily diverges and suffers from vanishing or exploding gradients. Here we adopt a dynamic learning rate strategy that depends on the latest training step s, with $s_w$ = 200k warmup steps and a decay-starting step $s_d$ = 400k. The learning rate first grows linearly to 0.1, then stays unchanged for 200k steps, and finally decays quadratically for the rest of training. This dynamic learning rate strategy turns out to be very efficient in our DNN training processes.
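A schedule with the three phases described above (linear warmup to 0.1, a plateau, then decay) could be written as below. The warmup and plateau follow the text directly; the exact decay law is not recoverable from it, so the inverse-square form `(s_d / s) ** 2` is an assumption chosen only because it is continuous at the decay-starting step and quadratic in the step count.

```python
def learning_rate(step, peak=0.1, warmup_steps=200_000, decay_start=400_000):
    """Piecewise learning rate: linear warmup, plateau, then decay."""
    if step < warmup_steps:
        return peak * step / warmup_steps       # linear growth up to `peak`
    if step <= decay_start:
        return peak                             # constant for 200k steps
    # ASSUMED decay form: inverse-square in the step count.
    return peak * (decay_start / step) ** 2

# Usage: sample the schedule at a few steps of a 2000k-step run.
for s in (0, 100_000, 300_000, 800_000, 2_000_000):
    print(s, learning_rate(s))
```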
To reduce the risk of overfitting, we use only the dropout technique, without L1 or L2 regularization. Dropout is applied to the outputs of the second and third residual blocks with probability 0.5, and also to all residual values Res(X) with probability 0.1 in the three residual blocks. Dropout is not used anywhere else. Because the dimension of the final output layer equals that of the first residual block, we suspect that dropout of 0.5 following the first residual block may hinder the resolution performance. Experimental results indeed show performance degradation (not shown here) when this type of dropout is added.
In the training processes, a data model based on an 8-element uniform linear array with half-wavelength spacing is used. Signal impinging angles in the training and validation sets are random in (−90°, 90°) and (−75°, 75°), respectively. Besides, the spatial covariance vectors Z in all data sets are obtained from 500 snapshots. In all the training processes for DNNs of different depth, we use millions of samples, with the batch size set to 800 and the number of iterations to 2000k. Using TensorFlow, the most time-consuming training process takes about 20 hours on a single Nvidia Quadro P6000 GPU. The training progress of the deepest neural network is shown in Fig. 2. One can see that the model converges steadily and no overfitting takes place.

Results
In this section, the performance of DOA estimation based on our proposed DNN framework is investigated. Because the number of signals impinging on the array and their SNR can vary, we carried out simulations with different settings. All simulation results are obtained from the best run after the model converges. To be strict, any estimated angle deviating from its real value by 1° is viewed as a false prediction. The corresponding estimation results of precision (P), recall (R) and their harmonic average (F-score) are shown in Table 1. As one can see, the simulation performance gets better as the DNN becomes deeper. Lastly, we note that the number of hidden layers is 6N + 3, because of the three ResNet blocks and linear transformation layers, and the shallowest network is deeper than those of almost all previous ANN studies.
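The three metrics can be computed from counts of correct detections, false alarms and false dismissals. The sketch below assumes integer-degree predictions that count as correct only on an exact match, in line with the strict 1° criterion; the function names are hypothetical. The two usage lines mirror the split and fuse examples given in the Discussion.

```python
def count_hits(true_angles, predicted_angles):
    """Return (true positives, false alarms, false dismissals) for one sample."""
    true_set, pred_set = set(true_angles), set(predicted_angles)
    tp = len(true_set & pred_set)
    fp = len(pred_set - true_set)   # predicted but not present
    fn = len(true_set - pred_set)   # present but not predicted
    return tp, fp, fn

def precision_recall_f(tp, fp, fn):
    """Precision, recall and their harmonic mean (F-score)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# One signal at 50 deg split into outputs at 50 and 51 deg: precision drops.
print(precision_recall_f(*count_hits([50], [50, 51])))
# Signals at 34 and 36 deg fused into one output at 35 deg: both drop.
print(precision_recall_f(*count_hits([34, 36], [35])))
```

For a whole validation set, the counts would be summed over all samples before the ratios are taken; whether the reported tables pool counts this way or average per-sample scores is not stated in the text.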

Random number of signals with equal SNR (case B).
Naturally, the number of signals cannot be known in advance in most real applications, so we also carried out simulations on a random number of signals. The training set is composed of 4000k samples, with signal numbers random in [0, 7], integer impinging angles and equal SNR of 10 dB. The validation set consists of 4 subsets with signal numbers 1, 3, 5 and 7, respectively. Each subset is composed of 100k samples with floating impinging angles (deviating from integer values by 0.35° at most) and equal SNR of 15 dB. The detailed simulation results are displayed in Table 2. Unlike in the shallow DNN, a much larger block depth is required to obtain a satisfactory result; values of 5, 10 and 15 are used in the simulations. Apart from the improvement with growing ResNet depth, one can also see that performance decreases as the number of signals increases. Comparing the simulation results of the five-signal subset with those of case A, one unsurprisingly finds that performance gets better when the signal number is fixed. Moreover, we show the simulation results under different SNR in Fig. 3a, where the influence of SNR is clearly displayed with respect to signal number. Naturally, the best results belong to the 10 dB validation set, which matches the training set. The performance degradation of the 7 dB set is larger than that of the 15 dB set, which is attributed to the relatively increased influence of noise. Finally, we give an illustration of DOA estimation for this case. Six signals of 10 dB are assumed to impinge on the antenna array simultaneously, with initial angles [−60°, −59°, −27°, −25°, 0°, 10°]. Keeping the angular distances unchanged and moving the first signal from −60° to −10° in steps of 1°, we show the direction estimation results in Fig. 3b. Only one false dismissal and one false alarm occur, for the third and fourth signals, resulting in a very high F-score of 0.9967 for this example.
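The quoted F-score can be checked by hand, assuming (this is not stated explicitly) that the score is pooled over all positions of the sweep. Moving the first signal from −60° to −10° in 1° steps gives 51 positions, each containing 6 signals:

```latex
\begin{aligned}
\text{true signals} &= 6 \times 51 = 306, \\
TP &= 306 - 1 = 305, \qquad FP = 1, \qquad FN = 1, \\
P &= \frac{TP}{TP + FP} = \frac{305}{306}, \qquad
R = \frac{TP}{TP + FN} = \frac{305}{306}, \\
F &= \frac{2PR}{P + R} = \frac{305}{306} \approx 0.9967 .
\end{aligned}
```

Under this pooling assumption, the single false dismissal and single false alarm reproduce the reported value.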

Fixed number of signals with random SNR (case C). Previous works on ANN-based DOA estimation are mainly focused on the condition of equal SNR, so we test the performance of our DNN-based model under random SNR. The training and validation sets consist of 4000k and 100k samples, respectively, both characterized by 4 signals with integer impinging angles and random SNR between 5 and 20 dB. In the simulations, we find that the expected signals of small SNR are often missed, which causes a large decrease in recall, as shown in Table 3. Because of the sharp drop compared with case B and the limited improvement with growing DNN depth, we do not go further into the case of 'random number and random SNR'. Nevertheless, this result is still acceptable.

Direction estimation resolution. The idea of direction classification is very useful when the signal number is unknown, but it has a major defect: the higher the required resolution, the larger the number of classes and the bigger the neural network needed. To overcome this problem, we use amplitude interpolation to estimate signal directions of non-integer impinging angles, which is expressed as

$$\theta_I = I + \left(u_{I+1} - u_{I-1} + 2u_{I+2} - 2u_{I-2} + 3u_{I+3} - 3u_{I-3}\right)/0.8, \tag{14}$$

where $\theta_I$ is the estimated direction of the I-th peak of the neural network output $u(\theta)$, and $u_{I\pm n}$ is the neuron activation amplitude of the n-th neighbour. For example, the distribution of the estimation error for case B is shown in Fig. 4a, where the proportion of different errors ε is counted from 10k samples of two signals with SNR = 10 dB. As one can see, almost all errors are smaller than 0.25°. Statistically, the average and standard deviation of the absolute estimation error are µ = 0.1° and σ = 0.06°, respectively. Through this interpolation method, the performance of DOA estimation is largely improved. Alternatively, there is another perspective from which to investigate the estimation error. When impinging angles deviate from integer values by δ = 0.5°, the direction classification becomes ambiguous. So we expect the performance of direction classification to decline as δ increases gradually from 0° to 0.5°. As shown in Fig. 4b, the F-score of direction classification falls from 1.0 at zero deviation down to 0.5 at the middle point. The estimation error starts to pull down the F-score strongly when it fills the gap between δ and the middle point, that is, when δ + µ + σ ≈ 0.5°. So we find the sharp collapse near δ = 0.35°, which is in good agreement with the directly measured estimation error distribution.
Comparison with previous studies. We carefully compare our results with the two latest DNN studies on DOA estimation. In Liu et al.'s study 28, the training process is restricted to the two-signal scenario and the validation is carried out with a large intersignal distance (9.4°). When their result is generalized to the three-signal scenario, the estimation errors grow dramatically even though a larger intersignal distance (14°) is applied. What is worse, because of their subregion technique, DOA estimation performance degrades severely when signals are located at the edge of subregions. In Huang et al.'s study 26, the most crippling weakness is that the signal number must be known in advance, which severely hinders its generalization in real applications. Although our DNN structure is fairly simple, it outperforms those of previous works; see Table 4. Firstly, our DNN model is suitable for a random number of signals; that is, no a priori knowledge of the signal number is needed. Secondly, compared with Liu et al.'s work, our model is not troubled by the subregion edge aggravation.
Table 3. Simulation results of case C with ResNet block depth of 5, 10, 15.
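The amplitude interpolation of Eq. (14) can be sketched as a small function that refines an integer peak position using up to three neighbours on each side. The weights n = 1, 2, 3 and the divisor 0.8 are taken from Eq. (14); treating neighbours that fall outside the output vector as zero is an assumption added here to keep the function total, and the result is returned in units of the output index (converting it to degrees depends on how neurons are mapped to angles).

```python
import numpy as np

def interpolate_peak(u, peak_index, scale=0.8, max_offset=3):
    """Refine an integer peak position with weighted neighbour amplitudes."""
    u = np.asarray(u, dtype=float)
    correction = 0.0
    for n in range(1, max_offset + 1):
        right = u[peak_index + n] if peak_index + n < len(u) else 0.0
        left = u[peak_index - n] if peak_index - n >= 0 else 0.0  # ASSUMED zero padding
        correction += n * (right - left)
    return peak_index + correction / scale

# Usage: a peak at index 100 whose right neighbour is slightly stronger.
u = np.zeros(180)
u[100], u[99], u[101] = 1.0, 0.3, 0.4
print(interpolate_peak(u, 100))
```

A symmetric neighbourhood leaves the estimate at the integer position, while any left-right imbalance shifts it toward the stronger side, which is how sub-degree estimates are obtained from a 1° grid.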

Discussion
In the previous simulations, the integer-float angle inconsistency between the training and validation sets hinders the performance of DOA estimation. We find that if the validation set is changed to one with integer angles, the F-score improves by at most 2.0 percentage points. Apart from that, estimation errors come from three aspects: neighboring signals, large angles and SNR differences. Firstly, we find that one input signal may split into two neighboring outputs, or two neighboring input signals may fuse into one output. For example, an input signal at 50° sometimes generates a prediction of two signals at 50° and 51°, which decreases the precision. Conversely, input signals at 34° and 36° sometimes fuse into one output at 35°, which decreases both precision and recall. As the number of signals grows, the probability of neighboring occurrences gets larger, so the performance decreases with increasing signal number in the above simulations. Furthermore, in amplitude interpolation, the amplitude peaks of neighbouring signals interact strongly with each other, making our interpolation method inaccurate (other, more suitable amplitude interpolation methods may exist, but the final estimation error will not be greatly reduced).
Secondly, ANNs acquire the ability to establish the mapping between input features and output predictions through training. In our proposed DNN, the mapping function between the input steering vector and the output angles can be symbolically written as

$$\theta_O = f(\sin\theta_I). \tag{15}$$

To generate the right predictions $\theta_O = \theta_I$, the mapping function f is expected to be the arcsine, whose gradient $f'(\sin\theta_I) = 1/\cos\theta_I$ approaches infinity near the large angles ±90°. Neural networks fit smooth functions easily but struggle to fit very sharp ones, so the mapping function f will deviate from the arcsine at large angles and result in prediction errors. That is why we test our DNN model in (−75°, 75°) rather than the whole interval (−90°, 90°).
Lastly, the situation of different SNR has a larger variable space than that of equal SNR, making it difficult to fit a precise mapping function. Furthermore, signals of higher SNR tend to mask those of lower SNR. The extraction of weak signals is a general challenge in DOA estimation.
In conclusion, an efficient DNN-based model for superhigh resolution DOA estimation is proposed in this paper. The key advantage of ANN-based models over conventional subspace-based methods is the ability to provide accurate DOA estimates almost instantaneously, as they avoid complex matrix calculations. Compared with other ANN-based models, our DNN model is much deeper and keeps higher resolution in more general conditions, such as random signal number, random SNR and small angular distances. However, this model does not work very well in the conditions of large signal number, large SNR difference or neighboring signals. This suggests that ResNet alone is not adequate and that other techniques are still required. For example, stacked autoencoders 32,33 or restricted Boltzmann machines 34,35 may be useful in signal denoising. In particular, a convolutional neural network (CNN) 36 may be more efficient at estimating signal directions from the spatial covariance matrix R, whose geometric features may be lost in the present DNN input Z. A more general DNN-based algorithm of superhigh resolution under random signal number and random SNR is left for future work.