A panoramic continuous compressive beamformer with cuboid microphone arrays

Compressive beamforming is a powerful approach for the direction-of-arrival (DOA) estimation and strength quantification of acoustic sources. The conventional grid-based discrete compressive beamformer suffers from the basis mismatch conundrum. Its result degrades under the situation that sources fall off the grid. The existing continuous compressive beamformer with linear or planar microphone arrays can circumvent the conundrum, but work well only for sources in a local region. Here we develop a panoramic continuous compressive beamformer with cuboid microphone arrays based on an atomic norm minimization (ANM) and a matrix pencil and paring method. To solve the positive semidefinite programming equivalent to the ANM efficiently, we formulate a solving algorithm based on the alternating direction method of multipliers. We also present an iterative reweighted ANM to enhance sparsity and resolution. The beamformer is capable of estimating the DOAs and quantifying the strengths of acoustic sources panoramically and accurately, whether a standard uniform or a sparse cuboid microphone array is utilized.

being the row vector composed by the source strength under each snapshot, C being the set of complex numbers, L being the total number of snapshots and ||·|| 2 being the  2 norm, are 100 dB, 98 dB, 96 dB, 94 dB, 92 dB and 90 dB (referring to 2 × 10 −5 Pa). The frequency of emitted signal is 4000 Hz. Figure 1 presents the utilized microphone arrays and the reconstructed source distributions. A standard uniform cuboid array with 343 microphones (Fig. 1a) is utilized to obtain Fig. 1c,e,g, while a sparse cuboid array with 170 microphones (Fig. 1b) is utilized to obtain Fig. 1d,f,h. Figure 1a,b also presents the panoramic scene of sources. Figure 1c,d corresponds to the conventional compressive beamformer. The grid is [0°: 6°: 180°] × [0°: 6°: 360°]. Obviously, only the third and fourth sources that lie on the grid are accurately identified. For the other sources that lie off the grid, leakage occurs, which leads to inaccurate DOA estimations and peaks seriously deviating from the true source strengths. In contrast, as shown in Fig. 1e-h, the developed beamformer estimates the DOA and quantifies the strength of each source accurately. To sum up, the developed panoramic continuous compressive beamformer can conquer the basis mismatch conundrum, allowing accurate DOA estimation and strength quantification, whether a standard uniform or a sparse cuboid microphone array is utilized. Besides, Fig. 2 presents the result reconstructed by the two-dimensional continuous compressive beamformer with a rectangular microphone array. Only sources whose elevation angels lie in [0°, 90°] are accurately identified. This demonstrates the necessity of developing the panoramic continuous compressive beamformer to realize the panoramic DOA estimation and strength quantification of acoustic sources. To obtain Figs 1 and 2, multiple snapshots are adopted. An example is also exhibited in Supplementary Note 1 to again demonstrate the advantages of the multiple-snapshot data model over the single-snapshot one, which has been demonstrated and explained in one-and two-dimensional fields 5,10,12,16,20 . Efficiency advantage of our ADMM based algorithm. To obtain Fig. 1e,f, we solve the positive semidefinite programming equivalent to the ANM by the off-the-peg SDPT3 solver in CVX toolbox 25 , which uses the interior point method (IPM) 26 . To obtain Fig. 1g,h, we solve the positive semidefinite programming by our ADMM based algorithm. The accurate DOA estimation and strength quantification shown in Fig. 1e-h demonstrates that both the solver and our algorithm are effective. Employ − P P P / F F , θ φ φ − θˆI [ , ] [ , ] /2 F and − s s s / rms r ms rms 2 2 to measure the microphone signal reconstruction error, the DOA estimation error and the strength quantification error. Thereinto, ||·||| F is the Frobenius norm, ∈ × P C ABC L and ∈ × P C ABC L are the matrices of the reconstructed and the true microphone signals respectively, A, B and C are the row, column and layer • Represents the microphone and ○ represents the source. Source distributions reconstructed by (c,d) the conventional and (e-h) the developed panoramic continuous compressive beamformer. The positive semidefinite programming equivalent to the ANM is solved by (e,f) the SDPT3 solver in CVX toolbox and (g,h) our ADMM based algorithm. In (c-h), the reconstructed (*) and the true (○) outputs are scaled to dB via referring to their respective maximum, and at the same time, referring to 2 × 10 −5 Pa, the reconstructed maximum is labeled on the top, so do in the subsequent maps. ∈ s R rms I and s rms ∈ R I are the vectors of the quantified and the true root mean square strengths respectively. Table 1 lists the values of these errors corresponding to Fig. 1e-h and the consuming time of the IPM based SDPT3 solver and our ADMM based algorithm. Apparently, in terms of microphone signal reconstruction and strength quantification, the panoramic continuous compressive beamformer with the ADMM based algorithm has slightly lower accuracy than the one with the IPM based SDPT3 solver, whereas in term of DOA estimation, they have almost the same accuracy. The consuming time of our ADMM based algorithm is only about 1/7 of the one of the IPM based SDPT3 solver at most. Besides, when the dimensionality of the problem increases, for example, A = B = C = 8, the IPM based SDPT3 solver fails to obtain effective solutions, whereas our ADMM based algorithm still works well.
To test the convergence of our ADMM based algorithm, we plot the curves of the error − P P P / F F vs. the number of iterations in Fig. 3. The parameter setup of sources keeps the same as in Fig. 1. Obviously, whether the standard uniform or the sparse cuboid microphone array is utilized, a plateau is reached after only a few dozens of iterations, which means our algorithm can converge fast. The errors after convergence are 3.38% and 6.38% respectively for the standard uniform cuboid microphone array and the sparse one, which are larger than the errors of the IPM based SDPT3 solver (2.45% and 3.32%). Changing the parameters, such as number and DOAs of sources, frequency, SNR and so on, to conduct simulations, similar phenomena can be obtained. See Supplementary Note 2. These phenomena demonstrate the inherent characteristic of ADMM that it converges fast to a moderate accuracy, but slowly even impossibly to an extremely accurate solution 11,22 . Fortunately, the moderate accuracy is typically sufficient in practical applications.
Sparsity and resolution enhancement via IRANM in the case of small source separation. Define the minimum separation among sources as   . A smaller minimum separation among sources that can be accurately identified means a higher resolution. Change the DOAs of the second, fourth and sixth sources assumed in Fig. 1 as (45°, 100°), (100°, 180°) and (140°, 275°) in turn. Namely, let the first and second sources, the third and fourth sources, and the fifth and sixth sources close to each other. Δ min decreases from 0.13 to 0.03. Figure 4 presents the reconstructed source distributions. The standard uniform cuboid array with 343 microphones (Fig. 1a) is utilized to obtain Fig. 4a,c, while the sparse cuboid array with 170 microphones (Fig. 1b) is utilized to obtain Fig. 4b,d. Figure 4a,b corresponds to the ANM based panoramic continuous compressive beamformer. It fails to identify all the sources accurately. The estimated number of sources in Fig. 4a is also more than the true one. In contrast, as shown in Fig. 4c,d, the IRANM based panoramic continuous compressive beamformer estimates the DOA and quantifies the strength of each source accurately. The phenomenon demonstrates that IRANM can enhance sparsity and resolution, whether a standard uniform or a sparse cuboid microphone array is utilized. Figure 5 plots the curves of the above errors vs. Δ min of the ANM and the IRANM based panoramic continuous compressive beamformer. For each Δ min , the errors are measured over 20 Monte Carlo runs. Two sources that are separated by Δ min are generated in each run. The standard uniform cuboid array with 343 microphones is utilized to obtain Fig. 5a-c, while a sparse cuboid array with only 170 randomly retained microphones is utilized to obtain Fig. 5d-f. In term of reconstruction of P, Fig. 5a,d shows that the error of ANM is distinctly higher under Δ < . ABC 0 7/ min 3 compared to Δ ≥ . ABC 0 7/ min 3 , whereas the error of IRANM varies only slightly across all Δ min and is distinctly lower than the one of ANM. In term of DOA estimation, Fig. 5b,e shows that on one hand, the IRANM based panoramic continuous compressive beamformer has very low errors across all Δ min ; on the other hand, compared to the former, the ANM based panoramic continuous compressive beamformer has distinctly higher errors under Δ < . ABC 0 5/ min 3 and almost the same errors under Δ ≥ . ABC 0 5/ min 3 . In term of strength quantification, Fig. 5c,f shows that the error of the ANM based panoramic continuous compressive beamformer is always higher than the one of the IRANM based panoramic continuous compressive beamformer, and that is particularly apparent under small Δ min . These phenomena demonstrate that the IRANM based panoramic continuous compressive beamformer has the stronger denoising capability and the enhanced resolution compared to the ANM based one, allowing to reconstruct the microphone signal induced by sources as well as estimate the DOAs and quantify the strengths of small-separation sources more accurately, whether a standard uniform or a sparse cuboid microphone array is utilized. www.nature.com/scientificreports www.nature.com/scientificreports/ Sparsity and resolution enhancement via IRANM in the case of underestimated noise level. In above results, the noise level is estimated accurately. Namely, the estimated noise level ε is equal to the true one (the Frobenius norm of the true noise ∈ × N C ABC L ). In practical applications, the noise is usually not explicitly known, and therefore, it is hard to let ε be equal to the true noise level. Figure 6a-d respectively presents the distributions reconstructed by the ANM based panoramic continuous compressive beamformer for the sources assumed in Fig. 1 Fig. 6a, the estimated number of sources is more than the true one, and the DOA estimation as well as the strength quantification is wrong. In Fig. 6b, even though the estimated sources in the display dynamic range have the same number as the true ones, the DOA estimation as well as the strength quantification is still wrong. In Fig. 6c, the DOAs are accurately estimated. The source strengths are quantified as 98.98 dB, 96.71 dB, 94.22 dB, 91.71 dB, 89.11 dB and 86.17 dB in turn, which are 1.02 dB, 1.29 dB, 1.78 dB, 2.29 dB, 2.89 dB and 3.83 dB lower than the true values. Distinctly, the weaker the source is, the more the quantified strength is lower than the true one. In Fig. 6d, the weakest source in (155°, 290°) has been lost. These phenomena demonstrate that for the ANM based panoramic continuous compressive beamformer, an underestimated noise level will lead to a less sparse solution and thus a wrong identification and a reduced resolution, whereas an overestimated noise level will cause a too sparse solution by eliminating the weak sources. This is mainly because by underestimating noise level, partial noise is identified as sources, whereas by overestimating noise level, the weak sources are removed as noise. Figure 6e,f presents the source distributions reconstructed by the IRANM based panoramic continuous compressive beamformer when ε is . N 0 25 F and . N 0 5 F . Apparently, all the DOAs are estimated and all the strengths are quantified accurately, which means IRANM enhances the sparsity and resolution. The standard uniform cuboid array with 343 microphones is utilized to obtain Fig. 6. Figure 7 presents the results when the sparse cuboid array with 170 microphones is utilized. It has the same rule as Fig. 6.

Discussion
In summary, we develop a panoramic continuous compressive beamformer with cuboid microphone arrays. It first denoises the measured signal and thus obtains the signal from sources by minimizing a sparse metric of source distribution in the continuous domain, for example, the atomic norm of the microphone signal induced by sources, then retrieves the DOA information via using the MaPP method to process the result of ANM, and finally quantifies the source strength according to the DOA information and the microphone signal from sources. www.nature.com/scientificreports www.nature.com/scientificreports/ The beamformer can conquer the basis mismatch conundrum of the conventional grid-based discrete compressive beamformer, allowing panoramic and accurate DOA estimation and strength quantification of acoustic sources, whether a standard uniform or a sparse cuboid microphone array is utilized.
To solve the ANM, we formulate a positive semidefinite programming. It is a convex optimization problem and can be solved by the off-the-peg IPM based SDPT3 solver in CVX toolbox. The solver trends to be slow and even ineffective for large-dimensionality problems. To overcome the limitation, we present a reasonably fast algorithm based on ADMM. By establishing another sparse metric, we also present the IRANM that can be substituted for ANM. Because the metric promotes the sparsity to a larger degree than the atomic norm, IRANM can enhance sparsity and resolution compared with ANM. For the sources with a small separation, the ANM based beamformer fails to estimate the DOAs and quantify the strengths accurately, whereas the IRANM based beamformer succeeds. The beamformer takes the estimated noise level as an input. For the ANM based beamformer, overestimating the noise level may make the solution too sparse, for instance by eliminating sources of smaller strength. On the other hand, by underestimating the noise level, the solution may be less sparse than the actual solution, which will leads to inaccurate identifications and reduced resolution. In practical applications, when the noise is not explicitly known, in order to capture all the sources, we advise using a properly underestimated noise level and enhancing sparsity and resolution by IRANM.
The IRANM is solved only by the slow SDPT3 solver in this paper. We have not yet successfully developed a fast algorithm for it, even based on ADMM. The reason why IRANM cannot be solved accurately and robustly by ADMM can be explained as follows. IRANM is an iterative strategy. The weighting matrix utilized in current iteration is calculated based on the results in the previous iteration. In each iteration, if the semidefinite positive programming is solved via ADMM, the inherent characteristic that ADMM cannot fast converge to an accurate solution 11,22 will lead to a result with moderate accuracy. The error will affect the weighting matrix and thus burden the result with a larger error in next iteration. Consequently, as the iteration increases, the actually solved result deviates from the theoretical one more and more. In the future work, it is of significance to handle the problem. Besides, no experimental data are provided to support the claims in this paper. The accuracy of the www.nature.com/scientificreports www.nature.com/scientificreports/ method becomes lower as the SNR decreases. It is also of significance to conduct the experimental investigation and explore a new method that can still enjoy high accuracy even under very low, for example, negative, SNRs. Finally, it is also of interest to develop the panoramic continuous compressive beamformer with other microphone arrays, for example, spherical ones, to realize the panoramic and accurate DOA estimation and source strength quantification.

Problem formulation.
1 , s i l , expresses the strength of the ith source under the lth snapshot, which is the signal induced by the source at the (0, 0, 0)th microphone.
After forming the matrix = P p p [ , , , ] , where (·) T and ⊗ denote the transpose and the Kronecker product operator respectively, we obtain Denoting by ∈ × N C ABC L the additive noise, the measurement matrix ∈ × ⭑ P C ABC L is described by The primary focus of this paper is to retrieve the DOA and strength information of sources with P ⭑ and the estimated noise level as inputs and without discretizing the target region into a grid. Equation (4) Under the continuous setting, and the elements in ψ all can be regarded as the continuous functions of θ and φ. In accordance to ref. 27 , ψ t t t d( , , ) 1 2 3 is the atom of the signal model in Eq. (5). The infinite atomic set is expressed as The atomic  0 norm and the atomic norm of P are respectively defined as where inf denotes the infimum. P A is a convex relaxation of P A,0 11,18,20 . www.nature.com/scientificreports www.nature.com/scientificreports/ Exploiting the source sparsity to solve Eq. (4) can denoise P ⭑ and thus obtain P. P A,0 is the direct metric of source sparsity 11,18,20 . However, it is non-deterministic polynomial-time hard to solve Eq. (4) with the minimization of P A,0 as a constraint. Replacing P A,0 with P A , we write the reconstruction problem of P as ε = − ≤ ∈ × ⭑ P P P P arg min subject to , where ε is the estimated noise level. Theoretically, let ε = N F . Equation (9) is a convex optimization problem 26 . For sparse cuboid microphone arrays, − ⭑ P P in Eq. (9) becomes − Ω Ω ⭑ P P . In this case, the signals induced by sources at full microphones, P, are reconstructed from the measurements of partial microphones, Ω ⭑ P .
Positive semidefinite programming to solve ANM. For a given column vector , 2 , , 1}) being the halfspace 28 we define a three-fold Toeplitz operator T bb (·) to map u into a Hermitian A × A block Toeplitz matrix: and T 0 is Hermitian, i.e., in Eq. (11) is a C × C Toeplitz matrix: and T 0,0 is Hermitian, i.e., . We propose the following proposition. Proposition: Denote (tr( ( )) tr( )) subject to ( ) 0 (13) bb bb bb T where tr(·) represents the trace and ≥0 means the matrix is positive semidefinite. If T u ( ) bb admits a Vandermonde decomposition 13,14 , i.e., www.nature.com/scientificreports www.nature.com/scientificreports/ The Vandermonde decomposition of T u ( ) bb is the precondition to make Eqs (16) and (9)   , the updates in + q ( 1)th iteration are as follows: , both, the identity matrices. It can be derived that the variable updates in Eq. (19) have the following closed forms: The derivation can be found in Supplementary Note 4. www.nature.com/scientificreports www.nature.com/scientificreports/ and ⁎ T bb (·) be the adjoint of T bb (·). For a given matrix ∈ α α α ∈ ( , , ) H 1 2 3 ]∈C N u , where Θ α is an elementary Toeplitz matrix with ones on the αth diagonal and zeros elsewhere. Then, = .
We rewritten the update of Z in Eq. (20) as A reasonable termination criterion of ADMM is that the primal and dual residuals or infeasibilities must be small [22][23][24] . For the semidefinite programming, the primal and dual infeasibilities are recommended for use 23,24 . According to refs 23,24 , the primal and dual infeasibilities after q iterations for the current problem can be respectively defined as The algorithm terminates when − − max {pri inf , dual inf } q q is less than a pre-set tolerance level or the pre-set maximum number of iterations is reached. Another reasonable termination criterion is that the relative change of the result at two consecutive iterations is small enough, which means the result has converged.

DOA estimation via MaPP.
and the first BC rows to obtain ∈ − ×Ŷ . Compute the generalized eigenvalues of the matrix pencil Y Y ( , )

IRANM. Establish the metric
bb bb where |·| denotes the determinant of a matrix and κ > 0 is a regularization parameter. κ P M ( ) has the following properties under a certain condition 11,14,18 : (1)  M ( ) is equivalent to minimizing P A,0 as κ approaches 0. According to the second property, minimizing κ P M ( ) is equivalent to minimizing P A as κ approaches +∞. This means κ P M ( ) serves as a bridge between P A,0 and P A , and can enhance sparsity compared with P A . According to the third property, the sufficient condition for κ→ T u ( ) bb 0 to admit a Vandermonde decomposition can be guaranteed.
Replacing P A with κ P M ( ), we reformulate the reconstruction problem of P as To minimize such a concave + convex function, an effective algorithm is the majorization-minimization 29,30 . The algorithm solves the optimal variable and the optimal value of the objective function iteratively. Two steps are involved in each iteration. In the first majorization step, a surrogate function is constructed to locally approximate the objective function. The surrogate function should be no smaller than the objective function, and the equality holds at the current optimal variable. Then in the minimization step, the surrogate function is minimized to obtain a new optimal variable. L et û k b e the optimal variable, κ k b e the regularization parameter, be the weighting matrix, determined by the kth iteration. The surrogate function in + k ( 1)th iteration can be constructed as (ln ( ) t r( ( ) ) tr( )) 1 2 (tr( ( )) tr( )) , is the tangent plane of κ + T u I ln ( ) bb k 2 at =û u k , and c k is a constant independent of variables. Ignoring c k , we write the minimization problem in + k ( 1)th iteration as   Apparently, P is reconstructed via minimizing its weighted atomic norm iteratively, and the weighting coefficient is updated in each iteration. Hence, the method can be named as IRANM. For sparse cuboid microphone arrays, x y z 0 035 m. The number of snapshots is taken as 10. The independent and identically distributed complex Gaussian noise is utilized. The signal-to-noise ratio P N (20lg( / )) F F is taken as 20 dB. In the ADMM based algorithm, τ is determined according to ref. 10 , ρ is set to 1, and the iteration is terminated if the relative changes of u and P at two consecutive iterations, i.e., www.nature.com/scientificreports www.nature.com/scientificreports/ and discussed in Supplementary Note 6. In the MaPP method, the threshold is set as the maximal eigenvalue divided by 100, which means sources in 20 dB dynamic range are considered. When implementing IRANM, we terminate the iteration if the relative changes of the solution P at two consecutive iterations, i.e., − − −ˆ‖ ‖ ‖ ‖ P P P / k k k 1 F 1 F , is less than 10 −3 or the maximum number of iterations, set to 20, is reached.

Data Availability
Datasets generated and analyzed in the current study are available from the corresponding author on reasonable request.