Dictionary learning-based reverberation removal enables depth-resolved photoacoustic microscopy of cortical microvasculature in the mouse brain

Photoacoustic microscopy (PAM) capitalizes on the optical absorption of blood hemoglobin to enable label-free high-contrast imaging of the cerebral microvasculature in vivo. Although time-resolved ultrasonic detection equips PAM with depth-sectioning capability, most of the data at depths are often obscured by acoustic reverberant artifacts from superficial cortical layers and thus unusable. In this paper, we present a first-of-a-kind dictionary learning algorithm to remove the reverberant signal while preserving underlying microvascular anatomy. This algorithm was validated in vitro, using dyed beads embedded in an optically transparent polydimethylsiloxane phantom. Subsequently, we demonstrated in the live mouse brain that the algorithm can suppress reverberant artifacts by 21.0 ± 5.4 dB, enabling depth-resolved PAM up to 500 µm from the brain surface.


Results
Sparse reverberation model. The photoacoustic response of an absorber is the result of transient laser pulse-induced heating followed by thermoelastic expansion 2,12 . The response to a nanosecond pulse excitation, which is typically used in PAM, has been modeled as a spatial impulse by Diebold et al. 2 . The detected photoacoustic signal (PA o ) from the original laser pulse is the convolution of the transducer receive mode impulse response H with this impulse.  where ⨷ denotes the convolution operation, H the transducer receive mode impulse response, z the depth of the source, and v s the velocity of sound in the medium. Similarly, signals from a photoacoustic point source at a depth z contaminated with reverberant echoes may be modelled as follows. Acoustic reverberation is characterized by periodically spaced echoes generated by repeated reflections between near-parallel interfaces in the medium. Consider a point photoacoustic source at depth z in tissue between the two interfaces ( Fig. 1). Let the normalized echogenicity at the first tissue interface be α 1 and acoustic reflectance at the interface 2 at a depth d from the first interface be α 2 . Hence, the fractions of energy transmitted through the interface 1 and 2 are 1−α 1 and 1−α 2 , respectively. The received photoacoustic signal is a combination of the signal generated at the original point source and that from successively higher-order reverberant echoes. The signal at each order of reverberation is a superposition of reverberation from each of the parallel interfaces surrounding the point where the signal is produced. The photoacoustic signal contaminated with reverberant signal (R) can be expressed through the following series: From interface 1, And from interface 2, Combining Eqns (2) and (3), we have ∑ α αα It may be observed that since α α ≤ , 1 1 2 , higher-order terms are smaller in magnitude. Hence, in practical cases, the reverberant signal is either beyond the dynamic range of the PAM system or buried under the noise floor of the system after multiple reflections. To reflect this, the series may be truncated to a finite number of terms ( N 2 ). Using Eqns (1) Eqn (4) can be modified as Furthermore, since the arrival times of photoacoustic pulses are discrete, the reverberant echo is inherently sparse. Thus, R can be written as interfaces causing the reverberation and P represents a matrix, where each column is the transducer receive mode impulse response shifted axially to the spatial location of the transient pressure Note that it is not feasible to measure S experimentally, given the heterogeneity of tissue and structures within the body. It follows from Eqn. (6) that if the signal at a given depth in the tissue contains only the true signal but no reverberation, then | | = S 1 0 . In this case, the signal at this point is only represented by a single copy of the transducer impulse response shifted to that point. PAM data are acquired as a two dimensional ensemble of A-lines by raster scanning the region of interest with coaxially and confocally aligned dual foci 16 (Fig. 2a). An A-line is a combination of signals and reverberant echoes from multiple sources over the imaging depth. This can be represented as a sparse linear combination of basis vectors (Supplementary section S1). The number of possible basis vectors exceeds the dimensionality of the A-line, which makes this basis over-complete and consequently a vector of weights that combines these basis vectors is sparse. DL algorithms can adaptively learn over-complete bases and sparse weighting vectors simultaneously from the data. Once the basis is computed, it follows from Eqn. (6) that if we reduce the contribution of the reverberant responses in S to just the signal, i.e. with = S 1 0 . Then the reverberation is suppressed. The complete reverberation suppression algorithm (Fig. 2b) is illustrated in pseudo-code as follows. Dictionary Learning. In the previous section, we have described a representation (Eqn. (6)) to characterize PAM data at a given depth in tissue. Volumes of PAM data are acquired as a two-dimensional ensemble of A-lines. Given the PAM data from a B-scan (B), it may be observed from our formulation Eqns (2-7) that B can be completely characterized if the matrices, P and S are known at all depths, which effectively computes the over-complete basis and sparse weighting described in Supplementary section S1. In the presence of system noise, an approximation to the basis and sparse weighting vector can be adaptively learned from the raw data using dictionary learning, which is formulated as the following optimization problem: Here, K is the number of dictionary atoms (basis vectors) that comprise each A-line. D and G represent the dictionary and sparse code matrices, analogous to the basis vectors and sparse weighting. ε represents the error of the coded signal with respect to the raw signal due to system noise. Using this approach, each A-line in B is decomposed or coded into a linear combination of basis vectors represented by the columns of D (Fig. 3) and each basis vector is known as an atom of the dictionary. As illustrated by the constraints in Eqn. (8), each A-line of B is a combination of at most K dictionary atoms. It is important to note that the atoms in D are linear combinations of the atoms of the overcomplete basis (Supplementary section S1). Although multiple dictionary learning algorithms [17][18][19] exist, the dictionary is learned using the K-SVD algorithm 10,11 in this paper due to its rapid convergence and simple implementation. The optimization process in the K-SVD algorithm is briefly explained below, and additional details can be found in the literature 10,11 . The algorithm is illustrated in Fig. 2c and in pseudo code as follows.
The algorithm proceeds using an alternating minimization procedure, where an initial dictionary D consisting of R atoms is assumed, and the data is sparsely coded using this dictionary. Each atom in the dictionary is subsequently updated to reduce the reconstruction error with respect to the input signals coded by that atom using a rank one approximation. This process is repeated iteratively until the error with respect to the input signal is lower than ε or until a specified number of iterations has elapsed. The parameters that have a bearing on the algorithms convergence are the number of atoms of the dictionary, the sparsity K and the error ε. It is important to note that this is an unsupervised learning procedure, in that no information is provided a-priori to the algorithm about which signals actually contain reverberation.
Consequently, the dictionary atoms are linear combinations of the true signal and the reverberation, and these signals are further processed to remove the reverberant echoes while retaining the true signals. The dictionary atoms are normalized to have unit energy owing to the rank one approximation in the algorithm. Since the A-lines are sparse combinations of the dictionary basis vectors, it is sufficient to suppress the reverberation in each basis vector to obtain a reverberation-suppressed B-scan (Fig. 2b). An example of a learned dictionary and sparse code from a given B-scan is illustrated in Fig. 3. It may be observed that the learned dictionary (D) accurately captures the reverberant artifacts (white arrows) and signals of smaller amplitude (red arrows) and the matrix (G) is sparse.
Algorithm performance. A common initialization for the K-SVD algorithm, using randomly selected samples of the data itself as the initial dictionary, rather than an established basis such as the Fourier, cosine or wavelet bases, was used 10 . This initialization method has its origins in the K-SVD being a generalization of the K-means algorithm, which obtains feature/dictionary vectors using the expectation minimization algorithm-a specific instance of the more general alternating minimization method. The K-SVD algorithm has a natural noise reducing property ( Figure S1), since it seeks to find features common to the coded signals during the update step.
Naturally uncorrelated white Gaussian noise is reduced first during the sparse coding process and then during the rank one update step described in preceding section. Applying the K-SVD algorithm with a well-tuned  dictionary size and sparsity can further reduce the noise floor lower while retaining useful signal ( Figure S1a). The performance of the algorithm was characterized by varying the different parameters such as dictionary size, sparsity K, and number of iterations for which the algorithm was executed. The algorithm performance in coding the signal was assessed by measuring the intensity of the error between the signals reconstructed from the learned dictionary against the raw data, normalized to the maximum signal amplitude. The error was averaged over 300 B-scans, each of which contains 3600 A-lines.

Reverberation suppression.
From the formulations discussed in the preceding sections (Eqns (2-7)), it is known that the reverberation and the true signal can both be expressed as sparse combinations of the shifted transducer impulse response. Additionally, it is known that the amplitudes of reverberant echoes decrease with each successive reflection (Eqns (2-4)). The number of the reverberant echoes (N) (Eqns (5-7)) is dependent on the acoustic impedance mismatch and varies spatially. However, we note from our model (Eqns. (6, 8, Supplementary section S1)) that choosing a subset of signals from each dictionary atom can suppress the reverberation. To extract this subset, we assume that the strongest signal in each basis vector in the dictionary is the 'true' signal originating from the transient thermoelastic expansion at the PAM focus.
However, isolating only the strongest signal may cause loss of signals from smaller vessels or regions of the tissue containing fewer optical absorbers. Hence, we cross-correlate each atom (Fig. 4f) with the strongest signal present in it (Fig. 4g) to find all similar signals (Fig. 4h) and obtain a weighting function to suppress reverberation. The envelope of the cross correlation is thresholded (Fig. 4h,i) and used to suppress the reverberation. As shown in Fig. 4k, the correlation weighting suppresses the reverberation in the resultant dictionary. It can be observed from Fig. 4c and j that the correlation window retains more of the signals possessing smaller amplitude. This is confirmed in the contrasting figures (Fig. 4d,k) where the signals of smaller amplitude are retained (white arrows). The reverberation suppression procedure is illustrated in Fig. 2d,e, and in pseudo-code as follows.  This was confirmed by overlaying the signals constructed from the learned dictionary, maximum window suppressed signal, and correlation window suppressed signal on a binary mask constructed from the raw B-scan. The mask was constructed by choosing regions of the B-scan image which are above −40 dB relative to the maximum. The ground truth B-scan (Fig. 5a) overlaid on the same masked area is shown for reference. No significant difference can be observed between Fig. 5a and b, illustrating that the dictionary is correctly capturing pertinent signals. Additionally, it can be observed from Fig. 5c and its corresponding zoomed inset that although the reverberant artifacts (white arrows) are suppressed, there is also loss of signals of small amplitude (red arrows). However, when the correlation weighted window is used (Fig. 5d) the signals of small amplitude are retained, while the reverberation is suppressed.

In vitro validation.
For the in vitro experiments, data were acquired using 6-μm-diameter dyed polystyrene beads (Polysciences Inc.) embedded in transparent PDMS (Sylgard 184, Dow-Corning), as shown in Fig. 6a. Since PDMS is optically transparent, this allows us to have a known distribution of optical absorbers without potential signal from the background. The data were acquired using the setup shown in Fig. 2a. A representative B-scan of the results is shown in Fig. 6b. It may be observed in Fig. 6c that the signal and reverberation are coded accurately using the learned dictionary which is confirmed in the corresponding zoomed insets. The result of different windowing methods were compared in Fig. 6d,e. It may be observed that the reverberation signal is suppressed in both figures and their corresponding zoomed insets Fig. 6h,i (white arrows).
Additionally, it may be observed from Fig. 6h that some of the signals of lower amplitude (yellow arrows) are also lost compared to the coded signal when the maximum signal window is used. It may be observed in Fig. 6i that a larger number of signals of lower intensity (yellow arrows) are retained using the correlation windowing compared to the maximum signal window used in Fig. 6h. The difference in the retained signal is quantified using the mean square error (MSE). The MSE was calculated with respect to the raw signal (Fig. 6b) after coding with the learned dictionary (Fig. 6c), suppression with the maximum signal window (Fig. 6d), and the correlation weighted window (Fig. 6e). It may be observed, as expected, that there is a residual error after coding with the learned dictionary due to the noise suppression property of the algorithm (See Methods, Tuning of algorithm parameters). In addition, it may be observed that the error with respect to the raw data is smaller when the correlation window is used to suppress reverberation, rather than the maximum signal window, as more of the signals of smaller amplitude are retained. Furthermore, it may be observed from (Fig. 6h,i) that some of the noise retained after the coding is further suppressed. This is due to the consideration of only the signal of maximum amplitude and the threshold applied to the envelope of the correlation as discussed in the preceding section (See Fig. 5c,d).  In vivo validation. The algorithm was also validated in vivo in PAM images of the mouse brain (See Methods, Animal preparation), using the setup shown in Fig. 2 (See Methods, Photoacoustic Microscopy). The focal plane was gradually translated axially from 50 to 450 µm beneath the cortical surface with a 100-µm step size. A composite B-scan was constructed from the data ±50 μm about the focal plane of each of the five datasets to evaluate the suppression of reverberation. The maximum amplitude projection (MAP) image in the XZ plane was constructed from the raw composite B-scans and the composite B-scan obtained after processing with our algorithm. It may be observed from Fig. 7a and its corresponding zoomed inset Fig. 7d that the reverberant signal from superficial cortical layers completely obscures the vasculature at deeper layers. The performance of the algorithm was also compared with and without the use of dictionary learning. The XZ projection was also constructed after applying the windowing procedure directly to the B-scan with identical parameters (see Fig. 7b). It may be observed from the figure that the vessels do not appear as distinct as with the use of DL. Additionally, it may be observed that the contrast in the raw XZ-projection image (Fig. 7a) is poor due to the relatively high noise floor. The noise reducing capability of the K-SVD reduces this noise and consequently increases contrast as is seen in Fig. 7c and its corresponding zoomed inset (Fig. 7f). The zoomed insets Fig. 7(d-f) show the enhanced suppression of reverberation and clear delineation of microvessels penetrating the mouse cortex after the the use of DL. This was confirmed from the B-scans ( Figure S4), where DL is observed to reduce noise, enhance the reverberation suppression, improve contrast and consequently improve the delineation of penetrating cortical microvessels. We quantify this improvement in performance using the mean intensity of the error with respect to the original signal with, and without dictionary learning ( Figure S2) in a region of interest deep in tissue ( Figure S2 (yellow box)) over 300 B-scans.
Additionally, the suppression of reverberant artifacts was also verified in the XY-projection image from data ±50 μm about the focal plane at each depth. Reverberation from superficial vessels manifests as 'ghosting' creating false copies on images deeper in the mouse cortex as shown in Fig. 8(a-e), where the arrows indicate the position of spurious ghosting artifacts. After reverberation removal, the obscured microvasculature at deep layers becomes visible. However, the algorithm is still limited by physical processes such as optical scattering and absorption losses. The larger vessels and dense superficial vasculature absorb most of the energy from the optical excitation, which leads to shadowing under the major vessels at increased depths as is seen in Fig. 8h. Furthermore, optical scattering also reduces the energy of the excitation delivered to deeper tissue, which also reduces the available signal at increased depths as seen in Fig. 8i,j.

Discussion
In this paper, an algorithm based on DL to remove reverberation in PAM was presented. Dictionary learning is a powerful algorithmic tool that has been used in many applications, including denoising 11,20 , blind source separation 21 , super-resolution 22 , deblurring 23 , and compressed sensing. Although K-SVD offers an elegant and useful solution, it is still difficult to tune properly on larger datasets due to the relatively long run time (20 seconds for each B-scan with an A-line interval of 0.83 µm and a sparsity of 5). Fortunately, online and minibatch methods 17 of DL have been developed, which can reduce the run time by learning dictionaries while the data is acquired. The use of dictionary learning is invaluable as it allows for an abstraction of pertinent signal features which are common among A-lines. Using the windowing methods on the B-scans directly leads to sub-par results, since we also have to contend with noise. In fact, this case is subsumed in our model, since it leads to a trivial dictionary where every A-line is assumed as an atom and the sparse matrix is an identity matrix. However, a key advantage of the method is lost when processing the images this way, in that non-essential signals such as system noise are also introduced into the process, so while assigning the dictionary may be simple, the dictionary itself may be excessively noisy and contain information that is not relevant.
Additionally, if we have further information a-priori about regions of the vasculature that need to be included, or excluded, the dictionary could be learned using a supervised approach 18,24 . Implementation of these methods would allow coding and reverberation removal as the PAM data is being acquired rather than retrospectively on saved data as presented herein. Furthermore, extending the present study to multi-parametric PAM 16,25 may enable depth-resolved quantification of blood perfusion, oxygenation and flow across different cortical layers. Combining these 3D-resolved functional parameters may ultimately enable label-free high-resolution imaging of cerebral oxygen metabolism in important disease models such as ischemic stroke, epilepsy and traumatic brain injury.

Methods
Tuning of Algorithm parameters. The performance of the algorithm was characterized with respect to variation in its parameters. Figure S2 shows the algorithm performance with varying sparsity and dictionary size. The sparsity represents the number of atoms (basis vectors) of the final learned dictionary that comprise each A-line of a B-scan. The lower the sparsity, the better the dictionary is able to reject noise and unwanted information. However, this can also increase the error of the coded signal with respect to the input signal, and increasing the number of dictionary atoms or algorithm iterations can compensate for error from low sparsity ( Figure S2b). The computational complexity of the algorithm as a whole scales nonlinearly with both the number of atoms in the dictionary and the number of A-lines to be coded, but linearly with the number of iterations 26 which is reflected in Figure S2c.
At first glance, it may appear that a larger number of iterations would reduce the error with respect to the raw data further and are hence more desirable. However, subsequent careful investigation ( Figure S3) reveals that the reduction in error is primarily due to the dictionary adapting better to the noise in the signal, which indicates wasted computational effort due to increased computational time coding undesirable signal, and mitigates the noise suppressing advantage of the algorithm. Hence, we choose a sparsity of 5 with a dictionary size of 700 with a single iteration. The algorithm parameters were tuned and the same parameters were used for all subsequent experiments on in vivo and in vitro data. The noise suppression of K-SVD was characterized by assessing the noise floor from a 100 μm wide region of interest deep in the tissue (approx. 1 mm), where there is unlikely to be any useful signal from dictionary coded and raw PAM data. The noise floor was found to be 25.42 ± 0.05 dB below the maximum signal intensity (averaged over 300 B-scans) for the raw B-scans and 42.12 ± 0.15 dB below the maximum signal intensity.
Photoacoustic microscopy. In our PAM system (Fig. 2a), a nanosecond-pulsed 559 nm laser (BX40-2-G, Edgewave) operating at the repetition rate of ~10 kHz is applied. Attenuated by a neutral density filter (NDC-50C-2M, Thorlabs), the beam is then filtered by an iris (SM1D12D, Thorlabs) and focused by a condenser lens (LA1608, Thorlabs). The focused beam is further filtered by a 50-μm-diameter pinhole (P50C, Thorlabs) prior to entering the microscope objective (M-10X, Newport), which is used to couple the beam into the single-mode optical fiber. To monitor the laser fluctuation, a beam sampler (BSF10-A, Thorlabs) is placed on the beam path to reflect a small portion (i.e., 5%) of the laser energy to a high-speed photodiode (FDS100, Thorlabs).
The other end of the single-mode fiber is mounted onto a 3-axis translational stage, which allows the adjustment of vertical position and 2D raster scan. Exiting from the single-mode fiber, the laser beam is collimated by an achromatic doublet (AC127-025-A, Thorlabs) and reshaped by an iris (SM05D5, Thorlabs), which is further focused using the identical achromatic doublet. A correction lens (LA1207-A, Thorlabs) is used to compensate for the optical aberration at the interface between the ambient air and water. The laser excitation light goes through the central hole of our customized ring-shaped ultrasonic transducer (center frequency: 35 MHz; 6-dB bandwidth: 70%), allowing the confocal alignment of the optical and acoustic foci to ensure the maximum sensitivity.
Animal preparation. Male CD-1 mice (9-13 weeks old, Charles River Laboratories) were used for the studies. Under anesthesia, the hair in the mouse head was removed and a surgical incision was made in the scalp to expose the skull. The skull over the region of interest was removed using the established surgical procedure 27 . The mouse was anesthetized with 1.5% vaporized isoflurane, and its body temperature was kept at 37 °C using a heating pad (Omega, SRFG-303/10) and a temperature controller (Cole-Parmer, EW-89802-52). All experimental procedures were carried out in conformity with the laboratory animal protocol approved by the Animal Care and Use Committee at the University of Virginia.