Limited-View and Sparse Photoacoustic Tomography for Neuroimaging with Deep Learning

Photoacoustic tomography (PAT) is a non-ionizing imaging modality capable of acquiring high contrast and resolution images of optical absorption at depths greater than traditional optical imaging techniques. Practical considerations with instrumentation and geometry limit the number of available acoustic sensors and their “view” of the imaging target, which result in image reconstruction artifacts degrading image quality. Iterative reconstruction methods can be used to reduce artifacts but are computationally expensive. In this work, we propose a novel deep learning approach termed pixel-wise deep learning (Pixel-DL) that first employs pixel-wise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolution neural network to reconstruct an image. Simulated photoacoustic data from synthetic, mouse-brain, lung, and fundus vasculature phantoms were used for training and testing. Results demonstrated that Pixel-DL achieved comparable or better performance to iterative methods and consistently outperformed other CNN-based approaches for correcting artifacts. Pixel-DL is a computationally efficient approach that enables for real-time PAT rendering and improved image reconstruction quality for limited-view and sparse PAT.


Introduction
Neuroimaging in small animals have played an essential role in preclinical research to provide physiological, pathological, and functional insights that are key for understanding and treating neurological diseases.Over the past few decades, there has been significant advances in magnetic resonance imaging (MRI) and optical imaging techniques for structural and functional neuroimaging.For example, MRI can acquire high resolution images of brain structures over large volumes, 3D connectivity and diffusivity information using diffusion tensor imaging, and brain activity using functional MRI [1]- [3].However, MRI has poor temporal resolution and cannot be used to study fast hemodynamic mechanisms and responses.Optical imaging techniques can exploit the diverse biological molecules (e.g.hemoglobin, melanin, and lipids)each possessing different optical propertiespresent in biological tissues to provide contrast for structural and functional imaging [4]- [6].However, strong optical scattering limits the imaging depth of optical techniques to approximately 1-2 mm into the brain.S. Guan is with the Bioengineering Department, George Mason University., Fairfax, VA O USA. (e-mail: sguan2@gmu.edu) and The MITRE Corporation., McLean, VA, 22102 (e-mail: sguan@mitre.org).The author's affiliation with The MITRE Corporation is provided for identification purposes only and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.Approved for Public Release; Distribution Unlimited.Case Number 18-4405.©2019 The MITRE Corporation.All Rights Reserved.A. Khan, and P. Chitnis, S. Sikdar are with the Bioengineering Department, George Mason University, Fairfax, VA 22031 USA.(e-mail: akhan50@gmu.edu,pchitnis@gmu.edu,and ssikdar@gmu.edu).
Photoacoustic tomography (PAT) is an emerging non-invasive hybrid technique that has recently seen substantial growth in numerous preclinical biomedical applications and as a powerful clinical diagnostic tool [7]- [10].In particular, there is a strong interest in PAT for preclinical structural and functional neuroimaging [11]- [15].Given its unique use of light and sound, PAT combines the high contrast and molecular specificity of optical imaging with the high spatial resolution and centimeter-penetration depth of ultrasound imaging [16]- [18].PAT has been demonstrated capable of Kilohertz volumetric imaging rates, far exceeding the performance of other modalities, which enables new insights into previously obscure biological phenomena [19].There are diverse contrast agents available such as chemical dyes, fluorescent proteins, and nanoparticles that can be used to further enhance the imaging capabilities of PAT [20], [21].
PAT involves irradiating the biological tissue with a short-pulsed laser.Optical absorbers within the tissue are excited by the laser and undergo thermoelastic expansion which results in the generation of acoustic waves [22].A sensor array surrounding the tissue is then used to detect the acoustic waves, and an image is formed from the measured sensor data.PAT image reconstruction is a well-studied inverse problem that can be solved using analytical solutions, numerical methods (e.g.time reversal), and model-based iterative methods [23]- [27].In general, a high-quality image can be reconstructed if the sensor array has a sufficiently large number of sensor elements and completely encloses the tissue.However, building an imaging system with these specifications is often prohibitively expensive, and in many in vivo applications such as neuroimaging, the sensor array typically can only partially enclose the tissue [28], [29].These practical limitations result in sparse spatial sampling and limited-view of the photoacoustic waves emanating from the medium.Reconstructing from sub-optimally acquired data causes streaking artifacts in the reconstructed PAT image that inhibits interpretation and quantification [30].
To address these issues, iterative methods are commonly employed to remove artifacts and improve image quality.These methods use an explicit model of photoacoustic wave propagation and seek to minimize a penalty function that incorporates prior information [31]- [33].However, they are computationally expensive due to the need for repeated evaluations of the forward and adjoint operators, and resulting image quality is dependent on the constraints imposed.[34], [35].
Given the wide success of deep learning in computer vision, there is a strong interest in applying similar methods for tomographic image reconstruction problems [36]- [38].Deep learning has the potential to be an effective and computationally efficient alternative to state-ofthe-art iterative methods.Having such a method would enable improved image quality, real-time PAT image rendering, and more accurate image interpretation and quantification.
Among the many deep learning approaches for image reconstruction, post-processing reconstruction (Post-DL) is the most widely used and has been demonstrated for improving image reconstruction quality in CT [39], [40], MRI [41], and PAT [42]- [45].It was shown capable of achieving comparable performance to iterative methods for sparse PAT image reconstruction [46], [47].In Post-DL, an initial inversion is used to reconstruct an image with artifacts from the sensor data.A convolutional neural network (CNN) is then applied as a postprocessing step to remove artifacts and improve image quality.The main drawback of Post-DL is that the initial inversion does not properly address the issues of limited-view and sparse sampling, which results in an initial image with artifacts.Image features (e.g.small vessels) that are missing or obscured by artifacts are unlikely to be recovered by the CNN.
Previous works attempted to improve upon Post-DL by removing the need for an initial inversion step [47], [48].One approach termed direct reconstruction (Direct-DL) used a CNN to reconstruct an image directly from the sensor data [48].The main challenge in using Direct-DL was the need to carefully select parameters (e.g.stride and kernel size) for each convolutional layer in order to transform the sensor data into the desired image dimensions.Changing either the dimensions of the input or output would require a new set of convolution parameters.Direct-DL was shown capable of reconstructing an image but underperformed compared to Post-DL.
In this work, we propose a novel approach termed pixel-wise deep learning (Pixel-DL) for limited-view and sparse PAT image reconstruction.Instead of using an initial inversion, pixel-wise interpolation is applied to the sensor data to window relevant information on a per pixel-basis based on the physics of photoacoustic wave propagation.This strategy enables the CNN to utilize more information from the sensor data to reconstruct a higher quality image.The pixel-interpolated sensor data also has similar dimensions to the desired output image.We compare Pixel-DL to other PAT image reconstruction methods including: time-reversal, iterative reconstruction, Post-DL, and a modified implementation of Direct-DL.

Results
CNN-based deep learning approaches were implemented using the Fully Dense UNet (FD-UNet) CNN architecture.The CNNs were trained using images derived from a synthetic vasculature phantom.The PAT reconstructions methods were evaluated using maximum intensity projection images from a mouse brain vasculature phantom.Reconstructed images were compared to the ground truth image using the peak-signal-to-noise ratio (PSNR) and structural similarity index (SSIM) as metrics for image quality.PSNR provides a global measurement of image quality, while SSIM provides a local measurement that takes into account for similarities in contrast, luminance, and structure.
The time reversal reconstructed images had severe artifacts blurring the image and the lowest average PSNR and SSIM for all sparsity levels (Fig. 1 and Table 1).Images reconstructed with iterative or a CNN-based method had significantly fewer artifacts and a higher average PSNR and SSIM.Vessels obscured by artifacts in the time reversal reconstructed images were more visible in the other reconstructed images.As expected, increasing the number of sensors resulted in fewer artifacts and improved image quality for all PAT image reconstruction methods.Pixel-DL consistently had a higher average PSNR and SSIM than Post-DL for all sparsity levels and similar scores to iterative reconstruction.In case of sparse sampling (16 and 32 sensors elements), Post-DL reconstruction typically introduced additional vessels that were not originally in the ground truth image (Fig. 1a-b).This was likely due to the CNN misinterpreting strong artifacts in the input image as real vessels.Pixel-DL reconstructed images exhibited a similar behavior but typically had fewer additional vessels.In Pixel-DL, the CNN had access to more information from the sensor data to accurately reconstruct the vessels.Compared to the deep learning-based reconstructed images, those reconstructed using iterative reconstruction had an overly smoothed appearance, a pattern commonly observed when using the total variation constraint.
The average reconstruction time reported for each method are for reconstructing a single image from the PAT sensor data.Time reversal is a robust and computationally inexpensive reconstruction method (~2.57seconds per image).Iterative reconstruction removed most artifacts and improved image quality but had a much longer average reconstruction time (~491.21seconds per image).Pixel-DL reconstructed images with similar quality to iterative reconstruction and was significantly faster by over a factor of 1000 (~7.9 milliseconds per image).Average reconstruction time for Post-DL is dependent on the initial inversion used since the computational cost of a forward pass through a CNN is essentially negligible.Since time reversal was used as the initial inversion, Post-DL had a longer average reconstruct time than Pixel-DL (~2.58 seconds per image).Pixel-DL consistently outperformed time reversal in reconstructing images of the synthetic vasculature and mouse brain vasculature (Fig. 2).Interestingly, mDirect-DL only outperformed time reversal in reconstructing the synthetic vasculature images, which were used to train the CNN.While the mDirect-DL reconstructed images of mouse brain vasculature somewhat resembled the ground truth images, they had a PSNR and SSIM of 13.59 and 0.21, respectively, which was lower than those in images reconstructed using achieved using the conventional time reversal technique.This indicated that the mDirect-DL CNN partially learned the mapping of the PAT-sensor data onto the image space but overfitted to the training data.
During training, the CNNs for Pixel-DL and mDirect-DL converged to a minimum mean squared error, but the Pixel-DL converged to a lower error (Supplementary Fig. 2).

Discussion
In this work, we propose a novel deep learning approach termed Pixel-DL for limitedview and sparse PAT image reconstruction.We performed in silico experiments using training and testing data derived from a synthetic vasculature phantom and mouse brain vasculature phantom, respectively, to compare multiple PAT image reconstruction methods.Results showed that Pixel-DL outperformed Post-DL and mDirect-DL for all experiments.Pixel-DL had similar performance to iterative reconstruction but required several orders of magnitude less time to reconstruct a single image.
The CNN architecture and hyperparameters were used for all deep learning approaches implemented were essentially the same.Thus, discrepancies in performance between the approaches were primarily due to their respective inputs into the CNN (Fig. 1).In Post-DL, the input was an image initially reconstructed from the sensor data using time reversal.The input and output to the CNN are both conveniently images of the same dimensions.This removed the need for the CNN to learn the physics required to map the sensor data into the image space.However, the initial inversion did not properly address the issues of limited-view and sparse sampling which resulted in an initial image with artifacts.Moreover, the CNN no longer had access to the raw sensor data and was only able to use information contained in the image to remove artifacts.There was likely useful information in the sensor data for more accurately reconstructing the PAT image, which was ignored in this approach.
In Pixel-DL, the initial inversion is replaced with pixel-wise interpolation, which similarly provides a mapping from the sensor data to image space.Relevant sensor data is windowed on a per pixel basis using a linear model of acoustic wave propagation.This enables the CNN to have a richer information source to reconstruct higher quality images.Furthermore, there is no initial inversion introducing artifacts; thus, the CNN does not have an additional task of learning to remove those artifacts.mDirect-DL similarly did not require an initial inversion and instead used the full sensor data as an input to the CNN to reconstruct an image.The advantage of mDirect-DL is that the CNN had the potential to fully utilize the information available in the sensor data to reconstruct a high-quality image.However, the CNN also had a more difficult task because it needed to additionally learn a mapping from the sensor data into the image space.Results showed that the CNN had difficulty in learning a generalizable mapping and overfitted to the training data (Fig. 2).The FD-UNet was likely not an optimal architecture for this task since it was designed assuming the input was an image.A different neural network architecture for a multidimensional time-series input would be better suited.
In our implementation of Pixel-DL, a linear model assuming a constant speed of sound and circular wave propagation was used for pixel-wise interpolation of raw sensor data in a manner that was governed by the physics of photoacoustic wave propagation.While this model was sufficient for the case of an acoustically homogenous medium, it would need to be modified if the medium was heterogeneous or if acoustically reflective surfaces were present.Naively reconstructing with these assumptions would result in additional artifacts that the CNN would need to learn to correct.
The proposed Pixel-DL approach can be used as a computationally efficient method for improving PAT image quality under limited-view and sparse sampling conditions.It can be readily applied to a wide variety of PAT imaging applications and configurations.Pixel-DL enables for the development of more efficient data acquisition approaches.For example, PAT imaging systems can be built with fewer sensors without significantly sacrificing image quality, which would allow for the technology to be more affordable.Pixel-DL achieved similar performance but was faster than iterative reconstruction by over a factor of a 1000.It would allow for real-time PAT image rendering which would provide valuable feedback during image acquisition.
While we have demonstrated the feasibility of Pixel-DL specifically for PAT, this approach can also be readily applied to ultrasound imaging.Image reconstruction for PAT and ultrasound imaging both largely rely on time-of-flight calculations to determine where the signal originated.Therefore, a similar linear model of acoustic wave propagation can be used to readily apply Pixel-DL for ultrasound image reconstruction problems.Pixel-DL can also be adapted to other imaging modalities if a model mapping raw sensor data to the image space is available.

Photoacoustic Signal Generation
The photoacoustic signal is generated by irradiating the tissue with a nanosecond laser pulse ().Light absorbing molecules in the tissue undergo thermoelastic expansion and generate photoacoustic pressure waves [22].Assuming negligible thermal diffusion and volume expansion during illumination, the initial photoacoustic pressure  can be defined as where () is the spatial absorption function and Γ() is the Grüneisen coefficient describing the conversion efficiency from heat to pressure [50].The photoacoustic pressure wave (, ) at position  and time  can be modeled as an initial value problem for the wave equation, in which  is the speed of sound [51].
Sensors located along a measurement surface   measure a time-dependent signal (Supplementary Fig. 1).The linear operator ℳ acts on (, ) restricted to the boundary of the computational domain Ω over a finite time  and provides a linear mapping from the initial pressure  to the measured time-dependent signal .

Photoacoustic Image Reconstruction
Time reversal is a robust reconstruction method that works well for homogenous and heterogeneous mediums and also for any arbitrary detection geometry [26], [27].A PAT image is formed by running a numerical model of the forward problem backwards in time.This involves transmitting the measured sensor data in a time-reversed order into the medium, and performs sufficiently well as long as the acoustic properties of the medium are known a priori.
For iterative reconstruction, the PAT image, initial photoacoustic pressure,  is recovered from the measured signal  by solving the following optimization problem using an isotropic total variation (TV) constraint where the parameter  > 0 is a regularization parameter [31], [35], [52].The TV constraint is a widely employed regularization functional for reducing noise and preserving edges.Iterative reconstruction with a TV constraint works well in the case of simple numerical or experimental phantoms but often leads to sub-optimal reconstructions for images with more complex structures [42].

Deep Learning
In this work, three different CNN-based deep learning approaches were used for limitedview and sparse PAT image reconstruction (Fig. 4).These approaches all began with applying an initial processing step to the PAT sensor data and then recovering the final PAT image using a CNN.The primary difference among these approaches was the processing step used to initially transform the PAT sensor data.In Post-DL, the sensor data was initially reconstructed into an image containing artifacts using time reversal, and the CNN was applied as a post-processing step for artifact removal and image enhancement.In Pixel-DL, pixel-wise interpolation was applied to window relevant information in the sensor data and to map that information into the image space.In mDirect-DL, a combination of linear interpolation and down sampling was applied so that the interpolated sensor data had the same dimensions as the final PAT image.

CNN Architecture: Fully Dense UNet
After the sensor data was transformed, the final PAT image was recovered using the Fully Dense UNet (FD-UNet) CNN architecture.The FD-UNet builds upon the UNet, a widely used CNN for biomedical imaging tasks.by incorporating dense connectivity into the contracting and expanding paths of the network [53].This connectivity pattern enhances information flow between convolutional layers to mitigate learning redundant features and reduce overfitting [54].The FD-UNet was demonstrated to be superior to the UNet for artifact removal and image enhancement in 2D sparse PAT [45].

Pixel-Wise Interpolation
Pixel-wise interpolation uses a linear model of acoustic wave propagation that assumes the acoustic waves are propagating spherically and traveling at a constant speed of sound.Based on these assumptions, the time-of-flight needed for a signal originating at some position and traveling to a sensor can be calculated.The defined image reconstruction grid spans the region of   interest in the imaging system (Fig. 5).Pixel-wise interpolation is used to map the measured photoacoustic signals from a sensor to each pixel location in the image reconstruction grid.This process is repeated for each sensor in the imaging system and results in a 3D data array with dimensions corresponding to the 2D image space and sensor index.

Deep Learning Implementation
The CNNs were implemented in Python 3.6 with TensorFlow v1.7, an open source library for deep learning [55].Training and evaluation of the network is performed on a GTX 1080Ti NVIDIA GPU.The CNNs were trained for 40 epochs using a mean squared error loss function, learning rate of 1e-4, and a batch size of three images.

Photoacoustic Data for Training and Testing
Training data were procedurally generated using data augmentation, where new images were created based on a 340x340 pixel-size image of a synthetic vasculature phantom generated in MATLAB.First, scaling and rotation was applied to the initial phantom image with a randomly chosen scaling factor (0.5 to 2) and rotation angle (0-359 degrees).Then a 128x128 pixels sub-image was randomly chosen from the transformed image and translated by a random vertical and horizontal shift (0-10 pixels) via zero-padding.For each experiment, 500 training images were generated.Testing data were generated from a 3D micro-CT mouse brain vasculature volume [56] with a size of 260x336x438 pixels.The Frangi vesselness filter was applied to suppress background noise and enhance vessel-like features [57].A new image was created from the filtered volume by generating a maximum-intensity projection of a randomly chosen 128x128x128 pixel sub-volume.For each experiment, 250 testing images were generated in this manner.
A MATLAB toolbox, k-WAVE [58], was used to simulate photoacoustic data acquisition using an array of acoustic sensors.Each training and testing image were treated as photoacoustic source distribution on a computation grid of 128x128 pixels.The medium was assumed to be non-absorbing and homogenous with a speed of sound of 1500 m/s and density of 1000 Kg/m 3 .The sensor array had 16, 32, or 64 sensor elements equally spaced on a semi-circle with a diameter of 120 pixels.The k-Wave toolbox was also used for reconstructing an image from the raw photoacoustic data using the artifact-prone TR approach.
Images reconstructed using the four schemes, TR, post-DL, mDirect-DL, and pixel-DL, were compared against the ground truth, i.e., mouse-brain images used as testing data using the peak-signal-to-noise ratio (PSNR) and structural similarity index (SSIM) as metrics for image quality.PSNR provides a global measurement of image quality, while SSIM provides a local measurement that takes into account for similarities in contrast, luminance, and structure [49].

Fig. 1 .
Fig. 1. | Limited-view and sparse PAT image reconstruction of mouse brain vasculature.PAT sensor data acquired with a semi-circle limited-view sensor array at varying sparsity levels.a, Ground truth image used to simulate PAT sensor data.b, PAT reconstructions with 16 sensors.Vessels are difficult to identify in time reversal reconstruction as a result of artifacts.c, PAT reconstructions with 32 sensors.Vessels can be clearly seen in CNN-based and iterative reconstructions.d, PAT reconstructions with 64 sensors.Larger vessels are identifiable in all reconstructed images.

Fig. 2 .
Fig. 2. | Limited-view and sparse Pixel-DL and mDirect-DL PAT image reconstructions.PAT sensor data acquired with 32 sensors and a half-circle view.a, CNNs were trained and tested on images of the synthetic vasculature phantom.Both CNN-based approaches successfully reconstructed the example synthetic vasculature phantom image b, CNNs were trained on images of the synthetic vasculature phantom but tested on mouse brain vasculature images.mDirect-DL failed to reconstruct the example mouse brain vasculature image and performed worse than time reversal.

Fig. 3 |
Fig. 3 | Summary of CNN-based deep learning approaches for PAT image reconstruction.The primary task is to reconstruct an essentially artifact-free PAT image from the acquired PAT sensor data.a, PAT sensor data acquired using a sensor array with 32 sensors and semi-circle limited-view.b, Initial image reconstruction with sparse and limited-view artifacts using time reversal for Post-DL.c, 3D data array acquired after applying pixel-wise interpolation for Pixel-DL.d, Sensor data interpolated to have matching dimensions as the final PAT image for mDirect-DL.e, Desired artifact-free PAT image reconstruction from the CNN-based deep learning approaches.

Fig. 4 |
Fig.4| FD-UNet CNN Architecture.The FD-UNet CNN with hyperparameters of initial growth rate,  1 = 16 and initial feature-maps learned,  1 = 128 is used for PAT image reconstruction.Essentially the same CNN architecture was used for each deep learning approach except for minor modifications.a, Inputs into the CNN for each deep learning approach.The Post-DL CNN implementation used residual learning which included a skip connection between the input and final addition operation.The initial Pixel-DL input contains "N" featuremaps corresponding to the number of sensors in the imaging system.b, The FD-UNet is comprised of a contracting and expanding path with concatenation connections.c, The output of the CNN is the desired PAT image.In Post-DL, residual learning is used to acquire the final PAT image.

Fig. 5 |
Fig. 5 | Pixel-Wise Interpolation Process.a, Schematic of the PAT system for imaging the vasculature phantom.The red semi-circle represents the sensor array, and the gray grid represents the defined reconstruction grid.The first sensor (S1) is circled and is used as an example for applying pixel-wise interpolation to a single sensor.b, The acquired PAT sensor data.c, Pixel-interpolated data after applying pixel-wise interpolation to each sensor based on the defined reconstruction grid.d, Sensor data for S1.Color represents the distance from the sensor.e, Calculated distance for each pixel from the location of S1. f, Values are mapped from the S1 sensor data to the reconstruction grid based on the distance of each pixel from the location of S1.

Table 1 :
Average PSNR and SSIM for Micro-CT Mouse Brain Vasculature Testing Dataset (N = 250 testing images)

Number of Sensors Time Reversal Post-DL Pixel-DL Iterative Reconstruction Pixel-DL 16
For each row, PSNR is shown as normal text on top while SSIM is shown as italicized text on the bottom.