Fluorescence lifetime imaging with a megapixel SPAD camera and neural network lifetime estimation

Fluorescence lifetime imaging microscopy (FLIM) is a key technology that provides direct insight into cell metabolism, cell dynamics and protein activity. However, determining the lifetimes of different fluorescent proteins requires the detection of a relatively large number of photons, hence slowing down total acquisition times. Moreover, there are many cases, for example in studies of cell collectives, where wide-field imaging is desired. We report scan-less wide-field FLIM based on a 0.5 MP resolution, time-gated Single Photon Avalanche Diode (SPAD) camera, with acquisition rates up to 1 Hz. Fluorescence lifetime estimation is performed via a pre-trained artificial neural network with 1000-fold improvement in processing times compared to standard least squares fitting techniques. We utilised our system to image HT1080—human fibrosarcoma cell line as well as Convallaria. The results show promise for real-time FLIM and a viable route towards multi-megapixel fluorescence lifetime images, with a proof-of-principle mosaic image shown with 3.6 MP.


Scientific Reports
| (2020) 10:20986 | https://doi.org/10.1038/s41598-020-77737-0 www.nature.com/scientificreports/ square array detector 21 . Therefore, large detectors, such as the SPAD array used in this work, can be important for imaging dim samples which are often encountered in biologically relevant experiments. Wide-field FLIM is typically realised using TCSPC in a system with microchannel plate-based gated optical intensifiers combined with a sensor capable of resolving signal position, such as a Charge-Coupled Device (CCD) camera 22,23 . An emerging alternative to aforementioned intensifier based systems are Single Photon Avalanche Diode (SPAD) arrays manufactured with complementary-metal-oxide semiconductor (CMOS) technology 24 , which can operate with TCSPC [25][26][27][28][29][30][31] or time-gated 32-37 acquisition mode. The main advantages of SPAD arrays over conventional CCD/CMOS cameras are the picosecond temporal resolution, and single-photon sensitivity 38 , which make them ideal for a broad range of applications in the area of ultrafast time-resolved imaging 39 . Until recently, SPAD arrays had a relatively limited number of 'active areas' (or 'pixels') due to the physical constraints imposed by the need to fit complex timing electronics for each individual pixel on the same chip. Nonetheless, recent technological advances have led to SPAD sensors with formats comparable to intensified CCDs. Prior to the development of the SPAD array used in this work 34 , a 512 × 512 SPAD array (SwissSPAD2 33 ) was the largest SPAD array available (recent developments in SPAD detectors are reviewed in 40 ). We note that generally speaking, gated SPAD cameras offer higher pixel fill-factor and therefore better photon detection probability compared to TCSPC SPAD cameras that require more complex on-pixel electronics. On the other hand, gated cameras require digital scanning of the gate to achieve temporal resolution, therefore lengthening the total acquisition times. There is therefore always a trade-off between the requirement to scan the gate (for a gated camera) and low pixel fill-factor (for a TCSPC camera). When building a megapixel SPAD camera, one must also consider the advantage of higher quality readout obtained by simplifying the electronics.
In this work, we demonstrate 0.5 megapixel (500 × 1024 pixels) wide-field FLIM microscopy with a SPAD array, while protein fluorescent lifetimes are extracted directly from the data via a bespoke artificial neural network (ANN). We illustrate the applicability of the 0.5 MP SPAD array for FLIM imaging of Convallaria samples at 1 Hz acquisition rate, and also in experiments with samples relevant in cancer research, such as the human fibrosarcoma (HT1080) cell line. Finally, we show the potential to extend this approach to ultra-wide fields-ofview by acquiring 8 tiles with the SPAD array, thus providing a 3.6 MP lifetime image of Convallaria.

megapixel wide-field SPAD array
The SPAD array is described in detail in Ref. 34 and in the "Methods". The camera operates in gated mode, i.e. the sensor is sensitive to incoming photons for a fixed duration gate of ∼ 3.8 ns that has, to good approximation, a super-Gaussian profile (see "Methods"). Figure 1a schematically explains how the 3.8 ns gate is scanned in steps that can be as small as 36 ps. At each step, a binary image (frame) of the spatially resolved photon counts is recorded. Stacking together all of these frames therefore provides a temporally-resolved spatial image of the The ANN architecture (see "Methods") consists of one input layer (IL), one output layer (OL), and a series of hidden layers (HLi, with i = 1, 2, 3 ). Each of these layers consists of a fully-connected dense layer (dark blue) followed by with rectified linear unit (ReLU) activation function (light blue). The input layer is fed with the fluorescence decay signal recorded by a single pixel of the SPAD array. www.nature.com/scientificreports/ sample where the data along the temporal axis is the convolution of the lifetime response with the camera temporal gate. The samples are imaged on to the camera with 0.47 µm/pixel spatial sampling for our all data, with the exception of Fig. 2g-i that has 33 µm/pixel spatial sampling with (see "Methods").

Lifetime retrieval
Irrespective of the imaging modality used for FLIM measurements, extracting lifetime information is not a trivial matter 42 . The measured signal, f(t), is the convolution of the impulse response function (IRF) and the fluorescence decay of the fluorophore, g(t), can be expressed as: where t i is the time of the i-th sampling of the signal. A plethora of algorithms have been developed in order to tackle the problem of retrieving fluorescence decay (reviewed in 42 ), with perhaps the most common being least-squares (LSQ) deconvolution (sometimes referred to as 'reconvolution'). In this approach a model of the fluorescence decay is convolved with the IRF and compared to the measured data using LSQ minimisation. The 'best fit' yields a set of parameters, including the lifetime (described in detail in "Methods"). However, LSQ minimisation-based lifetime estimation is typically very demanding computationally, even with the reduction of computational times provided by Graphical Processing Units (GPUs) 43 . Recently, a rapid non-fitting method called center-of-mass method (CMM) has been improved by accounting the background contribution 44 . However, CMM relies on the explicit assumption that the IRF is much shorter than the lifetime measured 45 , which does not apply for example when using gated cameras with gate times that are similar to the expected lifetimes, as in this work. Alternatively, fast visualisation methods such as phasor analysis have been proposed 46 , and have been successfully used for time-gated SPAD array FLIM data analysis 47 . In addition to the above-mentioned numerical approaches, advances in machine learning (ML) methods 48 has enabled researchers to utilise deep learning (DL) frameworks to extract the exponential decay time and component fraction information from FLIM data rapidly and without fitting 49,50 . Here we employ an ANN to retrieve the lifetime at each pixel of the SPAD array. The ANN layout is shown in Fig. 1b: the input layer (IL) is a one-dimensional array corresponding to the time-resolved www.nature.com/scientificreports/ photon-count signal for a given pixel. This is connected to the output layer (OL), which provides a single value for the fluorescence lifetime decay constant τ , through 3 fully-connected dense layers of decreasing size. The ANN is trained on computer generated data that is created by taking a range of (mono) exponential lifetime decays (see Fig. 1a) that are chosen in the range τ = 0.5 − 5 ns and convolved with a super-Gaussian of width that varies in the range 3.6-6 ns. The ANN was then tested on both simulated data not used in the training and also on actual experimental data.
In the latter case, the retrieval was compared to the results from the LSQ deconvolution. We found that in order to retrieve precise values of τ from test data provided by the camera it was necessary to also include noise in the training data. The best results were obtained assuming two sources of noise: a Poisson-distributed component that is proportional to the actual signal, as expected for a photon detection process and also used in Ref. 50 and a Gaussian component, see "Methods". As shown in what follows, the ANN is applied to each individual pixel and can provide very similar results to a standard LSQ deconvolution albeit with a retrieval time of ∼ 8 s for a megapixel image, corresponding to 1000× gain in speed, if compared to our LSQ approach (the mean absolute difference between the two methods for the results shown in Fig. 2 was 0.14 ± 0.12 ns) and is thus a key component in rendering megapixel FLIM a real-time technique.

Results-0.5 megapixel wide-field FLIM
To illustrate the applicability of our SPAD array for FLIM data acquisition, we imaged a Convallaria sample with 'high photon counts' (HPC) at a 10 s acquisition rate (Fig. 2a-c) and with 'low photon counts' (LPC) at a 1 second acquisition rate (Fig. 2d-f). The HPC data-set was obtained by using a 504 ps gate shift and exposure of ≈ 330 ms. In order to achieve 1 Hz acquisition, we reduced the exposure to ≈ 33 ms per frame. Figure 2 shows that lifetime data can be retrieved for both the HPC and LPC data. However, LPC data analysis is more challenging due to the lower signal-to-noise ratio (SNR). In the examples shown here, the total photon count in the LPC data falls below 2700 photons per pixel, whereas the HPC data exceeds 8500 (the intensity values in Fig. 2 are clipped to make dimmer structures more visible). Nonetheless, both the LSQ and ANN methods recovered similar mean lifetime values for both HPC and LPC data. The mean lifetime and standard deviation values for the lifetime images in Fig. 2 are shown in Table 1.
One of the main benefits of the ANN is that it has the potential for being significantly faster than LSQ. Using a pre-trained model (training time ≈ 38 min on a training set of ≈ 2 million simulated decay curves), the ANN retrieval requires 2.7-3.6 s to process the full image. This is 388-1288 times faster than the LSQ method, which took 23.3-58 min in our tests (detailed times for each data-set are described in Fig. 2). We emphasise that we fit each pixel independently, and do not rely on 'global fitting' schemes, where data is averaged spatially and/or temporally 52 .
While Convallaria is a popular sample for testing FLIM systems [53][54][55][56] , the strong signal it yields is not necessarily representative, for example, of the signal level from transfected mamallian cells. To show a more practical example, we provide FLIM data of cancer research relevant samples: fixed HT1080 (fibrosarcoma) cells, transfected with pcDNA3-Clover 57 and expressing a protein with a single fluorescence lifetime (Fig. 2g,h). The HT1080 cell data was acquired using a 108 ps gate shift, with a total ≈ 400 s acquisition time.
Similarly to the Convallaria results, the ANN and LSQ results match well quantitatively (absolute difference between LSQ and ANN: 0.10 ± 0.05 ns). Our retrieved lifetime for HT1080 cells transfected with Clover is in good agreement with a previously reported value of 2.6 ns 58 . The large size of the sensor allows imaging multiple cells, at high detail, across a large field of view simultaneously. We note that the acquisition time could be decreased by increasing the gate shift and acquiring fewer time bins at the cost of reduced sampling of the fluorescence decay, and potentially less accurate lifetime recovery.

Results: 3.6 megapixel wide-field FLIM
Finally, we present a 3.6 MP image of our Convallaria sample to showcase that very large field-of-views can be achieved almost trivially using the 0.5 megapixel SPAD array (Fig. 3). The field of view in Fig. 3 is 618 × 650 µ m with the same spatial sampling of 0.33 µm/pixel as in previous figures. We acquired the data using a mosaic acquisition by moving the sample with ≈ 10% overlap between mosaic tiles. Crucially, our ANN method required only 36 s to recover lifetime information from this data-set. This retrieval time could be shortened by processing each pixel (or batches of pixels) in parallel.  Fig. 2. The LSQ and ANN lifetime retrieval methods provide similar, compatible results. We note the consistency in the lifetimes for HPC and LPC data, which shows that we can retrieve reliable lifetimes even at relatively low photon counts.

Conclusions
We have demonstrated the application of the largest-to-date time-gated SPAD array for fluorescence lifetime imaging of biologically relevant samples. By exploiting the ability to vary the gate-shift size between different exposures of the camera, we showed that wide-field FLIM at 0.5 megapixel resolution is possible at 1 Hz acquisition speed. By performing a spatial mosaic acquisition, 3.6 MP fluorescent lifetime images are readily available. These could be scaled to even larger fields of view and shorter acquisition times by using bespoke and rapid translation stages. In this work we retrieve mono-exponential lifetimes as this has less stringent SNR requirements compared to multi-exponential fitting 42,59 . However, future work could of course extend this multi-exponential decays.
While fast analysis methods such as the phasor approach 46 , or methods utilising advances in machine learning 49,50 , including our own ANN, are important parts of a high-speed FLIM systems, the biggest impact on imaging low-signal biologically relevant structures at sub-micron resolution at large FOV is delivered by the continuous improvement of SPAD array technology with e.g. increases also in fill factor and quantum efficiency 60 .

Methods
Time-gated imaging. The imaging is based on a 1 MP SPAD image sensor 34 . In its current version, the megapixel array accommodates multiple functionalities, which leads to only half of the sensor being read out in the gated mode used here 34 . The camera has 7% fill factor and 10% photon detection probability at 510 nm. To acquire the fluorescence lifetime of a sample, the camera is operated in the time-gated mode. Laser pulses are www.nature.com/scientificreports/ repeatedly exciting the sample; the photons emitted due to fluorescence will be detected by the camera in multiple frames with shifting gate window, resulting in an image sequence per acquisition. For each frame with a fixed gate position, 255 binary frames are summed to create an 8-bit image (9-bit images are obtained by summing two 8-bit registers). To obtain the full characteristic of a single-exponential fluorescence decay, the starting gate position is tuned to be prior to the laser excitation. The gate with average width of 3.8 ns over all pixels is then shifted finely between each frame by a fixed time. Illustration of gate-shifting is shown in Fig. 1a. We note that we pre-process the data by performing background subtraction and pile-up correction, where the latter is accounted for, following equation 1 in Ref. 47 and adopting the same nomenclature: where I corr is the pile-up corrected counts, I max is the maximum possible photon count (depending on the bit depth e.g. 255 for 8-bit data), and I rec is the actual recorded value at a particular pixel. The background is removed by taking the average of first few frames, before the decay signal is observed, and subtracting it from all the frames. We threshold out any pixels that have fewer than N tot photon counts in total (i.e. integrated over all time gates) in order to eliminate pixels that have no significant signal. Photon counts drop towards the left hand side of the sensors due to a decay in the strength of the electronic drivers that distribute the signal controlling the time gates. We account for this non-uniform response of the SPAD array 34 with a 'sliding thresholding' of the data, going from N tot = 1300 (right hand edge of the image) to N tot = 50 (left hand edge).
We note that despite the 7% fill factor, the small pixel pitch of 9.4 µ m (corresponding to a pixel active circular area of 6.18µm 2 ) of the sensor provides sufficient sampling for microscopy at common magnification range.
Data analysis-LSQ deconvolution. As briefly explained in the introduction in the main text, the measured data is a convolution of the impulse response function (IRF) and the underlying fluorescence decay (see Eq. 1). Technically, the IRF of a FLIM system depends on the excitation source and the detector (in our case, the gate length of each SPAD in the array). However, since the nominal pulse width of our laser pulse (<47 ps) is significantly smaller than the gate length of our SPADs (average 3.8 ± 0.2 ns), the contribution to the IRF from our laser source is negligible.
We modelled the IRF as a generalised Gaussian function (or super-Gaussian) at each pixel b as where t is time, t ′ is the position of the gate, N the Gaussian order, and w N is given by the full width at half maximum (FWHM) of the gate through the following relation: We measured that the average 10% to 90% intensity rise time for the gate (evaluated over all the pixels) is (0.55 ± 0.08) ns 34 . The order N of the super-Gaussian is chosen such that it matches the actual measured profiles 34  where τ is the decay constant (i.e. lifetime), t 0 is a temporal offset, b is a constant that accounts for a signal offset induced by a non-zero background and A 0 is an amplitude parameter corresponding to the number of photon counts. Following Eqs. (3) and (5), we model the temporal response of our detection system through the function where ⊛ stands for mathematical convolution. We then apply a LSQ optimisation to the measured data that provides an estimate of [ w, t 0 , τ , b, A 0 , t ′ ]. Computations were implemented using MATLAB R2017b using lsqcurvefit function.
Data analysis-artificial neural network. We use a custom ANN consisting of an input layer (IL), an output layer (OL), and three hidden layers (HLi, with i = 1, 2, 3 ) connecting IL with OL, as depicted in Fig. 1b. Each of these layers is formed by a fully-connected dense layer followed by with rectified linear unit (ReLU) activation function. The IL (with n 0 = 200 nodes) is fed with a fluorescence decay signal (normalised to the range [0, 1]), i.e. with a 1D vector with as many elements as the number of gate shifts (200 in our work). Then, the output of the IL is fed in cascade through the ANN, while the number of nodes of each subsequent HLi is decreased to: n 1 = 100 , n 2 = 50 , and n 3 = 25 . Finally, the OL provides an estimation of the lifetime τ of the fluorescence decay. Computations were carried out using Tensorflow 1.14. Standard machine learning techniques require training with data-sets of sufficient size and quality. To this aim, we train our ANN only with synthetic data, i.e. with fluorescence decay curves-lifetime pairs that are generated numerically. Our training set consisted of 2 million datapoints. We trained using mini-batch gradient descent, with a mini-batch batch size of 128 using adaptive moment estimation (Adam) as our gradient descent (2) I corr = −I max ln 1 − I rec I max , w N = FWHM 2(0.5 ln 2) 1/2N .
Scientific Reports | (2020) 10:20986 | https://doi.org/10.1038/s41598-020-77737-0 www.nature.com/scientificreports/ algorithm. Our loss function was the mean squared error (MSE) between the ground truth lifetime values of our mini-batches, and the corresponding ANN predictions. In order to prevent overtraining the network, we validated the number of epochs to train for using a validation set of 1 million datapoints. We found that the models overtrained minimally, achieving a training set loss that was <10% smaller than our validation loss for all ANNs, and, upon prediction, we found that our test set loss was approximately equal to our validation loss.
Our test set comprised 1 million datapoints. For our 200 time-bin data, expressing our ground truths and predictions in terms of nanoseconds, we obtained mean squared errors of 0.0053 ns 2 on our test set. Our errors had ∼ 0 mean and 0.072 ns standard deviation, and approximately 0 expectation. In other words, our estimator was unbiased, and, averaged over all the test-set lifetimes, gave predictions within error bounds of ±72 ps. For our 30 time-bin data, we obtained a test set MSE of 0.0634 ns 2 . Our errors had approximately 0 mean, and 250 ps standard deviation.
Including variability on gate width, centre position, lifetime, and lifetime curve height, allows us to account for the fact that the various SPAD pixels have slightly different underlying electronic properties 34 . For the two cases (200 time-bin, 108 ps gate shift and 30 time-bin, 504 ps gate shift), we created two distinct neural networks, to match the input sizes of 200 and 30. For both of these neural networks, we created training, validation and test sets, all modelled according to Eqs. (1), (3), (4), and (5) . In order to facilitate good generalisation, our datasets all contained histograms modelled according to a wide range of parameters; this range was kept the same for both ANNs.
The parameter ranges were: decay lifetime τ : [0.5, 5] ns, decay start t 0 : [5,10] ns, decay amplitude, A: [2,32], gate width w n : [3.6, 6] ns. Our range of lifetimes cover the lifetime expression domain of our dye, acridine orange. Variations in decay starting point arise from the signal transmission time difference for different regions of the SPAD array. Decay intensity variations originate from various sources, such as the local concentration of fluoroscent dye caused by cellular structure. The gate width variations are inherent to the SPAD array (see 34 ).
Both LSQ and ANN deconvolution approaches retrieve fluorescence lifetimes one pixel at a time, and can therefore be used on data of any dimension. For the ANN approach we obtained a root mean squared error of 0.0725 on a test set of ≈ 1 million synthetic data. The ANN algorithm then takes (≈ 8.5 ± 0.5) s to predict the lifetimes of a 1024 × 1024 synthetic data set with 200 time bins on a Intel Core i7 10510U CPU.
Noise model. We experimentally confirmed that the noise model in our experiments was best described by a mixture of Poissonian (due to photon counting statistics) that scales as the square root of the signal and Gaussian (electronic) noise.
By analysing the first and last ten time bins (in which no fluorescence signal is present) of 3, full 0.5 MP images, containing 135 thousand non-background pixels, we found that the Gaussian noise had an approximately 0 mean and a standard deviation of around 2.3 counts. In order to add flexibility to our model, we added to each histogram Gaussian noise whose mean and standard deviation were randomly drawn from [−2, 2] and [0, 5], respectively.
Poissonian noise is constantly present in our training/testing/experimental data, by nature of the measurements. Further, we tested the robustness of our method on varying levels of background (Gaussian) noise for various lifetimes i.e. 1, 2.5 and 4 ns. For each noise distribution scenario, we kept the decay intensity parameter, A, the decay start t 0 and the gate width w n constant, and tested on 100 histograms. We found that even for Gaussian noise with a standard deviation of 10 counts/pixel, which is 4× larger than the experimentally measured standard deviation, and 2× larger than the highest training noise level, we still retrieve the correct mean lifetime, with a typical std. dev. of 130 ps and worst case std. dev. of 199 ps. This behaviour was found to be consistent over all the tested lifetimes. Figure 4 shows the lifetime retrieval performance of both the LSQ and the ANN methods. For this benchmark, we generated 9 sets of synthetic curves following the convolution model described above. Each set of curves had constant lifetime decay τ within the range [0.75-4.75] ns, and randomly varying noise from curve to curve within the set. Orange and blue dots represent the mean of the lifetime values retrieved with the LSQ and the ANN, respectively, while error bars correspond to their standard deviation. Both methods provide lifetime estimates in excellent agreement with the ground truth data.
Imaging set-up. We acquired the data on a custom-built epi-fluorescence microscope, with an Olympus 20 × 0.4 NA objective with an f = 250 mm tube lens (or Nikon 40 × 0.75 NA objective, and f = 100 mm tube lens for 1 Hz acquisition) air objective, and a FITC emission/excitation filter and dichroic mirror set. For the illumination source, we used HORIBA DeltaDiode laser diode model DD-470L, nominal peak wavelength 470 nm spectral Full Width at Half Maximum (FWHM) 10 nm, nominal pulse width of 47 ps, and a nominal pulse energy of 15 pJ. The repetition rate of the diode was set to 25 MH-7 nsz.

Mammalian cells, culturing conditions and transfections. HT1080 cells were maintained in Dul-
becco's modified eagle's medium (DMEM), supplemented with 10% foetal bovine serum (FBS), 2 mM L-glutamine and 1 × PenStrep. Cells were maintained in 10 or 15 cm TC-treated plastic dishes at 37 • C and 5 % CO 2 . HT1080 cells were transfected using Amaxa nucleofector Kit T, program L-005. Cells were transfected with 5 µ g DNA (pcDNA3-Clover) following manufacturers guidelines and replated on 6 cm TC-treated plastic dishes overnight at 37 • C, 5% CO 2 . 35 mm glass bottom MatTek dishes that were coated with laminin 10 µg/ml diluted in PBS and left overnight at 4 • C. Cells were then collected and replated onto the dishes and incubated for 24 hours at 37 • C, 5 % CO 2 . After, the dishes were washed twice with PBS before fixing with 4 % Paraformaldehyde Scientific Reports | (2020) 10:20986 | https://doi.org/10.1038/s41598-020-77737-0 www.nature.com/scientificreports/ for 10 min. These were then washed three times with PBS before slight drying. 10µ l Fluromount-G (without DAPI) was added on top of the cells and a 19 mm glass coverslip was added on top to seal the cells in the dish. Dishes were kept in the dark until imaging.

Data availability
Data relevant to the figures in the manuscript can be obtained from https ://doi.org/10.5525/gla.resea rchda ta.1012.