Statistical distortion of supervised learning predictions in optical microscopy induced by image compression

The growth of data throughput in optical microscopy has triggered the extensive use of supervised learning (SL) models on compressed datasets for automated analysis. Investigating the effects of image compression on SL predictions is therefore pivotal to assess their reliability, especially for clinical use. We quantify the statistical distortions induced by compression by comparing predictions on compressed data to the raw predictive uncertainty, numerically estimated from the raw noise statistics measured via sensor calibration. Predictions on cell segmentation parameters are altered by up to 15% and more than 10 standard deviations after 16-to-8 bit pixel depth reduction and 10:1 JPEG compression. JPEG formats with higher compression ratios show significantly larger distortions. Interestingly, a recent metrologically accurate algorithm, offering up to 10:1 compression ratio, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.


Optical calibration of a microscope camera
The pixel value recorded in a microscope image depends on both the signal (mean number of photons impinging on the pixel) and the noise. In general, the information content deriving from the signal cannot be distinguished from the entropy due to noise. During camera calibration, we project a series of specific mean photon numbers on the bare sensor pixels, i.e. we inject a signal known in advance, so that the noise model of the camera can be accurately determined. Our procedure is adapted from that of the EMVA1288 standard (ref. 1).
The inside of a polytetrafluoroethylene (PTFE) integrating sphere is illuminated by a white LED stable to better than 1 part in 10^5 over a range of output powers from 0.1 mW to 200 mW (Figure S1). The light intensity is measured through a 1 cm x 1 cm NIST-traceable photodiode, placed on the surface of the sphere and connected to a calibrated 7-digit voltmeter. The sensor is placed on the axis of the main 5 cm sphere aperture, at a distance of 1 m.
In order to calibrate the camera sensor, 1000 images are acquired for each of 200 different illumination levels, which are spaced according to a square law from complete darkness to sensor saturation.
For each pixel we can therefore map each input light level (mean photon number) to a histogram of the recorded digital pixel values, as shown in Figure S2a. From this plot, a mathematical model of the sensor response can be formulated, as described in the EMVA1288 standard (ref. 1). In particular, the relation between the standard deviation of the per-pixel noise and the pixel value can be determined (Figure S2b).
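The relation between noise and pixel value can be illustrated with a photon-transfer fit on simulated data. The sketch below is not the calibration code used in this work. It assumes the linear camera model of the EMVA1288 standard (Poisson shot noise plus Gaussian read noise), a quantum efficiency of 1, no saturation or quantization, and hypothetical values for the gain, offset and read noise. All function and variable names are illustrative.

```python
import numpy as np

def square_law_levels(n_levels, max_photons):
    """Mean photon numbers spaced quadratically from darkness to max_photons."""
    return max_photons * np.linspace(0.0, 1.0, n_levels) ** 2

def simulate_pixel(mean_photons, n_frames, gain, offset, read_noise, rng):
    """Simulated readings (in DN) of one pixel: Poisson shot noise + Gaussian read noise."""
    electrons = rng.poisson(mean_photons, size=n_frames)  # assumes quantum efficiency = 1
    return offset + gain * electrons + rng.normal(0.0, read_noise, size=n_frames)

# Hypothetical sensor parameters, chosen only for illustration.
GAIN, OFFSET, READ_NOISE = 0.5, 100.0, 2.0  # DN per electron, DN, DN

rng = np.random.default_rng(0)
levels = square_law_levels(200, 20000.0)  # 200 illumination levels
stacks = [simulate_pixel(m, 1000, GAIN, OFFSET, READ_NOISE, rng) for m in levels]

# Per-level statistics of the 1000 readings (the histogram mean and variance).
means = np.array([s.mean() for s in stacks])
variances = np.array([s.var(ddof=1) for s in stacks])

# Linear model: variance = gain * (mean - offset) + read_noise**2,
# so the slope of variance versus mean estimates the gain.
estimated_gain, _ = np.polyfit(means, variances, 1)
dark_mean, dark_variance = means[0], variances[0]  # first level is complete darkness

def noise_std(pixel_value):
    """Predicted noise standard deviation (DN) at a given mean pixel value (DN)."""
    signal = np.clip(np.asarray(pixel_value, dtype=float) - dark_mean, 0.0, None)
    return np.sqrt(dark_variance + estimated_gain * signal)

print(f"gain ~ {estimated_gain:.3f} DN/e-, read noise ~ {np.sqrt(dark_variance):.2f} DN")
```

In this simplified model the read noise is taken directly from the dark level rather than from the fit intercept, since the intercept is poorly constrained when the high-signal levels dominate the fit.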

Figure S1 Experimental setup for microscope camera calibration
A PTFE integrating sphere is illuminated by a white LED. A photodiode is connected to a calibrated 7-digit voltmeter. The microscope camera is placed on the axis of the sphere at 1 m from the 5 cm aperture of the PTFE sphere. The figure has been generated via Microsoft PowerPoint 16.53.

Training of the Random Forest (RF) algorithm for segmentation tests on PC microscopy images
To implement the method described in the present work, different machine learning models have been trained on raw data for different cellular segmentation applications.
As described in the Methods section, in order to segment PC images, we have trained the FastRandomForest algorithm via the Weka segmentation ImageJ plug-in. In this case, classifier training consisted of manually annotating single pixels of the raw image according to two classes: cell and background.
In the case of the microspheres' micrograph shown in Figure 2a, 592 pixels (0.05% of the total number of pixels) have been annotated for the cell class and 71864 (6%) for the background class. The classifier is initialized with 200 trees and is trained by using 2 random features per node, selected among the Gaussian blur, Hessian, Sobel filter and difference of Gaussians operators. Training provides an out-of-bag error of 0.05%.
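For readers who do not use ImageJ, the training configuration above can be approximated in Python. The sketch below is an analogue, not the Weka FastRandomForest pipeline used in this work. It swaps in scikit-learn's RandomForestClassifier and SciPy filters, runs on a synthetic image, and uses a hypothetical filter scale (sigma) and hypothetical annotation counts. Only the number of trees (200), the 2 random features per node and the out-of-bag error are taken from the text.

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier

def pixel_features(image, sigma=2.0):
    """Per-pixel features: Gaussian blur, Sobel magnitude, Hessian determinant, DoG."""
    image = image.astype(float)
    blur = ndi.gaussian_filter(image, sigma)
    sobel = np.hypot(ndi.sobel(image, axis=0), ndi.sobel(image, axis=1))
    # Second derivatives of the Gaussian-smoothed image (Hessian entries).
    hyy = ndi.gaussian_filter(image, sigma, order=(2, 0))
    hxx = ndi.gaussian_filter(image, sigma, order=(0, 2))
    hxy = ndi.gaussian_filter(image, sigma, order=(1, 1))
    dog = blur - ndi.gaussian_filter(image, 2.0 * sigma)  # difference of Gaussians
    features = np.stack([blur, sobel, hxx * hyy - hxy ** 2, dog], axis=-1)
    return features.reshape(-1, features.shape[-1])  # one row per pixel

# Synthetic test image (hypothetical): one bright disc on a darker, noisy background.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:64, :64]
truth = ((yy - 32) ** 2 + (xx - 32) ** 2 <= 12 ** 2).ravel()
image = rng.poisson(np.where(truth, 200.0, 100.0)).reshape(64, 64)
labels = truth.astype(int)  # 1 = cell, 0 = background

# Sparse "manual" annotation: only a few pixels per class are labelled.
cell_idx = rng.choice(np.flatnonzero(truth), size=50, replace=False)
background_idx = rng.choice(np.flatnonzero(~truth), size=500, replace=False)
annotated = np.concatenate([cell_idx, background_idx])

X = pixel_features(image)
classifier = RandomForestClassifier(
    n_estimators=200, max_features=2, oob_score=True, random_state=0
)
classifier.fit(X[annotated], labels[annotated])

oob_error = 1.0 - classifier.oob_score_          # error on out-of-bag annotated pixels
accuracy = (classifier.predict(X) == labels).mean()  # agreement with the full ground truth
print(f"out-of-bag error: {oob_error:.3%}, pixel accuracy: {accuracy:.3%}")
```

The out-of-bag error is computed only on the annotated pixels, so it can be more optimistic than the accuracy over the whole image, where boundary pixels are harder to classify.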
To assess the performance of the trained model, we have calculated the receiver operating characteristic (ROC) and the precision-recall curves on a test set (Figure S3). The confusion matrix at a threshold of 0.5 is provided in Table S1.
Table S1 (columns: Predicted: Negative, Predicted: Positive; cell values not recoverable)
The confusion matrix at a threshold of 0.5 is given by [...] and provides a JI of 99%, confirming the good quality of the trained classifier also in the case of the MPK cells' micrograph.
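As a reading aid for the confusion matrices and JI values quoted in this document, the sketch below shows how both can be computed from per-pixel class probabilities. It assumes that JI denotes the Jaccard index of the positive class, TP / (TP + FP + FN), and that a pixel is classified as positive when its probability is at least equal to the threshold. The labels and scores are made-up toy values, and the function names are illustrative.

```python
def confusion_counts(truth, scores, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for label, score in zip(truth, scores):
        predicted = score >= threshold  # assumed convention: >= threshold is positive
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def jaccard_index(fp, fn, tp):
    """Jaccard index (intersection over union) of the positive class."""
    return tp / (tp + fp + fn)

# Toy data (hypothetical): 1 = positive class, 0 = negative class.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.6, 0.2, 0.3, 0.7]

tn, fp, fn, tp = confusion_counts(truth, scores, threshold=0.5)
ji = jaccard_index(fp, fn, tp)

print("                  Predicted: Negative  Predicted: Positive")
print(f"Actual: Negative  {tn:19d}  {fp:19d}")
print(f"Actual: Positive  {fn:19d}  {tp:19d}")
print(f"JI = {ji:.2f}")
```

True negatives do not enter the Jaccard index, so a class that occupies a small fraction of the image can have a low JI even when the overall pixel accuracy is high.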

Training of the RF algorithm for segmentation tests with LS microscopy and OPT images
In the case of the voxel classification performed on the LS microscopy dataset (Figure 3a and [...]
The confusion matrix associated with the anatomy vs background classification at a threshold of 0.5, corresponding to a JI of 98%, is given in Table S3.
Table S3 (columns: Predicted: Negative, Predicted: Positive; cell values not recoverable)
The confusion matrix for the plaques vs anatomy classification at a threshold of 0.5, which provides a JI value of 31%, is given by [...]

Impact of the DP compression on resolution parameters in PC microscopy
Resolution parameters in a standard PC microscope, such as the Point Spread Function (PSF) and the Modulation Transfer Function (MTF), are measured to compare the predictive uncertainty of the tested segmentation models due to raw data noise with that originating from physical resolution uncertainties. Moreover, we estimate resolution parameters from the raw and the DP-compressed datasets to show that their values are preserved upon DP compression.
We compare the PSF of an optical microscope with a 20x objective, as extracted from the raw and the DP images of 500 nm diameter microspheres. The FWHM of the spatial profile of a single microsphere, estimated over 10 different spheres, was measured as (882 ± 36) nm and (883 ± 36) nm from the raw and the DP images, respectively (Figure S9a [...]

Impact of the DP compression on resolution parameters in light-sheet microscopy
For the same reasons discussed above, we measure the PSF of the LS microscope from a 3D stack of 796 images of 100 nm diameter microspheres. In particular, we estimate the PSF from 10 microspheres selected in the raw dataset, as well as in the DP-compressed one. As we are interested in comparing the resolution parameter estimates from the raw and the DP-compressed images, we did not implement the pre-processing pipeline used in ref. 4 and do not exclude the possibility of bead aggregates.
The lateral PSF, obtained by averaging the FWHM of the x and y spatial profiles (Figure S10a and b), is (15.4 ± 5.1) µm and (15.3 ± 5.2) µm for the raw and the DP images, respectively. The axial PSF, i.e. the FWHM of the z profile (Figure S10), is (20.1 ± 6.5) µm and (20.2 ± 6.7) µm for the raw and the DP images, respectively.
We conclude that the values from the two datasets, as well as their statistical dispersion, are in very good agreement. Moreover, also in the 3D case, the prediction spread associated with the noise of the raw data, which is around 2-3 voxels as shown in Figure 3g, is definitely smaller than that provided by the PSF of the microscope.
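The FWHM-based PSF estimates quoted above can be illustrated with a short sketch. It is not the analysis code used in this work. It assumes single-peaked, essentially noise-free profiles, removes the baseline by subtracting the profile minimum, and locates the half-maximum crossings by linear interpolation. The Gaussian bead profiles and their widths are synthetic and hypothetical, and only one profile direction is treated (a lateral value would average the x and y results).

```python
import numpy as np

def fwhm(positions, profile):
    """Full width at half maximum of a single-peaked profile."""
    positions = np.asarray(positions, dtype=float)
    profile = np.asarray(profile, dtype=float)
    profile = profile - profile.min()  # crude baseline removal (assumption)
    half = profile.max() / 2.0
    above = np.flatnonzero(profile >= half)
    first, last = above[0], above[-1]
    if first == 0 or last == len(profile) - 1:
        raise ValueError("profile does not fall below half maximum on both sides")

    def crossing(i, j):
        """Position between samples i and j where the profile equals half maximum."""
        fraction = (half - profile[i]) / (profile[j] - profile[i])
        return positions[i] + fraction * (positions[j] - positions[i])

    return crossing(last, last + 1) - crossing(first - 1, first)

# Synthetic profiles of 10 beads (hypothetical Gaussian widths, in µm).
x = np.linspace(-50.0, 50.0, 1001)
rng = np.random.default_rng(1)
sigmas = rng.normal(6.5, 0.5, size=10)
widths = np.array([fwhm(x, np.exp(-x ** 2 / (2.0 * s ** 2))) for s in sigmas])

# Report the mean and the sample standard deviation over the beads.
print(f"FWHM = ({widths.mean():.1f} ± {widths.std(ddof=1):.1f}) µm over {widths.size} beads")
```

Running the same function on profiles extracted from the raw and from the compressed stack, and comparing the two mean ± standard deviation values, mirrors the comparison made in the text.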
Figure S10 Measurement of the PSF in light-sheet microscopy obtained from raw data and after DP compression
Lateral and axial PSF obtained from the spatial profile along the x (a), y (b) and z (c) directions of a single 100 nm polystyrene microsphere imaged (insets) with a 1x objective. The figure has been generated via Python 3.7.3 and Microsoft PowerPoint 16.53.