Utilising low-cost, easy-to-use microscopy techniques for early peritonitis infection screening in peritoneal dialysis patients

Peritoneal dialysis (PD) patients are at high risk for peritonitis, an infection of the peritoneum that affects 13% of PD users annually. Relying on subjective peritonitis symptoms results in delayed treatment, leading to high hospitalisation costs, peritoneal scarring, and premature transition to haemodialysis. We have developed and tested a low-cost, easy-to-use technology that uses microscopy and image analysis to screen for peritonitis across the effluent drain tube. Compared to other technologies, our prototype is made from off-the-shelf, low-cost materials. It can be set up quickly and key stakeholders believe it can improve the overall PD experience. We demonstrate that our prototype classifies infection-indicating and healthy white blood cell levels in clinically collected patient effluent with 94% accuracy. Integration of our technology into PD setups as a screening tool for peritonitis would enable earlier physician notification, allowing for prompt diagnosis and treatment to prevent hospitalisations, reduce scarring, and increase PD longevity. Our findings demonstrate the versatility of microscopy and image analysis for infection screening and are a proof of principle for their future applications in health care.


Supplementary Note 2. OpticLine use without disposable across Fresenius PD tubing set.
While many of our experiments used the standard Baxter PD tubing set, the Fresenius PD tubing set can also be used. The Fresenius tubing set includes a transparent bubble in its drainage line. We directly imaged across this bubble using the OpticLine clamp without the disposable and learnt that the focal plane of our ball lens lies inside the thick bubble material, preventing us from acquiring quantifiable images of flowing cells. We therefore replaced the ball lens with a 140x zoom compound lens: we swapped the standard Raspberry Pi V2 camera and printed circuit board for the Raspberry Pi HQ printed circuit board, which can be attached to any off-the-shelf microscope lens via a C-mount, and then attached a 140x zoom microscope lens directly to OpticLine. This lens's longer back focal length of 100 mm allowed us to image across the thick Fresenius tubing bubble material and acquire quantifiable images. We determined the number of cells in the viewing window per concentration and tested for statistically significant differences across concentrations (Supplementary Fig. 3). As in the regression analysis performed with the OpticLine spherical lens and disposable, these counts could eventually be translated to concentrations by taking more images across several patient samples and timepoints to train a new prediction model for this set-up.

Supplementary Note 3. Hough transform cell counting method.
One of the techniques we tested before developing the present image batch analysis algorithm was the Hough transform (OpenCV 2021), a common functionality that searches for circles within an image (Supplementary Fig. 5). This open-source function, available from many third parties, follows all of the contours of the image once a mask is produced. It then draws circles of variable sizes, as defined in the input parameters, along every contour over the entire image. Where these circles overlap in significant amounts, the function deems the regions circle centres and outputs their centroids. Because cells move at high speed across these frames, and because we are limited by the shutter speed of OpticLine's camera, many of our images contain cells that are not round but rather elliptical or even linear. We learnt that the Hough transform over-counted some regions of the images because the function has a high sensitivity. During validation, we realised that these regions consisted primarily of noise from turbulence associated with higher cell counts. This produced a larger dynamic range that helped the device discern variable concentrations with statistical differences. We were satisfied that the device could discern different concentrations of cells with the Hough transform, but dissatisfied that it was counting not only cells but also the turbidity of the fluid associated with higher cell counts. Therefore, we pursued the present image batch analysis algorithm for its higher specificity for cells. We also performed the classifications of WBC concentration predictions using the Hough algorithm count outputs (Supplementary Fig. 5). Interestingly, even though the Hough algorithm had slightly better training performance (R² of 0.94 vs. 0.93 for the current image batch analysis algorithm), the predictions using the Hough-trained linear regression model and count outputs resulted in a greater false positive rate (0.08), a greater false negative rate (0.21), and lower accuracy (0.87), which further supports the development of our in-house image batch analysis algorithm.
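The voting mechanism described above can be sketched in a few lines. The following is a toy, single-radius reimplementation for illustration (function and parameter names are ours), not the OpenCV `cv2.HoughCircles` routine used in the study; note how an elongated or smeared cell spreads its votes and can trigger spurious centre detections, which is the over-counting behaviour described above.

```python
import numpy as np

def hough_circle_centres(edge_mask, radius, vote_thresh):
    """Minimal circular Hough transform for a single candidate radius.

    Every edge pixel votes for all centres lying `radius` away from it; cells
    of the accumulator that collect at least `vote_thresh` votes are reported
    as circle centres. Toy sketch of the voting scheme only.
    """
    h, w = edge_mask.shape
    acc = np.zeros((h, w), dtype=int)
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    ys, xs = np.nonzero(edge_mask)
    for theta in thetas:
        cy = np.round(ys - radius * np.sin(theta)).astype(int)
        cx = np.round(xs - radius * np.cos(theta)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)   # accumulate votes
    return np.argwhere(acc >= vote_thresh)    # (row, col) centre candidates
```

In practice the production OpenCV implementation adds a Canny edge stage, gradient-directed voting, and multi-radius search, but the accumulator principle is the same.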

Supplementary Note 4. Cell detection performance in context of peritonitis screening.
When quantifying the performance of our image batch analysis algorithm for cell detection against manual cell counts, precision and accuracy tended to increase with increasing concentration (up to 73% and 31%, respectively), while lower concentrations had greater sensitivity (up to 45%). Greater sensitivity at lower concentrations is helpful for our infection-screening purposes: it means we do not undercount, and thus underpredict, WBC concentration, falsely classifying potentially infection-indicating cases as healthy. Conversely, greater precision at higher concentrations means we are less likely to overpredict WBC concentration when it is already high. We recognise that these performance results are relatively low for cell detection. However, the focus of our screening tool is to discern healthy effluent from effluent at risk for infection, so we prioritise how well we can predict WBC concentration for binary classification rather than individual cell detection. See Figure 4 for classification accuracy results.
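As a concrete illustration of how precision and sensitivity can be scored for cell detection, the sketch below greedily matches detected centroids to manually labelled ones within a pixel tolerance. The matching rule and tolerance are our assumptions for illustration, not necessarily the exact procedure behind the reported figures.

```python
import numpy as np

def match_detections(pred, truth, tol=5.0):
    """Greedily match predicted cell centroids to manual (ground-truth) ones.

    A prediction counts as a true positive if it lies within `tol` pixels of
    an as-yet-unmatched ground-truth centroid. Returns (tp, fp, fn), from
    which precision (tp / (tp + fp)) and sensitivity (tp / (tp + fn)) follow.
    """
    remaining = list(truth)
    tp = 0
    for p in pred:
        if not remaining:
            break
        d = [np.hypot(p[0] - t[0], p[1] - t[1]) for t in remaining]
        i = int(np.argmin(d))
        if d[i] <= tol:
            tp += 1
            remaining.pop(i)      # each ground-truth cell is matched once
    fp = len(pred) - tp           # detections with no nearby manual label
    fn = len(remaining)           # manual labels missed by the algorithm
    return tp, fp, fn
```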
Supplementary Note 5. Nested cross-validation approach to training/validation of computational pipeline.
We performed the image batch analysis algorithm with 96 combinations of parameter values to output cell counts for images spanning six concentrations (healthy baseline to 120 WBCs/mm³) in effluent from three patients. We focused on these lower concentrations to train our prediction model on critical ranges for infection classification, and included multiple patient effluents both to increase the dynamic range during training and to make our model more robust to different patients. We randomly selected two different batches of five images out of twenty batches per concentration for each patient effluent sample, for a total of 180 images.
Due to the linear nature of our cell-counting algorithm (Fig. 2b), we assumed a linear relationship between image cell counts and concentration. To select the parameter set that had the best concentration predictions, we performed k-fold (k=6) cross-validation, holding out one set of five-image batches across the concentrations (Supplementary Fig. 8a). Within each cross-validation iteration, we trained 96 ordinary least squares linear regression models, one for each set of algorithm count outputs from the different parameter combinations, after eliminating outliers by training only on the interquartile range (IQR) of the algorithm counts at each concentration in each fold. In contrast to the present method, we regressed counts onto concentration (counts = a*concentration + b) so that our independent variable would be our "ground-truth" variable, and only corrected for negative concentration predictions on averages, to minimise bias instead of variance. We calculated the R² on the IQR of the held-out test set of algorithm counts. We selected the fold-averaged linear regression model and parameter set that yielded the highest fold-averaged test R², which was 0.76 (Supplementary Table 3).
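The fold-level training step can be sketched as follows: trim each concentration's counts to their IQR, then fit counts = a*concentration + b by ordinary least squares. This is a minimal sketch with an illustrative data layout; variable and function names are ours.

```python
import numpy as np

def iqr_trim(counts):
    """Keep only counts inside the interquartile range [Q1, Q3]."""
    q1, q3 = np.percentile(counts, [25, 75])
    return counts[(counts >= q1) & (counts <= q3)]

def fit_counts_on_concentration(concs, counts_by_conc):
    """OLS regression of counts onto concentration (counts = a*conc + b),
    after IQR-based outlier removal at each concentration."""
    xs, ys = [], []
    for conc, counts in zip(concs, counts_by_conc):
        kept = iqr_trim(np.asarray(counts, dtype=float))
        xs.extend([conc] * kept.size)
        ys.extend(kept.tolist())
    a, b = np.polyfit(xs, ys, 1)   # slope, intercept
    return a, b
```

Because concentration is the independent variable here, predicting a concentration from a new count later requires inverting the fitted line, i.e. conc = (count - b) / a.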
To evaluate the performance of this parameter-set selection method, we performed nested k-fold cross-validation with an outer k=6 layer and an inner k=5 layer, so that one set of five-image batches was held out in each cross-validation layer (Supplementary Fig. 8b). Performing the same parameter-set selection method described above in the inner cross-validation layer, we used the cross-validated linear regression model for the best parameter set to predict on the outer layer's IQR of the test set. The inner parameter-set selection yielded the same best parameter set in every outer cross-validation iteration, and this was also the parameter set selected by the single-layer cross-validation (parameter-set selection). The average R² for the outer-layer test sets was 0.77, similar to the final R² of the parameter-set selection method (Supplementary Table 3), suggesting that our parameter-set selection method was robust.
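The structure of this nested procedure, with model selection confined to the inner layer and an untouched outer hold-out reserved for evaluation, can be sketched generically; the `fit`/`score` callables and the batch layout below are our assumptions, not the study's actual pipeline.

```python
import numpy as np

def nested_cv(batches, param_sets, fit, score):
    """Nested cross-validation: the inner loop picks the best parameter set
    using the training batches only; the outer loop scores that choice on a
    batch the selection never saw. Returns the mean outer-layer score."""
    outer_scores = []
    for i in range(len(batches)):                      # outer layer
        train = batches[:i] + batches[i + 1:]
        best_param, best_inner = None, -np.inf
        for p in param_sets:                           # inner layer: selection
            inner = [score(fit(train[:j] + train[j + 1:], p), train[j])
                     for j in range(len(train))]
            if np.mean(inner) > best_inner:
                best_inner, best_param = np.mean(inner), p
        model = fit(train, best_param)                 # refit on full outer-train
        outer_scores.append(score(model, batches[i]))
    return float(np.mean(outer_scores))
```

The outer score estimates the performance of the whole selection procedure, not of any single model, which is why it can legitimately be compared against the single-layer R².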

We also had two independent raters each manually label the 180 total images for cells, to evaluate how well our cell-counting algorithm performs against the ground truth of manual counts. We evaluated one rater's counts, the other rater's counts, and the average of their counts in place of the 96 parameter sets for both the single-layer parameter-set selection and the nested cross-validation methods. For the single layer, the best manual count set was one rater's, which had a fold-averaged test R² of 0.74, very similar to that obtained for our best parameter set (Supplementary Table 3). Although the best parameter set's counts were generally high across concentrations (i.e., in the 100s for 0 WBCs/mm³), they had the highest correlation with the best rater's counts as well as the raters' average counts, suggesting that our algorithm performs similarly to human raters. However, when performing the nested cross-validation, there was less consistency in the best-performing count set, and the average outer-layer test R² was 0.70, lower than that obtained from evaluating our algorithm counts (Supplementary Table 3). This suggests that while the parameter-set selection method and correlations demonstrated similarity between the manual and algorithm counts, we have less confidence in the single-layer results with the manual counts, and thus our algorithm generalises even better than human raters.
When using the single-layer, fold-averaged regression model for the best-performing parameter set to predict on the effluent test samples, we found that the model under-predicted WBC concentrations more than the current model does and was less able to classify the effluent samples into non-infection and infection-caution. We then tried performing the parameter-set selection and nested cross-validation pipelines on our original set of 100 images for six concentrations (healthy baseline to 300 WBCs/mm³) of one patient's effluent. While this yielded a new best parameter set with more realistic counts (i.e., in the 10s to 20s for 0 WBCs/mm³) and higher parameter-set selection and nested cross-validation R² values (0.93 and 0.92, respectively), the concentrations were still under-predicted, resulting in a greater number of false negatives than with our current model. When identifying what the linear regression model would be with our current algorithm counts under the parameter-set selection method described here, the false negative rate decreased, but the false positive rate increased. Although we aim to minimise the false negative rate, we decided to continue developing the current prediction model, which had the best balance between sensitivity and specificity. We also believe that our current parameter-set selection method, based on comparison to human raters' counts, is more intuitive for the purposes of our publication.

While the approach of selecting the best parameter set by implementing the linear regression training within the selection process, and using nested cross-validation to evaluate the parameter-set selection method, yielded promising results, we hypothesise that there was ultimately too much variability among the algorithm counts relative to our training set size to obtain robust regression models when testing on counts from new images. Making our counting algorithm more robust to non-cell artifacts and increasing our training set size could yield better prediction performance for models resulting from the nested cross-validation approach.

Supplementary Note 6. Pilot market research and human factors study.
We conducted a pilot human factors study, approved by the Institutional Review Board at Stanford, to collect device feedback from our key stakeholders as early as possible in our development timeline. Our team of physicians and engineers developed a 15-minute anonymous online questionnaire using Google Forms, administered from December 2019 to March 2020 to past, present, and prospective PD patients and caretakers, as well as professionals working with PD, including nephrologists, nurses, and researchers. The questionnaire asks PD background questions, such as the number of peritonitis episodes and how long it takes to set up PD, before leading into questions about three different design prototypes we were considering at the time, one of which was the present design. For participants recruited at Lucile Packard Children's Hospital, we additionally collected setup times for 3D-printed, non-functioning prototypes. There were six current and past PD patient and caretaker respondents and 19 professional respondents; the majority of the latter group believed OpticLine would reduce the time from infection to treatment (74%), patients' worries and fear (79%), and their own stress and workload (63%).

Supplementary Note 7. Additional screening metrics.
In clinical use, OpticLine would determine the screening result using thousands of images from multiple drains throughout the PD session. In our benchtop experiments, we created test infection samples and acquired the number of images that would be taken in a single drain (n=100). Thus, the present results demonstrate that we can achieve accurate screenings even from a single drain's worth of data, assuming that the test samples represent the average effluent fluid content of an entire PD session. Further clinical testing could identify other summary statistics, such as the mean or maximum WBC concentration prediction, that provide additional information for screening. While the standard of care is a single WBC count at the end of a PD session, OpticLine tracks changes in WBC concentration throughout a session; notably, our device may be able to notify patients of potential infection before the end of their PD session.
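As an illustration of how per-image predictions could be pooled over a session, the sketch below summarises a stream of predicted WBC concentrations into a mean, a maximum, and a binary flag. Both the choice of summary statistics and the 100 WBCs/mm³ cutoff are illustrative placeholders of ours, not OpticLine's clinical decision rule.

```python
import numpy as np

def screen_session(predicted_concs, threshold=100.0):
    """Pool per-image WBC concentration predictions into one screening result.

    The summary statistics and the threshold (WBCs per cubic millimetre) are
    illustrative assumptions, not a validated clinical cutoff.
    """
    preds = np.asarray(predicted_concs, dtype=float)
    return {
        "mean": float(preds.mean()),
        "max": float(preds.max()),
        "flag_infection_caution": bool(preds.mean() >= threshold),
    }
```

Because the predictions arrive throughout the session, such a summary could be recomputed after each image batch, enabling the early, mid-session notification described above.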

Supplementary Note 8. Spectrophotometry cell counting method.
Our original solution focused on developing a spectrophotometric approach to detecting the concentration of WBCs in peritoneal dialysis (PD) patients' effluent fluid as an early indication of peritonitis (Supplementary Fig. 11). Our initial experiments measuring the optical absorbance of WBCs with the NanoDrop 2000 determined that the peak absorbance was at 265 nm, so we built the OpticLine prototype to measure the optical absorbance of our fluid samples at this wavelength. We tested our spectrophotometer works-like prototype using a range of WBC concentrations diluted in Dulbecco's phosphate-buffered saline (DPBS), mock effluent fluid, and PD patient effluent fluid. OpticLine measurements of mock effluent with added WBCs showed a significant difference between infection-indicating (132 WBCs/mm³) and non-infection WBC levels (10 WBCs/mm³) (Supplementary Fig. 11).
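For context, a Beer-Lambert-style calibration, fitting a line from absorbance standards at 265 nm and inverting it to predict concentration, might look as follows. The function names and numbers are illustrative, and patient-to-patient variability in baseline absorbance (an offset that differs per sample) is exactly what defeats such a single-channel approach.

```python
import numpy as np

def absorbance_calibration(concs, absorbances):
    """Fit A = m*c + A0 (a Beer-Lambert-style linear model) to calibration
    standards and return a function mapping a measured absorbance back to an
    estimated WBC concentration. Illustrative sketch only."""
    m, a0 = np.polyfit(concs, absorbances, 1)
    return lambda a: (float(a) - a0) / m
```

If the true baseline A0 shifts between patients or sessions while the calibration is fixed, the inverted estimate shifts by the same amount divided by m, which is the confound described below.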
However, when we tested PD effluent, we discovered that the effluent fluid of one of the four patients in the study confounded our results, preventing the OpticLine spectrophotometer prototype from distinguishing between infected and non-infected WBC concentrations. We concluded that spectrophotometry alone is not an optimal method for detecting peritonitis in PD patients due to the variability in fluid absorbance that may occur not only across patients, but also between PD sessions for a single patient. Although spectrophotometry used alone could lead to variable measurements, the technology could be used in tandem with microscopy (the focus of our current WBC detection method) to allow for more accurate measurements with a multi-check system.

*All other responses, where percentages do not add up to 100% for a category, were "prefer not to disclose." **The first five geographical locations refer to those in the U.S. Percentages were calculated from those who reported a state or not living in the U.S. (n=88); n=90 reported living in the U.S.