## Introduction

A variety of spectroscopic methods have been used in many applications that range from the agro-food industry1 to biological sciences2, medical applications3,4,5, and the pharmacological industry6 among others. In particular, the use of near-infrared (NIR) spectroscopy has been reported for the detection of microorganisms in different media. For example, Yao-Zen and co-workers7 studied the use of NIR hyperspectral imaging to detect, in chicken fillets, the presence of Enterobacteriaceae, a large family of gram-negative bacteria able to cause important diseases. Similarly, Siripatrawan et al.8 reported the detection of Escherichia coli in packaged spinach. For this purpose, the authors implemented an artificial neural network (ANN) to successfully predict food contamination. While there is vast evidence that hyperspectral imaging can be used to detect relatively large microorganisms such as bacteria or fungi9, it is not clear whether these optical techniques can be successfully applied for the detection of viral particles. First, bacteria and fungi are approximately two orders of magnitude larger than viruses and tend to form colonies. Secondly, the size of viral particles typically lies slightly below the lower limit of optical wavelengths (< 400 nm). Despite the apparent limitations that may exist for the detection of viruses using optical spectroscopic imaging techniques, there are few studies that suggest its viability10,11,12,13. However, it is not clear whether these studies detected the presence of the virus or its effects on the tissues of the hosts, as the damage and tissue alterations caused during an on-going infection may appear more prominent than the virus itself at the working wavelengths.

Here, we pursued to understand whether visible and NIR (VNIR) hyperspectral imaging can be exploited to determine the presence of viral particles in a fluid suspension as well as on a surface upon complete evaporation of its water content. We analysed the diffuse optical reflectance spectra obtained from preparations of lentiviral particles pseudotyped with the glycoprotein G of the vesicular stomatitis virus (VSV) using three different approaches: classification (positive/negative) by partial least square-discriminant analysis (PLS-DA) and a feed-forward neural network (FFNN), and quantification of viral load by analysis of averaged spectra, as summarized in Fig. 1. The viral model under examination has been used to investigate a number of human diseases14 including the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)15, as it provides a safer alternative while lowering biosafety requirements in the laboratory. The system engineered here was able to not only detect the lentiviral particles but also to predict the viral load of the sample. This approach can be used for rapid and mass screening of infectious diseases, as it allows for analysing large number of samples simultaneously in a matter of seconds.

## Results

VNIR hyperspectral images were taken immediately after placing the samples (liquid droplets) on a supporting plate and after complete dry-up (dry residue). The diffuse optical reflectance spectra, converted to pseudo-absorbance (PA), were analysed following a pixel-based approach and integrated afterwards to obtain a per-droplet classification. They corresponded to 164 preparations (with an individual volume of 5 $$\upmu$$L) in two fluids: 74 samples prepared in phosphate buffered solution (PBS) and 90 in artificial saliva (AS). The numbers of pixels from samples in PBS were: 74,626 from wet droplets and 80,212 from dry residues. The numbers of pixels from samples in AS were: 92,657 from wet droplets and 94,425 from dry residues. For each fluid, the classification methods used here were trained and evaluated using the same fixed sample sets.

### Pseudo-absorbance spectra

The first step in understanding whether or not the spectra in the VNIR range could discriminate samples with viral particles from their negative controls was to obtain the overall PA spectra of the pixels from 5-$$\upmu$$L droplets. This volume was chosen sufficiently small to resemble relatively large respiratory droplets, and big enough to prevent rapid evaporation. The average number of pixels per droplet was approximately 1800 pixels. Pixels corresponding to image irregularities (e.g. reflection bright spots) were manually segmented, and the overall number of pixels per droplet reduced to roughly 1200, what represents two thirds of the total imaged surface of each sample. When placed on the supporting plate, the diameter of the droplet was approximately 2 mm; thus, assuming a 2-mm-in-diameter disc model, each pixel would represent a volume of 4 nL approximately, the volume of an infectious aerosol as reported for SARS-CoV-216. However, the information embedded within the spectrum of a given pixel does not only arise from that particular volume but also from neighbouring sites, as the contribution of sub-surface scattering plays a predominant role in providing differential information. We appreciated relevant differences in the spectral signatures between samples with lentiviral particles and their negative controls, both in PBS and AS preparations, as shown in Fig. 2a,b. Although less prominent, there were also differences in the spectra of their corresponding dry residues (Fig. 2c,d). This suggests that the spectral signatures carry relevant information about the viral content of the samples. However, the overlapping spectral signatures of the media and the targeted viral particles possess a high variability that requires multivariate analysis. Thus, to reduce the complexity of the information contained within the PA spectra, a PLS-DA model was built as described in the Methodology. This allowed to enhance the spectral differences as illustrated in the examples shown in Fig. 2e,f.

### Classification using PLS-DA

PLS-DA models included between 16 and 20 latent variables and retained from 95.20% up to 96.27% of data variance. The output of the PLS-DA was used for pixel classification. The Wilcoxon rank-sum test performed on these outputs revealed significant statistical differences between samples containing lentiviral particles and their non-transfected equivalent preparations (p-value = 0), as shown in Fig. 3a. The interquartile range overlap was more pronounced in dry residues. Nevertheless, the classifier performed better than a random classifier in all cases at pixel level, as illustrated in the receiver operating characteristic (ROC) curves shown in Fig. 3b-e. Note that a ROC curve below the diagonal line, as in Fig. 3e, represents a classifier model with an inverted output. An example of the resulting pixel classification in each case is illustrated in Fig. 3f-i. Next, to determine whether or not a droplet was taken from a sample containing viral particles, a new threshold was determined as the percentage of positive pixels within a given droplet. The ROC curves in Fig. 3j-m show an important improvement in the performance of the classifier when operating at droplet level (i.e., considering all individual pixels that constitute a given droplet), as information was gathered based on multiple observations.

The area under the ROC (AUROC), a performance metric typically used to evaluate a classifier, showed excellent accuracy in PBS preparations, both in wet droplets and dry residue (AUROC = 0.90). The accuracy of the prediction dropped importantly for samples prepared in AS. Further information on the classification performance can be found in Table 1.

### FFNN classification

In order to extract the embedded information that enables detection of the viral presence, a set of 28 spectral shape descriptors was calculated in seven spectral bands of interest and fed into the FFNN, whose output took values between 0 and 1 in the same fashion as in the PLS-DA. Similarly, the Wilcoxon rank-sum test showed strong differences among the outputs of the FFNN classification obtained from preparations containing lentiviral particles and the negative controls used here, as illustrated in Fig. 4a. These differences were also observed in their dry residues (Fig. 4b). ROC curves were constructed to determine the classification thresholds, as shown in Fig. 4c-f. An example of the resulting classification of the pixels within the same droplet is shown in Fig. 4g-j.

As in the PLS-DA, a per-droplet classification was obtained by requiring a certain percentage of individual positive pixels to consider the sample as positive. Again, the droplet classification threshold was determined following the criteria previously described, as illustrated in Fig. 4k-n. As with the PLS-DA models, this integration of information from individual pixels to droplet level improved drastically the performance of the FFNN classifiers, even to 100% accuracy.

The goodness of the classification method was also assessed by computing the AUROC, which remained above 0.5 in all cases, as shown in Table 2. The sensitivity (SE) and specificity (SP) per-droplet showed similar values for preparations in PBS (SE = 100%, SP = 100%) and AS (SE = 93.3%, SP = 100%), and were substantially higher than those obtained from the PLS-DA. Regarding the analysis of the dry residues, the accuracy of the FFNN algorithm showed similar performance for PBS and AS, above a random classifier at pixel level, and substantially higher at droplet level (AUROC = 0.88 for PBS and AUROC = 0.67 for AS).

### Prediction of the viral load

We observed that some spectral feature descriptors could shed light about the viral load of the sample under study. In particular, the value of the area ratio (AR) descriptor (defined as the ratio between the area under the spectral curve of the sample and the area under the curve of the spectrum of the supporting plate, see supplementary material) showed statistically significant differences (Wilcoxon rank-sum test, p-value = 0) when evaluated at different concentrations, as illustrated in Fig. 5a. In all cases, significant differences (p-value = 0) were obtained between samples containing lentiviral particles and their negative controls prepared in equal concentrations. Furthermore, a very strong linear correlation was found between the value of the said descriptor and the viral load in fresh preparations in PBS (r2 = 0.99) and in AS (r2 = 0.96). A high correlation was also found in dry residues of AS preparations (r2 = 0.88), as illustrated in Fig. 5b-c,e. This linear relationship could not be established for dry residues from PBS preparations (r2 = 0.20), as shown in Fig. 5d. These results suggest that it is possible to quantify the viral load of fresh samples by computing key morphological descriptors in certain spectral fringes. Note here that the lowest concentration included in this analysis was 800 TU·µL−1. Assuming that approximately 0.1% of the virus particles that are present in the preparations are infectious17, this minimum viral load, expressed in -physical units- (copies·mL−1), would be equivalent to 8·108 copies·mL−1. Within the wide range of viral loads of SARS-CoV-2 patients, this value corresponds to the small percentage of the so-called ‘supercarrier’ individuals, potential ‘superspreaders’ of the disease18.

## Discussion

Hyperspectral imaging techniques have been extensively used to rapidly identify pathogens forming biofilms on food products19 or to prevent the spread of plant diseases20 to name a few. Here we have demonstrated it is also possible to detect the presence of a virus, in this case a lab-engineered lentiviral particle, in a fluid using hyperspectral image processing techniques in the VNIR range. Although there have been successful attempts to identify viral infections in mosquitoes11 or plant seeds10 using similar methods, the previous results are limited to the analysis of ongoing infections in tissues, where histological changes or the effects of the host immune response may represent the prevalent source of information to discriminate the presence of the virus. Nonetheless, we have been able to successfully detect viral particles alone in two different media, achieving a sensitivity and a specificity in liquid droplets comparable to those reported from molecular techniques (SE = 100%, SP = 100% in PBS, SE = 93.3%, SP = 100% in AS). At present, the gold standard for the detection of a virus is the polymerase chain reactions (PCR) test. However, with the advent of the global pandemic caused by the SARS-CoV-2 virus, a number of new molecular-based diagnostic technologies have emerged21,22, allowing for a substantial reduction of turnaround times. A significant advantage of VNIR hyperspectral imaging detection over traditional diagnostic techniques relies on the possibility of performing non-contact, reagent-free analysis of a large number of samples simultaneously. We prepared the system to image between 9 and 25 droplets over an area of 9 × 8 cm2, but this technology can be easily scaled to increase the field of view, thus allowing for the analysis of numerous samples concurrently. Note that using a standard personal computer, the generation of numerical models can take several hours. However, once these models are built, the average processing time per droplet is approximately one minute for PLS-DA, and can be substantially reduced when using the FFNN algorithms described here.

It might also have the potential to be adapted for use in personal devices (e.g. smartphones) with adequate optical adapters23. Furthermore, when compared to other optical spectroscopic techniques that rely on the use of a single beam of light, hyperspectral imaging provides information redundancy by accounting for a collection of pixels within a given sample. Note that even when the performance of the classifier at pixel level was slightly above a random classifier, there was a substantial improvement at droplet level. This suggests that if the spectral signature of a given virus contains distinctive information, this can be extracted and enhanced by analysing multiple samples.

The envelope of the lentiviral particles used in this study was VSV-G glycoprotein. This glycoprotein confers some optical properties that may be responsible for its spectral signature. Nevertheless, this study cannot discern whether the information is arising at molecular level or from their aggregations or structural properties, and the corresponding virological and biochemical studies remain open. It is also important to note that these viral particles have a diameter between 80 nm and 120 nm24,25, which is approximately three times smaller than the shortest wavelength within the visible range (i.e. blue light). Therefore, if the information is embedded in the scattered light, the PA spectra obtained from other viral species might also possess distinct features that would enable their optical identification. If that is confirmed, there are numerous applications for this technology. However, the challenge lies with finding spectral features that are substantially different from the background and the environment, that is, the supporting plate and the fluid media.

In this study, samples were classified using two different independent methods. The performance of both techniques were similar, a point that supports the hypothesis underpinning this work. In addition, the statistically significant differences found in the value of one of the spectral feature descriptors across different sample concentrations revealed that there was enough information for quantifying the viral load. It is therefore likely that a combination of different spectral descriptors, when used in an ANN, could also be exploited to further refine the quantification results reported here, as the combination of multiple sources of information can reveal these hidden details.

We also sought to determine whether this imaging method could be used for the screening of viral contamination on fomites. Although the performance of the classifier was in general lower in dry residues compared to liquid samples, the technique showed values of the AUROC between 0.67 and 0.88. In particular, in AS, a medium that mimics human fluids, the sensitivity thus obtained was 87.5% (SP = 60.0%). Note that this study presents a relatively unbalanced number of positive and negative samples leaving room for further improvements. A PLS-DA model built with unbalanced datasets substantially improves with balanced data26.The results reported here support the use of this technology, not only for the screening of fluid samples obtained from a large number of subjects but also for a potential pre-screening of inanimate objects which may be subjected to viral contamination.

## Materials and methods

### VSV-G pseudotyped lentiviral particles

For the production of the lentiviral particles, 293 T cells were transfected with a lentiviral plasmid coding for ZsGreen, a plasmid coding for viral envelope VSGV-g protein (under the control of the human cytomegalovirus promoter) and a plasmid encoding the Tat, Gag-Pol and Rev genes required to construct a 2nd-generation lentiviral system27,28. HEK cells were cultured in Gibco Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% foetal bovine serum (Biowest, Nuaillé, France) and a combination of penicillin (100 UI·mL−1) and streptomycin (100 μg·mL−1) (MilliporeSigma, Missouri, USA). After transfection, cells were incubated at 37ºC, 5% CO2 for 48 h. Next, the culture medium containing viral particles was collected and Lenti-X reagent (Takara Bio Inc, Shiga, Japan) was added to concentrate viral particles through precipitation. The mixture was then incubated at 4ºC followed by centrifugation at 1500 G for 45 min at 4ºC.The supernatant was then removed, and the precipitate stored at -80ºC for later use. As a negative control, un-transfected HEK 293 T cells were cultured and the medium was collected, precipitated with LentiX, and aliquoted as described above.

One hour prior to recording the reflectance spectra, a viral aliquot was thawed and resuspended in either phosphate buffered saline (MilliporeSigma, Missouri, USA) or 1700–0305 artificial saliva (Pickering Laboratories, California, USA). From an initial stock titer of 20·103 transducing units (TU)·$$\upmu$$L−1, serial dilutions were prepared in the corresponding media to obtain the following set of concentrations: [500, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000] TU·$$\upmu$$L−1.

### Negative controls

Droplets of the same fluid (PBS or AS) with the same concentrations of virus culture medium (DMEM) and Lenti-X reagent but without lentiviral particles were used as negative controls. In addition, droplets of pure fluids were included for comparison.

### Cell cultures

Human Embryonic Kidney (HEK) Lenti-XTM 293 T cell lines (Clontech) were maintained in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% (v/v) heat inactivated Fetal Bovine Serum (FBS), 2 mM L-glutamine, 10 mM Hepes, 1% (v/v) sodium pyruvate, 50 μM 2-mercaptoethanol, 100 U·mL-1 penicillin and 100 μg·mL-1 streptomycin at 37°C, 10% CO2.

### Hyperspectral imaging setup and calibration

Spectral information was obtained between 406.62 nm and 996.46 nm using an A-Series VNIR hyperspectral imaging system (Headwall, Massachusetts, USA), in 810 bands with 0.74 nm of nominal spectral resolution. These wavelengths range from the lower limit of the visible (VIS) to the near infra-red (NIR) band of the electromagnetic spectrum. The camera was mounted 30 cm over the sample plane on an A-LST0750-C (Zaber, Vancouver, Canada) motorised linear stage to allow scanning of the samples, and a 35-mm LM35HC lens (KOWA, Saitama, Japan) was used to provide optimal focusing. An area of 9 × 8 cm2 was scanned to ensure light intensity remained approximately homogenous over the sample. The image had a resolution of 1275 pixels by 1000 pixels; therefore, the area of a single pixel was approximately of 70 × 80 $$\upmu$$m2. Optical reflectance signals at each spectral wavelength were codified using 8 bits.

The samples were illuminated using two ASD Illuminator halogen light sources (Malvern Pabalytical, Worcestershire, UK) mounted symmetrically 35 cm above the sample plane with their emission axes forming a 60-degree angle with the sample plane. The colour temperature of the light source was 3100 K and the emission spectrum ranged between 350 nm and 2500 nm. The experiments were carried out under low ambient illumination to mimic a potential point-of-care testing environment. Illumination irradiance was measured in a square grid pattern of 16 points covering the area of the sample supporting plate using a radiometer (HD 2102.2 with a LP 471 Probe, Delta OHM srl, Padua, Italy). The contribution from ambient light was 1.03 ± 0.05 W∙m−2. When the experimental illumination sources were connected, the average irradiance on the supporting plate was 252 ± 28 W∙m−2.

The system was then linearly calibrated between a white and a dark reference. The white reference was obtained from a 3.62-inch Spectralon white reference (Labsphere, New Hampshire, USA). The dark reference was generated by blocking the lens using the lens cap provided by the manufacturer. Note the dark reference contains the background noise generated by the photodetector.

### Preparation of samples and imaging

The samples were imaged at room temperature under ambient conditions (relative humidity varying between 40% and 60%). Using a micropipette, 5-$$\upmu$$L droplets were deposited onto the surface of an approximately 22 mm × 22 mm polytetrafluoroethylene sheet (BSH, Seville, Spain) with a thickness of 1 mm (supporting plate). Each preparation contained between 9 and 25 droplets distributed in a square mosaic pattern to facilitate off-line digital segmentation. The samples were immediately positioned within the field of view of the imaging system over a 10-mm-thick wood sheet. The reflectance spectra were then recorded. Following complete evaporation of the aqueous content, a second reflectance spectra dataset was obtained from the dry residues.

### Computer equipment

Processing units were standard, high-end personal computers (128 Gb RAM, Intel® Core(TM) i9-10980XE CPU 3.00 GHz) running under Windows® 10 Pro, 64 bits.

### Sample sets

A total of 164 preparations (74 samples in PBS and 90 in AS) were analysed. The sample distribution among the different experimental groups is shown in Table 3. These same test groups were used to evaluate both the PLS-DA and FFNN classifiers. See supplementary material for a detailed description of positive and negative samples for each fluid and concentration.

### Image pre-processing

Original spectra of hyperspectral images were converted into pseudo-absorbance (PA) by computing the logarithm of the inverse of the reflectance (R) spectra (PA = log (1/R)). A unit offset was added to avoid zero-singularities in the logarithmic transform. A standard white-dark calibration was then applied. Using the hyperspectral imaging software Evince (Prediktera, Umeå, Sweden), a red–green–blue (RGB) subset of the hyperspectral cube was used to segment the content of each droplet. A conservative approach was adopted to discard peripheral pixels. Next, a principal component analysis (PCA) with two components was performed on the spectra of all pixels within a droplet to discard outlier pixels, typically flare-affected pixels. The spectra were then normalised using the standard normal variate (SNV) transform. A Savitzky-Golay filter was applied afterwards for smoothing, and a correction of the baseline was performed.

### Partial Least-Squares Discriminant Analysis (PLS-DA)

A PLS-DA model29 was constructed using the individual spectra of all pixels of training samples as input, and retaining a number of latent variables sufficient to capture over 95% of data variance using PLS Toolbox 8.6 (Eigenvector Research Inc, Washington, USA) running under Matlab R2020b (The Mathworks Inc., Massachusetts, USA). It generated an output variable with values between 0 and 1 for the classification of each pixel in the test set. Additionally, we re-analysed the data using the mean PA spectra for each droplet (see supplementary material for details).

### Feed-Forward Neural Network (FFNN)

We constructed an ANN following a pixel-based approach30. The input of the network consisted of the values of 28 spectral feature descriptors obtained in the seven spectral bands of interest (between 415 nm and 900 nm). These descriptors quantify representative morphological features (amplitude, length, curve ringing, curvature, compression, kurtosis, horizontality, area and comparisons with the background reference) of every individual spectrum (see the supplementary material for a full description). The output of the network, a value comprised between 0 and 1, was then used for pixel classification. The spectra were classified using an FFNN implemented in Matlab R2020b (The Mathworks Inc., Massachusetts, USA) with sigmoid hidden and softmax output neurons. The FFNN consisted of 20 hidden and one output layers. The number of neurons (196) in each hidden layer was set as the product of the number of spectral descriptors and bands of interest (28 × 7 = 196). Other configurations were discarded after empirical evaluation for best outcomes. To find the optimal weights, the supervised training was carried out using the scaled conjugate gradient backpropagation, limiting the number of iterations to 1000 to avoid overfitting. The performance of the network was tested by evaluating the mean square error. Training, validation and test sets were constructed using random splits of independent samples.

### Data analysis

The non-parametric two-tailed Wilcoxon rank-sum test was performed on the outputs of the PLS-DA and the FFNN models, and on the spectral feature descriptor used for quantification of the viral load. To determine classification thresholds, ROC curves were analysed. The cut-off value was chosen to maximise both, sensitivity and specificity. This value was determined as the point of the curve with minimum quadratic distance to the upper left corner of highest sensitivity and specificity31. To assess the goodness of the classifier, the AUROC was also computed, and the corresponding confusion matrix was calculated. From values of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) classifications, sensitivity (SE) and specificity (SP) were then obtained as follows: SE = TP/(TP + FN), SP = TN/(TN + FP). Statistical significance was considered at the 95% confidence level. P-values below 0.0001 were considered as zero.