Learning models for classifying Raman spectra of genomic DNA from tumor subtypes

An early and accurate detection of the different subtypes of tumors is crucial for effectively guiding personalized therapy and for predicting the ability of a tumor to metastasize. Here we exploit a Surface Enhanced Raman Scattering (SERS) platform, based on disordered silver-coated silicon nanowires (Ag/SiNWs), to efficiently discriminate the genomic DNA of different subtypes of melanoma and colon tumors. The diagnostic information is obtained by acquiring label-free Raman maps of dried drops of DNA solutions on the Ag/SiNW mat and by leveraging the classification ability of learning models to reveal the specific and distinct physico-chemical interaction of tumor DNA molecules with the Ag/SiNWs, which we hypothesize to be partly caused by a different degree of DNA methylation.


Introduction
Tumor lineages are divided into multiple subtypes characterized by different cell proliferation, migration, and invasion capabilities which, in turn, determine cancer aggressiveness and metastatic potential [1,2]. The advent of molecular genetic technology based on DNA sequencing methods [3,4] makes it possible to detect DNA mutations associated with cancer and its subtypes, but the expensive and complex enzyme-based […] (CNN). We will refer to the first two methods as global and to the last three as local. We will show that all the strategies achieve a high classification accuracy, close to 90%. Moreover, the local methods also allow us to identify the spectral ranges that are decisive for a correct classification.
In fact, the classification mainly exploits the information contained in two spectral regions: a low wavenumber (LW) region, between 125 and 550 cm⁻¹, containing features related to the different physico-chemical interaction of the DNA molecules with the nanowires; and a high wavenumber (HW) region, between 2300 and 3400 cm⁻¹, where the vibrational peaks associated with the CH/CH3 bonds of the methyl groups are located. Our data therefore show that the remarkable capacity of the proposed bioanalytical platform to discriminate cancer subtypes is partly based on its ability to highlight the properties associated with the different degree of methylation of malignant tumor DNA, and the effects that these epigenetic variations have on its mechanical and conformational properties. Given the reversibility of the methylation process, novel strategies to evidence this mechanism, which influences gene expression without modifying the DNA sequence, can be extremely interesting not only for diagnostic purposes but also for therapeutic approaches.

Materials and Methods
In this section we describe the sample preparation procedures, the Raman measurements, and the statistical approaches developed to analyse the experimental data (see also [20,15]).
Experimental procedures
Ag/SiNW substrate fabrication
Au-catalyzed SiNWs were grown on Si wafers by plasma-enhanced chemical vapor deposition (PECVD) using SiH4 and H2 as precursors at a total pressure of 1 Torr, with the flow ratio SiH4/(H2+SiH4) fixed at 1:10. The substrate temperature during growth was kept at 350 °C, and a 13.6 MHz radiofrequency source was used to ignite the plasma, with power fixed at 5 W. A metal coating with a nominal thickness of 90 nm was obtained by evaporating an Ag film onto the SiNW array.

Cell culture and DNA extraction
As a skin cancer model [21] we used the human melanoma cell lines SK-MEL-28 and A375, established from patient-derived cancer samples and routinely used in skin cancer research, which were compared with the human immortalized keratinocyte line HaCaT as a healthy skin control. As a colon cancer model we used the cell lines CaCo-2 and HT29, which were compared with the same HaCaT keratinocytes as a healthy control. All cell lines were cultured in complete Dulbecco's modified Eagle's medium (DMEM; Hyclone, South Logan, UT) with high glucose (4.5 g/L), supplemented with 10% fetal bovine serum (FBS, HyClone), 2 mM glutamine and 100 IU/mL penicillin/streptomycin (Invitrogen, Carlsbad, CA). Cells were kept at 37 °C in a humidified atmosphere with 5% CO2 until medium removal and harvesting by trypsin treatment.
ccdd˙etal-me˙col-sub.tex -20 febbraio 2023
The cells were passaged every 3-4 days at a sub-cultivation ratio of 1:5 and used between passages 5 and 20. The cell pellet obtained by subsequent centrifugation for 5 min at 4000 rpm was then processed for genomic DNA extraction. The cells were lysed in hypotonic lysis buffer by repeated pipetting, incubated 15

Raman analysis
A DXR2xi Thermo Fisher Scientific Raman Imaging Microscope was used to collect Raman maps of all the DNA drops deposited onto the nanostructured substrates and left to dry in air at room temperature.
The maps were collected using a 532 nm laser source with 1 mW excitation power and a 50x objective in backscattering configuration. Each spectrum of the map results from 4 accumulations of 5 ms each. The map step size was set to 4 µm, yielding about 4000 independent spectra for each drop. For each sample (HaCaT, SK-MEL-28, A375, CaCo-2, and HT29) the entire measurement process was repeated 5 to 10 times.

Data set and pre-processing
We compared the experimental data pairwise, each time contrasting the Raman spectra of two different samples. More precisely, we considered six cases: 1) HaCaT vs. A375, 2) HaCaT vs. SK-MEL-28, 3) SK-MEL-28 vs. A375, 4) HaCaT vs. CaCo-2, 5) HaCaT vs. HT29, and 6) CaCo-2 vs. HT29. For each case the two samples are conventionally called the first and the second sample.
In each comparison the data set consists of N = 4000 spectra, 2000 for each sample. The spectra were randomly chosen in the central part of the droplets to maximally exploit the interaction between the DNA molecules and the nanostructured substrate. Since the spectra are collected at points of the droplet at least 4 µm apart, we assume that the data are independent [20].
Most of the collected spectra share similar features. The few that differ substantially from the others were considered outliers and removed from the analysis. To this end, we built a decision surface by adding and subtracting three times the point-wise empirical standard deviation to and from the average spectrum, and discarded the spectra featuring at least one point outside this surface. Moreover, we smoothed the data by filtering the original raw spectra with the Savitzky-Golay algorithm [22] (see also [23]) over a window of 90 data points treated as convolution coefficients.
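The two pre-processing steps can be sketched as follows. This is a minimal sketch that assumes the spectra of one sample are stored row-wise in a NumPy array. The polynomial order of the Savitzky-Golay filter is not given in the text, so polyorder = 3 is an assumption. The window is set to 91 points instead of 90 so that the filter is symmetric about each point, which is also an assumption. The order of the two steps (outlier removal first, then smoothing) is another assumption.

```python
import numpy as np
from scipy.signal import savgol_filter


def remove_outliers(spectra: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Keep only spectra lying entirely inside mean +/- k*std (point-wise)."""
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0, ddof=1)
    inside = np.all(np.abs(spectra - mean) <= k * std, axis=1)
    return spectra[inside]


def smooth(spectra: np.ndarray, window: int = 91, polyorder: int = 3) -> np.ndarray:
    """Savitzky-Golay smoothing of each spectrum along the Raman-shift axis."""
    # ASSUMPTIONS: odd window of 91 points and cubic polynomial
    return savgol_filter(spectra, window_length=window, polyorder=polyorder, axis=-1)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 221))   # 500 synthetic spectra with p = 221 points
X[0, 10] = 50.0                   # one artificial spike, which must be rejected
kept = remove_outliers(X)
clean = smooth(kept)
```

Note that with hundreds of points per spectrum, a point-wise ±3σ rule rejects any spectrum with even a single extreme point. The rule is therefore strict and can discard a sizable fraction of the data.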

Statistical approaches
We propose five different models to classify the spectra. Two are rather simple, essentially geometric, and based on global characteristics of the spectra: the average intensity and the ℓ2 distance from the average spectrum. The other three are more sophisticated and take into account the local properties of the spectra. They are based on principal component analysis (PCA), on the average pooling of the spectra, and on the propagation of the spectra through a 1-D Convolutional Neural Network (CNN).
The spectra are split into training and test sets: the former is used to tune the parameters of the proposed models, the latter to validate them. We deal with a binary target variable W, whose outcomes 1 and 0 correspond to the DNA molecules of the first and the second sample, respectively.

Logistic regression on global average (LRA)
The first model we propose is based on a very simple idea: the global mean of the spectral intensity is used as the only predictor of a logistic regression model. The mean is computed separately over the LW and the HW region, so that it provides a one-number representation of each spectrum in each region. This method does not involve a detailed data analysis and is meant as a basic tool to classify Raman spectra. In particular, it is of interest to check whether such a simple approach suffices to properly identify tumoral DNA. The model is penalized by means of an L2 regularization with the shrinkage parameter set to 1. It is validated by ten-fold cross-validation, and its goodness is assessed by the ROC-AUC (area under the Receiver Operating Characteristic curve) averaged over the ten cross-validation folds.
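A minimal scikit-learn sketch of this baseline follows, using synthetic data. It assumes that the shrinkage parameter maps onto scikit-learn's `C`. `C` is the inverse of the regularization strength, so the two conventions coincide for the value 1 used here. The function name `lra_auc` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def lra_auc(spectra: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Ten-fold ROC-AUC of a logistic regression on the global mean intensity."""
    mean_intensity = spectra.mean(axis=1, keepdims=True)  # single predictor, shape (N, 1)
    model = LogisticRegression(C=1.0)  # scikit-learn's default penalty is L2
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(model, mean_intensity, labels, cv=cv, scoring="roc_auc")
    return scores.mean(), scores


rng = np.random.default_rng(0)
first = rng.normal(0.5, 1.0, size=(200, 221))   # synthetic "first sample", W = 1
second = rng.normal(0.0, 1.0, size=(200, 221))  # synthetic "second sample", W = 0
X = np.vstack([first, second])
y = np.r_[np.ones(200, dtype=int), np.zeros(200, dtype=int)]
auc, fold_scores = lra_auc(X, y)
```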

Evaluation of ℓ2 distance (L2D)
The second method is based on the analysis of some geometric features of the whole considered portion of the spectra. It has already been presented in [20] and can be described as follows. The training set is used to compute the average spectra of the two samples, represented by the column vectors $h_1 \in \mathbb{R}^p$ for the first sample and $h_2 \in \mathbb{R}^p$ for the second one. Then, the ℓ2 distance between each of these averages and each spectrum of the test set, represented by the i-th row of the data matrix X, is computed as
$$ d_k(i) = \| x_i - h_k \|_2 = \Big( \sum_{j=1}^{p} \big( X_{ij} - h_{k,j} \big)^2 \Big)^{1/2}, \qquad k = 1, 2, $$
where $x_i$ denotes the transpose of the i-th row of X. A classifier for each spectrum is then built from a binary outcome function of these distances. This function depends on a parameter τ ∈ [0, 1], which is again optimized by ten-fold cross-validation. The outcome is set equal to 1 if the spectrum is identified as coming from the first sample.
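The extracted text does not preserve the exact decision function. The sketch below therefore uses a hypothetical rule that respects τ ∈ [0, 1]: W = 1 when $d_1/(d_1 + d_2) \le τ$. The grid of τ values and the use of accuracy as the tuning criterion are also assumptions. The average spectra are recomputed on the training folds only, so the test spectra never contribute to $h_1$ and $h_2$.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def distances(X, h1, h2):
    """l2 distances d_1(i), d_2(i) of every row of X from the two average spectra."""
    return np.linalg.norm(X - h1, axis=1), np.linalg.norm(X - h2, axis=1)


def classify(X, h1, h2, tau):
    """HYPOTHETICAL rule: W = 1 when d_1 / (d_1 + d_2) <= tau, else W = 0."""
    d1, d2 = distances(X, h1, h2)
    return (d1 / (d1 + d2) <= tau).astype(int)


def tune_tau(X, y, taus=np.linspace(0.0, 1.0, 101), seed=0):
    """Choose tau maximizing the mean ten-fold accuracy."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    acc = np.zeros(len(taus))
    for train, test in cv.split(X, y):
        h1 = X[train][y[train] == 1].mean(axis=0)  # average spectrum, first sample
        h2 = X[train][y[train] == 0].mean(axis=0)  # average spectrum, second sample
        for k, tau in enumerate(taus):
            acc[k] += np.mean(classify(X[test], h1, h2, tau) == y[test])
    acc /= cv.get_n_splits()
    best = int(np.argmax(acc))
    return taus[best], acc[best]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (100, 30)), rng.normal(-1.0, 1.0, (100, 30))])
y = np.r_[np.ones(100, dtype=int), np.zeros(100, dtype=int)]
tau, accuracy = tune_tau(X, y)
```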

Logistic regression on average pooling (LRP)
The third method is also based on a logistic regression model, whose input features are computed by applying the average pooling operator to the input spectra. The spectral domain of each spectrum is divided into non-overlapping, non-equispaced sub-domains, and the mean value is computed on each of them. This approach aims to represent the profile of each spectrum with a lower number of explanatory variables. Depending on the binary task, a finer or coarser partitioning of the spectral domain can be chosen; here we use four features for the LW region and three for the HW region. Again, the logistic regression model is penalized by an L2 regularization with the shrinkage parameter equal to 1, and the model is validated by ten-fold cross-validation.
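Average pooling over non-equispaced bands can be sketched as follows, using synthetic HW-region spectra on an assumed uniform grid. The band edges at 2701 and 3200 cm⁻¹ are taken from the sub-bands named in the Results. The LW edges are not reported, so none are used here. The function `average_pool` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def average_pool(spectra, wavenumbers, edges):
    """Mean intensity of each spectrum in the bands [edges[k], edges[k+1])."""
    feats = [spectra[:, (wavenumbers >= lo) & (wavenumbers < hi)].mean(axis=1)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.column_stack(feats)


# HW grid limits and size from the text; uniform spacing is an assumption.
wn = np.linspace(2303.16, 3399.83, 570)
hw_edges = [2300.0, 2701.0, 3200.0, 3400.0]   # three non-equispaced bands

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 570))
y = rng.integers(0, 2, size=400)
X[:, (wn >= 2701.0) & (wn < 3200.0)] += 0.3 * y[:, None]  # synthetic class signal

F = average_pool(X, wn, hw_edges)             # shape (400, 3): three pooled features
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(LogisticRegression(C=1.0), F, y, cv=cv, scoring="roc_auc").mean()
```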
The permutation importance technique [24] is used here to investigate which input features mainly support the predictions of the logistic regression model. This technique checks whether randomly shuffling one feature column drastically decreases the accuracy of the model. A random permutation of a feature column breaks the correlation between that explanatory variable and the target variable, which produces a drop in the predictive performance of the model. To quantify this degradation, we evaluate the difference between the ROC-AUC of the model and the average ROC-AUC obtained after applying a finite number of permutations to that column. This quantity is called the importance score of the feature. Here, 30 distinct permutations are applied to each feature column.
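scikit-learn's `permutation_importance` computes the baseline score minus the score after shuffling one column, repeated `n_repeats` times. With `scoring="roc_auc"` and `n_repeats=30`, its `importances_mean` therefore matches the importance score defined above. In this sketch the features are synthetic and the importance is evaluated on the training data for brevity, whereas in practice a held-out fold would be used. The normal-approximation 95% interval is an assumption about how the error bars could be obtained.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))        # three synthetic pooled features
X[:, 2] += 2.0 * y                 # only feature 2 carries class information

model = LogisticRegression(C=1.0).fit(X, y)
result = permutation_importance(model, X, y, scoring="roc_auc",
                                n_repeats=30, random_state=0)
importance = result.importances_mean                           # baseline AUC - mean permuted AUC
ci_half = 1.96 * result.importances_std / np.sqrt(30)          # ASSUMED 95% interval
```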

Logistic regression on PCA components (PCA)
We use the notation of [25, Chapter 1 and Section 8.2.1], already adopted in [20]. Any single spectrum in the considered interval is represented as a column vector $x \in \mathbb{R}^p$. By collecting the N row vectors $x^\dagger$, where † denotes transposition, we construct the N × p matrix X, which represents the entire data set in the considered interval. The j-th column of X collects the N observations of the j-th variable, namely the intensity at the j-th value of the Raman shift. We then compute the N × p matrix Y by centering each column of X (i.e., each Raman shift) on its mean. Principal component analysis is obtained from the eigendecomposition of the empirical covariance matrix $Y^\dagger Y$ [26], which yields the principal component directions $v_1, \ldots, v_p \in \mathbb{R}^p$. We call the p elements of the column vector $v_i$ the i-th principal component loadings. The i-th principal component (PC) is the projection $z_i = Y v_i$ of the data onto $v_i$. Its variance is given by the corresponding eigenvalue, divided by N − 1 since $Y^\dagger Y$ is not normalized.
Since the variance is concentrated in the first m principal components, all the other p − m components can be neglected in the next step. This is why we call this method local.
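The decomposition described above can be written directly with NumPy. The sketch also returns the standard deviation, proportion and cumulative proportion of variance of each PC, which are the quantities listed for the first five components in Table 2. The rule used to pick m (the smallest m reaching 99.9% cumulative variance) is an assumption for illustration, and the data are synthetic.

```python
import numpy as np


def pca_eig(X):
    """PCA from the eigendecomposition of Y^T Y, Y being X with centered columns."""
    Y = X - X.mean(axis=0)                       # center each Raman-shift column
    eigvals, eigvecs = np.linalg.eigh(Y.T @ Y)   # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    V = eigvecs[:, order]                        # column i = loadings v_i
    var = np.clip(eigvals[order], 0.0, None) / (X.shape[0] - 1)
    Z = Y @ V                                    # column i = principal component z_i
    prop = var / var.sum()
    return Z, V, np.sqrt(var), prop, np.cumsum(prop)


rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6)) * np.array([20.0, 1, 1, 1, 1, 1])
Z, V, sd, prop, cum = pca_eig(X)
m = int(np.searchsorted(cum, 0.999) + 1)   # ASSUMED rule: smallest m with >= 99.9%
```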
The first m principal components $z_1, \ldots, z_m$ are then used as predictors in a logistic regression model for the probability mass function of the binary target variable W,
$$ \Pr(W = 1 \mid z_1, \ldots, z_m) = \frac{\exp\big(\beta_0 + \sum_{i=1}^{m} \beta_i z_i\big)}{1 + \exp\big(\beta_0 + \sum_{i=1}^{m} \beta_i z_i\big)}, $$
where $\beta_i \in \mathbb{R}$ for $i = 0, \ldots, m$. We introduce a threshold parameter $\lambda \in [0, 1]$ and assign the outcome W = 1 to a set of components $z_1, \ldots, z_m$ if $\Pr(W = 1 \mid z_1, \ldots, z_m) \ge \lambda$, and W = 0 otherwise. The parameter λ is estimated by ten-fold cross-validation (see [27, Ch. 7]). The original sample is randomly partitioned into ten equally sized subsamples. One subsample is kept as the test set, the other nine are used as training data, and the accuracy is evaluated on the held-out subsample. The process is repeated ten times, each round using a different subsample as the test set, and the ten results are averaged to produce a single estimate. The advantage of this strategy is that all observations are used for both training and testing, and each observation is used for testing exactly once.
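A sketch of the threshold selection by ten-fold cross-validation follows, using synthetic data. Here scikit-learn's `PCA` (computed via SVD, which gives the same components as the eigendecomposition up to sign) is refitted on the training folds of each round to avoid information leakage. The text does not state whether this was done. The default L2-penalized `LogisticRegression`, the grid of λ values, and m = 5 are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def pca_logistic_cv(X, y, m=5, lambdas=np.linspace(0.05, 0.95, 19), seed=0):
    """Mean ten-fold accuracy of logistic regression on m PCs, for each threshold lambda."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    acc = np.zeros(len(lambdas))
    for train, test in cv.split(X, y):
        pca = PCA(n_components=m).fit(X[train])               # PCs from training folds only
        clf = LogisticRegression().fit(pca.transform(X[train]), y[train])
        p1 = clf.predict_proba(pca.transform(X[test]))[:, 1]  # Pr(W = 1 | z_1, ..., z_m)
        for k, lam in enumerate(lambdas):
            acc[k] += np.mean((p1 >= lam).astype(int) == y[test])
    acc /= cv.get_n_splits()
    best = int(np.argmax(acc))
    return lambdas[best], acc[best]


rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))
y = np.r_[np.ones(150, dtype=int), np.zeros(150, dtype=int)]
X[y == 1, 0] += 4.0                                           # class signal on one Raman shift
lam, accuracy = pca_logistic_cv(X, y)
```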

1-D Convolutional Neural Network (CNN)
A 1-D CNN is a type of feed-forward neural network designed to solve a broad class of classification tasks in which the inputs are 1-D grid-structured data [28,29,30,31]. Such models combine convolutional and max-pooling operators to encode the sequential patterns contained in the input data. Optimizing the weights of the convolutional filters therefore aims at a latent representation of the input time series in which the two classes are as linearly separable as possible. In our case, we regard the spectra as time series whose "temporal evolution" takes place along the spectral domain.
Before being propagated through the layers of the CNN model, the spectra are neither rescaled nor further transformed. The model is validated by ten-fold cross-validation: the ROC-AUC score is computed on each fold, and the average ROC-AUC over all folds is taken as the goodness of the model. The standard error of the mean is used to estimate the uncertainty of the average ROC-AUC score.
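The reported uncertainty can be computed as the standard error of the mean of the K fold scores, $\mathrm{SEM} = s/\sqrt{K}$, where s is the sample standard deviation. The helper below is a hypothetical illustration, and the use of ddof = 1 for s is an assumption.

```python
import numpy as np


def summarize_auc(fold_scores):
    """Mean ROC-AUC over the folds and its standard error of the mean."""
    scores = np.asarray(fold_scores, dtype=float)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(scores.size)


mean_auc, sem_auc = summarize_auc([0.8, 0.9, 1.0])   # toy fold scores, not real results
```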
The design of our CNN is purely convolutional, i.e., it consists of a sequential combination of convolutional layers followed by max-pooling layers. The non-linear activation function is embedded in the convolutional layers; specifically, we opted for the softplus function, $\varphi(x) = \log(1 + \exp(x))$. Dropout layers [32] with a dropout rate of 0.25 are also employed to counter overfitting. Each convolutional layer has 64 filters with convolutional masks of width 3 points, and the pool size of the pooling layers is 2. The sequence of convolutional and pooling layers is repeated three times, and the resulting feature map is flattened by a Flatten layer. The flattened feature map is then propagated through a fully connected layer with 16 output nodes and softplus activation, which returns the latent representation of the spectra. Finally, the latent representation is propagated through a fully connected layer with one output node and sigmoid activation, which is the output node of the CNN. During training, the Adam algorithm [33] is used to minimize the binary cross-entropy loss function, with batch size 64 and learning rate 0.001.
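The architecture described above can be sketched in Keras (TensorFlow) as follows. Several details are not stated in the text and are assumptions: dropout placed after each pooling layer, "valid" padding (the Keras default), and the number of training epochs. For the HW region (p = 570) the three blocks reduce the sequence to 69 positions × 64 filters before flattening.

```python
from tensorflow import keras


def build_cnn(n_points: int) -> keras.Model:
    """Three Conv1D/MaxPooling1D blocks, a 16-unit latent layer and a sigmoid output."""
    model = keras.Sequential([keras.Input(shape=(n_points, 1))])
    for _ in range(3):
        model.add(keras.layers.Conv1D(64, kernel_size=3, activation="softplus"))
        model.add(keras.layers.MaxPooling1D(pool_size=2))
        model.add(keras.layers.Dropout(0.25))      # ASSUMPTION: dropout after each pooling layer
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(16, activation="softplus"))   # latent representation
    model.add(keras.layers.Dense(1, activation="sigmoid"))     # CNN output o
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model


model = build_cnn(570)   # HW region, p = 570 points
# Training with hypothetical arrays X_train (N, 570) and y_train (N,); epochs not given:
# model.fit(X_train[..., None], y_train, batch_size=64, epochs=EPOCHS)
```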
Feed-forward neural networks are usually regarded as black-box models whose behavior cannot be expressed in closed form. However, the impact of a single input feature on the final prediction can be visualized by means of the Vanilla Gradient method [34]. This method builds saliency maps from the gradients $\partial o / \partial X_i$, where o is the output value of the CNN and $X_i$ is the i-th input feature. The magnitude of the score derivative $\partial o / \partial X_i$ indicates how strongly a small change of $X_i$ affects the class score [34]. In our case, $X_i$ is exactly the intensity at the i-th Raman shift. The visualization and interpretation of saliency maps, however, often suffer from scale problems. These are commonly addressed by applying a monotone function that maps the score derivatives into a desired interval. We opted for the empirical cumulative distribution function of the score derivatives themselves, so that the saliency map is specific to the test set. Hence, we feed the CNN model with the instances of the test set, backpropagate the output scores via the Vanilla Gradient method, and finally apply the empirical distribution function to the score derivatives.
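The empirical-CDF rescaling can be illustrated without a deep-learning framework by using a logistic model as a stand-in for the CNN. For this model the Vanilla Gradient has the closed form $\partial o/\partial X_i = o(1-o)\,w_i$; for the actual CNN it would come from automatic differentiation. Three details are assumptions: ranking gradient magnitudes rather than signed values, pooling all test-set derivatives into one ECDF, and averaging over test spectra to get one value per Raman shift.

```python
import numpy as np


def vanilla_gradient_logistic(X, w, b):
    """Gradients do/dX_i of o = sigmoid(X @ w + b), one row per test spectrum."""
    o = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (o * (1.0 - o))[:, None] * w[None, :]


def ecdf_saliency(grads):
    """Rank |gradients| with their pooled empirical CDF and average over the test set."""
    a = np.abs(grads)                                            # ASSUMPTION: magnitudes
    pooled = np.sort(a.ravel())
    F = np.searchsorted(pooled, a, side="right") / pooled.size   # ECDF values in (0, 1]
    return F.mean(axis=0)                                        # one value per Raman shift


rng = np.random.default_rng(5)
X_test = rng.normal(size=(50, 20))     # 50 synthetic test spectra, 20 Raman shifts
w = np.full(20, 0.01)
w[5] = 3.0                             # only shift 5 matters to this toy model
saliency = ecdf_saliency(vanilla_gradient_logistic(X_test, w, 0.0))
```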

Results and discussion
Drop casting was used to deposit healthy and cancer DNA solutions on the Ag/SiNW platform. Figure 1 (left panel) shows a representative dried DNA drop with the typical coffee-ring pattern. The magnified SEM image of the area in the red square, reported in the right panel, shows the morphology of the disordered mat of Ag/SiNWs, which are 2-3 µm long with diameters ranging from 80 to 150 nm. A slight sticking of the wires due to the adsorbed DNA can also be observed. Raman maps were collected in the central part of the drops to maximally exploit the interaction between the DNA molecules and the nanostructured substrate. Figure 2 reports the average Raman spectra calculated over the entire maps for HaCaT and the two melanoma phenotypes (left panel), and for HaCaT and the two colon cancer phenotypes (right panel). Some specific features can be recognized: i) the bands directly ascribed to the DNA molecules, located between 600 and 1200 cm⁻¹, due to aromatic in-plane bending vibrations of the bases and stretching vibrations of the phosphate moiety; ii) the peak at about 234 cm⁻¹, associated with the metal-nitrogen (Ag-N) stretching mode of the surface bond formed between the deposited nucleotides and the Ag coating of the NWs [35,36,37,38]; iii) the peak at about 514 cm⁻¹, originating from the SiNWs themselves, which produce a detectable Si signal even through the Ag coating; iv) the pronounced broad band at about 2934 cm⁻¹, corresponding to the stretching vibrations of the CH2 and CH3 groups [4].
The aforementioned Raman features are thus related not only to intrinsic chemical characteristics of the DNA molecules, but also to their physical properties. These properties influence the arrangement of the molecules on, and their interaction with, the NWs, and carry important diagnostic information. In fact, the band around 234 cm⁻¹ reflects the specific adsorption of the DNA molecules on the nanostructured Ag surface through the N atoms of the base rings, so that unrepaired oxidative DNA damage, or a different stiffness, can influence the Raman signal in that band. The Si peak at 514 cm⁻¹ coming from the substrate provides information on the surface distribution of the molecules: a different DNA conformation results in a different substrate coverage and a consequent variation of the peak intensity. Finally, the band at 2934 cm⁻¹, comprising contributions from C-H vibrations, is clearly conditioned by the degree of DNA methylation. On the basis of these considerations, we focused our statistical analysis on two principal spectral ranges. The first is the low wavenumber (LW) region, consisting of p = 221 spectral points with wavenumbers from 125.25 cm⁻¹ to 549.27 cm⁻¹ (orange selection in figure 2). The second is the high wavenumber (HW) region, consisting of p = 570 spectral points with wavenumbers from 2303.16 cm⁻¹ to 3399.83 cm⁻¹ (light blue selection in figure 2). We first discuss our results for the melanoma data and then turn to the colon tumor data, with the aim of showing that the classification methods achieve a very good accuracy across different experimental settings and targets.
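As a quick consistency check on the two regions quoted above, the spacing implied by p points between the stated limits can be computed. This assumes approximately uniform sampling in wavenumber. Both regions give about 1.9274 cm⁻¹ per point, consistent with a single common spectral grid. The `select_region` helper is a hypothetical illustration of how a region can be cut from a full spectrum.

```python
import numpy as np

# Region limits (cm^-1) and number of points p quoted in the text.
REGIONS = {"LW": (125.25, 549.27, 221), "HW": (2303.16, 3399.83, 570)}


def grid_step(start, stop, p):
    """Spacing implied by p equally spaced points from start to stop."""
    return (stop - start) / (p - 1)


def select_region(wavenumbers, spectra, lo, hi):
    """Columns of `spectra` whose Raman shift lies in [lo, hi]."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return wavenumbers[mask], spectra[:, mask]


steps = {name: grid_step(*lims) for name, lims in REGIONS.items()}
# steps["LW"] and steps["HW"] both round to 1.9274 cm^-1
```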

Analysis of melanoma data
In order to evaluate the ability of the methods proposed in Section 2.3 to classify the Raman spectra, we first report in Table 1 the AUC values obtained with the different methods for the HW and LW regions of the spectra. The associated ROC graphs are reported in figure 3.
We first note that the low wavenumber part of the spectra allows us to distinguish healthy (HaCaT) and tumor (SK-MEL-28 or A375) samples with all the proposed methods, the least performing being the one based on the ℓ2 distance. On the other hand, this spectral region does not seem to ensure a good classification of tumor subtypes: in the comparison between the SK-MEL-28 and A375 data, only the PCA method achieves an AUC larger than 0.9. The other two local methods work reasonably well, with an AUC of 0.88, while the performance of the global methods is very poor.
The tumor samples are, on the contrary, very well classified and distinguished with all the proposed methods applied to the high wavenumber part of the spectra.
The performance of the local methods is generally better than that of the global ones. The main reason why they are particularly useful, however, is that they provide detailed information about the physical and chemical phenomena that allow the classification of the tumor subtypes.
Thus, we restrict our discussion to the SK-MEL-28 vs. A375 comparison and show how our statistical analysis provides information at the physical and chemical level.
In figure 4 we report the loadings computed for the first five PCA components, while Table 2 lists the standard deviation and the proportion and cumulative proportion of variance in both the HW and LW regions. Note that in both regions the cumulative variance associated with these components exceeds 99.9%, with most of the variance related to the first principal component PC1. As shown in figure 4, the contribution of the input variables to the first principal component (black line) is almost constant in both regions, except for the peaks at 234 cm⁻¹ (LW region) and 2930 cm⁻¹ (HW region).
The second component also shows a peculiar behavior at both peaks, reinforcing the hypothesis that these two peaks are the most relevant for binary classification within this model.
In figure 5 the average importance scores of the logistic regression on average pooling are shown. As one can see, the LW region is particularly sensitive at 480-548 cm⁻¹ […] the interaction between the samples and the substrate, as the broad peak around 234 cm⁻¹ (see figure 2) does not support the high performance of the logistic regression. In the HW region, instead, the binary problem is solved by exploiting information lying in the sub-band at 3200-3399 cm⁻¹.
In figure 6 the saliency maps of the 1-D CNN models are shown. In the LW region, the saliency map presents two broad peaks around 310 cm⁻¹ and 490 cm⁻¹, with saliency values of 0.6 and 0.82, respectively. Neither of these regions is centered on the crucial wavenumbers where the physico-chemical spectral lines are usually located, such as 234 cm⁻¹ or 512 cm⁻¹. In the HW region, the saliency map reveals a flat region with saliency 0.6 to the right of the peak of the CH vibrations, i.e., at 3030-3300 cm⁻¹.
In figure 7 we report the loadings of the PCA components and the saliency map of the 1-D CNN (left and right vertical axes, respectively), computed on the spectra after subtracting their mean. In the LW region, the 1-D CNN does not identify the region around 200 cm⁻¹ as predictive, while the region above 300 cm⁻¹ is considered salient. The first PCA component, on the other hand, captures the decrease of the spectrum intensity up to 300 cm⁻¹, after which the intensity is almost constant. Other components, such as the third and the fourth, provide a better representation of the features at 230 cm⁻¹. Note, however, that the proportion of variance explained by these two components is two orders of magnitude smaller than that of the first PCA component.
Regarding the HW region, the 1-D CNN model is able to highlight the peak associated with the CH vibrations, albeit with a not particularly high saliency of 0.6. Within the PCA framework, also in this case the first component mainly describes a scaling factor, although the band of the CH vibrations is also represented, at least marginally. Higher components, for example the fourth and the fifth, better highlight the band located at about 2930 cm⁻¹.

Analysis of colon tumor data
As explained above, we tested the robustness of the proposed methods by applying our analysis to a data set obtained from different tumor samples, namely the colon tumor cells of cases 4-6 in Section 2.2. Although the average spectra relate to each other differently than in the melanoma case (see figure 2), we will show that our techniques perform well also in this case. In figure 10 the average importance scores of the logistic regression on average pooling are shown. As in the melanoma case, the LW region is particularly sensitive at 480-548 cm⁻¹, with larger importance scores of order 0.69. Similarly to the SK-MEL-28 vs. A375 case, the broad peak around 234 cm⁻¹ (see figure 2) does not support the high performance of the logistic regression. In the HW region, instead, the binary problem is solved by exploiting information lying in sub-bands different from those relevant in the melanoma case. More specifically, a high importance score of 0.65 is found in the sub-band 2701-3200 cm⁻¹, where the broad band of the vibrational modes of the CH groups, centered at about 2930 cm⁻¹, is located (see figure 2).
In figure 11 the saliency maps of the 1-D CNN models are shown. As one can see, the highly accurate predictions of the CNN model are supported in different ways in the SK-MEL-28 vs. A375 and CaCo-2 vs. HT29 cases. As already noted above, the saliency map of the SK-MEL-28 vs. A375 case presents two broad peaks around 310 cm⁻¹ and 490 cm⁻¹, while the saliency map of the CaCo-2 vs. HT29 case reveals a salient region at 350-514 cm⁻¹. In the HW region, the saliency map of the CaCo-2 vs. HT29 case reveals a narrow salient region at 2930 cm⁻¹ (a peak with saliency value 0.6).

Conclusions
In this paper we have demonstrated via a thorough statistical analysis that Raman mapping obtained by dropping genomic DNA on disordered Ag/SiNWs can be used to classify different subtypes of tumors both for melanoma and colon cancer.
We have developed several statistical approaches to classify the experimental data. Both the global and the local methods are able to distinguish healthy and malignant DNA molecules through their different interaction with the Ag/SiNW platform, which mainly affects the low wavenumber region of the analysed spectral range. In addition, the local methods achieve remarkable performance in classifying the different tumor phenotypes. Indeed, the 1-D CNN model exploits its pattern-recognition ability to capture the characteristic bands associated with the CH vibrations in the high wavenumber region, thereby separating different phenotypes of both melanoma and colon cancer on the basis of their methylation degree.
A similar result is achieved by the method based on PCA decomposition, which exploits subsets of relevant frequencies to perform a logistic regression. We highlight that the discrimination of DNA from distinct malignant phenotypes occurs without any knowledge of the base sequence, unlike most present biochemical methods. Our results thus suggest that analysing Raman spectra of genomic DNA directly dropped onto disordered Ag/SiNWs with 1-D CNN and PCA algorithms provides a rapid, simple, and accurate discrimination of cancer subtypes. Such an approach can offer powerful and effective guidance for personalized patient treatment.

Figure 1: SEM images of a representative DNA drop on Ag/SiNW after water evaporation (left panel) and high magnification image of the area inside the red square (right panel).

Figure 3: SK-MEL-28 vs. A375 comparison. ROC graphs for the methods logistic regression on global average (blue), computation of ℓ2 distance (orange), logistic regression on average pooling (green), logistic regression on PCA components (red), and 1-D convolutional neural network (purple) in the low (left) and high (right) wavenumber regions.

Figure 5: SK-MEL-28 vs. A375 comparison. Importance scores (see Section 2.3.3) for the logistic regression on average pooling for the LW (left) and HW (right) regions. The x-axis shows the sub-regions and the y-axis the average importance scores. The error bars represent the 95% confidence interval.
min in ice, and centrifuged for 10 min at 2000 rpm and 4 °C, discarding the supernatant. The extraction of the genomic […] °C and re-suspended in DNase-free H2O to obtain a 20 ng/µL solution. The final samples are prepared by depositing one drop of the DNA solution on Ag/SiNW substrates from the very same batch.

Table 1 :
AUC values for the five methods proposed in Section 2.3 for the three melanoma-related cases listed at the beginning of Section 2.2, in the low wavenumber (LW) and high wavenumber (HW) spectral regions. The first column specifies the case and the spectral region. The following five columns report the AUC values for the methods logistic regression on global average (LRA), computation of […]

Figure 4: SK-MEL-28 vs. A375 comparison. Loadings of the first five PCA components in the LW (left) and HW (right) wavenumber regions. Black, red, green, blue, and light blue, respectively, for components one to five.

Table 2 :
SK-MEL-28 vs. A375 comparison. Standard deviation, proportion, and cumulative proportion of variance of the first five PCA components in the LW and HW regions.

Table 3 :
As in Table 1, for the three colon tumor cases listed at the beginning of Section 2.2.