Proof-of-concept Raman spectroscopy study aimed to differentiate thyroid follicular patterned lesions

Inter-observer variability and cancer over-diagnosis are emerging clinical problems, especially for follicular patterned thyroid lesions. This challenge strongly calls for a new clinical tool to reliably identify neoplastic lesions and to improve the efficiency of differentiation between benign and malignant neoplasms, especially considering the increased diagnosis of small carcinomas and the growing number of thyroid nodules. In this study, we employed a Raman spectroscopy (RS) microscope to investigate frozen thyroid tissues from fourteen patients with thyroid nodules. To generate tissue classification models, a supervised statistical analysis of the Raman spectra was performed. The results demonstrate an accuracy of 78% for RS-based diagnosis in discriminating between normal parenchyma and follicular patterned thyroid nodules, and an accuracy of 89% for the very challenging follicular lesions (carcinoma versus adenoma). Translating RS into the intraoperative diagnosis of frozen sections and the preoperative analysis of biopsies could help to reduce unnecessary surgery in patients with indeterminate cytological reports.

A: projection of healthy and pathologic thyroid tissue samples (including additional independent ones) along f1 by progressive thyroid number. B: projection of carcinoma and adenoma thyroid tissue samples (including additional independent ones) along f2 by progressive thyroid number.

Supplementary results: statistical analysis of the thyroid follicular patterned lesions, a different approach
In addition to the statistical results presented in the paper, a new statistical analysis was performed on a larger training dataset in order to employ a higher number of spectra. The main sources of variability in this dataset are the spectral noise (no longer reduced by averaging) and the presence of spectra collected in non-optimal focus conditions, owing to the non-homogeneous topography of the samples (resulting in poorly defined peaks and random distortions induced by the polynomial correction algorithm).
For both the "healthy versus pathologic" and the "carcinoma versus adenoma" cases, an equality criterion was applied to obtain a well-balanced new dataset. The same number of spectra was taken for each patient, equally distributed between a pair of maps for each case: in the first case, 200 spectra per patient (100 healthy and 100 pathologic); in the second case, 100 spectra per patient with pathology. All these spectra were obtained by systematic grid sampling of the hyperspectral images.
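The grid sampling described above can be illustrated with a minimal sketch. It assumes a map stored as a row-major raster of `n_rows` by `n_cols` pixels and picks evenly spaced pixel positions; the function name, the map size and the exact spacing rule are hypothetical, since the text does not specify the grid that was actually used.

```python
def grid_sample_indices(n_rows, n_cols, n_samples):
    """Return n_samples (row, col) positions spread evenly over a raster map.

    Assumption: the hyperspectral map is a row-major raster, and "systematic
    grid sampling" is approximated by evenly spaced flattened pixel indices.
    """
    total = n_rows * n_cols
    if not 0 < n_samples <= total:
        raise ValueError("n_samples must be between 1 and the number of pixels")
    # Distance (in flattened pixel index) between consecutive sampled pixels.
    step = total / n_samples
    # Take the pixel at the centre of each interval of length `step`.
    flat = [int(i * step + step / 2) for i in range(n_samples)]
    # Convert flattened indices back to (row, col) coordinates.
    return [divmod(k, n_cols) for k in flat]


# Hypothetical example: 50 spectra from each of two 20 x 20 maps,
# i.e. 100 spectra for one tissue type of one patient.
positions = grid_sample_indices(20, 20, 50)
```

Taking the same number of positions from every map keeps each patient's contribution to the dataset equal, which is the balancing criterion stated in the text.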
The pre-treatment of the data and statistical procedures were the same as in the main text.
The FP and HWN spectral ranges considered were the same as defined in the main text. The two PCA-LDA models built were tested both by leave-one-patient-out cross-validation and by a test set consisting of a subset of the data from the same patients from which the training set was derived.
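As a minimal sketch of the leave-one-patient-out scheme, the split can be written as follows. The function name and the patient labels are hypothetical; the PCA-LDA fit itself is omitted, and only the partition of the spectra is shown.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices) for each patient.

    patient_ids[i] is the patient from whom spectrum i was collected.
    All spectra of the held-out patient go to the test fold, so no patient
    contributes to both training and testing within a fold.
    """
    for held_out in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train, test


# Hypothetical labels: five spectra from three patients.
for patient, train, test in leave_one_patient_out([1, 1, 2, 2, 3]):
    print(patient, train, test)
```

Unlike a test set drawn from the same patients as the training set, this split never lets spectra of one patient appear on both sides, which is why it tends to give the more conservative estimate.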
Discrimination between healthy and pathologic tissue. In this case, the first principal components account for smaller fractions of the variance than in the study carried out on the average spectra, as a consequence of the increased number of sources of variability present in the dataset (for comparison, ~95% of the explained variance is reached at the 11th PC). Among the first PCs accounting for 90% of the variance, PC1, PC2 and PC3 showed a significant difference between the score means of the two sample types in the t-test (p-values ≪ 0.001). In particular, for PC1 the p-value is very low (p ≪ 3E-83). LDA was then applied to the sample scores, and the best result (f1bis) was obtained for a linear combination of PC1, PC2, PC3 and PC5, giving about 72% of the cases correctly classified. The validation by means of the test set is shown in Supplementary Table S1 (left part). These results are in good agreement with those obtained for the corresponding model f1. A comparison of the loadings of the two models (f1 and f1bis) is shown in Fig. S2(A), where a common trend and the peaks identified in the biochemical study are clearly recognizable.
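The comparison of score means for a single PC can be sketched as below. The text only says "t-test", so the use of Welch's unequal-variance form is an assumption, and the function name and input values are hypothetical. The sketch returns the t statistic and the degrees of freedom; converting them to a p-value requires the t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(scores_a, scores_b):
    """Welch's t statistic and degrees of freedom for two groups of PC scores.

    Assumption: unequal-variance (Welch) form; the source does not state
    which variant of the t-test was applied.
    """
    na, nb = len(scores_a), len(scores_b)
    # Sample variances of each group divided by group size.
    qa = variance(scores_a) / na
    qb = variance(scores_b) / nb
    # t statistic: difference of the means over its standard error.
    t = (mean(scores_a) - mean(scores_b)) / sqrt(qa + qb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, df


# Hypothetical PC scores for two tissue classes.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```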
Discrimination between carcinoma and adenoma tissue. Also in this case, the first principal components account for smaller fractions of the variance than in the study carried out on the average spectra, and ~95% of the explained variance is reached at the 10th PC. All of the first PCs accounting for 90% of the variance showed a significant difference between the score means of the two sample types in the t-test, but only PC1, PC3 and PC5 reach p-values low enough to correspond to a recognizable separation in a graphical representation of the scores (p ≪ 3E-29, 3E-33 and 5E-19, respectively).
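The statement that a given fraction of the explained variance "is reached at" a certain PC can be made concrete with a short sketch. The function name and the eigenvalues below are hypothetical and are not taken from the study.

```python
def n_components_for(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative explained variance
    fraction reaches `threshold`.

    eigenvalues: variances along each principal component (any order).
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    # Walk through the PCs from the largest variance to the smallest.
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical spectrum of PCA eigenvalues.
k90 = n_components_for([5.0, 3.0, 1.0, 0.5, 0.5], threshold=0.85)
```

When the variance is spread over more sources, as in the dataset of individual spectra, each leading PC explains less and more PCs are needed to reach the same threshold.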
LDA was then applied to the sample scores, and the best result (f2bis) was obtained for a linear combination of the first five PCs plus the seventh, giving about 61% of the cases correctly classified. The validation by means of the test set is shown in Supplementary Table S1 (right part).
Although these results are worse than those obtained with the corresponding model f2, f2bis still shows a balance between sensitivity and specificity. In Fig. S2(B), a comparison of the loadings for f2 and f2bis is shown. Despite some differences at certain wavenumbers (777, 1006, 1337, 1486, 1602 cm^-1), the two trends have common features and the peaks identified in the biochemical study are clearly recognizable.

Table S1. Results obtained applying classification models f1bis and f2bis. Left part: healthy (H) vs pathologic (P), model f1bis. Right part: carcinoma (C) vs adenoma (A), model f2bis.

The values of the estimation parameters of both f1bis and f2bis are, as expected, lower for the cross-validation than for the external validation, because the external test set was extracted from the same patients used for the training set, which results in an over-optimistic evaluation. In the "healthy versus pathologic" case, the discrepancy between the sensitivity and the specificity from the external validation, together with the considerable difference between the sensitivity values of the two validation methods (Table S1, left part), suggests that the best model f1bis obtained by applying the PCA-LDA method overfits the data. The much more coherent accuracy, sensitivity and specificity values of the cross-validation appear more realistic. In the "carcinoma versus adenoma" case, sensitivity and specificity show similar values within each validation method, but the discrepancy between the results of the two methods (Table S1, right part) indicates that the model f2bis overfits the data. Comparing the quality of the models built on average spectra with that of the models built on a large number of spectra from hyperspectral images suggests that, in the latter case, other statistical approaches should be employed to reach biomedical-level accuracy.
In fact, in homogeneous cell samples, such as those employed in our study, the major part of the variability of the hyperspectral images collected using the Raman microscope is related to the variety of focus conditions. This had a negative effect on the PCA-based statistical approach. By contrast, the average spectra are not affected by this issue, and more stable and accurate PCA-LDA models can be obtained from them.
Point-by-point focus collection is permitted by present-day technology, but it still requires long acquisition times and, consequently, the availability of fresh biological samples. All these issues will be taken into account in the design of future approaches to these studies.
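For reference, the accuracy, sensitivity and specificity compared in Table S1 follow the usual definitions, which can be sketched as below. The function name and the labels are hypothetical, and the choice of the pathologic class as the "positive" class is an assumption for illustration only.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for a two-class problem.

    positive: the label treated as the positive class (assumed here to be
    the pathologic class; carcinoma would play this role for f2bis).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of positives recognized
        "specificity": tn / (tn + fp),  # fraction of negatives recognized
    }


# Hypothetical true and predicted labels (P = pathologic, H = healthy).
truth = ["P", "P", "P", "H", "H", "H", "H", "P"]
predicted = ["P", "P", "H", "H", "H", "P", "H", "P"]
metrics = classification_metrics(truth, predicted, positive="P")
```

A large gap between sensitivity and specificity, or between the values from the two validation schemes, is the kind of discrepancy the text reads as a sign of overfitting.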
Supplementary Fig. S3. FP spectral region. A: sequence of average Raman spectra collected from PTC follicular variant and FC thyroid tissues (cases 1-7). B: sequence of average Raman spectra collected from the corresponding healthy tissues. C: sequence of average Raman spectra collected from adenoma (follicular, macrofollicular, hyperfunctioning, oxyphil) and hyperplastic colloidal nodule tissues (cases 8-14). D: sequence of average Raman spectra collected from the corresponding healthy tissues. Spectrum numbers correspond to the thyroid case/patient numbers given in Table 1.