Early identification of esophageal squamous neoplasm by hyperspectral endoscopic imaging

Esophageal squamous neoplasm presents a spectrum of different diatheses. A precise assessment for individualized treatment depends on the accuracy of the initial diagnosis. Detection relies on comprehensive and accurate white-light, iodine staining, and narrow-band imaging endoscopy. In addition to their invasive nature and the associated risks, these methods have limitations, including difficulties in precise tumor delineation to enable complete resection, in differentiating inflammation from malignancy, and in stage determination. Resolving these problems depends on the surgeon's ability and experience with the available technology for visualization and resection. We propose a method for identifying early esophageal cancerous lesions by endoscopy and hyperspectral endoscopic imaging. Experimental results show that the characteristic spectra of a normal esophagus, precancerous lesion, canceration, and intraepithelial papillary capillary loop can be identified through a principal component score chart. The narrow-band imaging (NBI) image shows a remarkable spectral characteristic distribution, and the sensitivity and specificity of the proposed method are higher than those of other methods by ~0.8 and ~0.88, respectively. The proposed method enables the accurate visualization of target organs and may be useful for capsule endoscopy and telemedicine, which require highly precise images for diagnosis.


S1. Hyperspectral Endoscopic Imaging (HSEI) calculation process
The spectral estimation process for the HSEI data is illustrated in Fig. 3. To build the algorithm (step A), the spectra of the 30 Macbeth color checkers are measured by a spectrophotometer (Ocean Optics QE65000) under a uniform artificial light source, and the reflection spectrum of each color checker in the visible light region (380-780 nm) is obtained. These spectra are arranged as a matrix, $[D]_{401 \times 30}$. The rows correspond to the intensities at wavelengths sampled at 1 nm intervals, and the columns correspond to the color checkers. By determining the eigensystem and applying principal component analysis (PCA), we selected the six eigenvectors with the most substantial contribution as the bases for spectral estimation.
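The basis selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the spectra are synthetic stand-ins for the measured color checker data, the function name `select_basis` is hypothetical, and the use of the covariance matrix of the mean-centered spectra is an assumption.

```python
import numpy as np

def select_basis(D, n_basis=6):
    """Return the mean spectrum, the leading eigenvectors, and their contributions.

    D has shape (n_wavelengths, n_patches); each column is one reflection spectrum.
    Assumption: PCA is run on the covariance of the mean-centered spectra.
    """
    mean = D.mean(axis=1, keepdims=True)
    centered = D - mean
    cov = centered @ centered.T / (D.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()      # fraction of variance per eigenvector
    return mean, eigvecs[:, :n_basis], contribution[:n_basis]

# Synthetic stand-in for the 401 x 30 matrix (380-780 nm at 1 nm steps, 30 patches).
rng = np.random.default_rng(0)
wavelengths = np.arange(380, 781)
centers = np.array([450.0, 550.0, 650.0])
components = np.exp(-0.5 * ((wavelengths[:, None] - centers[None, :]) / 40.0) ** 2)
D = components @ rng.uniform(0.1, 1.0, size=(3, 30))

mean_spectrum, basis, contribution = select_basis(D)
print(basis.shape, np.round(contribution, 4))
```

With real measurements, the contribution values indicate how many eigenvectors are worth keeping; six is the number reported in the text.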

S2. Automatic circling and pixel coordinate recording algorithm
We use the flow chart in Fig. 5 to introduce how the lesion region is encircled and selected automatically. The chart is divided into four main parts: image capture (step 1), image preprocessing (steps 2-5), lesion region judgment (step 6), and circle result (step 7).
Step 1: Image capture
Step 2: Gray scale
Grayscale conversion simplifies the image data for the subsequent image processing techniques. The captured images are RGB color images. Each pixel is encoded with 24 bits, 8 bits each for red, green, and blue, so each channel value ranges from 0 to 255. The grayscale image has only an 8-bit intensity from 0 (black) to 255 (white). We converted the RGB values into gray using Eq. S14, and the result is shown in Fig. 5(b).
Gray = R × 0.299 + G × 0.587 + B × 0.114 (S14)
Step 3
Step 4: Binarization
Narrow-band imaging (NBI) is used to capture images in this study. Light is centered on the lesion sample for easy observation. Fig. 5(c) reveals that the selected part is brighter than the rest of the image after contrast enhancement. We determined the most suitable binarization threshold to filter out unwanted parts. The desired parts are retained for analysis, and the image is binarized to 0 (black) and 1 (white), as shown in Fig. 5(d).
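The grayscale conversion of Eq. S14 and the binarization step can be sketched as follows. The function names and the threshold value of 128 are illustrative assumptions; the text does not state the threshold that was actually used, and the contrast enhancement that precedes binarization is not reproduced here.

```python
import numpy as np

def to_gray(rgb):
    """Apply Eq. S14 to an (H, W, 3) RGB array and return an (H, W) float array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def binarize(gray, threshold):
    """Return 1 where the gray level reaches the threshold, 0 elsewhere."""
    return (np.asarray(gray) >= threshold).astype(np.uint8)

# Tiny synthetic image: a bright 2 x 2 patch on a black background.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = (200, 220, 180)

gray = to_gray(image)
binary = binarize(gray, threshold=128)   # threshold chosen only for illustration
print(binary)
```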
Step 5: Invert
The image in Fig. 5(d) is inverted to make automatic circling easier and to improve its visibility.
Step 6: Guo-Hall thinning
Noise remains in the images after binarization. We used labeling to process the images and distinguish noise from signal. Labeling determines whether pixels are connected: all connected pixels receive the same label, and each connected component receives a different label number. Each connected component is then separated, and its characteristics are studied. The labeling method is described in Ref. [R1]. The lesion region can therefore be obtained by labeling, as shown in Fig. 5(f). The unnecessary noise is filtered out, which facilitates lesion observation.
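The labeling idea can be sketched as follows. This is a generic connected-component labeling, not the specific method of Ref. [R1]; the 4-connectivity, the size-based noise criterion, and the names `label_components` and `remove_small` are assumptions made for illustration.

```python
from collections import deque
import numpy as np

def label_components(binary):
    """Label 4-connected foreground regions; return the label map and the count."""
    binary = np.asarray(binary).astype(bool)
    labels = np.zeros(binary.shape, dtype=np.int32)
    rows, cols = binary.shape
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c] or labels[r, c]:
                continue
            current += 1                      # start a new connected component
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return labels, current

def remove_small(labels, count, min_pixels):
    """Keep only components with at least min_pixels pixels (assumed noise rule)."""
    kept = np.zeros_like(labels)
    for k in range(1, count + 1):
        mask = labels == k
        if mask.sum() >= min_pixels:
            kept[mask] = k
    return kept

binary = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
labels, count = label_components(binary)
filtered = remove_small(labels, count, min_pixels=3)
print(count)
print(filtered)
```

In this toy image the two isolated pixels are treated as noise and only the 2 x 2 block survives. Any other per-component characteristic could replace the pixel count.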
Step 7: Circle result
According to the lesion positions marked in step 6, we determined the central point and boundary of each lesion, defined its position, and encircled it. The result is shown in Fig. 5(g).
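One simple way to turn a labeled region into a circle is sketched below. It assumes the center is the centroid of the labeled pixels and the radius is the distance to the farthest pixel; the text does not specify how the center and boundary were computed, and `enclosing_circles` is a hypothetical name.

```python
import numpy as np

def enclosing_circles(labels):
    """Map each nonzero label to (center_row, center_col, radius)."""
    circles = {}
    for k in np.unique(labels):
        if k == 0:                      # 0 is background
            continue
        ys, xs = np.nonzero(labels == k)
        cy, cx = ys.mean(), xs.mean()   # centroid of the region
        radius = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2).max()
        circles[int(k)] = (float(cy), float(cx), float(radius))
    return circles

labels = np.zeros((6, 6), dtype=int)
labels[1:4, 1:4] = 1                    # one 3 x 3 region
circles = enclosing_circles(labels)
print(circles)
```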

S3. PCA calculation process
PCA is a common method in multivariate statistics [40]. This technique has been applied to color technology since 1960. PCA determines a subspace of lower dimension than the original variables that preserves the variation in a multivariable data set, and projects the original data onto this subspace for analysis [41]. Principal axis analysis is used to define the principal axis directions of a large amount of spectral information and to simplify the data. The method recombines the original, highly correlated variables into new, mutually independent variables, from which the principal components are obtained. PCA can explain most of the variability in the data [R2]. The principal axis component score is expressed as follows:
$y = \alpha_1 (x_1 - \bar{x}_1) + \alpha_2 (x_2 - \bar{x}_2) + \cdots + \alpha_n (x_n - \bar{x}_n)$, (S16)
where $x_1, x_2, \ldots, x_n$ refer to the spectral intensities at the first, second, and up to the nth wavelength; $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$ refer to the average spectral intensities at the first, second, and up to the nth wavelength; and $\alpha_1, \alpha_2, \ldots, \alpha_n$ are the coefficients of the eigenvector obtained from the covariance matrix of the spectra. According to Hotelling's rule, the first principal component carries the most information in the original data and can be regarded as a comprehensive index. The information in the second and third principal components can be used to classify all groups [R3]. We marked 30 positions for the four kinds of early human esophageal cancerous lesion. Each position includes 400 image elements.
The average spectrum of each position is determined for PCA to obtain different eigenvector groups. Six positions from each group of eigenvectors are used as bases to obtain six eigenvalues. After PCA determines the eigenvectors corresponding to the two largest eigenvalues, Eq. S16 is used to calculate the principal component scores of each lesion part. The two groups of scores (a1 and a2) are plotted as a scatter graph.
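The score calculation of Eq. S16 can be sketched as follows. The spectra here are synthetic placeholders for the 30 averaged position spectra, the function name `pca_scores` is hypothetical, and the sample covariance matrix is assumed.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return principal component scores and the variance fraction per component.

    X has shape (n_positions, n_wavelengths); each row is one average spectrum.
    """
    mean = X.mean(axis=0)
    centered = X - mean                              # x_i - mean(x_i) in Eq. S16
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    coeffs = eigvecs[:, order]                       # eigenvector coefficients
    scores = centered @ coeffs                       # one score per component
    explained = eigvals[order] / eigvals.sum()
    return scores, explained

# Synthetic stand-in: 30 positions, 401 wavelengths (380-780 nm).
rng = np.random.default_rng(1)
wavelengths = np.arange(380, 781)
shapes = np.exp(-0.5 * ((wavelengths[None, :] - np.array([[540.0], [620.0]])) / 35.0) ** 2)
X = rng.uniform(0.2, 1.0, size=(30, 2)) @ shapes + 0.01 * rng.standard_normal((30, 401))

scores, explained = pca_scores(X)
print(scores.shape, np.round(explained, 4))
```

Each row of `scores` gives the two coordinates of one position in a score chart, and `explained` gives the fraction of the total variance carried by each of the two components.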
Results are shown in Fig. 6. The a1 and a2 axes represent the first and second principal components, which account for 97.50% and 1.96% of the lesion spectrum in white-light endoscopy, respectively. PCA is performed again on the data of each individual group to determine its distribution. The range of each group is shown as an ellipse.
The equation of the ellipse is expressed as follows:
$\frac{(a_1 x + b_1 y + c_1)^2}{d_1^2} + \frac{(a_2 x + b_2 y + c_2)^2}{d_2^2} = 1$,
where x and y are the coordinates of a point in the score chart; a1, b1, a2, and b2 are the eigenvector coefficients of the inverse covariance matrix of the group, the physical meaning of which is axis rotation; and c1 and c2 represent the average of the data values of the group, the physical meaning of which is a parallel translation of all data points so that the center of the ellipse is moved to the origin. The d1 and d2 values are obtained from the eigenvalues of the inverse covariance matrix and correspond to half of the long and short axes of the ellipse.
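The ellipse parameters can be derived from a group of score points as sketched below. The points are synthetic, the names `fit_ellipse` and `ellipse_value` are hypothetical, and the `scale` factor that sets the size of the ellipse is an assumption because the text does not state the coverage level that was used.

```python
import numpy as np

def fit_ellipse(points, scale=2.0):
    """Return (ab, c, d) for sum_i ((a_i x + b_i y + c_i) / d_i)^2 = 1."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    inv_eigvals, axes = np.linalg.eigh(np.linalg.inv(cov))
    ab = axes.T                           # row i holds (a_i, b_i): axis rotation
    c = -ab @ center                      # translation that centers the ellipse
    d = scale / np.sqrt(inv_eigvals)      # semi-axes, longest first
    return ab, c, d

def ellipse_value(point, ab, c, d):
    """Left-hand side of the ellipse equation; values <= 1 lie inside."""
    u = ab @ np.asarray(point, dtype=float) + c
    return float(np.sum((u / d) ** 2))

rng = np.random.default_rng(2)
points = rng.multivariate_normal([1.0, -0.5], [[2.0, 0.8], [0.8, 1.0]], size=200)
center = points.mean(axis=0)

ab, c, d = fit_ellipse(points)
print(ellipse_value(center, ab, c, d))                   # center of the ellipse
print(ellipse_value(center + d[0] * ab[0], ab, c, d))    # end of the long axis
```

The eigenvectors of the inverse covariance matrix are the same as those of the covariance matrix, so the rotation is identical either way; only the eigenvalues are reciprocal.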

S4. Sensitivity and specificity of this study
We collected the endoscopic images of patients with different cancer stages to verify the accuracy of the triangle regions of cancer staging defined in this study: patients 1 and 2 are in the intraepithelial papillary capillary loop (IPCL)-V1 squamous cell carcinoma (SCC) stage of esophageal cancer, patients 3 and 4 are in the IPCL-V3 SCC stage, patient 4 is also in the IPCL-IV severe dysplasia stage, and patient 5 is in the IPCL-V1 severe dysplasia stage (Figs. S1-S5). For the five patients, we selected the area (red frame) with even brightness in the NBI endoscopic image and converted the image into gray scale.
The image contrast is then enhanced, and binarization is carried out to further enhance the image. A Visual Basic program is used to record the IPCL pixel coordinates of the NBI endoscopic image after binarization. The recorded pixel coordinates are exported to Notepad. The NBI endoscopic image is read through the interface of the Color Factory program, and the pixel coordinates in Notepad are read simultaneously. The displayed image and the simulated spectrum of the image are reproduced by hyperspectral image technology.
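The coordinate recording and export step can be sketched in Python as follows. The original work used a Visual Basic program; this is only an illustration of the idea, and the file name, the "x,y" line format, and the convention that nonzero pixels mark IPCL are all assumptions.

```python
import os
import tempfile
import numpy as np

def record_coordinates(binary):
    """Return the (x, y) coordinates of all nonzero pixels in row-major order."""
    rows, cols = np.nonzero(binary)
    return list(zip(cols.tolist(), rows.tolist()))

def export_coordinates(coords, path):
    """Write one 'x,y' pair per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for x, y in coords:
            handle.write(f"{x},{y}\n")

def load_coordinates(path):
    """Read the coordinates back, as a downstream program would."""
    with open(path, encoding="utf-8") as handle:
        return [tuple(int(v) for v in line.split(",")) for line in handle if line.strip()]

binary = np.array([[0, 1, 0],
                   [0, 0, 1]])
coords = record_coordinates(binary)
path = os.path.join(tempfile.mkdtemp(), "ipcl_coordinates.txt")  # hypothetical file name
export_coordinates(coords, path)
restored = load_coordinates(path)
print(restored)
```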
Hyperspectral image technology is used to analyze the unclear lesion areas in the endoscopic images of the five patients and to obtain the average spectrum of these areas. The principal component score chart is obtained through PCA (Fig. S6). Most points of patients 4 and 5 are located within the triangle regions of IPCL-IV severe dysplasia and IPCL-V1 severe dysplasia, respectively. The points of patients 1, 2, and 3 are concentrated in the triangle regions of IPCL-V1 SCC and IPCL-V3 SCC.
Furthermore, because the overlap area of patients 1 and 2 is large, which hinders assessment, we separated these areas and magnified them individually for evaluation (Fig. S6, inset). We compiled statistics on the evaluation results of the five patients, and the results are shown in Table 1. Notably, the probabilities of the IPCL-V1 SCC and IPCL-V3 SCC stages for patient 1 are 57.6% and 16.2%, respectively. The probabilities of the IPCL-V1 SCC and IPCL-V3 SCC stages for patient 2 are 48.8% and 21.8%, respectively. In addition, the probabilities of the IPCL-V1 SCC and IPCL-V3 SCC stages for patient 3 are 8.5% and 68.3%, respectively. The probabilities of the IPCL-IV severe dysplasia and IPCL-V1 SCC stages for patient 4 are 95.2% and 1.5%, respectively. The probability of IPCL-V1 severe dysplasia for patient 5 is 93.6%. Therefore, we can rapidly evaluate the probability of each cancerous stage for a patient.
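One plausible way to obtain such percentages is to count the share of a patient's score points that fall inside each stage triangle, as sketched below. This counting rule is an assumption, since the text does not spell out how the percentages were computed, and the triangle vertices, point values, and region names are invented for illustration.

```python
def in_triangle(p, tri):
    """Sign test with cross products; points on the boundary count as inside."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    d1 = (px - x2) * (y1 - y2) - (x1 - x2) * (py - y2)
    d2 = (px - x3) * (y2 - y3) - (x2 - x3) * (py - y3)
    d3 = (px - x1) * (y3 - y1) - (x3 - x1) * (py - y1)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def stage_percentages(points, regions):
    """Percentage of points that fall inside each named triangle."""
    result = {}
    for name, tri in regions.items():
        inside = sum(in_triangle(p, tri) for p in points)
        result[name] = 100.0 * inside / len(points)
    return result

# Hypothetical triangle regions and score points.
regions = {
    "stage_A": [(0, 0), (4, 0), (0, 4)],
    "stage_B": [(5, 5), (9, 5), (5, 9)],
}
points = [(1, 1), (2, 1), (1, 2), (6, 6), (10, 10)]
percentages = stage_percentages(points, regions)
print(percentages)
```

Points that fall in no triangle, such as the last one here, lower every percentage, which is why the values for one patient need not sum to 100%.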