Asymmetry between right and left optical coherence tomography images identified using convolutional neural networks

In a previous study, we identified biocular asymmetries in fundus photographs, and the macula was the discriminative area that distinguished left from right fundus images with > 99.9% accuracy. The purposes of this study were to investigate whether optical coherence tomography (OCT) images of the left and right eyes could be discriminated by convolutional neural networks (CNNs) and to corroborate the previous result. We used a total of 129,546 OCT images. CNNs identified right and left horizontal images with high accuracy (99.50%). Even after flipping the left images, all of the CNNs were able to discriminate them (DenseNet121: 90.33%, ResNet50: 88.20%, VGG19: 92.68%). The classification accuracy was similar whether the right or the left images were flipped (90.24% vs. 90.33%, respectively; p = 0.756). The CNNs also differentiated right and left vertical images (86.57%). In all of these cases, the discriminatory ability of the CNNs yielded a significant p value (< 0.001). However, the CNNs could not discriminate between identical right horizontal images (50.82%, p = 0.548). There were significant differences in identification accuracy between right and left horizontal and vertical OCT images and between flipped and non-flipped images. As this could introduce bias into machine learning, care should be taken when flipping images.

Sets 2–4 (R_h fL_h) paired non-flipped right horizontal OCT images with flipped left horizontal images and were used to train DenseNet121, ResNet50, and VGG19. The numbers of images in Sets 2–4 (R_h fL_h) were the same as in Set 1 (R_h L_h D121). After the 50th epoch, the validation accuracy was 92.97%, 88.92%, and 92.23% in Set 2 (R_h fL_h D121), Set 3 (R_h fL_h R50), and Set 4 (R_h fL_h V19), respectively (Fig. 1). The test accuracies were around 90% (90.33%, 88.20%, and 92.68%, respectively; AUC: 0.902, 0.882, and 0.927, respectively; all p values < 0.001, Fig. 2). The AUCs differed significantly in the ROC curve comparisons among the three models (all p values < 0.001).
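The reported AUCs can be reproduced from raw model scores with a pairwise Mann–Whitney estimate. The sketch below is purely illustrative and is not taken from the paper's code; the function name `auc_score` and the toy scores are our own.

```python
def auc_score(pos_scores, neg_scores):
    """Estimate ROC AUC as the probability that a randomly chosen
    positive example is scored higher than a negative one
    (pairwise Mann-Whitney formulation, O(n*m))."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count half
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: model scores for "right eye" (positive) vs "left eye" (negative)
right = [0.8, 0.6]
left = [0.7, 0.2]
print(auc_score(right, left))  # 3 of 4 pairs correctly ordered -> 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination, which is the baseline the study's significance tests compare against.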

Comparison of flipped right and non-flipped left horizontal OCT images (Set 5; fR_h L_h D121).
Set 5 (fR_h L_h D121) comprised horizontally inverted versions of the images in Set 2 (R_h fL_h D121). Because only the left horizontal images were flipped in the other sets, that choice could itself have introduced bias; we therefore verified the results by flipping the right-eye images instead. The DenseNet121 model classified the flipped right horizontal images and non-flipped left horizontal images. After the 50th epoch, the validation accuracy was 89.83% (Fig. 1). The test accuracy was 90.24% (AUC: 0.902, p < 0.001, Fig. 2). In the ROC curve comparison with Set 2 (R_h fL_h D121), the AUCs were not significantly different (fR_h L_h D121 vs. R_h fL_h D121; p = 0.756).
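The horizontal flip used to build the flipped image sets can be sketched in a few lines. This assumes each OCT B-scan is loaded as a 2-D NumPy array (height × width); the helper name `flip_horizontal` is ours, not from the study's pipeline.

```python
import numpy as np

def flip_horizontal(bscan: np.ndarray) -> np.ndarray:
    """Mirror an OCT B-scan left-to-right, so a left-eye image
    mimics the orientation of a right-eye image."""
    return np.fliplr(bscan)

# Toy 2x3 "image": columns are reversed, rows are untouched
img = np.array([[1, 2, 3],
                [4, 5, 6]])
flipped = flip_horizontal(img)
print(flipped)  # [[3 2 1]
                #  [6 5 4]]
```

Flipping twice recovers the original image, which is why Set 5 (flipping right instead of left) is an exact mirror of Set 2 and a fair control for flip-induced bias.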

Discussion
OCT is a novel imaging modality that provides high-resolution cross-sectional images of the internal microstructure of living tissue 1 . The low-coherence light of OCT penetrates the human retina and is then reflected back to the interferometer to yield a cross-sectional retinal image 14 . Retinal OCT images consist of alternating hyporeflective and hyperreflective layers. Hyperreflective layers in OCT include the retinal nerve fiber layer, inner plexiform layer, outer plexiform layer, external limiting membrane, ellipsoid zone, and retinal pigment epithelium. Hyporeflective layers include the ganglion cell, inner nuclear, and outer nuclear layers. The choroid and parts of the sclera also appear in OCT images 15 . Our previous study 13 showed that left fundus images are not mirror-symmetric with respect to right fundus images, and that CNNs are capable of distinguishing the left from the right fundus with an accuracy greater than 99.9%. However, it is important to consider the various factors that can affect fundus photography outcomes. In fundus photography, light from the flash reflects off the retina and enters the sensor of the fundus camera, which then measures the wavelength and intensity of the light. Given the working principle of a fundus camera, fundus images may be affected by the type and location of the light source and sensor, as well as by reflection and refraction 16 . OCT, however, is a completely different modality that is free from these confounding factors; high-quality cross-sectional OCT images allow direct visualization of the anatomy. Our CNNs showed 99.93% classification accuracy for bilateral horizontal OCT images (Set 1; R_h L_h D121). This result is not surprising because the thick retinal nerve fiber layer (RNFL), which contains the papillomacular bundle, is on the nasal side of the fovea but not the temporal side; in addition, large blood vessels are concentrated on the nasal side.
Notably, CAM highlighted not only the RNFL but also the entire thickness of the parafoveal retina. These results indicate that CNNs are capable of recognizing anatomical asymmetry based on the anatomical information of every layer of the retina, choroid, and sclera, as well as the RNFL.
The human eye cannot distinguish a right horizontal OCT image from a flipped left horizontal OCT image, as the two largely coincide. To examine this problem, we used image sets (Sets 2–4; R_h fL_h) to train DenseNet121, ResNet50, and VGG19. Although the classification accuracy differed among the models, all of the CNNs showed around 90% accuracy in distinguishing left from right horizontal OCT images. Thus, CNNs can discriminate horizontal OCT images that are not mirror-symmetric.
However, we could not fully explain the CAM results for Sets 2–4 (R_h fL_h). CAM was applied to the last layers of each CNN, and the results may have been affected by the model structure. The classification accuracies of Set 5 (fR_h L_h D121) and Set 2 (R_h fL_h D121) were similar, and the AUCs were not significantly different (p = 0.756). From this, we found that there was no difference between flipping the left-eye images and flipping the right-eye images.
Set 6 (R_v L_v D121) consisted of vertical images of the right and left eyes. Vertical images of the two eyes are believed to be symmetrical; thus, we did not expect the CNNs to distinguish them. Set 6 (R_v L_v D121) images were unmodified, as in Set 1 (R_h L_h D121). The CNNs distinguished the right and left vertical OCT images with relatively high accuracy (86.57%, AUC = 0.866, p < 0.001). However, the accuracy for Set 6 images was significantly lower than for Sets 1 and 2. The CAM results for Set 6 also differed from those for Set 1 (R_h L_h D121) and Set 2 (R_h fL_h D121): for Set 6 (R_v L_v D121) images, CAM highlighted not only the parafovea but also the fovea. In a previous study using fundus photography 13 , CAM brightly highlighted the temporal parafovea and moderately highlighted the fovea. It is therefore possible that the degree of asymmetry depends on retinal location, with the temporal parafovea being more asymmetric than the superior and inferior parafovea; additional research is required to test this hypothesis.

Set 8 (R_h R_h D121) was designed to test for overfitting, a common problem with CNNs. The results for Sets 1–7 could in principle have resulted from overfitting, in which case a CNN would show similar results for any random OCT images. The reliability of our results would be demonstrated by an inability of the CNNs to distinguish between identical images. The CNNs could not accurately discriminate Set 8 (R_h R_h D121) images (p = 0.548), although the training loss decreased. This result indicates that the classification results for Sets 1–7 were not driven by overfitting.
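The chance-level check for Set 8 can be illustrated with an exact two-sided binomial test of classification accuracy against 50%. The sketch below is ours, uses a hypothetical test-set size of 1,000 images (the actual counts are not given here), and the helper name `binomial_p_two_sided` is illustrative.

```python
from math import comb

def binomial_p_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test of k correct out of n
    against chance accuracy p = 0.5 (symmetric null)."""
    # Under p = 0.5 the null is symmetric, so the two-sided p value
    # is twice the probability of the more extreme tail.
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1))
    return min(1.0, 2.0 * tail / 2**n)

# Hypothetical test set of 1,000 images:
# ~86.6% accuracy is far above chance, ~50.8% is not.
print(binomial_p_two_sided(866, 1000) < 0.001)  # True
print(binomial_p_two_sided(508, 1000) > 0.05)   # True
```

This is the logic behind reporting p < 0.001 for Sets 1–7 and p = 0.548 for Set 8: an accuracy near 50% on identical images is exactly what an unbiased, non-overfitted classifier should produce.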
We observed asymmetry between the left- and right-eye OCT images. Cameron et al. 20 also observed asymmetry; however, they were unable to identify its specific components. Wagner et al. 21 reported that the "angles between the maxima of peripheral RNFL thickness" were larger in right eyes than in left eyes, and that RNFL asymmetry could be influenced by the locations of the superotemporal retinal artery and vein 22 . The retinal vascular system also exhibits interocular asymmetry: Leung et al. 23 reported that the mean central retinal arteriolar equivalent of right eyes was 3.14 µm larger than that of left eyes. In this study, the CNNs readily recognized this asymmetry.
Based on our results and previous studies using fundus imagery [24][25][26] , it seems clear that CNNs can distinguish, by analyzing retinal images, several features that cannot be resolved by humans; for example, CNNs can determine patient age, sex, and smoking status. Our CNNs identified several features distinguishing left- and right-eye images that cannot be detected by humans. The results were similar after re-initializing and retraining the CNNs many times. Therefore, we assume that there are hidden patterns in gray-scale OCT images detectable only by CNNs. One possible explanation is that the human brain has limited multi-tasking capacity compared with a computer: human cognition is limited in its ability to process multiple inputs at the same time 27 . For example, in the "Where's Wally?" visual search task 28 , the human brain has difficulty processing the salient features (a red-striped long-sleeved T-shirt, jeans, round glasses, a hat, a chin, and curly hair) simultaneously, whereas a computer can do this easily 29 . The numbers of filters in the last layers of DenseNet121, ResNet50, and VGG19 are 1024, 2048, and 512, respectively. In theory, each filter can find a different feature; thus, DenseNet121 can process 1024 features, which is beyond human capability. The main strength of this study is that we included images of both normal and pathological eyes; there appear to be significant biocular asymmetries in both. It would be interesting to analyze asymmetry according to disease type and progression, which could aid the development of a scale for measuring degradation or destruction of normal structure. In addition, we used lossless BMP images that were unmodified except for cropping and flipping. However, only one OCT device (Spectralis SD-OCT; Heidelberg Engineering) was used, and all analyses were conducted at a single institution; future studies should therefore compare multiple OCT systems.
In conclusion, we hypothesized that OCT images of the right and left eyes are mirror-symmetric. However, we found asymmetry in both vertical and horizontal OCT images of the right and left eyes. To our knowledge, this is the first machine learning study to assess differences in OCT images of the left and right eyes. Our CNNs could accurately distinguish left- and right-eye OCT images. However, this asymmetry may introduce bias into CNN results; thus, care should be taken when flipping images during preprocessing, given the possible impact of bias on evaluations of diseases that involve the macula, such as age-related macular degeneration and diabetic macular edema.

Methods
Study design. This retrospective study was approved by the Institutional Review Board of Gyeongsang National University Changwon Hospital (GNUCH 2020-07-009). The procedures used in this study followed the principles of the Declaration of Helsinki. The requirement for informed patient consent was waived by the Institutional Review Board of Gyeongsang National University Changwon Hospital due to the retrospective nature of the study.

Image acquisition protocol. An expert examiner evaluated the retinas with a Spectralis SD-OCT device (Heidelberg Engineering, Heidelberg, Germany). The system acquired 40,000 A-scans per second, with an axial resolution of 3.9 μm/pixel and a transverse resolution of 5.7 μm/pixel. Twenty-five cross-sectional images were taken at intervals of 240 μm. Each cross-sectional image consisted of 768 A-scans and subtended an angle of 30° (Fig. 3). The examiner took a horizontal cross-sectional image of the macula, followed by vertical cross-sectional images. We accessed these images using automated programs written in AutoIt and saved them in bitmap (BMP) format. Only the 13th image (i.e., the median of the 25 consecutive OCT images) was analyzed. We included all cases in the analysis to reduce selection bias. The cases included various retinal diseases, such as epiretinal membrane, macular hole, and rhegmatogenous retinal detachment; some eyes were filled with gas, air, or silicone oil. Because the retinal nerve fiber layer (RNFL) and major retinal vessels converge at the optic disc, the RNFL is thicker on the optic disc side than on the opposite side. In addition, shadows of the retinal vessels are easily identifiable on the optic disc side. Because the optic discs of the two eyes are located on opposite sides, the horizontal images of the right and left eyes look like mirror images.
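Selecting the median B-scan from each 25-scan volume, as described above, amounts to simple stack indexing. The sketch below is ours and assumes the volume is loaded as a NumPy array of shape (scans, height, width) with the stated 768 A-scans per B-scan; the 496-row height is an illustrative assumption, not a figure from the paper.

```python
import numpy as np

# Illustrative volume: 25 B-scans, 496 rows, 768 A-scans each
volume = np.zeros((25, 496, 768), dtype=np.uint8)

# The 13th of 25 consecutive scans (1-based) is index 12 (0-based),
# i.e. the median scan, which passes through the fovea
median_index = volume.shape[0] // 2
bscan = volume[median_index]
print(median_index, bscan.shape)  # 12 (496, 768)
```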
In vertical images (C,D), the left side is the inferior part of the fovea (bottom of the vertical green line), and the right side is the superior part of the fovea (top of the vertical green line). Because the inferior and superior parts are located at similar distances from the optic disc, the RNFL thickness and the vascular shadow density are similar. The vertical images of the right eye (C) and left eye (D) are therefore difficult to discriminate.

Class activation mapping. We used class activation mapping (CAM) 38 to better understand how the CNNs worked. CAM uses heatmaps to identify the areas that CNNs use to make decisions; hotter areas carry more weight in CAM heatmaps and are more important in the class discrimination process. Using CAM, we identified the locations that carried more weight in the final convolutional and classification layers.
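CAM, in its standard formulation, weights the final convolutional feature maps by the classification-layer weights of the target class. The minimal NumPy sketch below is ours, assumes a global-average-pooling architecture as in the networks used here, and all function and variable names are illustrative.

```python
import numpy as np

def class_activation_map(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Build a CAM heatmap from final conv features (H, W, C) and the
    dense-layer weights (C,) for one class; output normalized to [0, 1]."""
    cam = features @ class_weights          # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)              # keep positive evidence only (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()               # scale so the hottest pixel is 1
    return cam

# Toy example: 4x4 feature maps with 3 channels
rng = np.random.default_rng(0)
feats = rng.random((4, 4, 3))
weights = np.array([0.5, -0.2, 0.3])
heatmap = class_activation_map(feats, weights)
print(heatmap.shape)  # (4, 4)
```

In practice, the heatmap is upsampled to the input image size and overlaid on the OCT B-scan, so that "hot" pixels mark the retinal regions that contributed most to the left/right decision.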