Unbiased identification of novel subclinical imaging biomarkers using unsupervised deep learning

Artificial intelligence has recently made a disruptive impact in medical imaging by successfully automating expert-level diagnostic tasks. However, replicating human-made decisions may inherently be biased by the fallible and dogmatic nature of human experts, in addition to requiring prohibitive amounts of training data. In this paper, we introduce an unsupervised deep learning architecture, particularly designed for OCT representations, for unbiased, purely data-driven biomarker discovery. We developed artificial intelligence technology that provides biomarker candidates without any restricting input or domain knowledge beyond raw images. Analyzing 54,900 retinal optical coherence tomography (OCT) volume scans of 1,094 patients with age-related macular degeneration, we generated a vocabulary of 20 local and global markers capturing characteristic retinal patterns. The resulting markers were validated by linking them with clinical outcomes (visual acuity, lesion activity and retinal morphology) using correlation and machine learning regression. The newly identified features correlated well with specific biomarkers traditionally used in clinical practice (r up to 0.73) and outperformed them in correlating with visual acuity (R² = 0.46 compared to R² = 0.29 for conventional markers), despite representing an enormous compression of the OCT imaging data (67 million voxels to 20 features).
In addition, our method discovered hitherto unknown, clinically relevant biomarker candidates. The presented deep learning approach identified known as well as novel medical imaging biomarkers without any prior domain knowledge. Similar approaches may be worthwhile across other medical imaging fields.

+These authors contributed equally to this work.
*Corresponding author. Email: ursula.schmidt-erfurth@meduniwien.ac.at

This PDF file includes:
Supplementary text
Figures S1 to S2
Table S1
SI References

Table S1. Univariate Pearson correlation coefficients between the global features (v1–v20) and functional variables as well as measures of disease activity by OCT and fluorescein angiography. Green colour indicates a positive, and blue colour a negative correlation. Correlations with no significant difference from 0 are shown greyed out.

Supplementary Methods
A. Background and approach. In optical coherence tomography (OCT), an interferogram is obtained at a specific point of a sample, yielding an A-scan containing one-dimensional information along the z-axis (1). The A-scan data thus represent the condition of the retina at that specific position in the eye. By scanning the measurement beam across the sampling area, tens of thousands of A-scans are concatenated to form an entire volume scan. It is this multi-step data acquisition that motivates our proposed approach. Instead of trying to find an embedding for a volume in a single step, we construct two separate embeddings, as depicted in Figure 1 of the main manuscript, that reflect the underlying process of OCT acquisition as well as the basic anatomy of the retina. In the first level, we learn a compact embedding of A-scans, and therefore of the local condition of the retina, using a fully connected auto-encoder. In the second level, a convolutional auto-encoder is used to learn a global representation of whole OCT volumes based on the embedding obtained in the first level, resulting in a massive reduction of dimensionality.
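The two-level dimensionality reduction can be summarized numerically (shapes taken from the Dataset section; the encoders themselves are omitted here):

```python
# Dimensionality flow of the two-level embedding (shapes as in the dataset):
# level 1 maps each 1024-sample A-scan to 20 local features,
# level 2 maps the resulting feature volume to a single 20-dim vector.
volume_voxels = 512 * 128 * 1024             # raw OCT volume (~67 million)
after_level_1 = 512 * 128 * 20               # per-A-scan embeddings
after_level_2 = 20                           # global feature vector
compression = volume_voxels / after_level_2  # overall compression ratio
```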

B. Dataset.
The experiments reported in this paper were conducted on a dataset consisting of 54,900 OCT volume scans of 1,094 patients enrolled in a randomized clinical trial (2). The volumes were acquired using Cirrus OCT devices (Carl Zeiss Meditec, Dublin, CA, USA) and had a voxel dimensionality of 512 × 128 × 1024, covering a physical volume of 6 mm × 6 mm × 2 mm with a voxel spacing of 11.7 µm × 46.9 µm × 2 µm. The dataset was randomly divided into a training set (90%) and a test set (10%) with 985 and 109 patients, respectively. There was no overlap of patients between the two sets.
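A patient-level split of this kind (splitting on patient IDs rather than individual scans, so that no patient appears in both sets) can be sketched as follows; the integer IDs are placeholders for the actual patient identifiers:

```python
import random

def split_by_patient(patient_ids, train_fraction=0.9, seed=0):
    """Split on patient IDs, not scans, so no patient is in both sets."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])

# With 1,094 patients, a 90/10 split yields 985 / 109 patients.
train, test = split_by_patient(range(1094))
```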
C. Data preprocessing. To reduce the large amount of speckle noise inherently present in OCT data, we apply bilateral-grid filtering to the individual B-scans, chosen for its fast runtime and simple implementation. We perform a single pass of filtering to reduce noise while retaining subtle details (3). The position of the retina along the A-scan is not fixed and depends on patient position during acquisition. To be invariant to this translation, we compute a one-dimensional Fast Fourier Transform (FFT) of the A-scan and discard the phase information by keeping only the magnitude of the complex FFT signal. Since the magnitude spectrum of a real-valued signal is symmetric, we keep only a vector of 512 FFT amplitudes per 1024-sample A-scan.
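A minimal sketch of the FFT-magnitude step (the bilateral denoising pass is omitted here). For a circular shift the magnitude spectrum is exactly unchanged, which illustrates the translation invariance the preprocessing aims for; real depth shifts are not circular, so in practice the invariance is approximate:

```python
import numpy as np

def ascan_features(a_scan):
    """Translation-invariant representation of a 1024-sample A-scan:
    the FFT magnitude, with phase (which encodes position) discarded.
    The magnitude spectrum of a real signal is symmetric, so the
    first half (512 bins) carries all of its amplitude information."""
    mag = np.abs(np.fft.fft(a_scan))
    return mag[: len(a_scan) // 2]

a = np.random.rand(1024)
shifted = np.roll(a, 100)  # retina appearing at a different depth
f1, f2 = ascan_features(a), ascan_features(shifted)
# f1 and f2 agree: the features ignore the (circular) shift
```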

D. Deep unsupervised learning of local features.
Auto-encoders are trained without any labels and consist of two parts, the encoder and the decoder. During training, the input is encoded by the encoder into a low-dimensional embedding and subsequently decoded by the decoder to reconstruct the original input. The underlying assumption is that the auto-encoder has to learn a meaningful, compact high-level representation of the data to be able to perform accurate reconstruction. In Figure S1 and Figure S2, information within each auto-encoder always flows from left to right, with the embedding being the lowest-dimensional state in the middle of the stack. In the first stage of our framework (Figure S1), the A-scan auto-encoder AE1 is composed of three simple fully connected layers ([256/64/20] channels), each with a weight matrix W_l, a bias vector b_l and an activation function σ:

h_l = σ(W_l h_{l−1} + b_l).

The sizes of the layers on both sides of the embedding are mirrored, and the weight matrices of two corresponding layers are tied: W_l' = W_l^T. Throughout this work the activation function σ is set to be the exponential linear unit (ELU) (4) with α = 1:

σ(x) = x if x > 0, and σ(x) = α(e^x − 1) otherwise.

The cost function used to drive the optimization in auto-encoders measures the reconstruction error of the final output y given an input vector x:

L(x, y) = ||x − y||².

E. Deep unsupervised learning of global features.
In the second stage, a convolutional auto-encoder AE2 learns a global representation of whole OCT volumes from the A-scan feature volumes (Figure S2). All layers are followed by the non-linear activation function ELU, and random-region dropout is applied to the input during training (5). Applying the encoder of AE2 to the A-scan feature volumes yields a 20-dimensional global feature vector for each volume.
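A minimal NumPy sketch of a tied-weight fully connected auto-encoder with the layer sizes given above (512 FFT amplitudes in, [256/64/20] channels); forward pass and reconstruction loss only, with the actual training loop omitted:

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

rng = np.random.default_rng(0)
sizes = [512, 256, 64, 20]  # FFT input down to the 20-dim embedding

# Encoder weights; the decoder reuses their transposes (tied: W' = W^T).
Ws = [rng.normal(0, 0.05, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs_enc = [np.zeros(m) for m in sizes[1:]]
bs_dec = [np.zeros(n) for n in sizes[:-1]]

def encode(x):
    for W, b in zip(Ws, bs_enc):
        x = elu(W @ x + b)
    return x

def decode(z):
    for W, b in zip(reversed(Ws), reversed(bs_dec)):
        z = elu(W.T @ z + b)
    return z

x = rng.random(512)
y = decode(encode(x))
loss = np.mean((x - y) ** 2)  # reconstruction error driving training
```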

F. Training details.
For the training of the fully connected and the convolutional auto-encoder, we use the Adam optimizer with standard parameters. For the former, we use a learning rate of 0.0001, early stopping with a maximum of 500 epochs, a minibatch size of 64 and dropout at the input level with a rate of 0.5. For the latter, we use a learning rate of 0.0001 for 10 epochs and 0.00001 for 2 epochs, a minibatch size of 8, a random-region dropout factor of 0.25 at the input and ordinary dropout in the first fully connected layer of AE2 with a factor of 0.5.
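For reference, a single Adam update with the learning rate used here and standard parameters (β₁ = 0.9, β₂ = 0.999, ε = 1e-8) looks as follows; the toy quadratic objective is purely illustrative:

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moment estimates scale the step."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Minimising f(theta) = theta^2 from theta = 1.0 with grad = 2 * theta
theta = np.array(1.0)
state = (np.zeros(()), np.zeros(()), 0)
for _ in range(20000):
    theta, state = adam_step(theta, 2 * theta, state)
```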