Neuron-level explainable AI for Alzheimer’s Disease assessment from fundus images

Alzheimer’s Disease (AD) is a progressive neurodegenerative disease and the leading cause of dementia. Early diagnosis is critical if patients are to benefit from potential intervention and treatment. The retina has emerged as a plausible diagnostic site for AD detection owing to its anatomical connection with the brain. However, existing AI models for this purpose have yet to provide a rational explanation for their decisions, and they cannot infer the stage of the disease’s progression. Along this direction, we propose a novel model-agnostic explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an interpretation prototype that probes into the intermediate layers of Convolutional Neural Network (CNN) models to directly assess the continuum of AD from retinal imaging without the need for longitudinal or clinical evaluations. This approach aims to validate retinal vasculature as a biomarker and diagnostic modality for evaluating Alzheimer’s Disease. Results leveraging UK Biobank cognitive tests and vascular morphological features demonstrate the significant promise and effectiveness of LAVA in identifying AD stages across the progression continuum.

Adjacency-constrained Hierarchical Agglomerative Clustering. A hierarchy of clusters can be generated either top-down, called divisive clustering, similar to k-means (where the data set is gradually divided into a growing number of smaller clusters), or bottom-up, called agglomerative clustering (where every data point initially forms its own cluster and clusters are gradually merged into fewer, larger ones). Divisive clustering can be linear in the number of clusters if the number of top levels is fixed; however, the number of clusters in the LAVA formulation is not pre-defined and depends on the application and the granular nature of the data structure. We use Hierarchical Agglomerative Clustering (HAC) with Ward's linkage. The time complexity of naive agglomerative clustering is O(n³); it can be reduced to O(n² log n) when a priority-queue data structure is used, and to O(n²) with further optimization. In the HAC algorithm, the between-cluster agglomerative distance can be computed recursively: the aggregated distance between clusters is updated without the need to compute all pairs of objects contained in the clusters. In this setting, we use Ward's linkage to update the aggregated distance between clusters. This approach merges the two clusters for which the change in total variation is minimized. The total variation of a clustering result is the sum of squared errors ESS(C) (the so-called inertia of cluster C) between every object and the centroid of the cluster containing that object. Thus, when two clusters C and C′ are merged, Ward's linkage criterion δ can be formulated as

δ(C, C′) = ESS(C ∪ C′) − ESS(C) − ESS(C′) = (|C| |C′| / (|C| + |C′|)) ‖C̄ − C̄′‖²,

where C̄ denotes the mean vector (centroid) of cluster C.
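The equivalence between Ward's merge cost defined via the inertia ESS and its closed form can be checked numerically. A minimal NumPy sketch (the cluster sizes and data are illustrative):

```python
import numpy as np

def ess(cluster):
    """Sum of squared errors (inertia) of a cluster about its centroid."""
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def ward_delta(c1, c2):
    """Increase in total within-cluster variation if c1 and c2 are merged."""
    return ess(np.vstack([c1, c2])) - ess(c1) - ess(c2)

def ward_delta_closed_form(c1, c2):
    """Equivalent closed form: |C||C'| / (|C| + |C'|) * ||centroid difference||^2."""
    n1, n2 = len(c1), len(c2)
    diff = c1.mean(axis=0) - c2.mean(axis=0)
    return n1 * n2 / (n1 + n2) * float(diff @ diff)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(5, 3))   # illustrative cluster of 5 points in 3-D
b = rng.normal(4.0, 1.0, size=(8, 3))   # illustrative cluster of 8 points

# Both formulations agree, and the merge cost is non-negative.
assert np.isclose(ward_delta(a, b), ward_delta_closed_form(a, b))
```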
Suppose we have two clusters C and C′ that are merged into a new cluster C*, and let C″ be any other cluster. Denoting the cluster sizes by n_C, n_{C′}, and n_{C″}, the Ward distance from C* to C″ can be updated recursively from the pre-merge distances:

δ(C*, C″) = ((n_C + n_{C″}) δ(C, C″) + (n_{C′} + n_{C″}) δ(C′, C″) − n_{C″} δ(C, C′)) / (n_C + n_{C′} + n_{C″}).

We apply this recursion to the activation dataset in the constrained version of the clustering algorithm in a semi-supervised setting.
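The recursive update of Ward distances after a merge (the Lance–Williams formula for Ward's criterion) can be verified against the direct merge cost. A sketch with illustrative cluster sizes:

```python
import numpy as np

def ess(c):
    """Inertia of a cluster about its centroid."""
    return float(((c - c.mean(axis=0)) ** 2).sum())

def delta(c1, c2):
    """Ward merge cost: increase in total ESS when c1 and c2 are merged."""
    return ess(np.vstack([c1, c2])) - ess(c1) - ess(c2)

rng = np.random.default_rng(1)
# Three illustrative clusters C, C', C'' of sizes 4, 6, 5 in 2-D.
C, Cp, Cpp = (rng.normal(m, 1.0, size=(n, 2)) for m, n in [(0, 4), (3, 6), (7, 5)])
nC, nCp, nCpp = len(C), len(Cp), len(Cpp)

# Direct cost of merging C* = C ∪ C' with C''.
direct = delta(np.vstack([C, Cp]), Cpp)

# Recursive update using only pre-merge pairwise costs and cluster sizes.
updated = ((nC + nCpp) * delta(C, Cpp)
           + (nCp + nCpp) * delta(Cp, Cpp)
           - nCpp * delta(C, Cp)) / (nC + nCp + nCpp)

assert np.isclose(direct, updated)
```

This is why HAC with Ward's linkage never needs to revisit the raw points of merged clusters: pairwise costs and sizes suffice.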
Given k different parameterizations of a classifier model Φ obtained through a nested k-fold cross-validation learning paradigm, with L′ selected layers indexed by l′ ∈ {1, …, L′} and N input samples indexed by i ∈ {1, …, N}, let Z_i^{l′,k} denote the activation of the critical neurons at selected layer l′ of the k-th model for the i-th input instance. To aggregate the activation values, we first stack them across all cross-validating models, yielding {Z_i^{l′}}_{i=1}^{N} for each selected layer; second, we stack these across all selected layers, yielding {Z_i}_{i=1}^{N} (where i indexes the input sample instance), to construct a two-dimensional array of the activation values of the critical neurons across the entire networks over all input samples.
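The two-stage stacking can be sketched with NumPy. The dimensions (K folds, L′ selected layers, N samples, P critical neurons per layer) and the column-wise aggregation order are illustrative assumptions:

```python
import numpy as np

K, L_sel, N, P = 5, 3, 100, 16  # hypothetical: folds, selected layers, samples, neurons/layer

# Z[k][l] holds an (N, P) array of critical-neuron activations for fold k, layer l.
rng = np.random.default_rng(0)
Z = [[rng.normal(size=(N, P)) for _ in range(L_sel)] for _ in range(K)]

# Step 1: stack across the K cross-validated models (columns side by side).
per_layer = [np.hstack([Z[k][l] for k in range(K)]) for l in range(L_sel)]  # each (N, K*P)

# Step 2: stack across the selected layers into one 2-D activation matrix,
# one row per input sample, one column per (fold, layer, neuron) triple.
Z_all = np.hstack(per_layer)

assert Z_all.shape == (N, K * L_sel * P)
```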
Let Y = {y_1, …, y_N} denote the array of ground-truth labels for all input images. We construct a connectivity graph h ∈ {0, 1}^{N×N} from the k-nearest-neighbor graph (K-NNG) to serve as constraints in a semi-supervised learning algorithm. In this graph, nodes p and q are connected if the distance between them is among the k smallest distances from node p to any other node. In this setting, the standard Euclidean metric measures the difference between the ground-truth labels assigned to each pair of sample points. The output of this algorithm is a sparse CSR-format connectivity matrix A of shape N × N in which only k × N entries (self-included) are one and the rest are zero. The algorithm reduces the chunk of distances for each sample instance to its k nearest neighbors by partitioning the distances at element index k − 1 of the stably sorted distance array.
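This K-NNG construction can be realized with scikit-learn's `kneighbors_graph`; a minimal sketch (the label array, k, and sample count are illustrative, and using labels as the metric space follows the description above):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
N, k = 50, 5
y = rng.integers(0, 2, size=(N, 1)).astype(float)  # hypothetical ground-truth labels

# include_self=True keeps the diagonal, so each row has exactly k ones.
A = kneighbors_graph(y, n_neighbors=k, metric="euclidean", include_self=True)

assert A.format == "csr"          # sparse CSR connectivity matrix
assert A.shape == (N, N)
assert A.nnz == k * N             # exactly k entries per row are one
```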
Connectivity constraints make the constrained version of HAC behave differently in two respects:
1. After each merging step, a graph h^(p) is created (recursively) to record the connectivity constraints between the clusters at iteration p, with the current clusters treated as nodes of the graph.
2. Two clusters can be merged only if they are connected according to the connectivity constraint graph h^(p) at the current iteration.
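One way to realize such connectivity-constrained Ward HAC is scikit-learn's `AgglomerativeClustering` with a `connectivity` matrix; the recursive constraint graph h^(p) is maintained internally across merges. A sketch on synthetic data (the blob layout and neighborhood size are illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# Two well-separated synthetic "activation" blobs of 30 samples each in 4-D.
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 4)),
               rng.normal(6.0, 1.0, size=(30, 4))])

# k-NN connectivity constraints: only connected clusters may be merged.
A = kneighbors_graph(X, n_neighbors=5, include_self=True)

model = AgglomerativeClustering(n_clusters=2, linkage="ward", connectivity=A)
labels = model.fit_predict(X)

# The two well-separated blobs are recovered as the two clusters.
assert len(set(labels)) == 2
assert len(set(labels[:30])) == 1 and len(set(labels[30:])) == 1
```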
The pseudocode of this clustering method is provided in Supplementary Algorithm 2.

B Supplementary Algorithms
Algorithm 1 LAVA – Neuron-level Probing
1: Input:
2: A binary vector Ŷ = (ŷ_1, …, ŷ_n) of predicted labels for n input samples X = (x_1, …, x_n).
3: The set of all neurons at L layers, denoted {S_l}_{l=1}^{L}, and their activation values for all input samples, denoted {Z_l}_{l=1}^{L}.
4: A positive integer P, the number of critical neurons to extract from the selected layer.
5: A kernel type.
6: An integer regularization parameter C ≥ 1.
Output: The set of critical neurons at each layer {Ŝ_l}_{l=1}^{L} and their activation values {Z′_l}_{l=1}^{L}.
12: scores ⇐ coefficients of the contribution of the neurons to the model output Ŷ, estimated by ε-SVR on {Z_l}_{l=1}^{L}.
13: Ŝ_l ⇐ recursively eliminate the least important neurons based on scores until P neurons remain, by RFE.
14: Z′_l ⇐ filter the activation values to keep only the critical neurons at each layer.
15: end for
16: return Ŝ and Z′.

C Supplementary Figures
Critical neurons identification. The overlap between the sets of critical neurons at different layers of the network, identified repeatedly by the different parameterizations of the model obtained from K-fold cross-validation, is measured by Jaccard similarity.
Supplementary Figure 2: Box-plot comparisons of cognitive and vascular features between the AD and NC groups. (*) indicates statistical significance (p < 0.01) by two-tailed significance tests.
Supplementary Table 2: Baseline characteristics of the study populations. P-values for continuous data are computed by Student's t-test; those for categorical variables are computed by Pearson's Chi-squared test. * indicates statistical significance (p < 0.05).