## Abstract

The pair-wise observation of the input and target values obtained from the same sample is mandatory in any prediction problem. In the biomarker discovery of Alzheimer’s disease (AD), however, obtaining such paired data is laborious and often avoided. Accumulation of amyloid-beta (Aβ) in the brain precedes neurodegeneration in AD, and the quantitative accumulation level may reflect disease progression in the very early phase. Nevertheless, the direct observation of Aβ is rarely paired with the observation of other biomarker candidates. To this end, we established a method that quantitatively predicts Aβ accumulation from biomarker candidates by integrating the mostly unpaired observations via a few-shot learning approach. When applied to 5xFAD mouse behavioral data, the proposed method predicted the accumulation level that conformed to the observed amount of Aβ in the samples with paired data. The results suggest that the proposed model can contribute to discovering Aβ predictability-based biomarkers.

## Introduction

Prediction is a powerful approach to evaluating the association between the input and target values. To predict a target value based on an individual input value, it is usually necessary to acquire paired data consisting of input and target values obtained from the same individual. However, in biomedical research, especially in the field of neurodegenerative diseases, obtaining such paired data in different modalities is laborious and often avoided. The same holds true for the research field of biomarker discovery in Alzheimer’s disease (AD), the most common cause of dementia^{1,2}.

AD is a neurodegenerative disease in which neurons in the brain gradually die, causing progressive cognitive decline characterized by memory loss, impaired judgment and reasoning skills, communication difficulties, and changes in personality and behavior. Most cases of AD progress sporadically, and many patients have no family history of AD^{3}. The genetic background of sporadic AD has been extensively investigated, revealing the complex genetic architecture of late-onset neurodegenerative disease^{4,5,6,7}. Diagnosing AD or predicting its risk before the onset of irreversible neuronal loss may enable potential treatment of the disease or the administration of appropriate symptomatic medication. The rapidly growing population of affected people raises an urgent demand for developing novel biomarkers and prediction methods for AD before irreversible neurodegeneration^{8,9,10,11,12,13}.

In AD, the gradual accumulation of amyloid-beta (Aβ) precedes irreversible neurodegeneration^{1,2,14}. The Aβ accumulation level, therefore, can be an indicator of disease progression in the very early phase^{15,16,17}. The indirect assessment of Aβ, such as Aβ imaging in the brain using positron emission tomography (PET) and the assessment of Aβ in cerebrospinal fluid (CSF), potentially facilitates AD onset prediction^{1,2,18,19,20}. Recently, phosphorylated plasma tau was reported as a promising blood biomarker candidate to detect the accumulation status of Aβ in a human cohort^{8,21}. However, the current biomarker candidates in AD are not observed in pairs with the direct quantification of Aβ; thus, the effectiveness of the candidates in the very early phase is unknown. During biomarker discovery, researchers usually evaluate binary predictability (e.g., healthy/disease, mild cognitive impairment/AD, or amyloid positive/negative) or test for significant differences between groups. However, the progression timing and disease dynamics may be heterogeneous even within each group. Especially during the accumulation process, such heterogeneity could mask the differences researchers aim to detect (Fig. 1a). The direct quantitative observation of Aβ can be performed only in the brain tissue of humans after death or animals after euthanization. Owing to the difficulty of paired observations, the quantitative accumulation level of Aβ in the brains of individuals, which may reflect the progression state of early AD pathology, may have been overlooked in past biomarker discoveries.

In the present study, to overcome this problem, we developed a hierarchical Bayesian model that describes the Aβ accumulation process and observation of biomarker candidates utilizing mostly unpaired data (Fig. 1b). In the model, Aβ deposits over time according to a logistic function, whose parameters are unique to each sample. The effectiveness of the biomarker candidates can be evaluated by the predictability of the quantitative accumulation level of Aβ. Owing to the Bayesian probabilistic formulation, the model can naturally integrate mostly unpaired data through few-shot learning, increasing the predictability of accumulation levels based on the observed biomarkers. By applying the model to the behavioral data sets of 5xFAD mice^{22}, we predicted the accumulation level of Aβ solely from behavioral data from mostly unpaired data, supporting the concept of biomarker discovery based on predictability.

## Results

### Hierarchical Bayesian model of Aβ accumulation

To represent the pathogenesis of AD, we developed a mathematical model describing the accumulation process of Aβ in the brain and the observation process of Aβ and biomarker candidates. Here, we assumed an AD study using model animals, and each sample was identified as belonging to either healthy wild type (WT) or AD model animals. In the model, Aβ accumulates over time according to the logistic function (Fig. 1c):

\[{z}_{n}=\frac{{\alpha }_{n}}{1+\exp \left(-{\beta }_{n}\left({t}_{n}-{\tau }_{n}\right)\right)},\tag{1}\]

where the suffix \(n\in \{1,2,\ldots ,{N}\}\) represents the index of the sample, and \({t}_{n}\) is the age in months at which sample \(n\) was observed. \({\theta }_{n}=\left\{{\alpha }_{n},{\beta }_{n},{\tau }_{n}\right\}\) is a set of parameters depending on the individual animals, and \({\alpha }_{n}\), \({\beta }_{n}\), and \({\tau }_{n}\) denote the maximum level of Aβ accumulation, the steepness of Aβ accumulation, and the critical period to reach half of the maximum Aβ, respectively. To express heterogeneity in the Aβ accumulation process among animals, different values of \({\theta }_{n}=\left\{{\alpha }_{n},{\beta }_{n},{\tau }_{n}\right\}\) are assigned to the individual animals, following the distribution shared within the same type of animals \({s}_{n}\), i.e., WT or AD model animals.

The observed amount of Aβ, \({y}_{n}\), is subject to noise as \({y}_{n}={z}_{n}+{\sigma }_{y}{\xi }_{n}\), where \({\sigma }_{y}\) and \({\xi }_{n}\) indicate the noise strength and Gaussian noise with zero mean and unit variance, respectively. The observed data on biomarker candidates \({{\bf{x}}}_{n}\in {{\mathbb{R}}}^{L}\) were assumed to reflect \({z}_{n}\) and its temporal derivative \({{z\text{'}}}_{n}\), i.e., \(d{z}_{n}/{dt}\),

\[{{\bf{x}}}_{n}=W{{\bf{z}}}_{n}+{S}_{x}{{\boldsymbol{\varepsilon }}}_{n},\]

where \({{\bf{z}}}_{n}={\left({z}_{n},{{Cz\text{'}}}_{n},1\right)}^{{\rm{T}}}\) is the hidden state, \(W\in {{\mathbb{R}}}^{L\times 3}\) indicates the weight matrix, and \({S}_{x}={\rm diag}({\sigma }_{{x}_{1}},{\sigma }_{{x}_{2}},\ldots ,{\sigma }_{{x}_{L}})\in {{\mathbb{R}}}^{L\times L}\) and \({{\boldsymbol{\varepsilon }}}_{n}\in {{\mathbb{R}}}^{L}\) indicate the diagonal matrix of noise strengths and independent Gaussian noise with zero mean and unit variance, respectively (Fig. 1b, c). The temporal derivative \({{z\text{'}}}_{n}\) indicates the instantaneous speed of Aβ accumulation derived from the logistic equation. \(C\) is a scaling factor common across samples. Here, we assumed that biomarker candidate data \({{\bf{x}}}_{n}\) were generated (Fig. 1b, c) via the same biological process among animals. \(W\) is the shared parameter among animals sampled from the same distribution. This model was formulated using a hierarchical Bayesian model (see Methods).
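The generative process described above can be illustrated with a minimal simulation sketch. This is not the authors' implementation: the function names, the two-dimensional weight matrix, and all numerical values are hypothetical choices made only to show how the logistic level, its derivative, and the noisy observations fit together.

```python
import math
import random

def logistic_abeta(t, alpha, beta, tau):
    """Latent Abeta level z at age t: alpha / (1 + exp(-beta * (t - tau)))."""
    return alpha / (1.0 + math.exp(-beta * (t - tau)))

def logistic_speed(t, alpha, beta, tau):
    """Instantaneous accumulation speed dz/dt = beta * z * (1 - z / alpha)."""
    z = logistic_abeta(t, alpha, beta, tau)
    return beta * z * (1.0 - z / alpha)

def simulate_sample(t, alpha, beta, tau, W, sigma_x, C=1.0, sigma_y=0.05, rng=random):
    """Draw one (y, x) pair: noisy Abeta and an L-dimensional biomarker vector."""
    z = logistic_abeta(t, alpha, beta, tau)
    dz = logistic_speed(t, alpha, beta, tau)
    hidden = (z, C * dz, 1.0)                      # hidden state (z, C z', 1)
    y = z + sigma_y * rng.gauss(0.0, 1.0)          # y = z + sigma_y * xi
    x = [sum(w * h for w, h in zip(row, hidden)) + s * rng.gauss(0.0, 1.0)
         for row, s in zip(W, sigma_x)]            # x = W z + S_x * eps
    return y, x

# Illustrative values only (not fitted to any data).
rng = random.Random(0)
W = [[1.0, 0.5, 0.0], [-0.8, 0.2, 0.1]]            # L = 2 hypothetical biomarkers
y, x = simulate_sample(t=8.0, alpha=1.0, beta=0.6, tau=6.0,
                       W=W, sigma_x=[0.1, 0.1], rng=rng)
```

At \(t=\tau\) the sketch returns half of the maximum level, which matches the interpretation of \({\tau }_{n}\) as the critical period.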

### Few-shot learning procedure to predict Aβ accumulation

Based on this model, we aimed to estimate the latent Aβ accumulation \({z}_{n}\) from the observed data on the biomarker candidate \({{\bf{x}}}_{n}\). To this end, we also needed to train the model by estimating the parameters from the data. Here, we made two assumptions, based on the data that were actually available: first, the animals must be sampled once as snapshots, not as a time series, owing to the requirement of euthanization. Second, most samples only included either Aβ accumulation or candidate biomarker data, implying that these were observed in different animal populations and only a small subset of the samples had paired data for the biomarker candidates and Aβ accumulation at the same age. Therefore, there were three types of data: unpaired data on Aβ accumulation, unpaired data on biomarker candidates, and paired data containing both.
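The three data types can be represented with optional fields, as in the following sketch. The `Sample` class and its field names are hypothetical and serve only to make the partition explicit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Sample:
    age_months: float
    is_ad_model: bool
    abeta: Optional[float] = None                  # observed Abeta, if measured
    biomarkers: Optional[Sequence[float]] = None   # biomarker vector, if measured

def split_by_observation(samples):
    """Group snapshot samples into Abeta-only, biomarker-only, and paired sets."""
    groups = {"abeta_only": [], "biomarker_only": [], "paired": []}
    for s in samples:
        if s.abeta is not None and s.biomarkers is not None:
            groups["paired"].append(s)
        elif s.abeta is not None:
            groups["abeta_only"].append(s)
        elif s.biomarkers is not None:
            groups["biomarker_only"].append(s)
    return groups

# Toy example with made-up values.
toy = [
    Sample(4, True, abeta=0.2),
    Sample(8, True, biomarkers=[0.1, -0.3]),
    Sample(12, True, abeta=0.9, biomarkers=[1.2, 0.4]),
]
groups = split_by_observation(toy)
```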

Given the above assumptions, we proposed a Bayesian probabilistic approach to train the model by integrating paired and unpaired data via few-shot learning or semi-supervised learning (see Methods). In the first step, using only the unpaired dataset of Aβ accumulation, we pre-trained the model by estimating the distribution of the logistic function parameter \(\theta\) for each of the WT and AD model animals (Fig. 1d). Next, using the distributions estimated in the first step as prior knowledge, we inferred the distribution of all model parameters from the remaining datasets; the unpaired dataset on biomarker candidates was used for unsupervised learning, whereas the paired dataset was used for supervised learning.

After learning the parameters, the model could predict the accumulated level of Aβ, \({z}_{n},\) from the observed data on biomarker candidate \({{\bf {x}}}_{n}\) (Fig. 1d; see Methods).

### Experimental data of AD model mice

To apply our estimation method, we adopted real-world experimental data from AD model and WT mice^{22}. The AD model mice were 5xFAD transgenic mice with five human familial AD mutations in APP and PSEN1 on the background of a C57BL/6J strain, showing robust Aβ pathological accumulation and neuronal cell death. Most of the data were unpaired, consisting of either Aβ accumulation (24 samples) or behavioral features (82 samples). In contrast, only a few paired data existed (18 samples, only for 8- and 12-month-old AD model mice; Fig. 2a). Aβ accumulation was evaluated by the insoluble fraction level of Aβ_{40} and Aβ_{42} in the hippocampus of 42 AD model mice at 4, 8, 12, and 18 months of age (Fig. 2d). For WT mice, the observation of Aβ was unavailable in the dataset. We assumed that the insoluble Aβ in WT mice remained undetectable during the mice’s lifetime based on reports that WT mice do not develop Aβ plaque during their normal life span^{23}, and prepared virtual samples of the WT mice with the unpaired observation of Aβ accumulation. Each AD model and WT mouse was behaviorally evaluated; we selected three types of behavioral experiments where 11 features were obtained at 4, 8, or 12 months of age (Supplementary Table 1; Fig. 2b). These 11-dimensional data were treated as biomarker candidate data and visualized using principal component analysis. The distribution of the paired samples lay approximately within the range of the distribution of the unpaired samples (Fig. 2c).

### Prediction of Aβ accumulation in AD model mice

Using our method, we predicted Aβ accumulation in the hippocampus using behavioral features as biomarker candidates. First, we pre-trained the model to learn the distributions of the parameters of the logistic function based on the insoluble fraction level of Aβ. The pre-trained model generated logistic time courses, representing the observed insoluble fraction level of Aβ_{40} and Aβ_{42} (Fig. 2d). Next, we trained the model using an unpaired dataset of behavioral features and a paired dataset.

Using the trained model, we predicted the accumulation level of Aβ from the behavioral features of the paired data using leave-one-out cross-validation (Fig. 2e). The prediction errors were comparable for the two types of Aβ (mean squared error [MSE] = 0.060 for Aβ_{40} and MSE = 0.111 for Aβ_{42}). Most predictions followed the observed amount of Aβ for both Aβ_{40} and Aβ_{42}. However, several samples were predicted poorly, with large errors (asterisks in Fig. 2e). Three of the poorly predicted samples were from 8-month-old mice and were common to Aβ_{40} and Aβ_{42}. Moreover, 8-month-old mice showed larger prediction errors than 12-month-old mice (MSE of Aβ_{40} = 0.083 at 8 months, 0.046 at 12 months; MSE of Aβ_{42} = 0.214 at 8 months, 0.046 at 12 months, Supplementary Fig. 1a). To identify the cause of the large errors, we predicted the mouse type of the samples based on the behavioral features and found that two of the three samples from 8-month-old mice with large Aβ prediction errors were predicted as WT rather than AD models. All samples that were incorrectly predicted to be from WT mice were from 8-month-old mice (Supplementary Fig. 1a, b). We then evaluated the importance of the instantaneous speed of Aβ accumulation, \({{z\text{'}}}_{n},\) incorporated in the hidden state, \({{\bf{z}}}_{n},\) of the proposed model. When the term was excluded from the hidden state, the absolute error increased in some samples with an observed Aβ_{40} accumulation level lower than 0.66 (Fig. 2f). Here, the threshold was set such that all the 8-month-old samples fell below it. Aβ in these samples may still have been accumulating, suggesting the importance of the term in the very early phase of AD. The accumulation level of Aβ in the cortex was also predicted, although the prediction performance was not as good as that in the hippocampus (Supplementary Fig. 2).
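The leave-one-out evaluation can be sketched generically as follows. The `fit` and `predict` callables are placeholders for any training and prediction routine; the mean predictor used in the example is a toy stand-in, not the proposed model.

```python
def loocv_mse(paired, fit, predict):
    """Leave-one-out MSE: hold out each paired sample once and predict its Abeta."""
    squared_errors = []
    for i, (x_test, y_test) in enumerate(paired):
        train = paired[:i] + paired[i + 1:]        # all paired samples except i
        model = fit(train)
        squared_errors.append((predict(model, x_test) - y_test) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Toy stand-in model: always predict the mean Abeta of the training samples.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)

def predict_mean(model, x):
    return model

toy_paired = [([0.0], 0.0), ([0.0], 1.0), ([0.0], 2.0)]
mse = loocv_mse(toy_paired, fit_mean, predict_mean)
```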

We also evaluated the prediction performance when changing the ratio of the paired and unpaired data. We divided the paired data into two equal parts and examined the prediction accuracy when only one of them was used as paired data and the other was used as unpaired data. Similarly, we examined the prediction accuracy of dividing the data into three equal groups, using only one group as paired data and the other as unpaired data. On average, the prediction accuracy did not change much even when only 1/2 or 1/3 of the paired data was used (Fig. 2g).
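A minimal sketch of this subsampling scheme is shown below, assuming each paired sample is stored as a `(biomarkers, abeta)` tuple. The function name and the random shuffling are illustrative assumptions; the source does not state how the groups were formed.

```python
import random

def paired_subsets(paired, k, seed=0):
    """Split paired data into k groups; keep one group paired, unpair the rest."""
    idx = list(range(len(paired)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for keep in folds:
        keep_set = set(keep)
        still_paired = [paired[i] for i in idx if i in keep_set]
        # Dropping the Abeta value turns a paired sample into biomarker-only data.
        now_unpaired = [paired[i][0] for i in idx if i not in keep_set]
        yield still_paired, now_unpaired

# 18 toy paired samples, split into thirds.
toy_paired = [([float(i)], 0.1 * i) for i in range(18)]
thirds = list(paired_subsets(toy_paired, k=3))
```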

### Selection of biomarkers to predict Aβ accumulation

To assess the importance of each behavioral feature in predicting Aβ accumulation, we removed each behavioral feature and evaluated the prediction error for each feature (Fig. 3a). We then found that predictive performance was considerably decreased by removing “time in the center” in the open-field experiment and “time spent in the open arm” in the elevated plus maze experiment, which exhibited significant differences between WT and 5xFAD mice^{22}. Next, we ranked the features by their impact on the prediction error and assessed the prediction performance by including features individually from the top of the ranks (Fig. 3b). The prediction error decreased considerably until the top five features of the ranks were recruited, which comprised features from three different experiments. The results show that multivariate features from different experiments could be potential AD biomarkers.
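The remove-one-feature ranking can be sketched as below. Here `error_fn` is a hypothetical callable that trains and evaluates a model on a given feature subset and returns its prediction error; the toy error function and feature names are invented for illustration.

```python
def rank_features_by_ablation(feature_names, error_fn):
    """Rank features by how much the prediction error grows when each is removed."""
    baseline = error_fn(list(feature_names))
    impact = {}
    for name in feature_names:
        kept = [f for f in feature_names if f != name]
        impact[name] = error_fn(kept) - baseline   # larger = more important
    ranking = sorted(feature_names, key=lambda f: impact[f], reverse=True)
    return ranking, impact

# Toy error function: each included feature reduces the error by a fixed amount.
toy_gain = {"a": 0.5, "b": 0.1, "c": 0.3}

def toy_error(features):
    return 1.0 - sum(toy_gain[f] for f in features)

ranking, impact = rank_features_by_ablation(["a", "b", "c"], toy_error)
```

Features can then be added back one at a time in ranking order to trace how the error falls as the feature set grows.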

We also evaluated the number of features required for the proposed model to determine its predictability. Here, we randomly selected several behavioral features by varying their number, trained the model, and made a prediction (Supplementary Fig. 3a). Statistically significant differences were detected between predictions using 1–7 and 11 features (Mann–Whitney U-test with Holm’s correction for multiple comparisons, *p* < 0.05). In contrast, there was no significant difference between the predictions using 10 and 11 features (Supplementary Fig. 3b), suggesting that as many diverse features as possible are preferable to achieve better prediction performance.
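Holm's step-down correction, applied here to the p-values of the pairwise Mann–Whitney U-tests, can be written in a few lines. The sketch below covers only the correction step and uses made-up p-values; the U-test itself would come from a statistics library.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

adjusted = holm_adjust([0.01, 0.04, 0.03])         # illustrative p-values
significant = [p < 0.05 for p in adjusted]
```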

### Prediction of Aβ accumulation by conventional machine learning methods

The predictive performance of the proposed model was compared with that of conventional machine learning techniques, namely ordinary linear and random forest regressions. To fairly compare prediction performance, we virtually created paired data from randomly selected WT mice and used the mice as a training or test sample for the prediction. We then demonstrated that our proposed model outperformed standard machine learning techniques both in the prediction of Aβ_{40} and Aβ_{42} in the hippocampus (Fig. 4a, b, Supplementary Table 2). The predicted values of Aβ accumulation with the ordinary linear regression overlapped in WT and 5xFAD mice, and the predicted values in some WT mice were negative (Fig. 4a, b). Moreover, the random forest regressor failed to predict the level of accumulation (for instance, large or small) in 5xFAD mice (especially as shown in Fig. 4b). Indeed, the proposed model showed a smaller median absolute prediction error than that of standard models (Supplementary Table 2).

### Application to synthetic data

In the 5xFAD experimental data used in this study, the paired data with Aβ accumulation and behavioral features were limited only to the phase when Aβ accumulation has vastly progressed, i.e., in 8- and 12-month-old mice. Thus, the predictive performance of the proposed model for the earlier phase of Aβ accumulation remains unknown. To evaluate this, the model was applied to synthetic data that contained earlier phase samples.

To this end, we prepared synthetic unpaired and paired data at various ages, including the early phases, by simulating the model. The synthetic data were composed of 20 samples of Aβ accumulation alone, 50 samples of biomarker candidates alone, and 50 samples of paired data for each AD model and WT (Fig. 5a). First, we pre-trained the model using the synthesized unpaired data on Aβ accumulation, representing the variation in the observed accumulation level of Aβ (Fig. 5b). We then trained the model using unpaired data on biomarker candidates and paired data in the manner of few-shot learning. We confirmed that the estimated parameters followed the ground truth used for the synthesized data (Fig. 5c), indicating that our estimation method was efficient. Using the trained model, we predicted \({z}^{* }\), the Aβ accumulation level of unknown samples from the observed biomarker features \({\bf x}^{* }\) (Fig. 5d, top left). We evaluated the prediction accuracy by changing the ratio of supervised paired samples for training. The MSE changed slightly with varying ratios, suggesting that not many supervised samples were required for prediction (right and bottom left in Fig. 5d).

Finally, we evaluated how the predictive performance depended on the age of the samples. When assessing whether the true mouse type was consistent with whether the predicted Aβ accumulation level was zero or non-zero (i.e., zero Aβ in the WT mice and non-zero Aβ in the AD model mice), the presence or absence of Aβ accumulation was predicted more accurately in samples from older mice than in those from younger mice (Fig. 5e). The predictive performance of Aβ accumulation in the AD model samples did not change significantly with age (Fig. 5f, Mann–Whitney U-test with Holm’s correction for multiple comparisons). The results suggest that the model can predict Aβ accumulation levels in samples in the early phase of the accumulation process, while it may fail to predict the type of sample in the same phase.

## Discussion

Herein, we proposed a hierarchical Bayesian model that describes how biomarker candidates are generated in response to the accumulation of Aβ in the brain. By integrating mostly unpaired data on Aβ quantification and the behavioral features obtained in behavioral experiments with 5xFAD mice, our model predicted the quantitative accumulation level of Aβ based on behavioral features in most samples. The instantaneous accumulation speed introduced at the hidden state of our model was suggested to play an important role in the prediction, especially during the early phase. Based on the effect of each biomarker candidate on predictability, we revealed that multiple behavioral features from three different behavioral experiments could be important biomarkers for predicting Aβ accumulation level. This study may demonstrate the proof-of-concept of Aβ-predictability-based multivariate AD biomarker discovery.

The proposed model can naturally integrate information from paired and unpaired data. The inference of the distribution of the logistic function parameters from Aβ-observed unpaired data constrains the dynamic range of Aβ accumulation level. The information from unpaired data that lack an observed amount of Aβ provides the generation process of biomarker candidates and their variability. Learning information from the paired supervised data further helps calibrate the generation process. Notably, as the simultaneous observation of biomarker candidates and Aβ from the same sample is labor-intensive, expensive, and technically challenging, most samples usually lack information on either biomarker candidates or Aβ accumulation. In this scenario, the proposed model makes it possible to make such incomplete datasets available for Aβ predictability-based biomarker discovery.

The predictive performance of the proposed model was found to be lower in young mice than in older mice. The nature of the analyzed biomarker candidates and the limitations of the proposed model could have affected the results. The principal component space of the 5xFAD mouse behavioral test data showed little difference between WT and AD model mice in the younger samples (Fig. 2c), which probably caused “misclassification” of some of the 5xFAD mice as WT mice, resulting in large prediction errors (Fig. 2e, Supplementary Fig. 1). Such “misclassification” of the two samples was avoided in the prediction by a random forest regressor (Fig. 4a, b). Therefore, the proposed method could be improved by adding a nonlinear process, such as the kernel method, to the process of generating the observed data. We tested whether a model with a nonlinear transformation yields better prediction performance by adopting a 3-layer neural network for the generation of \({\bf x}\) from \({\boldsymbol{z}}\). However, the prediction performance improved only slightly (data not shown). Since the goal of this study is to contribute to the discovery of biomarkers, we believe that discovering biomarker candidates more suitable for prediction is also important.

In humans, the accumulation of Aβ may initiate 10–20 years before the recognized cognitive decline^{1,2}. However, in model animals, behaviors and cognitive abilities were altered before the saturation of Aβ accumulation. For instance, 5xFAD mice showed a decline in memory function before 6 months of age, when Aβ was still in the accumulation process^{22,24,25}. These facts suggest that a probabilistic model sensitive to an earlier stage of Aβ accumulation is preferable to discover biomarkers in model animals. Furthermore, the gradient term of accumulated Aβ at the generation of biomarker data in the model might bestow such specificity to the proposed model when analyzing data that contain phase-specific biomarkers (Fig. 2f).

Whether Aβ acts upstream in the cascade leading to cell death, as in the amyloid hypothesis, is controversial^{26,27}. However, Aβ indeed accumulates in the initial stages of AD. Even if there is no causal relationship between the accumulation of Aβ and AD progression, Aβ may be a useful precursor for predicting AD. Notably, studies have reported alterations in the phenotypes of AD model mice that appear earlier than the onset of Aβ accumulation^{28}. Such lesions may allow for an earlier definition of the latent state of AD progression.

Machine learning approaches for identifying the latent progression states of AD have recently attracted attention. Probabilistic models, such as mixed effect^{29,30,31,32,33} and hidden Markov models^{34,35,36}, found the latent trajectories of disease progression from longitudinal data of clinical cohorts in unsupervised learning. Our approach differs from those of previous studies in that the proposed model assumes the quantitative accumulation level of Aβ as “the latent progression state,” estimates the state via a few-shot learning approach, and directly describes the relationship between Aβ levels and biomarkers. These characteristics should be advantageous in predicting Aβ levels from biomarkers, especially when the number of samples available for training is limited. Furthermore, Aβ may propagate from a brain region to other regions^{37,38}, which suggests that the spatial distribution of accumulation could be the hidden state. The molecular biological observation or the direct quantitative observation of Aβ is challenging in humans. Nevertheless, the proposed framework is potentially beneficial for discovering non-invasive convenient biomarkers^{8,9,39,40} that are relevant to the amount of PET-detected Aβ or CSF Aβ in human data.

In other neurodegenerative diseases, such as Parkinson’s disease, Lewy body dementia, multiple system atrophy, Huntington’s disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration, abnormal proteins accumulate in specific brain regions, possibly leading to neuronal death^{41,42,43}. The modeling approach presented herein is also potentially applicable to such neurodegenerative diseases. The risk of neurodegenerative diseases is a growing concern in an aging society. Predictability-based biomarker discovery using the proposed model may contribute to identifying biomarkers that enable the prediction of, and potential interventions for, these diseases.

## Methods

### Generative model of Aβ and biomarker candidates

Serial PET scans of humans used for the imaging of Aβ^{44} and in vivo imaging of Aβ plaques^{45} have demonstrated that the temporal progression of Aβ accumulation can be characterized by a sigmoid-shaped trajectory. Based on these previously reported findings, our model assumes that Aβ accumulation follows a logistic function:

\[{z}_{n}=\frac{{\alpha }_{n}}{1+\exp \left(-{\beta }_{n}{t}_{n}+{\gamma }_{n}\right)},\]

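where \({\theta }_{n}=\left\{{\alpha }_{n},{\beta }_{n},{\gamma }_{n}\right\}\) is a set of parameters, and \(n\) is the animal index. This equation can be rewritten from Eq. (1), where \({\gamma }_{n}={\beta }_{n}{\tau }_{n}\). Accordingly, the temporal derivative of the Aβ accumulation is

\[{z}_{n}^{{\prime} }=\frac{d{z}_{n}}{dt}=\frac{{\alpha }_{n}{\beta }_{n}\exp \left(-{\beta }_{n}{t}_{n}+{\gamma }_{n}\right)}{{\left(1+\exp \left(-{\beta }_{n}{t}_{n}+{\gamma }_{n}\right)\right)}^{2}}={\beta }_{n}{z}_{n}\left(1-\frac{{z}_{n}}{{\alpha }_{n}}\right).\]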

The parameter \({\theta }_{n}\) depends on the individual, following distributions as

where \({\mathscr{N}}{\mathscr{(}}{x|}\mu ,{\sigma }^{2})\) and \({{\mathscr{N}}}_{+}({x|}\mu ,{\sigma }^{2})\) indicate normal distribution with mean \(\mu\) and variance \({\sigma }^{2}\) and truncated normal distribution with a range of \(x\, > \,0\), respectively; \({{\bf{s}}}_{n}\) is a one-hot vector representing the type of the samples, i.e., WT as \({\left(\mathrm{1,0}\right)}^{{\rm{T}}}\) or AD model mice as \({\left(\mathrm{0,1}\right)}^{{\rm{T}}}\); \({\mu }_{\phi ,k}\) and \({{\sigma }^{2}}_{\phi ,{k}}\) (\(\phi \in \{\alpha ,\beta ,\gamma \}\)) indicate parameters of WT (\(k=1\)) or AD model mice (\(k=2\)).

The observed amount of Aβ, \({y}_{n}\), is generated from \({z}_{n}\) as

\[P\left({y}_{n}|{z}_{n}\right)={\mathscr{N}}\left({y}_{n}|{z}_{n},{\sigma }_{y}^{2}\right).\]

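The observed data on \(L\)-dimensional biomarker candidates \({{\bf{x}}}_{n}\in {{\mathbb{R}}}^{L}\) were generated from \({z}_{n}\) and \({z}_{n}^{{\prime} }\) as

\[P\left({{\bf{x}}}_{n}|{{\bf{z}}}_{n},W,{\Sigma }_{x}\right)={\mathscr{N}}\left({{\bf{x}}}_{n}|W{{\bf{z}}}_{n},{\Sigma }_{x}\right),\]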

where \({{\bf{z}}}_{n}={\left({z}_{n},{C}{{z\text{'}}}_{n},1\right)}^{{\rm{T}}}\), \(W\in {{\mathbb{R}}}^{L\times 3}\) indicates the weight matrix, and \({\Sigma }_{x}={\rm diag}({\sigma }_{{x}_{1}}^{2},{\sigma }_{{x}_{2}}^{2},\ldots ,{\sigma }_{{x}_{L}}^{2})\). \(C\) indicates a scaling factor common among samples that calibrates the range of \({z\text{'}}\).

### Prior distribution of parameters

For parameter estimation in a Bayesian manner, we introduced the prior distributions of parameters. The hyper-parameters of \(P\left({\alpha }_{n}\right)\), \(P\left({\beta }_{n}\right)\) and \(P\left({\gamma }_{n}\right)\) are sampled from the following distributions:

where \(\phi \in \{\alpha ,\beta ,\gamma \}\), and \({\rm{Gamma}}\left(\left.x\right|a,{b}\right)\) indicates a Gamma distribution with shape parameter \(a\) and rate parameter \(b\).

The prior distribution of \(W\) was

where \({{\bf{w}}}_{l}\) indicates the \(l\)-row of the weight matrix \(W\). The prior distribution of its hyper-parameter \({\sigma }_{{w}_{l}}^{2}\) was hierarchically introduced as

The prior distribution of \({\sigma }_{{x}_{l}}^{2}\) was

In the Markov chain Monte Carlo (MCMC) sampling algorithm, we used \({v}_{\alpha ,k}=0.1\), \({a}_{\alpha ,k}=25\), \({b}_{\alpha ,k}=0.5,{v}_{\beta ,k}=0.1\), \({a}_{\beta ,k}=100\), \({b}_{\beta ,k}=1\), \({v}_{\gamma ,k}=1\), \({a}_{\gamma ,k}=10\), \({b}_{\gamma ,k}=1\), \({a}_{w}=100\), \({b}_{w}=1000\), \({a}_{x}=0.5\), \({b}_{x}=1\) and \({\sigma }_{y}=0.05\).

When the model learned the distributions of the hyper-parameters from the Aβ observation (step 1), we set \({m}_{\alpha ,k}\), \({m}_{\beta ,k}\) and \({m}_{\gamma ,k}\) as the values estimated by the least-squares method.

### Bayesian inference of parameters

The model parameters were learned in two steps. In the first step (step 1), we inferred the posterior distributions of the parameters of a logistic function and those of hyper-parameters, given the data on Aβ accumulation as follows:

where \({\mu }_{\theta }=\{{\mu }_{\theta ,1},{\mu }_{\theta ,2}\}\), \({\mu }_{\theta ,k}=\{{\mu }_{\alpha ,k},{\mu }_{\beta ,k},{\mu }_{\gamma ,k}\}\), \({\sigma }_{\theta }^{2}=\{{\sigma }_{\theta ,1}^{2},{\sigma }_{\theta ,2}^{2}\}\), \({\sigma }_{\theta ,k}^{2}=\{{\sigma }_{\alpha ,k}^{2},{\sigma }_{\beta ,k}^{2},{\sigma }_{\gamma ,k}^{2}\}\), and \({{\mathcal{S}}}_{y}\) is the set of samples with Aβ accumulation in the training data, and \({N}_{y}\) is the number of samples in \({{\mathcal{S}}}_{y}\). We assumed that the observed level of Aβ at time \(t=0\) in all the samples would be \({y}_{n}=0\). This posterior distribution was estimated using the MCMC sampling algorithms. No-U-Turn samplers (NUTS) were used in this step because it is difficult to derive the closed form of the posterior distribution. From the posterior samples, we estimated the parameters of the Gaussian and Gamma distributions for \({\mu }_{\theta ,k}\) and \({\sigma }_{\theta ,k}^{2}\), respectively; these distributions with the estimated parameters were used as priors for the hyper-parameters. In the first step, MCMC sampling was performed across three independent chains, where 3000 samples were drawn for burn-in, and another 3000 were drawn to estimate the distribution.
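One simple way to summarize posterior samples as Gaussian and Gamma distributions is moment matching, sketched below. The source does not state which estimator was used, so moment matching is an assumption here, and the function names and sample values are hypothetical.

```python
import statistics

def fit_gaussian(samples):
    """Moment-matched Gaussian: returns (mean, variance)."""
    return statistics.fmean(samples), statistics.pvariance(samples)

def fit_gamma(samples):
    """Moment-matched Gamma(shape a, rate b): a = mean^2 / var, b = mean / var."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples)
    return mean * mean / var, mean / var

# Made-up posterior draws, taken after discarding burn-in.
mu_draws = [0.9, 1.0, 1.1]
sigma2_draws = [1.0, 2.0, 3.0]
mu_prior = fit_gaussian(mu_draws)                  # prior for mu in step 2
sigma2_prior = fit_gamma(sigma2_draws)             # prior for sigma^2 in step 2
```

A Gamma distribution with shape \(a\) and rate \(b\) has mean \(a/b\) and variance \(a/{b}^{2}\), which gives the two formulas in `fit_gamma`.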

In the second step (step 2), we inferred the posterior distributions of all parameters in the model, given the unpaired data on biomarker candidates and the paired data on Aβ accumulation and biomarker candidates. We used the NUTS-within-Gibbs approach for the inference. The weight matrix \(W\), the variance of the weight matrix \({S}_{W}=\{{\sigma }_{{W}_{1}}^{2},{\sigma }_{{W}_{2}}^{2},\ldots ,{\sigma }_{{W}_{L}}^{2}\}\), and the variance of the observation noise \({\sigma }_{x}\) were sampled using Gibbs sampling as follows:

The Gibbs sampler for weight matrix \(W\!:\)

where \({{\mathcal{S}}}_{X}\) is a set of samples with at least biomarker candidates in the training data, \({N}_{X}\) is the number of samples in \({{\mathcal{S}}}_{X}\), and

The Gibbs sampler for observation noise of biomarker candidates \({S}_{x}\!:\)

where

The Gibbs sampler for the variance of the coefficient matrix \({S}_{W}\!:\)

where

The posterior distribution of other parameters was sampled by MCMC using NUTS as

where \({{\mathcal{S}}}_{{XY}}\) and \({{\mathcal{S}}}_{X}\) are sets of samples with paired data on Aβ accumulation and biomarker candidates and unpaired data on biomarker candidates, respectively. We adopted the distributions estimated in Step 1 as \(P\left({\mu }_{\theta }\right)\) and \(P({\sigma }_{\theta }^{2})\). In the second step, MCMC sampling was performed across three independent chains, where 5000 samples were drawn for burn-in, and another 5000 were drawn to estimate the distribution. The inference program was implemented in Python using the NumPyro framework.

### Prediction of Aβ accumulation from biomarkers

In the prediction of Aβ accumulation in the test data (step 3), we computed a conditional posterior predictive distribution of \({y}^{* }\) given \({{\bf{x}}}^{* }\) using the following equation:

where \(D\) is the learned training data, \({W}^{\left(i\right)}\), \({\Sigma }_{x}^{(i)}\), \({{\rm{S}}}_{W}^{(i)}\), and \({\theta }^{(i)}\) are the posterior samples of the parameters, \(I\) is the number of posterior samples, and \({N}_{t}\) is the number of time points considered in the prediction \(t=\{2,3,\ldots ,18\}\).
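The idea of averaging a prediction over \(I\) posterior samples and \({N}_{t}\) candidate time points can be sketched numerically. Everything model-specific below is an assumption made for illustration and is not taken from the text: the logistic accumulation curve, the linear map from accumulation level to features, the uniform prior over time points, and the fabricated "posterior samples". Only the time grid \(t=\{2,3,\ldots ,18\}\) comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_post = 200                  # number of posterior samples (hypothetical)
t_grid = np.arange(2, 19)     # t = 2, ..., 18, so N_t = 17
w_mean = np.array([1.0, -0.5, 0.8])  # hypothetical feature weights

# Fabricated posterior samples, for illustration only.
alpha = rng.normal(1.0, 0.05, size=n_post)   # plateau of the curve
beta = rng.normal(0.6, 0.05, size=n_post)    # growth rate
gamma = rng.normal(8.0, 0.5, size=n_post)    # midpoint (months)
w = rng.normal(w_mean, 0.05, size=(n_post, 3))
sigma_x = rng.uniform(0.25, 0.35, size=n_post)

def curve(t, a, b, g):
    """Hypothetical logistic accumulation curve."""
    return a / (1.0 + np.exp(-b * (t - g)))

def predict_mean(x_star):
    """Posterior predictive mean of y* given features x*."""
    # Accumulation level for every (sample, time point): shape (n_post, N_t).
    y_it = curve(t_grid[None, :], alpha[:, None], beta[:, None], gamma[:, None])
    # Expected features under the assumed linear map: (n_post, N_t, 3).
    x_mean = y_it[:, :, None] * w[:, None, :]
    resid = x_star[None, None, :] - x_mean
    # Gaussian log-likelihood up to terms that do not depend on t.
    log_lik = -0.5 * np.sum(resid**2, axis=2) / sigma_x[:, None] ** 2
    log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
    p_t = np.exp(log_lik)
    p_t /= p_t.sum(axis=1, keepdims=True)  # p(t | x*, sample), uniform prior on t
    # Average over time points within each sample, then over samples.
    return float(np.mean(np.sum(p_t * y_it, axis=1)))

x_low = curve(2.0, 1.0, 0.6, 8.0) * w_mean    # noise-free features at t = 2
x_high = curve(12.0, 1.0, 0.6, 8.0) * w_mean  # noise-free features at t = 12
y_low, y_high = predict_mean(x_low), predict_mean(x_high)
print(f"predicted level: early={y_low:.3f}, late={y_high:.3f}")
```

Features generated late on the assumed curve yield a higher predicted accumulation level than features generated early, which is the qualitative behavior such a predictive average is meant to capture.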

Similarly, in the mouse type prediction in the test data, we computed a conditional posterior predictive distribution of \({\bf s}^{* }\) given \({{\bf{x}}}^{* }\) using the following equation:

### Behavioral experiments with 5xFAD mice

We used the dataset previously described in Forner et al.^{22}, obtained from a public repository (AD Knowledge Portal; https://adknowledgeportal.synapse.org/). Eleven features from three experiments were analyzed using our proposed model. In the open-field experiment, the velocity and the time ratio in the center (the time in the center divided by the time in the arena) were used in the analysis. In the elevated plus maze experiment, the cumulative time a mouse spent in the open arm, closed arm, and center area of the maze was used in the analysis. In the contextual fear conditioning experiment, the activity level, inactive freezing frequency, and cumulative duration of inactive freezing were monitored for each mouse during 2-min habituation and exploration in a chamber. Subsequently, an electrical shock was applied to the mouse. After 24 h, the same behavioral features were monitored for 5 min in the chamber.

### Preprocessing

The 5xFAD mouse behavioral data were preprocessed before analysis. Behavioral features were standardized such that the mean and standard deviation of each feature were 0 and 1.0, respectively. The observed amount of Aβ was scaled so that the maximum observed value for 12-month-old mice equaled 1.0. Based on the assumption that insoluble Aβ in the brain of WT mice remains undetectable throughout their lives, we virtually generated unpaired-Aβ-observation WT samples at 8, 12, and 18 months of age, where the observed amount of Aβ at each time point was 0.0. One 5xFAD mouse (individual ID = 572) was excluded from the paired-data samples because the measurement of insoluble Aβ in the sample may have failed.
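The three preprocessing operations can be sketched with NumPy as follows. The function names, the dictionary layout of the virtual samples, and the toy numbers are hypothetical; the use of the population standard deviation (`ddof=0`) is also an assumption, since the text does not specify it.

```python
import numpy as np

def standardize(features):
    """Scale each feature column to mean 0 and standard deviation 1."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / features.std(axis=0)

def scale_abeta(abeta, age_months, reference_age=12):
    """Divide by the maximum value observed in mice of the reference age."""
    abeta = np.asarray(abeta, dtype=float)
    age_months = np.asarray(age_months)
    return abeta / abeta[age_months == reference_age].max()

def virtual_wt_samples(ages=(8, 12, 18)):
    """Virtual WT observations with an Abeta amount of 0.0 at each age."""
    return [{"genotype": "WT", "age_months": age, "abeta": 0.0} for age in ages]

# Toy data, made up for illustration.
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
abeta = [0.0, 2.0, 4.0, 5.0]
ages = [4, 12, 12, 18]

z = standardize(features)
scaled = scale_abeta(abeta, ages)
wt = virtual_wt_samples()
print(z.mean(axis=0), z.std(axis=0))
print(scaled)
```

Because the scaling reference is the 12-month maximum, values observed at other ages may exceed 1.0 after scaling, as the last toy value does.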

### Comparison with linear regression and a random forest regressor

To compare the prediction performance of the proposed method fairly with that of conventional machine learning methods on both WT and 5xFAD mice, we prepared supervised samples of both mouse types. We randomly selected 18 WT mice, all of which had only unpaired behavioral data, and treated them as paired-data samples by assigning each an observed Aβ level of zero. The standard machine learning methods were implemented using the scikit-learn module in Python. The number of trees in the random forest regressor was set to 100.
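A baseline setup of this kind might look like the following scikit-learn sketch. The data are synthetic stand-ins: the 18 WT samples with an Aβ level of zero and the 11 features follow the description in this section, whereas the number of 5xFAD samples, their feature values, and their Aβ levels are invented for illustration. The `random_state` argument is likewise an assumption added for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_features = 11  # eleven behavioral features

# Synthetic stand-ins for standardized behavioral features.
X_wt = rng.normal(0.0, 1.0, size=(18, n_features))   # 18 selected WT mice
X_fad = rng.normal(0.5, 1.0, size=(18, n_features))  # hypothetical 5xFAD mice
X = np.vstack([X_wt, X_fad])

# WT mice receive an Abeta level of zero; the 5xFAD levels are made up.
y = np.concatenate([np.zeros(18), rng.uniform(0.2, 1.0, size=18)])

linear = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(linear.predict(X[:3]).shape, forest.predict(X[:3]).shape)
```

Both baselines need an Aβ value for every training sample, which is why the WT mice must be given zero-valued labels before they can be included.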

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

Datasets for the current study were obtained from the AD Knowledge Portal (https://adknowledgeportal.synapse.org/).

## Code availability

All code used in the current study is available in the GitHub repository (https://github.com/yadapo/PredictionAbeta).

## References

1. Long, J. M. & Holtzman, D. M. Alzheimer disease: an update on pathobiology and treatment strategies. *Cell* **179**, 312–339 (2019).
2. Bateman, R. J. et al. Clinical and biomarker changes in dominantly inherited Alzheimer’s disease. *N. Engl. J. Med.* **367**, 795–804 (2012).
3. Bekris, L. M., Yu, C.-E., Bird, T. D. & Tsuang, D. W. Genetics of Alzheimer disease. *J. Geriatr. Psychiatry Neurol.* **23**, 213–227 (2010).
4. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. *Nat. Genet.* **45**, 1452–1458 (2013).
5. Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. *Nat. Genet.* **51**, 414–430 (2019).
6. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. *Nat. Genet.* **51**, 404–413 (2019).
7. Kondo, T. et al. Dissection of the polygenic architecture of neuronal Aβ production using a large sample of individual iPSC lines derived from Alzheimer’s disease patients. *Nat. Aging* **2**, 125–139 (2022).
8. Karikari, T. K. et al. Blood phosphorylated tau 181 as a biomarker for Alzheimer’s disease: a diagnostic performance and prediction modelling study using data from four prospective cohorts. *Lancet Neurol.* **19**, 422–433 (2020).
9. Schindler, S. E. et al. High-precision plasma β-amyloid 42/40 predicts current and future brain amyloidosis. *Neurology* **93**, e1647–e1659 (2019).
10. Eyigoz, E., Mathur, S., Santamaria, M., Cecchi, G. & Naylor, M. Linguistic markers predict onset of Alzheimer’s disease. *EClinicalMedicine* **28**, 100583 (2020).
11. Vitek, M. P. et al. Translational animal models for Alzheimer’s disease: an Alzheimer’s association business consortium think tank. *Alzheimers Dement.* **6**, e12114 (2020).
12. Sabbagh, J. J., Kinney, J. W. & Cummings, J. L. Alzheimer’s disease biomarkers: correspondence between human studies and animal models. *Neurobiol. Dis.* **56**, 116–130 (2013).
13. Sasaguri, H. et al. APP mouse models for Alzheimer’s disease preclinical studies. *EMBO J.* **36**, 2473–2487 (2017).
14. Jack, C. R. Jr et al. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. *Lancet Neurol.* **12**, 207–216 (2013).
15. Collij, L. E. et al. Spatial-temporal patterns of β-amyloid accumulation. *Neurology* **98**, e1692–e1703 (2022).
16. Collij, L. E. et al. Multitracer model for staging cortical amyloid deposition using PET imaging. *Neurology* **95**, e1538–e1553 (2020).
17. Grothe, M. J. et al. In vivo staging of regional amyloid deposition. *Neurology* **89**, 2031–2038 (2017).
18. Palmqvist, S. et al. Detailed comparison of amyloid PET and CSF biomarkers for identifying early Alzheimer disease. *Neurology* **85**, 1240–1249 (2015).
19. Caminiti, S. P. et al. FDG-PET and CSF biomarker accuracy in prediction of conversion to different dementias in a large multicentre MCI cohort. *NeuroImage: Clin.* **18**, 167–177 (2018).
20. Therriault, J. et al. Biomarker modeling of Alzheimer’s disease using PET-based Braak staging. *Nat. Aging* 1–10 https://doi.org/10.1038/s43587-022-00204-0 (2022).
21. Milà-Alomà, M. et al. Plasma p-tau231 and p-tau217 as state markers of amyloid-β pathology in preclinical Alzheimer’s disease. *Nat. Med.* **28**, 1797–1801 (2022).
22. Forner, S. et al. Systematic phenotyping and characterization of the 5xFAD mouse model of Alzheimer’s disease. *Sci. Data* **8**, 270 (2021).
23. Xu, G. et al. Murine Aβ over-production produces diffuse and compact Alzheimer-type amyloid deposits. *Acta Neuropathol. Commun.* **3**, 72 (2015).
24. Jawhar, S., Trawicka, A., Jenneckens, C., Bayer, T. A. & Wirths, O. Motor deficits, neuron loss, and reduced anxiety coinciding with axonal degeneration and intraneuronal Aβ aggregation in the 5XFAD mouse model of Alzheimer’s disease. *Neurobiol. Aging* **33**, 196.e29–40 (2012).
25. Richard, B. C. et al. Gene dosage dependent aggravation of the neurological phenotype in the 5XFAD mouse model of Alzheimer’s disease. *J. Alzheimers Dis.* **45**, 1223–1236 (2015).
26. Holmes, C. et al. Long-term effects of Aβ_{42} immunisation in Alzheimer’s disease: follow-up of a randomised, placebo-controlled phase I trial. *Lancet* **372**, 216–223 (2008).
27. Dickson, D. W. et al. Identification of normal and pathological aging in prospectively studied nondemented elderly humans. *Neurobiol. Aging* **13**, 179–189 (1992).
28. Jacobsen, J. S. et al. Early-onset behavioral and synaptic deficits in a mouse model of Alzheimer’s disease. *Proc. Natl. Acad. Sci. USA* **103**, 5161–5166 (2006).
29. Wang, Z. et al. AD risk score for the early phases of disease based on unsupervised machine learning. *Alzheimers Dement.* **16**, 1524–1533 (2020).
30. Raket, L. L. Statistical disease progression modeling in Alzheimer disease. *Front. Big Data* **3**, 24 (2020).
31. Platero, C. Categorical predictive and disease progression modeling in the early stage of Alzheimer’s disease. *J. Neurosci. Methods* **374**, 109581 (2022).
32. Bilgel, M., Prince, J. L., Wong, D. F., Resnick, S. M. & Jedynak, B. M. A multivariate nonlinear mixed effects model for longitudinal image analysis: application to amyloid imaging. *Neuroimage* **134**, 658–670 (2016).
33. Ishida, T. et al. A novel method to estimate long-term chronological changes from fragmented observations in disease progression. *Clin. Pharmacol. Ther.* **105**, 436–447 (2019).
34. Sukkar, R., Katz, E., Zhang, Y., Raunig, D. & Wyman, B. T. Disease progression modeling using Hidden Markov Models. *Conf. Proc. IEEE Eng. Med. Biol. Soc.* **2012**, 2845–2848 (2012).
35. Williams, J. P., Storlie, C. B., Therneau, T. M., Clifford, R. J. Jr & Hannig, J. A bayesian approach to multistate hidden markov models: application to dementia progression. *J. Am. Stat. Assoc.* **115**, 16–31 (2020).
36. Liu, Y.-Y., Li, S., Li, F., Song, L. & Rehg, J. M. Efficient learning of continuous-time hidden markov models for disease progression. *Adv. Neural Inf. Process. Syst.* **28**, 3599–3607 (2015).
37. Jaunmuktane, Z. et al. Evidence for human transmission of amyloid-β pathology and cerebral amyloid angiopathy. *Nature* **525**, 247–250 (2015).
38. Lam, S. et al. Transmission of amyloid-beta and tau pathologies is associated with cognitive impairments in a primate. *Acta Neuropathol. Commun.* **9**, 165 (2021).
39. Hampel, H. et al. Blood-based biomarkers for Alzheimer disease: mapping the road to the clinic. *Nat. Rev. Neurol.* **14**, 639–652 (2018).
40. Leuzy, A. et al. Blood-based biomarkers for Alzheimer’s disease. *EMBO Mol. Med.* **14**, e14408 (2022).
41. Soto, C. & Estrada, L. D. Protein Misfolding and Neurodegeneration. *Arch. Neurol.* **65**, 184–189 (2008).
42. Shastry, B. S. Neurodegenerative disorders of protein aggregation. *Neurochem. Int.* **43**, 1–7 (2003).
43. Spires-Jones, T. L., Attems, J. & Thal, D. R. Interactions of pathological proteins in neurodegenerative diseases. *Acta Neuropathol.* **134**, 187–205 (2017).
44. Jack, C. R. Jr et al. Brain β-amyloid load approaches a plateau. *Neurology* **80**, 890–896 (2013).
45. Burgold, S., Filser, S., Dorostkar, M. M., Schmidt, B. & Herms, J. In vivo imaging reveals sigmoidal growth kinetic of β-amyloid plaques. *Acta Neuropathol. Commun.* **2**, 30 (2014).

## Acknowledgements

This work was partly supported by the Moonshot R&D–MILLENNIA Program (grant number JPMJMS2024-9) by JST. The results published here are in part based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.synapse.org/). The IU/JAX/UCI MODEL-AD Center was established with funding from The National Institute on Aging (U54 AG054345-01 and AG054349). Aging studies are also supported by the Nathan Shock Center of Excellence in the Basic Biology of Aging (NIH P30 AG0380770).

## Author information

### Contributions

Y.Y. and H.N. conceived the project and developed the method. Y.Y. implemented the software and analyzed data. Y.Y. and H.N. wrote the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Yada, Y., Naoki, H. Few-shot prediction of amyloid β accumulation from mainly unpaired data on biomarker candidates.
*npj Syst Biol Appl* **9**, 59 (2023). https://doi.org/10.1038/s41540-023-00321-5
