A remote digital memory composite to detect cognitive impairment in memory clinic samples in unsupervised settings using mobile devices

Remote monitoring of cognition holds promise for facilitating case-finding in clinical care and the individual detection of cognitive impairment in clinical and research settings. In the context of Alzheimer’s disease, this is particularly relevant for patients who seek medical advice due to memory problems. Here, we develop a remote digital memory composite (RDMC) score from an unsupervised remote cognitive assessment battery focused on episodic memory and long-term recall, and we assess its construct validity, retest reliability and diagnostic accuracy for predicting MCI-grade impairment in a memory clinic sample and healthy controls. A total of 199 participants were recruited from three cohorts and included as healthy controls (n = 97), individuals with subjective cognitive decline (n = 59) or patients with mild cognitive impairment (n = 43). Participants performed cognitive assessments in a fully remote and unsupervised setting via a smartphone app. The derived RDMC score is significantly correlated with the PACC5 score across participants and demonstrates good retest reliability. Diagnostic accuracy for discriminating memory impairment from no impairment is high (cross-validated AUC = 0.83, 95% CI [0.66, 0.99]), with a sensitivity of 0.82 and a specificity of 0.72. Thus, unsupervised remote cognitive assessments implemented in the neotiv digital platform show good discrimination between cognitively impaired and unimpaired individuals, further demonstrating that it is feasible to complement the neuropsychological assessment of episodic memory with unsupervised and remote assessments on mobile devices. This contributes to recent efforts to implement remote assessment of episodic memory for case-finding and monitoring in large research studies and clinical care.


Background
Differentiating mild cognitive impairment (MCI) from subjective cognitive impairment is important for providing a prognosis regarding future cognitive decline and regarding potential eligibility for treatments at the MCI stage of Alzheimer's disease (AD). However, differentiating MCI from subjective cognitive impairment with brief cognitive tests remains very challenging (Petrazzuoli et al., 2020). In more than 80% of older adults who seek medical advice because of memory complaints and are later found to have an Alzheimer's biomarker profile, the impairment follows an amnestic variant in which episodic memory is a major affected component (Xie et al., 2014). Indeed, episodic memory, the ability to recall spatial and temporal relationships of personally experienced events (Tulving, 2002), is a key component of the neuropsychological assessment of individuals with suspected AD (Costa et al., 2017). Not surprisingly, episodic recall is an important element of the Preclinical Alzheimer Cognitive Composite (PACC5) (Donohue et al., 2014; Papp et al., 2017).
The aim of the PACC5 is to provide a comprehensive assessment of AD-relevant cognitive impairment and to serve as a tool with validated sensitivity to cognitive decline over time (Donohue et al., 2014; Papp et al., 2017). The PACC5 assessment is time-consuming and requires supervision by a trained neuropsychologist (Donohue et al., 2014). This severely restricts its utility and implementation in primary care, especially when considering equal access to PACC5-like assessments in rural areas, as well as for high-frequency monitoring of cognitive function in clinical trials and research studies. In general, the long test duration and the need for specialized supervision make high-frequency longitudinal use of established neuropsychological assessments practically impossible. There is thus a strong need for unsupervised, remote, high-frequency cognitive assessments that can provide a meaningful approximation of PACC5-like composite scores.
Given that the PACC5 draws heavily on episodic memory measures (WMS-R Logical Memory Delayed Recall and the Free and Cued Selective Reminding Test), implementing a mobile and remote proxy for a neuropsychological assessment such as the PACC5 also offers the opportunity to overcome some of the shortcomings of neuropsychological tests. One potential disadvantage of established neuropsychological assessments of episodic memory, for example, is that they heavily tax verbal abilities, which makes it difficult to assess episodic memory in multilingual settings or when verbal abilities are already impaired (Costa et al., …) … insights into the functional architecture of episodic memory and the spread of AD pathology. Recent work on the functional neuroanatomy of episodic memory showed that episodic memory involves a network including medial temporal, midline parietal and cortical regions, each of which serves different functions and is affected at different stages of AD (Grothe et al., 2017). Episodic memory requires pattern separation processes, which are mediated by the dentate gyrus (Bakker et al., 2008; Berron et al., 2016) and reduce interference between similar events, and pattern completion processes, which are mediated by hippocampal Cornu Ammonis 3 (CA3) and enable the recollection of details from a past event in interplay with neocortical regions (Grande et al., 2019). The medial temporal lobe regions provide information to the hippocampus mainly through the entorhinal cortex. The entorhinal cortex, in turn, receives partly domain-segregated information: object representations are transferred via the perirhinal cortex and the anterior-lateral entorhinal subdivision, and scene representations via the parahippocampal cortex and posterior-medial parts of the entorhinal cortex (Maass et al., 2015, 2019; Schröder et al., 2015).
Taken together, there is converging evidence that, in addition to long-term recall, short-term mnemonic discrimination of object and scene representations is impaired in the predementia stages of AD (Grande et al., 2021). Besides pattern separation and completion, a third aspect of episodic memory is recognition memory (Düzel et al., 2011). Although the neurobiology of recognition memory is complex and it likely has a non-episodic, familiarity-based component (Düzel et al., 1999, 2001; Horner et al., 2012), it is evident that medial temporal lobe dysfunction can impair recognition memory alongside recall (Horner et al., 2012).
A set of anatomically-informed and non-verbal tasks for episodic memory that incorporate these recent insights into the functional anatomy of episodic memory is available on the neotiv digital platform (https://www.neotiv.com/en) and has been implemented in prospective cohort studies of the German Center for Neurodegenerative Diseases (DZNE).
There are three different memory tests: first, a short-term mnemonic discrimination test tapping into pattern separation, implemented separately for object and scene stimuli; second, a short- and long-term cued-recall test of object-scene associations tapping into pattern completion; and third, a long-term photographic scene recognition memory test.
Here we evaluate these three memory measures in a remote and unsupervised fashion using mobile devices. To that end, we develop a Remote Digital Memory Composite score and assess its construct validity against in-clinic PACC5 testing as well as its retest reliability across independent test sessions. Finally, we assess the diagnostic accuracy of the Remote Digital Memory Composite score when differentiating between individuals with and without PACC5-based cognitive impairment in a memory clinic sample.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. This version posted November 14, 2021. doi: https://doi.org/10.1101/2021.11.12.21266226

DELCODE study design
DELCODE is an observational longitudinal memory clinic-based multicenter study in Germany.
The detailed study design of DELCODE is reported in . In total, 1079 individuals aged 60 years or older were enrolled in the study between April 2014 and August 2018. Participants were included as individuals with subjective cognitive decline (SCD; n = 445) if they presented to a memory clinic with a complaint of cognitive decline, performed better than -1.5 standard deviations (SD) of the age-, sex- and education-adjusted normal range on all subtests of the neuropsychological test battery of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD), and fulfilled the SCD research criteria (Jessen et al., 2014; Molinuevo et al., 2016). Participants with amnestic MCI (MCI; n = 190) and mild dementia of the Alzheimer's type (DAT; Mini-Mental State Examination, MMSE, ≥ 18 points; n = 126) were enrolled based on the memory clinic's diagnosis, which was guided by the current research criteria for MCI and DAT of the National Institute on Aging and Alzheimer's Association (NIA-AA) (Albert et al., 2011; McKhann et al., 2011). First-degree relatives of individuals with DAT were recruited by advertisement (REL; n = 82). DAT in the relatives had to be confirmed by medical documentation. Healthy control participants (HC; n = 236) were also recruited by advertisement, which explicitly addressed individuals who felt no relevant cognitive impairment. Ten university-based memory centers, all collaborating with local DZNE sites, participate in the study. All local institutional review boards (IRB) and ethical committees approved the study protocol.

Remote mobile monitoring add-on study
The remote mobile monitoring add-on study started in 2019 after separate approval by the IRBs and ethical committees of each participating site. All DELCODE participants except patients with DAT were eligible if they owned a smartphone or tablet with internet access that was technically suitable for installing the mobile app and that they could operate on their own. Seven DELCODE sites successfully recruited 77 participants into the remote mobile monitoring add-on study. In addition, one memory clinic associated with the DZNE, the memory clinic of the Department of Neurology and the Institute of Cognitive Neurology and Dementia Research at the Medical Faculty of the University Hospital of the Otto-von-Guericke University, recruited 25 memory complainers who had been referred by general practitioners (GPs) because of memory complaints. The PACC5 was conducted according to the same Standard Operating Procedures in all participating memory clinics throughout the study. DELCODE participants at their regularly scheduled annual follow-up visit, and memory clinic patients during their in-clinic visit, were asked whether they would like to participate in the add-on study and perform one remote cognitive test every two weeks on their smartphone for 1.5 years. If they agreed, study personnel helped them install the app from the respective app store on their own mobile device (smartphone or tablet computer), but participants received no further verbal instructions apart from a printed manual.
The Object-in-Room Recall test (ORR), the Mnemonic Discrimination Test for Objects and Scenes (MDT-OS) and the Complex Scene Recognition Test (CSR) were completed remotely and unsupervised by participants on their own mobile devices. Participants were asked to complete a memory assessment every two weeks, each consisting of a two-phase session separated by a short delay. The two phases were either the two halves of the mnemonic discrimination test, or the encoding and retrieval phases of complex scene recognition and object-in-room recall (see task details below). Each phase took around 10 minutes. The three paradigms alternated over the weeks in the following order: CSR, ORR, MDT-OS.
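The biweekly rotation can be illustrated with a short Python sketch. This is not the neotiv scheduling logic: the helper names `build_schedule` and `weeks_until_composite` are hypothetical, postponements are ignored, and it assumes that session i falls at the end of the i-th two-week slot. It also encodes the ORR delay alternation described with the ORR test below (odd-numbered ORR sessions with 30-minute delays, even-numbered ones with 24-hour delays).

```python
# Hypothetical sketch of the biweekly task rotation described in the text.
TASK_ROTATION = ("CSR", "ORR", "MDT-OS")  # order stated in the text
WEEKS_PER_SESSION = 2  # one remote test every two weeks


def build_schedule(n_sessions):
    """Return one dict per session with its nominal week, task and ORR delay.

    Assumption: session i is due at week 2 * i; participants could postpone
    sessions in the real study, which this sketch ignores.
    """
    schedule = []
    orr_count = 0
    for i in range(1, n_sessions + 1):
        task = TASK_ROTATION[(i - 1) % len(TASK_ROTATION)]
        orr_delay_min = None
        if task == "ORR":
            orr_count += 1
            # Odd-numbered ORR sessions: 30 min; even-numbered: 24 h.
            orr_delay_min = 30 if orr_count % 2 == 1 else 24 * 60
        schedule.append({"session": i, "week": i * WEEKS_PER_SESSION,
                         "task": task, "orr_delay_min": orr_delay_min})
    return schedule


def weeks_until_composite(k):
    """Nominal week at which the k-th full set of all three tasks is done."""
    return k * len(TASK_ROTATION) * WEEKS_PER_SESSION


if __name__ == "__main__":
    for entry in build_schedule(9):
        print(entry)
    print(weeks_until_composite(1), weeks_until_composite(2))
```

Under this assumption, the first full set of the three tasks is nominally complete at week 6 and the second at week 12, and the first and third ORR sessions both use the 30-minute delay, which is why that pair can serve the reliability analysis.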
Note that we only present the results of the first test session of each task (the second session was used for reliability measures). Tests were initiated remotely every two weeks via push notifications sent at the same time of day as the registration, but participants could postpone test sessions. This approach was chosen to avoid pressuring participants into taking the test under suboptimal conditions such as distraction, fatigue or temporary illness. Daily reminders were sent via push notifications until the respective task was completed, and the actual time of testing was recorded. Before each test session, the app reminded participants to perform the test in a quiet environment, to put on their glasses if needed and to ensure that their screen was bright enough to see the pictures clearly. Each session also began with a short practice session.
After each test session, participants were asked within the app whether they had been distracted by things happening around them during the session (yes/no) and to rate their concentration level and subjective performance (1 = very bad, 2 = bad, 3 = middling, 4 = good and 5 = very good). Hence, participants received the instructions for the cognitive tests remotely and performed the tests fully unsupervised.

Clinical and neuropsychological assessments
The annual neuropsychological testing in DELCODE included the PACC5 (Papp et al., 2017) and other assessments reported in full in . The PACC5 z-score was calculated as the mean performance z-score across the MMSE (Folstein et al., 1975), a 30-item composite screening test; the WMS-R Logical Memory Delayed Recall (Wechsler and Stone, 1987), a test of delayed (30 min) story recall; the Digit-Symbol Coding Test (DSCT; 0-93) (Wechsler, 1981), a test of memory, executive function and processing speed; the Free and Cued Selective Reminding Test-Free Total Recall (FCSRT96; 0-96) (Grober et al., 2008), a test of free and cued recall of newly learned associations; and the Category Fluency Test, a test of semantic memory and executive function. The z-scores for the PACC5 in our analysis were derived using the mean and standard deviation of the healthy controls, participants with SCD and relatives of patients with dementia in the entire DELCODE study. A PACC5 composite score was calculated when at least three of its five components were available, provided that these included the MMSE, at least one memory test and either category fluency or the DSCT (of the 102 participants, 90 provided all five components, eight provided four and four provided three).
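The availability rule and z-scoring above can be sketched in Python as follows. The component keys (`LM_delayed`, `FCSRT96`, `category_fluency`, ...) and the reference means and SDs are hypothetical placeholders, not DELCODE values; the sketch assumes, as holds for these five tests, that higher raw scores mean better performance, so no sign flips are needed.

```python
from statistics import mean

# Hypothetical labels for the five PACC5 components.
MEMORY_TESTS = {"LM_delayed", "FCSRT96"}
FLUENCY_OR_DSCT = {"category_fluency", "DSCT"}


def pacc5_composite(raw, reference):
    """Mean z-score over available PACC5 components, or None if the
    availability rule described in the text is not met.

    raw: component -> raw score (None or absent if missing)
    reference: component -> (mean, sd) of the reference group
    """
    available = {k: v for k, v in raw.items() if v is not None}
    keys = set(available)
    # At least three components, always including the MMSE ...
    if len(keys) < 3 or "MMSE" not in keys:
        return None
    # ... plus one memory test and either category fluency or the DSCT.
    if not keys & MEMORY_TESTS or not keys & FLUENCY_OR_DSCT:
        return None
    z_scores = [(available[k] - reference[k][0]) / reference[k][1] for k in keys]
    return mean(z_scores)


# Placeholder reference values (mean, SD), for illustration only.
REFERENCE = {
    "MMSE": (28.5, 1.5),
    "LM_delayed": (12.0, 4.0),
    "DSCT": (45.0, 10.0),
    "FCSRT96": (75.0, 10.0),
    "category_fluency": (22.0, 5.0),
}

if __name__ == "__main__":
    print(pacc5_composite({"MMSE": 27, "LM_delayed": 8, "category_fluency": 17}, REFERENCE))
```

A participant missing the MMSE, or having no memory test, receives no composite, mirroring the inclusion rule; in the study the reference group was the HC, SCD and REL participants of the full DELCODE sample.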
In the DELCODE cohort, the clinical labels (HC, REL, SCD, MCI) were established at each participant's baseline assessment. The closest-in-time PACC5 assessment therefore provided a more accurate and up-to-date measure of each participant's cognitive impairment at the time of the Remote Digital Memory assessment (the mean interval between baseline assessment and app-based testing was 1.2 years, whereas the mean interval between the closest-in-time PACC5 visit and app-based testing was only 0.7 years). Furthermore, the PACC5 is a composite of widely used and well-established cognitive tests, so our findings should generalize better than they would with a clinical classification based on a single neuropsychological test. In the DELCODE cohort, clinical assessments also included the Clinical Dementia Rating (CDR).
Figure 1A shows the outline of the MDT-OS test (Güsten et al., 2021; Maass et al., 2019). In this test, participants are presented with 3D-rendered, computer-generated objects and scenes that are repeated either identically or in slightly modified versions. Participants decide whether a repeated presentation shows the original picture or a modified version. They respond either by tapping a button (for an exact repetition) or by tapping the location of the change (for a modified version). They see 32 object pairs and 32 scene pairs, half of which are exact repetitions and half modified versions. One session was split into two phases that were completed on two consecutive days, 24 hours apart. The first phase was presented as a one-back task and the second as a two-back task. The test provides a hit rate, a false alarm rate and a corrected hit rate for both the object and the scene condition. The corrected hit rate for the scene condition is used for the Remote Digital Memory Composite.
Figure 1B shows the outline of the ORR test (for a discussion of the principles of pattern completion on which this test is based, see Grande et al., 2019). In this test, participants are presented with 3D-rendered, computer-generated rooms in which two 3D-rendered objects are placed. In an immediate recall test, participants recall which object was placed at a specific location, cued by a colored circle in the empty room. They indicate their choice by tapping one of three objects displayed below the empty room: the correct object for that location, the object that was also present in the room but at a different location (correct source distractor), or a completely unrelated object (incorrect source distractor). They learn 25 such object-scene associations. After a delay of either 30 minutes or 24 hours, the same recall test is repeated. In the ORR test, recall is graded, which allows correct episodic recall to be separated from incorrect source memory. Correct recall thus excludes both the choice of an object that was present in the same room but at a different location (wrong source memory for the specific location) and the choice of an object that was not present in the room but was nevertheless associated with the room's objects during encoding (wrong source memory for the overall location). The test provides several outcome measures.

Object-in-Room Recall Test (ORR)
Total recall: the number of correct immediately recalled plus correct long-term recalled items, with a maximum of 50 correct responses.
Total delayed recall: the number of correctly recalled items at the delayed recall.
Total recall and cued recognition: the number of choices of either the target object or the correct source distractor (but not the incorrect source distractor).
Delayed recall of successfully encoded items: the number of correct immediately recalled items plus those items that were additionally recalled after the delay. This last measure is used here for the Remote Digital Memory Composite.
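The four ORR outcome measures can be sketched from per-item responses as below. The response labels `"target"`, `"same_room"` (correct source distractor) and `"unrelated"` (incorrect source distractor) are hypothetical, and two points are interpretations rather than documented rules: the cued-recognition count is pooled over both recall phases, and the measure used for the composite is read literally as items correct immediately plus items additionally correct after the delay, i.e., items correct in at least one phase.

```python
def orr_measures(immediate, delayed):
    """Compute the ORR outcome measures described in the text.

    immediate, delayed: one response per item (25 items in the test), each
    "target", "same_room" (correct source distractor) or "unrelated"
    (incorrect source distractor). Labels are hypothetical.
    """
    if len(immediate) != len(delayed):
        raise ValueError("immediate and delayed must cover the same items")
    imm_ok = [r == "target" for r in immediate]
    del_ok = [r == "target" for r in delayed]
    return {
        # Immediate plus delayed correct; maximum 2 * 25 = 50.
        "total_recall": sum(imm_ok) + sum(del_ok),
        "total_delayed_recall": sum(del_ok),
        # Assumption: target or same-room choices pooled over both phases.
        "recall_and_cued_recognition": sum(
            r in ("target", "same_room") for r in immediate + delayed),
        # Correct immediately, plus items additionally correct after the delay.
        "delayed_recall_successfully_encoded": sum(
            i or d for i, d in zip(imm_ok, del_ok)),
    }


if __name__ == "__main__":
    imm = ["target", "same_room", "target", "unrelated"]
    dly = ["target", "target", "same_room", "unrelated"]
    print(orr_measures(imm, dly))
```

Scoring at the item level keeps the immediate and delayed responses aligned, which is what allows an item recalled only after the delay to be counted once in the composite measure.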
In the DELCODE add-on study, 12 ORR test sessions alternated between 30-minute and 24-hour delay versions (odd-numbered test sessions had 30-minute delays and even-numbered sessions 24-hour delays). Here, we only report results of the first session, i.e., with a 30-minute delay. For reliability measures, we use data from the first and third ORR tests, since both have 30-minute delays. Figure 1C shows the outline of the CSR test (Bainbridge et al., 2019; Düzel et al., 2011, 2018).

Complex Scene Recognition Test (CSR)
Participants see 60 photographic images depicting indoor and outdoor scenes. For encoding, participants make a button-press decision whether the presented scene is indoors or outdoors. After a delay of 65 minutes, the participants are informed via push notification to complete the second phase of the task. Here, the encoded images are presented together with 30 new images and participants make old/new/uncertain recognition memory decisions.
The test provides a hit rate, a false alarm rate and a corrected hit rate. The corrected hit rate is used for the Remote Digital Memory Composite.
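The text does not define the corrected hit rate; a common definition, assumed here, is the hit rate minus the false alarm rate. The sketch below applies it to the CSR design (60 old and 30 new images) and assumes that only "old" responses count as hits or false alarms, so "uncertain" responses count as neither. How MDT-S responses map onto hits and false alarms is not specified in the text, so no MDT example is given.

```python
def corrected_hit_rate(trials):
    """Hit rate minus false-alarm rate for an old/new recognition test.

    trials: list of (is_old, response) pairs, response in
    {"old", "new", "uncertain"}. Assumption: only "old" responses count as
    hits (on old items) or false alarms (on new items).
    """
    n_old = sum(1 for is_old, _ in trials if is_old)
    n_new = len(trials) - n_old
    if n_old == 0 or n_new == 0:
        raise ValueError("need both old and new items")
    hits = sum(1 for is_old, resp in trials if is_old and resp == "old")
    false_alarms = sum(1 for is_old, resp in trials if not is_old and resp == "old")
    return hits / n_old - false_alarms / n_new


if __name__ == "__main__":
    # Illustrative CSR retrieval phase: 60 encoded images plus 30 new ones.
    trials = ([(True, "old")] * 45 + [(True, "uncertain")] * 10
              + [(True, "new")] * 5 + [(False, "old")] * 6 + [(False, "new")] * 24)
    print(corrected_hit_rate(trials))  # hit rate 0.75 minus false alarm rate 0.2
```

Subtracting the false alarm rate penalizes a liberal "old" bias, so a participant who answers "old" to everything scores 0 rather than 1.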

Data handling and quality control
DELCODE participants used the app with a pseudonymized ID provided to them during a memory clinic visit (no identifying or clinical information was available in or required by the mobile app). The app data were transferred directly to the clinical research platform of the DZNE in accordance with the General Data Protection Regulation. The clinical research platform of the DZNE then linked the mobile app data to the clinical data and subsequently released them to the DELCODE Principal Investigators and to neotiv GmbH. Data handling and quality control procedures for the clinical DELCODE data are reported in

Statistical analysis
All statistical analyses were performed in R (R Core Team, 2020). We assessed convergent validity by correlating the Remote Digital Memory Composite score with the PACC5 score using the Pearson correlation coefficient. We conducted this analysis for the entire group of participants and separately for those without (HC and REL) and with memory complaints (SCD and MCI) given

Recruitment and adherence
Here we considered the first 102 study participants who completed at least one session of each of the three cognitive tests (25 healthy controls, 48 individuals with SCD, 7 relatives of DAT patients and 22 MCI patients; see Table 1 for sample characteristics). Of these, 87 completed at least two sessions of each cognitive test, allowing us to estimate test-retest reliability. Thus, 15% of those who had completed the first composite (at 6 weeks) had not yet completed the second (at 12 weeks). The DZNE site in Magdeburg collected additional recruitment data to quantify interest and identify reasons for declining participation in the add-on study. Of the first 90 participants who were asked to participate, 51% agreed and were successfully recruited. 28% expressed interest but could not be recruited for technical reasons (they owned no mobile device, their mobile device was technically too old, or they had no mobile plan or Wi-Fi at home). 4% were undecided and agreed to be asked again at the next annual DELCODE visit, 5% expressed mistrust towards apps and 12% were not interested. The remote mobile monitoring add-on

Contextual factors
Across all three cognitive tests, participants reported high concentration levels during the task (mean = 4 on a scale of 1-5, corresponding to good concentration) and high subjectively rated task performance (mean = 3.7 on a scale of 1-5, corresponding to good subjectively rated performance).
While concentration levels were similar across tasks (3.63, 4.11 and 4.22 for MDT-OS, ORR and CSR, respectively), subjective performance ratings indicated higher task difficulty for the MDT-OS (2.87) than for the ORR and CSR (4 and 4.3, respectively). In addition, 89% of participants reported no distractions during their test sessions.
Adherence to the delay between encoding and retrieval in the ORR and CSR tests was as follows: 45% of participants completed the retrieval within 1.5 hours, 22% within 6 hours, 19% within 48 hours and 13% took more than 48 hours. Participants were invited to the ORR retrieval phase after 30 minutes and completed it after a median actual delay of 57 minutes, whereas they were invited to the CSR retrieval after 65 minutes and completed it after a median delay of 2 hours 46 minutes.
Across tasks, individual test sessions were performed between 8:20 AM and 8:30 PM (mean 1:54 PM, SD = 2 hours 22 minutes). Mobile devices had screen diagonals between 10.15 and 27.65 cm (mean 13.7 cm, SD = 3.6 cm), indicating that both smartphones and tablet computers were used.

Development of the Remote Digital Memory Composite
We built the Remote Digital Memory Composite score using equal weights, so that each component (each of the three cognitive tests) had the same weight. The mnemonic discrimination test comes in two task conditions, one for scenes and one for objects. For the Remote Digital Memory Composite, we decided to include scene mnemonic discrimination but not object mnemonic discrimination for two reasons. First, we aimed for a short overall testing time for the future Remote Digital Memory Composite score and therefore wanted to include only a single mnemonic discrimination condition. Second, our earlier work showed that while object mnemonic discrimination (MDT-O) is associated with measures of tau pathology in cognitively unimpaired individuals (Maass et al., 2019), scene mnemonic discrimination (MDT-S) is associated with amyloid load in posterior brain networks known to be affected at the MCI stage (Maass et al., 2019). All individual components (ORR, MDT-S and CSR) were z-standardized using the mean and standard deviation of the cognitively unimpaired participants (HC, REL and SCD). The resulting three z-scores were averaged to derive the final Remote Digital Memory Composite score. The test-retest reliability between two independent time points was good (r = 0.74, p < .001).
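The composite construction described above (z-standardization against the cognitively unimpaired group, then an equal-weight average) can be sketched in Python. The keys `ORR`, `MDT_S` and `CSR` and the toy values are hypothetical, and the text does not say whether the sample or population SD was used; the sketch assumes the sample (n - 1) SD.

```python
from statistics import mean, stdev

COMPONENTS = ("ORR", "MDT_S", "CSR")  # hypothetical keys for the three measures


def reference_stats(unimpaired):
    """Mean and sample SD of each component in the cognitively unimpaired
    group (HC, REL and SCD in the text). Assumption: n - 1 SD."""
    return {c: (mean([p[c] for p in unimpaired]), stdev([p[c] for p in unimpaired]))
            for c in COMPONENTS}


def rdmc(participant, ref):
    """Equal-weight composite: average of the three component z-scores."""
    return mean([(participant[c] - ref[c][0]) / ref[c][1] for c in COMPONENTS])


if __name__ == "__main__":
    # Toy values: ORR count, MDT-S and CSR corrected hit rates.
    unimpaired = [
        {"ORR": 20, "MDT_S": 0.4, "CSR": 0.6},
        {"ORR": 24, "MDT_S": 0.6, "CSR": 0.8},
        {"ORR": 22, "MDT_S": 0.5, "CSR": 0.7},
    ]
    ref = reference_stats(unimpaired)
    print(rdmc({"ORR": 20, "MDT_S": 0.4, "CSR": 0.6}, ref))  # about -1.0
```

Because each component is expressed in SD units of the same reference group before averaging, tests with very different raw scales (item counts versus rates) contribute equally to the composite.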

Relationship between the Remote Digital Memory Composite and the PACC5
Given that the participants are part of a longitudinal cohort study, we used the PACC5 score from the in-clinic visit closest in time to the mobile app add-on study and correlated it with the Remote Digital Memory Composite to assess convergent validity. In DELCODE, data release is conducted by the clinical research platform; for individuals whose closest-in-time data had not yet been released, we used data from the second-closest assessment. The average interval between the in-clinic visits and the remote app assessments was 0.7 years. The first Remote Digital Memory Composite correlated highly with the closest-in-time available in-clinic PACC5 scores (r = .75, p < .001). When considering only participants with memory complaints, i.e., those who were referred to the memory clinics by their GP and fulfilled either SCD or MCI criteria, the construct validity of the Remote Digital Memory Composite remained very high (r = .76, p < .001). The construct validity in individuals without memory complaints (HC and REL) was moderate (r = .51, p = .003). Results for the whole cohort and separately for memory complainers are presented in Figure 2. For completeness, we also ran a multiple regression model including all possible tests, including the MDT-O. While ORR, MDT-S and CSR again showed significant effects, MDT-O did not contribute significantly to the model beyond the other three components (adjusted R² = 0.56, βORR = 0.41; βCSR = 0.29; βMDT-S = 0.2; βMDT-O = 0.14).
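The subgroup correlations above can be illustrated with a minimal Python sketch (the paper's analyses were run in R). The row fields `group`, `rdmc` and `pacc5` and the helper `r_in_groups` are hypothetical; the sketch computes only the Pearson coefficient, not the p-values reported in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_in_groups(rows, groups):
    """Correlate RDMC with PACC5 within the given diagnostic groups."""
    sel = [r for r in rows if r["group"] in groups]
    return pearson_r([r["rdmc"] for r in sel], [r["pacc5"] for r in sel])


if __name__ == "__main__":
    rows = [  # toy data, not study results
        {"group": "HC", "rdmc": 0.1, "pacc5": 0.2},
        {"group": "REL", "rdmc": 0.5, "pacc5": 0.6},
        {"group": "SCD", "rdmc": -0.2, "pacc5": 0.3},
        {"group": "MCI", "rdmc": -1.0, "pacc5": -1.2},
        {"group": "SCD", "rdmc": 0.3, "pacc5": 0.1},
    ]
    print(r_in_groups(rows, {"SCD", "MCI"}))  # memory complainers
    print(r_in_groups(rows, {"HC", "REL"}))   # no memory complaints
```

Running the correlation within each subgroup, rather than only across the pooled sample, guards against a correlation that merely reflects the gap between an impaired and an unimpaired cluster.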

Relationship with Age, Sex, Education and other factors
Multiple regression models with age, sex, years of education, time of day, time to retrieval and screen size as predictors were calculated to identify their relationships with the individual components of the Remote Digital Memory Composite. For ORR and MDT-S, none of these predictors was significantly associated with task performance. For CSR, however, sex, years of education and time to retrieval were significant predictors of task performance: female participants and those with more education performed better, and the longer the delay between encoding and retrieval, the worse the participants' performance.
With respect to the Remote Digital Memory Composite, female sex (βsex = 0.49, p = 0.017) and more years of education (βedu = 0.28, p = 0.005) were associated with higher scores, whereas age and screen size were not. In comparison, the PACC5 was also associated with sex (βsex = 0.58, p = 0.003) and years of education (βedu = 0.29, p = 0.002), i.e., women and participants with more years of education had higher PACC5 scores.
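The covariate models above are ordinary multiple regressions. As a dependency-free illustration, the sketch below fits ordinary least squares by solving the normal equations with Gauss-Jordan elimination; the function name `ols` and the toy data are hypothetical. It returns unstandardized coefficients (the text does not state whether the reported β values are standardized; z-scoring predictors and outcome first would give standardized ones) and does not compute p-values.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of predictor rows (no intercept column); y: list of outcomes.
    Returns [intercept, b1, b2, ...] as unstandardized coefficients.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    n, p = len(A), len(A[0])
    xtx = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    M = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


if __name__ == "__main__":
    # Toy rows: (age, female, years of education); outcome built without noise.
    X = [(60, 0, 10), (65, 1, 12), (70, 0, 16), (72, 1, 14), (68, 1, 18), (75, 0, 12)]
    y = [1 + 0.5 * f + 0.3 * e for _, f, e in X]
    print(ols(X, y))  # approximately [1.0, 0.0, 0.5, 0.3]
```

Solving the small normal-equation system directly is adequate for a handful of predictors; for real analyses a QR-based solver (as used by R's `lm`) is numerically more robust.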

Diagnostic accuracy
To assess how well the Remote Digital Memory Composite score differentiates cognitively impaired from cognitively unimpaired individuals as defined by the PACC5 score, we calculated a PACC5 cut-off across all non-demented participants in the DELCODE cohort (n = 933; 235 HC, 440 SCD, 82 REL and 176 MCI patients) that distinguishes MCI patients from cognitively unimpaired participants (HC, REL and SCD), choosing the optimal cut-off while prioritizing a sensitivity > 0.8. This resulted in a cut-off of -0.515, with a sensitivity of 0.82 (female: 0.83, male: 0.8) and a specificity of 0.8 (female: 0.9, male: 0.69). No other cut-off resulted in more favorable values for men. Based on this cut-off, we divided the entire sample of the add-on study into cognitively unimpaired (CU-PACC5, n = 73) and cognitively impaired (CI-PACC5, n = 29) participants (see Table 1).
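The cut-off rule and the AUC used in the diagnostic accuracy analyses can be sketched in Python on toy data. The helper names are hypothetical, and two details are assumptions not stated in the text: lower scores indicate impairment, and a score equal to the cut-off is classified as impaired. The AUC is computed as the Mann-Whitney probability that an impaired participant scores lower than an unimpaired one; the same function would apply to the CDR 0 versus CDR > 0 comparison described below. The cross-validation mentioned in the abstract is not reproduced.

```python
def sens_spec(scores, impaired, cutoff):
    """Sensitivity/specificity when score <= cutoff is classified as impaired."""
    tp = sum(1 for s, d in zip(scores, impaired) if d and s <= cutoff)
    fn = sum(1 for s, d in zip(scores, impaired) if d and s > cutoff)
    tn = sum(1 for s, d in zip(scores, impaired) if not d and s > cutoff)
    fp = sum(1 for s, d in zip(scores, impaired) if not d and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def choose_cutoff(scores, impaired, min_sens=0.8):
    """Among observed scores, return (cutoff, sens, spec) with the highest
    specificity whose sensitivity exceeds min_sens, or None."""
    best = None
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, impaired, c)
        if sens > min_sens and (best is None or spec > best[2]):
            best = (c, sens, spec)
    return best


def auc(impaired_scores, unimpaired_scores):
    """Mann-Whitney AUC: probability that an impaired participant scores
    lower than an unimpaired one (ties count 0.5)."""
    wins = 0.0
    for p in impaired_scores:
        for n in unimpaired_scores:
            wins += 1.0 if p < n else 0.5 if p == n else 0.0
    return wins / (len(impaired_scores) * len(unimpaired_scores))


if __name__ == "__main__":
    scores = [-2.0, -1.5, -1.0, -0.6, -0.2, -0.8, -0.3, 0.0, 0.5, 1.0]
    impaired = [True] * 5 + [False] * 5
    print(choose_cutoff(scores, impaired))  # (-0.2, 1.0, 0.6)
    print(auc(scores[:5], scores[5:]))
```

Requiring sensitivity above 0.8 first and only then maximizing specificity reflects the case-finding goal: missing impaired individuals is treated as costlier than false positives.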

Functional impairment
We also investigated whether the Remote Digital Memory Composite was related to clinical functional impairment. A subgroup analysis within the DELCODE participants allowed us to determine the AUC for differentiating individuals with a Clinical Dementia Rating (CDR global score) of 0 from those with higher scores. Scores higher than 0 indicate that participants are already somewhat constrained in their everyday life. In this analysis, the AUC was 0.69, and a cut-off of -0.3 resulted in a sensitivity of 0.52 and a specificity of 0.73. This suggests that a majority of those that have been identified as being cognitively impaired

Discussion
We developed a Remote Digital Memory Composite based on a single session of each of three equally weighted memory tests (ORR, MDT-OS and CSR), all performed remotely and fully unsupervised. The resulting composite showed high construct validity in relation to the PACC5 score and good retest reliability in a subsample that performed each test twice. Finally, the Remote Digital Memory Composite differentiated between individuals with and without PACC5-based cognitive impairment with an AUC of 0.9, demonstrating high diagnostic accuracy.
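A common way to build an equally weighted composite from tests measured on different scales is to z-standardize each test and average the z-scores. The exact standardization used for the Remote Digital Memory Composite is not spelled out in this paragraph, so the following is only a sketch under that assumption: each test score is assumed to be oriented so that higher means better, z-scores are computed against a reference group (by default the whole sample), and the function name `memory_composite` and the toy scores are hypothetical.

```python
import numpy as np


def memory_composite(test_scores, reference=None):
    """Equally weighted composite of several memory tests.

    test_scores: dict mapping test name -> per-participant scores (higher = better),
                 with participants in the same order for every test.
    reference:   optional boolean mask selecting the reference group whose mean and
                 standard deviation are used for z-standardization (default: everyone).
    """
    z_per_test = []
    for values in test_scores.values():
        values = np.asarray(values, dtype=float)
        ref = values if reference is None else values[np.asarray(reference, dtype=bool)]
        z_per_test.append((values - ref.mean()) / ref.std(ddof=1))
    # Equal weights: unweighted mean of the per-test z-scores for each participant
    return np.mean(z_per_test, axis=0)


# Toy scores for three participants (hypothetical values, not study data)
scores = {"ORR": [1, 2, 3], "MDT-OS": [10, 20, 30], "CSR": [3, 2, 1]}
print(memory_composite(scores))
```

Standardizing first keeps a test with a wide raw score range from dominating the composite, which is what "equally weighted" requires.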
In terms of construct validity, we found a strong correlation between the Remote Digital Memory Composite and the PACC5. This correlation was present both in non-complaining healthy older adults and in those with memory complaints, indicating that it was not driven by pooling an impaired and a non-impaired group as two extremes in the same analysis. The fact that the correlation also held within memory complainers (SCD and MCI), all of whom were recruited on the basis of referrals (as opposed to recruitment advertisements), suggests that the construct validity would also hold in a health care setting. In terms of reliability, we found a high correlation between two instances of the Remote Digital Memory Composite obtained ~12 weeks apart.
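Both analyses in this paragraph reduce to correlation coefficients: construct validity as the correlation between the composite and the PACC5, computed overall and within subgroups, and retest reliability as the correlation between two sessions of the same participants. The sketch assumes Pearson's r, since the paragraph does not name the coefficient (an intraclass correlation would be a common alternative for retest reliability); the function names and toy values are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equally long score vectors."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])


def within_group_r(x, y, groups):
    """Pearson r computed separately inside each group label."""
    x, y, groups = np.asarray(x, dtype=float), np.asarray(y, dtype=float), np.asarray(groups)
    return {g: pearson_r(x[groups == g], y[groups == g]) for g in np.unique(groups)}


# Retest reliability: same participants, two sessions roughly 12 weeks apart (toy values)
session_1 = [0.1, -0.5, 0.8, -1.2, 0.3]
session_2 = [0.3, -0.6, 0.6, -0.9, 0.4]
print("retest r:", round(pearson_r(session_1, session_2), 3))

# Construct validity within subgroups, so the result is not driven by pooling extremes
composite = [0.9, 0.2, -0.1, -0.4, -1.1, -0.7]
pacc5 = [0.8, 0.4, 0.1, -0.2, -1.3, -0.5]
group = ["HC", "HC", "HC", "complainer", "complainer", "complainer"]
print(within_group_r(composite, pacc5, group))
```

Computing r inside each group is the check the paragraph describes: a correlation that survives within complainers alone cannot be an artifact of combining two distant groups.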
The Remote Digital Memory Composite identified individuals with MCI-grade impairment in the PACC5 with an AUC of 0.9. Using optimal cut-offs, a single assessment of the Remote Digital Memory Composite identified individuals with MCI-grade impairment with a sensitivity of 0.83 and a specificity of 0.74. In this study, we used the PACC5 to define an MCI-grade cut-off between impaired and unimpaired individuals for several reasons. First, neuropsychological assessments used to identify memory impairment in the context of MCI are often based on a single test, such as delayed verbal recall. Validating against a single test could undermine the generalizability of the Remote Digital Memory Composite across clinical settings and MCI populations in which a different test is used as the criterion. Validating against a composite of several dedicated assessments protects against validation distortions caused by single-test criteria. Second, the PACC5 is also optimized to detect longitudinal decline.
Hence, validating against the PACC5 also holds the promise that the Remote Digital Memory Composite would be equally sensitive to longitudinal decline while being much easier to implement widely. Third, in the DELCODE sample, the diagnostic classification of each individual was performed at the baseline visit. However, participants were recruited into the mobile add-on study on average 1.5 years later. Hence, some of the SCD participants might already have progressed to MCI, and some MCI diagnoses might have reverted to SCD. Given this uncertainty, defining a cut-off that distinguishes MCI from all pre-MCI groups based on the PACC5 assessment closest in time provided a more accurate approach for classifying impaired and non-impaired individuals well after their baseline diagnoses.
The Remote Digital Memory Composite differentiated individuals with and without PACC5-based MCI-grade impairment with high diagnostic accuracy. This accuracy is comparable to or higher than that of several other recently reported unsupervised (Mackin et al., 2018) or in-clinic, supervised digital cognitive assessments (Alden et al., 2021; Groppell et al., 2019; Kalafatis et al., 2021; Ye et al., 2020). Importantly, however, several earlier approaches reported outcomes by comparing MCI patients against samples consisting exclusively of healthy asymptomatic older adults (Alden et al., 2021; Groppell et al., 2019; Kalafatis et al., 2021; Mackin et al., 2018; Maruff et al., 2013; Ye et al., 2020). In health care settings, the main challenge is to identify significant impairment among memory complainers. Therefore, we believe that our focus on memory complainers, and the inclusion of a large number of SCD patients who sought medical advice (and hence were not recruited through advertisements), is a major advance in validation and critical for future application.
Usability is a major limitation of mobile device-based assessments of cognition in old age, particularly in preclinical and prodromal AD. While participants were assisted with installing the neotiv mobile app and received a printed manual when their consent was obtained at the memory clinic, all three tests were conducted fully remotely and without supervision. Participants received a push notification on their mobile device each time a test became available. All instructions and guidance for performing the tests were provided in the app, including a training run of each test. Participants were also instructed to seek a quiet place where they would not be distracted, and after each test they were asked in a questionnaire whether they had been able to perform the test without distraction. Adherence to the mobile tests was good, with at most 15% of participants dropping out after 6 tests over a period of at least 12 weeks. Our results thus indicate that it is possible to achieve the level of usability required to perform a detailed assessment of episodic memory fully remotely and without any supervision in a cohort of memory complainers.
The total testing time required to obtain the Remote Digital Memory Composite (a single run of ORR, CSR and MDT-OS) was ~45 minutes. In principle, all three tests could be completed within a single day. However, we decided not to enforce the shortest possible acquisition time.
Instead, we decided to leverage the opportunities of mobile, unsupervised testing to achieve a more meaningful implementation. To that end, we spread the assessment over several weeks to obtain a more representative sample of memory performance over time and thereby reduce vulnerability to day-to-day performance fluctuations. We also used spaced testing to ease stress for the patients and to avoid implementation problems, such as worries and complaints from patients who felt they had been tested on a bad day. Thus, the Remote Digital Memory Composite reflects memory performance over a period of several weeks rather than a single day, something that would be very difficult to achieve with a supervised testing approach.
Episodic memory tests such as the FCSRT (Buschke, 1984) and the other elements of the PACC5 place heavy demands on verbal abilities. This substantially reduces their applicability in international trials and in conditions with mild language disorders (e.g., due to a vascular event or primary progressive aphasia) (Costa et al., 2017). The three tests of the Remote Digital Memory Composite established here, however, do not depend on verbal abilities such as naming, word-finding or pronunciation and thereby facilitate testing across different dementia syndromes and subtypes of AD as well as international comparisons.
Furthermore, the Remote Digital Memory Composite shows no overlap with the PACC5 in terms of the paradigms and modalities tested, so there would be no interference with a memory clinic or trial-based PACC5 assessment following case-finding.
In the currently used implementation of the ORR and CSR tests, we did not strictly enforce adherence to the planned retrieval-delay intervals, which led some individuals to perform recall and recognition assessments after longer than planned delays. When we restricted the analysis of diagnostic accuracy for discriminating MCI-grade impairment in the PACC5 to individuals who adhered more strictly to the delay intervals of the ORR and CSR, the AUC increased numerically to 0.92.
This might indicate that a health care implementation of the Remote Digital Memory Composite could benefit from usability optimizations that enforce the delay intervals more strictly.
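The sensitivity analysis described above amounts to recomputing the AUC on the subset of participants whose recall delays did not overrun the planned intervals. In the sketch below, the planned delays, the tolerance and all data are hypothetical placeholders, since the paragraph does not state them; the AUC uses a pairwise (Mann-Whitney) formulation with lower composite scores assumed to indicate impairment.

```python
import numpy as np


def auc_lower_is_impaired(scores, impaired):
    """AUC when lower scores indicate impairment (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    impaired = np.asarray(impaired, dtype=bool)
    pos, neg = scores[impaired], scores[~impaired]
    diff = neg[None, :] - pos[:, None]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))


def adherent_mask(actual_delays, planned_delays, tolerance):
    """True for participants whose delays never exceed the plan by more than `tolerance`.

    actual_delays: array of shape (n_participants, n_tests), e.g. columns ORR and CSR.
    Only overruns count against adherence, matching the problem of longer-than-planned delays.
    """
    overrun = np.asarray(actual_delays, dtype=float) - np.asarray(planned_delays, dtype=float)
    return np.all(overrun <= tolerance, axis=1)


# Hypothetical example: planned delays of 24 h for both tests, 6 h tolerance
planned = [24.0, 24.0]
actual = [[24, 25], [40, 24], [24, 24], [26, 30], [24, 72], [20, 24]]
composite = np.array([-1.0, -0.2, 0.5, -0.6, 0.3, 0.9])
impaired = np.array([True, True, False, False, False, False])

keep = adherent_mask(actual, planned, tolerance=6.0)
print("all participants:", auc_lower_is_impaired(composite, impaired))  # 0.875
print("adherent only:", auc_lower_is_impaired(composite[keep], impaired[keep]))  # 1.0
```

Because the restricted subset is smaller, a numerically higher AUC in it should be read alongside its wider uncertainty, which is why the paragraph above describes the increase as numerical.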
This study has a number of limitations. First, our results are based on a single study with a modest sample size and thus need to be cross-validated in independent cohorts and different countries. Second, while we found evidence for only limited relationships between the Remote Digital Memory Composite and sample demographics, a large and diverse normative sample is needed to adjust norm scores for various covariates. Third, our sample size was not yet sufficient to assess the relationship with AD biomarkers or the diagnostic accuracy in biomarker-stratified subgroups. Finally, the number of follow-up remote assessments in our sample did not yet allow us to assess the added benefit of calculating a mean composite across several repetitions of each test over a longer assessment period.
Taken together, the high construct validity and retest reliability of the Remote Digital Memory Composite in a memory clinic setting pave the way for implementing mobile app-based remote assessment in clinical studies as well as in health care. The current data indicate that the Remote Digital Memory Composite can facilitate case-finding whenever the main question concerns an individual's impairment as defined by a comprehensive neuropsychological assessment score. Future studies need to show whether repeated assessments of the Remote Digital Memory Composite over time are sensitive to cognitive change.