Assessing the quality of mobile applications in chronic disease management: a scoping review

While there has been a rapid growth of digital health apps to support chronic diseases, clear standards on how to best evaluate the quality of these evolving tools are absent. This scoping review aims to synthesize the emerging field of mobile health app quality assessment by reviewing criteria used by previous studies to assess the quality of mobile apps for chronic disease management. A literature review was conducted in September 2017 for published studies that use a set of quality criteria to directly evaluate two or more patient-facing apps supporting chronic disease management. This resulted in 8182 citations which were reviewed by research team members, resulting in 65 articles for inclusion. An inductive coding schema to synthesize the quality criteria utilized by included articles was developed, with 40 unique quality criteria identified. Of the 43 (66%) articles that reported resources used to support criteria selection, 19 (29%) used clinical guidelines, and 10 (15%) used behavior change theory. The most commonly used criteria included the presence of user engagement or behavior change functions (97%, n = 63) and technical features of the app such as customizability (20%, n = 13), while usability was assessed by 24 studies (36.9%). This study highlights the significant variation in quality criteria employed for the assessment of mobile health apps. Future methods for app evaluation will benefit from approaches that leverage the best evidence regarding the clinical impact and behavior change mechanisms while more directly reflecting patient needs when evaluating the quality of apps.

Statistical comparisons of means were conducted across operating systems using t-tests. Tests of associations between Index Scores and popularity were conducted with Pearson correlations.
2 Of 62 apps [that were initially identified], 10 were excluded because their descriptions in the iTunes store indicated they were irrelevant for reducing or quitting smoking...; four were eventually removed from the sample because they were no longer in the iTunes store at the time of downloading; one app was removed because the basic and deluxe versions proved to be identical.
Each app was independently coded by 2 reviewers for its (1) approach to smoking cessation and (2) adherence to the US Public Health Service's 2008 Clinical Practice Guidelines for Treating Tobacco Use and Dependence. Each app was also coded for its (3) frequency of downloads.
3 When an app had the same name or was developed by the same company as another app that was already included in the study, it was considered duplicate content and excluded. Additionally, we excluded apps that were duplicated in the search results of the various search terms used. Following a review of both free and paid versions of apps, we found that the free version (where a paid version was also available, n=18) only provided a portion of the functionality or advice available in the paid version. For this reason, apps that included both free and paid versions were counted as one app, and only the paid version was downloaded.
Prior to downloading, we reviewed the description pages of all apps... The description page was used as an initial screening tool to determine whether weight loss was the stated purpose of the app. We downloaded all apps in the Arabic language that had weight control content to include in the study, and we systematically explored their functionalities over one period of use. If the app was capable of performing any of the following tasks, they were allocated one point per task, up to a maximum of 13 points: (1) determining and explaining BMI, (2) recommending and tracking daily servings of fruit and vegetables, (3) recommending daily physical activity, (4) advising the user to drink water instead of soda or juice and tracking their daily intake of water, (5) allowing for the recording of daily food intake, (6) providing a calorie tracker to maintain calorie balance, (7) providing weight-loss goals of 1-2 lb/week, (8) providing information about portion control, (9) recommending that the user read and understand nutrition labels, (10) providing a way to track weight, (11) providing a way to keep a physical activity journal, (12) offering suggestions for meal planning, and (13) offering a private social network or the capability of being linked to popular social media such as Facebook, Twitter, or Instagram for social support.
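The one-point-per-task rubric described above can be sketched as follows. This is a minimal illustration only: the short task labels and the example app are hypothetical and do not come from the reviewed study.

```python
# Hypothetical short labels for the 13 tasks listed in the text, in order.
TASKS = [
    "bmi", "fruit_veg", "physical_activity", "water", "food_log",
    "calorie_tracker", "weight_loss_goal", "portion_control",
    "nutrition_labels", "weight_tracking", "activity_journal",
    "meal_planning", "social_support",
]


def evidence_score(features_present):
    """Return one point per task the app performs (maximum 13)."""
    features = set(features_present)
    unknown = features - set(TASKS)
    if unknown:
        raise ValueError(f"unknown task labels: {sorted(unknown)}")
    return len(features)


# Illustrative app that explains BMI, logs food, and tracks weight.
example_app = {"bmi", "food_log", "weight_tracking"}
score = evidence_score(example_app)
```

Here `score` is the number of evidence-informed practices present, so an app covering every task would receive 13 points.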
Descriptive statistics were used to summarize the total number and percentages of evidence-informed practices included in the Arabic apps. The median and interquartile range (IQR) were calculated for nonnormally distributed data, such as stars representing user opinions (ranging from 0.5 to 5 stars), and number of user ratings (number of users who gave a star rating on the app description page)...We used latent class analysis to identify whether there were distinct classes or subgroups within the apps included in the study.
No prior prediction of the outcome was made. The number of classes was determined by fit indices [14,15], such as Bayesian information criteria (BIC), adjusted Bayesian information criteria (ABIC), Akaike's information criteria (AIC), and entropy and interpretability or model usefulness.
The number of classes that minimized AIC and BIC was chosen from the modeling. The profile or description of these classes was based on the response pattern of evidence-informed practice features for that class. Based on the posterior probability of class membership, each app was categorized into one of the classes. The app price was categorized into "paid" and "free". Fisher's exact test was used to investigate associations between pricing and class, based on the posterior probability...Apps that did not have any stars or ratings were excluded from the comparison, and the median stars and ratings of the two classes were compared using the Mann-Whitney test. A p value of less than 0.05 was considered significant. Arabic language speakers, or apps that have an Arabic interface were considered.
Detailed information on each app was extracted and reviewed including description, functionality, and price. All free apps were downloaded and their functions were examined closely.
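Choosing the number of latent classes by minimizing AIC and BIC can be sketched as below, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The log-likelihoods, parameter counts, and sample size are invented for illustration and are not results from the reviewed study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
# With 13 binary indicators, a k-class model has 13k + (k - 1) parameters.
fits = {1: (-520.0, 13), 2: (-470.0, 27), 3: (-462.0, 41)}
n_apps = 60  # hypothetical number of apps

table = {k: (aic(ll, p), bic(ll, p, n_apps)) for k, (ll, p) in fits.items()}
best_by_aic = min(table, key=lambda k: table[k][0])
best_by_bic = min(table, key=lambda k: table[k][1])
```

In this made-up example both criteria favor the two-class model; in practice the indices can disagree, which is why entropy and interpretability are also weighed.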

Unclear
(assumption: 1 as there is 1 author) Not specified; Author from: College of Computer Science & Engineering, Kuwait University Thematic 5 Every hit was reviewed in terms of its relevance and explicit link to diabetes mellitus.
We identified relevant keywords, comparative categories, and their specifications. Subsequently, we performed the app review based on the information given in the Google Play Store, the Apple App Store, and the apps themselves. In addition, we carried out an expert-based usability evaluation based on a representative 10% sample of diabetes apps. The basis for the systematic and comparative market analysis was defined by categories and respective subcategories/specifications outlined in Table 1...To examine the usability of currently available diabetes applications for the elderly, we performed an expert-based usability evaluation. With this method, usability experts put themselves in the role of potential or current users to examine products in terms of usability. We performed a summative evaluation as we exclusively included apps whose development was already finished. 6 Since the [search] keyword was generic, but the main target of our study was multifeatured apps dedicated to diabetes management for both patients and health-care professionals, we introduced some exclusion criteria. Apps were excluded from our analysis by the following criteria: they were a limited version (i.e., lite or free app) of an available fully featured version; they supported only a single feature (e.g., insulin calculator only); they did not support any diabetes-specific data collection, archiving, and analysis for time-monitoring (e.g., glycated hemoglobin converter, pills reminder); their target was not diabetes self-management (e.g., generic health trackers, activity trackers, cooking apps, educational apps) or if diabetes self-management was an incidental-only element within the app; or if they were a content-consumption-only app (e.g., magazine, journal). Considering the fast evolution of the app market, we also excluded apps that did not receive any updates during the 12 months prior to the search...All apps meeting the inclusion criteria were used for our review.
Among these apps selected for our study, a subset According to our conceptual framework, we reviewed all the apps meeting our inclusion criteria by considering general features related to the mobile market (such as app pricing, app updates), diabetes-specific features including basic features (data logging, representation, and delivery), and advanced features (e.g., community services, insulin calculators) [26,34]. We considered whether the app was reviewed by one of the known reviewing initiatives. Based on an existing tool for representing the benefits and weaknesses of medical apps, we created the pictorial identification schema/Diabetes Self-care tool, which specifically identified medical apps in the diabetes domain.  [6,7,9]), (c) were not available on both iTunes and GP and (d) were free apps with limited functionality which is only unlocked by purchasing the full version (i.e., "freemium") [22]. In GP, freemium apps were filtered out as the database contains the variable "in-app purchases"; in iTunes in-app purchases were manually checked...the first author and a collaborator read the descriptions of the apps and applied further inclusion criteria. Apps were included if they addressed "weight management", which consists of both PA and dietary behavioural strategies [28], considering the limited role of PA and the predominant role of dietary strategies for effective weight loss [25,27], and the importance of the combination of PA and diet for long-term effects on weight [28]. This allowed the exclusion of apps that focused only on PA and fitness. We also excluded apps that focused on other aspects of health (maternal health, mental health, etc. ). installation process was not successful or the app did not work properly… All potentially relevant mobile apps by title and description were independently screened for eligibility by three experts (PPB, PK, ER) in the field of healthcare-related mobile apps. 
The subjective assessment of a mobile app's features by each reviewer introduces a source of bias to this study. In the attempt to mitigate this bias, we required that at least two reviewers agree with the inclusion of the app into the further analysis. Differences in judgement were resolved through a consensus process. The inter-rater agreement based on Cohen's Kappa statistic between the reviewers ranged between 0.77 (reviewer 1 vs. reviewer 3) and 0.90 (reviewer 2 vs. reviewer 3).
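Cohen's kappa for a pair of reviewers, as reported above, corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch follows; the include/exclude decisions are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical inclusion decisions (1 = include, 0 = exclude) for ten apps.
reviewer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reviewer_3 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(reviewer_1, reviewer_3)
```

In this example the reviewers agree on 8 of 10 apps and chance agreement is 0.5, giving a kappa of about 0.6.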
The evaluation of mobile apps features included five main categories (personal data, glucose and insulin therapy, nutrition, physical activity and additional features) with subcategories. A template containing the data that should be extracted was designed in the form of an Excel spreadsheet, which is further presented in 11 Among the apps searched with keywords, those which were not relevant to smoking cessation were excluded. Apps were also excluded even if they had some relevance to smoking cessation in the following cases: (1) task management apps were dropped unless their primary purpose was to aid smoking cessation; (2) hypnosis apps for smoking cessation were disregarded because they attempt to exert a subconscious influence and are not appropriate to be analyzed within the frame of SDT; (3) apps developed for physicians to aid their medical treatment, rather than for general consumers, were also not included; (4) apps offering simulation of smoking were also not included unless they clearly stated their purpose as smoking cessation. [34]. Duplicate apps were removed from the 800 search results and the unique apps were classified as either alcohol reduction (apps that aim to reduce drinking-related behavior and those that track consumption), entertainment (drinking games, cocktail recipes, bar finders); BAC measurement; or other (apps not about alcohol, apps not in English, information for employers, etc). Of the 91 alcohol reduction apps, we installed, examined, and coded all 51 free apps as users prefer apps that are free to download [35]. However, 10 paid apps were installed, examined, and coded as a sensitivity check of the BCTs included. The remaining paid apps (n=15), apps that could not be installed (n=5), or those that focused on hypnosis (n=10) were excluded (see Supplementary Figure 1). All statistical analyses were conducted using SPSS version 20.0.
Frequencies, percentages, and associated 95% CIs were calculated for the categories of alcohol-related apps (alcohol reduction, entertainment, blood alcohol content, other), for each of the 41 BCTs, and for the mention of theory or the mention of evidence contained within the alcohol reduction apps.
15 Reviewer: blank NK: could not find full text to confirm Summaries of elements of each app were coded line by line, which were then classified into categories based on common themes. This procedure was repeated as themes emerged until the data were placed into exclusive categories. This process resulted in two primary categories: those that "facilitated" alcohol use and those that aimed to "intervene." 2 Not extracted Not extracted 16 Apps were excluded if they were designed specifically for one medication type or a single disease. Lastly, those lacking a general description of functionality also were excluded.
App descriptions and available screenshots were analyzed for content and app functionality. To identify the apps that might have the most utility for patients that could be recommended by pharmacy practitioners, the authors developed a list of desirable attributes of these apps by consensus of all of the authors to evaluate them for comparison. Whether the app possessed each attribute was assessed based on each app's features described on their website or their respective product listing on their app source (e.g., iTunes). The relative desirability or usefulness of these features then were rated by the study authors using a three-point rating system (1, modest; 2, moderate; or 3, high) based on the perceived importance of each feature or characteristic. Apps were evaluated for each manufacturer claim that met the authors' scoring criteria, functionality of the reminder system, and ability to process reminders from the test medication regimen. The content of the applications was analyzed by two independent investigators (DD, AA). Each app was analyzed and classified on the basis of cost, target audience, type of information, validity, involvement of health-care agencies and usefulness based on audience reviews and ratings. For the purpose of evaluation, a set of 50 applications were randomly selected and independently reviewed by the authors with a joint probability of agreement of 0.98. We resolved any conflict by discoursing about the disputed applications and by revisiting the aforementioned criteria of inclusion and exclusion. We then independently reviewed another set of 50 randomly selected applications with a joint probability of agreement of 1.0. The remaining apps were reviewed by one of the authors; however, for each application review, another author did random crosschecking and validation. Each of the apps identified in the second step was assessed. 
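The joint probability of agreement mentioned above is the share of items on which two reviewers make the same decision. A minimal sketch, with invented decisions chosen so that 49 of 50 match:

```python
def joint_agreement(rater_a, rater_b):
    """Proportion of items on which two raters give the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical include (True) / exclude (False) decisions for 50 applications.
investigator_1 = [True] * 30 + [False] * 20
investigator_2 = [True] * 29 + [False] * 21  # differs on one application
agreement = joint_agreement(investigator_1, investigator_2)
```

Unlike Cohen's kappa, this measure does not correct for agreement expected by chance, so it tends to read higher when one decision dominates.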
Their functions were grouped into four categories: (i) Provision of information; (ii) Self-assessment; (iii) Self-monitoring; and (iv) Provision of advice or treatment.
Popularity of apps was also assessed using a platform called "Xyologic". We employed the usability assessment criteria developed by Arnhold et al [20], that consists of four main criteria (comprehensibility, presentation, usability and general characteristics), 11 subcriteria and 18 items with 5 Likert-scales and dichotomous scales....Each item was independently scaled and average scores were obtained. The inter-rater reliability was measured using intra-class correlation.
Descriptive statistics were used to present the apps. they were written in English, medication related, and last updated in 2014. Applications were excluded if they lacked a ''Description'' under the ''Details'' section of the App Store, were specific to a single medication (e.g., birth control) or single disease (e.g., HIV, COPD), had health-related functionality other than medication adherence such as blood pressure or blood sugar monitoring, were tailored to countries outside of the United States, specific to members only (e.g., insurance plans), or utilized for veterinary medicine, and if they lacked at least one ideal application feature.
The Apple iTunes App Store highlights features of an application using a description and screenshots for the user to identify if they are interested in downloading the application to their mobile phone. We used each application's description and screenshots to identify available application features.  (4) Social Cognitive Theory. The strategies are listed individually as they are common to more than one theory. Each intervention strategy is scored out of 5 as it is rated dichotomously for the inclusion of the following five dimensions of user interaction: (1) provides general information or guidelines, (2) assesses current practices or use of strategies, (3) provides feedback on assessment, (4) offers general assistance on behavior change, and (5) offers individually tailored assistance in response to assessment and feedback. The levels are hierarchical as level 5 (individual advice) is thought to be more effective than level 1 (providing general information). The BTS is the sum of scores for all 20 intervention strategies; the maximum BTS score is 100, representing 20 strategies, each of which are scored out of 5 to indicate the level of interactivity.
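The BTS arithmetic described above (20 strategies, each rated dichotomously on five dimensions, summed to a maximum of 100) can be sketched as follows. The input layout and the example ratings are assumptions made for illustration.

```python
N_STRATEGIES = 20
N_DIMENSIONS = 5


def bts_score(ratings):
    """Sum dichotomous ratings over 20 strategies x 5 dimensions (max 100).

    `ratings` is a list of 20 rows; each row holds five 0/1 values, one per
    dimension of user interaction.
    """
    if len(ratings) != N_STRATEGIES:
        raise ValueError("expected ratings for 20 intervention strategies")
    total = 0
    for dimensions in ratings:
        if len(dimensions) != N_DIMENSIONS:
            raise ValueError("each strategy needs five dichotomous ratings")
        total += sum(1 for present in dimensions if present)
    return total


# Hypothetical app that covers every dimension of every strategy.
all_present = [[1] * N_DIMENSIONS for _ in range(N_STRATEGIES)]
# Hypothetical app that only provides general information for each strategy.
info_only = [[1, 0, 0, 0, 0] for _ in range(N_STRATEGIES)]

max_score = bts_score(all_present)
info_score = bts_score(info_only)
```

An app offering only general information on all strategies therefore scores 20, one fifth of the maximum.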
28 Once all apps were identified using the previously described protocol, the first author reviewed all apps identified via keyword searches and excluded those apps that met the following exclusion criteria: 1) did not target diet tracking, 2) did not include taking or posting pictures of food for the purpose of self-monitoring diet, and 3) not in English. We calculated means with standard deviations and total n with percentages for the coded variables. In order to identify qualities of the apps which were associated with the popularity and user-rated quality of the apps, we used univariate regression models. Given the descriptive nature of this content analysis, we used an explorative approach, and thus did not correct P values for multiple testing. We used logistic regression to test predictors of popularity (ie, >10 000 downloads vs. fewer, the top 20% of the rated apps), and linear regression to test predictors of user-rated quality (ie, average number of stars per app).
Analyses concerning the quality of the apps were restricted to the apps that had star ratings (77% of the apps). app results. Due to restraints in time and resources, the number of apps included had to be restricted. The first ten apps passing the pre-screening from each search term were included, giving 40 apps in total. Following identification, the apps were downloaded and evaluated again based on the same inclusion and exclusion criteria as stated above. At this point some of the apps were excluded, and therefore a second stage of searches and screening was performed to meet the study's aim of evaluating 40 apps, ten from each search term. This second search was performed on 9 June 2015. Five apps were independently evaluated by another assessor in order to determine the repeatability and relative validity of the assessments.
Each app that met the inclusion and exclusion criteria was used by the author (CH) to identify the functions and BCTs included. The results were recorded in a data extraction form (Table 4) recording the functions and BCTs included in each app. T-tests were performed to assess the difference in mean number of functions, number of BCTs, overall score, price and user rating according to inclusion of 'optimum BCT', price (free or paid) and user rating. For the latter, user rating, normally ranging from one to five, was divided into the following two groups; low=1.0-4.0 and high=4.1-5.0. The uneven division of user rating was due to average app rating for the majority of apps being greater than 4. Regression was performed to see if there was a relationship between number of functions, number of BCTs and overall score versus price (£) and user rating. Regression models for price adjusted for user rating and vice versa. Cohen's kappa was calculated to determine the inter-rater reliability from the duplicate extracted data. applications, 165 were excluded for the following reasons: 61 applications were not in the English language (e.g. applications in Chinese, Arabic, Spanish and French), 34 applications were irrelevant to asthma (e.g. an application on coronary heart disease), 22 applications were physician focused and 41 were paid applications. In all, 22 applications made it to the next stage of the review. After careful evaluation, another seven mHealth applications were excluded because they were not closely related to asthma.
Each Application was coded into one of the following categories: (1) Basic facts about the nature of the condition; (2) The nature of treatment: relievers and preventers; (3) Allergen and trigger avoidance; (4) How to use treatment; (5) Self-monitoring and assessment skills; (6) The role of a written, personalized action plan; (7) Recognizing and responding appropriately to acute exacerbations; or (8) Personalizing the definition of good asthma control. Each application was coded by the researcher and cross-checked with another researcher. There were no disagreements between the coders.6,7,14 Asthma applications were also coded for their level of adherence to the Health On the Net (HON Assessment was performed by two authors. Basic details were extracted into a standard form reviewer recorded their responses in a structured form. These were compared and any discrepancies were resolved by discussion. 35 Obviously irrelevant apps were eliminated by reviewing app store descriptions and screenshots to identify apps that were either unrelated to diabetes self-management or for which no calculator could be present, for example, diabetes eBooks.
Apps were assessed using a standardized method to examine each component of the calculation process. Expected inputs and outputs, supported unit systems, terminology, and any supplementary app features were characterized by inspection. Simulated data were used to define behavior in response to missing and extreme input values.
Where the formula used for calculation was not displayed in the app or in associated documentation, the developer was contacted. For those apps where a formula was ultimately identified, performance was assessed using a set of test cases generated by permutation of the range of possible values for each input parameter. Where present, clinical disclaimer text was extracted and coded to identify statements advocating discussion with a healthcare professional prior to calculator use and the role of personal judgment in interpreting generated results.
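Generating test cases by permuting the possible values of each input parameter, as described above, amounts to taking the Cartesian product of the per-parameter value lists. The parameter names and values below are hypothetical placeholders, not the inputs used in the reviewed study.

```python
from itertools import product

# Hypothetical calculator inputs with low, typical, and extreme values.
input_ranges = {
    "blood_glucose_mmol_l": [2.0, 5.5, 15.0],
    "carbohydrate_g": [0, 60, 150],
    "insulin_sensitivity": [1.0, 3.0],
}


def generate_test_cases(ranges):
    """Return one dict per combination of the listed parameter values."""
    names = list(ranges)
    value_lists = [ranges[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


cases = generate_test_cases(input_ranges)
```

With three, three, and two candidate values this yields 18 cases; each would be fed to the app and compared against the output of the identified formula.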
2 All apps were assessed by a clinician-researcher (KH) and a second reviewer (either JTP, a mobile health researcher, or SA, a public health researcher).
Fisher's exact test was used to calculate the two-tailed probability of an association between issue prevalence and platform or distribution model. A significance level of 0.05 was prespecified. Statistics were computed with R (Version 3.0.0) using the package exact2x2 (Version 1.4.0). attention was CBT or BA only those apps that offered this type of treatment were downloaded for full evaluation. When both a paid and a free version of an app were available, the version requiring payment was purchased and used, while the free version was excluded. This was done to ensure that the most comprehensive version of the app was considered.
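A two-tailed Fisher's exact test on a 2x2 table, as named above, can be sketched in plain Python. This is not a reproduction of the R exact2x2 output; it assumes the conventional two-sided definition, which sums the probabilities of all tables that are no more likely than the observed one, and the counts are invented.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: rows = platform, columns = issue present / absent.
p_value = fisher_exact_two_tailed(3, 1, 1, 3)
```

For this table the p-value is 34/70, about 0.49, well above the prespecified 0.05 level.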
The apps retrieved by our searches were categorized by two independent reviewers (AH, JC) according to the type(s) of support that they offered to the users. The categories, defined a priori, included: self-tracking tools, education, social support, CBT/BA treatment, state induction, diagnostic/screening tools, and miscellaneous. For usefulness: The expert evaluated each app against each core ingredient on a 0-2 scale where 0 meant that the core ingredient was not integrated at all into the app, and 2 meant that the core ingredient was completely integrated.
For usability: The usability expert rated each app on a scale of 1 to 5 (1 = poor, 5 = excellent) against each usability heuristic. Final analysis: Basic summary statistics including counts and percentages were used to describe the characteristics of the apps. Spearman's correlation coefficient was used to explore whether a relationship may exist between the adherence of the user interface to Nielsen's principles of usability and adherence to the core principles underlying CBT and BA. 37 We excluded the mobile apps and systems that could only be considered as educational or informational tools, meaning those that did not provide any direct functionality for the self-management of diabetes-related issues.
Categories, upon which we extracted relevant app information, were based on previous research and literature reviews and further elaborated upon via iterative brainstorming among the coauthors. The agreed upon categories are as follows: Diabetes-related features (scored on a scale of 0-2); Popularity and presence in social media; Availability; Interoperability and 'shareability', User friendliness; Quality assurance and regulatory oversight; and Research based.
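Spearman's correlation coefficient, used above to relate usability ratings to adherence to CBT and BA principles, is the Pearson correlation of the ranked values. A minimal sketch with average ranks for ties follows; the per-app scores are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-app mean usability (1-5) and core-ingredient (0-2) scores.
usability_scores = [2.1, 3.4, 3.9, 4.2, 4.8]
cbt_ba_scores = [0.6, 0.4, 1.5, 1.1, 1.9]
rho = spearman_rho(usability_scores, cbt_ba_scores)
```

For these five made-up apps rho is about 0.8, indicating that higher usability ranks tend to accompany higher adherence ranks.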
Not specified.
The coauthors have extensive research experience with mobile applications and have a multidisciplinary background ranging from healthcare and business, to health informatics, statistics, computer science and electrical engineering.
Descriptive 38 To determine eligibility for full review, two Master's level coders catalogued basic information for each app link retrieved. This approach to app screening has been used in other reviews. To be eligible for full review, apps had to be in English and include text related to cessation either in the Facebook App Center overview or within the app.
Eligible apps were installed using the native Facebook platform on a personal computer or the iPhone platform for mobile-only apps. Coders used apps over 3 days with at least 3 logins to ensure all features were utilized and coded. Apps were coded for publisher/developer type, cost, and content features (interactive, informational, and social). Operational definitions of content features were developed prior to coding and were noted as present or absent.
2 Two [public health] Master's level coders catalogued basic information for each app link retrieved.
Frequencies and descriptive statistics were used to characterize apps. For each adherence index item, the proportion of apps receiving a 2 (fully present) by at least one coder was calculated (average score≥1.5; as in).
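The per-item proportion described above can be sketched as follows, assuming each coder scores an item from 0 to 2 for every app. With two coders, an average of at least 1.5 means one coder gave a 2 and the other gave at least a 1. The scores shown are invented.

```python
def proportion_fully_present(item_scores, threshold=1.5):
    """Share of apps whose mean score across two coders meets the threshold.

    `item_scores` holds one (coder_1, coder_2) pair per app for a single
    adherence index item, each score being 0, 1, or 2.
    """
    if not item_scores:
        raise ValueError("at least one app is required")
    hits = sum(1 for first, second in item_scores if (first + second) / 2 >= threshold)
    return hits / len(item_scores)


# Hypothetical scores for one adherence index item across five apps.
scores = [(2, 2), (2, 1), (1, 1), (0, 2), (2, 0)]
proportion = proportion_fully_present(scores)
```

Only the first two apps meet the threshold here, so the proportion is 0.4.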
The relationship between publisher/developer source and app type on the adherence index summary score was examined. All analyses were performed in SPSS Version 21. each application. We recorded data on total number of reviews for all versions of the app using publicly available data on Apple iTunes and Google Play. The total number of downloads are reported only by Google Play in the form of an ordinal variable with nine different categories ranging from 50-100 to 1 million-5 million; this download information was available for a total of 50 apps included in the study. We recorded major functional characteristics of each app in the following non-mutually exclusive domains including hypertension education, tracking function, tools to promote medication adherence, whether the app can transform the smartphone into a medical device, and whether access to support groups and patient forums was facilitated. condition, one for each review, were selected for an in-depth analysis. For the in-depth analysis, two applications were chosen for each condition: one obtained from the literature research and the other from the commercial apps review. If the app studied was available in stores, it was downloaded and personally tested on an iPhone 4 in the case of an iOS app or a Samsung Galaxy S SCL GT-I9003 in case of an Android app.
To evaluate the papers, after reading them individually, the authors convened to discuss opinions and fill in a table of features. For apps available on the market, one of the authors downloaded them for a joint evaluation at the meeting. For the analysis of the commercial apps, the procedure followed was similar to the one developed with the research papers. The authors downloaded them on the mentioned mobile phone Samsung Galaxy S before meeting to study the apps together and complete the previously initiated Descriptive statistics were used to summarize the results of the app assessment. Pearson product-moment correlation was performed to examine relationships between comprehensiveness and quality of information, and the average user rating of an app. Mann Whitney U tests were performed to examine differences in comprehensiveness and quality of information by app price. Findings were compared thematically against evidence based guidelines. 53 The five free, and five paid, most downloaded apps from 'Lifestyle/Health' and 'Fitness' categories of the five app stores for the same countries (total 500 apps) were identified for more complete description, using App-Annie software.
Basic descriptions of the apps were provided using the specific content of focus (e.g., fitness, diet, etc.), and the total number of apps available in each market were stated. Apps were searched to determine whether any input had been provided by government agencies or whether formally endorsed guidelines were used. For each behavioral strategy, the percentage of apps that included it was computed, in order to demonstrate the extent to which each strategy was employed across weight-loss apps. Then, for each app, the percentage of the total number of strategies included was computed. Using ANOVA, paid apps were compared to free apps, on the measure of percentage of total strategies included. 55 We screened potential weight loss and smoking cessation apps to identify a final list of 120 Android and Apple apps (four groups of 30 apps each) based on their focus on the topic, price (all under $4), being in English language and download popularity as estimated by a relevant website (xyo.net).
Each app was examined by two assessors and was rated against a published "Mobile App Rating Scale" (MARS),19 (45% of the total score); in terms of weight loss/smoking cessation as appropriate (45% of the total score); and cultural appropriateness criteria (10% of the total score). We designed these other criteria (in addition to the MARS) based on relevant New Zealand literature. 56 We assessed all apps found by using the keywords 'alcohol addiction' (62 apps), 'alcohol help' (27 apps) and 'stop drinking' (21 apps). For the keyword 'alcohol', the first 40 apps were analyzed to provide a more comprehensive list of apps. This number was chosen on the basis of previous studies on Internet users showing that people rarely search beyond the first 20 retrieved results. Apps not related to problematic alcohol consumption (reducing alcohol consumption) as a main topic were excluded. Further exclusion criteria were as follows: the app could not be found at the moment of analysis, the app could not be downloaded after 3 attempts, the app was a book or an article, the app was not in English or the app consisted only of a contact address.
We evaluated the apps by using tools from previous studies and modified instruments from quality evaluation studies of websites, as well as instruments represented in various studies on the quality of smartphone apps.
3 The study involved 3 trained evaluators (L.P., M.V.S. and Y.K.). Raters completed an in-group practice run of a set number of materials, which were checked (with a standardized coding process for each component) and discussed before the evaluators participated separately in a formal assessment process.
After an initial exploratory analysis involving the calculation of proportions, as well as means and SDs of the above-mentioned measures, we compared paid apps with free apps in bivariate analyses using parametric tests (t test, chi-square or Fisher's exact test) or non-parametric tests (median test) when appropriate. Next, we computed prediction models by using multiple linear regressions for 2 outcome variables of interest: content quality and self-help. each search and their accompanying descriptions were recorded into a database. We chose 20 as most users typically view one page of search results.
App characteristics were listed based on (a) general app characteristics (such as number of downloads), (b) relevance to mental health (diagnoses or symptoms, etc.), (c) "apparent purposes", (d) approaches for symptom relief, (e) information supporting app use or diagnosis, and (f) descriptive terms used for depicting qualities of the app in app descriptions. 58 We searched the categories medical as well as health & fitness to maximize our capture of apps that may be relevant to dementia by focusing on topics such as cognitive health, which are often classified as "health & fitness" rather than medical apps. Second, we determined which apps collected user-generated content, with the intention of excluding purely informational apps. If any of the following words were included in the App Store description we assumed that the app collected some form of user data: analyze, assess, collaborate, communicate, data, email, game, graph, location (GPS), journal, keep, measure, monitor, notes, photos, play, post, predict, progress, question, questionnaire, rate, record, report, research, results, save, score, screen, send, share, statistics, store, survey, take, test, tips, tool, track, train, write.
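The keyword screen for user-generated content described above can be sketched as a whole-word, case-insensitive match against an app description. The word list is the one quoted in the passage, with "location (GPS)" split into two terms. The source does not say whether inflected forms such as "tracking" counted, so whole-word matching is an assumption, and the function names are hypothetical.

```python
import re

# Word list quoted in the passage; "location (GPS)" is treated as two terms.
USER_DATA_KEYWORDS = [
    "analyze", "assess", "collaborate", "communicate", "data", "email",
    "game", "graph", "location", "gps", "journal", "keep", "measure",
    "monitor", "notes", "photos", "play", "post", "predict", "progress",
    "question", "questionnaire", "rate", "record", "report", "research",
    "results", "save", "score", "screen", "send", "share", "statistics",
    "store", "survey", "take", "test", "tips", "tool", "track", "train",
    "write",
]

# Assumption: whole-word, case-insensitive matching.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in USER_DATA_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(description):
    """Return the sorted, de-duplicated keywords found in a description."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(description)})


def collects_user_data(description):
    """True if the description contains at least one listed keyword."""
    return _PATTERN.search(description) is not None


# Hypothetical App Store descriptions.
interactive = "Track your mood and share results with your doctor."
informational = "Read articles about memory loss."
```

A screen like this only flags candidates. Words such as "data" or "test" are broad, so a flagged description would still need a human check.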
Regarding general characteristics, we examined whether privacy policies covered the app in question (rather than the developer or Web site in general), explicitly mentioned safeguards for protecting user data, and distinguished between how individual-level and aggregate data would be handled. To assess for the existence of safeguards, we interpreted any mention of "encrypt, encryption, physical security measures" or "an established records retention and disposal system" as an indication that a mechanism for protecting user-generated content was in place.
Regarding the protection of individual-level data, we documented whether privacy policies disclosed the collection of Internet protocol (IP) addresses or unique device identifiers (UDIDs), or whether they were explicit in their ability to store cookies on user devices. We noted whether policies admitted to sharing information with business partners or third parties, as indicated by references to the following terms: "advisers, affiliates, any other party, business partners, contractors, partners, partner companies, service providers, third parties." Similarly, we documented whether policies mentioned the potential to share user data with marketers or advertisers, as evidenced by using any of these terms: "advertising, commercial purposes marketing, demographic profiling, industry analysis." We also made note of whether user data might be sold in a merger or acquisition, or otherwise, and recorded whether policies admitted to potentially sharing identifiable data if legally bound. Finally, we reviewed each policy to see if individual-level data could be deleted or amended upon user request. We noted whether each of these criteria was met, not met, or not mentioned in (i.e., was absent from) the privacy policy. Two of the authors (LR and JT) reviewed all policies using these criteria, and disagreements were resolved by consensus among all authors.
2 Not specified; authors from: Department of Psychiatry (LR), Massachusetts General Hospital, Boston, MA; and Department of Psychiatry and Clinical Informatics (JT).
Descriptive
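The term-based reading of privacy policies described above can be sketched as a simple phrase lookup. Only the safeguard and third-party term lists are included, using the terms quoted in the text. Case-insensitive substring matching is an assumption, and the function and variable names are hypothetical. In the study, the final coding (met, not met, not mentioned) was made by two human reviewers, so a lookup like this could at most pre-flag passages for them.

```python
# Terms quoted in the text. "encrypt" also matches "encryption" under
# substring matching, so it is listed once.
SAFEGUARD_TERMS = [
    "encrypt",
    "physical security measures",
    "records retention and disposal system",
]
THIRD_PARTY_TERMS = [
    "advisers", "affiliates", "any other party", "business partners",
    "contractors", "partners", "partner companies", "service providers",
    "third parties",
]

CRITERIA = {
    "safeguards": SAFEGUARD_TERMS,
    "third_party_sharing": THIRD_PARTY_TERMS,
}


def flag_policy(policy_text, criteria=CRITERIA):
    """Map each criterion to the listed terms that appear in the policy.

    Assumption: case-insensitive substring matching. An empty list means no
    listed term was found, not that the criterion was judged "not met".
    """
    lowered = policy_text.lower()
    return {
        name: [term for term in terms if term in lowered]
        for name, terms in criteria.items()
    }


# Hypothetical policy excerpt.
policy = "We use encryption to protect your data and may share it with third parties."
flags = flag_policy(policy)
```

For the hypothetical excerpt, the safeguards criterion is flagged by "encrypt" and the third-party criterion by "third parties".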