Fox Insight collects online, longitudinal patient-reported outcomes and genetic data on Parkinson’s disease

Fox Insight is an online, longitudinal health study of people with and without Parkinson’s disease, with a target enrollment of at least 125,000 individuals. Fox Insight data form a rich data set that facilitates discovery, validation, and reproducibility in Parkinson’s disease research. The dataset is generated through routine longitudinal assessments (health and medical questionnaires administered at regular intervals), one-time questionnaires about environmental exposures and healthcare preferences, and genetic data collection. Qualified researchers can explore, analyze, and download patient-reported outcomes (PROs) data and Parkinson’s disease-related genetic variants at https://foxden.michaeljfox.org. The full Fox Insight genetic data set, including approximately 600,000 single nucleotide polymorphisms (SNPs), can be requested separately, subject to institutional review, and is described outside of this data descriptor.

Parkinson's disease. Other mutations increase risk but have lower penetrance 12. Multiple genetic variants have been aggregated into a genetic risk score and combined with phenotypic characteristics to classify people with or without Parkinson's disease 13. In one online study, remotely assessed self-reported genotype and phenotype information suggested distinct clinical subtypes 14. Genetic variation and risk alleles are important to understanding many aspects of Parkinson's disease, which makes genetic data a valuable asset.
Fox Insight is an online study consisting of regularly administered questionnaires collected longitudinally over several years; the resulting data can be used to improve understanding of participants' lived experience and to complement PROs with Parkinson's disease genetic risks and modifiers 15. Study eligibility is open to participants with and without self-reported PD. For those who do not self-report a PD diagnosis, their connection to PD (e.g., relative, spouse, and/or caregiver) is captured to further characterize participant experience as well as environmental and/or genetic factors. Because the progression of PD can impair motor and executive functions, the online platform also allows data entry to be delegated to someone in the participant's circle of care, such as a partner/spouse or caregiver, and records when this occurs, helping to foster long-term participant engagement.
Fox Insight integrates validated PRO instruments and PD-related questionnaires through the online platform. The content and cadence of each questionnaire depend on the participant's self-reported diagnosis. Although self-reported diagnosis is only as reliable as the information participants provide, previous and ongoing studies have found high concordance between self-reported and clinician-determined diagnoses 14,16. Fox Insight also includes one-time questionnaires and genetic data collection. By design, Fox Insight can adapt its multi-modal data collection as Parkinson's disease research evolves. This flexibility comes from Fox Insight's infrastructure, a web application developed with an agile framework that emphasizes phased deployment; the application manages enrollment, e-consent, and the collection of routine longitudinal assessments 17.

Methods
Fox Insight is open to participants aged 18 or older who provide informed consent through the Fox Insight website; the informed consent and study protocol are reviewed by the New England IRB (IRB#: 120160179, Legacy IRB#: 14-236, Sponsor Protocol Number: 1, Study Title: Fox Insight). Volunteers are recruited through digital channels (e.g., social network ads, search engine marketing, and email newsletters) and on-the-ground recruitment efforts (e.g., research events, clinician referrals). Upon registration, participants are divided into two primary cohorts: those with Parkinson's disease and those without. Each cohort receives a different set of assessments based on self-reported Parkinson's disease diagnosis; importantly, participants without PD are asked about new diagnoses every three months. People with Parkinson's disease respond to health, non-motor, motor, quality-of-life, and lifestyle questionnaires (twenty questionnaires in each routine longitudinal assessment). In contrast, people without Parkinson's disease respond only to health and lifestyle questionnaires (a separate grouping of thirteen questionnaires in each routine longitudinal assessment). Participants who meet the pre-set eligibility criteria for optional one-time questionnaires are invited to participate in additional PRO collection. US-based people with Parkinson's disease who have completed at least twenty questionnaires in a routine longitudinal assessment are invited to participate in genetic research. Figure 1 represents the data flow in Fox Insight, combining patient-reported outcomes and genetic data into Fox Insight's data ecosystem. Demographic data and patient-reported outcomes from routine longitudinal assessments are merged with responses from one-time questionnaires and genetic data into a central database accessible to researchers.
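The cohort routing and genetics invitation rules described above can be sketched as follows. This is a minimal illustration only; the field names (`self_reported_pd`, `country`, `questionnaires_completed`) are hypothetical and do not reflect the Fox Insight platform's actual schema or code.

```python
from dataclasses import dataclass

# Questionnaire counts per routine longitudinal assessment, as stated in the text.
PD_QUESTIONNAIRES = 20
NON_PD_QUESTIONNAIRES = 13


@dataclass
class Participant:
    # Hypothetical fields for illustration; not the Fox Insight data model.
    self_reported_pd: bool
    country: str
    questionnaires_completed: int  # within a routine longitudinal assessment


def questionnaires_per_assessment(p: Participant) -> int:
    """Size of the routine battery, chosen from self-reported diagnosis."""
    return PD_QUESTIONNAIRES if p.self_reported_pd else NON_PD_QUESTIONNAIRES


def eligible_for_genetics(p: Participant) -> bool:
    """US-based participants with PD who completed at least twenty questionnaires."""
    return (
        p.self_reported_pd
        and p.country == "US"
        and p.questionnaires_completed >= PD_QUESTIONNAIRES
    )


example = Participant(self_reported_pd=True, country="US", questionnaires_completed=20)
print(questionnaires_per_assessment(example), eligible_for_genetics(example))
```

Keeping routing and eligibility as separate predicates mirrors the text, where the assessment battery and the genetics invitation are decided independently.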
The following sections describe the three data acquisition sources of Fox Insight illustrated in Fig. 1: routine longitudinal assessments, one-time questionnaires, and genetics. Routine longitudinal assessments form the main study activities and are collected through a custom survey application developed by Mondo Robot, a creative digital agency. One-time questionnaires are deployed through Qualtrics® survey software, which supports additional survey programming rules. Finally, genetic data are collected in collaboration with 23andMe, Inc., a personal genetics company.

Routine Longitudinal Assessments
Routine longitudinal assessments are hosted on an online survey platform and offered to participants based on self-reported Parkinson's disease diagnosis. The assessment schedule is derived from the participant's registration date. These assessments aim to comprehensively evaluate many aspects of Parkinson's disease, including motor impairment, non-motor symptoms, medication efficacy, functional impact, and quality of life. Validated instruments are used when possible, such as the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) Part II, the Non-Motor Symptoms Questionnaire (NMSQUEST), and the Geriatric Depression Scale (GDS), among others (Online-only Table 1).
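Deriving the schedule from the registration date means each assessment period can be computed from a fixed anchor rather than from the previous submission. The sketch below assumes a single hypothetical 90-day interval; the actual cadences vary by questionnaire and are listed in Online-only Table 1.

```python
from datetime import date, timedelta

# Hypothetical cadence for illustration; real intervals are in Online-only Table 1.
INTERVAL_DAYS = 90


def assessment_start(registration: date, period: int, interval_days: int = INTERVAL_DAYS) -> date:
    """Start date of assessment period `period` (0 = baseline), anchored at registration."""
    if period < 0:
        raise ValueError("period must be non-negative")
    return registration + timedelta(days=period * interval_days)


def current_period(registration: date, today: date, interval_days: int = INTERVAL_DAYS) -> int:
    """Index of the assessment period that contains `today`."""
    elapsed = (today - registration).days
    if elapsed < 0:
        raise ValueError("today precedes registration")
    return elapsed // interval_days


registered = date(2018, 1, 15)
print(assessment_start(registered, 1))
print(current_period(registered, date(2018, 7, 1)))
```

Anchoring every period to the registration date keeps the schedule stable even when a participant changes diagnosis, consistent with item 5 of the survey logic.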
Data collection from routine longitudinal assessments is governed by the following survey logic:
1. Participants who answer "Yes" to the registration question "Do you currently have a diagnosis of Parkinson's disease, or Parkinsonism, by a physician or other health care professional?" are presented with Parkinson's disease assessments. Those who answer "No" are classified as people without Parkinson's disease and receive a different set of questionnaires.
2. Questionnaires are presented sequentially; a participant cannot begin a second questionnaire without completing the first.
3. Participants cannot explicitly skip questions within an opened questionnaire; instead, they can respond "Prefer Not to Answer" to move on to the next question. As a result, the only empty values in routine longitudinal assessment data arise from branching logic, incomplete surveys, or questions that were not distributed.
4. Sets of questionnaires are repeated at regular intervals (Online-only Table 1), at which time participants are invited via email to complete the assessments in Fox Insight.
5. Participants can update their Parkinson's disease diagnosis, living situation, and hospital experience every three months in Fox Insight. If a participant indicates a change in diagnosis, the participant is redirected to an alternate set of questionnaires consistent with the new diagnosis to best capture current health, including a full baseline battery for those newly diagnosed with Parkinson's disease. Subsequent routine longitudinal assessments continue to be based on the updated diagnosis and the initial study registration date.
6. Assessments can be modified, added, or removed. A participant sees changes to available questionnaires at the start of the next complete assessment interval.
7. Responses to one survey question can determine whether a related question is presented and collected. Condition-based questions that are not presented to participants have empty values in the data set. For instance, if a participant answers "Have you ever had a form of heart disease?" in the affirmative, the next question asks "What kind of heart disease did you have?" and the participant selects from a drop-down list of heart disease options. An initial answer of "No, I have not had a form of heart disease" skips the follow-up question, and its response values are empty in the output dataset.
8. Participants can review a summary of their responses to an individual questionnaire and change any response before finalizing submission. In addition, questionnaire responses can be reviewed or revised at any point before the participant receives the next set of assessments.
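Because questions cannot be skipped, "Prefer Not to Answer" is a recorded response rather than missing data, while empty cells come from branching (as in the heart disease example), incomplete surveys, or undistributed questions. A minimal sketch of that distinction follows; the affirmative label "Yes" is an assumption, since the text says only "in the affirmative", and the column names are hypothetical.

```python
from typing import Optional

AFFIRMATIVE = "Yes"  # hypothetical label; the source only says "in the affirmative"
PREFER_NOT = "Prefer Not to Answer"


def heart_disease_row(ever_had: str, kind: Optional[str] = None) -> dict:
    """Build one output row for the two-question heart disease branch.

    The follow-up is presented only after an affirmative answer; otherwise its
    column stays empty (None), as condition-based questions do in the data set.
    """
    presented = ever_had == AFFIRMATIVE
    return {
        "ever_heart_disease": ever_had,
        "heart_disease_kind": kind if presented else None,
    }


def cell_status(value: Optional[str]) -> str:
    """Distinguish a recorded refusal from a genuinely empty cell."""
    if value is None:
        return "empty"  # branching, incomplete survey, or undistributed question
    if value == PREFER_NOT:
        return "declined"  # an explicit response, not missing data
    return "answered"


no_row = heart_disease_row("No, I have not had a form of heart disease")
print(cell_status(no_row["heart_disease_kind"]))
```

When analyzing the data, treating "Prefer Not to Answer" separately from empty values avoids conflating refusals with questions that were never shown.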

One-time Questionnaires
One-time questionnaires (Table 1) are deployed through Fox Insight to enrich the PRO data collected through routine longitudinal assessments with additional validated instruments. These questionnaires can collect cross-sectional data from novel or unique instruments not included in routine longitudinal assessments. For instance, one-time questionnaires can be a useful first step for in-person trials as a means of obtaining patient perspectives during research development, evaluating interest in specific interventions, or targeting recruitment for clinical trials. The ability to deploy one-time questionnaires is a key advantage of the Fox Insight platform. The frequency and content of questionnaires are vetted by study leadership to ensure alignment with scientific goals. Table 1 summarizes the scope and eligibility criteria of one-time questionnaires offered in Fox Insight (which may change as the study evolves).
www.nature.com/scientificdata

Fox Insight Genetic Data
Genotyping through 23andMe will be available for up to 17,000 participants with Parkinson's disease in the US who have completed a series of routine longitudinal assessments (approximately 5,000 participants had been genotyped at the time of this Data Descriptor). Requiring phenotypic data collection upfront provides valuable context for interpreting and analyzing genotype data; moreover, researchers can explore correlations between genetic variation and phenotypic manifestations. Eligible participants provide a sample using 23andMe's saliva collection kit. Samples have been genotyped on several genotyping platforms. Within Fox Insight, 6.9% of participants are genotyped on the V3 platform, which is based on the Illumina OmniExpress+ BeadChip and contains about 950,000 SNPs; 12.7% are genotyped on the V4 platform, a fully custom array of about 570,000 SNPs; and 80.4% are genotyped on the V5 platform, which is in current use and is a customized Illumina Infinium Global Screening Array of about 690,000 SNPs. As part of the resulting dataset, several genetic variants that may be relevant to Parkinson's disease research and have a non-identifiable prevalence within the Fox Insight cohort (including variants located near GBA, LRRK2, APOE, PRKN, MCCC1, BIN3, and the HLA locus) are available in tabular form alongside phenotypic data in Fox Insight's public repository. These variants are provided as categorical data to democratize access to, and interpretation of, otherwise complex SNP output (the full set of SNPs is available upon request to qualified researchers).
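One common way to turn raw SNP calls into categorical data is to count copies of a chosen allele per participant. The sketch below illustrates that idea only; it is not the procedure used to build the Fox Insight variant tables, and the risk allele, the two-letter genotype strings, and the "--" no-call marker are assumptions for illustration.

```python
from collections import Counter

CATEGORIES = ("non-carrier", "heterozygous", "homozygous")


def carrier_category(genotype: str, risk_allele: str) -> str:
    """Collapse a two-letter genotype call (e.g. "AG") to a categorical label.

    Assumes "--" (or any malformed string) denotes a missing call.
    """
    call = genotype.strip().upper()
    if len(call) != 2 or "-" in call:
        return "no call"
    return CATEGORIES[call.count(risk_allele.upper())]


# Hypothetical calls for one variant, keyed by hypothetical participant IDs.
calls = {"p1": "GG", "p2": "AG", "p3": "AA", "p4": "--"}
summary = Counter(carrier_category(g, "A") for g in calls.values())
print(summary)
```

A categorical label such as "heterozygous" is easier to filter and cross-tabulate in a cohort browser than a raw allele pair.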

Data Centralization
Participant answers to routine longitudinal assessments and one-time questionnaires, together with genetic data from key variants, are integrated into a public repository managed by the USC Laboratory of Neuro Imaging, Mark and Mary Stevens Neuroimaging and Informatics Institute. To protect participant confidentiality, dates associated with participant answers are converted to participant ages using the dates of birth provided at registration. Because questions within a single routine longitudinal assessment may be edited and answered intermittently, the total number of days taken to complete each survey is also recorded for each participant. Dates of birth and unrestricted free-form text answers are quarantined from the general public data set; when appropriate, "derived" variables are defined for those questions to filter out arbitrary (and possibly identifying) responses (e.g., by rejecting values that are not decimal numbers). Derived variables are also added where participants may answer a question in different ways (e.g., entering weight in pounds or kilograms) to help standardize these responses.
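The de-identification and standardization steps above can be illustrated with a short sketch: an event date is replaced by an age, and a free-text weight is kept only if it parses as a plain decimal number, then converted to kilograms. The function names, the unit codes `"lb"`/`"kg"`, the 365.25-day year, and the rounding to one decimal place are assumptions, not the repository's documented rules.

```python
import re
from datetime import date
from typing import Optional

LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound
_DECIMAL = re.compile(r"\d+(\.\d+)?")  # plain non-negative decimal number


def age_at(dob: date, event: date) -> float:
    """Replace an event date with age in years (approximate 365.25-day year)."""
    return (event - dob).days / 365.25


def derived_weight_kg(raw: str, unit: str) -> Optional[float]:
    """Hypothetical derived variable: keep decimal input only, standardize to kg.

    Any other free text (possibly identifying) is discarded and returns None.
    """
    text = raw.strip()
    if not _DECIMAL.fullmatch(text):
        return None
    value = float(text)
    if unit == "lb":
        return round(value * LB_TO_KG, 1)
    if unit == "kg":
        return value
    return None  # unrecognized unit


print(derived_weight_kg("150", "lb"), derived_weight_kg("call me", "kg"))
```

Publishing only the derived column, while the raw text stays quarantined, lets researchers use the measurement without exposing arbitrary free-text input.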

Technical Validation
Technical validation for Fox Insight is divided into tool validation and data validation. Data validation reviews caveats associated with collecting patient-reported outcomes and, for genetic data, compares sex inferred from sex chromosomes with self-reported sex. Table 2 provides an excerpt of the full data dictionary, showing truncated variable names, their corresponding questionnaires, and code names.

Deployment of Routine Longitudinal Assessments
To verify the appropriate deployment of routine longitudinal assessments, Mondo Robot routinely conducts development tests. Using RSpec, a Ruby testing framework commonly used with Ruby on Rails®, unit tests are run on isolated pieces of code functionality. These unit tests include, but are not limited to, database queries for cadence expiration and questionnaire assignment based on registration date. All unit tests run automatically when code is promoted to the development, staging, and production environments.
While platform tests verify that questionnaires are deployed at the set intervals, subsequent checks of the collected data spot-check nuances in data collection from those tools. For example, data from the Physical Activity Scale for the Elderly (PASE) are expected to be collected regularly. As of 01-24-2019, 21,484 participants had completed the questionnaire in the first round of routine longitudinal assessments, and 285 (about 1.3%) had skipped it entirely in that first round. Fox Insight redeploys the PASE questionnaire in subsequent assessment periods to participants who skipped it, until a complete questionnaire is submitted; 127 of the 285 participants (about 45%) who skipped PASE in the initial battery of assessments went on to complete it in the next assessment period. Redeploying incomplete assessments helps establish a more robust PRO data set.
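The PASE skip and recovery rates can be recomputed directly from the reported counts. This quick check assumes the skip rate is expressed relative to the 21,484 participants who completed PASE in the first round; using completers plus skippers as the denominator would give a slightly lower rate.

```python
completed_first = 21_484  # completed PASE in the first routine assessment
skipped_first = 285       # skipped PASE entirely in the first assessment
completed_later = 127     # of those skippers, completed PASE in the next period

skip_rate = skipped_first / completed_first
recovery_rate = completed_later / skipped_first
print(f"skip rate: {skip_rate:.2%}, recovered next period: {recovery_rate:.1%}")
```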

Collected Data
The data collection methods described above yield a large sample of PROs from routine longitudinal assessments and one-time questionnaires, together with genetic data. Potential duplicate records are removed in upstream data management stages. Table 3 highlights the scale of collected data in Fox Insight and key cohort characteristics. As of Q1 2019, over 22,000 people with Parkinson's disease were enrolled, making Fox Insight the largest prospectively followed Parkinson's disease cohort worldwide and exceeding the second-largest such cohort, the Parkinson's disease Outcome Project, which follows about 12,000 people with Parkinson's disease. Of the 30,436 total individuals enrolled in Fox Insight, 72.9% (n = 22,205) are people with Parkinson's disease. The average age of the Parkinson's disease cohort is 66, and these participants have, on average, been diagnosed for over 6 years. At the time of this Data Descriptor, the Fox Insight dataset contains more cross-sectional than longitudinal data: 90.5% (n = 20,099) of people with Parkinson's disease have answered at least one questionnaire, and 47.7% (n = 10,600) have continued participating in routine longitudinal assessments. People without Parkinson's disease show a similar trend in assessment completion. Optional one-time questionnaires are completed by a comparatively smaller proportion of the study population: 34.8% (n = 7,726) of people with Parkinson's disease and 14.2% (n = 1,174) of people without Parkinson's disease have participated in one-time surveys. As of 03-06-2019, 5,880 participants had agreed to genetic data collection and 5,092 had been genotyped.
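The percentages reported for the Parkinson's disease cohort can be reproduced from the counts, using the 22,205 enrolled people with Parkinson's disease as the denominator, as the text does. The subgroup labels are descriptive names chosen for this check.

```python
pd_enrolled = 22_205  # people with PD enrolled as of Q1 2019

subgroups = {
    "answered >= 1 questionnaire": 20_099,
    "continuing routine longitudinal assessments": 10_600,
    "completed >= 1 one-time questionnaire": 7_726,
}

# Share of the PD cohort in each subgroup, as a percentage to one decimal place.
shares = {name: round(100 * n / pd_enrolled, 1) for name, n in subgroups.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```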

Beta Participants
Approximately 16% of total participants (N = 4,697) are part of Fox Insight's beta group, defined as those joining before the March 2017 soft launch of Fox Insight. Responses to routine longitudinal assessments for all beta group participants are included in the Fox Insight data set. Data from the beta group could be subject to questionnaire versioning and inconsistencies associated with platform troubleshooting and optimization.

Missing Data
Regarding missing data, 2,868 participants (as of 01-24-2019) did not complete the demographic questions in the About You questionnaire; a subset of ~500 of these individuals skipped the questionnaire because of a platform glitch that was resolved in Q3 2017. Participant drop-off also contributes to missing demographic data.
As of 01-24-2019, 1,476 participants have two consecutive assessment periods starting on the same day (i.e., their questionnaire responses are associated with the same "Days since Acquired" value). This questionnaire assignment error has since been fixed. For these participants, the output includes data only from the later routine longitudinal assessment; data from the earlier assessment are omitted.
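The rule applied to the affected participants (retain only the later of two assessments that share a "Days since Acquired" value) amounts to a keep-latest deduplication. A minimal sketch, assuming hypothetical row keys `participant_id`, `days_since_acquired`, and `assessment_seq` (higher means later); these are not the data dictionary's variable names.

```python
def keep_latest_per_day(rows):
    """For rows sharing (participant, days_since_acquired), keep the later assessment."""
    latest = {}
    for row in rows:
        key = (row["participant_id"], row["days_since_acquired"])
        if key not in latest or row["assessment_seq"] > latest[key]["assessment_seq"]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"participant_id": "p1", "days_since_acquired": 0, "assessment_seq": 1},
    {"participant_id": "p1", "days_since_acquired": 0, "assessment_seq": 2},
    {"participant_id": "p2", "days_since_acquired": 0, "assessment_seq": 1},
]
print(keep_latest_per_day(rows))
```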
Because questionnaires in a routine longitudinal assessment are completed sequentially, a drop-off of approximately 10.1% is observed from the first to the last questionnaire within the same assessment period.

Validating Fox Insight Genetic Data
Sex inferred from the sex chromosomes matches self-reported sex for 99.76% of the genetic sub-study participants. As additional validation documentation, tables of genotyping call rates are provided. The genotyping rates are ancestry- and platform-specific and are derived from the 23andMe participant database (e.g., the table of genotyping rates for participants of European ancestry genotyped on the V5 platform was computed on 23andMe participants of European ancestry genotyped on the V5 platform).
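A concordance figure like the 99.76% above is the fraction of compared participants whose chromosomally inferred sex agrees with self-report. The sketch below shows that calculation under stated assumptions: the "Female"/"Male" labels, the "XX"/"XY" karyotype strings, and skipping ambiguous calls are all hypothetical choices, since the text reports only the result.

```python
def sex_concordance(records):
    """Fraction of participants whose inferred sex matches self-reported sex.

    `records` holds (self_reported, karyotype) pairs, e.g. ("Female", "XX").
    Karyotypes other than XX/XY are skipped rather than counted as mismatches.
    """
    expected = {"XX": "Female", "XY": "Male"}
    compared = matched = 0
    for self_reported, karyotype in records:
        if karyotype not in expected:
            continue  # e.g. no call or sex-chromosome aneuploidy
        compared += 1
        matched += expected[karyotype] == self_reported
    return matched / compared if compared else float("nan")


print(sex_concordance([("Female", "XX"), ("Male", "XY"), ("Male", "XX")]))
```

How ambiguous karyotypes are handled changes the denominator, so any reproduction of the published figure should state that choice explicitly.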

Usage Notes
Fox DEN User Interface. Using the Fox DEN interface, investigators can explore and select data and apply statistical methods to user-created cohorts based on subject demographics, PROs, and SNPs. Routine longitudinal assessments, one-time questionnaires, and genetic data are organized in a tree structure, which can be filtered using drop-down categories (e.g., questionnaires, genetic data) or keyword searches. When a variable is selected in the tree, the distribution of participants' questionnaire responses or SNP variants is visualized. Categorical variables can be reduced to user-defined binary variables, which are useful inputs to the statistical methods. Visualizations depend on the user-selected cohort, so they reflect the corresponding subset of participants. Cohorts are created by recursively selecting values of a variable and using them as a filter to subset a parent cohort. Cohorts are displayed in a tree structure that shows how cohorts are inherited from one another and the filters that define them. Fox DEN supports common statistical methods (linear correlation, logistic regression, chi-square test, and t-test) through drag-and-drop operations on its cohorts and variables. A "Guided Statistics" wizard provides step-by-step guidance in choosing appropriate statistical methods for user selections.
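The cohort mechanism described above (each child cohort is its parent filtered on selected values of one variable, and the cohorts form a tree) can be modeled with a small recursive structure. This is a conceptual sketch only; it does not reflect Fox DEN's implementation, and the variable names `pd` and `sex` and their values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Cohort:
    """A node in a cohort tree: members are the parent's members that pass `keep`."""
    name: str
    keep: Callable[[Record], bool]
    parent: Optional["Cohort"] = None
    children: List["Cohort"] = field(default_factory=list)

    def subset(self, name: str, variable: str, values: set) -> "Cohort":
        """Create a child cohort by filtering on selected values of one variable."""
        child = Cohort(name, lambda r: r.get(variable) in values, parent=self)
        self.children.append(child)
        return child

    def members(self, records: List[Record]) -> List[Record]:
        """Apply every ancestor's filter recursively, then this node's filter."""
        pool = self.parent.members(records) if self.parent else records
        return [r for r in pool if self.keep(r)]


def binarize(record: Record, variable: str, positive: set) -> int:
    """User-defined reduction of a categorical variable to 0/1."""
    return int(record.get(variable) in positive)


data = [
    {"pd": "Yes", "sex": "Female"},
    {"pd": "Yes", "sex": "Male"},
    {"pd": "No", "sex": "Female"},
]
root = Cohort("All participants", lambda r: True)
pd_cohort = root.subset("PD", "pd", {"Yes"})
pd_female = pd_cohort.subset("PD, female", "sex", {"Female"})
print(len(pd_female.members(data)))
```

Storing only the filter at each node, rather than a copy of the members, keeps the inheritance chain explicit, which is what the cohort tree view displays.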