Human loss-of-function variants suggest that partial LRRK2 inhibition is a safe therapeutic strategy for Parkinson’s disease

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation, and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes1,2. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease3,4, suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns5–8, the biological consequences of LRRK2 inhibition have not been well-characterized in humans. Here we systematically analyse pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)9, 49,960 exome sequenced individuals from the UK Biobank, and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2, 82.5% with experimental validation. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but are not strongly associated with reduced life expectancy, or with any specific phenotype or disease state. These data suggest that therapeutics that partially downregulate LRRK2 levels or kinase activity are unlikely to have major on-target safety liabilities. Our results demonstrate the value of large-scale genomic databases and phenotyping of human LoF carriers for target validation in drug discovery.


Main text
Novel therapeutic strategies are desperately needed in Parkinson's disease (PD), one of the most common age-related neurological diseases, which affects about 1% of people over the age of 60 10,11 . Around 30% of familial and 3-5% of sporadic PD cases have been linked to a genetic cause 12 . LRRK2 missense variants account for a significant fraction of cases, including highpenetrance variants 13 , moderately penetrant variants such as G2019S 14 , and risk factors identified in genome-wide association studies 15 . Although the precise mechanism by which LRRK2 variants mediate their pathogenicity remains unclear, a common feature is augmentation of kinase activity associated with disease-relevant alterations in cell models 3,16,17 . Discovery of Rab GTPases as substrates of LRRK2 18 has also highlighted the role of LRRK2 in regulation of the endolysosomal and vesicular trafficking pathways implicated in Parkinson's disease 18,19 . LRRK2 kinase activity is also upregulated more generally in PD patients (with and without LRRK2 variants) 20 . LRRK2 has therefore become a prominent drug target, with multiple LRRK2 kinase inhibitors and suppressors 21 in development as disease-modifying treatments for PD 20,22,23 .
Despite these promising indications, there are concerns about the potential toxicity of LRRK2 inhibitors. These mainly arise from preclinical studies, where homozygous knockouts of LRRK2 in mice, and high dose toxicology studies of LRRK2 kinase inhibitors in rats and primates, have shown abnormal phenotypes in lung, kidney and liver [5][6][7][8] . While model organisms have been invaluable in understanding the function of LRRK2, they also have important limitations as exemplified by the inconsistencies in the phenotypic consequences of reduced LRRK2 activity seen among yeast, fruit flies, worms, mice, rats and non-human primates 24 . Complementary data from natural human knockouts are critical for understanding both gene function and the potential consequences of a long-term reduction in LRRK2 kinase activity in humans.
Large-scale human genetics is an increasingly powerful source of data for the discovery and validation of therapeutic targets in humans 1 . Predicted loss-of-function (pLoF) variants, predicted to largely or entirely abolish the function of affected alleles, are a particularly informative class of genetic variation. Such variants are natural models for life-long organismwide inhibition of the target gene, and can provide information about both the efficacy and safety of a candidate target 2,25-28 . However, pLoF variants tend to be rare in human populations 29 , and are also enriched for both sequencing and annotation artefacts 30 . As such, leveraging pLoF variation in drug target assessment typically requires very large collections of genetically and phenotypically characterized individuals, combined with deep curation of the target gene and candidate variants 31 . Although previous studies of pLoF variants in LRRK2 have found no association with PD risk 32 , no study has assessed the broader phenotypic consequences of such variants.
We identified LRRK2 pLoF variants, and assessed the associated phenotypic changes, in three large cohorts of genetically characterized individuals. Firstly, we annotated LRRK2 pLoF variants in two large sequencing cohorts: the gnomAD v2.1.1 dataset, which contains 125,748 exomes and 15,708 genomes from unrelated individuals 9 , and the 46,062 exome-sequenced unrelated European individuals from the UK Biobank 33 . We identified 633 individuals in gnomAD and 258 individuals in the UK Biobank with 150 unique candidate LRRK2 LoF variants, a combined carrier frequency of 0.48%. All variants were observed only in the heterozygous state. Compared to the spectrum observed across all genes, LRRK2 is not significantly depleted for LoF variants in gnomAD (loss-of-function observed/expected upper bound fraction (LOEUF) 9 =0.64).
To ensure that the 150 identified variants are indeed expected to cause LoF, we manually curated them to remove those of low-quality or with annotation errors (Figure 1a; Supplementary Tables 1 and 2). We removed 16 variants identified either as low-confidence by the Loss-of-function Transcript Effect Estimator (LOFTEE; 6 variants in 409 individuals) 9 , or manually curated as low-quality or unlikely to cause true LoF (10 variants in 129 individuals; see methods). One additional individual was excluded from the UK Biobank cohort as they carried both an pLoF variant and the G2019S risk allele.
This analysis results in a final dataset of 255 gnomAD individuals and 97 UK biobank individuals with 134 unique high-confidence pLoF variants (Figure 1a), an overall carrier frequency of 0.19%; less than half the frequency estimated from uncurated variants, reaffirming the importance of thorough curation of candidate LoF variants 31 . A subset of 25 gnomAD samples with 19 unique LRRK2 pLoF variants with DNA available were all successfully validated by Sanger sequencing (Supplementary Table 3).
Secondly, we examined LRRK2 pLoF variants in over four million consented research participants from the personal genetics company 23andMe, Inc. Participants have been genotyped on a variety of platforms, and imputed against a reference panel comprising 56.5 million variants from 1000 Genomes phase 3 34 , and UK10K 35 . Eight putative LRRK2 LoF variants were identified in the 23andMe cohort (LOFTEE high-confidence). After manual curation all putative carriers of each variant were submitted for validation by Sanger sequencing. Variants with <5 confirmed carriers were excluded from the analysis. The resulting cohort consisted of 749 individuals, each of whom is a Sanger-confirmed carrier for one of three pLoF variants (Figure 1a; Supplementary Table 4). The high rate of Sanger confirmation for rs183902574 (>98%) allowed confident addition of 354 putative carriers of rs183902574, from expansion of the 23andMe dataset, without Sanger confirmation. Analyses with and without these genotyped only carriers were not significantly different (Supplementary Table 6). Across the two most frequent pLoF alleles we observed an extremely small number (<5) of sequenceconfirmed homozygotes; however, given the very small number of observations, we can make no robust inference except that homozygous inactivation of LRRK2 appears to be compatible with life. For the remainder of this manuscript we focus on heterozygous pLoF carriers. . pLoF variants seen more than 10 times appear in colour with remaining variants in grey. LRRK2 pLoF variants are mostly individually extremely rare (less than 1/10,000 carrier frequency), with the exception of two nonsense variants almost exclusively restricted to the admixed American/Latino population (Cys1313Ter and Arg1725Ter), and two largely Non-Finnish European-specific variants (Leu2063Ter and Arg772Ter). All variant protein descriptions are with respect to ENSP00000298910.7. c) Schematic of the LRRK2 gene with pLoF variants marked by position, with the height of the marker corresponding to allele count in gnomAD (grey bars) and UK Biobank (blue bars). The 51 exons are shown as rectangles coloured by protein domain, with the remaining exons in grey. The three variants genotyped in the 23andMe cohort are annotated with their sample count in black text.
The three combined datasets provide a total of 1,455 carrier individuals with 134 unique LRRK2 pLoF variants. These variants are found across all major continental populations (Figure 1b To confirm that LRRK2 pLoF variants lead to a reduction in LRRK2 protein levels, we analysed total protein lysates from cell lines with three unique pLoF variants by immunoblotting. We obtained lymphoblastoid cell lines (LCLs) from two families with naturally occuring heterozygous LoF variants, and a third variant was CRISPR/Cas9 engineered into embryonic stem cells (Supplementary Figure 2), which were differentiated into cardiomyocytes. In all instances, LRRK2 protein levels were visibly reduced compared to non-carrier controls ( Figure  2). These results agree with a previous study which assessed three separate pLoF variants, and found significant reductions in LRRK2 protein levels 32 . Together, these six functionally validated variants represent 82.5% of total pLoF carriers in this study (1,201/1,455). Although heterozygous pLoF carriers have LRRK2 protein remaining, we believe that this state represents the closest genetic model to the partial kinase inactivation expected to be generated by therapeutic LRRK2 inhibitors. We next sought to determine whether lifelong lowering of LRRK2 protein levels through LoF results in an apparent reduction in lifespan. We found no significant difference between the age distribution of LRRK2 pLoF variant carriers and non-carriers in either the gnomAD or 23andMe datasets (two-sided Kolmogorov-Smirnov P=0.085 and 0.46 respectively; Figure 3a), suggesting no major impact on longevity, though we note that at current sample sizes we are only powered to detect a strong effect (see Methods). For a subset of studies within gnomAD, phenotype data is available from study or national biobank questionnaires or from linked electronic health records (see online methods). We manually reviewed these records for all 60 of the 255 gnomAD LRRK2 pLoF carriers for whom data were available, and recorded any phenotypes affecting the lung, liver, kidney, cardiovascular system, nervous system, immunity, and cancer (Supplementary Table 5). We did not find an over-representation of any particular phenotype or phenotype category in LRRK2 pLoF carriers.
The 23andMe dataset includes self-reported data for thousands of phenotypes across a diverse range of categories. We performed a phenome-wide association study on LRRK2 pLoF carriers compared to non-carriers for 368 health-related traits (see methods) and found no significant association between any individual phenotype and carrier status (Figure 3b). In particular, we found no significant associations with any lung, liver or kidney phenotypes (Supplementary  Tables 6 and 7).
The UK Biobank resource includes measurements for 30 blood serum and four urine biomarkers. We found no difference in any of these biomarkers between pLoF carriers and controls (Supplementary Table 8; Supplementary Figure 3). In particular, there was no difference between carriers and non-carriers for urine biomarkers transformed into clinical measures of kidney function (Figure 4a; see online methods) and no difference in six blood biomarkers commonly used to assess liver function ( Figure 4c). We also observed no difference in spirometry measurements of lung function ( Figure 4b).
We grouped self-reported disease diagnoses in UK Biobank individuals into categories corresponding to organ system and/or mechanism (Supplementary Table 9; see online methods). We observed no enrichment for any of these phenotype groups in LRRK2 pLoF carriers when compared to non-carriers (Supplementary Table 10). We also mined ICD10 codes from hospital admissions and death records for any episodes relating to lung, liver and kidney phenotypes, removing any with a likely infectious or other external causes (Supplementary  Table 11; see online methods), and identified six pLoF carriers with ICD10 codes relating to these organ systems (6.19%), compared to 4,536 non-carriers (9.87%; Supplementary Tables 12  and 13). Our results indicate that approximately one in every 500 humans is heterozygous for a pLoF variant in LRRK2, resulting in a systemic lifelong decrease in LRRK2 protein levels, and that this partial inhibition has no discernible effect on survival or health. These results suggest that drugs that reduce LRRK2 kinase activity are unlikely to show toxicity due to on-target effects. This observation is particularly important given that individuals at risk of PD will need to take therapeutics chronically, likely soon after clinical diagnosis and extending for decades. While initial phase I studies of the safety of LRRK2 kinase inhibitors have shown a promising safety profile 23,37 , these are not yet able to address the question of long-term, on-target pharmacology-related safety profiles.
The rarity of pLoF variants in LRRK2, combined with the relatively low prevalence of PD, prevents us from directly assessing whether LRRK2 inhibition may reduce the incidence of PD with current sample sizes (Supplementary Table 6). Future cohorts with much larger numbers of sequenced and phenotyped individuals (likely in the millions of samples) will be required to answer this question. As such, our study focuses entirely on the question of whether partial genetic LRRK2 inactivation has broader phenotypic consequences that might correspond to adverse effects of chronic inhibition of LRRK2 inhibitors.
We acknowledge multiple limitations to the work described here. Firstly, we relied on heterogeneous phenotype data mostly derived from self-reported questionnaires. Both 23andMe and gnomAD record only age at recruitment, and participants are of a relatively young age compared to the typical age of onset for PD. Our ascertainment of LRRK2 pLoF variants in 23andMe was necessarily incomplete, due to the availability of targeted genotyping rather than sequencing data in this cohort; this means that a subset of the 23andMe individuals treated as non-carriers could be carriers of LRRK2 pLoF variants not genotyped or imputed in this dataset. Finally, the low-frequency of naturally occuring pLoF variants in LRRK2 results in a relatively small number of carriers which could be assessed for each biomarker and phenotype, meaning that we are not well-powered to detect subtle or infrequent clinical phenotypes arising from LRRK2 haploinsufficiency. However, our study strongly suggests that any clinical phenotype arising from partial inactivation of LRRK2 is likely to be substantially more benign than early-onset PD.
This study provides an important proof of principle for the value of very large genetically and phenotypically characterized cohorts, combined with thorough variant curation, in exploring the safety profile of candidate drug targets. Over the coming years, the availability of complete exome or genome sequence data for hundreds of thousands of individuals who are deeply phenotyped and/or available for genotype-based recontact studies, combined with deep curation and experimental validation of candidate pLoF variants, will provide powerful resources for therapeutic target validation as well as broader studies of the biology of human genes.

Data and code availability
The data presented in this manuscript and the code used to make the figures are available in https://github.com/macarthur-lab/gnomad_lrrk2.

Conflicts of interest
A.K., B.A., A.G., P.M., P.C., and members of the 23andMe Research Team are current or former employees of 23andMe, Inc., and hold stock or stock options in 23andMe. DGM is a founder with equity in Goldfinch Bio, and has received research support from AbbVie, Astellas, Biogen, BioMarin, Eisai, Merck, Pfizer, and Sanofi-Genzyme. EVM has received research support in the form of charitable contributions from Charles River Laboratories and Ionis Pharmaceuticals, and has consulted for Deerfield Management. KJK owns stock in Personalis. MJD is a founder of Maze Therapeutics.

Supplementary Tables and Figures
Supplementary  Table 6: Association statistics for phenotypes specific to lung, kidney and liver (also plotted in Figure 3c). For privacy reasons, counts of less than five are rounded. None of the phenotypes associated with carrier status, even without correction (p < 0.05).

Online Methods gnomAD variant annotation and curation
The gnomAD resource, including both sample and variant quality control (including sample ancestry assignment), is fully described in our companion paper 9 . gnomAD version 2.1.1 was used for this analysis. Putative LoF variants were defined as stop gained, frameshift or essential splice site (splice donor or splice acceptor) as annotated by the Ensembl Variant Effect Predictor 38 .
Variants were included if they were annotated as LoF on any of the three high-confidence GENCODE annotated protein-coding transcripts that are expressed in the lung, liver or kidney. All variants also underwent transcript expression-aware annotation which evaluates the cumulative expression status of the transcripts harboring a variant in the GTEx dataset 39 . All high-confidence variants were found in exons with high evidence of expression across all relevant tissues in GTEx. In addition, all were high-confidence pLoF on the canonical transcript, which is the only transcript to include the kinase domain.
Variants were filtered out if they were flagged as low-confidence by the Loss-of-function Transcript Effect Estimator (LOFTEE) 9 . For the remaining variants, a manual curation was performed, including inspection of variant quality metrics, read distribution and the presence of nearby variants using the Integrative genome viewer (IGV) and splice site prediction algorithms using Alamut.
A single splice-site variant (12-40626187-T-C), found in 77 gnomAD carriers, was identified in an individual with RNAseq data in the Genotype-Tissue Expression (GTEx) project. The RNA-seq reads were manually inspected to look for any effect on splicing. Assessing the read distribution of a linked heterozygous variant in this individual showed convincingly that the variant has no discernible effect on transcript splicing (Supplementary Figure 4). All available tissues were assessed with reads from lung tissue shown in Supplementary Figure 4. The variant was also identified in 8 UK Biobank carriers and in 23andMe and was similarly excluded from these cohorts.
We have complied with all relevant ethical regulations. This study was overseen by the Broad Institute's Office of Research Subject Protection and the Partners Human Research Committee. Informed consent was obtained from all participants.

Sanger validation of gnomAD variant carriers
Sanger validation was performed on genomic DNA derived from peripheral blood under the following PCR conditions: 98°C 2min; 30 cycles 20sec 98°C, 20sec 54°C, 1min 72°C; 3min 72°C using the Herculase II Fusion DNA polymerase (Agilent, 600679). PCR products (5ul) were analyzed on a 2% agarose gel and the remaining product was purified with the Qiagen PCR Purification kit. Sequence analysis was performed with both PCR primers at Quintarabio. Details of variants and PCR primers used for each can be found in Supplementary Table 3.

gnomAD phenotype curation and cohort descriptions
The below described studies with LRRK2 pLoF carriers had available phenotype data. For each study, all available records were manually reviewed to identify any reports of health problems which were categorised into the following classes: lung, liver, kidney, cardiovascular, nervous system, immune and cancer.

The Genomic Psychiatry Cohort (GPC) project
The GPC cohort is a longitudinal resource with the aim of making population-based data available through the National Institute of Mental Health (NIMH). The NIMH repository contains whole-genome sequencing data and detailed clinical and demographic data particularly focussed on schizophrenia and bipolar disorders. A large fraction of participants, 88%, have consented for re-contact 40 .
The screening questionnaire consisted of 32 yes/no questions about mental health issues and 23 yes/no questions covering other medical problems including liver, digestive and cardiovascular problems. There were no specific questions relating to lung or kidney phenotypes although participants were asked to answer yes/no to having any additional health problems. If a participant answered yes to this question, we marked the existence of lung or kidney disease as 'unknown'. One sample was excluded due to conflicting questionnaire answers.
The age of the 25 LRRK2 carriers ranged from 19 to 67. Two carriers, aged 55 and 60, reported having had liver problems with 4 participants over 60 years reporting no liver problems.

The Pakistan Risk of Myocardial Infarction Study (PROMIS)
The PROMIS cohort comprises 10,503 individuals characterized using a phenotype questionnaire with >350 items covering demographic and dietary characteristics and over 80 blood biomarker measurements 41 . The predominant focus of the questionnaire was cardiac function and phenotype. Whist the participants were specifically asked to report suffering from asthma, no other lung, liver, kidney, nervous system or immune phenotypes were directly assayed so these were marked as 'unknown' for these individuals.The 12 LRRK2 LoF carriers in PROMIS did not differ in terms of age, sex and myocardial infarction status when compared to the entire cohort.

The Swedish Schizophrenia and Bipolar Studies
Cases with schizophrenia or bipolar disorder were identified from Swedish national hospitalization registers 42,43 . Controls were selected at random from population registers. All individuals had whole exome sequencing data 44 . All available ICD codes from inpatient hospitalizations and outpatient specialist treatment contacts were provided for each patient.

FINRISK: the National FINRISK study
The FINRISK survey has been carried out for 40 years since 1972 every five years using independent, random and representative population samples from different parts of Finland. For this work, we used sequencing and health register data from FINRISK surveys 1992-2007 45 .
Full health records including ICD10 codes were reviewed by study coordinators who provided us with yes/no answers for each of our phenotype classes.

The BioMe biobank at The Charles Bronfman Institute for Personalized Medicine at Mount Sinai
The Mount Sinai BioMe Biobank, founded in September 2007, is an ongoing, broadly-consented electronic health record (EHR)-linked bio-and data-repository that enrolls participants nonselectively from the Mount Sinai Medical Center patient population (New York City). BioMe participants represent a broad racial, ethnic and socioeconomic diversity with a distinct and population-specific disease burden, characteristic of the communities served by Mount Sinai Hospital. Currently comprised of over 47,000 participants, BioMe participants are of African (AA, 24%), Hispanic/Latino (HL, 35%), European (EA, 32% of whom 40% Ashkenazi Jewish) and other/mixed ancestry.
BioMe is linked to Mount Sinai's system-wide Epic EHR, which captures a full spectrum of biomedical phenotypes, including clinical outcomes, covariate and exposure data from past, present and future health care encounters. The median number of outpatient encounters is 21 per participant, reflecting predominant enrollment of participants with common chronic conditions from primary care facilities. Clinical phenotype data has been meticulously harmonized and validated.
Genome-wide genotype data and whole exome sequencing data is available for >30,000 participants. In addition, whole genome sequencing data is available for >11,000 participants. The full electronic records of three BioMe LRRK2 pLoF carriers were reviewed by local clinicians and we were provided with detailed summaries.

Estonian Biobank of the Estonian Genome Center, University of Tartu
The Estonian Biobank cohort is composed of volunteers from the general Estonian resident adult population 46 . The current number of participants -close to 165,000--represents 15%, of the Estonian adult population, making it ideally suited to population-based studies. Participants were recruited throughout Estonia by medical personnel, and participants receive a standardized health examination, donate blood, and fill out a 16-module questionnaire on health-related topics such as lifestyle, diet and clinical diagnoses. A detailed phenotype summary from a health survey and linked data including ICD10 codes, clinical lab values and treatment and medication information is annually updated through linkage with national electronic health databases and registries.

UK Biobank variant detection and curation
The 49,960 exome sequenced individuals from the UK Biobank were restricted to a subset of 46,062 unrelated individuals of European ancestry. Relatedness was determined using KING kinship coefficient estimates from the genotype relatedness file with a cutoff of 0.0884 to include pairs of individuals with greater than 3rd-degree relatedness. European ancestry was determined by projecting individuals on to 1000 Genomes phase 3 34 principal component analysis coordinate space, followed by Aberrant R package 47 clustering to retain only individuals falling within 1000 Genomes project EUR PC1 and PC2 limits (lambda=4.5). We further removed individuals that self-reported as non-European ethnicity.
We identified all individuals with putative LoF variants detected in the FE analysis pipeline using GATK 3.0 for variant calling and filtering 33 . We did not use the SPB pipeline calls due to advertised errors in the Regeneron Genetics Center pipeline at the time we were conducting these analyses. Variants were included if they were annotated as LoF on any transcript expressed in the lung, liver or kidney. As with the gnomAD analysis, variants were filtered out if they were flagged as low-confidence by LOFTEE, before manual curation of the remaining variants. This curation included inspection of variant quality metrics, read distribution and the presence of nearby variants using IGV and splice site prediction algorithms using Alamut.
In addition, 266 individuals in the full genotyped cohort of 488,288 samples who were carriers of the G2019S risk allele were identified. One individual who was a carrier for both a LRRK2 pLoF variant and G2019S was excluded from all analyses. Carriers of G2019S were not included in the 'non-carrier' cohort in any of the analyses.
LRRK2 pLoF carriers, G2019S risk allele carriers and non-carriers are well matched for both sex (Supplementary Figure 5) and age (Supplementary Figure 6).

Blood serum and urine biomarkers
The first recorded value of all fields relating to 'Blood biochemistry' (field codes 30600-30890) and 'Urine assays' (field codes 30510-30535) was extracted for all individuals. The distribution of values for all biomarkers was plotted (Supplementary Figure 2), and a two-sided Wilcoxon test was used to test for a difference between LRRK2 pLoF carriers and non-carriers.
These data were also extracted for G2019S risk allele carriers and these individuals were compared to pLoF carriers. There was no significant difference in any of the 34 biomarkers between pLoF and G2019S carriers after accounting for multiple testing (Supplementary Table  14).

Clinical measures of kidney function
Albumin to creatinine ratio (ACR) was calculated by dividing the urine microalbumin value (field 30500; mg/L) by the urine creatinine value (field 30510; umol/L) multiplied by a factor of 0.0001131222. Glomerular filtration rate (eGFR) was calculated using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation 48 . Normal range values for both ACR and eGFR were taken from the National Kidney Foundation website (https://www.kidney.org/kidneydisease/).

Spirometry measures of lung function
To assess lung function we used Global Lung Initiative (GLI) 2012 reference equation Z-scores standardised for age, sex and height for Forced expiratory volume in 1-second (FEV1), Forced vital capacity (FVC) and FEV1/ FVC ratio measured using Spirometry. These calculations are available in field codes 20256, 20257 and 20258 and were described previously 36 .

Grouped phenotype analysis
The list of all codings within the field '20002 Non-cancer illness code, self-reported', were taken from the UK Biobank showcase (http://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=6). All selectable codings were given a primary grouping pertaining to the main system relating to that disease. In rare instances where more than one grouping could be assigned, the second was included as a secondary grouping. Diseases with an autoimmune basis were given a secondary grouping to reflect a similar underlying mechanism. Due to the opposing effects of some respiratory diseases, where appropriate, phenotypes in this category were given a secondary grouping of Airway, Interstitial or Pleural. Any codings reflecting symptoms, trauma/injury, benign cancer, mental health phenotypes or diseases arising as a result of infection were excluded. All phenotype codings and assigned groupings are listed in Supplementary Table 9. Any coding within the field '20001 Cancer code, self-reported' was assigned a grouping of 'Cancer'.
To test for an association between any phenotype group and LRRK2 pLoF carrier status each individual was counted once as either having self-reported any of the phenotypes within a group, or having reported none. A Fisher's exact test was used to test for an association.

Analysis of ICD10 codes
All ICD10 codes relating to diseases of the liver (K70-K77), diseases of the respiratory system not specific to the upper respiratory tract (J20-J22, J40-J47, J80-J99) or kidney diseases (N00-N29) were curated to exclude any with a primary infectious or external cause (Supplementary  Table 11). Asthma was excluded from all analyses to avoid any issues caused by the deliberate ascertainment of the exome sequenced portion of the cohort based on asthma status.

23andMe variant annotation and curation
Putative LRRK2 LoF variants were defined as those classified as high-confidence by LOFTEE. Variants were manually assessed for call rate, genotyping and imputation quality and manually curated to ensure they were expected to cause true LoF.
For each of the two genotyped LRRK2 pLoF, we determined carrier status by manually inspecting and custom calling the probe intensity plots. For the imputed variants, carrier status was determined from the Minimac-Imputed dosage. As these calling methods might produce false positives, we confirmed the participants' genotypes through Sanger sequencing. Individuals with discordant genotypes were excluded. This resulted in a cohort of 749 individuals, each of whom is a Sanger sequence-confirmed carrier for one of three pLoF variants (Supplementary Table 4).
During initial selection and sequencing, expansion of the database led to inclusion of a number of additional individuals genotyped for one of the pLoF variants, rs183902574. We performed custom-calling on these individuals and found 354 deemed high-confidence carriers (Supplementary Table 4). As these individuals were not Sanger sequenced, all subsequent analyses were performed both including and excluding these individuals.
Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review).

Testing the power to detect an age effect in 23andMe
As a positive control for the age analysis, we tested the APOE Alzheimer's disease risk allele rs429358, which has a known effect on lifespan. This effect is highly significant in this dataset (P=1.2x10 -211 ).
Given that the carrier count for rs429358 is much higher than for LRRK2 pLoF, we assessed the power of the 23andMe dataset to detect an age effect associated with LRRK2 pLoF variants that is of the same effect size as the known effect of the APOE allele rs429358 by sampling carriers of this variant. We randomly selected N carriers of rs429358 from the 23andMe dataset, performed a KS test on the age distribution of those carriers versus 4,000,000 non-carriers, and considered the resulting p-value. We repeated this process 100 times, and then computed the proportion of these simulations with p < 0.05. This tells us our power to reject the null hypothesis that APOE does not have an affect on age at alpha=0.05, if we had N carriers in the dataset. We repeated this for different values of N, between 1,000 and 20,000 (Supplementary  Table 15).

Phenotype selection
The 23andMe dataset includes self-reported phenotype data for thousands of phenotypes across a diverse range of categories. These phenotypes have different sample sizes and prevalences, so the power to detect associations varies widely. We began with a curated set of 748 disease phenotypes. We then applied a liberal filter based on our power to detect an association with carrier status. More specifically, assuming a MAF of 2E-05, we restricted to phenotypes where we had power 0.1 to detect an association effect with OR > 1.3 (for binary traits) or beta > 0.2 (for quantitative traits) at alpha=0.0001 significance. This left us with 460 binary and 14 quantitative phenotypes.

Association testing
For the subset of 368 health-related phenotypes (i.e. excluding any related to diet, drug use, lifestyle and personality), we first restricted to individuals for whom we had phenotypic data. We calculated pairwise identity by descent (IBD) over all individuals using a modified version of IBD64, and then iteratively removed individuals until we were left with a set of participants, no two of whom shared > 700cM in IBD. We then tested the association between the phenotype and carrier status, controlling for age, sex, genotyping platform, and the first ten genetic principle components. We used logistic regression for binary phenotypes, and linear regression for quantitative phenotypes.
To control for population structure we restricted our analyses to participants with > 97% European ancestry, but the results did not qualitatively change when we dropped this restriction. We also tested associations using only individuals whose carrier status was confirmed by Sanger sequencing, but this also did not result in any meaningful difference.
A Bonferroni corrected P-value threshold for 368 independent tests of 1.36x10 -4 was used to assess statistical significance.

Power analysis
For each phenotype, we compute the theoretical odds ratio we are powered to detect (given in Supplementary Tables 6 and 7) as follows: Let m be the proportion of individuals used in the association study of that phenotype who are LRRK2 pLoF carriers, and let n0 and n1 be the number of controls and cases, respectively. For each OR in the interval [1,10] at steps of 0.02, we compute the power of the Cochran-Armitage trend test to detect an association between a variant with MAF m and odds ratio OR at alpha=0.05, with n0 controls and n1 cases 49 . We report the smallest OR such that this power is >= 0.8.

Analysis of LRRK2 protein levels
Cell culture All cell lines tested negative for mycoplasma contamination on a monthly basis with the MycoAlert™ Detection kit (Lonza, LT07-118) and MycoAlert™ Assay Control Set (Lonza, LT07-518). Cells were grown at 37°C with 5% CO2.

Human embryonic stem cell culture
All pluripotent stem cells were approved by Harvard ESCRO protocol #E00052 and #E00067. Human embryonic stem cells (hESCs) were obtained from WiCell Research Institute (WA01, H1) under an MTA. Cell lines were authenticated by visual inspection of cell morphology with brightfield microscopy, staining with anti-Oct4 antibody to determine maintenance of pluripotency (Santa Cruz, sc-5279, data not shown), sent to WiCell Research Institute after 6 months of passaging or after isogenic cell line generation for karyotyping, and in some cases PCA of RNA sequencing data to confirm clustering with other pluripotent stem cell lines. Pluripotent stem cells were plated onto hESC qualified matrigel (VWR, BD354277) coated 6 well plates, mTeSR1 media was changed daily (StemCell Technologies, 85850), and cells were passaged every 5-7 days with 0.5mM EDTA.

Lymphoblastoid cell culture
Lymphoblastoid cell lines (LCLs) were obtained from Coriell Biorepository (GM18500, GM18501, GM18502, HG01345, HG01346) and approved by the Broad Institute Office of Research Subject Protection protocol #3639. Cell lines were authenticated by visual inspection of cell morphology with brightfield microscopy and in some cases PCA of RNA sequencing data to confirm clustering with GTEx LCLs. LCL media was changed every other day with RPMI 1640 (Life Technologies), 2mM L-glutamine (Life Technologies), and 15% FBS (Sigma).

Cardiomyocyte differentiation
Cardiomyocyte differentiation of the control and engineered H1 hESC lines was performed according to the protocol by Lian et al., 2012. Briefly, 500,000 cells were plated on hESC qualified matrigel (VWR, BD354277), grown in mTeSR1 media for four days (StemCell Technologies, 85850), and switched to RPMI media (Life Technologies) with B27 supplement (Life Technologies), switching to B27 with insulin at day 7, for the remainder of the protocol. On day 0 of differentiation, 12µM CHIR99021 (Tocris) was applied for 24 hours. At day 3, cultures were treated with 5µM IWP2 (Tocris) for 24 hours. Bright field images and movies were acquired at day 17 and cells were harvested for protein/RNA extraction at day 19.

Isogenic cell line engineering
The following guide and HR template were delivered into single cell H1 hESCs via nucleofection (Lonza 4D-Nucleofector X unit) using the P3 Primary Cell Kit (V4XP-3024), pulse code CA137, and pX459 (Addgene): AATAAGGCATTTCATATAGT and ACAGGCCTGTGATAGAGCTTCCCCATTGT GAGAACTCTGAAATTATCATCTGACTATATGAAATGCCTTATTTTCCAATGGGATTTTGGTCAAGATTAA. Cells were allowed to recover from nucleofection in mTeSR supplemented with 10µM Rock Inhibitor (Y-27632, Tocris) overnight. The following three days the cells were treated with 0.25µg/ml puromycin (VWR) in mTeSR. Cells were then cultured in mTeSR until colonies were ready to be split. Engineered cells were split into single cells and plated in matrigel coated 96well plates at a density of 0.5 cells per well. Plates were screened for colonies 8-10 days after plating and grown until colonies were ready to be split. Colonies were then split with 0.5mM EDTA into two identical 96-well plates, one for DNA extraction/PCR/sequencing and one for freezing cells. Once colonies were ready to be split, 96-well plates were frozen in mFreSR (Stem Cell Technologies) and stored at -80°C until HR positive wells were identified. HR edited cells were then thawed and expanded for 4 generations, validated by Sanger sequencing, karyotyping, and OCT4 staining prior to proceeding with cardiomyocyte differentiation.

Off-target analysis of CRISPR/Cas9 engineering
To detect any potential off-target effects caused by CRISPR/Cas9 genome editing, wholegenome sequencing (WGS) was conducted for both the engineered and control cell lines. DNA extraction, Quality Control (QC) and 30x PCR-Free WGS were performed by the Genomics Platform at the Broad Institute. AllPrep DNA/RNA extraction kit was used, following its protocol. Alignment, marking of duplicates, recalibration of quality scores and variant calling are all performed using GATK best practices 50 .
We identified 157,230 variants in the engineered cell line that were not found in the control cell line as candidate variants. For the guide RNA (gRNA) used, we defined potential off-target regions as those with <4bps mismatch against the full 20bp gRNA sequence (334 regions), and/or <2bp mismatch against the seed (=PAM proximal) 12bp of the gRNA sequence (5,780 regions), each followed by the NGG PAM. We looked for any candidate variant that fall into the potential off-target region, resulting in detection of only one variant (chr8-65084564-A-AT) that falls onto a region with 1 mismatch against the seed 12bp of gRNA sequence (chr8:65084560-65084575). No variants with <4bp mismatch against the full 20bp gRNA sequence or perfect match at the seed region were detected. Since a mismatch at the seed region decreases the likelihood of off-target variants 51,52 , and also because the single variant we detected is a known variant (rs1161563412) observed in the population without apparent phenotypic association, we concluded no major off-target effect exists at the level of violating the main steps of our research. All the analysis for the detection of potential off-target were conducted using pybedtools 53 and CRISPRdirect 54 software.