As personalized medicine emerges in neurology, genetic testing is playing a greater role in counseling individual patients with Parkinson’s disease (PD). LRRK2 and GBA are the most prevalent genetic risk factors for PD1,2. Mutations in both LRRK2 and GBA occur worldwide, and founder effects have been described in some populations3,4. LRRK2 and GBA variant carriers with PD can be enrolled in neuroprotection trials5,6; knowledge of genetic status is required to determine trial eligibility7,8. Thus, study recruitment may be biased toward better-studied populations with known higher mutation frequency9,10. Despite suggestions that LRRK2 variants may be more frequent in Puerto Ricans10,11, mutation rates in individuals with self-reported Puerto Rican ethnicity10,12,13 have not been widely studied.

Understanding the frequencies of potential risk variants in different populations is especially important as genetic testing shifts to whole-exome and whole-genome sequencing (WES and WGS), whose interpretation rests on these data. However, the majority of exomes in population databases, such as the Genome Aggregation Database (gnomAD v2, https://gnomad.broadinstitute.org/), are from mixed European Caucasians (45% of the >125,000 exomes in gnomAD), while Latino and African-origin populations remain underrepresented.

We analyzed the frequencies of the LRRK2-p.G2019S and specific GBA variants in a group of PD participants with self-reported Puerto Rican ethnicity, and also carried out an exploratory WES analysis in a subgroup of Puerto Rican participants as well as a limited number of Dominican PD participants.

Among 32 PD participants with self-reported Puerto Rican ethnicity and 119 with non-Hispanic, non-Jewish European ancestry, Puerto Rican participants were more likely to harbor the LRRK2-p.G2019S variant than Europeans (Table 1A; 15.6% vs. 4.2%; OR 4.22, 95% CI: 1.14–15.6).
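As a rough check on the reported effect size, the unadjusted odds ratio and its Wald confidence interval can be recomputed from a 2×2 table. The sketch below assumes carrier counts of 5/32 and 5/119, which are back-calculated from the reported percentages rather than taken from Table 1A, and the function name `odds_ratio_wald` is illustrative.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: carriers in group 1,  b: non-carriers in group 1
    c: carriers in group 2,  d: non-carriers in group 2
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Assumed counts, inferred from the reported 15.6% and 4.2% (not from Table 1A).
pr_carriers, pr_total = 5, 32
eu_carriers, eu_total = 5, 119

or_, lo, hi = odds_ratio_wald(
    pr_carriers, pr_total - pr_carriers,
    eu_carriers, eu_total - eu_carriers,
)
print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.1f})")
```

Under these assumed counts, the result agrees with the reported OR of 4.22 (95% CI: 1.14–15.6). An unadjusted logistic regression with a single binary predictor yields the same Wald interval.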

Table 1 Clinical characteristics: (A) Participants for the LRRK2-p.G2019S and GBA variant screening. (B) Participants selected for WES (from Mount Sinai Beth Israel).

Fourteen European participants carried a GBA variant while no Puerto Rican cases had an isolated GBA variant; one carried dual LRRK2-GBA variants (Table 1A).

Additionally, because so few Latino PD patients have been examined for known PD candidate genes to date, we carried out a pilot study in which 8 patients of Puerto Rican ethnicity from the initial screening and 4 further patients of Dominican Republic (DR) ethnicity with early age of onset and/or positive family history (FH) underwent WES (as the most readily available tool for detecting potentially relevant coding variants) and standard variant discovery analyses (Table 1B). A search for rare (gnomAD overall allele frequency [AF] <0.01) deleterious variants in 74 disease-associated genes (Supplementary Table 2) identified variants in 8 genes (Table 2), including one in TREM2 (rs2234253) in patients 2 and 5 (Table 1B), both of DR ethnicity with tremor-predominant PD, and one in CTSB (rs28605689) in patient 6 (Table 1B) with postural instability and gait disorder (PIGD). Neither of these variants was reported in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/).

Table 2 Rare deleterious variants detected from exomes of Caribbean Latino PD patients in New York.

To detect potential risk alleles for PD in the Puerto Rican/Dominican populations, variant AF in our sample was compared to AF in control populations (gnomAD overall, Latino, African, and non-Finnish European [NFE]) for variants in the candidate gene list. While this analysis did not reveal a significant enrichment in PD of any single variant, we noted a nominally increased frequency of rs2230570 (EIF4G1) as compared to the Latino population in gnomAD v2 (Supplementary Table 1; p-value 0.008; adjusted p-value 0.052). This variant showed a similar trend in the NFE population (p-value 0.023; adjusted p-value 0.131) but was not significant when compared to the African population (Supplementary Table 1). Although this variant is predicted to be benign, it could be in linkage disequilibrium with another functional variant in a non-coding region not assessed by WES.

Evaluating loss-of-function variants exome-wide, we discovered a novel stop-gain mutation in exon 10 (of 13) of the COBL gene (p.Glu777*) in patient 1 (Table 1B) who is affected with mild parkinsonism and positive FH. Follow-up analysis revealed a rare deleterious missense variant (rs202117145, Table 2) in the same gene in patient 8 (Table 1B) with gait onset and rest tremor. Neither variant is reported in ClinVar.

Studies on the genetics and clinical spectrum of PD in non-European, especially Latino and Afro-Caribbean, populations are limited and mainly focus on one or a few PD-associated genes14,15. Such studies are sorely needed, as the demand for direct-to-consumer testing reflects patients’ interest in determining the genetic contribution to their PD. There is also emerging evidence that awareness of genotype could guide therapeutic decisions7,8. Trials targeting some genes have emerged and multiple are in progress, thus far focused on LRRK2 and GBA variants16,17.

We have previously reported an excess of the LRRK2-p.G2019S variant10 in Puerto Ricans; however, the numbers of screened samples have been relatively small. Here, the increased frequency of LRRK2-p.G2019S in our New York Puerto Rican population was confirmed (15.6%), suggesting that testing should be considered in individuals of Puerto Rican ethnicity who are potentially eligible for trials, as it will improve access and equity. However, Nuytemans et al. reported that only 2/37 individuals (5.4%) of Puerto Rican ethnicity ascertained from Manatí and Miami harbored p.G2019S mutations13. Local ancestry analysis showed that these two patients likely had European ancestry, suggesting that p.G2019S was introduced to the Latino population through European ancestors. Further studies of larger cohorts are thus still needed to obtain better frequency estimates. Lower frequencies of p.G2019S were also observed in the PD populations of Argentina (3.2%), Brazil (1.4%), Colombia (1.5%), Ecuador (1.2%), Peru (0.2%), and Uruguay (4.2%)18. This raises important questions about why, even in the Nuytemans study, the rates were slightly increased among Puerto Ricans, and whether this is attributable to a greater frequency in the European than the Amerindian ancestry. As the Puerto Rican population represents an admixture of native Taino, African, and Hispanic European ancestries, ancestry-informative markers might further help interpret such data. Admixtures of European, African, and Amerindian ancestries have similarly been reported in the Puerto Rican PD patients from Miami and Manatí13.

We carried out principal component analysis (PCA) of ancestry-related markers for the 12 patients who were included in the WES analysis. This analysis showed that these patients have an admixed genetic background with contributions from European, American (Hispanic), and African global populations (Supplementary Fig. 1). Whether the Puerto Rican population carries a higher proportion of European (non-Ashkenazi) ancestry remains to be determined.

As other genetic variants may be contributing to PD in the Puerto Rican population, and with the advent of WES and WGS, it is important to extend beyond LRRK2-p.G2019S and more frequently assessed GBA variants, and to identify variants that may be overrepresented in understudied populations. In 2021 three studies reported the genetic variants in the Latino population13,19,20, including that of Nuytemans et al. that identified novel LRRK2 and GBA variants in Latino PD patients from the Caribbean13.

Our exploratory WES analysis identified four genes that carried variants with elevated AF in our patient group. COBL is a constrained gene (LOEUF 0.47, Table 2) related to actin cytoskeletal organization during neuronal morphogenesis, and its expression is enriched in brain21. The role of EIF4G1 in PD-related neurodegeneration is still debated22. TREM2, in contrast, has been confirmed as a candidate gene for PD susceptibility and progression, and soluble TREM2 (sTREM2) expression in cerebrospinal fluid (CSF) was increased in PD patients, suggesting sTREM2 in CSF as a surrogate immune biomarker for neuronal injury in PD23. CTSB belongs to the lysosomal proteases, and associations between the lysosomal pathway and PD pathogenesis have been described broadly. Lack of CTSB was shown to impair lysosomal trafficking during neural development24. However, because of our limited sample size, these results should be treated as a starting point for further studies in a much larger set of Latino PD cases.

While our study highlights the urgency of studying the genetics of these and other underrepresented populations, our WES included a small number of patients; we therefore did not have the statistical power to identify unique clinical differences or major clinical trends. Additional studies are needed that not only evaluate known variants but also expand the WES/WGS data to identify population-specific risk variants in Puerto Rican, Dominican, and other Afro-Caribbean populations. This work will be aided by the Latin American Research Consortium on the Genetics of Parkinson’s Disease (LARGE-PD, Mata PI).

Methods

Participants

Evaluation of the frequency of LRRK2-p.G2019S and GBA variants was carried out in 151 participants in total: 32 with self-reported Puerto Rican ethnicity and 119 with non-Hispanic, non-Jewish European ancestry from Mount Sinai Beth Israel (MSBI), SUNY Downstate, and James J. Peters Veterans Affairs Medical Center (Bronx). Subsequently, 8 participants of Puerto Rican and 4 of Dominican ethnicity who were negative in the LRRK2 or GBA testing, and who had a higher likelihood of genetic etiology because of younger age of onset and/or positive FH, were selected for WES. We chose 57 years as the cut-off for age of onset as this was the median age in our overall sample, and included patients with any FH, without limiting to first-degree relatives. One Puerto Rican participant was subsequently excluded from the AF analyses due to the discovery of biallelic PRKN deletions on independent clinical testing that included MLPA.

All participants met criteria for PD and signed informed consents from their respective institutions. The study was approved by and conforms with all Ethical Regulations of the Mount Sinai Internal Review Board and SUNY Downstate Internal Review Board.

Genotyping for GBA variants

Genomic DNA (50–80 ng) was used to PCR amplify 5 fragments suitable for Sanger sequencing. Fragments were designed to include 10 common GBA variants (84GG, IVS2 + 1, p.E326K, p.T369M, p.N370S, p.V394L, p.D409G, p.L444P, p.A456P, and p.R496H). Primer sequences for PCR fragments were chosen to avoid amplification of the GBA pseudogene and are available upon request. Following PCR amplification, fragments were cleaned enzymatically with ExoSAP-IT (Affymetrix, Santa Clara, CA) according to manufacturer instructions. Samples were sequenced using ABI BigDye Terminator chemistry (v1.1) and visualized on an ABI 3730xl DNA analyzer. ABI sequencer files for each sample were compared to GBA consensus sequence to identify variants using Mutation Surveyor software (SoftGenetics, State College, PA). In addition to DNA sequencing, a PCR amplification assay was designed to assess presence of the RecNciI recombinant allele (primers and conditions available upon request).

Evaluation of LRRK2 and GBA variant frequencies

The LRRK2-p.G2019S and 11 variants in GBA (84GG, IVS2 + 1, p.E326K, p.T369M, p.N370S, p.V394L, p.D409G, p.L444P, p.A456P, RecNciI, and p.R496H) were analysed as described previously25,26 (see also Supplementary Methods). Logistic regression models were used to evaluate the association between self-reported ethnicity and the LRRK2-p.G2019S variant status, and the GBA variant status, respectively (STATA16, Texas). Two models were run: (i) with and (ii) without “age of onset” as a covariate. No difference in results was observed when adjusting vs. not adjusting for this covariate (data not shown). T-tests and Mann–Whitney tests were used to assess group differences.

Whole exome sequencing

We performed WES on 12 samples (8 PR and 4 DR) to detect functional rare variants in 74 candidate genes (see Supplementary Methods for details on WES, Supplementary Table 2 for candidate genes), chosen based on a literature review for genes previously associated with PD and other movement disorders with phenotypic overlap, including frontotemporal dementia (FTD), dystonia, essential tremor (ET), chorea-acanthocytosis (ChAc), McLeod Syndrome (MLS), Huntington disease-like 2 (HDL2), neurodegeneration with brain iron accumulation disorders (NBIA), and Alzheimer’s disease (AD). Additionally, we searched for loss-of-function mutations exome-wide. To test for the enrichment of more common variants in candidate genes in the PD population, Fisher’s tests were performed using allele counts in our WES cohort vs. published allele counts in a population database (gnomAD v2).

WES was performed using the Agilent SureSelect Human All Exon V4 + UTR (71 Mb) library and an Illumina HiSeq 2000 with a paired-end module. Burrows–Wheeler Aligner (bwa-mem)27 was used to align reads to the reference sequence GRCh37/hg19, after which quality control, duplicate read removal, base quality recalibration, and variant calling were performed using HaplotypeCaller in the Genome Analysis ToolKit (GATK) version 3.328. The mean per-base coverage for called variants ranged from 25–34X across samples. Single nucleotide variants and small in/dels were annotated with in silico predictions using the Variant Effect Predictor module from Ensembl (http://useast.ensembl.org/info/docs/tools/vep/index.html?redirect=no), in particular the CADD (https://cadd.gs.washington.edu/), SIFT (https://sift.bii.a-star.edu.sg/), and Polyphen (http://genetics.bwh.harvard.edu/pph2/) scores, and via MACIE scoring29 (protein score > 0.80), which was ultimately used to select variants of interest based on an estimate of the probability of a functional variant at the locus.
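The selection criteria described in the text (candidate gene membership, gnomAD overall AF < 0.01, MACIE protein score > 0.80) can be illustrated with a minimal filtering sketch. The `Variant` record, its field names, the gene labels, and all values below are hypothetical placeholders, not study data. Treating variants absent from gnomAD as rare is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

AF_MAX = 0.01      # gnomAD overall allele frequency threshold from the text
MACIE_MIN = 0.80   # MACIE protein score threshold from the text


@dataclass
class Variant:
    gene: str
    variant_id: str
    gnomad_af: Optional[float]  # None if the variant is absent from gnomAD
    macie_protein: float


def is_candidate(v, genes):
    """True if the variant is in a candidate gene, rare, and predicted functional."""
    # Assumption: a variant not observed in gnomAD is treated as rare.
    rare = v.gnomad_af is None or v.gnomad_af < AF_MAX
    return v.gene in genes and rare and v.macie_protein > MACIE_MIN


# Hypothetical annotated variants for illustration only.
candidate_genes = {"GENE_A", "GENE_B"}
variants = [
    Variant("GENE_A", "var1", 0.0004, 0.93),  # rare, high score: kept
    Variant("GENE_A", "var2", 0.0500, 0.95),  # too common: dropped
    Variant("GENE_B", "var3", None, 0.85),    # absent from gnomAD: kept
    Variant("GENE_B", "var4", 0.0010, 0.40),  # low score: dropped
    Variant("GENE_C", "var5", 0.0002, 0.99),  # not a candidate gene: dropped
]

selected = [v.variant_id for v in variants if is_candidate(v, candidate_genes)]
print(selected)
```

In practice, these fields would be parsed from annotated VEP output rather than constructed by hand.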

Stop and splice-acceptor/donor mutations detected in canonical transcripts genome-wide were filtered for quality using LOFTEE (Loss of Function Transcript Estimator) scores from the VEP (Variant Effect Predictor) plugin of Ensembl. As such, based on criteria set by the program, variants that are near the end of the transcript are filtered out30.

In testing for the enrichment of variants in the PD population, Fisher’s tests were limited to alleles found in >3 out of 11 individuals in our cohort, and to those that had a higher variant allele frequency in PD (“risk alleles”). p-values were corrected for multiple testing using the false discovery rate method.
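The enrichment procedure above can be sketched in a few lines: pre-filter the alleles, run Fisher’s exact test on case vs. control allele counts, then adjust the p-values. The text does not specify the sidedness of the test or the exact FDR procedure, so this sketch assumes a two-sided test and the Benjamini–Hochberg method. All allele counts are hypothetical and chosen only to exercise the code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (not study data): label, carriers among 11 individuals,
# case alt alleles, case total alleles, control alt alleles, control total alleles.
candidates = [
    ("variant_A", 6, 8, 22, 300, 2000),
    ("variant_B", 4, 4, 22, 350, 2000),
    ("variant_C", 2, 2, 22, 20, 2000),   # dropped: found in <= 3 individuals
    ("variant_D", 5, 5, 22, 900, 2000),  # dropped: AF not higher in cases
]

# Keep alleles seen in >3 of 11 individuals with a higher AF in cases.
tested = [v for v in candidates if v[1] > 3 and v[2] / v[3] > v[4] / v[5]]
pvals = [
    fisher_exact_two_sided(alt, tot - alt, c_alt, c_tot - c_alt)
    for _, _, alt, tot, c_alt, c_tot in tested
]
for (name, *_), p, q in zip(tested, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p = {p:.3g}, FDR-adjusted p = {q:.3g}")
```

Exact binomial coefficients keep the sketch dependency-free but become slow for very large control counts. A statistics library would be the practical choice for counts on the scale of gnomAD.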

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.