
# Deep sequencing of sncRNAs reveals hallmarks and regulatory modules of the transcriptome during Parkinson’s disease progression

## Abstract

Noncoding RNAs have diagnostic and prognostic importance in Parkinson’s disease (PD). We studied circulating small noncoding RNAs (sncRNAs) in two large-scale longitudinal PD cohorts (Parkinson’s Progression Markers Initiative (PPMI) and Luxembourg Parkinson’s Study (NCER-PD)) and modeled their impact on the transcriptome. Sequencing of sncRNAs in 5,450 blood samples of 1,614 individuals in PPMI yielded 323 billion reads, most of which mapped to microRNAs, although other RNA classes such as piwi-interacting RNAs, ribosomal RNAs and small nucleolar RNAs were also covered. Dysregulated microRNAs associated with disease and disease progression occur in two distinct waves in the third and seventh decade of life. Originating predominantly from immune cells, they reflect a systemic inflammatory response and mitochondrial dysfunction, two hallmarks of PD. Profiling 1,553 samples from 1,024 individuals in the NCER-PD cohort validated biomarkers and main findings using an independent technology. Finally, network analysis of sncRNA and transcriptome sequencing data from PPMI identified regulatory modules emerging in patients with progressing PD.

## Main

Parkinson’s disease (PD) is the second most common neurodegenerative disease worldwide1, but the exact causes of the disease and its progression remain largely unknown2. PD seems to result from a complicated interplay of genetic and environmental factors that affect fundamental cellular processes3. Diagnosis is based on clinical criteria applied to often heterogeneous symptoms4, which makes validated diagnostic and prognostic biomarkers an unmet need for supporting new therapeutic developments.

The rapid development of high-throughput screening techniques and declining experimental costs have fostered fundamental research on the molecular underpinnings of PD, such as genome- or transcriptome-wide association studies to discover marker genes5,6,7,8,9. In the search for minimally invasive biomarkers, transcriptome analyses based on a variety of biofluids such as blood10,11,12 or cerebrospinal fluid13 have been conducted. Until recently, these studies focused primarily on the coding transcriptome, leaving open the question of the role of noncoding RNAs in PD14.

As part of the sncRNAs, microRNAs (miRNAs) play a versatile role in post-transcriptional messenger RNA (mRNA) regulation. Owing to their stability, their diagnostic and prognostic information content and their well-characterized effects on target gene expression, they are promising biomarker candidates, including in the context of PD15,16. As early as 2011, we reported evidence that blood-borne miRNAs serve as specific markers for human pathologies17. On the basis of salient cross-disease results, we focused on neurodegenerative disorders such as Alzheimer’s disease18,19,20 and malignant tumor diseases such as lung cancer21.

Advanced biomarker studies, however, require carefully designed cohorts, and only a few large-scale PD studies aiming to advance diagnosis, prognosis and therapeutics fulfill such stringent requirements22,23,24. Among them, the PPMI is a multicohort, longitudinal observational study designed to discover and validate objective biomarkers of PD, with a focus on tracking disease progression in enrolled individuals25. The PPMI project is a global effort of 33 clinical sites in 11 countries with regular assessments of study participants (Fig. 1a). It also features comprehensive clinical phenotyping that captures hundreds of characteristics, which can be contrasted across the genotypes determined, such as the idiopathic and genetic forms of the disease. Further, longitudinal biosampling following rigorous standard operating procedures provides a robust framework for the discovery of early-onset and prognostic biomarkers.

To identify potential noncoding RNA and transcriptomic markers in PPMI, we performed RNA sequencing (RNA-seq) on blood samples drawn at each clinical visit. For short and long RNAs, we used optimized assays and sequenced separate aliquots of the same blood samples for paired RNA analyses. Here, we present the evaluation of the sncRNA-seq fraction for disease detection and progression tracking. We shed light on the potential of the different classes of small RNAs but emphasize the role of miRNAs. We validate relevant miRNA findings in the Luxembourg Parkinson’s Study, conducted within the National Centre for Excellence in Research on PD (NCER-PD) cohort22, which was performed independently and with a different technology. We then use long RNA-seq data to provide insights into how the key noncoding RNAs regulate gene expression. The comprehensive new resources presented here provide large amounts of data for statistically robust testing of new hypotheses in small RNA research and on molecular biomarkers of PD.

## Results

### High-quality and ultra-deep sncRNA-sequencing covers all major small RNA species

A known factor that affects sncRNA-seq results is sample integrity26,27, which calls for rigorous quality control procedures. Sequencing yielded a total of 322.6 billion reads, an average of 59 million per sample (Fig. 1b and Extended Data Fig. 1a). Of these, 279 billion (86%) passed a stringent quality control (Extended Data Fig. 1b). Next, 93% of reads mapped to the human genome (Extended Data Fig. 1c). In addition to the filtering of RNA features, the initial 5,450 sncRNA-seq samples underwent a sample-wise quality control (Fig. 1c). After sequential filtering, 4,440 samples from 1,511 individuals remained as a high-quality dataset. Inspection of the clinical parameters of the samples removed for lower quality confirmed that the procedure did not introduce a systematic bias into the dataset. Although patients who provided only one sample were approximately twice as likely to be removed by our procedure as those who provided two samples, the relative cohort sizes and the statistical power of the longitudinal analysis remained largely unaffected. In particular, the cohort composition, the number of samples per individual and the overall age distribution retained for analysis were not significantly altered. We additionally generated two large-scale datasets to validate findings and to understand downstream regulatory effects (Fig. 1d). First, we sequenced the 4,240 transcriptomes (RNA-seq) paired with the small RNA libraries from PPMI to assess potential targets of miRNAs. Second, we validated key findings in an independent cohort of 1,440 whole-blood miRNA-microarray samples (Methods).
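The read-level totals above can be cross-checked with simple arithmetic, and the same sketch shows one way a sample-wise filter could be expressed. This is a minimal illustration under stated assumptions: the `SampleQC` record, its fields and both thresholds are hypothetical placeholders, not the criteria applied in the study (those are described in Methods and Fig. 1c).

```python
from dataclasses import dataclass

# Totals reported in the text
TOTAL_READS = 322.6e9   # reads sequenced across all sncRNA-seq libraries
PASSED_READS = 279e9    # reads passing read-level quality control
N_SAMPLES = 5450        # sncRNA-seq samples before sample-wise filtering

mean_reads_per_sample = TOTAL_READS / N_SAMPLES
pass_fraction = PASSED_READS / TOTAL_READS


@dataclass
class SampleQC:
    """Hypothetical per-sample summary used for sample-wise filtering."""
    sample_id: str
    total_reads: int
    mapped_fraction: float  # fraction of reads mapped to the human genome


# Hypothetical thresholds, for illustration only (not the study's criteria)
MIN_READS = 5_000_000
MIN_MAPPED_FRACTION = 0.80


def passes_qc(sample: SampleQC) -> bool:
    """Keep a sample only if it meets every threshold."""
    return (sample.total_reads >= MIN_READS
            and sample.mapped_fraction >= MIN_MAPPED_FRACTION)


samples = [
    SampleQC("S1", 60_000_000, 0.93),
    SampleQC("S2", 2_000_000, 0.95),   # removed: too few reads
    SampleQC("S3", 45_000_000, 0.55),  # removed: low mapping rate
]
kept = [s.sample_id for s in samples if passes_qc(s)]

print(f"{mean_reads_per_sample / 1e6:.0f} million reads per sample")
print(f"{pass_fraction:.0%} of reads passed QC")
print(kept)
```

With the reported totals, the sketch prints 59 million reads per sample and an 86% pass rate, matching the figures given above; only the toy sample `S1` survives the hypothetical thresholds.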

The PPMI cohorts comprise healthy controls; idiopathic (sporadic) cases of PD (iPD); mutation carriers, partitioned into genetic PD (gPD) and unaffected individuals; prodromal individuals with isolated rapid eye movement behavior disorder and/or hyposmia; individuals showing scans without evidence of dopaminergic deficits (SWEDD); and individuals with other neurodegenerative diseases (ND) with similar symptoms (Fig. 1e and Supplementary Table 1). In addition to samples collected at study baseline (BL), we included up to four additional samples (visits V02, V04, V06 and V08). Relative (sub)cohort sizes differ only marginally between study time points. Further, cohort subjects are well matched with respect to age (Fig. 1f). The overall mean ± standard deviation of age is 61 ± 10.8 years across all samples, 62 ± 10.2 years in total PD and 59 ± 11.6 years in controls, with similar covariate distributions in the remaining subcohorts.

In a first analysis, we examined the distribution of reads across RNA classes. With a focus on miRNAs and protocols optimized for miRNA-seq, we reached the expected enrichment: over 90% of all reads belong to known miRNAs from miRBase v22. Nevertheless, the extremely high read depth also supports valid conclusions about other RNA species (piwi-interacting RNAs (piRNAs) (3.1%), ribosomal RNAs (2.4%), small nucleolar RNAs (snoRNAs) (0.9%), unspecific intergenic sequences (0.5%), transfer RNAs (0.5%), coding exons (0.1%) and others (0.1%); Fig. 1g). We did not observe significant shifts in the composition of RNA classes between disease and control groups (Extended Data Fig. 1d–j). Moreover, the median RNA integrity number of 8.2 indicates excellent RNA quality (Extended Data Fig. 2a), and both samples and technical replicates showed remarkable correlation in terms of sncRNA and miRNA counts (Extended Data Fig. 2b,c). Principal-component analysis (PCA) and batch-effect assessment did not reveal an apparent bias in the data, although total sncRNAs seem to be slightly more affected by technical variables than the subset of miRNAs alone (Extended Data Fig. 2d–j). Having assembled an initial picture of data quality and of the distribution of reads across RNA classes, we then investigated molecular differences between cohorts.
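The reported class percentages can be checked in the same way. The sketch below assumes, for illustration, that the listed non-miRNA classes account for all reads that are not known miRNAs; `class_percentages` is a hypothetical helper showing how such a breakdown would be computed from per-class read counts.

```python
# Percentages of reads assigned to non-miRNA classes, as reported in the text
NON_MIRNA_PERCENT = {
    "piRNA": 3.1,
    "rRNA": 2.4,
    "snoRNA": 0.9,
    "intergenic": 0.5,
    "tRNA": 0.5,
    "coding exon": 0.1,
    "other": 0.1,
}

# Assumption: the listed classes cover every read that is not a known miRNA.
implied_mirna_percent = 100 - sum(NON_MIRNA_PERCENT.values())


def class_percentages(counts):
    """Convert per-class read counts into percentages of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned")
    return {rna_class: 100 * n / total for rna_class, n in counts.items()}


print(f"implied miRNA share: {implied_mirna_percent:.1f}%")
print(class_percentages({"miRNA": 900, "piRNA": 60, "rRNA": 40}))  # toy counts
```

Under that assumption, the non-miRNA classes sum to 7.6%, implying a miRNA share of 92.4%, which is consistent with the statement that over 90% of reads belong to known miRNAs.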

### Molecular effects vary between different types of PD

The uniform manifold approximation and projection (UMAP) of miRNA expression produces a homogeneous but undifferentiated molecular picture of the samples and reveals no obvious batch effects (Fig. 2a). To find global differences between PD and controls, we computed the sample density distribution for each subcohort (Fig. 2b). Whereas samples from healthy controls and iPD seem to form a larger cluster, patients and unaffected controls with a genetic predisposition cluster together. Notably, the prodromal cohort approximates the genetic cluster, whereas samples from the SWEDD cohort are scattered. Although biological variables outweigh technical ones in a principal variance component analysis (PVCA), a large fraction of the variance is not explained by the available annotation variables (Extended Data Fig. 2g,i).

We next investigated differential expression by phenotype, comparing total PD (idiopathic + genetic) with controls (healthy + unaffected), and separately within the genetic cohort and among individuals without a genetic predisposition (Fig. 2c–e). For the first comparison, we identified five miRNAs with considerable effect sizes in PD (miR-487b-3p: −0.22; miR-493-5p: −0.2; miR-6836-3p: 0.2; miR-6777-3p: 0.21 and miR-15b-5p: −0.21). The volcano plot displays a trend toward global downregulation of miRNAs in PD. Notably, the comparison within the genetic cohort does not follow this global shift. In total, three miRNAs stand out by their effect size: two are upregulated in affected genetic carriers (miR-103a-2-5p: 0.22, P = 9.4 × 10−4; miR-339-5p: 0.22, P = 7.5 × 10−4) and one is downregulated (miR-15b-5p: −0.19, P = 0.0011). The third comparison suggests that the non-genetic cohort drives the global downregulation: 24 miRNAs are downregulated, whereas only two are upregulated (miR-125b-5p: 0.24, P = 6.4 × 10−6; miR-100-5p: 0.23, P = 5.4 × 10−5). Overall, the effects on miRNAs seem to be more diverse and stronger in sporadic cases than in genetic cases (Supplementary Table 2). Adjusted and raw P values for each comparison are available in the Supplementary Information; we largely omit them here because, owing to the large sample sizes, many sncRNAs reach high statistical significance.
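This section does not state which effect-size measure underlies values such as −0.22. As an illustration only, the sketch below assumes a rank-based approach: the two-sided Wilcoxon rank-sum (Mann–Whitney U) test, which the paper reports for its new miRNA candidates, paired with the rank-biserial correlation as an assumed effect size (positive when values are higher in cases). The normal approximation without tie correction is a simplification; `scipy.stats.mannwhitneyu` provides exact and tie-corrected variants.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie
    correction) plus the rank-biserial correlation as an assumed effect size."""
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    u_cases = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_cases - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    rank_biserial = 2 * u_cases / (n1 * n2) - 1  # in [-1, 1]
    return rank_biserial, p_two_sided


# Toy log-expression values for one miRNA (not study data)
effect, p = rank_sum_test([5.1, 6.0, 7.2, 6.6], [4.0, 5.5, 3.9, 4.8])
print(round(effect, 3), round(p, 4))
```

A rank-based statistic is insensitive to the skewed count distributions typical of sequencing data, which is one reason such tests are common for miRNA comparisons; whether the study's effect sizes were computed this way is not stated here.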

The exceptional depth of the sequencing data enables the calling of new miRNA candidates. From an initial set of 30,924 candidates reported by miRMaster28, 834 were consistently detected across the subcohorts. For these high-profile candidates, we repeated the differential expression analysis (Fig. 2f–h). New miRNAs differ from known ones in two respects. First, the direction of the differential expression shift appears flipped, but consistent, for the new miRNAs in total PD and iPD. Second, hypothesis testing for differences in expression identified 168, 51 and 110 significantly dysregulated miRNA candidates (adjusted Wilcoxon P values at α = 0.05; Supplementary Table 3). Among the downregulated candidates in iPD, we found nov-pred-mir-3679, which shows a sufficiently low minimum free energy of −22 kcal mol−1 and a favorable precursor secondary structure (Fig. 2i). The stem loop of this new miRNA originates from chromosome 15 and exhibits an excellent read profile at its mature forms (Fig. 2j), both of which were downregulated in iPD. We also computed differential expression by the other subcohorts and by patient sex and found both known and new miRNAs to be affected (Extended Data Fig. 3a–j and Supplementary Tables 4 and 5). Analogous results for the remaining sncRNA classes are available in the Supplementary Information (Supplementary Tables 6 and 7). The next key question is whether miRNAs also change over time and with disease progression.
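The text reports adjusted Wilcoxon P values at α = 0.05 but does not name the adjustment method in this section. The sketch below assumes the Benjamini–Hochberg procedure, a common choice for such screens, and adds a consistent-detection filter of the kind used to reduce 30,924 candidates to 834. The filter's name, its thresholds and the candidate IDs (`nov-1`, `nov-2`) are hypothetical. A library implementation of the correction is `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def consistently_detected(counts_by_subcohort, min_count=5, min_fraction=0.5):
    """Candidates with count >= min_count in at least min_fraction of the
    samples of every subcohort (hypothetical thresholds)."""
    first, *rest = counts_by_subcohort.values()
    shared = set(first).intersection(*rest)
    kept = []
    for candidate in sorted(shared):
        if all(
            sum(c >= min_count for c in sub[candidate]) / len(sub[candidate]) >= min_fraction
            for sub in counts_by_subcohort.values()
        ):
            kept.append(candidate)
    return kept


# Toy per-sample read counts for two hypothetical candidates
counts = {
    "iPD": {"nov-1": [10, 12, 0], "nov-2": [0, 1, 0]},
    "control": {"nov-1": [8, 0, 9], "nov-2": [6, 7, 8]},
}
print(consistently_detected(counts))

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```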

### Effects within PD depend on time, disease progression and blood cell types

We tested whether miRNAs show increasing effects over time and with disease progression by analyzing up to four follow-up visits for patients with PD. To this end, we performed differential expression analysis for all pairwise comparisons of clinical visits (BL versus V02, BL versus V04 and so forth). We took advantage of a core strength of PPMI, the longitudinal follow-up available for most patients, to perform paired, individualized comparisons between time points instead of pooling samples and comparing at the group level. Notably, we not only found many miRNAs to be downregulated within PD over time but also observed an amplifying trend that correlated with the time difference between visits (Fig. 3a–j).
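The paired design can be sketched as follows: every (earlier, later) pair of the five visits is compared, and each individual contributes a within-person log2 ratio, so no pooling across people occurs. The pseudocount, the median as summary statistic and the function name `paired_log2fc` are assumptions for illustration; the study's exact test statistic is not given in this section.

```python
import math
from itertools import combinations
from statistics import median

VISITS = ["BL", "V02", "V04", "V06", "V08"]
# All pairwise (earlier, later) comparisons: BL vs V02, BL vs V04, ...
VISIT_PAIRS = list(combinations(VISITS, 2))


def paired_log2fc(expr_by_individual, early, late, pseudocount=1.0):
    """Median over individuals of log2(later / earlier) for one miRNA.

    expr_by_individual maps individual -> {visit: normalized count}. Only
    individuals sampled at both visits contribute, so each ratio is computed
    within a person rather than between pooled groups."""
    diffs = [
        math.log2((visits[late] + pseudocount) / (visits[early] + pseudocount))
        for visits in expr_by_individual.values()
        if early in visits and late in visits
    ]
    return median(diffs) if diffs else float("nan")


# Toy normalized counts (not study data)
expr = {
    "P1": {"BL": 99, "V08": 49},
    "P2": {"BL": 199, "V04": 150, "V08": 99},
    "P3": {"BL": 10},  # no follow-up: excluded from every paired comparison
}
for early, late in VISIT_PAIRS:
    print(early, late, paired_log2fc(expr, early, late))
```

The five visits yield the ten pairwise comparisons mentioned in the text; individuals without a sample at one of the two visits are simply skipped for that pair.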

To determine the cellular origin of these disease-associated changes, we deconvolved the blood cell components and cell-type origins of the miRNAs, using the comparisons with the largest time Δ (Fig. 3k–p). Downregulated miRNAs mainly originated from exosomes, helper T cells and red blood cells, whereas upregulated miRNAs were often associated with neutrophils and serum. These findings are consistent with the increase in neutrophils and decrease in lymphocytes in the PPMI blood cell count records and are further confirmed by transcriptomic changes in neutrophil-associated genes and pathways in the long RNA-seq experiments. Last, larger proportions of unchanged miRNAs were associated with B cells, natural killer cells and cytotoxic T cells. For a systematic analysis of the observed depletion trends, we intersected the sets of miRNAs exceeding a fold-change threshold, which yielded 34 miRNAs, all of which were downregulated in one to six of the ten comparisons (Fig. 3q). Of these, miR-101-3p and miR-144-5p showed the most consistent downregulation (Extended Data Fig. 3k). We then repeated the pairwise analysis scheme for the controls and observed not only significantly smaller effects than in PD, in particular for the largest difference in time (Fig. 3r), but also, again, a positive correlation between the magnitude of differences and the number of miRNAs consistently changed in each contrast. Performing blood cell-type deconvolution for the 34 miRNAs, we observed that red blood cells, exosomes and monocytes likely represent major cellular origins, whereas only a minor fraction was associated with serum (Fig. 3s). Furthermore, the analysis suggests that leukocytes are a primary source of the miRNAs that become depleted over time in the blood of patients with PD.
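The intersection step can be pictured as counting, for each miRNA, in how many of the visit comparisons its log2 fold change falls below a threshold; miRNAs with a nonzero count form the depleted set, and the highest counts mark the most consistent ones. The threshold of −0.5 and the miRNA names `miR-A` to `miR-C` are hypothetical; the study's actual fold-change cutoff is not given in this section.

```python
from collections import Counter


def downregulation_counts(log2fc_by_comparison, threshold=-0.5):
    """Count, per miRNA, the comparisons in which its log2 fold change is at
    or below a (hypothetical) threshold."""
    counts = Counter()
    for fold_changes in log2fc_by_comparison.values():
        for mirna, log2fc in fold_changes.items():
            if log2fc <= threshold:
                counts[mirna] += 1
    return counts


# Toy log2 fold changes for two of the ten visit comparisons (not study data)
log2fc = {
    ("BL", "V02"): {"miR-A": -0.7, "miR-B": -0.6, "miR-C": 0.2},
    ("BL", "V08"): {"miR-A": -1.1, "miR-B": -0.3, "miR-C": -0.1},
}
counts = downregulation_counts(log2fc)
depleted = sorted(counts)                      # downregulated in >= 1 comparison
most_consistent = counts.most_common(1)[0][0]  # downregulated most often
print(depleted, most_consistent)
```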

For a more global view of effects, we carried out an analysis of variance (ANOVA) of miRNA expression across individuals, cohorts, genetic status, clinical visits, age bins and Hoehn and Yahr stages. Among the resulting sets of several hundred significant miRNAs (Fig. 3t), the largest pool is shared between all covariates and clinical variables. In addition, the much smaller set of downregulated miRNAs (Fig. 3q) is largely concordant with the ANOVA results (Supplementary Table 8). These consistent findings not only confirm that sncRNAs have diagnostic and prognostic potential for PD but also point to a general age-dependent variability of miRNA biomarkers.
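As a simplified illustration, the sketch computes a one-way ANOVA F statistic for one miRNA grouped by a single covariate (for example, age bins). Treating each covariate separately in a one-way design is an assumption; the section does not specify the exact model. For P values, `scipy.stats.f_oneway` computes the same F statistic together with its P value.

```python
def one_way_anova(groups):
    """F statistic and degrees of freedom of a one-way ANOVA.

    groups: list of lists, one list of expression values per covariate level."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy expression of one miRNA in three hypothetical age bins (not study data)
f_stat, df_b, df_w = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f_stat, df_b, df_w)
```

Repeating this for each covariate and collecting the miRNAs that pass a significance cutoff would give one set per covariate, whose overlaps are what a comparison like Fig. 3t summarizes.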

### PD shows two molecular ages of onset and patterns of dysregulation are highly age-dependent

The PPMI cohort covers most of the average human lifespan, and we leveraged miRNA expression to test whether molecular ages of disease onset exist. Indeed, the number of miRNAs deregulated in PD is highly age-dependent and shows two ages of onset: in the 30s and from the late 60s onward (Fig. 4a). The number of differentially expressed miRNAs declines steadily in both iPD and gPD, and a similar pattern was observed even for patients with SWEDD. Prodromal individuals seem to have a late molecular onset, and their number of deregulated miRNAs increases drastically with age. Notably, the molecular miRNA landscape shows distinctive features when the direction of dysregulation is also considered (Fig. 4b,c). The trends displayed suggest severe downregulation of miRNAs at the different ages of onset. Also, patients with iPD or gPD show highly distinct courses for these miRNAs at an early age, whereas upregulated miRNAs seem to show the opposite behavior. However, we note that the analysis is confounded by the varying age density of samples across the subcohorts. Especially at younger ages (<40 years), the lower density implies less confident estimates of effect sizes (Fig. 4a–c).
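One way to obtain an age-resolved count of dysregulated miRNAs is a sliding age window: for each window center, compare cases and controls whose age falls within the window and count the miRNAs whose effect exceeds a threshold. The window width, the threshold, the minimum group size and the mean-difference effect size below are all hypothetical placeholders; the minimum group size reflects the caveat above that sparsely populated ages give less confident estimates.

```python
from statistics import mean


def mean_difference(cases, controls):
    """Placeholder effect size: difference of group means."""
    return mean(cases) - mean(controls)


def dysregulated_by_age(ages, is_case, expr, centers, half_width=5.0,
                        threshold=0.5, min_per_group=3, effect_fn=mean_difference):
    """Number of miRNAs with |effect| >= threshold in sliding age windows.

    ages, is_case: per-sample age and case/control flag.
    expr: miRNA -> per-sample expression values (same order as ages).
    Windows with fewer than min_per_group cases or controls are skipped."""
    counts = {}
    for center in centers:
        window = [i for i, age in enumerate(ages) if abs(age - center) <= half_width]
        cases = [i for i in window if is_case[i]]
        controls = [i for i in window if not is_case[i]]
        if len(cases) < min_per_group or len(controls) < min_per_group:
            continue  # too sparse for a confident effect estimate
        counts[center] = sum(
            1 for values in expr.values()
            if abs(effect_fn([values[i] for i in cases],
                             [values[i] for i in controls])) >= threshold
        )
    return counts


# Toy data (not study data)
ages = [30, 31, 32, 33, 34, 35, 60, 61, 62, 63, 64, 65]
is_case = [True, True, True, False, False, False] * 2
expr = {
    "miR-A": [5, 5, 5, 1, 1, 1, 2, 2, 2, 2, 2, 2],  # differs only in the young window
    "miR-B": [1] * 12,                               # never differs
}
result = dysregulated_by_age(ages, is_case, expr, centers=[32, 47, 62])
print(result)
```

The window centered at 47 years contains no samples and is therefore skipped rather than reported as zero, which keeps sparse age ranges from being mistaken for an absence of effects.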

### Validation in an independent cohort

Although samples from many clinical sites are included, biomarker discovery calls for validation in an independent cohort and with a different technology. We thus analyzed the NCER-PD cohort of the Luxembourg Parkinson’s Study, which comprises 1,440 whole-blood-derived microarray samples from 988 donors, each with up to four clinical visits (Supplementary Table 9). Of the participants, 440 were diagnosed with iPD, 81 with Parkinsonism (atypical forms of PD) and 485 were age-matched healthy controls. Among the Parkinsonism subcohort, there were cases of progressive supranuclear palsy (n = 25), unspecified Parkinsonism (n = 13), cerebrovascular disease with Parkinsonism features (n = 13), multiple system atrophy (n = 10), Lewy body dementia (n = 10), cortical-basal syndrome (n = 7) and drug-induced Parkinsonism (n = 3). In total, 640 miRNAs passed quality control, 416 of which overlapped with miRNAs detected in the PPMI cohort. We carried out differential expression analysis between iPD and controls from NCER-PD (Supplementary Table 10) and compared the resulting effect sizes with those of the three comparisons in the PPMI cohort. Indeed, the highest Spearman correlation was obtained for the best-matching comparison in PPMI (iPD versus healthy control; Spearman correlation of 0.43; Fig. 4d and Supplementary Table 11). A 3 × 3 contingency table of the direction of dysregulation for the miRNAs shared between PPMI and NCER-PD yielded a chi-squared statistic of 67.84 and P = 6.479 × 10−14 (α = 0.05), suggesting substantial concordance of our observations. In contrast, the Spearman correlation between NCER-PD and PPMI for gPD versus genetic unaffected decreased, as expected, to 0.21 (Extended Data Fig. 4a,b). To find the biomarker set with the highest concordance between NCER-PD and PPMI, we performed a DynaVenn29 analysis, which suggested a significant overlap (adjusted P = 10 × 10−18) of 222 miRNAs (Extended Data Fig. 4c–e). Comparing the directionality of dysregulation for the five total-PD miRNAs reported above, we found miR-15b-5p and miR-487b-3p to be concordant between the studies, whereas the other three were discordant or had no match. Additionally, of the miRNAs dysregulated in iPD, 17 of the 24 downregulated miRNAs were concordant, whereas the two upregulated miRNAs were not reproducible. A detailed comparison is available in Supplementary Table 11.
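Both statistics used for this validation are straightforward to compute. The sketch below shows Spearman's correlation as the Pearson correlation of average ranks and the Pearson chi-squared statistic for a contingency table. Because a 3 × 3 table has (3 − 1)(3 − 1) = 4 degrees of freedom, and the chi-squared upper tail has a closed form for even degrees of freedom, the reported P value can be checked from the reported statistic alone. The software used in the study is not stated here; `scipy.stats.spearmanr` and `scipy.stats.chi2_contingency` are the usual library equivalents.

```python
import math


def average_ranks(values):
    """1-based ranks; ties receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def chi2_contingency(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_sums) - 1)
    return stat, dof


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability of a chi-squared variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{k < dof/2} (x/2)**k / k!."""
    if dof % 2:
        raise ValueError("this closed form requires an even dof")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


# Direction table (down / unchanged / up) for PPMI x NCER-PD has 4 dof.
p_from_reported_stat = chi2_sf_even_dof(67.84, 4)
print(f"{p_from_reported_stat:.3e}")
```

Evaluating the closed form at the reported statistic of 67.84 gives approximately 6.48 × 10−14, which agrees with the reported P = 6.479 × 10−14 up to rounding of the statistic.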

We also repeated the differential expression analysis of miRNAs along the age axis for NCER-PD and observed a remarkable similarity to the PPMI cohort. Effect sizes of dysregulated miRNAs are highest in the third and seventh decades of life and reach their minimum around the age of 65 years. We did not observe substantial differences between the numbers of downregulated and upregulated miRNAs; however, the former show larger effects at higher ages. Although their patterns resemble those for SWEDD, patients diagnosed with Parkinsonism exhibit three (~50, ~62 and ~80 years) and two (~62 and ~80 years) peaks in the number of downregulated and upregulated miRNAs, respectively. The clear diagnostic patterns, largely consistent between the two cohorts, raise the question of whether the biomarker candidates share similar molecular functions.

### Dysregulated miRNAs resemble hallmarks of PD

Among the most significant hits in a miEAA30 pathway gene set enrichment analysis (GSEA), miRNAs downregulated in PD show an enrichment in the mitochondrion category (Fig. 4e, P = 9.57 × 10−13). Similarly, downregulated miRNAs are enriched in the exosome (Fig. 4f, P = 4.34 × 10−11), microvesicle (P = 4.34 × 10−11) and freely circulating (P = 9.54 × 10−9) categories. These findings also agree with the categories found by an over-representation analysis of the miRNAs validated in the NCER-PD cohort (Extended Data Fig. 4f). Among the associated tissues, the top four are brain (Fig. 4g, P = 1.10 × 10−9), spinal cord (P = 1.59 × 10−8), arachnoid mater (P = 1.19 × 10−7) and dura mater (P = 8.89 × 10−7). Our enrichment analysis yielded almost 200 disease hits, with Alzheimer’s disease as the most significant category. Among the top ten, we also observed neurodegenerative disease miRNAs (P = 1.17 × 10−9), arguing for a broader miRNA signature of such disorders. For biological processes, regulation of helper T cell differentiation (P = 1.36 × 10−6), regulation of neurotransmitter uptake (P = 1.36 × 10−6) and negative regulation of receptor signaling pathway via STAT (P = 4.8 × 10−6) are the most significant. Comparing iPD against gPD yielded two orders of magnitude more categories with substantially smaller P values. For example, an observed depletion of vascular disease miRNAs (P = 4.02 × 10−30) and cellular signaling (Fig. 4h,i) is exclusive to miRNAs dysregulated in iPD. Conversely, we found no significant pathways exclusive to patients from the genetic subcohort. Based on the enrichment analysis of total PD versus controls, we assessed whether blood-borne miRNAs reflect previously characterized molecular traits of PD. Indeed, we recovered a depletion of specific molecular functions and pathways for each of the molecular hallmarks31 of the disease (Fig. 4j).
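The NCER-PD comparison above relies on an enrichment test of a selected miRNA set against annotated categories. A standard way to test such a set for a single category is the one-sided hypergeometric over-representation test sketched below. Note that this is the over-representation variant, not the rank-based GSEA procedure used for the PPMI results, and that a multiple-testing correction across categories (for example, Benjamini–Hochberg) would normally follow. The function name and the toy numbers are hypothetical.

```python
from math import comb


def over_representation_p(n_universe, n_in_category, n_selected, n_overlap):
    """One-sided hypergeometric P value, P(X >= n_overlap): the chance of
    finding at least n_overlap category members among n_selected miRNAs drawn
    without replacement from n_universe miRNAs, n_in_category of which are
    annotated to the category."""
    upper = min(n_in_category, n_selected)
    hits = sum(
        comb(n_in_category, k) * comb(n_universe - n_in_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_universe, n_selected)


# Toy example: 10 miRNAs in total, 5 annotated to a category,
# 5 selected as dysregulated, and all 5 selected ones are annotated
print(over_representation_p(10, 5, 5, 5))
```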

The prominent pathway effects indicate distinct molecular regulatory patterns and raise the question of whether miRNAs also offer prognostic potential. By correlating miRNA expression with time and clinical events, we confirmed the overall trend of decreasing miRNAs (Fig. 4k), with the strongest effects in iPD. Clustering the matrix of patient-wise Spearman correlations between miRNA expression and time reveals several clusters that align with annotations such as Hoehn and Yahr stage (Fig. 4l). These patterns are in line with the observation that the pace of disease progression varies substantially between patients and usually increases with age, calling for a more detailed analysis of miRNAs in the context of individual disease progression.
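A single entry of the patient-wise correlation matrix can be computed as below: for one miRNA, each patient's expression values are correlated with time since baseline. Time in months, the minimum of three visits and the helper names are assumptions for illustration; the clustering step itself is not shown.

```python
import math


def average_ranks(values):
    """1-based ranks; ties receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; NaN when either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def patient_time_correlations(visits_by_patient, min_visits=3):
    """Per-patient Spearman correlation between time and one miRNA's expression.

    visits_by_patient maps patient -> list of (months_since_baseline, expression).
    Patients with fewer than min_visits visits are skipped."""
    result = {}
    for patient, visits in visits_by_patient.items():
        if len(visits) < min_visits:
            continue
        times, values = zip(*visits)
        result[patient] = spearman(list(times), list(values))
    return result


# Toy records for one miRNA (not study data)
records = {
    "P1": [(0, 10.0), (12, 8.0), (24, 5.0)],  # decreasing over time
    "P2": [(0, 1.0), (12, 2.0), (24, 3.0)],   # increasing over time
    "P3": [(0, 4.0), (12, 4.5)],              # only two visits: skipped
}
corr = patient_time_correlations(records)
print(corr)
```

Repeating this for every miRNA gives a patients × miRNAs matrix of correlations that can then be clustered and annotated with clinical variables.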

### MiRNAs are progression markers and constitute progression networks with targeted genes

According to clinical Hoehn and Yahr staging, 68 patients moved to a lower stage (improved symptoms), with a difference in mean Unified Parkinson’s Disease Rating Scale (UPDRS) part III score of −5.22 (V08 versus BL); 601 slowly progressing patients remained in the same stage but showed a corresponding difference of +4.62; and 181 fast-progressing patients reached a higher stage at later visits, with a difference in means of +12.77 (Fig. 5a). The baseline UPDRS part III score varied only marginally between these groups, with mean values of 17.92, 22.41 and 17.2, respectively. These observations underline the importance of longitudinal follow-up for detecting features that discriminate patients at risk of fast disease progression and for assessing their response to putative treatments. A per-patient correlation analysis of miRNA expression in patients with PD from PPMI with at least three visits suggested 71 miRNAs to be positively correlated with time in progressing patients but negatively correlated in nonprogressing patients (Fig. 5b). Conversely, we observed 71 miRNAs that were negatively correlated with time in progressing cases of PD and positively correlated in nonprogressing individuals (Fig. 5c). To test a follow-up hypothesis, we asked whether miRNAs and their target genes are orchestrated in prognostic modules, and we extended the analysis to the transcriptome of the same patients.
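The grouping and the search for miRNAs with opposite time trends can be sketched as follows. Patients are assigned to improved, stable or progressing groups by the change in Hoehn and Yahr stage, mirroring the three groups described above; miRNAs are then selected whose mean time correlation has opposite signs in progressing and nonprogressing patients. How nonprogressing patients were defined, the cutoff `min_abs` and the miRNA names are assumptions for illustration.

```python
def classify_progression(hy_baseline, hy_last):
    """Group a patient by the change in Hoehn and Yahr stage between
    baseline and the last visit."""
    if hy_last < hy_baseline:
        return "improved"
    if hy_last == hy_baseline:
        return "stable"
    return "progressing"


def opposite_trend_mirnas(corr_progressing, corr_nonprogressing, min_abs=0.3):
    """miRNAs whose mean correlation with time has opposite signs in the two
    groups (hypothetical cutoff min_abs).
    Returns (rising_in_progressing, falling_in_progressing)."""
    rising, falling = [], []
    for mirna in sorted(set(corr_progressing) & set(corr_nonprogressing)):
        a, b = corr_progressing[mirna], corr_nonprogressing[mirna]
        if a >= min_abs and b <= -min_abs:
            rising.append(mirna)
        elif a <= -min_abs and b >= min_abs:
            falling.append(mirna)
    return rising, falling


# Toy mean correlations with time (not study data)
progressing = {"miR-A": 0.5, "miR-B": -0.6, "miR-C": 0.4}
nonprogressing = {"miR-A": -0.4, "miR-B": 0.5, "miR-C": 0.35}
print(classify_progression(2, 3), opposite_trend_mirnas(progressing, nonprogressing))
```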

Comparing the type of correlation for all miRNA–mRNA pairs, we observed predominantly linear dependencies, although some exceptions exist (Fig. 5d). The miRNA–mRNA correlations in PD and in controls agreed closely but with some deviations (Fig. 5e). Notably, some miRNAs are almost perfectly anticorrelated with potential targets (Fig. 5f). As interactions between individual miRNAs and mRNAs can be unspecific, we assumed a many-to-many relationship and constructed bipartite regulatory graphs. Based on anticorrelated pairs exceeding identical edge thresholds, we constructed putative miRNA–mRNA core networks for pooled controls, iPD and gPD (Fig. 5g–i). Of note, network complexity, for example, the number of nodes and the edge density, differed between the three sample groups. Both disease-derived networks display an enlarged number of miRNAs and targets, with some nodes being shared (such as let-7b-5p) and some being exclusive to the disease (such as miR-25-5p). Examining possible network changes due to disease progression required recomputing graph components for downregulated miRNAs whose putative targets are upregulated in progressing PD, and vice versa. Thereby, we identified three core components. First, upregulated targets in patients with progressive PD are potentially regulated by four downregulated miRNAs (Fig. 5j). Two further subnetworks, with the upregulated miRNA pairs hsa-miR-769-5p and hsa-miR-140-3p, and hsa-miR-3157-3p and hsa-miR-5690, showed downregulated and partially shared target genes (Fig. 5k,l).
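The network construction can be sketched in three steps: keep miRNA–gene pairs whose correlation falls at or below an edge threshold, build a bipartite graph with typed nodes, and extract its connected components. The threshold of −0.5, the definition of bipartite edge density as edges divided by all possible miRNA–gene pairs, and the names `miR-A` and `GENE1` are assumptions for illustration; the study's actual thresholds are not given in this section.

```python
from collections import defaultdict


def anticorrelated_edges(correlations, threshold=-0.5):
    """Keep (miRNA, gene) pairs whose correlation is at or below the threshold."""
    return [(m, g) for (m, g), r in correlations.items() if r <= threshold]


def bipartite_components(edges):
    """Connected components of the bipartite miRNA-gene graph (iterative DFS).
    Nodes are tagged with their type so miRNA and gene names cannot collide."""
    adjacency = defaultdict(set)
    for mirna, gene in edges:
        adjacency[("miRNA", mirna)].add(("gene", gene))
        adjacency[("gene", gene)].add(("miRNA", mirna))
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def bipartite_density(edges):
    """Edges present divided by all possible miRNA-gene pairs."""
    mirnas = {m for m, _ in edges}
    genes = {g for _, g in edges}
    return len(edges) / (len(mirnas) * len(genes)) if edges else 0.0


# Toy correlations (not study data)
correlations = {
    ("miR-A", "GENE1"): -0.8,
    ("miR-A", "GENE2"): -0.6,
    ("miR-B", "GENE3"): -0.7,
    ("miR-B", "GENE1"): 0.4,   # positive: not a putative repression edge
}
edges = anticorrelated_edges(correlations)
components = bipartite_components(edges)
print(len(edges), sorted(len(c) for c in components), bipartite_density(edges))
```

Applying the same threshold to each sample group, as the text describes, makes node counts and edge densities directly comparable between the control, iPD and gPD networks.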

## Discussion

As the world population ages, the incidence of neurodegenerative disorders is rising steadily. To counteract these developments, large-scale disease progression studies are of immense value for the exploration of curative treatments. PPMI has accumulated a high-quality dataset of RNAs sequenced from the whole blood of controls and patients with PD at multiple disease stages on an unprecedented scale. Our in-depth analysis of this dataset highlights the complex changes in RNA expression with aging and disease onset. Although we generated data on essentially all small RNA classes, we focus on the interpretation of miRNAs, which represent the majority of reads. Given the careful cohort design and the statistical power afforded by PPMI, the effects are considerable and highly significant. Surprisingly, miRNA expression signatures distinguish between iPD and gPD, with larger effects in the former.

Reflecting on these findings and systematically comparing high-profile biomarkers with those of previous studies is a challenging task. Studies rely on different body fluids (such as serum, plasma, peripheral blood mononuclear cells (PBMCs) and whole blood) or different technologies (such as microarrays, next-generation sequencing (NGS) and quantitative PCR with reverse transcription), and cohorts differ in various aspects (such as age distribution, ethnic background and clinical patient characterization). Further, miRNAs are often reported imprecisely using outdated nomenclature. We thus took two approaches: we first investigated circulating PD miRNAs as annotated in the Human miRNA and Disease Database (HMDD)32 and then carried out a comprehensive literature search. A detailed inspection of manuscripts containing miRNAs annotated as circulating PD markers in HMDD, however, revealed that cell cultures or solid tissues had also been investigated to a large extent. These manuscripts were excluded because of their inaccurate annotation in HMDD.

The observed downregulation of miR-15b-5p is in line with a proposed signature of five serum miRNAs in patients with PD33 and with an miRNA–mRNA interaction network analysis of PD progression34. Of the miRNAs in the former study, we found miR-185-5p to be concordant with our finding of downregulation. However, other miRNAs identified in that study did not match our data, most notably miR-195-5p. Whether the difference stems from the matrix (serum versus whole blood), the technology or the cohort remains unclear. Another study that compared PBMCs of 19 patients with PD and 13 controls using microarrays identified 18 miRNAs35. While we could confirm downregulation for many candidates (such as hsa-miR-26a-5p, hsa-miR-199a-3p, hsa-miR-199b-3p, hsa-miR-126-5p, hsa-miR-29c-3p, hsa-miR-29b-3p and hsa-miR-335-5p), others did not match. Remarkably, hsa-miR-147a from this study was not even expressed in our dataset. Two further miRNAs related to PD in PBMCs are miR-155-5p (up in PD) and miR-146a-5p (down in PD)36. In contrast, both miRNAs were downregulated in PPMI. Notably, miR-155-5p regulation as reported in the study by Caggiu et al. was correlated with levodopa treatment. Another strongly downregulated miRNA in our study, miR-29a-3p, has been described as downregulated in another small dataset37. Finally, a recent study described plasma miRNAs in 92 healthy controls and 108 individuals with iPD38. The work stands out for its reasonable preselection of brain-enriched miRNAs. Essentially all significant miRNAs showed higher expression in patients with PD, matching our data for miR-7-5p but again with some discordant markers (such as miR-22-3p). It is, however, worth mentioning that both age and sex differed significantly between the case and control cohorts (P < 0.0001), likely confounding miRNA expression.

We also found a systematic decline in miRNA expression over time, revealing a signature of miRNAs associated with major blood cell types of the immune system. Signature miRNAs include several members of the miR-19 and miR-29 families, which is consistent with earlier findings34,39,40,41,42. PD-induced changes in both monocyte and exosome expression signatures have also been reported in several other studies43,44,45,46. Our results also highlight a strong age-related component of miRNA expression, with two distinct waves of molecular onset. Ravanidis et al. recently showed that differentially expressed, brain-enriched miRNAs have discriminatory power between iPD and gPD47. Similarly, we report a highly significant downregulation of mitochondria, exosome and brain-expressed miRNAs over time in PD. In particular, mitochondrial dysfunction seems to play a role in the disease, although it is still unclear whether it is a cause or a consequence of the disease48. Shamir et al.49 described affected gene interaction networks in iPD that are associated with oxidation, ubiquitination and other disease hallmarks, an effect mirrored by the dysregulated miRNAs revealed here. Our review highlights the challenges of comparing markers across studies and yields the expected heterogeneous results. Further comprehensive reviews on this topic are necessary to improve our understanding of stable miRNA biomarkers in PD.

Of interest are also the new miRNA candidates. In this context, "new" means that the miRNAs are either not annotated for Homo sapiens in miRBase release 22.1 or have not been described at all. The new miRNA candidate we report as relevant for PD exemplifies the case in which an miRNA is annotated for model organisms but not for humans. An miRBlast analysis using miRCarta50 revealed that this new human miRNA is known in other species, including Rattus norvegicus and Mus musculus (rno-mir-1839 and mmu-mir-1839). This cross-species conservation increases the likelihood that such candidates are genuine human miRNAs rather than artifacts.

Large-scale biomarker studies require validation in an independent cohort, ideally using an independent technology. In this respect, we observed good concordance between NGS samples from PPMI and microarray samples from NCER-PD. In particular, we provide a list of promising candidates with a concordant direction of dysregulation that merit comprehensive characterization in follow-up work. Several reasons could explain why the observed effect sizes are not fully reproducible between the cohorts. First, different technologies with varying levels of sensitivity and bias were compared. Second, the measured expression of blood-borne miRNAs is sensitive to low-input extraction protocols and varies between donors. Third, the demographic covariates of participants are not entirely matched. Nevertheless, both studies stand out for their breadth of scope, and we found no major biases in either RNA dataset. While NCER-PD also provides a rich set of clinical data as well as genome and transcriptome sequencing, PPMI is more advanced in other specific aspects, such as comprehensive imaging resources and the number of clinical sites involved.

However, the datasets still contain varying numbers of samples for the different PD types and staging schemes, which further complicates the analysis. Likewise, the cohorts do not cover the age range between 30 and 70 years uniformly, potentially influencing age-dependent results. Moreover, the number and classification scheme of distinct PD subtypes are subject to ongoing clinical research. We tried to address this uncertainty by comparing subcohorts in multiple ways, once with pooled controls and once analyzing genetic and non-genetic cohorts independently. Further, the manifestation of rapid eye movement sleep behavior disorder in prodromal individuals has not yet been broadly characterized; however, the absence of motor symptoms suggests an early disease stage for these individuals, who then show a late PD onset, which we confirm from a molecular viewpoint. In contrast, patients with SWEDD stand out, as they are not part of the other clinical continua and exhibit distinct molecular patterns. Further investigation focused specifically on SWEDD is therefore warranted. A few confounding variables also exist: PD subjects were required to be drug-naive at study enrollment but could have been treated with PD medications at any later clinical visit. Our comparison of consecutive clinical visits nevertheless suggests a systematic, drug-independent effect on miRNAs that has been reported earlier51. Further, vague symptoms can lead to diagnostic uncertainty, as reflected by a small subset (<10) of patients with PD from PPMI who received a differential diagnosis at later visits. However, given the stringent design and scale of the presented study, we expect it to be robust to such exceptions.

Even though we found no apparent bias in the data with respect to batch effects, such effects are difficult to quantify in general and can therefore remain hidden. The greatest challenge is to determine the specificity of markers for PD. For instance, miR-144, which we report here, has already been described as a general disease marker52. Also, the heterogeneous mixture of blood cells can potentially skew molecular patterns. Here, single-cell studies will likely improve resolution in the near future.

Our resource provides a valuable dataset that will facilitate and support diverse facets of molecular PD research. This includes both studies of pathophysiological processes in neurodegenerative disorders and research on small noncoding RNAs and how they regulate gene expression. Based on the presented data, we propose candidate diagnostic and prognostic biomarkers suitable for downstream validation. Although the effect sizes of single markers might be limited, advanced machine learning and artificial intelligence are expected to contribute to an improved prognosis of PD by mining marker panels. To this end, it is essential to better understand the causal nature of the small noncoding RNA patterns, for example by modeling their upstream activation mechanisms or by validating entire target pathways. In this context, cellular models of PD combined with CRISPR–Cas9 knockouts could support ongoing efforts to select pivotal candidates for future trials. Finally, once causative cascades are delineated, small noncoding RNAs are on their way to becoming a new class of therapeutic targets for PD and likely for other neurodegenerative disorders as well. In summary, the presented analysis and comprehensive RNA-seq resource provide an excellent foundation for analyses of blood-borne biomarkers with great statistical power.

## Methods

### Ethical compliance

We complied with all relevant ethical regulations. Written informed consent was obtained from each participant of PPMI and NCER-PD before any study procedures and analyses22,25. The PPMI protocol, recruitment materials and model informed consent form were approved by the University of Rochester’s Institutional Review Board (IRB). The PPMI protocol, recruitment materials and informed consent form, inclusive of any site-specific modifications to the informed consent form, were approved by the IRB of each site. Finally, the consent form used by participants who underwent genetic screening before enrollment in PPMI was approved by the Indiana University School of Medicine’s IRB. A complete study protocol, as well as additional documentation outlining acceptable data use, is available through the PPMI website at: https://www.ppmi-info.org/study-design/research-documents-and-sops/. According to this documentation, the PPMI study has been conducted in accordance with good clinical practice and International Conference on Harmonization guidelines. For NCER-PD, all participants provided written informed consent, and the collection was approved by the National Ethics Board (CNER ref. 201407/13) and the Data Protection Committee (CNPD ref. 446/2017).

### Cohort design and blood sample collection

Data used in the preparation of this article were obtained from the PPMI database (https://www.ppmi-info.org/data). For up-to-date information on the study, visit https://www.ppmi-info.org/.

A total of 5,450 longitudinal patient blood samples from 1,614 individuals were collected as part of the PPMI. Patient blood samples were collected by venous draw during periodic clinical visits alongside clinical and imaging data at each longitudinal time point. Blood samples were captured in 2 × 2.5-ml PAXgene Vacutainer tubes, mixed by serial inversion and incubated at room temperature (18–25 °C) for 24 h and stored at −80 °C before purification. One microgram of RNA purified according to the QIAGEN PAXgene blood miRNA kit protocol (cat. no. 763134) was used for analysis. A comprehensive study overview, complete sample collection schedules and clinical protocols are available at https://www.ppmi-info.org/.

### RNA extraction and microarray experiments from NCER-PD

The microarray data used here were published previously53; samples and data were processed as described in the following. Briefly, RNA eluates were quantified using a Nanodrop ND-1000 instrument (Thermo Fisher Scientific). Quality control was performed using an Agilent 2100 Bioanalyzer according to the manufacturer’s instructions (Agilent Technologies). Expression profiles of all human miRNAs in miRBase release v21 were determined using Agilent SurePrint G3 Human miRNA (8 × 60K) microarray slides. Each array targets 2,549 miRNAs with 20 replicates per probe. Then, 300 ng total RNA was dephosphorylated, labeled and hybridized using Agilent’s miRNA Complete Labeling and Hybridization kit according to the manufacturer’s protocol. After hybridization for 20 h at 55 °C, the slides were washed twice and scanned using Agilent’s High-Resolution Microarray Dx Scanner. Scan images were converted to raw text data using Feature Extraction Software v.12.0.3.1 (Agilent Technologies). All participants of the Luxembourg Parkinson’s Study were comprehensively genotyped via NeuroChip54.

### Quality control and sample preprocessing

For the PPMI whole-transcriptome data, binary base call files were converted to FASTQ files using bcl2fastq v.1.8.4, and FASTQ files were merged and quantified with Salmon (v.0.7.2) based on GRCh37 (hs37d5) and GENCODE v.19. For the combined analysis of miRNA and mRNA, miRNA counts were normalized to reads per million (RPM) mapped to miRNAs, mRNA counts were normalized to transcripts per million (TPM), and both were log2-transformed. During downstream analysis, only paired samples of each individual with a unique match of sequencing identifiers were considered.
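
The normalization described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `rpm_log2` and `log2_tpm` are hypothetical, and the pseudocount of 1 added before the log2 transform is our own assumption, since the Methods do not state how zero counts were handled. TPM values for mRNAs are taken as given, because Salmon reports them directly in its quantification output.

```python
import math


def rpm_log2(counts, pseudocount=1.0):
    """Convert raw miRNA read counts of one sample to log2(RPM + pseudocount).

    counts: dict mapping miRNA name -> number of reads mapped to that miRNA.
    RPM is computed relative to the total number of reads mapped to miRNAs.
    The pseudocount (an assumption) avoids log2(0) for undetected miRNAs.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no miRNA-mapped reads")
    return {
        mirna: math.log2(n / total * 1e6 + pseudocount)
        for mirna, n in counts.items()
    }


def log2_tpm(tpm, pseudocount=1.0):
    """log2-transform mRNA TPM values (e.g., as reported by Salmon)."""
    return {gene: math.log2(v + pseudocount) for gene, v in tpm.items()}


sample = {"hsa-miR-15b-5p": 300, "hsa-miR-29a-3p": 100, "hsa-miR-147a": 0}
print(rpm_log2(sample))
```

Scaling by miRNA-mapped reads rather than total library size matches the "RPM mapped to miRNAs" wording; other denominators would shift all values by a per-sample constant.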

For the NCER-PD microarray evaluation, samples were processed as previously described30. From the initial 1,553 samples collected, we removed technical (pool) replicates and patients with an unclear or undetermined diagnosis, retaining 1,440 samples for the analysis. Based on the computed sample detection matrix, features were filtered to exclude miRNAs with a detection rate <50% in each subcohort (idiopathic PD, Parkinsonism, control). Expression signals were quantile-normalized and log2-transformed. The entire sample preprocessing procedure was implemented in R v.3.5.1 with the data.table v.1.12.0, bit64 v.0.9.7 and preprocessCore v.1.46.0 libraries and in Python v.3.6.7 with NumPy v.1.16.4.
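
A minimal pure-Python sketch of the filtering and normalization steps follows, under stated assumptions. We read the filter as "drop a miRNA only if its detection rate is below 50% in every subcohort"; the wording also admits a stricter reading. The quantile normalization breaks ties by position, whereas R implementations such as preprocessCore may average tied ranks, and the log2 step assumes strictly positive signals. All function names are hypothetical.

```python
import math


def keep_by_detection(detected, groups, min_rate=0.5):
    """Return miRNAs detected in >= min_rate of samples in at least one subcohort.

    detected: dict miRNA -> list of booleans (one per sample).
    groups: subcohort labels (e.g., "iPD", "Parkinsonism", "control")
            aligned with the samples.
    """
    members = {}
    for i, label in enumerate(groups):
        members.setdefault(label, []).append(i)
    kept = []
    for mirna, flags in detected.items():
        rates = [sum(flags[i] for i in idx) / len(idx) for idx in members.values()]
        if max(rates) >= min_rate:
            kept.append(mirna)
    return kept


def quantile_normalize(samples):
    """Map each sample (a list of feature signals) onto the mean sorted distribution.

    The k-th smallest value of every sample is replaced by the mean of the
    k-th smallest values across all samples; ties are broken by position.
    """
    n_features = len(samples[0])
    orders = [sorted(range(n_features), key=s.__getitem__) for s in samples]
    reference = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n_features)
    ]
    normalized = []
    for order in orders:
        row = [0.0] * n_features
        for rank, j in enumerate(order):
            row[j] = reference[rank]
        normalized.append(row)
    return normalized


def log2_all(samples):
    """log2-transform normalized signals (assumes strictly positive values)."""
    return [[math.log2(v) for v in s] for s in samples]
```

After quantile normalization every sample shares the same value distribution, so remaining differences reflect the relative ordering of miRNAs within each sample.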

### Statistics, bioinformatics and data analysis

In the context of this manuscript, no statistical methods were used to predetermine sample sizes, but our sample sizes are described in the respective study manuscripts of PPMI25 and NCER-PD22. Both studies use nonrandom allocation of participants into multiple disease groups and control groups, which were matched by age, sex and treatment (where applicable). Data collection at vendor sites was performed blinded, but analysis was not performed blinded to the conditions of the experiments. Data distribution was assumed to be normal; tests for normality were performed but not enforced for downstream analysis (Supplementary Information). Primary analysis was conducted with Snakemake v.5.5.4 in combination with standard analysis code written in R v.3.5.1 and Python v.3.6.7 using dedicated packages for statistics and graphics. Software packages were organized and loaded using the dependency management tool conda v.4.8.3. Downstream analysis was performed in R with the software packages listed in the following. For differential expression analysis, log2 fold changes based on normalized expression $e$ were calculated as $\log_2\left(\frac{e_{\text{case}}}{e_{\text{control}}}\right)$, and effect sizes were calculated with the function cohen.d from the effsize v.0.7.6 package with arguments ($e_{\text{case}}$, $e_{\text{control}}$). Statistical tests (Student’s t-test, Wilcoxon rank-sum test, Shapiro–Wilk test, one-way ANOVA/F-test, Kruskal–Wallis test) were performed two-tailed using the R stats implementations. Wherever applicable, 95% confidence intervals were used. All plots shown were generated using base R functionality and/or functions from the ggplot2 v.3.2.1, hexbin v.1.28.1, pheatmap v.1.0.12, RColorBrewer v.1.1.2, viridis v.0.5.1, cowplot v.1.0.0, ggrepel v.0.8.1, ggsci v.2.9, gplots v.3.0.1.2, ggpubr v.0.2.4, highcharter v.0.7.0, fmsb v.0.7.0, gridGraphics v.0.4.1, ggrastr v.0.1.7, ggformula v.0.9.2, ggExtra v.0.9, UpSetR v.1.4.0 and ComplexHeatmap v.2.2.0 packages.
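
The two differential expression statistics can be sketched as follows. This is an illustration, not the authors' R code: we assume that $e_{\text{case}}$ and $e_{\text{control}}$ are group means of normalized, non-log expression values, and we compute Cohen's d with a pooled standard deviation and no Hedges correction, which to our understanding mirrors the default behavior of effsize's cohen.d. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2(e_case / e_control), taking e as the group mean (an assumption)."""
    return math.log2(mean(case) / mean(control))


def cohens_d(case, control):
    """Cohen's d using the pooled sample standard deviation (no Hedges correction)."""
    n1, n2 = len(case), len(control)
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(case) - mean(control)) / math.sqrt(pooled_var)


case = [2.0, 4.0, 6.0]
control = [1.0, 2.0, 3.0]
print(log2_fold_change(case, control))  # 1.0
print(round(cohens_d(case, control), 3))  # 1.265
```

The fold change captures the magnitude of the shift, whereas Cohen's d scales the mean difference by within-group variability; reporting both distinguishes large but noisy shifts from small but consistent ones.
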
Dimension reduction was performed by PCA using the prcomp function from R stats and by UMAP55 using the umap v.0.2.3.1 package. Network analyses were performed using igraph v.1.2.4.2, ggraph v.2.0.1, networkD3 v.0.4, Rgraphviz v.2.30.0 and gRbase v.1.8.3.4. Common data manipulation tasks were implemented using data.table v.1.12.8, openxlsx v.4.1.4, scales v.1.1.0, stringr v.1.4.0 and Rfast v.1.9.5. Precursor secondary structures and read profiles were extracted from the web reports of miRMaster. For the blood component deconvolution of miRNAs, we used a miRNA × cell-type matrix from a previously published NGS dataset (GSE100467)56. First, rows were transformed into percentages by dividing each row by its sum. Next, percentages across origins were obtained by dividing each column of this row-percentage matrix by its column sum. Then, values were extracted and normalized for all miRNAs of interest. One-way ANOVA P values were obtained using the aov function of the R stats package. In all cases, P values were corrected using the Benjamini–Hochberg FDR-controlling procedure, also implemented in R stats. Adjusted P values <0.05 were considered significant unless stated otherwise. For the aging trajectories of deregulated miRNAs, effect sizes were computed for binned ages at study consent using a sliding window approach. For each integer i in the interval [30, 80], the effect size (Cohen’s d) was calculated for each miRNA using samples from the age bin (i, i + 10). We required at least ten samples from both the case and control groups and a minimum absolute effect size of 0.3 to count a miRNA as deregulated at age i. The resulting binary matrix of dimension miRNA × age bins was summed column-wise to yield a vector containing the number of miRNAs de-/up-/downregulated at each age bin.
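
The sliding-window count behind the aging trajectories can be sketched in pure Python. This is a minimal illustration under assumptions: bins are read as half-open intervals [i, i + 10), the 0.3 cutoff is applied as |d| ≥ 0.3, and each sample is a hypothetical dict with "age", "is_case" and "expr" keys. The subsequent smoothing with smooth.spline is not reproduced here.

```python
import math
from statistics import mean, variance


def cohens_d(case, control):
    """Cohen's d with pooled SD; returns None if the pooled SD is zero."""
    n1, n2 = len(case), len(control)
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return None
    return (mean(case) - mean(control)) / math.sqrt(pooled_var)


def deregulated_per_age_bin(samples, mirnas, start=30, stop=80, width=10,
                            min_n=10, min_abs_d=0.3):
    """Count de-/up-/downregulated miRNAs in sliding age bins.

    samples: list of dicts with "age" (age at consent), "is_case" (bool) and
             "expr" (dict miRNA -> normalized expression).
    Returns the list of bin starts and a dict of per-bin counts; summing the
    per-miRNA indicators per bin is equivalent to the column-wise sum of the
    binary miRNA x age-bin matrix.
    """
    starts = list(range(start, stop + 1))
    counts = {"de": [], "up": [], "down": []}
    for i in starts:
        in_bin = [s for s in samples if i <= s["age"] < i + width]
        cases = [s for s in in_bin if s["is_case"]]
        controls = [s for s in in_bin if not s["is_case"]]
        up = down = 0
        if len(cases) >= min_n and len(controls) >= min_n:
            for m in mirnas:
                d = cohens_d([s["expr"][m] for s in cases],
                             [s["expr"][m] for s in controls])
                if d is not None and abs(d) >= min_abs_d:
                    if d > 0:
                        up += 1
                    else:
                        down += 1
        counts["up"].append(up)
        counts["down"].append(down)
        counts["de"].append(up + down)
    return starts, counts
```

Bins with fewer than ten cases or controls contribute zero counts, so sparsely sampled ages cannot produce spurious peaks in the trajectory.
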
Final trajectories were obtained by applying the function smooth.spline from R stats with the age bins as predictor variable and the summarized counts as response variable, allowing eight degrees of freedom. The upper and lower 95% confidence bands were calculated using the jackknife residuals. Spearman and Pearson correlation coefficients were computed with the cor function of R stats. To estimate the influence of batch variables on expression and the proportion of expression variance associated with the annotation variables, we performed PVCA using the function pvcaBatchAssess from the R package pvca v.1.28.0. MiRNA set enrichment (Kolmogorov–Smirnov test) and over-representation analyses (hypergeometric tests) were performed with miEAA 2.0 (ref. 30). Significant overlaps between miRNAs detected in the NGS and microarray data were analyzed with DynaVenn29, once for the two lists of miRNAs sorted by increasing area under the curve and once sorted by decreasing area under the curve. Patients were classified as progressing if their Hoehn and Yahr stage at the last available clinical visit had increased relative to baseline (BL), and as nonprogressing otherwise. Patient-wise Spearman correlation of expression and time (clinical visits) was calculated and averaged over all patients for each miRNA. For the regulatory network analysis, we calculated Spearman’s ρ between paired miRNA and mRNA samples and investigated the subnetworks spanned by pairs whose anticorrelation reached a threshold of ρ ≤ −0.35. On a case-by-case basis for the different group comparisons, edge thresholds were further lowered numerically (that is, made more stringent) to yield smaller but more specific networks. In these subnetworks, we searched for strongly connected components.
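
The anticorrelation network step can be sketched without external libraries. This pure-Python stand-in for R's cor(method = "spearman") and igraph is not the authors' code: function names are hypothetical, a constant expression vector is assigned ρ = 0 by assumption, and because the miRNA–mRNA graph is undirected, its strongly connected components coincide with ordinary connected components, which the breadth-first search below finds.

```python
from collections import defaultdict, deque


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (0.0 if constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def anticorrelation_modules(mirna_expr, mrna_expr, threshold=-0.35):
    """Connected components of the miRNA-mRNA graph with edges where rho <= threshold.

    mirna_expr, mrna_expr: dict name -> values over the same paired samples.
    """
    adjacency = defaultdict(set)
    for mirna, x in mirna_expr.items():
        for gene, y in mrna_expr.items():
            if spearman(x, y) <= threshold:
                adjacency[("miRNA", mirna)].add(("mRNA", gene))
                adjacency[("mRNA", gene)].add(("miRNA", mirna))
    seen, modules = set(), []
    for node in adjacency:
        if node in seen:
            continue
        module, queue = set(), deque([node])
        seen.add(node)
        while queue:
            current = queue.popleft()
            module.add(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(module)
    return modules
```

Lowering `threshold` (for example to −0.5) removes weaker edges and splits large modules into smaller, more specific ones, as described for the case-by-case comparisons.
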
For the progression networks, we computed the miRNA–mRNA correlations separately for progressing and nonprogressing patients and kept miRNAs whose mean correlation with time had opposite signs in progressing versus nonprogressing patients. Next, we discarded miRNAs and mRNAs whose mean expression shift between the first and last clinical visits of progressing patients was <0.05. The final networks were then obtained as described above. The final manuscript figures were compiled using Microsoft PowerPoint v.16.36.20041300. A comprehensive description of the reproducibility of statistical results, as required by Nature Research journals, is given in the Life Sciences Reporting Summary.
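
The progression filters can be sketched as follows, under assumptions: each patient is a hypothetical dict holding Hoehn and Yahr stages per visit ("hy", baseline first), visit times ("visits") and per-feature expression aligned with the visits ("expr"). The shift filter is applied to the absolute value of the mean change, which the Methods do not state explicitly. The surviving features would then enter the anticorrelation network construction described above.

```python
from statistics import mean


def _ranks(v):
    """1-based average ranks (ties share the mean position)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def _spearman(x, y):
    """Spearman's rho via Pearson correlation of ranks (0.0 for constant input)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def is_progressing(hoehn_yahr):
    """Progressing if the stage at the last visit exceeds baseline (index 0)."""
    return hoehn_yahr[-1] > hoehn_yahr[0]


def mean_time_correlation(patients, feature):
    """Average over patients of Spearman's rho between expression and visit time."""
    return mean(_spearman(p["visits"], p["expr"][feature]) for p in patients)


def flipped_features(patients, features):
    """Features whose mean time correlation has opposite signs in
    progressing versus nonprogressing patients."""
    prog = [p for p in patients if is_progressing(p["hy"])]
    nonprog = [p for p in patients if not is_progressing(p["hy"])]
    return [f for f in features
            if mean_time_correlation(prog, f) * mean_time_correlation(nonprog, f) < 0]


def shifted_features(patients, features, min_shift=0.05):
    """Features whose mean first-to-last-visit change in progressing patients
    has an absolute value of at least min_shift."""
    prog = [p for p in patients if is_progressing(p["hy"])]
    return [f for f in features
            if abs(mean(p["expr"][f][-1] - p["expr"][f][0] for p in prog)) >= min_shift]
```
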

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

Individual-level PPMI RNA-seq data supporting the findings of this work are deposited for distribution in accordance with approved ethics and data-oversight requirements. Access to the full complement of standardized protocols and de-identified human participant data associated with this study is available to researchers through the study data repository record at https://ppmi-info.org. The authors declare that all other data supporting the findings of this study are freely available upon request from the corresponding author.

## Code availability

The computer code written for the primary analysis is available upon request from the corresponding author.

## References

1. Wang, H. et al. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388, 1459–1544 (2016).
2. Deweerdt, S. Parkinson’s disease: 4 big questions. Nature 538, S17 (2016).
3. Kalia, L. V. & Lang, A. E. Parkinson’s disease. Lancet 386, 896–912 (2015).
4. Jankovic, J. Parkinson’s disease: clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 79, 368–376 (2008).
5. Hamza, T. H. et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nat. Genet. 42, 781–785 (2010).
6. Klemann, C. J. H. M. et al. Integrated molecular landscape of Parkinson’s disease. NPJ Parkinsons Dis. 3, 14 (2017).
7. Scherzer, C. R. et al. Molecular markers of early Parkinson’s disease based on gene expression in blood. Proc. Natl Acad. Sci. USA 104, 955 (2007).
8. Li, Y. I., Wong, G., Humphrey, J. & Raj, T. Prioritizing Parkinson’s disease genes using population-scale transcriptomic data. Nat. Commun. 10, 994 (2019).
9. Wang, Q. et al. The landscape of multiscale transcriptomic networks and key regulators in Parkinson’s disease. Nat. Commun. 10, 5234 (2019).
10. Calligaris, R. et al. Blood transcriptomics of drug-naive sporadic Parkinson’s disease patients. BMC Genomics 16, 876 (2015).
11. Chen-Plotkin, A. S. Blood transcriptomics for Parkinson disease? Nat. Rev. Neurol. 14, 5–6 (2018).
12. Wang, C., Chen, L., Yang, Y., Zhang, M. & Wong, G. Identification of potential blood biomarkers for Parkinson’s disease by gene expression and DNA methylation data integration analysis. Clin. Epigenetics 11, 24 (2019).
13. Hossein-nezhad, A. et al. Transcriptomic profiling of extracellular RNAs present in cerebrospinal fluid identifies differentially expressed transcripts in Parkinson’s disease. J. Parkinsons Dis. 6, 109–117 (2016).
14. Marz, M., Ferracin, M. & Klein, C. MicroRNAs as biomarker of Parkinson disease? Neurology 84, 636 (2015).
15. Leggio, L. et al. microRNAs in Parkinson’s disease: from pathogenesis to novel diagnostic and therapeutic approaches. Int. J. Mol. Sci. https://doi.org/10.3390/ijms18122698 (2017).
16. Starhof, C. et al. The biomarker potential of cell-free microRNA from cerebrospinal fluid in Parkinsonian syndromes. Mov. Disord. 34, 246–254 (2019).
17. Keller, A. et al. Toward the blood-borne miRNome of human diseases. Nat. Methods 8, 841–843 (2011).
18. Leidinger, P. et al. A blood based 12-miRNA signature of Alzheimer disease patients. Genome Biol. 14, R78 (2013).
19. Keller, A. et al. Validating Alzheimer’s disease micro RNAs using next-generation sequencing. Alzheimers Dement. 12, 565–576 (2016).
20. Ludwig, N. et al. Machine learning to detect Alzheimer’s disease from circulating non-coding RNAs. Genomics Proteomics Bioinformatics 17, 430–440 (2019).
21. Fehlmann, T. et al. Evaluating the use of circulating microRNA profiles for lung cancer detection in symptomatic patients. JAMA Oncol. https://doi.org/10.1001/jamaoncol.2020.0001 (2020).
22. Hipp, G. et al. The Luxembourg Parkinson’s Study: a comprehensive approach for stratification and early diagnosis. Front. Aging Neurosci. https://doi.org/10.3389/fnagi.2018.00326 (2018).
23. Valentine, M. N. Z. et al. Multi-year whole-blood transcriptome data for the study of onset and progression of Parkinson’s disease. Sci. Data 6, 20 (2019).
24. Lawton, M. et al. Blood biomarkers with Parkinson’s disease clusters and prognosis: the Oxford discovery cohort. Mov. Disord. 35, 279–287 (2020).
25. Marek, K. et al. The Parkinson’s Progression Markers Initiative (PPMI) – establishing a PD biomarker cohort. Ann. Clin. Translat. Neurol. 5, 1460–1477 (2018).
26. Ludwig, N. et al. Bias in recent miRBase annotations potentially associated with RNA quality issues. Sci. Rep. https://doi.org/10.1038/s41598-017-05070-0 (2017).
27. Ludwig, N. et al. Small ncRNA-seq results of human tissues: variations depending on sample integrity. Clin. Chem. 64, 1074–1084 (2018).
28. Fehlmann, T. et al. Web-based NGS data analysis using miRMaster: a large-scale meta-analysis of human miRNAs. Nucleic Acids Res. 45, 8731–8744 (2017).
29. Amand, J., Fehlmann, T., Backes, C. & Keller, A. DynaVenn: web-based computation of the most significant overlap between ordered sets. BMC Bioinformatics 20, 743 (2019).
30. Kern, F. et al. miEAA 2.0: integrating multi-species microRNA enrichment analysis and workflow management systems. Nucleic Acids Res. 48, W521–W528 (2020).
31. Antony, P. M. A., Diederich, N. J., Krüger, R. & Balling, R. The hallmarks of Parkinson’s disease. FEBS J. 280, 5981–5993 (2013).
32. Huang, Z. et al. HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Res. 47, D1013–D1017 (2019).
33. Ding, H. et al. Identification of a panel of five serum miRNAs as a biomarker for Parkinson’s disease. Parkinsonism Relat. Disord. 22, 68–73 (2016).
34. Liu, X. et al. miRNAs and target genes in the blood as biomarkers for the early diagnosis of Parkinson’s disease. BMC Syst. Biol. 13, 10 (2019).
35. Martins, M. et al. Convergence of miRNA expression profiling, α-synuclein interaction and GWAS in Parkinson’s disease. PLoS ONE 6, e25443 (2011).
36. Caggiu, E. et al. Differential expression of miRNA 155 and miRNA 146a in Parkinson’s disease patients. eNeurologicalSci 13, 1–4 (2018).
37. Chi, J. et al. Integrated analysis and identification of novel biomarkers in Parkinson’s disease. Front. Aging Neurosci. 10, 178 (2018).
38. Ravanidis, S. et al. Validation of differentially expressed brain-enriched microRNAs in the plasma of PD patients. Ann. Clin. Translat. Neurol. 7, 1594–1607 (2020).
39. Botta-Orfila, T. et al. Identification of blood serum micro-RNAs associated with idiopathic and LRRK2 Parkinson’s disease. J. Neurosci. Res. 92, 1071–1077 (2014).
40. Bai, X. et al. Downregulation of blood serum microRNA 29 family in patients with Parkinson’s disease. Sci. Rep. 7, 5411 (2017).
41. Cao, X.-Y. et al. MicroRNA biomarkers of Parkinson’s disease in serum exosome-like microvesicles. Neurosci. Lett. 644, 94–99 (2017).
42. Barbagallo, C. et al. Specific signatures of serum miRNAs as potential biomarkers to discriminate clinically similar neurodegenerative and vascular-related diseases. Cell. Mol. Neurobiol. https://doi.org/10.1007/s10571-019-00751-y (2019).
43. Burgos, K. et al. Profiles of extracellular miRNA in cerebrospinal fluid and serum from patients with Alzheimer’s and Parkinson’s diseases correlate with disease status and features of pathology. PLoS ONE 9, e94839 (2014).
44. Paschon, V. et al. Interplay between exosomes, microRNAs and Toll-Like receptors in brain disorders. Mol. Neurobiol. 53, 2016–2028 (2016).
45. Schlachetzki, J. C. M. et al. A monocyte gene expression signature in the early clinical course of Parkinson’s disease. Sci. Rep. 8, 10757 (2018).
46. Nissen, S. K. et al. Alterations in blood monocyte functions in Parkinson’s disease. Mov. Disord. 34, 1711–1721 (2019).
47. Ravanidis, S. et al. Circulating brain-enriched microRNAs for detection and discrimination of idiopathic and genetic Parkinson’s disease. Mov. Disord. 35, 457–467 (2020).
48. Billingsley, K. J. et al. Mitochondria function associated genes contribute to Parkinson’s disease risk and later age at onset. NPJ Parkinsons Dis. 5, 8 (2019).
49. Shamir, R. et al. Analysis of blood-based gene expression in idiopathic Parkinson disease. Neurology 89, 1676 (2017).
50. Backes, C. et al. MiRCarta: a central repository for collecting miRNA candidates. Nucleic Acids Res. 46, D160–D167 (2018).
51. Goh, Y. S., Chao, X. Y., Dheen, T. S., Tan, E.-K. & Tay, S. S. Role of microRNAs in Parkinson’s disease. Int. J. Mol. Sci. https://doi.org/10.3390/ijms20225649 (2019).
52. Keller, A. et al. miRNAs can be generally associated with human pathologies as exemplified for miR-144*. BMC Med. 12, 224 (2014).
53. Fehlmann, T. et al. Common diseases alter the physiological age-related blood microRNA profile. Nat. Commun. 11, 5958 (2020).
54. Blauwendraat, C. et al. NeuroChip, an updated version of the NeuroX genotyping platform to rapidly screen for variants associated with neurological diseases. Neurobiol. Aging 57, 247.e9–247.e13 (2017).
55. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. J. Open Source Softw. 3, 861 (2018).
56. Juzenas, S. et al. A comprehensive, cell specific microRNA catalogue of human peripheral blood. Nucleic Acids Res. 45, 9290–9301 (2017).

## Acknowledgements

PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation (MJFF) for Parkinson’s Research and funding partners, including Abbvie, Allergan, Amathus Therapeutics, Avid, Biogen, BioLegend, Bristol-Myers Squibb, Celgene, Denali, GE Healthcare, Genentech, GlaxoSmithKline, Handl Therapeutics, Insitro, Janssen Neuroscience, Lilly, Lundbeck, Merck, MSD, Pfizer, Piramal, Prevail, Roche, Sanofi Genzyme, Servier, Takeda, Teva, UCB, Verily, Voyager and Golub Capital. We highly appreciate the encouragement and support of K. Nikolich in setting up and performing the study. We thank S. Levy and N. Prasad at the HudsonAlpha Institute for Biotechnology for adapting the small RNA sequencing protocol to the instruments and platforms used and for performing the sequencing experiments. The microarray experiments were performed as fee-for-service by Hummingbird Diagnostics. We acknowledge the support of HbDx. We give special thanks to all participating patients in the study. Additionally, we are very grateful for all received funding and private donations that enabled us to carry out the project. Furthermore, we acknowledge the joint effort of the NCER-PD consortium members generally contributing to the Luxembourg Parkinson’s Study. The study is funded by the MJFF for Parkinson’s Research under reference 14446 and by the Schaller-Nikolich Foundation.

## Author information

### Contributions

F.K. led statistical analysis of data and contributed to writing the manuscript; T.F. performed primary analysis of data and supported statistical analysis; I.V. and E.A. contributed to primary analysis of data and matching to clinical variables; E.H. contributed to general analysis of sequencing data; M.K. and C.B. assisted general analysis of microarray data; N.L.G. supported statistical analyses and visualization of aggregated data; P.G. supported statistical analysis with a focus on disease progression; K.L.P. contributed to clinical interpretation of data; B.C. contributed to study setup and interpretation of data; R.B., L.G. and R.K. contributed to the setup of NCER-PD and interpretation of microarray data; D.G. and B.M. contributed to development of standard operating procedures, participant recruitment and clinical interpretation of data; E.M. contributed to miRNA-gene interaction network analysis and manuscript writing; T.W.C. contributed to interpreting and discussing data as well as writing the manuscript; D.W.C. and K.V.K.-J. worked on mRNA expression data and blood cell-type analyses and contributed to study setup; A.K. contributed to study setup, supported statistical analyses and contributed to manuscript writing.

### Corresponding author

Correspondence to Andreas Keller.

## Ethics declarations

### Competing interests

The authors declare the following competing interests: E.H. received funding from the MJFF; M.K. is employed by Hummingbird Diagnostics; K.L.P. is a member of the executive steering committee of PPMI and received funding from the MJFF; B.C. is an Associate Director in MJFF’s Research Programs division and employed by the MJFF; B.M. is a member of the executive steering committee of PPMI and principal investigator of the Systemic Synuclein Sampling Study of the MJFF; T.W.C. is a founder and scientific advisor of Alkahest; D.W.C. and K.V.K.-J. received funding from MJFF and are principal investigators of RNA bioinformatics at PPMI; A.K. received funding from the MJFF.

Peer review information Nature Aging thanks Alice Chen-Plotkin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 2 Quality control, batch variable analysis, and dimension reduction.

a, Histogram of RNA integrity numbers (RINs) for all valid sequencing samples. The dashed grey line indicates the distribution mean located at 7.8. b, Clustered pairwise correlation matrix of (pooled) technical controls and sequencing replicates based on the sncRNA counts. c, Clustered pairwise correlation matrix of all valid samples based on the known miRNA counts, with major annotation variables depicted on the left. d, The number of principal components resulting from PCA of the miRNA × sample expression matrix versus the cumulative percentage of variance explained (continuous blue line), the fraction of non-zero coefficients (dashed blue line) and the standard error (dashed orange line). e, Distribution of miRNA coefficient loadings (n = 897) onto the first two principal components. The smoothed trendline (orange) is enclosed by light grey bands showing the standard error. Trendline centers were computed by LOESS fitting. Standard errors correspond to the 95% confidence intervals. f, Stacked coefficient barplots for the miRNAs showing the largest sum of absolute coefficient contributions along the top 10 principal components as indicated by the color legend. g, PVCA based on sncRNA counts using the major annotation variables and combinations thereof for all valid samples. h, UMAP embeddings for all valid samples using the sncRNA counts, colored in order by PPMI project phase, sequencing plate, study participant, age binning, genetic status and Hoehn and Yahr staging. i, PVCA based on miRNA counts using the major annotation variables and combinations thereof. j, UMAP embeddings for all valid samples using the miRNA counts, colored in order by gender, biogroup, PPMI project phase, genetic status, age binning and clinical visits.

### Extended Data Fig. 3 Complementary biogroup and cohort comparisons of miRNA expression.

a–j, Volcano plots of absolute effect size (Cohen's d) for expressed miRNAs in the secondary group comparisons considered in the PPMI study. MiRNAs with both a considerable effect size and a considerable fold change are colored blue (downregulated) or green (upregulated). Orange points depict miRNAs with a considerable effect size but a small fold change. k, Normalized, log2-scaled expression of miRNAs showing a progressive depletion in PD (cf. Fig. 3q), faceted by clinical visit and by disease status/subcohort.
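Cohen's d, the effect size shown in these volcano plots, is the difference in group means divided by the pooled standard deviation. The sketch below assumes that standard pooled-SD definition and a fold change computed from group means on the linear scale. The legend does not state the cutoffs behind "considerable", so the thresholds in the hypothetical `classify` helper are placeholders, not the values used in the study.

```python
import statistics
from math import log2, sqrt


def cohens_d(group_a, group_b):
    """Cohen's d (group_a minus group_b) using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def log2_fold_change(group_a, group_b):
    """log2 ratio of group means, assuming unlogged (linear-scale) expression."""
    return log2(statistics.mean(group_a) / statistics.mean(group_b))


def classify(d, lfc, d_min=0.2, lfc_min=0.5):
    """Map a miRNA to a volcano-plot color class.

    d_min and lfc_min are hypothetical placeholder thresholds, not the study's cutoffs.
    """
    if abs(d) < d_min:
        return "not considered"
    if abs(lfc) < lfc_min:
        return "effect size only"  # drawn in orange
    return "up" if lfc > 0 else "down"  # drawn in green / blue
```

Taking the direction from the fold change, rather than from the sign of d, mirrors the legend's description of up- and downregulated miRNAs.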

### Extended Data Fig. 4 miRNA expression marker validation using 1,440 microarray samples of the independent NCER-PD cohort.

a, Scatter plot of miRNA effect sizes (n = 416, black circles) between total PD and controls obtained from sncRNA-seq (PPMI) and microarray (NCER-PD). The smoothed trendline (orange) is enclosed by light grey bands showing the standard error. Centers of measurement were computed by LOESS fitting; the standard-error bands correspond to 95% confidence intervals. One-dimensional histograms of the sncRNA-seq and microarray results are shown on the right and top, respectively. b, As in a, but restricted to the sequencing effect sizes between gPD and unaffected genetic carriers. c, Venn diagram of the most significant overlap between the miRNAs detected by sequencing and by microarray, as computed by DynaVenn. Input miRNAs were sorted by decreasing AUC for iPD versus healthy controls (sequencing data) and for iPD versus controls (microarray data). d, Significance of the overlap shown in c as a function of the position in the DynaVenn input list; the most significant overlap (adjusted P = 3.39 × 10⁻¹⁸) was obtained at n = 222. P values were computed using a two-sided hypergeometric test and corrected using the BH-FDR procedure at an α cutoff of 0.05. e, Two-dimensional representation of the entire P value search space investigated by DynaVenn for all possible overlaps of miRNAs from the sequencing and microarray studies. P values were computed as described in d. f, Results of a miEAA 2.0 over-representation/enrichment analysis of the miRNAs in the most significant overlap displayed in c. The x axis shows the BH-FDR adjusted P value for each category on the y axis. Integers on the right give the number of hits per category or biological pathway, and the color shading also corresponds to the number of hits. P values were computed using a two-sided hypergeometric test and corrected using the BH-FDR procedure at an α cutoff of 0.05.
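Panels c–e rely on a hypergeometric test for the overlap of two ranked miRNA lists, followed by BH-FDR correction. The sketch below is a simplified, hypothetical re-implementation of that idea, not DynaVenn's code. It assumes that both lists rank the same universe of miRNAs and scans every pair of cutoffs. It uses a Fisher-style two-sided P value, which sums all outcomes no more likely than the observed overlap, and returns the cutoff pair with the smallest adjusted P value.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) for X ~ Hypergeometric(population, successes, draws)."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def overlap_p_two_sided(overlap, population, size_a, size_b):
    """Two-sided P value: total probability of outcomes no more likely than the observed one."""
    lo, hi = max(0, size_a + size_b - population), min(size_a, size_b)
    p_obs = hypergeom_pmf(overlap, population, size_a, size_b)
    p = 0.0
    for k in range(lo, hi + 1):
        q = hypergeom_pmf(k, population, size_a, size_b)
        if q <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += q
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """BH-FDR adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def best_overlap(ranked_a, ranked_b):
    """Return (adjusted P, i, j, overlap) for the most significant top-i / top-j overlap.

    Hypothetical helper; assumes both lists rank the same universe of items.
    """
    population = len(set(ranked_a) | set(ranked_b))
    tests = []
    for i in range(1, len(ranked_a) + 1):
        top_a = set(ranked_a[:i])
        for j in range(1, len(ranked_b) + 1):
            k = len(top_a.intersection(ranked_b[:j]))
            tests.append((overlap_p_two_sided(k, population, i, j), i, j, k))
    adjusted = benjamini_hochberg([t[0] for t in tests])
    # BH can tie several tests; break ties on the raw P value
    best = min(range(len(tests)), key=lambda t: (adjusted[t], tests[t][0]))
    _, i, j, k = tests[best]
    return adjusted[best], i, j, k
```

For example, `best_overlap(seq_ranking, array_ranking)`, with two hypothetical lists of miRNA names sorted by decreasing AUC, returns the adjusted P value, both cutoffs and the overlap size. Because the scan tests every cutoff pair, its cost grows at least quadratically with list length. That remains manageable for a few hundred miRNAs.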

## Supplementary information

### Supplementary Information

Detailed acknowledgements of the NCER-PD consortium.

### Supplementary Table 1

Full sample and patient annotations for the PPMI cohort.

### Supplementary Table 2

Results of primary (main cohorts) differential expression analysis comparisons of miRBase v22 miRNAs for the PPMI cohort. The table includes the statistics shown in the main results of the manuscript as well as raw and adjusted P values for the Wilcoxon rank-sum test and Student's t-test.

### Supplementary Table 3

Results of primary differential expression analysis comparisons of new miRNA candidates for the PPMI cohort.

### Supplementary Table 4

Results of secondary (other cohorts) differential expression analysis comparisons of miRBase v22 miRNAs for the PPMI cohort.

### Supplementary Table 5

Results of secondary differential expression analysis comparisons of new miRNA candidates for the PPMI cohort.

### Supplementary Table 6

Results of primary differential expression analysis comparisons of known sncRNAs for the PPMI cohort.

### Supplementary Table 7

Results of secondary differential expression analysis comparisons of known sncRNAs for the PPMI cohort.

### Supplementary Table 8

ANOVA results for primary sample annotation variables using miRBase miRNA counts from the PPMI cohort.

### Supplementary Table 9

Full sample and patient annotations for the NCER-PD cohort.

### Supplementary Table 10

Results of primary and secondary differential expression analysis comparisons of miRBase v21 miRNAs for the NCER-PD cohort.

### Supplementary Table 11

Comparison of miRNA effect size and direction of dysregulation (idiopathic PD versus healthy controls) between PPMI and NCER-PD. Results on the concordance and discordance with NCER-PD of the miRNAs reported in the main text are provided in a separate sheet.

## Rights and permissions


Kern, F., Fehlmann, T., Violich, I. et al. Deep sequencing of sncRNAs reveals hallmarks and regulatory modules of the transcriptome during Parkinson’s disease progression. Nat Aging 1, 309–322 (2021). https://doi.org/10.1038/s43587-021-00042-6
