Single cell transcriptomics of human PINK1 iPSC differentiation dynamics reveal a core network of Parkinson’s disease

Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder, characterized by the loss of dopaminergic neurons (mDA) in the midbrain. The heterogenous pathology and complex underlying mechanisms are only partly understood and there is no treatment able to reverse PD progression. Here, we targeted the disease mechanisms by focusing on the ILE368ASN mutation within the PINK1 (PARK6) gene and systematically characterized midbrain dopaminergic neurons obtained from human induced pluripotent stem cells (iPSCs). Single-cell RNA sequencing (RNAseq) and pairwise analysis of gene expression identied genes consistently differentially expressed during the mDA neuron differentiation process. Subsequent network analysis revealed that these genes form a core network, which interacts with all known 19 protein-coding Parkinson’s disease-associated genes and includes ubiquitination, mitochondrial, protein processing, RNA metabolism, and secretory pathways as important subnetworks. Our ndings indicate a unied network underlying PD pathology and offers new interpretation of the phenotypic heterogeneity of PD.


Introduction
Parkinson's disease (PD) is one of the most prevalent neurological disorders, second only to Alzheimer's disease, with a prevalence of 1.8%, among persons over the age of 65 and 2.6% in the 85 to 89 age group [1][2][3] . As the average age of the population increases, PD is expected to pose an increasing and signi cant burden to society. PD is characterized by the presence of motor symptoms, including bradykinesia, rigidity and tremor, but many patients also develop non-motor symptoms, such as depression or dementia 4 . Unfortunately, there are no treatments to slow down or reverse the progression of the disease.
Current treatments only temporarily ameliorate the motor symptoms, but do not slow down the progression of PD 5 .
Most of our understanding of PD pathology is based on the identi cation of mutations that lead to PD, although these account for only 3-5% of PD cases, with the remaining cases being idiopathic 2 . Despite the small fraction of cases they explain, these mutations provide an important window into the underlying molecular mechanisms of PD because they identify pathways which, when disrupted, are able to cause the disease. Many of these mutations converge on mitochondrial homeostasis, repair and mitophagy. Hence, mitochondrial dysfunction likely plays a key role in the pathophysiology of PD 6 . An important group of these mutations lies within the PINK1 gene, a nuclear gene that codes for a kinase important for mitochondrial function 7 . PINK1, also known as PARK6, clearly plays a role in mitophagy, but its function is much broader. The targets of this kinase are involved in many cellular functions, including neuronal maturation 8,9 . The impact of the loss-of-function mutations in this important kinase has not yet been fully elucidated 10 .
One key characteristic of PD is the death of the midbrain dopaminergic (mDA) neurons, but until recently, it was impossible to study them since 60% of these neurons have disappeared by the time of diagnosis, and about 90% by the time the patients die 11 . As a result, research was limited to animal models 12 , but human-like mutations in animals often do not lead to the development of comparable pathology 13 . The development of cellular reprogramming allows nowadays for the conversion of somatic cells into induced pluripotent stem cells (iPSCs), which can subsequently be differentiated into neurons. This enables us to generate iPSCs from the skin cells of PD patients 14 and differentiate them into mDA neurons carrying the disease 15,16 . Furthermore, by using cells from patients with a known PD-associated mutation, we can analyze mutation-speci c pathology in the correct, permissive, genetic background. This is important because genetic background is known to signi cantly in uence the age of onset and severity of the disease 7,17 . Differentiating mDA neurons from iPSCs provides an almost unlimited source of neurons that allow for deep phenotyping and the elucidation of the cellular mechanisms underlying PD pathology.
Here, we generated iPSCs from the broblasts of a patient homozygous for the PD-associated mutation ILE368ASN in the PINK1 gene 2 . We used an optimized differentiation protocol to speci cally generate mDA neurons, as this cell type displays a unique susceptibility to cell death in PD 15,18,19 . In contrast to mDA neurons, the effect of PD on other types of DA neurons is variable, hence their study would not be as pertinent to the elucidation of mechanisms causing PD-induced cell death 11,20 .
The development of mDA neurons diverges from other DA neurons even before they commit to neural fate. During early neural development, neural tube stem cells generate neurons and glia, the two basic building blocks of the brain. While other DA neurons follow this direct path from neural stem cells to neurons, determined by the expression of the Pax6 transcription factor, mDA neurons develop from radial glial cells of the oor plate and are exposed to high levels of the SHH transcription factor 21 , which prevents expression of Pax6 22 and sets these cells on an entirely different developmental path 18,23 .
Hence, they follow a very different signalling cascade, leading to the expression of a unique transcriptome that signi cantly differs from that of other DA neurons 18,20,24,25 . Their distinct identity is re ected in their function and leads to their unique susceptibility to death in PD, which in turn leads to the classic movement symptoms of the disease 11,19,20,23,26 .
To investigate the disease mechanisms linked to the PINK1 mutation, we performed extensive single-cell RNA expression analysis using DropSeq 27 at four different timepoints during mDA neuron differentiation 15,16,18 . Differential expression analysis between four pairs, each consisting of a PINK1 and a control cell line, identi ed genes that were strongly and consistently dysregulated at all timepoints.
Based on known protein-protein interactions, we show that these genes form a network and that its members directly interact with all 19 protein-coding PARK genes associated with PD. This suggests that other PD-associated mutations may also be acting through this common network of genes. Overall, our results indicate the existence of a common disease mechanism that potentially underlies idiopathic PD as well and may represent a unifying perspective on PD progression that will guide future intervention strategies.

Results
We performed a systematic differential expression analysis at single-cell resolution between an iPSC line carrying the PD-associated ILE368ASN mutation in the PINK1 gene and an age-and sex-matched control cell line (control 1-2 in ref 28 ) during their differentiation into mDA neurons (Fig. 1). After preprocessing and quality ltering, we used 4495 cells and 18,097 genes in our downstream analysis (Methods). Expression of key genes was also con rmed by qPCR and corresponding marker staining. For data integration, we performed a network analysis to identify the underlying key mechanisms of PD progression.
Single-cell RNAseq analysis reveals gene expression panel for direct classi cation of iPSC-status The standard accepted procedure for determining iPSC status involves staining for iPSC markers and the use of gene expression platforms, such as Scorecard. However, we show that a standard panel of genes readily detectable by single cell analysis can be used to con rm iPSC status directly in the cells used for the single cell experiment, rather than by staining or expression analysis of an independent sample, which in some cases may not re ect the iPSC status of the experimental sample.
Fibroblasts were isolated from a 64 year-old male with PD symptom onset at 33 years of age who was homozygous for the ILE368ASN (P.I368N/P.I368N) mutation in the PINK1 gene (Coriell Institute, Cat. No. ND40066). The broblasts were con rmed to have a normal karyotype ( Supplementary Fig. 1) and an episomal reprogramming method (Epi5™ Episomal iPSC Reprogramming Kit, Invitrogen USA, cat. # A15960) was used to avoid unwanted genetic modi cations of the target cell. The karyotype of iPSCs was con rmed (Supplementary Fig. 2) and their iPSC status was ascertained by standard methods, including staining for the POU5F1 marker (also known as Oct4) and by using the TRA-1-80 antibody (Fig.  2a), as well as by a TaqMan iPSC Scorecard Assay, which also con rmed their trilineage potential (Fig.   2b).
In vitro differentiation of iPSC-derived mDAs recapitulates the in vivo process To con rm that our differentiation protocol (Supplementary Table 1) is recapitulating the in vivo mDA differentiation path, the expression of key genes (OTX2, EN1, LMX1B, LMX1A, and FOXA2) known to drive mDA differentiation in vivo 19,23,37 was con rmed using our single-cell RNAseq data or by qPCR (Fig.3b, c, Table 1). The absence of the early, non-mDA neuron marker PAX6 21,22 was further validated by staining (Supplementary Fig. 3 and Supplementary Table 2).
In vivo, the development of mDA phenotype depends on the early high expression of Sonic Hedgehog (SHH), followed by the induction of Wnt signaling and the expression mDA-speci c downstream pathways 18,19,23 . Consistent with these in vivo differentiation steps, among the highest-expressed genes on day 6 (D6) of the differentiation protocol were PTCH1, a receptor for SHH, and FZD7, a receptor for Wnt proteins (Fig. 3b). Following the differentiation process from iPSCs to D21 (Fig. 4a), we could see the onset of expression of the mature mDA markers TH and KCNH6 (also known as GIRK2) in the D21 cluster (Fig. 4b). By day 21, many factors that are speci c to the mDA differentiation path, such as TCF12, ALCAM, PITX2, ASCL1, and DDC 20,38-41 , were among the most highly expressed genes (Fig. 4c). The onset of TH expression indicated that the cells achieved the state of early postmitotic mDA neurons 18 (Fig. 4b), which was con rmed by staining (Fig. 4d) and qPCR (Fig. 4e, Supplementary Table 3). In addition, markers speci c to mature mDA neurons including TH, DDC, ALDH1A1, and KCNJ6 (also known as GIRK2) also started to become expressed ( Fig. 4 b-e, Table 1).
The PINK1 ILE368ASN mutation is associated with persistently dysregulated expression of nearly 300 loci To analyze mutation-induced changes in gene expression, we performed single-cell RNA expression analysis, at four important timepoints during the mDA neuron differentiation process, using DropSeq 27 - (Fig. 1). After preprocessing and quality-ltering (Methods and Supplementary Fig. 4), a total of 4495 cells (2518 control and 1977 PINK1 cells) and 18,097 genes were included in our analysis). Control and PINK1 cells co-clustered together based on their differentiation stage, from iPSCs, to day 6 (D6), D15 and D21 (Fig.4a), indicating that RNA expression was speci c to differentiation stages, rather than to a cell line. The PINK1 cells at D10 showed low viability, hence D10 timepoint was not included in the pairwise analysis (Fig. 5).
The analysis of pairwise differential expression at each time point with adjusted p-values (p adj ) <0.01 fold changes (FC) > 0.1 resulted in 14 genes that were upregulated and 13 genes that were signi cantly downregulated in the PINK1 cell line compared to control (Table 2). Because iPSCs are very different from differentiating neuronal precursors, we next tested whether inclusion of iPSCs had disproportionately affected the results through the exclusion of neuron-speci c genes. Repeating the analysis using D6, D15 and D21 identi ed 28 genes that were upregulated and 27 genes that were downregulated at all four timepoints, including all genes previously identi ed ( Table 2, Fig. 5, Group A). As expected, excluding iPSCs resulted in the identi cation of a broader range of genes because genes that are differentially expressed only in the neuronal lineage were previously excluded due to the absence of their expression in iPSCs and the requirement that DEGs have to be dysregulated at all timepoints. However, both sets are equally valuable, as genes dysregulated even in iPSCs are more likely to be genes that participate in systemic PD pathology, regardless of cell type, and may be relevant to a broader spectrum of PD pathology than the death of mDA neurons. Interestingly, most of these genes are already linked to PD, other PD mutations, or neurodegeneration ( Table 2).
For an alternative de nition of differentially expressed genes (DEGs), we used the maximum adjusted pvalue in a pairwise combinations as adjusted p-value, and the average fold change that occurred in the pairwise comparison as fold change threshold. With this approach we retained only genes dysregulated in the same direction at all timepoints. This analysis led to 151 DEGs, including the previously identi ed genes of Group A, of which 65 were upregulated and 86 downregulated compared with controls (p adj < 0.01 and FC > 0.1) (Group B, Supplementary Table 4). Taking the mean of FC of the different time points enhanced the identi cation of DEGs because it reduced the effect of the variability between pairs due to their different differentiation states. Repeating the same analysis for timepoints iPSCs, D6, D15 and D21, but taking into account only the absolute degree of change in iPSCs, yields 172 genes (Group C, Supplementary Table 5). Repeating the analysis using only timepoints D6, D15 and D21 identi ed a total of 285 DEGs (Group D) (Supplementary Table 6 and Supplementary Fig. 5). Together, when all analyses were pooled, we obtained 291 DEGs (6 genes in Group C depended on the inclusion of iPSCs and did not appear in Group D, see Supplementary Table 6).

Data integration reveals a common PD network.
To integrate the expression analysis and identify underlying disease mechanisms, we generated a network of interactions between the DEGs via Gephi 42 , using protein-protein interaction information obtained from the STRING and GeneMANIA databases 43,44 . The network we obtained includes 246 of the 291 DEGs, since pseudogenes and non-coding RNAs could not be integrated into a protein-protein intearction network (Supplementary Table 7), and 2122 interactions (Fig. 6, Supplementary Fig. 5c). The curated network only considers DEGs and any genes automatically added by the databases were excluded to ensure a reliable core network based solely on data. Based on known protein-protein interactions, the DEGs integrate into a close-knit core network in which several DEGs form central nodes (Fig. 6a, Supplementary Fig. 5). To evaluate the signi cance of the DEG-based PPI (protein-protein interaction) network produced by STIRNGdb (v10), we compared the DEG-based network with corresponding random networks generated from sets of 292 randomly chosen genes excluding DEGs. Based on 50 random networks, we show that the DEG-based network includes signi cantly more proteincoding genes and interactions than by chance (Fig. 6b) and that the network structure in term of degree distribution is signi cantly distinct as evaluated by Wilcoxon test (p = 2.22e-16) and indicates the mechanistic character of the network (Fig. 6c).
The network of genes dysregulated by the presence of the PINK1 mutation includes genes related to other PD-associated pathways, which is intriguing, since it was generally assumed that each PD-associated mutation leads to PD pathology via an independent, characteristic path. For example, two DEGs, GOPC and GPC3 45,46 , interact with the PD-associated gene DJ-1 (PARK7) 2,47 . The DEG network also includes genes of the LRRK2 (PARK8) network 2,47 , namely ENAH, HSPA8, MYL6, MALAT1, and SNHG5 (Table 2).
SNHG5 and MALAT1 interact with LRRK2 via miR-205-5p 44,45 . DLK1 and MALAT1 mediate a-synuclein accumulation 48,49 . In fact, the DLK1-NURR1 interaction involved in this process may be mDA neuronspeci c 50 , highlightihe necessity to use mDA neurons for the study of PD-related pathways. Additionally, MALAT1 was shown to increase a-synuclein protein expression 51 . In short, this suggests that interactions leading to PD pathology are more complex than one mutation leading via one path to PD, as generally thought, but it also indicates that there are likely many druggable targets that may be useful in treating PD, and that these may be universally effective for PD caused by several different mutations, and perhaps even for idiopathic PD. For example, terazosin, which is already in clinical use, was found to be associated with slower disease progression, likely by enhancing the activity of phosphoglycerate kinase 1 (PGK1) 52 , one of the top DEGs identi ed in our study.
For the evaluation of the relative importance of each node within the network, we applied betweenness centrality 42 ( Supplementary Fig. 5a-c). The major nodes of these networks are formed by genes that were already shown to play an important role in PD pathology (Table 3). Next, we built a correlation network (pvalue < 0.05, r > 0.1) of the 246 DEGs based on the normalized counts (Fig. 6e). By extracting the common interactions of these two networks, we obtained a network with 297 interactions (Fig. 6d, e and Supplementary Table 7), which highlights protein-protein connections that correlate with differential expression of the genes. This analysis further supports the role of the connections between these genes in mediating the resulting differential expression in the presence of the PINK1 mutation. STRING was subsequently used to highlight functional pathways represented within the DEG network (Suppplementary Fig. 6 and Supplementary Table 8). Several pathways known to play a role in PD pathology are signi cantly represented within the network, notably ubiquitination 12,53 , mitochondrial pathways 6,54 , cellular response to stress 55 , lysosomal proteins 56 , protein metabolism (localization, modi cation, transport, folding and stability), RNA processing 57 , aromatic compound metabolism 58-61 , vesicle mediated transport and exocytosis 62 , and cellular catabolic processes 55,56 . Interestingly, the DEGs include genes of the KEGG-PD 43 pathway and the CHCHD2 gene, which was recently identi ed as a PDassociated gene and named PARK22 47,63,64 .
To investigate further how the identi ed network relates to other known PD mechanisms, PD-associated genes, also known as PARK genes (Supplementary Table 9, Fig. 7), were added to the DEG network. Next, PARK-PARK interactions were removed and only PARK-DEG interactions were retained to test how PARK genes integrate into the network. All 19 protein-coding PARK genes 2,47 interact directly with at least one, but usually several DEGs ( Supplementary Fig. 7). The degree of interaction of PARK genes with the DEGs of the network is illustrated by coloring (in pink) DEGs that directly interact with a PARK gene. The darker the color, the greater the number of PARK genes the DEG interacts with. The central nodes of the network generally interact with several PARK genes, suggesting that they play a central role in linking the PARK genes to the network, but also that PARK genes may mediate PD pathology through a few central pathways of this network, and that the effects of different PARK genes converge on the same set of pathways.
Further analysis revealed that a large number of the DEGs interact with genes associated with mitochondria or ubiquitination. For this analysis, we used BioGRID 44,65 to identify interactions with mitochondrial or ubiquitination proteins for the top 172 DEGs of groups A-C (Fig. 6f). These interactions were used to create a network illustrating that many of the DEGs in our study directly interact with genes involved in mitochondrial function and in ubiquitination. Thereby, only direct DEG -mitochondrial gene or DEG -ubiquitination gene interactions were included and PARK genes were added for reference ( Supplementary Fig. 8).
Based on manual literature search, we determined that at least 68% of the DEGs (174 of 255 genes, not including pseudogenes and RNA genes) are already directly associated with PD, either experimentally, or as signi cant links in GWAS-PD, or by PD expression studies (Fig.7, Supplementary Table 10). This is particularly true for the major nodes of the network (Supplementary Fig. 9).

Discussion
The aim of this study was to identify genes that were differentially expressed as a result of a mutation in the PINK1 gene, using mDA neurons differentiated from patient-derived iPSCs, a model relevant to PD. We focused on cells undergoing neural differentiation, as these are not expected to display the activation of damage control pathways induced by neurotoxicity, but are likely to identify pathways that lead to primary pathology of PD. Gene expression alterations in aging neurons as a result of the modi cation identi ed here will be the next important step to investigate the manifestation of neurodegeneration.
The single-cell expression data were analyzed in several layers. First, we identi ed the most strongly DEGs (group A), consistently altered in the same direction at all four timepoints including iPSCs (Fig.6, Table 2, Supplementary Fig. 5). By choosing four different timepoints along the differentiation path and selecting only genes whose differential expression was consistent at all timepoints, we excluded pathways associated with mDA differentiation, as these were expected to change signi cantlyuring the different steps of the differentiation process. In our rst analysis, we included iPSCs, in order to identify strongly DEGs due to the presence of the PINK1 mutation, regardless of cell type ( Table 2, Group A "incl. iPSCs"). Our reasoning was that since iPSCs differ signi cantly from neural precursors and are not expected to express neuronal pathways, their inclusion would bias against the selection of neuronal pathways. A corresponding approach in which iPSCs were excluded resulted in an expanded gene list that included genes more likely dysregulated speci cally in a neural cell type (  Fig. 5).
Creating a protein-protein interaction network based on these groups of DEGs demonstrated that genes in Group Dlso formed important nodes within the interaction network and were frequently associated with PD ( Supplementary. Fig. 5). The resulting network illustrates that in spite of the very different nature of PD-associated genes, there is interconnectedness among the molecular pathways through which they mediate PD pathology. It also suggests that any one mutation leads to pathology via several molecular paths. This underscores the role the genetic background plays in PD penetrance and severity, as alleles of several network genes may reduce or amplify the effect of a mutation 7,17 .
The complexity of genetic interactions in PD is well illustrated by the interaction of PINK1 and Parkin.
PINK1 is known to interact with Parkin directly, however, our data indicates that the presence of the PINK1 mutation results in the dysregulation of several other genes that are possibly upstream of Parkin 66 , including HNRNPC 67 , MTRNR2L1 68 , MYL12A and SLC25A4 69 , as well as LMAN1, a membrane mannosebinding lectin, which was shown to play a role in Parkin translocation 70 . This suggests that the direct interaction between PINK1 and Parkin is not the only means by which PINK1 interacts with the Parkin pathway.
Analysis of the networks shows that certain DEGs are a point of convergence between pathways and form major nodes. These DEGs seem to play a central role within the network, speci cally CUL3, HSPA8, EEF1A1, UQCRFS1, CNTNAR2, PSMA4, HNRNPC, and PLCB4, but also EGLN3, IPO5, IPO7, PALLD, PGD, RALGPS2, CYCS, SHH, BRCA2 and others ( Supplementary Fig. 5). In fact, these genes play key roles in PD pathology (Table 3, Supplementary Fig. 9). Hence, the network derived from our analysis of a PINK1 mutation is revealing the convergence of many known, key PD-associated pathways. This convergence suggests that different mutations may feed into the same PD pathology-associated routes. These central pathways include several genes (listed in Table 3), such as CUL3, which has been linked to PD by GWAS studies and is considered a potential PD drug target 71 . HSPA8 (also known as HSP73 and HSC70), which disaggregates α-synuclein amyloid brils and plays a role in autophagy and the catabolic pathway for αsynuclein, mediates mitophagy by regulating the stability of PINK1, and its expression was shown to be impaired in sporadic PD [72][73][74][75] . It is also one of the most highly dysregulated genes in our dataset.
EEF1A1 mediates activation of heat-shock transcription factor HSF1, a key player in PD 76 , and prevent asynuclein aggregation, as well as interacts with Parkin (PARK2) and HTRA2 (PARK13) 65,77 . UQCRFS1 is a mitochondrial electron transport chain ubiquinol-cytochrome c reductase 78 , a member of the KEGG-PD pathway (Entry K00411 79,80 ), and has been identi ed as a PD risk gene 81 . CNTNAP2, which belongs to the neurexin superfamily, plays a role in triggering protein aggregates 82,83 , was found to be signi cantly differentially expressed in the blood of PD patients with LRRK2 mutation 84 , and was also associated with PD by GWAS 46 . PSMA4, a proteasome subunit, is part of the KEGG-PD pathway (hsa05012, bta05012, K02728) 79,80 and is a member of the ubiquitin-proteasomal pathway, which plays a key role in Parkinson's disease 85 . It also interacts with Parkin (PARK2) and FBXO7 (PARK15) 65 . HNRNPC interacts with both PARK2 and members of the Poly (ADP-ribose)-dependent cell death pathway implicated in PD 67 . PLCB4 has been linked to PD 46 and knock-out mice show motor defects consistent with ataxia 86 .
These central nodes interact with several PARK genes (Supplementary Table 9), so it is possible that PDassociated mutations converge on the same main pathways, which may play a central role in PD pathology.
Currently, mutations in 23 genes or loci are linked to PD. In addition to their o cial names, they are named PARK1-PARK23 (Supplement Table 9). Of these, 19 are protein-coding genes. The DEGs we have identi ed in this study directly interact with all 19 protein-coding PARK genes (Supplementary Fig. 7). In addition, the CHCHD2 gene (PARK22) was identi ed as one of the DEGs. This further supports the hypothesis that perturbations due to the mutations in PARK genes may converge on the same network, which may then be responsible for the PD phenotype. This would explain why mutations in so many genes lead to a common or at least similar PD pathology 87 .
It will be of great interest to see if cells from idiopathic patients show dysregulation of this integrated network. In fact, our analysis has identi ed genes which had no known connection to molecular mechanisms underlying PD pathology, though some of themare known to be involved in sporadic PD. Knowing how they integrate into the network may point to the underlying mechanism of how they cause PD pathology. For example, one of the top DEGs is LGI1 88 . The development of antibodies to this protein leads to immunomodulated parkinsonism, yet there is no known mechanism linking it to PD pathology 88 .
In the network LGI1 has several nearest DEG neighbors it interacts withSupplementary Fig. 10). Its most signi cant interaction is its co-expression with CNTNAP2, which is part of the neurexin family and is required for axon organization, and MGMT, which repairs the methylated nucleobase in DNA 43 . From GeneMANIA alone, the strongest evidence is for interaction with GOLT1B, which plays a role in Golgi transport 89 . Hence, LGI1-associated pathology leading to PD symptoms may be mediated through pathways which are also dysregulated by the presence of the PINK1 mutation. CNTNAP2 is another very good candidate, as it was shown to be dysregulated in PD patients carrying a mutation in the LRRK2 gene, providing additional evidence that it likely plays a role in PD pathology 84 .
Another DEG in our dataset is CHCHD2, the PARK22 gene. Mutations in this gene are linked with autosomal dominant PD, although the precise mechanism is unknown 90 . One hypothesis was that CHCHD2 colocalizes with the mitochondrial contact site and cristae organizing system (MICOS) 90 . However, in the DEG network, CHCHD2 has strong interactions with at least three other DEGs, SLC25A4/ANT1 (STRING 43 ), GHITM (STRING and GeneMANIA), and NME4 (GeneMANIA 44 ). Evidence suggests that GHITM plays a role in PINK1-mediated neurodegeneration 91 and NME4 was shown to be downregulated in PD 58 . SLC25A4 (also known as ANT1) plays an essential role in mitophagy and has been linked to PD pathology 69,92 . Hence, the interaction of CHCHD2 with SLC25A4 (ANT1), GHITM, and NME4 may be relevant in mediating pathological changes in CHCHD2-associated PD.
It is of note that pathways known to play a key role in PD are profoundly and consistently dysregulated at all time points examined, far before the onset of PD pathology. For example, CHCHD2 is part of the purine metabolic pathway that produces DNA, RNA, nucleosides and nucleotides and has been shown to be altered in PD [58][59][60][61] . The DEG network illustrates that the expression of a large number of interconnected aromatic compound metabolic pathway genes is altered in cells carrying the PINK1 mutation ( Supplementary Fig.6b). In total, 39 genes of the nitrogen compound metabolic process (Ncmp) and 88 genes speci c to the aromatic compound metabolic process (Acmp, a subgroup of the Ncmp) are differentially expressed in our dataset (Supplementary Table 8). Many of the DEGs identi ed in our study are part of more than one pathway and, therefore, interconnect the various pathways known to play a role in PD, including stress and catabolic processes 55,56 , aromatic compound metabolism 58 , vesiclemediated transport and exocytosis 62 , RNA metabolism 57 , protein transport, localization, folding, stability, and ubiquitination 53 (Supplementary Fig. 6 and Table 8). This con rms observations that PD pathology involves many different pathways 93 .
Additionally, when a manual literature search was performed, 68% of the DEGs were found to be associated with PD (Fig. 7, Supplementary Table 10), which is particularly true for the central nodes of the network ( Table 3, Supplementary Fig. 9). This indicates that these genes may be key points of integration of the effects of PD pathology, an idea further substantiated by the convergence of the added PARK genes on these nodes ( Supplementary Fig. 7). Furthermore, these nodes form a link between different functional pathways. In particular, this is true for CUL3, HSPA8 and PSMA4 (Supplementary Fig. 6 and Table 8).
We have also analyzed the correlation of expression between various gene pairs. This correlation may indicate that they are a target of the same regulatory pathway or otherwise related. In our dataset, the expression of several interaction partners shows high correlation, namely PLCB4-RALGPS-TTC3-ZNF37A, EIF3B-HSPA8 (major DEG network node, ubiquitination pathway)-PCBP1 (ubiquitination pathway).
Another cluster centers on the MT-CYB gene and involves both mitochondrial and ubiquitination pathways by NME1-MT-CYB -MT-ND5 -MT-CO3 -MRPS21 interactions. Among the top pairs are also PSMD7-PSMB5, TAGLN-MYL9 and VWA5A-ZCCHC11 (Figure 6d). The interactions of these genes may, therefore, play a key role in PINK1-mediated PD pathology.
The fact that so many genes which belong to other PD mutation-related pathways were dysregulated by the presence of the PINK1 mutation suggests that pathways involved in PD pathology are far more interconnected than previously thought. It is likely that PD pathology is more a disease with a characteristic network ngerprint, than a disease caused by independent defects (mutations) (Fig. 6).
This and future studies will, hopefully, provide a picture of how various mutations feed into this network and cause its dysregulation. If idiopathic PD is shown to also be based on dysregulation of this network, then we may nally be able to understand the cause of idiopathic PD, which represents 80-85% of all PD cases 2 .

Methods
Generation of iPSC cell lines Fibroblasts (cat. No. ND40066) isolated from a 64-year-old male with PD symptom onset at 33 years of age who carried a homozygous mutation for the ILE368ASN (P.I368N/P.I368N) mutation in the PINK1 gene were obtained from the Coriell Institute (Cat. No. ND40066). Samples were collected in accordance with the US Government guidelines and are subject to an MTA issued by Coriell Institute for Medical Research NINDS Cell Repository. Fibroblasts were cultured on gelatin-coated plates (10% gelatin in PBS, coated for 10 min at room temperature) in KO DMEM +10% FBS + 1% penicillin/streptomycin (stock was 10,000 units penicillin and 10 mg streptomycin/mL) at standard culture conditions (37°C, 5% CO2).
Live adherent broblasts in culture media were sent to be karyotyped by Cell Line Genetics, Madison, WI, USA (Supplementary Fig. 1)  Analysis of iPSC status and trilineage potential by TaqMan iPSC Scorecard Assay.
We also performed a TaqMan iPSC Scorecard Assay, which con rmed their iPSC status and their trilineage potential (Fig. 2b). The protocol was followed as described by the manufacturer of the TaqMan hPSC Scorecard Assay (ThermoFisher Scienti c).

Immunocytochemistry
Adherent colonies were xed in 4% PFA for 10 min, washed and permeabilized with 0.1% Triton X-100 in 1X PBS for 15 min, then washed and incubated in a blocking solution of 2% BSA in 1X PBS for 1 hour.

Differentation of iPSCs into mDA neurons
The protocol used to differentiate iPSCs into mDA neurons was modi ed from 16,94 (Table 1). The iPSCs were grown to 95% con uence, dissociated using accutase, and 1.5 wells were combined into one well at day -1. They were allowed to recover in the presence of ROCK inhibitor for about 8 hours and then in mTeSR without ROCK inhibitor for about 16 hours. After this, day 0 media were applied (Table 1). Cells were fed fresh media daily, 36 ml per 6-well plate or as needed, judging consumption from media colour and replacing media whenever it turned yellow, using the appropriate media and factor mix for that day.
Real-time quantitative PCR of mDA and non-mDA markers.
Total RNA was extracted from a cell pellet of a 12-well plate well using the RNeasy Plus Universal Kit Mini (2×) (cat. #K0223). This was followed by a dissociation curve to con rm that only one PCR product was present. The values were standardized per expression of the housekeeping gene GAPDH.

Statistics
In real-time qPCR graphs, each timepoint consists of three independently differentiated samples of the ND40066 (PINK1) cell line (n=3) and the control (n=3). Each of the samples was ampli ed in triplicate. Each sample value was an average of the experimental triplicate. Standard error was calculated using GraphPad, based on n=3 independently differentiated samples for each cell line at each timepoint.

Single-cell RNA seq
On the day of collection, cells were dissociated using accutase. The single-cell suspension was spun down and cells were washed with (PBS, 2%BSA) twice, then passed through a 40mm lter to remove larger cell clumps. The sample was then counted and viability was determined using (Vi-CELL XR, Cell Counter, Beckman Coulter). Cells were required to have at least a 90% viability. Most samples showed 95% viability. They were then diluted in PBS with 2%BSA to a nal concentration of 190,000 cells/ml. 3ml were then used for single-cell analysis. Subsequently, cells were processed by the DropSeq approach as described previously 27,96,97 and sequenced.

NGS preparation for Drop-seq libraries
The 3' end-enriched cDNA libraries were prepared by tagmentation reaction of 600 pg cDNA library using the standard Nextera XT tagmentation kit (Illumina). Reactions were performed according to the manufacturer's instructions. The PCR ampli cation cycling programme used was 95°C 30 s, and twelve cycles of 95°C (10 s), 55°C (30 s) and 72°C (30 s), followed by a nal extension step of 72°C (5 min).
Libraries were puri ed twice to reduce primers and short DNA fragments with 0.6× and 1× Agencourt AMPure XP beads (Beckman Coulter), respectively, in accordance with the manufacturer's protocol. Finally, puri ed libraries were eluted in 10 μl Molecular Grade Water. Quality and quantity of the tagmented cDNA library were evaluated using Bioanalyzer High Sensitivity DNA Chip. The average size of the tagmented libraries prior to sequencing was between 400 and 700 bps.
Puri ed Drop-seq cDNA libraries were sequenced using Illumina NextSeq 500 with the recommended sequencing protocol except for 6pM of custom primer (GCCTGTCCGCGGAAGCAGTGGTATCAACG CAGAGTAC) applied for priming of read 1. Paired-end sequencing of 20 bases (covering the 1-12 bases of random cell barcode and the remaining 13-20 bases of random unique molecular identi er (UMI)) was performed for read 1, and of 50 bases of the genes for read 2.
Bioinformatics processing and data analysis The FASTQ les were assembled from the raw BCL les using Illumina's bcl2fastq converter and run through the FASTQC codes (Babraham bioinformatics; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to check for consistency in library qualities. The monitored quality assessment parameters monitored were (i) per base sequence quality (especially for the read 2 of the gene), (ii) per base N content, (iii) per base sequence content, and (iv) over-represented sequences. The FASTQ les were then merged and converted into binaries using PICARD's FastqToSam algorithm. The sequencing reads were converted into a digital gene expression matrix using the Drop-seq bioinformatics pipeline 27 .
Single-cell RNA-seq data analysis The identi cation of low quality cells was done separately for each data set. In order to select only the highest quality data, we sorted the cells by their cumulative gene expression. Only cells with the highest cumulative expression were considered for the analysis 98 .
In addition to this ltering we de ned cells as low-quality based on three criteria for each cell. The number of expressed genes must be more than 200 and 2 median-absolute-deviations (MADs) above the median; the total number of counts has to be 2 MADs above or below the median, and the percentage of counts to mitochondrial genes has to be 1.5 MADs above the median. Cells failing at least one criteria were considered as low quality cells and ltered out from further analysis. Similar to the cell ltering, we ltered out low quality genes, identi ed by being expressed in less than 10 cells in the data.
The integration of the ltered matrices of the different datasets was performed using scTransform 99  was computed using the 5000 most variable genes of the integrated data. The clustering of data was performed using Louvain clustering. The resolution of the clustering was selected based on the best silhouette score of the different resolutions 101 . A short list of manually curated markers was used to validate the different stages of the differentiation process.
We then performed differential expression analysis between the two treatments (control and PINK1) at each time point. The differential expression analysis was done using MAST 100 (default parameters) on the normalized counts using the total number of transcripts in each cell as a covariate and the Bonferroni correction to correct for multiple hypothesis testing (Padj). In addition, we tried to nd conserved markers among the different time points using MAST again and the total number of transcripts in each cell as a latent variable. Genes with fold changes of the same sign in the fold change were then identi ed across the different time points and the average fold change was calculated. The genes with average fold change > 0.1 and maximum adjusted p-value < 0.01 were selected as differentially expressed.

Network analysis
We extracted protein-protein interaction information between the DEGs from STRING 43 and from GeneMANIA 44 . We excluded indirect association, such as text mining, co-occurrence and neighborhood from STRING, and coexpression, colocalization, shared protein domains and predicted interactions from GeneMANIA, retaining only genetic interactions, pathways and physical interactions (2122 interactions in total). We deletes any any genes or interactions that were added by these databases, to focus only on DEGs and interactions among them. The network diameter was calculated and betweenness centrality was used to illustrate the relative importance of each node within the network. As a control, we selected the same number of genes at random, using the list of genes detected by our RNA-seq analysis, excluding DEGs. This control set did not produce a network and led to a mostly disconnected array of genes ( Supplementary Fig. 11). Networks were also generated using the STRING and GeneMANIA inputs independently ( Supplementary Fig. 5 a, b). We constructed a correlation network based on the correlation of expression of DEGs (p-value < 0.05, correlation > 0.1) and identi ed edges that are common to the two networks. This network consisted of 860 interactions (Fig. 6). We then extracted shared interactions of these two networks, which amounted to 297 interactions (Fig. 6).
In order to validate the PPI network produced by STIRNGdb (v10), we created 50 PPI (protein protein interaction) networks using 292 random genes (same as the number of DEGs). We then compared the number of detected proteins, the number of interactions between the genes and the distribution of the node degrees. We performed the Wilcoxon test to access if the two-degree distributions are different from one another in a statistically signi cant manner, which showed a statistically signi cant difference (p=2.22e-16) (Fig.5b,c.).
Data availability. Single cell RNAseq data will be uploaded to GEO upon acceptance of the manuscript.  Table 3. Central nodes of the DEG network are associated with PD ( Supplementary Fig. 9). Central nodes were determined using the Gephi visualization platform. They represent points of convergence of the network ( Supplementary Fig. 5). Since these nodes have already been linked to PD pathways, many more DEGs might also contribute to PD pathology through these pathways. These nodes not only provide a point of convergence for DEGs identified in our study, but they also interact with several PARK genes, suggesting that PARK proteins may also converge on the pathways identified here (Supplementary Fig. 7).  Figure 1 Experimental design. Fibroblasts were used to generate human induced pluripotent stem cells (iPSCs), which were then used to generate mDA neurons. Differentiation was initiated at three different times to obtain cells at different stages of differentiation that could all be collected and analyzed by single-cell RNA-seq on the same day, thus avoiding batch effects. "P+1" indicates that the iPSCs were passaged before new differentiation was initiated. Since D10 was not used in pairwise analysis, we indicated "P+2" between D15 and D6 differentiation initiation Generation and classi cation of iPS cell lines. a. Immunocytochemistry. Staining for the iPSC markers Oct3/4 and TRA-180. DAPI was used to stain cell nuclei as a reference. b. Results of Scorecard analysis of iPSCs and embryonic bodies (EBs). iPSCs are expected to show high expression of self-renewal genes (Self-renew +) and low expression of mesoderm, ectoderm and endoderm markers (Ecto -, Meso-, Endo -).
EBs include cells at an early stage of spontaneous differentiation. Scorecard analysis of EBs determines the iPSC line's potential to differentiate into the three germ layers: ectoderm, mesoderm, and endoderm. EBs are expected to express few or no self-renewal genes (Self-renew -) and to show expression of some mesoderm, ectoderm and endoderm markers: Ecto +/-, Meso +/-, Endo +/-.  Table 1). b. Expression of key iPSC markers and transition to the expression of genes belonging to pathways characteristic of early differentiation. c. Expression of OTX2 was con rmed by absolute quantitative real-time PCR (qPCR) (bars represent SE, n=3 independently differentiated samples, qPCR of each sample was done in triplicate). EN1, an important marker of mDA differentiation, was measured by qPCR (n=3, same qPCR procedure) in bulk samples, as its expression is too low to be detected by single cell RNA seq. Primers are listed in Supp. Table 3.