Dynamic landscape of chromatin accessibility and transcriptomic changes during differentiation of human embryonic stem cells into dopaminergic neurons

Chromatin architecture influences transcription by modulating the physical access of regulatory factors to DNA, playing fundamental roles in cell identity. Studies on dopaminergic differentiation have identified coding genes, but the relationship with non-coding genes or chromatin accessibility remains elusive. Using RNA-Seq and ATAC-Seq we profiled differentially expressed transcripts and open chromatin regions during early dopaminergic neuron differentiation. Hierarchical clustering of differentially expressed genes, resulted in 6 groups with unique characteristics. Surprisingly, the abundance of long non-coding RNAs (lncRNAs) was high in the most downregulated transcripts, and depicted positive correlations with target mRNAs. We observed that open chromatin regions decrease upon differentiation. Enrichment analyses of accessibility depict an association between open chromatin regions and specific functional pathways and gene-sets. A bioinformatic search for motifs allowed us to identify transcription factors and structural nuclear proteins that potentially regulate dopaminergic differentiation. Interestingly, we also found changes in protein and mRNA abundance of the CCCTC-binding factor, CTCF, which participates in genome organization and gene expression. Furthermore, assays demonstrated co-localization of CTCF with Polycomb-repressed chromatin marked by H3K27me3 in pluripotent cells, progressively decreasing in neural precursor cells and differentiated neurons. Our work provides a unique resource of transcription factors and regulatory elements, potentially involved in the acquisition of human dopaminergic neuron cell identity.

In eukaryotic cells, DNA is bound to histones forming chromatin, whose structure is dynamic and suffers reversible chemical changes, mainly DNA methylation and histone post-translational modifications. These epigenetic modifications have gained increasing interest since they play fundamental roles in modulating gene expression throughout development, differentiation, and in response to environmental cues 1 . Chromatin modifications alter its packing level, resulting in mainly two distinct environments: 'open' accessible regions, also referred to as euchromatin, and 'closed' compact regions or heterochromatin. These regions are enriched or depleted in specific histone modifications. Euchromatin is permissive for gene activation, whereas genes in heterochromatin are mainly silenced. Interestingly, regions bound by certain proteins such as CTCF, have a role in maintaining the boundary between heterochromatin and euchromatin 2 . Studies have revealed the role of CTCF in regulating chromatin loops' formation and, hence, in controlling three-dimensional nuclear landscape to regulate gene expression [3][4][5] .

Results
Human ES cells efficiently differentiate into mDA. We differentiated pluripotent human ESC H9-GFP 18 expressing OCT4 and SOX2 into mDA (Fig. 1a), using a reported floor-plate protocol 12 . NES-TIN + neural precursors (NPC) were present at D14 (Fig. 1b). Neurons identified by the BETA-3 TUBULIN (TUJ1) antibody, that co-express TH (Fig. 1c) and FOXA2 (Fig. 1d) are present at D28. It is important to note that at D14, TH is not expressed; this enzyme is detected by immunocytochemistry in 15% of neurons at D24, reaching 50% of TH + neurons at D28 and this percentage increases to 80% at D42 13 . Immunoblot experiments in pluripotent cells and NPC show that they do not express TH, and this enzyme shows increasing levels at D21 and D28 (Fig. 1e). To further characterize the transition from pluripotency to neurons, we performed immunoblots for OCT4, SOX2, and TUJ1. The pluripotency-related transcription factor OCT4 was present only at D0. SOX2 was present in pluripotent cells and was down-regulated at D14, appearing again at D28, consistent with its role in neural differentiation. On the other hand, TUJ1 increased at D14 and D28 (Fig. S1). Thus, the times selected for this study allow a clear distinction between pluripotent cells, neural precursors, and early mDA. Extended periods of differentiation yield mDA that fire action potentials and release DA in vitro at D60 and D70, respectively 13 . RNA-Seq reveals transcriptional profiles associated with mDA differentiation. Dopaminergic differentiation can be monitored by mRNA expression: H9 cells at D0 express high levels of OCT4 and SOX2, but not LMX1A, FOXA2 or TH, by RT-qPCR; at D14, LMX1A and FOXA2 were detected, and its expression progressively increased. Transcripts for TH were initially detected at D21 and significantly increased afterwards 13 . Here we performed RNA-Seq at D0, D14 and D28 to determine transcriptional profiles underlying early DA induction. The number and category of DEGs found in each comparison are depicted in Fig. 2a,b. A list of 4009 DEGs (3377 protein-coding genes and 632 lncRNAs) was compiled from all comparisons and used for downstream analysis. Gene expression values (FPKM, normalized counts, and batch effect adjusted counts) and statistics of differential expression are included in Supplementary Tables S1 and S2 respectively.
To identify transcriptional patterns, we performed hierarchical clustering of DEGs, resulting in 6 clusters with distinct temporal profiles (Fig. 2c, Supplementary Table S3). Cluster 1 (825 coding genes and 220 lncRNAs) displays downregulation in D14 and D28. Top enriched gene-sets are related to pluripotency and proliferation, consistent with pluripotency loss as cells differentiate. Cluster 2 (309 coding genes and 13 lncRNAs) exhibits significant downregulation in D14, with upregulation at D28; mainly cell cycle gene-sets are enriched. Cluster 3 (410 coding genes and 47 lncRNAs) displays high expression at D0 and D14 and downregulation in D28, with enriched functions related to the regulation of proliferation and cell adhesion. Cluster 4 (581 protein-coding genes and 150 lncRNAs) is upregulated in D14 and decreases its expression in D28. Enriched gene-sets are related to head and hindbrain development, pattern specification, and regionalization. Cluster 5 (163 coding genes and 33 lncRNAs) depicts an upregulation at D14 with a downregulation at D28. Enriched gene-sets include extracellular structure organization. Cluster 6 (1090 coding genes and 168 lncRNAs) displays low levels at D0 and D14 and becomes upregulated at D28. Enriched gene-sets are mainly related to neurogenesis, synapse, axon, and DA neuron differentiation. A heatmap of selected DEGs related to pluripotency and neural DA differentiation is presented in Fig. S3. This analysis confirms that neural lineages are clearly observable from upregulated genes at D14 and maintained at D28 (Cluster 4) or upregulated at D28 (Cluster 6).   (Fig. 2b). Among the top 10 DEGs in neural precursors (D0 vs. D14), 6 are lncRNAs, and only 4 are protein-coding genes (Fig. 3a). Interestingly, the number of lncRNAs within the top (10, 20, 30, 40, and 50) ranked genes is statistically more significant than the number of proteincoding genes (Fig. 3a, Supplementary Table S4). A similar pattern is observed when comparing the number of lncRNAs and protein-coding genes between D0 vs. D28. This proportion changed when analyzing the top DEGs between D14 and D28, with a higher significance in the number of protein-coding genes. Almost all lncRNAs within the top 50 DEGs in the comparisons D0 vs. D14 and D0 vs. D28 are downregulated. These results suggest that lncRNAs participate in pluripotency and decrease upon neural differentiation. We categorized DE lncRNAs according to GENCODE annotation which considers their location relative to protein-coding genes. Using a hypergeometric test, we found that a significant number of DE lncRNAs are classified as processed transcripts and long intergenic non-coding RNA (lincRNA). The high number of lincRNAs suggests potential regulatory functions in trans (Fig. 3b). Genes are classified as processed transcripts when their transcripts lack an open reading frame and due to complexity in their structure, they cannot be placed in any other category 19 . Interestingly, the number of DE processed transcripts is significant in all pairwise comparisons. Given the predominance of top DE lncRNAs we searched among all pairwise comparisons and found 96 DE lncRNAs (majority antisense) with known regulatory functions. We found that 56% of DE lncRNAs (54 out of 96) had significant gene expression correlation (Pearson correlation p-value < 0.05, Supplementary Table S5) with their target protein-coding genes in samples from all time points. From significant correlations, 91% (49 out of 54) were positive and only 9% (5 out of 54) were negative, suggesting that DE lncRNAs may regulate target protein-coding genes in cis and in concordant direction (up-or downregulated). For visualization purposes, we contrasted FCs of DE lncRNAs with their related protein-coding genes (Fig. 3c). Examples of positively correlated genes (either down or upregulated) include SOX1:SOX1-OT, EPHA1: EPHA1-AS1, FOXD3:FOXD3-AS1 and DIO3:DIO3OS. However, a few pairs, like CACNA1C:CACNA1CA-AS2, showed a negative correlation.

Stages of DA induction are characterized by differentially accessible chromatin regions.
We used ATAC-Seq to identify genomic chromatin accessibility depicting active (open) and inactive (closed) chromatin during DA induction. We identified OCRs in samples from D0, D14, and D28 collected simultaneously from the same experiments used for RNA-Seq. Nucleosome packing and an apparent periodicity of approximately 200 bp in insert size distribution plots were evident (Fig. 4a), consistent with previous results 6 . The majority of reads were located in regions less than 100 bp, indicating OCR. Mononucleosome, dinucleosome, and trinucleosome patterns were also observed.
From ATAC-Seq, we identified a total of 464,783 consensus peaks by merging results from all sample timepoints. By comparing peak region locations, we found categories of peaks that were either specific to, or shared between time-points. Overall, 103,989 (22%) of the consensus peaks were specific to D0, 150,787 (32%) were D14-specific, and 89,235 (19%) were D28-specific. Only 49,126 (11%) of consensus peaks were shared among all time-points (Fig. 4b). By mapping consensus peaks to genomic regions, we found a higher proportion of peaks in intronic and intergenic regions (Fig. 4c). Due to the fact that we have one sample per time-point, we compared our ATAC-Seq peaks with public data derived from H9 ESCs 20 to assess the quality of our peak regions. We found that 86% and 93% of ATAC-Seq peak regions in two replicates 20 overlapped with our D0-specific peak regions.
To analyze global accessibility, we used + /-5 kb of identified OCRs. Higher chromatin accessibility in D0, compared to D14 and D28 was observed (Fig. 4d), consistent with published results 21 . Additionally, we compared time-point specific OCRs with enhancer regions predicted by GeneHancer 22 whose database incorporated DNase and H3K27ac assays using H9 pluripotent stem cells. We found that 60% (62,250 out of 103,989) of D0 specific OCRs overlapped GeneHancer annotated regions. In contrast, 31% (46,571 out of 150,787) and 42% (37,700 out of 89,235) of D14 and D28 specific OCR regions respectively, overlapped GeneHancer regions.
To find pathways and gene-sets revealed by OCR specific to different time-points, we tested for statistical differences. We found 2548, 107, and 1236 peaks at D0, D14, and D28, respectively, which were mapped to promoters, exons, or introns to perform a gene-set enrichment analysis (FDR < 0.05). Significant gene-sets enriched in D0-specific peaks are diverse but related to negative regulation of differentiation, stem cell division, and endodermal/mesodermal/ectodermal differentiation (Fig. 4e). In contrast, D28 presented neuronal-related processes, such as neurogenesis, dendrite and axonal development, and synaptic function (Fig. 4f). The list of differentially enriched gene-sets for time-point specific OCRs is included in Supplementary Table S6.
Temporal patterns of chromatin accessibility are correlated to transcriptomic profiles. To correlate significant differential chromatin accessibility and differential gene expression, we selected consensus ATAC-Seq peaks mapped to promoters, exons, or introns whose p-value < 0.05 between D0 and D28. The results show a significant correlation between FC of filtered genes (cor = 0.44, p-value = 5.54e−51, Fig. 5a) consistent with previous work 23 .
Coverage plots were used to analyze chromatin accessibility in the promoter regions of DEGs for each cluster depicted in Fig. 2c. A predominant pattern of higher accessibility in D0 compared to D14 and D28 in all clusters was observed (Fig. S4a). Interestingly, we found temporal chromatin accessibility patterns that were similar to transcriptomic profiles. For example, genes in cluster 1 depict a downregulation from D0 to D14 and maintain their lower expression at D28. Genes in cluster 6 display an upregulation from D14 to D28, and a similar pattern is observed in chromatin accessibility. We also compared the chromatin accessibility in promoter regions of DE protein-coding genes and lncRNAs in different clusters. Results show that chromatin accessibility at protein-coding promoter regions is higher at the transcript start site (TSS), whereas slightly shifted at lncRNA www.nature.com/scientificreports/ TSS (Fig. S4b), potentially due to inaccuracies in lncRNA annotation 24 . In contrast, 500 random non-expressed genes lack signals at TSS (Fig. S4c).
Analyzing individual loci, we observed changes in both chromatin accessibility and expression levels of expected genes at D0 and D28. We complemented our analysis by using public histone modification and DNase I hypersensitivity datasets. Pluripotency genes like PRDM14 (Fig. 5b) and NANOG (not shown) showed a simultaneous significant downregulation of RNA expression and a decrease in OCRs at their promoters. Both loci were also marked by an enrichment of promoter histone mark H3K4me3, the activating histone modification www.nature.com/scientificreports/ H3K27ac, and DNase I hypersensitivity in H9 undifferentiated hESCs. We also identified a decrease in chromatin accessibility at an upstream region coincident with enhancer histone modification H3K4me1, histone activation modification H3K27ac, and DNase I hypersensitivity. Interestingly, H3K27ac is absent in the same location in hESC-generated NPC. This putative enhancer region is already annotated in GeneHancer database 22 ; our results suggest that this region is a pluripotency-related enhancer (Fig. 5b). In contrast, CORIN and SLIT2, floor-plate markers involved in dopaminergic differentiation 25

Differentially accessible open chromatin regions of D28 coincide with PD-related SNPs. Pre-
vious studies have demonstrated that certain disease-associated genetic variants are found within non-coding genomic regions 26,27 . Hence, we compared both consensus and time-point specific OCRs in our experiments with PD-associated single nucleotide polymorphisms (SNPs) derived from GWAS analysis 28 . Interestingly, significant overlaps were found between PD-related SNPs 28 and OCRs in consensus peaks and D0 specific peaks only in promoters. Whereas, OCRs specific to mDA (D28) depicted significant overlap with PD-related SNPs only in intronic and intergenic regions. The significancy of overlaps was determined using a hypergeometric statistical test (Supplementary Table S7). We also assessed the SNCA locus, known to be associated with PD-related SNPs 29 , for enrichment in OCR in mDA (D28). SNCA expression and accessibility in OCR overlapping TSS, presumably at the promoter, were invariant through differentiation. In contrast, in the same locus there was an increase in chromatin accessibility at intergenic and intronic OCR by D14 and a higher one at D28 (Fig. S5). Some of these OCR are overlapping with enhancer histone modifications in ESC-derived NPC and N, and importantly, in adult SN. Moreover, the OCR regions identified in D28 in SNCA locus are also annotated in GeneHancer database and appear to be enriched in PD SNPs.
DEGs are enriched in TF motifs. From the 3377 DE protein-coding genes, 348 (10.3%) are TFs. We searched for TF binding motifs in promoter regions (± 1 kb up-and downstream) of protein-coding and lncRNA DEGs. For motif search, we used FIMO (FDR < 0.1) 30 with position weight matrices obtained from ENCODE 31 . Resulting motifs were further filtered to include only those belonging to differentially expressed TFs. We found 65 TFs with significant binding motifs in promoter regions of DE genes, including 48 TFs in protein-coding genes and 49 TFs in lncRNAs. The list of TF motifs found and their metrics are included in Supplementary  Table S8. A total of 32 TFs had binding sites in both gene types, whereas motifs of 16 and 17 TFs were found exclusively in protein-coding genes and lncRNAs, respectively. Interestingly, POU5F1, a well-known pluripotency marker depicted binding sites only in protein-coding genes whereas PAX6, a TF involved in the development of neural tissues was specific to lncRNAs. Selected protein-coding and lncRNA-specific TFs with binding motifs are listed in Fig. 6a.
To compare the frequency of TF binding motifs in promoter regions of DE genes from the previously defined expression clusters, we counted the number of genes enriched by each TF binding motif per cluster. This process resulted in a matrix of TFs (rows) and clusters (columns) whose values represent the percentage of genes in each cluster which had a TF binding motif. After filtering out motifs of TFs which were not expressed (FPKM < 1) and TFs whose frequencies remained constant (median absolute deviation < 0.05) in all clusters, we obtained a list of 24 TFs (Fig. 6b, Supplementary Table S8). We observed that TFs such JUN, KLF4, E2F1, and CHD2 have highly occurring motifs in DEGs from all clusters, suggesting global transcriptional regulation. However, there are TFs such as ZBTB7A, EGR3 and NFKB1 with high motif enrichment in specific clusters. In another example, LMO2, and NANOG are TFs with fair enrichment in DEGs of one cluster only. Interestingly, NANOG is upregulated in D0 and depicts binding motifs in DEGs from cluster 3 which have expression upregulation in D0 and D14. Even though not all TFs are DE they might regulate transcription indirectly by interacting with other factors. These results suggest that sets of TFs potentially regulate the coordinated transcriptional changes of genes in clusters while others may have regulatory mechanisms in numerous genes at different time points.

Motif enrichment analysis in open chromatin regions highlights a preferential location of specific TFs in promoters.
To investigate the potential binding of TFs in OCR, we performed a motif search in consensus peak regions. We filtered out motifs with FDR > 0.1. Next, for each TF motif, we calculated its fre-  Based on ENCODE's motif matrix, TFs may have more than one motif, thus we selected the most enriched motif representative of a TF. This strategy is intended to identify potentially important regulators of dopaminergic differentiation. We obtained a list of 171 TFs, which were further filtered to contain only TFs in the list of 4009 previously defined DE genes. We found 23 and 18 DE TFs with a higher frequency of binding motifs in promoter or enhancer peaks, respectively (Fig. 6c). The mean frequency of motifs in promoter regions was higher (6.4 ± 13%; range: 0-46%) than in enhancer regions (1.5 ± 13%; range: 0-11.8%). From the TFs with the highest frequency Gene expression correlation between selected TFs and DEGs. The top 8 TFs with motif frequencies annotated to promoter regions were ESRRA, CTCF, EBF1, POU2F2, BHLHE40, HEY1, SP8, and HIC1. Motifs of ESRRA, CTCF, EBF1, and POU2F2 were also the most enriched in enhancer regions. We calculated the gene expression Pearson correlation of these TFs with all 4009 DEGs considering significant correlations at FDR < 0.05. ESRRA, whose expression decreases from D0 to D28, is positively (18%) and negatively (23%) correlated with DEGs, (Fig. 6d). Conversely, EBF1 increases from D0 to D28, and is positively correlated with 33% of DEGs and negatively with 31% (Fig. 6d). Other significant TFs are included in Fig. S6.

CTCF displays a variable expression during mDA induction.
We identified changes in the expression of important proteins controlling chromatin structure and nuclear architecture (Fig. 7a). The highest FC (9.8) was found in ACTL6B, which codes for BAF53B, a subunit of the BAF remodeling complex. Notably, most of the genes associated with 3D chromatin structure show a consistent downregulation from D0 to D14 and an upregulation from D14 to D28. To further analyze chromatin states, we selected CTCF, a major component of the nuclear organization machinery that shows a dynamic expression pattern during DA induction (normalized RNA-Seq counts, Fig. 7b). CTCF protein levels are significantly decreased from D0 to D14, increasing again at D28 (Fig. 7c). A similar CTCF abundance pattern was observed through immunostaining, with a significant decrease between D0 and D14 (Fig. 7d). We then used confocal microscopy to detect the coincidence of CTCF with heterochromatin regions identified by high Hoechst staining (Fig. S7). CTCF is excluded from dense Hoechst regions at D0, and this spatial co-localization is significantly increased at D14 and D28 (Fig. S7). A further analysis was performed to determine the co-localization of CTCF with the Polycomb-repressed chromatin mark H3K27me3, by confocal microscopy. The co-localization of CTCF with this epigenetic modification, which identifies facultative heterochromatin, was assessed by image analysis (Fig. 7e). The correlation coefficient was maximal at D0, significantly decreasing at D14, and more notably at D28 (Fig. 7f). Three-dimensional reconstruction of nuclei of cells labeled with CTCF and H3K27me3 antibodies are shown for D0 (Video S1), D14 (Video S2) and D28 (Video S3). We noted a variable expression of CTCF isoforms (aliases and transcript IDs in Supplementary Table S10) during DA induction (Fig. 7h). Isoform A has an expression pattern similar to the average of transcripts (genelevel) and protein, depicting a transient decrease in D14 and the highest expression in D28. Isoform B decreases expression with no detectable levels at D28. In contrast, isoforms C and D are upregulated in D14. Isoform E is only expressed in D0 (Fig. 7g, Supplementary Table S10). CTCF promoter shows similar levels of chromatin accessibility in our ATAC-Seq samples, to those of H3K27ac reported in pluripotent H9, but also to ESC-derived NPC and N. The latter suggests that CTCF has ubiquitous activity. However, we detected two intronic OCRs within CTCF, previously annotated by GeneHancer. Both regions are markedly accessible at D0, the stage in which CTCF showed the highest transcript and protein levels (Fig. 7b,c), but only one of them overlapped with the H3K27ac mark in H9 hESCs. The region that lacks visible levels of H3K27ac modification in pluripotent cells is also accessible at D28, and interestingly, it displays significant levels of H3K27ac in ESC-derived N. D14, the stage with the lowest CTCF expression, presented only one OCR within the promoter region.

Discussion
Differentiation of pluripotent human cells to mDA involves morphological changes but also modifications in chromatin structure and RNA expression. Here we adopted a protocol mimicking mDA development 12 with conditions inducing floor plate precursors prior to dopaminergic neuron specification. We observed the expected downregulation of pluripotent-associated transcripts and the induction of FOXA2, LMX1A, EN1, EN2, DDC, NR4A2, NEUROG2, NEUROD1, ASCL1, NKX6-1, LMX1B, and TH. Gene-set enrichment analyses were consistent with these processes.  Previous studies profiled gene expression by microarrays with the same differentiation protocol 12 , and also a comprehensive approach was performed in human DA development with single-cell resolution 36 . However, gene expression is not sufficient for elucidating regulators of cell identity. The integration of ATAC-Seq and the greater resolution of RNA-Seq is a powerful resource to address regulatory heterogeneity between different cellular populations 6 . We found higher levels of global chromatin accessibility in D0 when compared with D14 and D28. These results confirm that ESC have a more open or "loose" chromatin correlating with a more permissive transcriptional state 7 . Differentiation is associated with heterochromatinization and global reorganization of heterochromatin structure, density and position 8 . A clear relationship between chromatin accessibility and gene expression has been demonstrated 37 . We observed a moderate correlation when comparing these parameters, suggesting the participation of other elements in the regulation of gene transcription such as TF binding, DNA methylation, histone tail modifications, miRNAs, and lncRNAs 38,39 .
We identified known and novel TFs potentially orchestrating neurogenesis and dopaminergic specification. We found a higher density of TF motifs in promoter regions compared to enhancer regions, consistent with previous data 40 . By studying DA differentiation in mice, researchers concluded that expression of lncRNAs and accessibility of enhancer regions represent a more accurate molecular signature associated with cell specificity than the expression of protein-coding genes 41 . We found a higher overlap of ATAC-Seq D0-specific OCRs with GeneHancer regulatory elements than with D14 and D28 OCRs suggesting that cell type-specific enhancer regions are potentially involved in dopaminergic differentiation. D14 and D28 OCRs not annotated by Gene-Hancer are interesting candidates for experimental validation.
Overall, 6 out of 8 DE TFs, whose motifs were most frequently found in OCR, have expression patterns significantly correlated with a considerable number of DE genes. Interestingly, these TFs have diverse profiles with their highest expression in D0, D14, or D28, denoting their roles in pluripotency, neural precursor commitment, or mDA specification. ESRRA and SP8, upregulated in D0, might participate in pluripotency maintenance. ESRRA is a nuclear receptor, but its participation in self-renewal and pluripotency has not been elucidated yet; however, another family member, ESRRB is a crucial regulator of pluripotency in mouse ESCs 42 and contributes to reprogramming fibroblasts to pluripotency 43 . Recent research revealed a pioneering role of ESRRB in binding silenced enhancers to promote a permissive chromatin state for pluripotent core TF recruitment 44 . Expression profiling studies of nuclear receptors in mouse and human ESCs showed that ESRRB is unique to mice. In contrast, ESRRA is present in both mice and humans 45 , suggesting that ESRRA might subserve biological functions in hESCs similar to those of murine ESRRB.
Motifs of ESRRA, CTCF, and EBF1 were enriched in enhancer regions, suggesting their role in distal regulatory mechanisms. Some of the TFs identified in this study have not been previously implicated in dopaminergic differentiation, but they have a role in neurogenesis. Early B cell factor (EBF1), highly expressed in D28, has been shown to bind to the promoter 46 and enhancer regions 47 and may interact with p300/CBP regulating gene transcription 48 . EBF1 is expressed in post-mitotic neurons and has an important role in mouse striatal differentiation 49 . Given that computationally predicted motif binding sites do not always indicate physiological binding of TFs 50 , additional ChIP-seq experiments are needed to address the in vivo validation of TF binding in progenitors and mDA 51 .
Genome-wide association studies (GWAS) have identified associations between SNPs and human pathologies. The majority of identified SNPs lie within non-coding regions including lncRNAs and enhancer regions 26,27 , suggesting that these SNPs mark variations that might impact regulatory functions increasing the risk for PD. Concordantly, we observed that OCRs in differentiated DA neurons overlap significantly with SNPs in noncoding regions. Specifically differentiated DA neurons show that OCRs in SCNA overlap with reported PD-related SNPs. Further studies are required to pinpoint the functional effect of enhancer region SNPs with TF binding. www.nature.com/scientificreports/ We observed marginal changes in gene expression of several subunits of the chromatin remodeling complex BAF, except for the ACTL6B transcript that codes for BAF53B subunit and is significantly upregulated from D14 to D28. Interestingly, cell cycle exit in NPCs is accompanied by a subunit exchange from BAF53A to BAF53B 52 , consistent with our results. To elucidate molecular triggers driving chromatin accessibility changes, we queried expression profiles of diverse chromatin remodelers and selected the architectural protein CTCF, given its importance in brain development 53 . CTCF is ubiquitously expressed in numerous cell types and developmental stages and has been implicated in regulating widespread functions 54  www.nature.com/scientificreports/ The leading hypothesis denoting CTCF's diverse regulatory functions is by conforming chromatin loops 55 . Notably, we observed a transient decrease at both protein and mRNA levels of CTCF in samples from D14. Beagan et al. found a decreased genome-wide CTCF occupancy in the transition from mouse ES cells to NPCs 53 . They also showed down-regulation in the protein levels of CTCF when comparing mouse pluripotent cells with NPCs, but they did not analyze differentiated neurons; we observed an increase in CTCF when human NPCs progress to differentiated neurons. Heterochromatin regions can be identified by intense Hoechst staining; however, a more specific epigenetic mark for Polycomb-mediated facultative heterochromatin is H3K27me3 56 . Through further assays, we found the lowest co-localization of CTCF with intense Hoechst staining at D0, increasing at D14 and D28. However, the co-localization of CTCF with H3K27me3 in pluripotent cells is highest. At D14, there was a discrete, although significant decrease in this coincidence. In contrast, there was a significant dissociation of CTCF presence with H3K27me3 deposition at D28. Previous work showed that the majority of CTCF sites present in ESC-derived NPC are already present in pluripotent ESC, suggesting that neural commitment is accompanied by a massive loss of CTCF occupancy instead of an extensive CTCF site acquisition or reshuffling 53 . Reduced CTCF presence in heterochromatin at D0 may result from the heterochromatin-free nuclei of ESCs. Increased co-localization of CTCF with heterochromatin might be due to the extensive heterochromatinization that takes place through differentiation 8 or to CTCF acting as an insulator to prevent excessive repression. Further experimental assays will unveil the role of CTCF in chromatin looping and its changes during DA specification.
Our results indicate that CTCF relocates to euchromatin regions in differentiated neurons.

Conclusion
In summary, we generated genome-wide maps of transcriptional changes and chromatin accessibility and through their integration, we derived a list of TFs and lncRNAs with potential functions in pluripotency maintenance or DA differentiation. We pinpointed important chromatin remodelers and demonstrated the dynamics of CTCF protein and mRNA. Importantly, we highlight the coordinated functions of chromatin accessibility, chromatin remodelers, TFs and lncRNAs to regulate spatiotemporal gene expression. Our work provides a useful resource of TFs and regulatory elements, potentially orchestrating human mDA specification.

Methods
For detailed protocols, refer to Supplementary Information. Dopaminergic differentiation of hESCs. H9 human embryonic stem cells (WA09) of female origin were purchased from WiCell (Madison, WI, USA), transduced with lentiviral vectors to express GFP as described 18 and cultured until 75% confluency. These GFP-expressing cells are useful for transplantation studies 13 and behave as its parental H9 hESC. At day zero (D0), cells were considered pluripotent. We followed a dopaminergic induction protocol 12 with minor modifications 57 . At D14, neural precursors are present, whereas at D28, cells presented a neuronal morphology. Samples were analyzed at D0, D14 and D28, some plates were used for RNA extraction while others for nuclei isolation and chromatin assays.
Immunostaining. Undifferentiated H9-GFP cells were cultured for 2-3 days. Neural progenitors were reseeded on induction day 13 or 21 on 24-well plates and fixed the next day. Cells were permeabilized, washed, blocked, and incubated with primary and secondary antibodies. Incubation with Hoechst 33258 (1 ng/ml) was used for nuclear labeling.
Immunoblot. Standard immunoblot protocols were performed for CTCF and TH. Images were analyzed using ImageJ-Fiji 58 .
Immunofluorescence analysis of CTCF protein levels. Photographs of CTCF-and Hoechst-stained cells, or confocal images were normalized, and analyzed with ImageJ. Co-localization of CTCF with heterochromatin regions stained with Hoechst was determined by the overlapping area from merged images as reported 56 . Also, the co-localization of CTCF with H3K27me3 was determined as previously described 59 .  ATAC-Seq pre-processing. After removing adaptors using cutadapt 61 and trimmomatic 62 we obtained 36-126 bp paired-end ATAC-Seq reads for each time-point samples. The quality of trimmed sequencing datasets was verified using FASTQC 63 . We followed the ENCODE ATAC-Seq pipeline (http:// www. encod eproj ect. org/ atac-seq/) to process samples. Paired-end reads were aligned to the human reference genome (GRCh38/ hg38) using Bowtie2 with parameters '-X 2000 -k 5' 64 . Next, we used SAMTools 65 to remove unmapped reads, fragments with unmapped mates, non-primary alignment reads, and reads failing platform quality checks. Low quality reads (MAPQ < 30) and reads which mapped to mitochondrial DNA were also excluded. Optical and PCR duplicates were also removed using Picard tools (http:// broad insti tute. github. io/ picard). To accurately locate the center of each transposon-binding event, the remaining reads were offset by + 4 and − 5 for positive and negative strands respectively 66 . To evaluate the expected periodicity of DNA winding around nucleosomes, we obtained insert size histograms using Picard tools (Fig. 4a). ATAC-Seq peak regions were called for each sample using MACS2 with parameters -shift 75 -extsize 150 -nomodel -keep-dup all -call-summits 67 . Resulting peaks which overlapped with ENCODE blacklisted regions were filtered out 68 .

Analysis of differential chromatin accessibility.
A consensus set of 464,783 unique peaks was generated merging peaks within 100 bp. Exclusive peaks were obtained using BEDTools 69 . Reads in peaks were counted using HTSeq 70 and tested for differential accessibility using DESeq2 71 considering |log 2 (fold-change)|> 1.5 and p-value < 0.05. Promoter-TSS was defined as 1 kb up-plus 100 bp downstream of the transcript start site (TSS). HOMER annotatePeaks function was used 72 , to associate peaks to genes (GENCODE v29) or genomic regions (intergenic, promoter-TSS, exon, intron, 5' UTR, 3' UTR, and TTS). We used Mann Whitney tests to compare the number of normalized reads found in open chromatin regions (OCR) of specific loci between ATAC-Seq samples (D0 vs D14 and D0 vs D28). Asterisks indicate statistical significance (p-value < 0.05) in IGV tracks ( Fig. 5b-d).
Gene-set enrichment analysis. The hypergeometric statistical test in R (https:// cran.r-proje ct. org/) and a subset of gene-sets from MSigDB 73 combined with Wikipathways 2019 74,75 were used. Gene-sets were considered significantly enriched with FDR < 0.05.
Public datasets. For comparison, we used public datasets for CTCF ChIP-seq, DNase I hypersensitivity, H3K27ac, H3K27me3, H3K4me1, H3Kme3 for H1, H9, NPC, N, LUHMES, and substantia nigra cells (SN) from Gene Expression Omnibus accession numbers indicated in Supplemental information. We used GeneHancer database from GeneCards Suite v4.14 22 for comparison of enhancer regions and predicted target genes.
Transcription factor binding analysis. We performed a motif search using position weight matrices from the ENCODE TF ChIP-seq datasets 31 . Promoter regions of differentially expressed (DE) genes (DEGs) (1 kb up-and downstream of the TSS) and open chromatin regions (OCRs, common and differentially accessible) were scanned for motif occurrences using FIMO 30 with FDR < 0.1. The most frequently occurring DE TFs were correlated with DEGs using the Pearson coefficient. Only correlations with FDR < 0.05 were considered significant.
Gene expression analysis. Standard quality procedures were performed using cutadapt 61 , trimmomatic 62 , and FASTQC 63 . Reads were mapped to hg38 (https:// www. genco degen es. org/) using a previously described pipeline 76 . Assembly of mapped reads was performed with Cufflinks v2.2.1 77  www.nature.com/scientificreports/ genes and transcripts. We adjusted batch effect (Fig. S2) and performed pairwise comparison of read counts implementing DESeq2 71 . Genes were labelled as differentially expressed (DEG) if at least one of the replicates in the comparison had FPKM ≥ 1, and normalized count FC > 4 with an FDR < 0.05. Temporal gene expression profiles of DEGs were obtained using hierarchical clustering. DEGs in each cluster were used for gene-set enrichment analysis.

Data availability
Raw sequencing datasets and processed files have been deposited in NCBI Gene Expression Omnibus (GEO) (http:// www. ncbi. nlm. nih. gov/ geo) under accession GSE153005.