Chromatin accessibility landscape of articular knee cartilage reveals aberrant enhancer regulation in osteoarthritis

Osteoarthritis (OA) is a common joint disorder with increasing impact in an aging society. While genetic and transcriptomic analyses have revealed some genes and non-coding loci associated to OA, the pathogenesis remains incompletely understood. Chromatin profiling, which provides insight into gene regulation, has not been reported in OA mainly due to technical difficulties. Here, we employed Assay for Transposase-Accessible Chromatin with high throughput sequencing (ATAC-seq) to map the accessible chromatin landscape in articular knee cartilage of OA patients. We identified 109,215 accessible chromatin regions for cartilages, of which 71% were annotated as enhancers. By overlaying them with genetic and DNA methylation data, we have determined potential OA-relevant enhancers and their putative target genes. Furthermore, through integration with RNA-seq data, we characterized genes that are altered both at epigenomic and transcriptomic levels in OA. These genes are enriched in pathways regulating ossification and mesenchymal stem cell (MSC) differentiation. Consistently, the differentially accessible regions in OA are enriched for MSC-specific enhancers and motifs of transcription factor families involved in osteoblast differentiation. In conclusion, we demonstrate how direct chromatin profiling of clinical tissues can provide comprehensive epigenetic information for a disease and suggest candidate genes and enhancers of translational potential.

SCIENTIfIC RePORTs | (2018) 8:15499 | DOI: 10.1038/s41598-018-33779-z discover rare variants with larger effect size, the identified variants are often located in the non-coding regions of the genome 14 , complicating the identification of the causal genes. Transcriptomic analyses of cartilage in diseased joints of OA patients (taken from replacement surgeries) provided an opportunity to pinpoint transcriptionally dysregulated genes and pathways relevant to OA 15,16 . However, such studies have yet to fully reveal the underlying molecular mechanism of how the transcription of these genes are dysregulated.
Recently, epigenetic tools have been applied to gain further insight into the pathogenesis of OA. There have been reports of DNA methylation status in cartilage of diseased joints, revealing epigenetic marks as potential mediators of OA genetic risk [17][18][19][20][21][22][23] . However, the change of gene expression is rarely associated with DNA methylation alterations at promoters 16,24,25 . Many of the identified differentially methylated sites fall in enhancer regions 19,[21][22][23] , which are non-coding regulatory elements, disruption of which may lead to dysregulated transcription, and many are cell type-specific 26,27 . Recent large-scale studies, such as FANTOM5 28 , Roadmap Epigenomics Project 29 and GTEx 30 have enabled the prediction of regulatory networks between enhancers and their potential target genes (e.g. JEME 31 ), which could be applied in a clinical context to explore the roles of enhancers in disease pathogenesis.
Here, we set out to investigate alterations of enhancers associated with OA by applying ATAC-seq 32 on the knee joint cartilages from OA patients, using an optimized protocol for cartilage sample preparation. ATAC-seq maps the accessible chromatin regions, which are often regulatory regions such as promoters and enhancers that play roles in regulation of gene expression. By integrating our ATAC-seq data with the publicly available genetic, transcriptomic and epigenomic data, we identified dysregulated enhancers and their potential target genes. Our data highlights a number of OA risk loci and differentially methylated loci (DML) that potentially play roles in cartilage degradation during OA development.

Results
Mapping chromatin accessibility of chondrocytes in OA knee cartilages. To investigate chromatin signatures in articular cartilage associated with OA, we performed ATAC-seq on the chondrocytes isolated from the knee joints of patients. We have previously shown that the oLT region (outer region of the lateral tibial plateau, representing the intact cartilage) is a good control for comparing the iMT region (inner region of medial tibial plateau, representing the damaged cartilage) as a model for OA disease progression 16 , and the transcriptome and methylome of this model have been characterized by us and the others 24,25,33,34 . In this study, we performed ATAC-seq on the chondrocytes isolated from oLT and iMT regions of 8 patients (Supplementary Figure S1a), generating 16 ATAC-seq libraries (pairs from 8 patients) on clinically relevant OA cartilage tissues (Fig. 1a).
Overall, the libraries are of high quality, showing two-to four-fold enrichment in the Roadmap DHS (DHS enrichment score, Methods), with no substantial difference between oLT and iMT libraries (Supplementary  Table S2). In addition, the expected nucleosome banding patterns were observed in the fragment size distribution for both oLT and iMT libraries (Fig. 1b). When we applied nucleoATAC to infer genome-wide nucleosome occupancy and positioning from ATAC-seq data 35 , we found similar aggregated signals around transcription start sites (TSS) for both oLT and iMT libraries, corresponding to −2, −1, +1, +2, +3 nucleosomes as well as nucleosome depletion region at upstream of TSS (Fig. 1c). Thus, we concluded that our ATAC-seq libraries had good and indistinguishable quality between oLT and iMT regions. To assess the relevance of these peaks to OA, we intersected these peaks against OA GWAS (Single Nucleotide Polymorphisms) SNPs and OA DML datasets. We found 149 peaks (31 promoters and 101 enhancers) overlapping with OA-associated GWAS SNPs and 2,508 peaks (595 promoters and 1,715 enhancers) overlapping with OA DML. The fact that majority of these overlapping peaks are enhancers, along with the notion that many disease-associated noncoding variants reside in enhancers 14,36 , suggest the dysregulation of enhancer might play a role in OA pathogenesis. To further characterize these potentially OA-relevant enhancers, we utilized public resources to predict their target genes (Methods). We listed enhancers that overlap OA GWAS SNPs (Supplementary Table S4) and OA DML (Supplementary Table S5), as well as their predicted target genes, representing candidates of dysregulated transcriptional network during OA pathogenesis. The list includes previously identified OA associated genes, such as FGFR3 37,38 (its enhancer overlaps an OA GWAS SNP, Fig. 2b) and PTEN 39 (its predicted enhancer overlaps with an OA DML, Fig. 2c), thus validating our approach. rs4646626 has previously been identified as a suggestive OA risk locus (p = 9 × 10 −6,40 ), for which we found two proxy SNPs (rs10851632 and rs12905608), predicted to target the ALDH1A2 gene, hitting the accessible enhancer region (chr15:57949887-57950974 and chr15:58021340-58022482, respectively) in our samples. Therefore, our study provides an accessible chromatin landscape of cartilage tissue for better interpretation of other genetic and epigenomic data relevant to OA and other skeletomuscular disease, although caution should be taken since public data from many populations are involved in the integrative analysis while our ATAC-seq dataset is acquired from the Japanese population.
Identification of differentially accessible enhancers in OA. To compare the accessible chromatin landscape between the intact (oLT) and the damaged (iMT) tissues, we performed principal component analysis on the peak signals across the 16 samples. The first principal component (41.37% of variance) can be attributed to tissue damage variations (i.e. oLT versus iMT), while the second principal component (16.03% of variance) can be attributed to patient-to-patient variations (Fig. 3a). This observation suggests the accessible chromatin landscapes of damaged and intact tissues are readily distinguishable from each other, despite the variations among individual patients.
To identify the chromatin signatures relevant to OA, we performed differential accessibility analysis between oLT and iMT using edgeR 41 . First and foremost, we assessed the global relevance of the differential accessibility of peaks to OA. In Fig. 3b, we investigated the relationship between the extent of differential accessibility of peaks (measured by False Discovery Rate (FDR)) and their 1) compositions, 2) enrichment in OA GWAS SNPs, and 3) enrichment in OA DML. We found that the peaks that are more differentially accessible (i.e. smaller FDR) contain substantially higher fraction of enhancers (Fig. 3b, top panel), suggesting enhancers are more likely to be dysregulated than promoters during OA disease progression. Moreover, peaks that are more differentially accessible  Tables 1 and 2, respectively. We note that 92% of the differentially accessible peaks (FDR ≤ 0.05) overlapping DML are concordant in the direction of change (hypomethylation with increased accessibility or hypermethylation with decreased accessibility). As a negative control, we examined the GWAS SNPs and DMLs associated with Parkinson's disease 42 , which is a neurodegenerative disease pathologically unrelated to cartilage, and did not observe any substantial enrichment. Taken together, these observations suggest the differential chromatin accessibility measured in this study is relevant to OA disease progression in a specific manner.
We next determined significantly differentially accessible peaks with FDR ≤ 0.05. Out of the 4,450 differentially accessible peaks, 1,565 are more accessible and 2,885 are less accessible in the damaged tissues compared to the intact tissues ( Fig. 3c,d, and Supplementary Figure S2a). The identified differentially accessible peaks cluster in two groups (Supplementary Figure S2b), and are generally consistent among patients (Supplementary Figure S2c). Majority (85.3%) of these differentially accessible peaks are enhancers and only 7.6% are promoters. It is noted that the promoter accessibility of SOX11 and RGR are altered in the OA damaged tissues (more accessible for SOX11 and less accessible for RGR, Fig. 3d), which are consistent with the previous findings that they are up-and down-regulated in OA damaged tissues, respectively 24 .
Majority of the differentially accessible peaks are enhancers, which are known to regulate transcription. Cell type-specific enhancers largely drive the transcriptional program required to carry out specific functions for each cell type 43 and their dysregulation may lead to diseases 44 . To assess the cell type specificity of the differentially accessible enhancers identified in this study, we examined their enrichment of cell type-specific enhancers of 125 cell types defined by the Roadmap Epigenomics Project 29 . We found these differential accessible enhancers are highly enriched for enhancers that are specific to bone-related cell types in damaged tissues (e.g. chondrocyte and osteoblast), as well as mesenchymal stem cells (MSC) and fibroblasts, which have been reported to have similar profiles to MSC 45 (Fig. 3e).
In summary, we demonstrate that the differential accessible chromatin regions identified in this study are enriched in OA-relevant enhancers supported by multiple genetic and epigenomic evidence. These differentially  accessible enhancers and their predicted target genes could be used for prioritization of candidate genes to be tested for studying OA disease progression.

Motif enrichment analysis reveals transcription factors relevant to OA.
In order to gain more insights into which regulatory pathways may be dysregulated in OA, we next examined the enrichment of transcription factor binding motifs in the differentially accessible regions. Enrichment analyses were performed separately for the regions that are significantly more or less accessible in the damaged tissues, using all accessible regions as background. Transcription factors with significantly enriched motifs are summarized (Supplementary Figure S3a), and their binding prediction in robust peaks are listed (Supplementary Table S3). We note most of these transcription factors belong to ETS and bZIP family; many of which are known to regulate genes involved in bone or cartilage development, including AP-1 46 , CEBP 47 , MafK 48 , STAT3 49 , and ERG 50,51 . Moreover, we have previously proposed that ETS-1 may be involved in OA based on our DML study 25 . Taken together, the motif enrichment analysis of the differentially accessible regions in the damaged tissues is consistent with the hypothesis that the transcriptional program for chondrocyte differentiation may be disrupted during OA progression, and suggests that cell type-specific enhancers may be dysregulated through the ETS and bZIP family transcription factors.

Integrative transcriptomics and epigenomics analysis reveals pathways involved in OA.
To evaluate the effects of the dysregulated regulatory regions on gene expression in OA, we reanalyzed the RNA-seq dataset from a different cohort that used the same disease model (i.e. oLT vs. iMT) 15 and integrated it with the differentially accessible regions identified in this study (Fig. 4a). We first verified that the dysregulated regulatory regions have detectable effects on the gene expression; the genes with more accessible promoters or enhancers have significantly higher expression fold-changes between oLT and iMT (p < 0.001, Student's t-test) than the ones with less accessible promoters or enhancers (Fig. 4b).
To further investigate the congruence between these transcriptomic and epigenomic changes, we overlapped the significant differentially expressed genes (n = 3,293, FDR ≤ 0.05) onto the genes with differentially accessible promoters (n = 255) or enhancers (n = 2,406) (Fig. 4c). We find that their overlaps are statistically significant (p < 0.001 in both promoters and enhancers, Fisher's exact test), which further demonstrate the chromatin accessibility dataset from this study is generally consistent with the transcriptomic dataset. As a result, we identified 371 genes that are consistently dysregulated both at the epigenomic and transcriptomic levels, representing a shortlist of OA-related candidate genes supported by multiple lines of evidence (Fig. 4c, Table 3, Supplementary Table S6).
To elucidate the biological pathways dysregulated in OA, we examined the enrichment of gene ontology (GO) terms of the 371 OA-related candidate genes using enrichr 52 (Fig. 4d, top 30 terms listed). Overall, we observe the enrichment of GO terms related to cell fate and differentiation, including MSC differentiation, ossification and bone development (Fig. 4d, dysregulated genes in each GO terms are listed in Supplementary Table S7). In the 'positive regulation of ossification pathway' , two genes were identified to be more accessible and upregulated: SOX11 (only the promoter is more accessible) and WNT5A (both the promoter and the enhancer are more accessible) (Supplementary Figure S3b). It has been shown that WNT5A protein can induce matrix metalloproteinase production and cartilage destruction 53 , and its upregulation is consistent with ossification being an important process and signature of OA progression. Other susceptible genes and pathways that support the ossification during OA from this analysis include LRP5, FGFR2 and BMPR1B in the endochondral bone morphogenesis pathway, which have been reported as OA associated genes previously [18][19][20][54][55][56] . In conclusion, our integrative analysis of ATAC-seq and publicly available RNA-seq datasets indicates that dysregulated chondrocyte differentiation and endochondral ossification are associated with OA progression.

Discussion
Conventional epigenomic profiling at the chromatin level, such as chromatin immunoprecipitation sequencing (ChIP-seq) and DNase-seq, is informative in providing insights into the molecular mechanism underlying the regulation of gene expression. However, applying these methods to clinically relevant tissue is less feasible due to the requirement of large number of cells. Especially with the cartilage tissues, the limited tissue sampling size and the extracellular matrix make collection of sufficient cells difficult. One major advantage of ATAC-seq is that it can be achieved with only thousands of cells, making the direct chromatin profiling of clinical samples feasible. In this study, we applied ATAC-seq on OA samples to obtain a chromatin accessibility map in articular cartilage, and identified regulatory regions associated with OA. The lack of normal knee tissues due to difficulties associated with collecting them is a limitation for studying OA. However, our previous studies of the pathology and transcriptome showed that the oLT regions are very similar to normal 16,24 . Thus, the oLT regions could serve as a suitable alternative to normal control, which could also reduce the inter-individual variations.
Our strategy of integrating the epigenomic data of clinically relevant tissues with the publicly available genetic and transcriptomic data allowed us to better understand how the identified loci may contribute to OA pathogenesis. Most of these accessible chromatin regions are annotated enhancers and we linked them to their putative target genes using public datasets. Previous studies have suggested the potential role of OA-associated epigenetic changes within enhancers in disease pathogenesis 22,23 . With this enhancer-gene map in chondrocyte, we can now better interpret the previously identified OA GWAS SNPs or OA differential methylated loci located lie outside of the coding regions. For example, we have verified 7 OA associated SNPs (rs10851630, rs10851631, rs10851632, rs12905608, rs12910752, rs4238326, and rs35246600) reside in 3 chondrocyte accessible enhancers of the predicted target gene ALDH1A2 (Supplementary Table S4). Since ALDH1A2 is inactivated in prechondrogenic mesenchyme during the cartilage development 57 , these SNPs may contribute to OA through disrupting the enhancer of ALDH1A2 and inappropriately activating a cell differentiation pathway. Consistently, OA genetic risk variants in ALDH1A2 locus has also been functionally characterized in a recent study 58 . In addition, we identified several aberrantly methylated enhancers that may be associated with OA. One example is cg09221159 within the enhancer for PTEN gene which is hypomethylated in the damaged cartilage 25 . PTEN is involved in the positive regulation of the apoptotic signaling pathway and its activation is consistent with the notion that chondrocyte apoptosis may contribute to the failure in appropriately maintaining the cartilage. Thus, our analyses show that the chromatin accessibility map can provide an additional layer of evidence for determining which loci, especially those in the non-coding regions, are associated with OA, which may have been ignored in previous studies.
In general, our differential enhancer analysis shows MSC, chondrocyte and osteoblast-specific enhancers are dysregulated in the damaged tissues. Furthermore, motif enrichment analysis of differentially accessible loci has identified many dysregulated transcription factors, the functions of which are known to be in chondrocyte development regulation. For example, the transcription factor Jun-AP1, which is enriched in the more accessible regions of the damaged tissues, is known to regulate chondrocyte hypertrophic morphology, which contributes to longitudinal bone growth, consistent with the notion of endochondral ossification 59 . A recent study showed injection of adipose-derived stromal cells overexpressing an AP-1 family transcription factor Fra-1 can inhibit OA progression in mice 60 . Consistently, in our ATAC-seq analysis, other members of AP-1 family (e.g. BATF, FOSL2) also showed enrichment in differentially accessible regions. In addition, if we combine the RNA-seq data 15 , 15 out of 18 transcription factor genes identified in the motif analysis are expressed (CPM ≥ 2) in the cartilage tissue, of which ETS1 is differentially expressed between the two cartilage types. These transcription factors could be candidate targets for further experiments.
In the integrative analysis of ATAC-seq and RNA-seq, many dysregulated genes related to lineage differentiation of MSC pathways were observed. For example, we found that BMPR1B (bone morphogenetic protein receptor type 1B) is upregulated and both its promoter and enhancer are more accessible in the damaged tissue. Its activation is consistent with the ossification pathway activation, since it encodes a transmembrane serine kinase that binds to BMP ligands that positively regulate endochondral ossification and abnormal chondrogenesis 61,62 . Consistently, an osteoblast marker gene MSX2 (Msh Homeobox 2) involved in promoting osteoblast differentiation 63,64 , is upregulated and both its promoter and enhancer are more accessible in the damaged tissue, suggesting the osteoblast differentiation may be activated in OA. Furthermore, we found ROR2 (receptor tyrosine kinase like orphan receptor 2) is down-regulated and its enhancers are less accessible in the damaged samples. Since it is required for cartilage development 65,66 , it suggests that the normal chondrocyte development and cartilage formation may be compromised in OA. Consistently, we have also determined FGFR2 and STAT1, which are known  to inhibit chondrocyte proliferation 67 , are upregulated. In summary, our ATAC-seq data and integrative analyses support the OA model of abnormal MSC differentiation and endochondral ossification over other models, such as inflammatory mediums from the synovium. The pathogenesis for OA is not yet fully understood, despite multiple genes and pathways that have been characterized to be dysregulated 13,15,[68][69][70] . In this study, by integrating clinically relevant epigenomic data with genetic and transcriptomic data, we provide multiple lines of evidence supporting a number OA candidate genes and pathways that may be crucial to OA pathogenesis, which could potentially be used for clinical diagnostic or as therapeutic targets.

Conclusions
We present in this work an application of ATAC-seq in OA in a clinical relevant setting. The chromatin accessibility map in cartilage will be a resource for future GWAS and DNA methylation studies in OA and other musculoskeletal diseases. We identified altered promoters and enhancers of genes that might be involved in the pathogenesis of OA. Our analyses suggest aberrant enhancer usage associated with MSC differentiation and chondrogenesis in OA. Understanding these molecular basis of OA is necessary for future therapeutic intervention.

Methods
Knee joint tissues collection. Human knee joint tibial plates were collected from patients who undertook joint replacement surgery due to severe primary OA at the National Hospital Organization Sagamihara Hospital (Kanagawa, Japan). Demographic information for patients is listed in Supplementary Table S1. Diagnosis of OA was based on the criteria of the American College of Rheumatology 71 , and all the knees were medially involved in the disease. Tissues were stored in 4 °C Dulbecco's Modified Eagle Medium (DMEM) after removal for less than one hour prior to cartilage sectioning. Informed consent was obtained from each patient enrolled in this study. This study was approved by the institutional review board of all the participating institutions (Sagamihara National Hospital and RIKEN).
Processing of cartilage samples. Fresh cartilage was separated from the subchondral bone by a scalpel after surgery and stored in 4 °C DMEM for three hours until they were processed. 100 mg cartilage from both outer region of lateral tibial plateau (oLT) and inner region of medial tibial plateau (iMT) (Fig. 1a and Supplementary Figure S1a) for each patient was digested with 0.2% type II collagenase (Sigma-Aldrich) in DMEM at 37 °C with rotation for 12 hours to fully remove the collagen matrix debris, and stopped by washing with chilled PBS. Immediately after digestion, chondrocytes were counted by a cell counting chamber Incyto C-Chip (VWR) and collected by centrifugation at 500 g for 5 minutes at 4 °C.
ATAC-seq of the cartilage. Approximately 50,000 cells were taken after the cartilage processing and used for ATAC-seq library preparation (Fig. 1a) according to the original protocol 32,72 . Briefly, transposase reaction was carried out as previously described 32 followed by 9 to 12 cycles of PCR amplification. Amplified DNA fragments were purified with MinElute PCR Purification Kit (QIAGEN) and size selected twice with Agencourt AMPure XP (1:1.5 and 1:0.5 sample to beads; Beckman Coulter). Libraries were quantified by KAPA Library Quantification Kit for Illumina Sequencing Platforms (KAPA Biosystems), and size distribution was inspected by Bioanalyzer (Agilent High Sensitivity DNA chip, Agilent Technologies). Library quality was assessed before sequencing by qPCR enrichment of a housekeeping gene promoter (GAPDH) over a gene desert region 72 . ATAC-seq libraries were sequenced on an Illumina HiSeq 2500 (50 bp, paired-end) by GeNAS (Genome Network Analysis Support Facility, RIKEN, Yokohama, Japan).
ATAC-seq data processing, peak calling and quality assessment. An ATAC-seq data processing pipeline for read mapping, peak calling, signal track generation, and quality control was implemented 73 . Briefly, fastq files for all patients were grouped by tissue compartment (oLT or iMT) and input into the pipeline separately, with parameters -true_rep -no_idr. Reads were mapped to the hg38 reference genome. Peaks for individual samples, as well as for pooled oLT and pooled iMT, were called by MACS2 with default parameters in the pipeline. Basic sequencing information and library quality metrics are listed in Supplementary Table S2. NucleoATAC was applied to infer genome-wide nucleosome positions and occupancy from the ATAC-seq data 35 . Briefly, NucleoATAC were run for oLT and iMT separately with default parameters, using bam files merging from all patients as input, and the outputted nucleoatac_signal.bedgraph were used for aggregated plot around TSS.
Previously annotated DNase I Hypersensitive Sites (DHS) were used to assess the quality of our libraries. Briefly, a set of DHS defined in Roadmap Epigenomics Project 74 were downloaded (http://egg2.wustl.edu/ roadmap/web_portal/DNase_reg.html) and lifted over from hg19 to hg38 using UCSC liftOver tool 75 (refer as Roadmap DHS thereafter). For each library, we calculated the ratio of reads that fall within Roadmap DHS versus randomly chosen genomic regions of the same total size (DHS enrichment score). It is noted that the DHS enrichment score is in good correlation with enrichment qPCR for GAPDH (Supplementary Figure S1b).
Defining a set of unified accessible chromatin regions. Peaks called from pooled oLT and pooled iMT were merged as raw peak regions using bedtools v2.27.0 76 (mergeBed -d 300). Reads fall in raw peak regions were counted for individual samples using bedtools 76 . Read counts were normalized as count-per-million (CPM) based on relative log expression normalization (RLE) method implemented in edgeR v3.18.1 41,77 . In the end, we defined a set robust peaks (n = 109,215) with ≥2 CPM in at least 6 oLT or 6 iMT samples for all downstream analyses. Peak annotation and differentially accessible peak identification. Peaks were annotated as promoters or enhancers based on their intersection with promoter or enhancer as defined in Roadmap DHS 29 . Noted that SCIENTIfIC RePORTs | (2018) 8:15499 | DOI:10.1038/s41598-018-33779-z a peak was preferentially annotated as a promoter if it intersects both a promoter and a proximal enhancer DHS (n = 1,693. 1.6% of total robust peaks). Differentially accessible peaks between damaged (i.e. iMT) and intact (i.e. oLT) tissues were identified using edgeR 41,77 . Briefly, read counts were normalized by the effective library size with the RLE method and DHS enrichment score of each sample were incorporated as continuous covariates in the design matrix of the generalized linear model implemented in edgeR 41,77 . The Benjamini-Hochberg adjusted p-values (i.e. false discovery rate, FDR) was taken to measure the extent of differential accessibility as used in Fig. 3b. A cutoff of FDR ≤ 0.05 was used to define differentially accessible peaks.

Processing of trait associated SNPs and DMLs. GWAS lead SNPs of all phenotypes of OA with
genome-wide significance p ≤ 1 × 10 −5 were obtained from GWASdb v2 (as of 20 Jun 2018) 78 , NHGRI-EBI GWAS Catalog (as of 20 Jun 2018) 79 , and a recent study 80 , totaling 197 lead SNPs. SNPs in linkage disequilibrium (LD) with these lead SNPs in matched populations (i.e. proxy SNPs, with r 2 ≥ 0.8 within of 500 kb) were searched using PLNK v1.90b5.4 (https://www.cog-genomics.org/plink2), based on the 1000 Genomes Project Phase 3 data obtained from MAGMA 81 . It results in 4,483 GWAS SNPs associated with OA, consisting of 167 independent loci with non-overlapping LD blocks. As a negative control, GWAS SNPs associated with Parkinson's disease were obtained in the same manner. Coordinates for these SNPs in hg38 were obtained from dbSNP Build 150.
Differentially methylated loci associated with OA were obtained from previous studies 25,70 and all identified DML were pooled and mapped to hg38 (n = 9,265). Parkinson's disease associated DMLs as a negative control trait were obtained from a previous study 42 .
Definition of cell type-specific enhancers. Bed files for enhancer clusters with coordinated activity in 127 epigenomes, as well as the density of clusters per cell type, were obtained from online database (http://egg2. wustl.edu/roadmap/web_portal/DNase_reg.html). DNase I regions selected with p < 1 × 10 −2 , lifted over from hg19 to hg38 using UCSC liftOver tool. For each cell type, clusters with two-fold density than average (across all cell types) were defined as specific and pooled.
Enrichment analysis for differentially accessible peaks. The enrichment of GWAS SNPs and DML in differentially accessible peaks were assessed as described in previous studies 82,83 . Briefly, trait associated (i.e. OA and Parkinson's disease) SNPs (i.e. lead and proxy SNPs) and DMLs are referred to as foreground positions. For SNPs, ten millions of randomly selected common SNPs in dbSNP Build 150 were used to define the background positions. For DML, all probes in Illumina Infinium HumanMethylation450K were used as the background positions. Enrichment of foreground SNPs/DML in each set of regions (i.e. differentially accessible peaks at various FDR cutoffs) was evaluated by first counting the number of the foreground and background positions intersecting these regions (refer as observed fore and observed back ). The counting was repeated for 100 rounds of permutation, i.e. regions of the same number and sizes shuffled into the masked genomic regions using bedtools shuffle 76 (refer as shuffled fore and shuffled back ). Masked genomic regions was defined as the whole genome excluding the annotated gaps. Only the 22 autosomes were included in analyses. The odds ratio of foreground signal enrichment for each round of permutation was calculated as (observed fore /observed back )/(shuffled fore /shuffled back ). As a negative control, for the trait, the analyses were repeated by replacing the foreground SNPs and DML with those associated with Parkinson's disease.
In Fig. 3e, the enrichment of cell type-specific enhancers in differentially accessible peaks (FDR ≤0.05) were assessed by Fisher's exact test. Same number of peaks were randomly sampled from all robust peaks as controls.
Linking enhancer peaks to their potential target genes. A promoter peak was assigned to a gene if it intersected with the transcription start sites (TSS) of its transcript as defined in the FANTOM CAGE associated transcriptome 82 . We note that a promoter peak might be associated with multiple genes. An enhancer peak was defined as linked to a target gene if it overlapped an expression quantitative trait loci (eQTL) of the corresponding gene (GTEx V7, in any tissues with p < 1 × 10 −5 ) 84 , or is supported by putative enhancer-promoter linkage predicted by JEME method (http://yiplab.cse.cuhk.edu.hk/jeme/based) on FANTOM5 and Roadmap Epigenomics Project data 31 . Out of the 77,655 enhancer peaks, 36,228 of them were linked to at least one gene, of which 15,391 (42.5%) were assigned to one gene, 8,073 (22.3%) to two genes, and 12,764 (35.2%) to three or more genes.
Integration with publicly available transcriptome data. We reanalyzed an RNA-seq dataset (ArrayExpress E-MTAB-4304) from an independent OA patient cohort, in which the RNA was extracted from cartilage tissues of both iMT and oLT for 8 patients 15 . Read counts on transcripts were estimated by Kallisto v0.43.1 85 using default parameters on FANTOM CAGE associated transcriptome 82 . Estimated read counts of a gene, defined as the sum of the estimated read counts of its associated transcripts, were used as the input for differential gene expression analysis using edgeR v3.18.1. In total, 3,293 genes were defined as significantly differentially expressed between iMT and oLT (FDR ≤ 0.05). A gene is defined as "consistently dysregulated both at the epigenomic and transcriptomic levels" when it is upregulated (or downregulated) in RNA-seq with more (or less) accessible promoters or enhancers in ATAC-seq (n = 371).
Gene ontology analysis. Enrichr 52 was used to identify gene sets enriched in genes implicated in OA. The selected gene lists were input to query enrichment for gene ontology (Biological Process 2017b) in the database. Terms with p-value ≤ 0.05 and FDR ≤0.25 were considered significant (n = 421 of 4437). Top 30 terms ranked by percentage of overrepresented genes are listed in Fig. 4d. All terms are listed in Supplementary Table S7. It is of note that terms with fewer number of genes tend to rank higher in this way.
SCIENTIfIC RePORTs | (2018) 8:15499 | DOI:10.1038/s41598-018-33779-z Transcription factor binding motif analysis. DNA motif analysis for differentially accessible regions was performed using HOMER v4.9 with default parameters 86 . The enriched de novo and known motifs as well as its matching transcription factor are searched and scanned in more-and less-accessible peaks separately (parameters: -mask -size given), with all robust peaks as background region. P-value < 1 × 10 −5 was considered significant.
Ethics, consent and permissions. This study was approved by the ethics review board of all the participating institutions (Sagamihara National Hospital and RIKEN) and all experiments were carried out in accordance with their guidelines and regulations.