A molecular quantitative trait locus map for osteoarthritis

Osteoarthritis causes pain and functional disability for over 500 million people worldwide. To develop disease-stratifying tools and modifying therapies, we need a better understanding of the molecular basis of the disease in relevant tissue and cell types. Here, we study primary cartilage and synovium from 115 patients with osteoarthritis to construct a deep molecular signature map of the disease. By integrating genetics with transcriptomics and proteomics, we discover molecular trait loci in each tissue type and omics level, identify likely effector genes for osteoarthritis-associated genetic signals and highlight high-value targets for drug development and repurposing. These findings provide insights into disease aetiopathology, and offer translational opportunities in response to the global clinical challenge of osteoarthritis.


Supplementary Note 1: Identification of cis-eQTLs and cis-pQTLs in osteoarthritis tissues
To verify that the molQTL results were robust, we carried out a sensitivity analysis including patient age and osteoarthritis joint (knee or hip) as covariates in addition to the PEER factors, sex and array. In each tissue, ³89.8% of eGenes and >82% of pGenes significant in the main analysis also remained significant at 5% FDR in the sensitivity analysis, with a very strong correlation of normalized effect sizes reported for eGene and pGene variants identified across both analyses (Pearson r³0.988, P<10 -15 ).
We note that we detected a smaller proportion of genes with pQTLs (38 of 1,677 proteins in cartilage, 2.6%) than with eQTLs (1,569 of 15,249 in cartilage, 10.3%). Generally, protein expression is influenced by RNA levels as well as translational and post-translational mechanisms, so it is possible that the regulatory effects from genetic variants are diminished. A recent study of brain samples also found fewer genes with pQTLs than with eQTLs (14.7% versus 10.7% of 5,743 genes profiled on both assays) 1 . Moreover, differences in the detected numbers of eQTLs and pQTLs are also likely to be influenced by technological differences between the RNA sequencing and proteomics platforms.
We investigated whether variants that are significant pQTLs also exhibit evidence of eQTL effects ( Supplementary Fig. 3). In low-grade cartilage, we detected 2871 pQTL variants, of which 2866 were also present in the low-grade cartilage eQTL analysis. Of these variants, test p>0.06), which could be due to a broader rewiring of regulatory processes (with some losses and some gains of genetic regulatory effects), and/or due to the small numbers of genes in the enrichment analysis.
We also tested whether genes with differential eQTL were enriched in any GeneOntology annotations with 20-200 genes using GoSeq, compared to all 1569 gene with an eQTL in any cartilage tissue. We separately analysed all 32 genes, as well as 20 respectively 12 genes based on tissue with observed differential eQTL effect, restricting the analysis to genes with unique Ensembl gene ID, gene symbol and Entrez gene ID. Again, we did not detect any significant associations (FDR>5%).
We note that this approach is simplified due to not accounting for several factors that influence the power to detect eQTLs, including the expression levels of genes, the number of SNPs within the gene region, their LD pattern and minor allele frequency. Given the limited power to detect associations based on a small number of genes, we have left more in-depth enrichment analyses of genes with differential eQTLs for future work based on larger sample sizes.

Supplementary Note 3: Differential gene expression between high-grade and low-grade cartilage
We compared the differential expression results between the 5 analysis designs as well as 5 different tests (limma, DESeq2 with and without outlier filtering, edgeR likelihood ratio test and edgeR F test). For each analysis design, the number of significant genes at 5% FDR was similar across all tests:

Protein Digestion and TMT Labeling
The protein content of each sample was precipitated by the addition of 30 μL TCA 8 M at 4 °C for 30 min. The protein pellets were washed twice with ice cold acetone and finally resuspended in 40 μL 0.1 M triethylammonium bicarbonate, 0.1% SDS with pulsed probe sonication. Equal aliquots containing at least 10 μg of total protein were reduced with 5 mM TCEP for 1h at 60 °C, alkylated with 10 mM Iodoacetamide and subjected to overnight trypsin (70 ng/μL) digestion. TMT 10-plex (Thermo Scientific) labelling was performed according to manufacturer's instructions at equal amounts of tryptic digests. Samples were pooled and the mixture was dried with speedvac concentrator and stored at -20 °C until the peptide fractionation.

Peptide fractionation
Offline peptide fractionation was based on high pH Reverse Phase (RP) chromatography using the Waters, XBridge C18 column (2.1 x 150 mm, 3.5 μm) on a Dionex Ultimate 3000 HPLC system. Mobile phase A was 0.1% ammonium hydroxide and mobile phase B 100% acetonitrile, 0.1% ammonium hydroxide. The TMT labelled peptide mixture was dissolved in 100 μL mobile phase A, centrifuged and injected for fractionation. The gradient elution method at 0.2 mL/min included the following steps: 5 minutes isocratic at 5% B, for 35 min gradient to 35% B, gradient to 80% B in 5 min, isocratic for 5 minutes and re-equilibration to 5% B. Signal was recorded at 280 nm and fractions were collected every one minute. For cohort 4, peptide fractionation was performed on reversed-phase OASIS HLB cartridges at high pH and up to 9 fractions (10-25% acetonitrile elution steps) were collected for each set.
The collected fractions were dried with SpeedVac concentrator and stored at -20 °C until the LC-MS analysis.

LC-MS Analysis
LC-MS analysis was performed on the Dionex Ultimate 3000 UHPLC system coupled with the Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Scientific). Each peptide fraction was reconstituted in 40 μL 0.1% formic acid and a volume of 7 μL was loaded to the Acclaim PepMap 100, 100 μm × 2 cm C18, 5 μm, 100 Ȧ trapping column with the μlPickUp mode at 10 μL/min flow rate. The sample was then analysed with a gradient elution on the Acclaim PepMap RSLC (75 μm × 50 cm, 2 μm, 100 Å) C18 capillary column retrofitted to an electrospray emitter (New Objective, FS360-20-10-D-20) at 45 °C. Mobile phase A was 0.1% formic acid and mobile phase B was 80% acetonitrile, 0.1% formic acid. The gradient method at flow rate 300 nL/min was: for 90 min gradient to 38% B, for 5 min up to 95% B, for 13 min isocratic at 95% B, re-equilibration to 5% B in 2 min, for 10 min isocratic at 10% B.
Precursors were selected with 120k mass resolution, AGC 3×10 5 and IT 100 ms in the top speed mode within 3 sec and were targeted for CID fragmentation with quadrupole isolation width 1.2 Th. Collision energy was set at 35% with AGC 1×10 4 and IT 35 ms. MS3 quantification spectra were acquired with further HCD fragmentation of the top 10 most abundant CID fragments isolated with Synchronous Precursor Selection (SPS) excluding neutral losses of maximum m/z 18. Iontrap isolation width was set at 0.7 Th for MS1 isolation, collision energy was applied at 55% and the AGC setting was at 6×10 4 with 100 ms IT. The HCD MS3 spectra were acquired within 110-400 m/z with 60k resolution. Targeted precursors were dynamically excluded for further isolation and activation for 45 seconds with 7 ppm mass tolerance. Cohort 4 were analyzed at the MS2 level with a top15 HCD method (CE 40%, 50k resolution) and a maximum precursor intensity threshold of 5×10 7 using the same MS1 parameters as above in a 360 min gradient.

Heatmaps of molecular differences between high-grade and low-grade cartilage
We used the pheatmap v1.0.12 function in R to plot heatmaps of gene expression and protein abundance values. Among the 290 genes with significant and concordant crossomics differences, we selected the top 20 genes and top 20 proteins with the highest absolute log-fold-differences between high-grade and low-grade cartilage. For gene expression, this was based on the paired-sample analysis with limma including the 10 technical covariates inferred by SVseq. Analogously to the differential gene expression and protein abundance analysis, we then regressed out patient-level effects and 10 technical covariates for gene expression, and the patient-level effects for the protein abundance (as paired samples were always assayed in the same batch).

Supplementary Figures
Supplementary Figure 1. Large-scale multi-omics characterisation of osteoarthritis disease tissue: number of patients with data for each tissue and omics type after quality control.
For each of the 23 genes, the variant with strongest differential eQTL signal is shown. The boxplots show expression (residuals after regressing the normalised expression data on the 15 PEER factors, sex and array) at 25th to 75th percentiles, with centre at the median and whiskers extend to 1.5 times the interquartile range. n: number of individuals included in the analysis for each genotype.
Supplementary Figure 6. Genes with molecular QTL present in low-grade, but not highgrade cartilage (posterior probability m>0.9 and m<0.1, respectively).
For each of the 9 genes, the variant with strongest differential eQTL signal is shown. The boxplots show expression (residuals after regressing the normalised expression data on the 15 PEER factors, sex and array) at 25th to 75th percentiles, with centre at the median and whiskers extend to 1.5 times the interquartile range. n: number of individuals included in the analysis for each genotype.  High-grade versus low-grade cartilage: Signalling Pathway Impact Analysis Protein high-grade vs low-grade differences    CHRDL2  MATN4  RARRES2  FRZB  FST  FABP5  HBA1  HBB  CLIC3  CRLF1  HTRA1  PDLIM1  TGFBI  MAP1A  POSTN  CCL20  TNFAIP6