Analysis of genetically independent phenotypes identifies shared genetic factors associated with chronic musculoskeletal pain at different anatomic sites

Chronic musculoskeletal pain has a negative impact on all aspects of human life. Genetic studies of pain are complicated by the high complexity and heterogeneity of pain phenotypes. In this research, we aimed to reduce phenotype heterogeneity and reveal genes and pathways shared by chronic musculoskeletal pain at four locations: back, neck/shoulder, hip, and knee. Our study was based on the results of genome-wide association studies performed using UK Biobank data with a total sample size of 456,000 individuals. We applied principal component analysis based on the matrix of genetic covariances between the studied pain traits and constructed four genetically independent phenotypes (GIPs). The leading GIP (GIP1) explains the largest proportion of the genetic variance of and covariance between the analyzed phenotypes (78.4%), and the later GIPs (GIP2-4) explain progressively less. We identified and replicated five loci associated with GIP1 and one locus associated with GIP2. The genes decisively prioritized for the GIP1-associated loci were SLC39A8, ECM1, and FOXP2. For the remaining two GIP1-associated loci, we proposed several candidates (AMIGO3, BSN, RBM6, FAM212A, RNF123, UBA7 and MIR7114, NSMF, NOXA1, GRIN1), but were unable to prioritize any of them convincingly. The most likely causal gene in the locus associated with GIP2 was GDF5. For GIP1, gene set/tissue/cell type enrichment analyses identified multiple terms related to the nervous system. Genetic correlations analysis revealed a genetic overlap between GIP1 and osteoarthritis as well as a set of anthropometric (such as overweight and waist circumference), sociodemographic (such as age of first birth and college completion) and psychiatric/personality (such as depressive symptoms and neuroticism) traits. We suggest that GIP1 represents a biopsychological component of chronic musculoskeletal pain, related to physiological and psychological aspects and likely reflecting pain perception and processing.

Сhronic musculoskeletal pain shared heredity, 2019 Abstract 28 Chronic musculoskeletal pain has a negative impact on all aspects of human life. Genetic studies 29 of pain are complicated by the high complexity and heterogeneity of pain phenotypes. In this research, 30 we aimed to reduce phenotype heterogeneity and reveal genes and pathways shared by chronic 31 musculoskeletal pain at four locations: back, neck/shoulder, hip, and knee. Our study was based on the 32 results of genome-wide association studies performed using UK Biobank data with a total sample size 33 of 456,000 individuals. We applied principal component analysis based on the matrix of genetic 34 covariances between the studied pain traits and constructed four genetically independent phenotypes 35 (GIPs). The leading GIP (GIP1) explains the largest proportion of the genetic variance of and 36 covariance between the analyzed phenotypes (78.4%), and the later GIPs (GIP2-4) explain 37 progressively less. We identified and replicated five loci associated with GIP1 and one locus associated 38 with GIP2. The genes decisively prioritized for the GIP1-associated loci were SLC39A8, ECM1, and 39 FOXP2. For the remaining two GIP1-associated loci, we proposed several candidates (AMIGO3, BSN, 40 RBM6, FAM212A, RNF123, UBA7 and MIR7114, NSMF, NOXA1, GRIN1), but were unable to 41 prioritize any of them convincingly. The most likely causal gene in the locus associated with GIP2 was 42 GDF5. For GIP1, gene set/tissue/cell type enrichment analyses identified multiple terms related to the 43 nervous system. Genetic correlations analysis revealed a genetic overlap between GIP1 and 44 osteoarthritis as well as a set of anthropometric (such as overweight and waist circumference), 45 sociodemographic (such as age of first birth and college completion) and psychiatric/personality (such 46 as depressive symptoms and neuroticism) traits. We suggest that GIP1 represents a biopsychological 47 Сhronic musculoskeletal pain shared heredity, 2019

Study sample and phenotype definition 111
The study sample comprised UK Biobank participants [28]. Sociodemographic,physical,112 lifestyle, and health-related characteristics of this cohort have been reported elsewhere [29]. In brief, 113 individuals enrolled in the UK Biobank study were aged 40-69 years; were less likely to be obese, to 114 smoke, to drink alcohol; had fewer self-reported health conditions as compared to the general 115 population. All study participants provided written informed consent, and the study was approved by 116 the North West Multi-Centre for Research Ethics Committee (11/NW/0382). 117 This particular study was approved by the UK Biobank research team under project #18219. 118 Cases and controls were defined based on questionnaire responses. First, participants responded to 119 "Pain type(s) experienced in the last months" followed by questions inquiring if the specific pain had 120 been present for more than 3 months. Those who reported back, neck or shoulder, hip, or knee pain 121 lasting more than 3 months were considered chronic back, neck/shoulder, hip, and knee pain cases, 122 respectively. Participants reporting no such pain lasting longer than 3 months were considered controls 123 (regardless of whether they had another regional chronic pain, such as abdominal pain, or not). 124 Individuals who preferred not to answer or reported more than 3 months of pain all over the body were 125 excluded from the study (since these subjects met criteria for chronic widespread pain and were thought 126 likely to have an underlying generalized propensity to pain). Further details are given in Supplementary  127 Methods. 128 Overall, 456,580 individuals with imputed genotype data and phenotype data were included in 129 the present study. Of these, 265,000 participants of European ancestry (defined by SNP-based principal 130 component analysis) were randomly selected to provide the GWAS discovery cohort. The decision to 131 include only Europeans was based solely on the highest representation of these individuals among the 132 UK Biobank participants. The replication cohort (N = 191,580) comprised individuals of African (N = 133 7,541) and South Asian ancestry (Indian, Pakistani, and Bangladeshi; N = 9,208) as well as the 134 remaining European ancestry participants (N = 174,831). Descriptive characteristics of the groups is 135 provided in Table S1. 136

Genotyping and imputation 137
Genotyping and imputation data were obtained from the UK Biobank March 2018 data release. 138 Genotyping was conducted using the Affymetrix UK BiLEVE and Affymetrix UK Biobank Axiom 139 arrays. Imputation was performed with the IMPUTE4 program (https://jmarchini.org/impute-4/) [ analysis of the genetic data, genotype quality, properties of population structure and relatedness of the 143 genetic data, and efficient phasing and genotype imputation have been reported previously [33]. 144 was applied to obtain phenotypically independent phenotypes, not GIPs. In both cases heritability of 173 obtained principal components was not less than heritability of original traits. 174

Genome-wide association study
The matrix of genetic covariances (estimated by LD Score regression [38]) and orthogonal 175 transformation coefficients were obtained using the discovery cohort of European ancestry individuals. 176 The 95% confidence intervals of these coefficients were estimated via the Monte Carlo sampling. For 177 each resulting "discovery" GIP, GWAS results were calculated as described in Supplementary  178 Methods. 179 GIPs for replication datasets were constructed using the orthogonal transformation coefficients 180 obtained at the discovery step. GWAS results for each "replication" GIP were combined by a meta-181 analysis. Furthermore, GWAS for GIPs for European ancestry replication cohort (N = 439,831 in total) 182 were meta-analyzed with GWAS for discovery GIPs, and the results were used for subsequent post-183 GWAS in silico analyses. Meta-analyses were conducted using the inverse-  genetic correlation analysis  190  impossible. GIP1 for six pain phenotypes was constructed for the discovery and European ancestry  191 replication cohort, and GWAS results for these cohorts were meta-analyzed. GIP1 for six pain 192 phenotypes was included in the analysis of genetic correlation with GIP1 for four pain phenotypes. 193 Сhronic musculoskeletal pain shared heredity, 2019

Conditional analysis 194
Conditional and joint (COJO) analysis was carried out as previously described [40]. Calculations 195 were performed using the GCTA software [41]. Linkage disequilibrium (LD) matrix was computed 196 with PLINK 1.9 software (https://www.cog-genomics.org/plink2) using 100,000 individuals randomly 197 selected from the discovery cohort. We claimed one independent signal per locus if no polymorphism 198 other than the lead SNP passed the significance threshold of P = 5e-08. Regional association plots were 199 generated using LocusZoom (http://locuszoom.org/) for regions within ±250 kb from the lead SNP. 200

Prediction of SNP effects 201
We analyzed the functional effects of a set of SNPs and indels in high LD (r 2 > 0.8) with 202 replicated variants. LD was calculated using PLINK 1. 9 [42] (--show-tags option) and genotype data 203 for 503 European ancestry individuals (1000 Genomes phase 3 version 5 data). Additionally, we 204 selected SNPs within replicated regions (± 250 kb from lead SNPs) associated with GIPs at P ≤ T, 205 where log10(T) = log10(Pmin) + 1, and Pmin is a P-value for the strongest association per locus. These 206 SNPs were added in the analysis since genotype data for the UK Biobank samples were imputed using 207 the methods, predictions of variant effects were made according to scores ranging from 0 to 1, with scores 211 above 0.5 predicted to be deleterious while those below 0.5 predicted to be neutral or benign. 212

DEPICT and FUMA analyses 213
Gene set and tissue/cell type enrichment analyses and gene prioritization were performed using 214 the Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) tool [46]. We 215 employed the DEPICT software version 1.1, release 194 with default parameters 216 (https://data.broadinstitute.org/mpg/depict/). Tests were conducted for both genome-wide significant 217 SNPs (P < 5e-08) and for SNPs associated with GIPs at P < 1e-05. The MHC region was omitted. The 218 significance threshold for DEPICT analyses was set at FDR < 0.05. 219 Gene set and tissue enrichment analyses were also performed using the FUMA (Functional 220 Mapping probes, and tissues). In complex traits analysis, the significance threshold for SMR was set at P = 258 3.71e-06 (0.05/(6*2,244), where 6 is the number of loci, and 2,244 is the number of non-pain traits). 259 The significance threshold for HEIDI tests in both analyses was set at P = 0.01 (P < 0.01 corresponds 260 to the rejection of pleiotropy hypothesis). Details of data processing are given in Supplementary  261 Methods. 262

Genetic correlations and heritability 263
SNP-captured heritability (h 2 ) and genetic correlations between GIPs and human complex traits 264 were estimated using LD Score regression [38]. In total, we examined 209 non-UK Biobank traits 265 available in the LD hub database (http://ldsc.broadinstitute.org/ldhub/). We removed duplicates and 266 included only the most recent study for each trait (as indicated by the largest PubMed ID number). 267 Since osteoarthritis was not present in the LD hub database, we used summary statistics for this trait 268 obtained from the Michigan PheWeb database (http://pheweb.sph.umich.edu/SAIGE-269 UKB/pheno/740). The statistical significance threshold was set at 5.95e-05 (0.05/(210*4), where 210 270 is the number of traits and 4 is the number of GIPs). 271 Genetic correlations between GIPs and LD hub traits were calculated using the LD hub web 272 interface. Genetic correlations between GIPs, osteoarthritis and chronic pain traits were calculated 273 using the GWAS-MAP platform. 274 For 39 LD hub traits showing statistically significant correlations with GIP1 as well as for 275 osteoarthritis, four chronic pain traits and four GIPs, matrices of genetic correlation were generated. 276 Clustering and visualization were performed by the "corrplot" package for the R language (basic 277 Сhronic musculoskeletal pain shared heredity, 2019 "hclust" function). For clustering, we estimated squared Euclidean distances by subtracting absolute 278 values of genetic correlation from 1 and used the Ward's clustering method. 279 Additionally, we estimated the genetic correlation between GIP1 for four analyzed chronic pain 280 traits and the first GIP constructed using the same methodology for six chronic pain traits (back,  281 neck/shoulder, knee, hip, stomach/abdominal pain, and headache) using the GWAS-MAP platform. 282 Results 283

Overview of the study design 284
Our study was designed to investigate the genetic components underlying chronic 285 musculoskeletal pain at four locations: back, neck/shoulder, hip, and knee ( Figure 1). Individuals who 286 reported more than 3 months of pain all over the body were not included in the present study. All 287 studied pain phenotypes were found to have statistically significant SNP-based heritability (2-4% on 288 the observed scale, Table S2) and to be genetically correlated with each other (Figure 2c). 289 Using the matrix of genetic covariances between the studied chronic pain traits as estimated from 290 the discovery cohort, we constructed four genetically independent pain phenotypes (GIP1 to GIP4) in 291 the discovery and replication cohorts. GIP1, explaining most of the genetic variance and covariance 292 between the studied pain traits, was of foremost interest in the present research. Nevertheless, we also 293 considered the remaining GIPs, which are genetically independent contributors to chronic pain at the 294 four studied sites. 295 For each GIP, GWAS results were obtained. Associations reaching the genome-wide significance 296 threshold in the discovery cohort were considered replicated if the Bonferroni-corrected significance 297 threshold was reached in the meta-analysis of replication cohorts. For replicated loci, gene 298 prioritization was performed using several approaches. We conducted a functional bioinformatics 299 analysis searching for relevant gene sets and tissues (DEPICT/FUMA analyses), analyzed pleiotropic 300 effects (SMR/HEIDI analysis) and investigated genetic correlations with other complex traits. In silico 301 functional analysis was performed using the cohort of European ancestry individuals since this 302 subsample was the largest.

Genetically independent phenotypes 318
The four original chronic musculoskeletal pain phenotypes were converted into GIPs using the 319 coefficients of orthogonal transformation generated in the principal component analysis based on the 320 matrix of genetic covariances. Coefficients of orthogonal transformation represent contribution of each 321 pain phenotype on each GIP, while genetic variance explained by GIPs approximates contribution of 322 each GIP to each pain phenotype. A graphical representation of orthogonal transformation coefficients, 323 as well as the genetic variance of chronic musculoskeletal pain phenotypes explained by each GIP, is 324 shown in Figures 2a and 2b, respectively. 325 The contributions of all pain phenotypes to GIP1 had the same direction and approximately the 326 same magnitude. GIP1 showed the best stability based on the narrow 95% confidence intervals of 327 orthogonal transformation coefficients. As expected, GIP1 explained the largest proportion of genetic 328 variance (78.4%) of the four investigated musculoskeletal pain traits (the formula for calculating this 329 value is provided in Supplementary Methods, page 9). SNP-based heritability of GIP1 was 7% and was 330 found to be substantially larger than the heritability of the four individual pain phenotypes (2-4%, 331 Figure 2c). 332 333

GWAS for genetically independent phenotypes 344
At the discovery stage, 9 loci passed the study-level threshold of statistical significance set at 345 P < 1.3e-08 (5e-08/4, where 4 is the number of GIPs) after correction for the LD Score regression 346 intercept (1.016 for GIP1, 1.001 for GIP2, 1.013 for GIP3, and 1.021 for GIP4). Six of the loci were 347 associated with GIP1, and three with GIP2 (Table 1). Conditional and joint analysis showed single 348 association signals per locus (Table S3). Manhattan plots of -log10(P) are given in Figure S1, quantile-349 quantile plots in Figure S2, and regional association plots in Figure S3. 350 Associations of six loci (five associated with GIP1 and one with GIP2) were replicated at 351 P < 5.6e-03 (0.05/9, where 9 is the number of loci identified in the discovery stage). Full results of 352 associations with each GIP and studied chronic musculoskeletal pain phenotype are provided in Table  353 S4. 354 Сhronic musculoskeletal pain shared heredity, 2019 Two of the six replicated loci showed genome-wide significant associations with chronic pain at 355 specific location in the discovery cohort (P < 5e-08, Table S4). These included the GIP1-associated 356 locus near the EXD3 gene (tagged by rs73581580 and associated with chronic back pain) and the GIP2-357 associated locus near the GDF5 gene (tagged by rs143384 and associated with chronic knee pain). In 358 the meta-analysis of European ancestry discovery and replication cohorts, two additional loci reached 359 a genome-wide significance for association with pain at specific location: the GIP1-associated locus 360 near the SLC39A8 gene (tagged by rs13107325 and associated with chronic neck/shoulder pain) and 361 the GIP1-associated locus near the ECM1 gene (tagged by rs3737240 and associated with chronic hip 362 pain) (Table S4). 363 Functional effects of SNPs rs13107325, rs3737240, and rs143384 and/or their associations with 364 complex traits and diseases have been described previously (Table S5). In brief, a missense 365 polymorphism rs13107325 in the divalent cation transporter gene SLC39A8 is one of the most 366 pleiotropic variants in the human genome, associated with multiple traits including spine conditions 367 (  [73,74]. In our study, association of rs143383 with GIP2 had the same magnitude 383 of effect as that of rs143384, and also passed the study-level statistical significance threshold 384 (discovery cohort: P = 8.53e-12 after correction for residual inflation). It should be noted that both 385 variant alleles (rs143384 T and rs143383 T) were inversely associated with GIP2, consistent with the 386 negative coefficient of knee pain phenotype observed for GIP2 (Figure 2a). 387 388 389 Сhronic musculoskeletal pain shared heredity, 2019 better explain their involvement in chronic musculoskeletal pain. The list of genes in the studied 404 regions was based on regional association plots ( Figure S3) and is given in Table S4. Summary 405 information on the genes that we considered most likely to be causal (literature data with references 406 to corresponding sources) is provided in gene, rs3737240 and rs13294 in the ECM1 gene, and rs79140116 in the EXD3 gene. SIFT and 420 PolyPhen tools predicted possibly damaging/deleterious effects only for rs13107325 and rs13294, 421 while the remaining SNPs were designated as benign/tolerated (Table S7a). Polymorphism 422 rs13107325 is a triallelic SNP (C>T, A), and possibly damaging effects were predicted for both minor 423 alleles T and A. Allele A is extremely rare and was not analyzed in the present study. Allele T was 424 pain-predisposing (and positively associated with GIP1). Polymorphism rs13294 is also a triallelic 425 SNP (G>A, T) and the extremely rare allele T was not covered by our GWAS. SIFT    results only for GIP1 (Table S9c-f). For SNP sets associated with GIP1 with P < 5e-08, tissue/cell 484 type enrichment with FDR < 0.05 was found for two terms: the "Neural Stem Cells" cell type and 485 "Retina" tissue. However, relaxing the significance threshold of input SNPs to P < 1e-05 led to 486 identification of 24 additional tissues, all of which were related to CNS. The same pattern was 487 observed for gene set enrichment (for SNPs with P < 1e-05), revealing 462 terms mainly involved in 488 nervous system function, development and morphology (e.g. "regulation of nervous system 489 development", "axonogenesis", "synapse", "regulation of transmission of nerve impulse"). 490 FUMA gene set and tissue enrichment analyses for GIP1 detected 9 gene categories (6 of them 491 were nervous system-related) and 12 brain tissues, respectively (Table S10, Figure S4). For GIP2 and 492 GIP3, a total of three gene sets were found by FUMA analysis, although we considered them as non-493 specific (e.g. "nikolsky_breast_cancer_20q11_amplicon"; Table S10). No statistically significant 494 gene sets were revealed for GIP4, and no statistically significant tissue types were identified for GIP2, 495 GIP3, and GIP4. 496

Pleiotropic effects on complex traits 497
Five out of six replicated loci demonstrated pleiotropic effects on human complex traits in the 498 SMR/HEIDI analysis (Table S11, Figure 3). As expected, the GIP1-associated locus rs13107325 499 (known as one of the most pleiotropic variants of the genome) was associated with the greatest 500 number of diverse phenotypes, which included anthropometric traits (weight, height, and BMI), fluid 501 intelligence score, prospective memory and education, sleep duration, Crohn's disease, self-reported 502 osteoarthritis, diastolic blood pressure, blood cell traits, and alcohol intake frequency. Traits linked 503 Сhronic musculoskeletal pain shared heredity, 2019 with the GIP2-associated locus rs143384 were mainly related to anthropometry and knee-related 504 conditions (gonarthrosis and internal derangement of knee). The locus tagged by the missense SNP 505 rs3737240 (ECM1 gene) showed pleiotropic effects on platelet count and plasma level of extracellular 506 matrix protein 1 (ECM1) measured with the SOMAscan platform [98]. The same pain-promoting 507 allele in this locus that was positively associated with GIP1 was linked to an increase in ECM1 level, 508 reinforcing the role of ECM1 as the candidate in this region. In the locus tagged by rs73581580, GIP1-509 associated alleles were linked to higher frequency of tiredness and difficulty of getting up in the 510 morning. In the locus tagged by rs7628207, GIP1-associated variants were related to decreased 511 plasma level of thioredoxin domain-containing protein 12 (TXNDC12), decreased overall health 512 rating, decreased age at first live birth, decreased educational attainment, increased basal metabolic 513 rate, and increased hip circumference. Interestingly, rs7628207 is adjacent to the AMIGO3 gene 514 prioritized by us based on the literature data ( rs143384, we can speculate that this could be due to the limited statistical power of the analysis. The 520 SMR test P-values for these loci were quite low, although did not reach the Bonferroni-corrected 521 significance threshold of P = 3.71e-06 (rs13107325: PSMR = 1.14e-05, betaSMR = 0.63; rs3737240: 522 PSMR = 1.68e-05, betaSMR = 0.89; rs143384: PSMR = 6.13e-04, betaSMR = -0.40; PHEIDI ≥ 0.01 for all 523 these loci). Thus, we cannot rule out a hypothesis that the same causal SNPs within the loci tagged 524 by rs13107325 and rs3737240 may be associated with GIP1 and the increased risk of osteoarthritis, 525 and the same causal SNPs within the locus tagged by rs143384 can be associated with GIP2 and the 526 decreased risk of osteoarthritis. 527 528 Сhronic musculoskeletal pain shared heredity, 2019 529 530 Figure 3. Pleiotropic effects of identified loci on human complex traits. 531 Color depicts the sign and the magnitude of SMR beta coefficient. Negative sign (red) means opposed effects on the corresponding GIP and the trait, 532 and positive sign (blue) means the same direction of effect. |beta SMR| > 4 are depicted as |beta SMR| = 4. For "Prospective memory result" and "Overall 533 health rating" trait, high scores correspond to poor performance. For "Getting up in morning" trait, high score corresponds to easy getting up. Traits that 534 passed both SMR and HEIDI tests (PSMR < 3.71e-06 and PHEIDI ≥ 0.01) are marked with an asterisk. Data on 45 out of 78 revealed traits are not shown. 535 Full results are given in Table S11. GIPs associated with the loci and genes nearest to lead SNPs are indicated in parentheses. Dendrograms represent 536 clustering based on complete linkage hierarchical clustering method. 537 Сhronic musculoskeletal pain shared heredity, 2019

Genetic correlations between GIPs and complex traits 538
GIP1 showed statistically significant genetic correlations with 40 complex traits (Table S12a, 539 Figure 4). Among them, 11 traits were directly linked to excess weight (BMI, overweight, obesity, 540 waist circumference), that is in line with known epidemiological associations between chronic pain 541 and obesity-related traits [100]. Five more traits fell in the same cluster: HDL cholesterol (negative 542 correlation with GIP1), triglycerides, HOMA-IR, leptin, and fasting insulin. Strong genetic 543 correlations (|rg| ranging between 0.31 and 0.54) were also revealed between GIP1 and the cluster of 544 psychiatric/personality traits (major depressive disorder, depressive symptoms, subjective well-545 being, and neuroticism). This finding is in accord with previous twin and family studies 546 demonstrating a common genetic background for pain and depression [101-103]. Other traits included 547 sociodemographic, reproductive, education-related and smoking-related traits, osteoarthritis, 548 rheumatoid arthritis, coronary artery disease, and sleep duration. 549 Traits that displayed the strongest genetic correlations with GIP1 were osteoarthritis (rg = 0.65), 550 age of first birth (rg = -0.56), depressive symptoms (rg = 0.54), and college completion (rg = 0.54). 551 Overall, the pattern of genetic correlations with GIP1 was very similar to that observed for back pain 552 in our previous study [104]. GIP2 was genetically correlated only with osteoarthritis (inverse genetic 553 correlation, rg = -0.30) and obesity-related traits, and GIP4 only with hip circumference. No 554 statistically significant genetic correlations with complex traits were found for GIP3 (Table S12 b-d). 555 Furthermore, we analyzed the genetic correlation between GIP1 and the first GIP constructed 556 using the same methodology for a broader range of chronic pain traits (back, neck/shoulder, knee, 557 hip, stomach/abdominal pain, and headache). We found out that these GIPs were almost genetically 558 equivalent (rg = 0.99). 559 560 Сhronic musculoskeletal pain shared heredity, 2019 561 Figure 4. Matrix of genetic correlations between GIP1 and human complex traits. 562 Color depicts the sign and absolute value of the genetic correlation coefficients (rg). Genetic 563 correlations between GIP1 and all presented traits were statistically significant (P < 5.98e-05). 564 Osteoarthritis is not shown on this plot since genetic correlations analysis for this trait was performed 565 using the GWAS-MAP platform, whereas for other traits, LD hub web interface was used. The genetic control of chronic musculoskeletal pain is complex, with each of very many genetic 572 variants contributing a small effect. As a result, even very large genome-wide association studies 573 provide only a limited number of replicated loci and rather low SNP-based heritability. Evidence 574 from recent studies indicates that pain at different anatomical sites shares a common genetic 575 component [24][25][26]. This suggests that combining several pain phenotypes in a single analytical 576 framework may facilitate the discovery of common genetic factorschronic musculoskeletal pain 577 genes and pathways. 578 In the present study, we applied an approach that allowed us to detect genes shared between 579 four common chronic musculoskeletal pains: back, neck/shoulder, knee and hip. Our approach relies 580 on capturing heredity of a set of genetically correlated traits via constructing genetically independent 581 phenotypes (GIPs) (Figure 2a). The GIPs are defined as a weighted sum of the original phenotypes, 582 with weights selected in such a way that the first GIP (GIP1) explains most genetic variance of and 583 covariance between the studied traits, with the later GIPs (GIP2-4) explaining progressively less. The 584 four weights defining GIP1 based on the four chronic pain traits (back, neck/shoulder, hip, and knee 585 pain) turned out to be approximately the same (Figure 1a). This means that GIP1, the genetic 586 component explaining most of the cases of chronic musculoskeletal pain at the studied sites, affects 587 the risk of chronic musculoskeletal pain to approximately the same degree, irrespective of pain's 588 location. Unlike the first GIP, the second GIP is site-specific and reflects a genetic propensity for 589 knee pain, but not the back or neck/shoulder pain. 590 We mapped and replicated six genomic loci (five associated with GIP1 and one with GIP2). 591 Importantly, in the discovery sample, only two out of six replicated loci were genome-wide 592 significantly associated with the individual pain phenotypes: rs73581580 with chronic back pain and 593 rs143384 with chronic knee pain. Also, as expected, the SNP-based heritability of GIP1 was 594 substantially higher than for any of separate pain traits (7% vs 2-4%). These results highlight the 595 improved power of the GIP approach for identifying genetic predictors of chronic pain predisposition. 596 It should be noted that phenotypic correlations between the traits were much lower than the genetic 597 correlations (Figure 2c,  Among the six replicated loci, three were well-studied polymorphisms associated with different 605 traits and conditions in previous works (rs13107325, rs3737240 and rs143384, Table S5). In the 606 present study, we performed a hypothesis-free analysis of pleiotropic effects of six GIP-associated 607 loci on 2,243 complex human traits. Our analysis revealed 78 phenotypes influenced by the same 608 causal polymorphisms that are associated with GIPs (Table S11, Figure 3). These phenotypes 609 included a broad variety of anthropometric, sociodemographic, behavior and personality traits, 610 diseases (such as Crohn's disease, gonarthrosis and osteoarthritis), and laboratory parameters. 611 Interestingly, GIP1-associated alleles in the locus tagged by rs73581580 were also associated with 612 higher frequency of tiredness and difficulty of getting up in the morning. Our results demonstrate 613 Сhronic musculoskeletal pain shared heredity, 2019 diversity of effects of the GIP-associated loci and suggest the presence of common pathways 614 underlying chronic musculoskeletal pain and multiple other human traits. 615 GIP1-associated pathways and tissues were mostly related to CNS development and 616 functioning, suggesting that GIP1 depicts neurological and psychological components of chronic 617 pain. Consistent with this, one of the genes prioritized for GIP1-associated loci based on multiple 618 lines of evidence was FOXP2, whose product is a transcription factor expressed in fetal and adult 619 brain and required for the development of speech and language regions [95,96]. Involvement of 620 psychological component in chronic pain was additionally supported by the finding of a very strong 621 positive genetic correlation between GIP1 and depressive symptoms. Having said that, it is equally 622 important that GIP1 was associated also with traits reflecting general health and risk factors for 623 musculoskeletal pain: sociodemographic, reproductive, education-and smoking-related traits, and 624 sleep duration. Importance of morphological factors for chronic musculoskeletal pain was also 625 demonstrated by revealing of GIP1-associated genes SLC39A8 and ECM1, which are known to be 626 implicated in the development and functioning of the musculoskeletal system. ECM1 gene encodes a 627 negative regulator of bone mineralization and chondrogenesis [61-63]. GIP1-associated ("pain-628 promoting") variant in this gene showed an association with the increased level of ECM1 protein in 629 our SMR/HEIDI analysis. GIP1-associated ECM1 allele rs3737240 C is in a high LD (r 2 = 0.94 in 630 European ancestry populations) with the allele rs12040949 C, which was associated with the 631 increased risk of hip osteoarthritis in a recent study [58]. Since our study was aimed at investigating chronic musculoskeletal pains at anatomical sites 639 commonly affected by osteoarthritis, it was not surprising that we found loci and genes associated 640 with this condition and found high genetic correlation between osteoarthritis and GIP1 (rg = 0.65). 641 Note, that this genetic correlation is similar in magnitude to correlation between GIP1 and age of first 642 birth (-0.56), indicating that although similarities are high, there exist substantial differences between 643 OA and GIP1. Furthermore, for GIP1, gene/tissue enrichment analysis revealed a plethora of CNS-644 related terms. In a recent large-scale genetic study for OA, enriched terms were not directly linked to 645 the nervous system ("anatomical structure morphogenesis", "ion channel transport", "histidine 646 metabolism", etc.) [58]. Finally, we performed genetically independent phenotype analysis for the 647 extended set of chronic traits, which include not only musculoskeletal pain (six traits: back, 648 neck/shoulder, knee, hip pain as well as stomach/abdominal pain and headache). Genetic correlation 649 between GIP1 for four pain traits and GIP1 for six pain traits was extremely high (rg = 0.99) providing 650 strong evidence that, despite high genetic overlap with OA, GIP1 for musculoskeletal pain likely 651 reflects chronic pain per se. 652 It is noteworthy that pain is the main symptom and clinical outcome of osteoarthritis. In the UK 653 Biobank study which provided GWAS summary statistics for OA [55], phenotypes were defined 654 according to ICD-9/ICD-10 codes (electronic medical record data), so whether the study participants 655 were examined radiographically or not is unknown. Thus, genetic overlap between GIP1 and OA can 656 Сhronic musculoskeletal pain shared heredity, 2019 be actually biased by a genetic correlation between GIP1 and not the OA, but pain in OA. Besides 657 this, a study by Valdes et al. [107] obtained interesting results on the inverse relationship between 658 preoperative radiographic severity and postoperative pain in OA patients who have undergone total 659 joint replacement (TJR). The authors hypothesized that in OA patients with low preoperative  660  radiographic damage, pain leading to TJR can be caused not entirely by a joint damage, but also by  661 other factors such as central sensitization. It is possible that these factors have common genetic 662 background with GIP1 constructed in our study. 663 Given that GIP1 essentially contrasts chronic musculoskeletal pain (in general) with an 664 unpainful state, the other GIPs might be expected to account for musculoskeletal pain at specific 665 anatomical locations. This was indeed the case with GIP2, which had the greatest impact on knee 666 pain (Figure 2b) pain conditions (such as back and knee pain) and the genetic predictors of non-musculoskeletal pain 679 conditions that may include substantial components of pain due to other causes, such as migraine (in 680 the case of headache), dental or neuropathic pain (in the case of facial pain), or visceral pain (in the 681 case of stomach/abdominal pain). Such equivalence may be too strong an assumption to make without 682 empirical justification. Our approach is empirical, with definition of GIPs driven by the data; another 683 strength of our approach is its ability to reveal pain type specific genetic loci as exemplified by GDP5 684 associated with GIP2 representing knee pain. Comparing with a direct knee pain GWAS, GIP2 may 685 provide a more knee-specific phenotype from which general propensity to pain is subtracted. highlighting the need to replicate these findings in independent cohorts. 696 Our study has limitations. The first general limitation is related to a questionnaire-based 697 approach to phenotyping, which may lead to heterogeneous pain phenotypes. Our methods attempted 698 to overcome this by constructing genetically independent phenotypes whose genetic basis 699 Сhronic musculoskeletal pain shared heredity, 2019  Table S4. Top loci associated with GIPs at a study-level threshold of statistical significance (P < 1.25e-08). 746 Table S5. Literature data on well-studied SNPs associated with GIP1 and GIP2.