Genome-wide interaction and pathway-based identification of key regulators in multiple myeloma

Inherited genetic susceptibility to multiple myeloma has been investigated in a number of studies. Although 23 individual risk loci have been identified, much of the genetic heritability remains unknown. Here we carried out genome-wide interaction analyses on two European cohorts accounting for 3,999 cases and 7,266 controls and characterized genetic susceptibility to multiple myeloma with subsequent meta-analysis that discovered 16 unique interacting loci. These risk loci along with previously known variants explain 17% of the heritability in liability scale. The genes associated with the interacting loci were found to be enriched in transforming growth factor beta signaling and circadian rhythm regulation pathways suggesting immunoglobulin trait modulation, TH17 cell differentiation and bone morphogenesis as mechanistic links between the predisposition markers and intrinsic multiple myeloma biology. Further tissue/cell-type enrichment analysis associated the discovered genes with hemic-immune system tissue types and immune-related cell types indicating overall involvement in immune response.

M ultiple myeloma is the second most prevalent hematological malignancy with almost 31,000 estimated new diagnoses in the United States in 2018 1 . Multiple myeloma, a B-cell neoplasm, is characterized by proliferation of clonal plasma cells in bone marrow. Familial aggregation of multiple myeloma suggests predisposition due to inherited genetic variation 2,3 . Susceptibility to multiple myeloma and its genetic relationship with the related diseases, monoclonal gammopathy of unknown significance (MGUS), and amyloid light chain (AL) amyloidosis, have lately been established through genome-wide association studies (GWASs) [4][5][6] . Although a total of 23 risk loci have been discovered predisposing to multiple myeloma, they are estimated to explain only about 16% of the heritability 5,7 . Moreover, genetic heterogeneity among multiple myeloma tumors bears complication in characterization of genetic susceptibility to multiple myeloma and in understanding of clinical consequences 8,9 . In addition to the linear association analysis, we have recently identified several inherited risk loci predisposing to MGUS through genome-wide genetic interaction 10 . To gain ample insight into genetic predisposition of multiple myeloma, we performed here the first genome-wide interaction study using two patient cohorts comprising a total of 3999 cases and 7266 controls. We extended the investigation with a subsequent metaanalysis of the two cohorts to increase the statistical power of detection. We also examined enrichment of expression of the identified genes in several tissue and cell types. Additionally, we performed gene set enrichment and pathway analyses to confer a biological understanding to our investigation. Collectively, our analyses support the hypothesis that genetic interaction plays a crucial role in multiple myeloma predisposition. The sentinel genes thus discovered are often expressed in tissues and cell lineages of hematopoietic system responsible for immunemodulation and they also influence inherited susceptibility to multiple myeloma through regulation of circadian rhythm and Smad-dependent TGFβ pathways.

Results
Interacting chromosomal loci. Two quality controlled sets of genotyped data consisting 2282 cases and 5197 controls from the UK and 1717 cases and 2069 controls from Germany were subjected to pairwise interaction analysis accounting for 0.43 million and 0.52 million single-nucleotide polymorphisms (SNPs), respectively. Meta-analysis of associative linear interaction on transformed correlation statistics rendered 16 unique SNP pairs belonging to 16 exclusive chromosomal regions reaching genomewide threshold of 5.0 × 10 −10 ( Fig. 1 and Supplementary Data 1).
Biological inference of the interacting chromosomal loci. Many of the risk SNPs identified, although showing promising genotypic interactions, are mapped to non-coding regions of the genome and possibly contribute to multiple myeloma etiology by affecting gene expression 11 . In order to gain biological understanding of the newly identified interacting risk loci, we interrogated expression quantitative trait locus (eQTL) data generated on malignant plasma cells obtained from patients of the German multiple myeloma trials. Strongest eQTL signals were observed by rs2167453 at 11p15.2 for cytochrome P450, family 2, subfamily R, polypeptide 1 (CYP2R1) and by rs923934 at 3q29 for family with sequence similarity 43, member A (FAM43A), both withP eQTL ¼ 4:40 10 À5 (Table 1). Also the interacting partners of these SNPs served as eQTLs with a moderate signal, rs2734459 for CLASRP, ZNF224, and APOE and rs13201167 for AKAP12 and C6orf211.
Summary-data-based Mendelian randomization (SMR) was employed to analyze pleiotropic effects between the GWAS signal and the cis-eQTL for genes residing within 1 Mb window of the interacting SNP loci to identify causal relationship between variants and disease phenotype via instrumentation of gene regulation 12 . The strongest pleiotropic signal was observed at 4p15.33 by rs17362130 for RAS oncogene family member 28, RAB28 P SMR ¼ 4:84 10 À3 ð Þand at 6p25.2 by rs6918808 for receptor (TNFRSF)-interacting serine/threonine kinase 1, RIPK1 (P SMR ¼ 5:04 10 À3 , Table 1 and Fig. 2), respectively. Oncogenic ras family members are frequently mutated in multiple myeloma 13,14 . RIPK1 interacts with RIPK3 to activate the necrosome complex that is responsible for instigation of several death receptors, which can induce apoptosis, necroptosis, or cell proliferation 15 . rs17362130 is also an eQTL for NKX3-2 with a moderate signal P eQTL ¼ 2:11 10 À3 À Á and rs6918808 for SERPINB9. NKX3-2 is involved in skeletal development 15 . SERPINB9 is a known inhibitor of granzyme and may mediate tumor immune evasion by apoptosis inhibition 16,17 .
We investigated shared biological and information driven connections between the genes annotated to the variants by creating a genetic network. Unique annotations from the 16 interaction-identified variants along with the SMR identified causally related genes were subjected to network enrichment and a single batch of first order interacting genes based on data-mined enrichment index were additionally added to increase confidence of network associations (Fig. 3).
We applied Data-Driven Expression-Prioritized Integration for Complex Traits (DEPICT) for in silico analyses of enrichment of expression of genes annotated to the associated loci in tissues and cell types. To this end interaction-identified SNPs were clustered to 12 unique loci and were tested for significant excess expression of the corresponding genes in 209 Medical Subject Heading (MeSH) annotations against 37,427 microarrays procured in backend. Twenty-seven tissue or cell type annotations were discovered significant at a suggestive level (P < 0.05); 16 of them belonged to the hemic and immune system, two to the musculoskeletal system and one to the stomatognathic system ( Fig. 4a), as well as six cell types related to hematopoietic system ( Fig. 4b and Supplementary Data 2).
Biological inference of the GWAS-identified loci. Next, we investigated functional relationships among the previous GWASidentified loci using the pathway analysis tool PASCAL with the bottom-up approach. To avoid possible complications arising from statistical convergence of the test statistic, we used sum of chi-square statistic to test for functional association against pathways extracted from REACTOME, KEGG, and BIOCARTA libraries (Supplementary Data 3). A total of 12 enriched pathways reached a global threshold of 0.0025 for the combined P-value. Three of the pathways, thus, detected were signaling cascades reflecting the activation status of the SMAD family proteins, as signal transducers for receptors of the cytokine TGFβ represented by SMAD2, SMAD3, SMAD4 heterotrimer regulates transcription, P combined ¼ 5:70 10 À3 ; TGFβ receptor signaling activates SMADs, P combined ¼ 8:60 10 À3 and transcriptional activity of SMAD2, SMAD3, SMAD4 heterotrimer, P combined ¼ 1:49 10 À2 . Additionally, two pathways related to the regulation of circadian rhythms mediated by two nuclear receptor proteins retinoic acid receptor-related orphan receptor A (RORA) and Rev-ErbA were identified: Circadian repression of expression by REV-ERBA, P combined ¼ 5:52 10 À4 (Table 2) and RORA activates circadian expression. P combined ¼ 2:13 10 À3 . Also, modulation of ALK receptor tyrosine kinase activity was indicated with ALK pathway, P combined ¼ 2:82 10 À3 .
Heritability estimation. The previously identified 23 multiple myeloma risk SNPs were shown to account for 15.7% of the GWAS heritability, a relatively small fraction of the estimated 15.6% ( ± 4.7) net heritability explained by all common SNPs 5,7,18 . The identified interacting loci explain an additional 1.3% of the GWAS heritability in the UK cohort (1.5% in the German cohort) in the liability scale, which brings the collectively explained GWAS heritability to a modest 17% ( ± 2.4). However, as the heritability estimates are based on individual SNP associations and do not take into account the interaction term, the scope of interaction-identified heritability remains unanswered.

Discussion
We have performed the first genome-wide interaction study on multiple myeloma to date. We discovered 16 unique multiple myeloma risk locus pairs. Several of the discovered SNPs depicted eQTL effects for nearby genes and they were implicated in the networks and pathways relevant for multiple myeloma biology. We also demonstrated that genes annotated to the loci are highly expressed in tissues and cells of the hemic-immune system. Interferon regulatory factor 8 (IRF8) together with G protein subunit alpha Q (GNAQ) were discovered to be the paired risk loci with highest statistical significance. IRF8 is reported to be a risk locus for immunoglobulin trait modulation, whereas GNAQ is a guanine nucleotide-binding protein that regulates B-cell development and survival 19,20 . IRF8 has many functions in regulating innate immunity and immune cell development, including B-and T-cells, dendritic cells, and myeloid cells 21 . In early development, IRF8 and IRF4 function redundantly to regulate transition of pre-B-cells to maturing B-cells. In the germinal center development, the roles of these IRFs are complementary: IRF8 directs early centroblast development, which is taken over by IRF4 as centrocytes mature into plasma cells. IRF8 induces activation-induced cytidine deaminase, which is a key enzyme catalyzing somatic hypermutations of plasma cells 21 . Similar to IRF4, IRF8 transcriptional activity in multiple myeloma may also be related to differentiation of T helper (T H ) 17 cells, which have a regulatory effect on bone morphogenesis-related onset of multiple myeloma 22 . IRF8 has been reported to act as an intrinsic transcriptional inhibitor of T H 17 cells, at least partly through its physical interaction with retinoic acid receptor-related orphan receptor RORγt 23 . As a confirmation of this signal, we identified enrichment of two circadian rhythm pathways regulated by two nuclear receptors, RORA and Rev-ErbA, which are crucial for the development of T H 17 cells 24 . These findings together with our previous identification of rs4487645 at 7p15.3, as a modulator of IRF4 binding at an enhancer element of the c-Myc-interacting gene CDCA7L in multiple myeloma [25][26][27] , support the role of the genetic variants in IRF8 and its interacting partner in GNAQ in multiple myeloma susceptibility.
Another signaling cascade affecting immunoglobulin trait modulation, T H 17 cell differentiation, and bone morphogenesis is the TGFβ pathway 28 , which was represented by three enriched pathways in our analysis. In multiple myeloma, enhanced bone resorption releases and activates TGFβ, which is a potent inhibitor of osteoblast differentiation and mineralization 29 . Our interaction analysis identified rs2834882 annotated to runt related transcription factor 1 (RUNX1) in interaction with rs2860107 at zinc finger CCHC-type containing 6 (ZCCHC6, alias TUT7). RUNX family transcriptional activities have been linked to TGFβinduced IgA class switching, which is involved in multiple myeloma pathogenesis 19,30 . RUNX proteins also play a major role in cell differentiation, and RUNX1 is specifically regulating hematopoiesis 31 . Germline mutations in RUNX1 cause familial platelet disorder with propensity to myeloid malignancies and somatic loss of RUNX1 function is related to several hematologic malignancies 29,32 . RUNX transcription factors are integral components of signaling pathways enforced by TGFβ family members including bone morphogenic proteins (BMPs). RUNX1 and RUNX2 are known modulators of BMP-2/7/9-induced osteoblast differentiation. RUNX1 along with RUNX2 is often found coexpressed in skeletal elements that regulate expression of BMP-2 and BMP-9. 33 . RUNX2 regulatory activity in osteoblast differentiation is also regulated by transcription factor NKX3-2, whose expression was modulated by the sentinel SNP rs17362130 (Table 1) 16 . Additionally, ZCCHC11 and ZCCHC6 TUTase inhibitors are being investigated as potential agents for targeted therapy 34 .
Contextually in multiple myeloma, TGFβ induces differentiation arrest in osteoblasts, increases osteoclastogenesis, promotes angiogenesis, and suppresses host immunity in bone marrow microenvironment to create the so called multiple myeloma niche, thus enhancing multiple myeloma cell growth and survival 29 . TGFβ-activated transcription factors, SMADs also interact with chromatin binding proteins HDAC1 and HDAC2. HDAC1 is a class I histone deacetylase gene and multiple myeloma patients with high protein levels of HDAC1 were shown to have poor progression-free and overall survival 35 . Moreover, inhibition of HDAC1 is demonstrated to induce multiple myeloma cell death 36 . We noted a significant interaction between a class II HDAC family member, HDAC9 and neural cell adhesion molecule 2, NCAM2. Deregulation of HDAC9 in cells of lymphoid lineage is believed to induce B-cell lymphoproliferative disorders including Waldenström macroglobulinemia and is associated with general poor prognosis in cancer 37,38 . HDAC9 is also hypothesized to be responsible for lymphomagenesis by regulating growth and survival related pathways and by modulating of BCL6 and p53 tumor suppressor activity 38 . In germinal cells, it is shown to be co-expressed with BCL6, a therapeutic target for multiple myeloma 39 . Controlled cell cycle is critical for normal cellular growth, and its deregulation may possibly stimulate carcinogenesis.
HDACs are also shown to have role in transcriptional activity of NKX3-2, one of the eQTL targets of our study. It has been shown that BMP and SMAD signaling modulates the activity of NKX3-2 in a BMP-dependent manner by promoting NKX3-2 binding with SMAD1/4 and HDAC/SIN3A corepressor complex 40 . As HDAC inhibitors in general pose a vital role in cell cycle arrest induction and activation of intrinsic apoptotic mechanism, our observation leads to speculation that a common The recent meta-analyses have pointed out apoptosis and autophagy, B-cell and plasma cell development, cell cycle regulation and genomic stability, chromatin remodeling and immunoglobulin production as the main pathways deregulated by the identified multiple myeloma susceptibility loci 5,7 . We identified causally related genes implicated in apoptosis, such as RIPK1 and SERPINB9. Among the interacting loci we identified genes involved in B-cell development and immunoglobulin production, such as GNAQ and IRF8 and the TGFβ pathway and genes modifying the chromatin state, such as HDAC9. As TGFβ signaling is modified by ubiquitination and deubiquitination 41 , our study also support the importance of ubiquitin-proteasome signaling in multiple myeloma, which was highlighted by the meta-analysis together with the mechanistic target of rapamycin (mTOR) signaling as targets for already approved drugs in multiple myeloma 7 .
In conclusion, our findings provide further evidence that multiple myeloma is a genetically heterogeneous disease with inherited genetic susceptibility loci contributing excess risk via regulation of an assortment of regulatory networks and pathways. The two major signaling cascades we identified, TGFβ signaling through its signal transducers SMADs and circadian rhythm regulation by RORA and Rev-ErbA, integrate immunoglobulin trait modulation, T H 17 cell differentiation, and bone morphogenesis, and may provide a mechanistic link between the predisposition markers and intrinsic multiple myeloma biology.

Methods
Ethics. Patient samples and relevant clinico-pathological information was obtained conditional on written informed consent and was subject to approval of corresponding ethical review boards at respective study centers in accordance with the tenets of Declaration  Genome-wide association studies. Diagnosis of multiple myeloma (ICD-10 C90.0) adhered to the guidelines established by World Health Organization. Samples retrieved from all subjects were either before treatment or at presentation. The UK GWAS 5 consisted of 2282 cases (1755 male (post quality control (QC)) recruited through the UK MRC Myeloma-IX and Myeloma-XI trials (ISRCTN68454111: Myeloma-X http://www.isrctn.com/search? q=ISRCTN68454111 and ISRCTN49407852: Myeloma-XI http://www.isrctn.com/ search?q=ISRCTN49407852). DNA was extracted from EDTA-venous blood samples (90% before chemotherapy) and genotyped using Illumina Human OmniExpress-12 v1.0 arrays (Illumina). Controls were recruited from publicly accessible data generated by the Welcome Trust Case Control Consortium (WTCCC) from the 1958 Birth Cohort (58C; also known as the National Child Development Study) and National Blood Service. The control population comprised of 5197 individuals (2628 male (post QC)). Genotyping of these controls was conducted using Illumina Human 1-2 M-Duo Custom_v1 Array chips (www. wtccc.org.uk).
Analysis of GWAS. Quality control of the GWAS data was performed according to predetermined benchmarks already published 5 . In summary, inclusion of samples was initially liable to successful genotyping of ! 95% of the SNPs. Duplicates, first-degree relatives, and closely related individuals were removed with an identity-based-test (IBS) score! 0:80. Genetic heterogeneity was assessed with principal component analysis using dissimilarity measure calculated with our SNP panel and genome-wide IBS distances in reference to the HapMap samples. In each of the samples, SNPs having a minor allele frequency <0:01 and call rate of <95% were filtered out. SNPs were also excluded subject to departure from Hardy-Weinberg equilibrium at P<1 10 À5 in controls.
Genome-wide interaction study. Analyses were primarily undertaken using PLINK (v1.09), CASSI (v3), METAINTER, and R (v3.4.0) software. The interaction between each SNP pair and the risk of multiple myeloma was assessed with Pearsonian product moment correlation coefficient-based test inspired by Wellek- Table 2 Pathway enrichment analysis with PASCAL detects 12 putative pathways related to multiple myeloma. Combined P-values are obtained with Brown's method. P X denotes P-value obtained from interaction test of set X Ziegler statistics given by the formula 42 : Where r is pearsonian correlation coefficient statistics defined by Wellek and Ziegler 43 . r A and r N , respectively, represent the statistics calculated among cases and controls separately. To this end, we used CASSI software. Genomic resolution of the whole interaction test space was deflated with default predefined control option where all the variants having weaker signal (single marker associationP>1:0 10 À3 ) were excluded 10 . We performed the interaction test in the German and UK cohorts separately and meta-analyzed the results to strengthen the signals from co-occurring interacting pairs. To conduct meta-analysis METAINTER software was employed assuming a fixed effects model. Gamma approximated negative sum of log transformed interaction statistics from each of the two sets were considered as the test statistic for each variant pair and was tested with a weighted Chi-square statistic with four degrees of freedom 44 . Odds ratio and associated 95% confidence intervals were calculated with unconditional logistic regression with independence assumption among each component of SNP pairs.
Expression quantitative trait loci analysis. Investigation of true regulatory effects of the SNPs identified with the interaction study was undertaken by analyzing eQTL data obtained from malignant plasma cells of 665 multiple myeloma patients (389 male) enrolled in the German multiple myeloma trials conducted in Heidelberg University clinic. CD-138 purified plasma cells were used for gene expression profiling using Affymetrix U133 2.0 plus arrays. The expression data was submitted to Gene Expression Omnibus (E-MTAB-2299). All analyses were undertaken with R software. GC-RMA was used to normalize the expression data and genes with transformed log2 expression < 3.5 in at least 95% of the samples were excluded from further consideration. With exclusive consideration of autosomal genes, 9722 genes remained after QC. We investigated the correlative relationship between the identified individual risk SNPs within 1 Mb window (cis-eQTL analysis), which narrowed the candidates to a set of 239 genes. A Holm-Sidák corrected level of significance for discovery was determined at <0:0002 i.e., 1 À ð1 À 0:05Þ 1=239 h i on 239 probes corresponding to all the variants. Robust regression on a transformed Huber function was employed to model the qualitative traits as it warrants higher detection power in moderately contaminated sample 45 .
To avoid singularity of the argument space, variants in high linkage disequilibrium were discarded from consideration.
To extend the investigation of relation between SNP genotype and expression levels of genes and to identify causal candidates rather than mere associative pairings, we adapted SMR analysis as per Zhu et al. 12 . In summary, if we nominate effect size of a differentially expressed gene X on coherence of a phenotype Y to be β XY and consider the SNP genotypes to be the instrumental variable actively regulating both gene expression and the phenotype, then we can linearly estimate β XY by comparative effect-sizes.β XY ¼β ZŶ β ZX Whereβ ZY is the estimated effect size of genetic factor on the phenotype, which is assessed as GWAS effect size andβ ZX is that of the genetic factor of the expression levels of the genes, i.e., the eQTL effect size. We need not distinguish pleiotropic effect from high linkage co-occurrence since the SNPs in linkage disequilibrium demonstrated equal effect size. Thus, reliability of causal genes was tested with the approximated SMR statistic against χ 2 1 .
Network enrichment. A protein-protein interaction confidence network was formulated with STRING (v10.5, 04/18/2018). Interactions between two proteins were calculated based on the likelihood confidence of an edge between the two nodes and was transposed to a scale of 0 to 1 (1 representing high confidence). We built our network with the genes annotated by the interaction-discovered SNPs and eQTL analysis; in addition to that, first batch of first-degree predicted interactive nodes were included given a confidence score > 0.99. Erroneous discovery was restricted at 10%, which rendered a protein-protein interaction network index P < 0.0054 (observed number of interactions were tested against expected number of interactions with chi-square statistic with one degree of freedom).
Pathway enrichment. Initial in silico pathway enrichment was performed with the PASCAL tool interrogating the GWAS obtained summary statistics 46 . To create mapping of genes and single entity gene-fusions with PASCAL, we considered all genes within 20 kb upstream and downstream to an index SNP and fused all the corresponding/flanking genes together when the genes were found affecting same pathway(s). Sum of chi-square statistics with individual one degree of freedom was computed by summing over association statistics corresponding to each pathway. Enrichment scores of individual pathways were subsequently obtained by a test assuming chi-square distribution with degrees of freedom equal to the cardinality of fused gene sets.
Tissue and cell type enrichment. DEPICT was employed to analyze tissue and cell type enrichment that predicts differential regulation of the selected loci on any of the Medical Subject Heading (MeSH) annotations 47 . To this end, 209 such annotations were tested for 37,427 inbuilt backend human microarrays on the Affymetrix HGU133a2.0 array platform. The tissue/cell type enrichment is thus performed on the normalized expression matrix after subjecting it to user selected dimension reduction criteria. SNP pairs discovered with interaction test represented 12 unique mapped regions against which the enrichment was tested, hence we tested against a conservative threshold of significance at negative log transformed P-value of 2.37 correcting for multiple testing, which retains the false discovery rate at < 5% 48 .
Heritability analysis. As hypothesized by earlier studies, heritability estimates of complex diseases with polygenic origin are more robust with lifetime risk compared to population prevalence 49 . Following this notion, lifetime risk of multiple myeloma was assumed (0.007 for UK and 0.006 for German population; https:// www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-bycancer-type/myeloma; https://www.krebsdaten.de/Krebs/EN/Home/ homepage_node.html) to ascertain heritability of multiple myeloma explained by the risk SNPs discovered in the two different cohorts separately. Principal components were included to adjust for inflation as covariates. Genome-wide Complex Trait Analysis was used to estimate the genetic variance ascribable to the identified loci at a liability scale 50,51 .
Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability
SNP genotyping data that support the findings of this study have been deposited in Gene Expression Omnibus with accession codes GSE21349, GSE19784, and GSE15695; in the European Genome-phenome Archive (EGA) with accession code EGAS00000000001; and in the database of Genotypes and Phenotypes (dbGaP) with accession code phs000207.v1.p1. Expression data that support the findings of this study have been deposited in EMBL-EBI with accession code E-MTAB-2299. The remaining data are contained within the paper and Supplementary Data or are available from the author upon request.