Abstract
Thousands of non-coding variants have been associated with increased risk of human diseases, yet the causal variants and their mechanisms-of-action remain obscure. In an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, PCHiC) and chromatin accessibility analyses in primary cells (caQTL), we investigate 1,039 variants associated with multiple myeloma (MM). We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci (SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1). Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise genomic positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.
Similar content being viewed by others
Introduction
Genome-wide association studies (GWAS) have identified tens of thousands of sequence variants associated with human diseases and traits1, yet our understanding of the underlying mechanisms is still limited. Each association signal is usually represented by tens to hundreds of variants in linkage disequilibrium (LD). The vast majority of these variants map to noncoding regions of the genome, and likely act by altering gene expression2,3,4. For most signals, however, the causal variants, their target genes, and target cell types remain unknown.
Multiple myeloma (MM) is defined by uncontrolled, clonal growth of plasma cells, usually in the bone marrow. It is a common blood malignancy, with strong epidemiological support for inherited susceptibility5. While genome-wide association studies have identified 24 risk loci6,7,8,9,10,11, the causal variants remain largely unknown12,13. Further, plasma cells can be readily isolated from MM patients using routine methods, and cell lines appropriate for the investigation of MM biology exist. For these reasons, MM is an attractive model disease for deciphering the functional basis of risk loci. To our knowledge, no systematic functional dissection of risk loci for a hematologic malignancy has been reported14,15,16,17,18.
Here, we carried out an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, and PCHiC), and chromatin accessibility quantitative locus (caQTL) analyses in primary cells to investigate 1039 variants in linkage disequilibrium with multiple myeloma (MM) lead variants. We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci. Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.
Results
Designing an MPRA to screen MM risk variants
To identify putative causal variants, we first designed an MPRA14,19,20,21 to screen 1039 variants in high LD (r2 > 0.8) with MM lead variants for transcriptional activity (Fig. 1a and Supplementary Table 1). For each variant, we designed twelve 120-bp oligonucleotide sequences corresponding to reference and alternative alleles in six genomic contexts (both strands × three sliding windows with the variant at −20, 0, and +20 bp from the center). Sequences were coupled to a reporter gene with random 20-bp sequence barcodes 3′ of its open reading frame. Following transfection into cell lines, the transcriptional activity of each construct was measured by determining the barcode representation in reporter mRNA relative to DNA (Fig. 1b). Plasmid sequencing identified 1.73 × 106 unique barcodes tagging 12,378 (99.2%) of the 12,468 designed oligonucleotides (Fig. 1c). As a positive control, we included the RUNX3 variant rs188468174, which influences immunoglobulin (Ig) levels and exhibits luciferase activity across a broad range of MM cell lines22.
Identification of causal cell types for MM susceptibility
Since reporter assays can show cell type-dependent activity, MPRA should ideally be performed in an appropriate cellular model. We therefore carried out computational analyses to identify cell types where MM risk variants likely act. First, using ATAC-seq data for blood cell populations23, we found an enrichment of risk variants in genomic regions with accessible chromatin in plasma cells and total mature B-cells (Supplementary Fig. 1). Second, investigating blood cell populations for expression23,24,25,26,27,28 of genes located at MM risk loci, we identified plasma cells and total mature B-cells as the most enriched cell types (Supplementary Fig. 2). Third, using gene expression profiles of CD138+ plasma cells isolated from the bone marrow of 2650 MM patients12,29,30,31, we identified cis-eQTLs in LD with ten risk alleles (Supplementary Table 2). Additional cis-eQTLs were found in whole blood (Supplementary Table 3) or in CD19+ total mature B-cells isolated from 758 random blood donors (Supplementary Table 4). Fourth, since plasma cells are responsible for producing Ig, we tested MM lead variants for association with blood IgA, IgG, and IgM levels22. This revealed enrichments of association signal within the set of 24 MM lead variants for all three Ig isotypes (binomial test P = 6.8 × 10−5 for IgA, P = 0.02 for IgG, P = 0.004 for IgM for the enrichment of association P values <0.05), as well as individually significant associations (Supplementary Fig. 3), including for the SMARCD3, WAC, and ELL2 associations, which also showed plasma cell cis-eQTLs (Supplementary Table 2). Collectively, these data are consistent with many MM risk variants acting by altering gene regulation in plasma cells, while others may act in other cell populations, including B-cells.
Identification of MM risk variants influencing transcription
Focusing our analysis on plasma cells, we performed MPRA in the MM plasma cell lines L363 and MOLP8. Each cell line was assayed in three replicates (Fig. 1d). Based on barcode activity estimates, we calculated a log2 score for each variant reflecting the transcriptional activity of the alternative relative to the reference allele, averaged across genomic contexts and replicates32. L363 and MOLP8 scores showed a positive correlation (Fig. 2a), did not display strand bias (Fig. 2b), and additional validation of 20 selected variants showed a positive correlation with luciferase data (Fig. 2c, Supplementary Fig. 4 and Supplementary Table 5). Moreover, variants with strong MPRA scores were enriched in chromatin accessibility regions in primary plasma cells, consistent with our assay selecting variants with endogenous regulatory activity (Fig. 2d and Supplementary Fig. 5).
In L363, 142 variants were significant (FDR <5%), including 33 with strong effects (absolute log2 score >0.2). In MOLP8, 28 were significant, including 21with strong effects (Fig. 2e, f and Supplementary Data 1). The higher number of significant variants in L363, compared to MOLP8, cells was congruent with a higher transfection efficiency (54% for L363 versus 15% for MOLP8) and higher post-transfection viability (90% for L363 versus 65% for MOLP8). In total, 23 variants were significant in both screens, and eight of these showed concordant plasma cell cis-eQTLs, making them putative causal variants that were selected for follow-up (Table 1, Fig. 3, and Supplementary Figs. 4, 6). The other 15 had discordant or no plasma cell cis-eQTLs, either because of technical limitations (e.g., TERC was not in our eQTL data; the JARID2 and RUNX3 variants are rare), or because these alter gene expression in another cell state (e.g., TNFRSF13B is primarily expressed in switch-memory B-cells33 and had a cis-eQTL in total mature B-cells; Supplementary Table 4).
Functional characterization of MPRA-functional variants
We next investigated potential mechanisms of action for the selected variants. rs78740585 maps to SMARCD3 (Fig. 4a). In humans, SMARCD1, SMARCD2, and SMARCD3 encode alternative, mutually exclusive 60-kD subunits of the SWI/SNF nucleosome remodeling complex34,35,36. Incorporation of either SMARCD subunit variant into the complex influences its activity37. In blood, SMARCD3 is primarily expressed in granulocytes and monocytes whereas basal expression in plasma cells is very low; instead these cells exhibit high expression of SMARCD1 and SMARCD2 (Fig. 5a). By contrast, the MM risk allele associates with upregulation of SMARCD3 in plasma cells (Supplementary Tables 2 to 4). rs78740585-A creates a binding site for IRF4 (Fig. 5b, c, Supplementary Fig. 7, and Supplementary Data 2), a key plasma cell transcription factor essential for the survival of MM cells38. Knockdown of IRF4 attenuated rs78740585-A luciferase activity (Fig. 5d, e). Furthermore, analysis of promoter-capture Hi-C (PCHi-C) data for three MM plasma cell lines showed a chromatin looping interaction between the rs78740585 region and the SMARCD3 promoter (Fig. 4a and Supplementary Fig. 8a). Collectively, these data are consistent with rs78740585-A effecting ectopic SMARCD3 expression in plasma cells by introducing a new IRF4 site into an enhancer, In theory, increased levels of SMARCD3 protein could lead to the displacement of SMARCD1 and SMARCD2 in the SWI/SNF complex through stoichiometric competition, potentially impacting on SWI/SNF-dependent gene expression.
rs2790444 maps to the autophagy gene WAC (Fig. 4b). Rare loss-of-function variants in WAC cause De Santo-Shinawi syndrome39, which can feature hypogammaglobulinemia40. The common MM risk allele associates with increased levels of IgM in the blood (Supplementary Fig. 3) and downregulation of WAC in plasma cells (Supplementary Table 2). rs2790444 maps close to the WAC transcription start site, within the PCHi-C bait region (Fig. 4b and Supplementary Fig. 8b). We found that rs2790444-T creates a binding site for the POU2F1 transcription factor and knockdown of POU2F1 attenuated rs2790444-T luciferase activity (Fig. 6a–d, Supplementary Fig. 9, and Supplementary Data 2). POU2F1 has a dual role in the regulation of gene expression; recruiting the nucleosome remodeling and deacetylase (NuRD) complex, POU2F1 promotes methylation and suppressive histone modifications, while in the context of MAPK signaling it recruits the KDM3A demethylase, promoting pro-transcriptional effects41. Consistent with pro-transcriptional activity, we detected a significant plasma cell cis-meQTL at WAC with rs2790444 (P = 1.37 × 10−8), with rs2790444-T being associated with reduced methylation (Supplementary Table 6). Moreover, CRISPR/Cas9 deletion of a 139-bp region harboring rs2790444 downregulated WAC, supporting functional coupling between the variant-harboring region and the transcriptional regulation of WAC (Fig. 6e, f). These data are compatible with rs2790444-T creating a promoter-proximal POU2F1 site, upregulating WAC through decreased methylation.
rs3777182, rs3777183, and rs3777189 map to ELL2 encoding a key protein in the super-elongation complex that drives Ig synthesis42,43,44,45 (Fig. 4c). The MM risk allele downregulates ELL2 in plasma cells and Ig levels in blood8,12. Recently, we nominated rs3777189 as causal using non-systematic approaches and demonstrated that it changes a MAFF/G/K binding site12. In our MPRA screen, we now identify rs3777189 as the most active variant within its LD block, providing additional, unbiased evidence for causality. In addition, we identify rs3777182 and rs3777183 as previously unappreciated regulatory variants within the ELL2 LD block. Analyzing our PCHi-C data, we identified a chromatin looping interaction between the rs3777183-rs3777182 region and the ELL2 promoter (Fig. 4c and Supplementary Fig. 8c), and predicted several altered motifs (Supplementary Data 2). CRISPR/Cas9 deletion of a 141-bp region harboring rs3777183-rs3777182 and an 89-bp region harboring rs3777189 both altered ELL2 expression, supporting that the ELL2 eQTL is caused by genetic variation in multiple intronic regulatory elements that are involved in the transcriptional regulation of ELL2 (Fig. 7a, b and Supplementary Fig. 10a, b).
rs4487645 maps to the DNAH11-CDCA7L locus (Fig. 4d and Supplementary Fig. 8d), and the risk allele rs4487645-C upregulates the cMyc-interacting CDCA7L29,46. We previously proposed rs4487645 as causal, finding that rs4487645-C creates a new IRF4 binding site13. Our current analysis provides additional unbiased evidence for this variant indeed being the functional basis of the 7p15.3 association. CRISPR/Cas9 deletion of a 76-bp region harboring rs4487645 downregulated CDCA7L (Fig. 7c and Supplementary Fig. 10c), supporting a regulatory link between the region and CDCA7L. Moreover, we employed CRISPR/Cas9 with homology-directed repair (HDR) to generate L363 single-cell clones with different rs4487645 genotypes. In total, we generated six rs4487645-C-homozygous clones, three rs4487645-C/A heterozygous clones, and six rs4487645-A-homozygous clones. We observed a significant association between rs4487645 genotype and CDCA7L expression, with the C allele yielding higher expression (Fig. 7d), further supporting that rs4487645 causes the CDCA7L eQTL.
Finally, rs11960493 and rs6066832 map to CEP120 and PREX1, respectively, and upregulate these genes in plasma cells (Fig. 4e, f and Supplementary Fig. 8e, f). CEP120 is implicated in microtubule assembly47, and PREX1 encodes a guanine nucleotide exchange factor mutated or aberrantly expressed in several cancers48,49. While we predicted several motif changes for both variants (Supplementary Data 2) and a looping interaction between the rs6066832 region and the PREX1 promoter (Fig. 4e and Supplementary Fig. 8f), we could not identify differentially bound proteins.
Effects in the endogenous chromosomal context in vivo
Following characterization in vitro, we investigated if the eight selected variants are active in an endogenous chromosomal context in vivo. Altered gene-regulatory activity is associated with the release or recruitment of proteins to DNA and/or changes in chromatin structure. In turn, this could cause allele-dependent changes in accessibility (chromatin accessibility quantitative trait loci, caQTLs) around the variant, detectable by ATAC-seq of limited numbers of primary cells. Hence, we performed ATAC-seq on plasma cells from MM patients. To detect caQTLs, we estimated the local ATAC-seq signal intensity as the average Tn5 transposase cut-site density across a 150-bp sliding window positioned at every 10 bps across LD regions and examined correlations with the MM lead variant. We also developed a segmentation algorithm (“caQTLseg”) to partition LD regions into subregions with either allele-dependent or allele-independent accessibility.
In an initial set of 56 ATAC-seq samples, we detected lead variant caQTLs at SMARCD3, CDCA7L, and CEP120 (Supplementary Fig. 11). For replication, we performed ATAC-seq on an additional 105 samples. In a combined analysis of all 161 samples, the three caQTL signals increased in significance (Fig. 8). The regions identified as having allele-dependent accessibility were identified with a broad range of caQTLseg parameter settings (Online Methods and Supplementary Figs. 12 and 13). Furthermore, the SMARCD3 and CDCA7L signals were centered at rs78740585 and rs4487645, and were the only LD variants within their caQTLs; both of these risk variants create new IRF4 binding sites. Consistent with the recently described role of IRF4 as a pioneer-like transcription factor that regulates chromatin accessibility50,51,52,53, the SMARCD3- and the CDCA7L-high-expressing MM risk alleles associated with increased accessibility at rs78740585 and rs4487645 (Fig. 8a, b). By contrast, the caQTL at CEP120 (Fig. 8c) was more complex, encompassing rs11960493 plus eight other LD variants, one of which (rs62376437) was borderline-significant in the MPRA (q value 7.57 × 10−6 in L363; 2.72 × 10−1 in MOLP8; Supplementary Data 1) and concordant with the CEP120 cis-eQTL, suggesting multi-variant causality, as in the case of the ELL2 association. These results demonstrate that three of our selected MPRA-functional variants co-localize with significant plasma cell caQTLs, signaling the presence of causal regulatory activity at these variants in an endogenous chromosomal context in vivo.
Discussion
We have carried out a systematic functional analysis of inherited noncoding variants that predispose for MM. Our analysis represents a functional dissection of inherited noncoding variation predisposing for a hematologic malignancy. To our best knowledge, MPRA and caQTL analysis have not been previously used as mutually complementary approaches to identify putative causal variants. While MPRA is a powerful in vitro screening approach, caQTLs provide evidence for causal regulatory activity at specific genomic positions in an endogenous chromosomal context in vivo.
Our analysis identifies eight putative causal regulatory variants at six risk loci: SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1. Out of these variants, seven map to intronic regions within their target genes, and one maps to an enhancer region within a neighboring gene (Fig. 4). These observations are in accordance with other studies where GWAS signals have been dissected functionally (c.f., refs. 14,16,19,23,54,55). Notable findings include a variant effecting ectopic expression of the SWI/SNF gene SMARCD3 in plasma cells by introducing a new IRF4 site into an enhancer, and a variant upregulating the autophagy gene WAC by creating a POU2F1 site. Additionally, we find evidence for multi-variant causality at ELL2 and CEP120, and further support for rs4487645 being a causal variant at CDCA7L. Collectively, our findings provide functional insight into the genetic architecture of MM predisposition.
Regarding limitations, functional dissection of a GWAS signal should ideally include systematic perturbation of each variants within the LD block, for example using CRISPR-HDR or base editors (to replace each reference allele with its corresponding variant allele or vice versa in situ). However, it is widely recognized that such an approach is currently not possible, both because of the workload and because only some variants are accessible to precision editing because of the lack of nearby sgRNAs, and base editors can only achieve certain types of base changes. Additionally, in the case of MM, it is not possible to culture primary plasma cells or primary multiple myeloma cells ex vivo, and thus any editing experiments will need to be done in cell lines. For these reasons, we instead followed up our MPRA screen with dual-sgRNA CRISPR/Cas9 experiments to link variant-harboring regions to eQTL target genes. We achieved successful editing of rs4487645 at CDCA7L, whereas precision editing was not achieved for the other variants of interest. Finally, we carried out caQTL experiments in primary MM plasma cells, demonstrating allele-dependent chromatin accessibility (as a sign of altered regulatory activity) at the positions of the SMARCD3, CDCA7L, and CEP120 MPRA-functional variants in an endogenous chromosomal context.
Deciphering the functional basis of cancer risk variants provides for a more comprehensive understanding of the biological networks underlying tumorigenesis and predisposition. Here we have addressed this challenge in the context of MM by combining information from high-throughput functional screens, QTL analyses, and additional assays. Our integrative approach illustrates how functional dissection of noncoding variation influencing the development of human malignancies can be undertaken.
Methods
MPRA
We designed an MPRA for variants in LD (r2 > 0.8) with lead variants at 22 loci robustly associated with MM risk (Supplementary Table 1)5,6,7,8,9,11,56. We also included twelve candidate MM risk loci8,11,57 that have not so far been replicated5,11,58, although these were excluded in the final data analysis. Finally, we included RUNX3 rs188468174 as a positive control because of its known luciferase activity in plasma cell lines22.
For each variant, we designed twelve 120-bp sequences corresponding to the reference and alternative allele in six genomic contexts (positive and negative strand × three windows with the variant at −20, 0, and +20 bp from the center), flanked by 15-bp adapters: [5′-ACTGGCCGCTTGACG-(oligo)-CACTGCGGCTCCTGC-3′]. In total, 12,468 sequences representing 1039 variants were synthesized (CustomArray Inc.). Random 3′ 20-bp barcodes were then added by PCR (Supplementary Table 7). The library was synthesized per ref. 19. Barcoded oligos were inserted by Gibson assembly (cat no. E2611S, New England Biolabs) into a pGL4:23:∆xba∆luc vector to create a mpra∆orf library. A mpra:gfp library was then generated from the mpra∆orf library by inserting minimal promoter, GFP, and partial 3′ UTR from Pgl4.23:gfp plasmid (gift from Ryan Tewhey19). The final library was transfected (Neon system; Life Technologies) into 5 × 108 L363 or MOLP8 cells (ACC49 and ACC569; DSMZ). Cells were cultured at 37 °C and 5% CO2 in RPMI 1640(1X) + GlutaMAX with 10% fetal bovine serum (Gibco BRL, Thermo Fisher Scientific) at 0.5 to 0.7 × 106 cells/mL. 48 h after transfection, RNA was extracted and reporter mRNA pulled down. After adding sequencing adapters to cDNA synthesized from the DNase-treated GFP-mRNA, samples were sequenced (Illumina NextSeq 1 × 75 bp).
eQTL and gene expression data
To identify cis-eQTLs in plasma cells, we analyzed gene expression profiles of CD138+ cells isolated from bone marrow aspirates from MM patients harvested using immunomagnetic beads. First, we used Affymetrix microarray data, including 183 UK Myeloma IX trial patients (a study aimed at comparing two bisphosphonates in the treatment of MM; Medical Research Council Leukemia Data Monitoring and Ethics committee, no. MREC 02/8/95, ISRCTN68454111)59, 658 German GMMG patients, and 604 patients treated at the University of Arkansas for Medical Sciences Myeloma Center, USA11. Second, we used 185 RNA-seq samples from Lund University (Lund, Sweden)12. Third, we used 716 RNA-seq samples with DNA copy number covariates from the CoMMpass study31. Fourth, we used 309 RNA-seq samples from the Dana Farber Cancer Institute (Boston, USA)30. For the first two data sets, paired SNP microarray genotypes were available. For the third and fourth data sets, only RNA-seq data were available, limiting eQTL analysis to risk alleles with these coding proxies: rs3815768, rs34562254, rs6122720, rs1052501, rs7193541, and rs7782699. For blood, we used eQTLgen (www.eqtlgen.org) and data at deCODE Genetics (RNA-seq for 13,175 Icelanders). For B-cells, we generated eQTL data for 758 Icelanders by isolating B-cells from peripheral blood with negative selection using magnetic beads (StemCell Technologies 19674). To test for enrichment of gene expression of MM-associated genes in blood cell populations, we used gene expression microarray data for sorted blood cells (NCBI Gene Expression Omnibus; accession GSE24759, GSE15695, GSE4581, GSE19784, GSE26760, and GSE5900). These were generated on Affymetrix microarrays and quantile-normalized to a log-normal distribution. For enrichment testing, we used a one-sided Student’s t-test for genes in MM-associated regions versus other genes in the genome.
MPRA data analysis
To map oligo-barcode combinations, we amplified the mpra∆orf library using Illumina_Universal_Adapter and MPRA_v3_TruSeq_Amp2Sa_F primers, added indices by PCR using Illumina_Universal_Adapter and Illumina_Multiplex primers and sequenced the library (Illumina HiSeq 2 × 150 bp). Paired-end reads were merged using PEAR (v0.9.10)60 and aligned to the designed sequences using BWA-MEM (v0.7.15)61. Alignments with more than four mismatches within the designed oligonucleotide, or mismatches within 10 bp of the variant, were excluded. Based on filtered alignments, oligonucleotide-barcode pairs were identified. Combinations supported by at least two reads were included in the mapping. However, barcodes that mapped to more than one oligonucleotide were discarded if fewer than 50 reads supported the barcode or none of the oligo-barcode combinations were supported by at least 95% of the reads (i.e., if one oligo-barcode combination was supported by at least 95% of more than 50 reads, that combination was included). In total, we identified 1.73 × 106 oligonucleotide-barcode pairs mapping to 12,378 of the 12,468 designed sequences.
To score variants, we used MPRA score32. Basically, the activity of each barcode was estimated based on bi = log2(1 + #RNAi)/(1 + #DNAi), where #RNAi and #DNAi are the read counts for barcode i normalized to counts per 10 million reads. Subsequently, an overall log2 score representing the transcriptional activity of the alternative relative to the reference allele was calculated by forming the weighted average of the bi belonging to the variant across the six genomic contexts and three replicates32. To identify strand bias, we also calculated log2 scores based on constructs representing the variant on either the positive or negative strand.
ATAC-seq data for blood cell populations
Sequencing reads for published ATAC-seq libraries from 18 sorted hematopoietic cell types were downloaded from the Sequence Read Achieve23,62. Reads were processed as the MM ATAC-seq libraries using the hg38 reference genome. Next, we created a master peak file by aggregating the summits of each population and enumerating the fragments overlapping each peak for each population62. From this peak-by-cell type matrix, we performed g-chromVAR23 to discern cell type enrichments using two types of annotations for the MM variants: the fine-mapped probability of causality (Supplementary Fig. 1) and log2 MPRA scores (Fig. 2d).
For the first case, we used recalibration of marginal association effects using an approximate Bayes’ method63 as a proxy for fine-mapping to obtain a probability of causality for each MM risk variant11. Because the approximate Bayes’ method does not account for LD, we first performed stepwise conditional analysis, where we did not detect any secondary signals at any of the loci8,9,11. We intentionally used a pure genetics approach, as opposed to fine-mapping methods that factor in functional annotations, to ensure that the downstream g-chromVAR cell type enrichment analysis would be unbiased. For the second case, we used the MPRA score log2 scores to weight variants by the strength of transcriptional activity. g-chromVAR p values thus correspond to the enrichment of MM risk variants within cell types, weighted by quantitative chromatin accessibility signatures and either the variant genetic fine-mapping score or log2 MPRA score. Default parameters for g-chromVAR were used.
meQTL data generation and analysis
We performed cis-meQTL analysis using Illumina 450 K methylation array data for plasma cells from 379 patients from the MRC Myeloma XI trial64. Briefly, patients were randomized to induction therapy with CTD (cyclophosphamide, thalidomide, and dexamethasone) or CRD (cyclophosphamide, lenalidomide, and dexamethasone) with or without CVD (cyclophosphamide, bortezomib, and dexamethasone) intensification in patients with less than very good partial response (VGPR) after initial CRD or CTD. Fitter, younger patients were included in the intensive treatment pathway and received high-dose melphalan (HD-MEL) and autologous stem cell transplantation (ASCT) as consolidation. Post induction ± ASCT patients were randomized to lenalidomide, lenalidomide plus vorinostat, or observation. Primary outcome data has been reported. The collection of samples was undertaken with informed consent and ethical review board approval from the Oxfordshire Research Ethics Committee (MREC 17/09/09, ISRCTN49407852). Diagnosis of MM was established in accordance with World Health Organization guidelines. MM cells from patient bone marrow aspirates were obtained at diagnosis and purified (>95%) using immunomagnetic beads with CD138 antibody (Miltenyi Biotec). RNA and DNA were extracted using RNA/DNA mini kit or Allprep kits (Qiagen). The EZ DNA Methylation kit (Zymo Research) was used for bisulfite conversion of genomic DNA. Tumor DNA methylation was profiled using Illumina Infinium HumanMethylation450 (450k) or EPIC 850 K arrays. Raw data were exported from Genome Studio (Illumina) and quality checking and normalization was performed using the ChIP Analysis Methylation Pipeline (ChAMP)65. The BMIQ method was used to perform normalization. Preprocessed data were analysed using a Bayesian approach to the probabilistic estimation of expression residuals to infer broad variance components, thus accounting for hidden determinants influencing global expressions such as copy number, translocation status, and batch effects66. Genetic associations were tested under an additive model between variants and normalized methylation probes using FastQTL67, adjusting for plate and methylation-based principal component analysis score67.
PCHi-C data generation and analysis
To identify interactions between variant-harboring regions and promoters, we analyzed published PCHi-C data for KMS11 cells68,69. Additionally, we generated PCHi-C data for two additional MM plasma cell lines, KMS12 and MM1S, using the same protocol68,69.
Briefly, KMS11, KMS12, and MM1S cell lines were obtained from the American Type Culture Collection (ATCC). All cell lines were cultured at 37 °C, in RPMI supplemented with 10% FBS. To generate PCHi-C libraries, 25 million cells were fixed in 1% formaldehyde for 10 min. Cross-linked DNA was digested using HindIII (NEB; #R0104). Digested chromatin ends were filled and marked with biotin-14-dATP (Thermo Fisher, 19524-016). The resulting blunt-ended fragments were ligated at 16 °C in the nucleus with T4 DNA ligase (NEB; #M0202) to minimize random ligation. DNA was de-cross-linked by proteinase K (Ambion; #AM2546) treatment. DNA was sheared by sonication (Covaris; #M220) and 200–650-bp fragments were selected. Biotin-tagged DNA was pulled down with streptavidin beads and ligated with Illumina paired-end adapters (Illumina). Six cycles of PCR were performed to amplify libraries before capture. Promoter-capture was based on 32,313 biotinylated 120-mer RNA baits (Agilent Technologies) targeting both ends of HindIII-restriction fragments that overlap Ensembl promoters of protein-coding, noncoding, antisense, small nuclear RNA, microRNA, and small nucleolar RNA transcripts. A post-capture PCR amplification step was carried out using five amplification cycles, after library enrichment. Libraries were sequenced using Illumina HiSeq 2000 technology. Reads were aligned to the GRCh37 build using Bowtie2 v.2.2.640 and identification of valid read pairs was performed using HiCUP v.0.5.941. To call significant contacts, HiCUP output was processed using CHiCAGO v.1.1.842. For each cell line, data from three independent biological replicates were combined to obtain a definitive set of contacts. Looping interactions were called using the CHiCAGO pipeline70 to obtain a unique list of reproducible contacts. Interactions with –log10(CHiCAGO P score) ≥ 2 were considered significant and shown in figures. Genomic loci or genome browser figures were generated using tidyGenomeBrowser (https://github.com/MalteThodberg/tidyGenomeBrowser). Transcript models were obtained from the TxDb.Hsapiens.UCSC.hg38.knownGene R-package71. ChromHMM states for KMS11 cells were obtained from ref. 11 in hg19 coordinates and converted to hg38 coordinates using the rtracklayer R-package72.
Luciferase analysis
Luciferase constructs were generated by cloning genomic sequences (Integrated DNA Technologies; Supplementary Table 5) centered on variants of interest into the pGL3-basic vector. Using electroporation (Neon system; Thermo Fisher Scientific), the constructs were co-transfected with renilla plasmid to enable normalization of the luciferase signal. At 24 h post-electroporation, luciferase and renilla activity was measured using DualGlo Luciferase (cat no. E1960; Promega) on a GLOMAX 20/20 Luminometer. Based on luciferase/renilla readings, we calculated log2 scores for each variant reflecting the luciferase activity of the alternative relative to the reference allele.
Transcription factor motif analysis
To identify differentially binding transcription factors, we used the PERFECTOS-APE tool (http://opera.autosome.ru/perfectosape) with the HOCOMOCO-10, JASPAR, HT-SELEX, SwissRegulon, and HOMER motif databases.
Electrophoretic mobility shift assays
For each variant 25-bp 5′-biotin-labeled, double-stranded probes were synthesized (Integrated DNA Technologies): 5′-ACTTAATTTGCC[C/T]GAATTACATTTC-3′ for rs2790444; 5′-TCAAGAACTGAA[G/A]CTGTAAGTTGAC-3′ for rs78740585. Unlabeled identical sequences were synthesized for competition reactions, and the nuclear extract was prepared12,73. For supershift reaction, we used antibodies against POU2F1 (cat no. sc-8024; Santa Cruz) and IRF4 (cat no. 646412; Biolegend). Reaction mixes were incubated for 15 min at room temperature and an additional 15 min after adding antibodies. Incubations were done at room temperature according to the manufacturer’s instructions (LightShift Chemiluminescent EMSA kit, cat no: 20148, Thermo Fisher Scientific).
siRNA experiments
siRNAs against IRF4 were purchased from Qiagen (cat no. FlexiTube GeneSolution GS3662, IRF4), against POU2F1 from Sigma (cat no. SASI_Hs01_00018404). L363 cells (3 × 106) were transfected with 300 nmol siRNA using the Neon system (Thermo Fisher Scientific) in 100 µl volume. Electroporation conditions were 1500 V, 10 pulse width, and 3 pulse number. Luciferase analysis was done 24 h after transfection. In parallel, cells were harvested for immunoblotting. Luciferase constructs (5 µg plasmid/3 × 106 cells) for WAC rs2790444 and SMARCD3 rs78740585 reference and alternative alleles (Supplementary Table 5) were co-transfected with siRNA in a 100 µl reaction volume. The final siRNA concentration was 300 nmol/100 µl reaction mix. For immunoblotting, cells were lysed in 2X-Laemmli buffer (cat no: 161-0737; Bio-Rad) and sonicated for ten cycles of 30″/30″ s on/off on Bioruptur Pico (Diagenode). Quantitative measurement of total protein was done and 20 µg was loaded on 4 to 20% mini-PROTEAN TGX Gel (cat no: 456-1093, Bio-Rad). Post-electrophoresis gel was transferred to Trans-Blot turbo PVDF membrane and blotting was performed on Trans-Blot Turbo transfer system (Bio-Rad) using the same antibodies used in EMSA experiments.
CRISPR/Cas9 deletion of variant-harboring regions
Dual-sgRNA CRISPR/Cas9 deletion of variant-harboring regions is frequently used to investigate if a given genomic region (e.g., an intronic or distant enhancer) is involved in the transcriptional regulation of a given gene. Compared to CRISPR/Cas9 homology-directed repair (CRISPR-HDR), dual-sgRNA deletion has advantages in that it has high editing efficiency, and is applicable in a broader range of situations, as it does not require an effective sgRNA in the immediate vicinity of the variant (within a few base pairs). Here, we used dual-sgRNA to demonstrate functional couplings between variant-harboring regions in WAC, ELL2, and CDCA7L because the variants of interest themselves were not accessible to CRISPR-HDR due to a lack of efficient sgRNAs that cut DNA close to these variants.
To delete variant-harboring regions, we used a dual-sgRNA CRISPR/Cas9 in plasma cell lines. We identified functional sgRNAs targeting the rs2790444, rs3777189, rs3777182-rs3777183, and rs4487645 regions; (Supplementary Table 8). The sgRNAs were cloned into the pSpCas9(BB)-2A-GFP PX458 vector (gift from Feng Zhang; Addgene cat no. 48138). Cloned sgRNA pairs were co-transfected using the Neon system (Thermo Fisher Scientific) into the following cell lines, which carry at least one copy of the high-expressing allele of the respective variants: RPMI-8226 (heterozygous for rs3777189 and rs3777183-rs3777182; DSMZ ACC402), OPM2 (homozygous for rs4487645-C; DSMZ ACC50), or MOLP8 (heterozygous for rs2790444; DSMZ ACC569). The cell lines were genotyped for the CRISPR-deleted variants and were found to have the expected genotype, as compared to data in the Cancer Cell Line Encyclopedia. The cell lines were not tested for mycoplasma. At 24 h post-transfection, GFP-positive cells were isolated using fluorescence-activated cell sorting. Genomic DNA was extracted and the targeted region was amplified by PCR to verify deletion (Supplementary Table 9). In parallel, RNA was prepared, reverse-transcribed, and quantified using SYBR Green qPCR assays (iTaq Univeral SYBR Green Supermix, cat no: 1725120; Supplementary Table 10).
CRISPR/Cas9 with homology-directed repair
To further test variant causality, we considered the possibility of précising-editing the identified MPRA-functional variants in MM cell lines using CRISPR/Cas9 with homology-directed repair (HDR)74. We achieved successful editing of CDCA7L rs4487645[C > A] in L363 cells. To generate L363 clones with different rs4487645 genotypes, we used the sgRNA sequence CCTCTGAAACTTACAATTCA with PAM sequence AGG cloned into a pSpCas9(BB)-2A-GFP vector (PX458, Addgene), along with the following repair templates: GTTGACCTATAAGGAAGCTGGCTCACAGAGGCTAGGGACAGATGAACCTCTTCGATAAAATTAAGAGA[G/T]AAGTGAAACCTTGAATTGTAAGTTTCAGAGGCTGCTTAAAGGGGACCAGGAGAATGGAGTAGAGAGCATAGCCTCAGTGTAA. Repair templates were synthesized by IDT (Alt-R HDR donor oligo, 2 nmol), with IDTs proprietary for 5′ and 3′ end modification for increased stability in the cell post-transfection. Plasmid and repair templates were co-transfected into L363 cells using a Neon electroporation system (Thermo Fisher). Post 48 h of transfection, single GFP positive cells were sorted using a BD FACSAria Fusion and cultured in a 96 well plate. Clones were genotyped for rs4487645 using Taqman genotyping assay (C_26972688_10, part no. 4351379) on a StepOnePlus qPCR instrument (Applied Biosystems). The selected clones were also analyzed by Sanger sequencing of the region encompassing CRISPR edit by amplifying with primers CDCA7L_F and CDCA7L_R (Supplementary Table 10). Because L363 is a genetically unstable cell line, and because CRISPR editing may introduce local DNA copy number changes due to chromothripsis, we also measured the CDCA7L DNA copy number in each clone using the Taqman copy number assay (Hs 02885634_cn; cat no. 4400291) with reference assay (RNasep, cat no. 4403326) in a duplex qPCR setup (Applied Biosystems, StepOnePlus). To calibrate the assay, we used DNA from two healthy blood donors. Copy numbers were calculated using CopyCaller v2.1 software (Thermo Fisher). To quantify CDCA7L expression, we used qPCR (Supplementary Table 10) with iTaq universal SYBR master mix (cat no. 1725120, Bio-Rad) and GAPDH as endogenous reference genes. To test for association between CRISPR-edited rs4487645 genotype and CDCA7L expression (quantified as 2−∆Ct relative to GADPH), we used multivariate regression with CDCA7L DNA copy number as a covariate.
caQTL data generation
We generated ATAC-seq libraries from 50,000 CD138+ magnetic bead-isolated MM plasma cells per sample using a protocol based on ref. 75. Samples were obtained from the Norwegian MM Biobank in Trondheim, subject to ethical approval (Norway REK2014/97; Sweden 2019-06386). Libraries were prepared using the Nextera DNA Library Prep kit and sequenced (Illumina 2 × 125 bp). Adapter sequences in the ATAC-seq reads were removed using Trimmomatic (v0.36)76 and aligned using Bowtie2 to hg38. Duplicate and mitochondrial reads were filtered out using SAMtools77 and Picard (http://broadinstitute.github.io/picard). Transposase cut-sites were extracted from the BAM files using BEDtools. Read start sites were adjusted to represent the center of transposon binding event78. For quality control, we calculated the enrichment of ATAC-seq reads at transcription start sites (TSS) of protein-coding RefSeq genes as in ENCODE (www.encodeproject.org/data-standards/terms/#enrichment). In short, the distribution of read depths across 2-kb windows centered at TSSs were normalized by the average read depths in the flanking 100 bp on both ends. The average score across all genes was used as a TSS enrichment score, and we excluded samples that had an enrichment score <3.
caQTL detection
We estimated the local ATAC-seq signal intensity as the Tn5 cut-site density (i.e., the average number of cut-sites per bp) across a 150-bp sliding window positioned at every 10 bp across each LD region, normalized by the Tn5 cut-side across the entire LD region in the same sample. Notably, the cut-side density quantity can be calculated across the LD region, as it is independent of specific nucleotides being present in the ATAC-seq sequences.
To identify caQTLs, we developed two computational approaches. First, we scanned the local ATAC-seq intensities for Pearson correlation with the MM lead variant for the LD block. Second, as a complementary approach inspired by methods previously developed by our lab79,80,81,82,83, we developed asegmentation tool (“caQTLseg”) to partition a region of LD into subregions with either lead variant-dependent or allele-independent local ATAC-seq signal intensity (link to a software in Code Availability section). In short, caQTLseg, which was inspired by signal reconstruction tools previously developed by us79,80,81, takes as input the local ATAC-seq intensities dij for window i = 1, …, I and sample j = 1, …, J. In an outer loop, we use dynamic programming to find a partitioning of the LD region that minimizes an inner cost function. At each step in the loop, the dynamic programming algorithm suggests a candidate partitioning of the region consisting of a number of segments, whose breakpoints are shared across samples (as inherited variants can be assumed to have comparable effects across individuals). Given a candidate partitioning that consists of a number of segments s = 1, …, S, caQTLseg then finds the values fij(s) that minimize the sum of L2 residuals λ(s)×||dij(s)-fij(s)||2 across all i in the segment, averaged across all samples. Thus, caQTLseg seeks the partitioning and fij(s) values that minimize
where i0(s) and i1(s) are the indices of the first and last windows of each segment. In every second segment, caQTLseg alternatingly fits allele-independent and allele-independent fij(s) values. In allele-independent segments, the optimal fij(s) values are the average of the dij(s) across all j (i.e., the same optimal values are fit to all samples). In this case, caQTL also sets λ(s) = 1. In allele-dependent segments, caQTLseg fits a linear model fij(s) = ai(s) + gj × bi(s), where ai(s) and bi(s) are shared across all j, and gj is the variant genotype of sample j. In this case, λ(s) is set to a prespecified parameter λ1 > 1 that serves to calibrate the cost of an allele-dependent model against the cost of an allele-independent model. Since an allele-dependent model is more flexible than an allele-independent model, it will always produce a lower L2 residual, and λ1 must therefore be greater than 1 in order to prevent the dynamic programming algorithm from always choosing the allele-dependent model. Increasing λ1 makes it more difficult to call a segment allele-dependent, yielding a more conservative segmentation. Following the computation of the segment-specific cost, the total cost for the partitioning is calculated as the sum of segment-specific costs plus an additional regularization penalty calculated as the number of segments multiplied by a prespecified parameter λ2 > 0 that determines the degree of over-segmentation versus under-segmentation. Increasing λ2 produces a solution with fewer segments.
To estimate the noise level, we defined a statistic π0, defined as min(n0/n1, 1), where n0 is the average number of base pairs in the region-of-interest that are called allele-dependent under the null (i.e., when genotypes are randomly permuted between samples) and n1 is the number of base pairs in the region-of-interest that are called allele-dependent with correctly assigned, unpermuted genotypes. We calculated n0 using 500 random genotype permutations. The π0 statistic serves to estimate the proportion of signal that can be attributed to noise (similar to, say, the false discovery rate), and can be used to titrate λ1 and λ2. Clearly, π0 approaches 0 when λ1 and λ2 increase (more conservative segmentation) and 1 when λ1 and λ2 approach 1 and 0, respectively (less conservative segmentation).
To identify caQTLs, we used a two-stage approach, with a discovery set of 56 samples and a follow-up set of 105 samples. In the caQTLseg analysis, we used λ1 = 1.075 and λ2 = 10−1.5. With these parameters, we detected allele-dependent regions conservatively in the combined data set of 161 samples (π0 = 0.03 for SMARCD3 rs78740585; π0 = 0.0022 for CDCA7L rs4487645; π0 = 0.0022 for CEP120 rs6595443). To further assess the robustness of the results, we also repeated the analysis with a broad range of parameter choices (λ1 from 1.025 to 1.20; λ2 from 10−1 to 10−5). Throughout, we identified essentially the same regions as allele-dependent, though the estimated noise level (π0) and the degree of fragmentation varied as expected (Supplementary Table 11 and Supplementary Figs. 12, 13).
Statistics and reproducibility
The experiments in Fig. 5c were done three times, Fig. 5e once, Fig. 6b twice, Fig. 6d once, and Fig. 6f twice. The gels shown are representative. Agreeing results were seen in the replicates.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The raw sequencing data for the MPRA experiment have been deposited in the Sequence Read Archive, accession no. PRJNA679966 and are publicly available. The ATAC-sequencing data for primary CD138+ MM plasma cells have been deposited in the European Genome-phenome Archive (EGA), accession no. EGAS00001005394 and EGAD00001007814 and are available to other researchers with controlled access. The PCHi-C data for KMS11 is available through EGA; accession number EGAS00001002614 and EGAD00001003597 and are available to other researchers with controlled access, as are the meQTL data (accession number EGAS00001005788 and EGAD00010002259). The following previously published data sets were used: Gene expression data for MM samples from the CoMMPASS study, available in dbGaP, accession number phs000748.v7.p4 (available to senior investigators through authorized access after application in dbGaP)[https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000748.v7.p4]; publicly available blood eQTL data from the eQTLGen Consortium[http://www.eqtlgen.org]; and publicly available gene expression data from the NCBI Gene Expression Omnibus (GEO) repository, accession numbers GSE111199, GSE24759, GSE15695, GSE4581, GSE19784, GSE26760, and GSE5900. Source data are provided with this paper.
Code availability
The source code (C++) for caQTLseg is available at GitHub[https://github.com/abhisheknrl/caQTLseg]84.
Change history
13 December 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41467-022-35411-1
References
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
Pertesi, M. et al. Genetic predisposition for multiple myeloma. Leukemia https://doi.org/10.1038/s41375-019-0703-6 (2020).
Broderick, P. et al. Common variation at 3p22.1 and 7p15.3 influences multiple myeloma risk. Nat. Genet. 44, 58–61 (2012).
Chubb, D. et al. Common variation at 3q26.2, 6p21.33, 17p11.2 and 22q13.1 influences multiple myeloma risk. Nat. Genet. 45, 1221–1225 (2013).
Swaminathan, B. et al. Variants in ELL2 influencing immunoglobulin levels associate with multiple myeloma. Nat. Commun. 6, 7213 (2015).
Mitchell, J. S. et al. Genome-wide association study identifies multiple susceptibility loci for multiple myeloma. Nat. Commun. 7, 12050 (2016).
Halvarsson, B. M. et al. Direct evidence for a polygenic etiology in familial multiple myeloma. Blood Adv. 1, 619–623 (2017).
Went, M. et al. Identification of multiple risk loci and regulatory mechanisms influencing susceptibility to multiple myeloma. Nat. Commun. 9, 3707 (2018).
Ali, M. et al. The multiple myeloma risk allele at 5q15 lowers ELL2 expression and increases ribosomal gene expression. Nat. Commun. 9, 1649 (2018).
Li, N. et al. Multiple myeloma risk variant at 7p15.3 creates an IRF4-binding site and interferes with CDCA7L expression. Nat. Commun. 7, 13656 (2016).
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Chen, X. F. et al. Multiomics dissection of molecular regulatory mechanisms underlying autoimmune-associated noncoding SNPs. JCI Insight https://doi.org/10.1172/jci.insight.136477 (2020).
Choi, J. et al. Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma. Nat. Commun. 11, 2718 (2020).
Mulvey, B., Lagunas, T., Jr. & Dougherty, J. D. Massively parallel reporter assays: defining functional psychiatric genetic variants across biological contexts. Biol Psychiatry https://doi.org/10.1016/j.biopsych.2020.06.011 (2020).
Castaldi, P. J. et al. Identification of functional variants in the FAM13A chronic obstructive pulmonary disease genome-wide association study locus by massively parallel reporter assays. Am. J. Respir. Crit. Care Med 199, 52–61 (2019).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Jonsson, S. et al. Identification of sequence variants influencing immunoglobulin levels. Nat. Genet. 49, 1182–1191 (2017).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
Boyd, K. D. et al. Mapping of chromosome 1p deletions in myeloma identifies FAM46C at 1p12 and CDKN2C at 1p32.3 as being genes in regions associated with adverse survival. Clin. Cancer Res. 17, 7776–7784 (2011).
Broyl, A. et al. Gene expression profiling for molecular classification of multiple myeloma in newly diagnosed patients. Blood 116, 2543–2553 (2010).
Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
Zhan, F. et al. Gene-expression signature of benign monoclonal gammopathy evident in multiple myeloma is linked to good prognosis. Blood 109, 1692–1700 (2007).
Weinhold, N. et al. The 7p15.3 (rs4487645) association for multiple myeloma shows strong allele-specific regulation of the MYC-interacting gene CDCA7L in malignant plasma cells. Haematologica 100, e110–e113 (2015).
Samur, M. K. et al. Long intergenic non-coding RNAs have an independent impact on survival in multiple myeloma. Leukemia 32, 2626–2635 (2018).
Manojlovic, Z. et al. Comprehensive molecular profiling of 718 Multiple Myelomas reveals significant differences in mutation frequencies between African and European descent cases. PLoS Genet. 13, e1007087 (2017).
Niroula, A., Ajore, R. & Nilsson, B. MPRAscore: robust and non-parametric analysis of massively parallel reporter assays. Bioinformatics https://doi.org/10.1093/bioinformatics/btz591 (2019).
Salzer, U. et al. Mutations in TNFRSF13B encoding TACI are associated with common variable immunodeficiency in humans. Nat. Genet. 37, 820–828 (2005).
Flajollet, S., Lefebvre, B., Cudejko, C., Staels, B. & Lefebvre, P. The core component of the mammalian SWI/SNF complex SMARCD3/BAF60c is a coactivator for the nuclear retinoic acid receptor. Mol. Cell Endocrinol. 270, 23–32 (2007).
Wang, W. et al. Diversity and specialization of mammalian SWI/SNF complexes. Genes Dev. 10, 2117–2130 (1996).
Mashtalir, N. et al. Modular organization and assembly of SWI/SNF family chromatin remodeling complexes. Cell 175, 1272–1288 e1220 (2018).
Puri, P. L. & Mercola, M. BAF60 A, B, and Cs of muscle determination and renewal. Genes Dev. 26, 2673–2683 (2012).
Shaffer, A. L. et al. IRF4 addiction in multiple myeloma. Nature 454, 226–231 (2008).
DeSanto, C. et al. WAC loss-of-function mutations cause a recognisable syndrome characterised by dysmorphic features, developmental delay and hypotonia and recapitulate 10p11.23 microdeletion syndrome. J. Med Genet. 52, 754–761 (2015).
Vanegas, S., Ramirez-Montano, D., Candelo, E., Shinawi, M. & Pachajoa, H. DeSanto-Shinawi syndrome: first case in South America. Mol. Syndromol. 9, 154–158 (2018).
Vazquez-Arreguin, K. & Tantin, D. The Oct1 transcription factor and epithelial malignancies: old protein learns new tricks. Biochim. Biophys. Acta 1859, 792–804 (2016).
Park, K. S. et al. Transcription elongation factor ELL2 drives Ig secretory-specific mRNA production and the unfolded protein response. J. Immunol. https://doi.org/10.4049/jimmunol.1401608 (2014).
Martincic, K., Alkan, S. A., Cheatle, A., Borghesi, L. & Milcarek, C. Transcription elongation factor ELL2 directs immunoglobulin secretion in plasma cells by stimulating altered RNA processing. Nat. Immunol. 10, 1102–1109 (2009).
Benson, M. J. et al. Heterogeneous nuclear ribonucleoprotein L-like (hnRNPLL) and elongation factor, RNA polymerase II, 2 (ELL2) are regulators ofmRNA processing in plasma cells. Proc. Natl Acad. Sci. USA 109, 16252–16257 (2012).
Milcarek, C., Albring, M., Langer, C. & Park, K. S. The eleven-nineteen lysine-rich leukemia gene (ELL2) influences the histone H3 protein modifications accompanying the shift to secretory immunoglobulin heavy chain mRNA production. J. Biol. Chem. 286, 33795–33803 (2011).
Ou, X. M., Chen, K. & Shih, J. C. Monoamine oxidase A and repressor R1 are involved in apoptotic signaling pathway. Proc. Natl Acad. Sci. USA 103, 10923–10928 (2006).
Comartin, D. et al. CEP120 and SPICE1 cooperate with CPAP in centriole elongation. Curr. Biol. 23, 1360–1366 (2013).
McCarthy, N. Signalling: REX rules. Nat. Rev. Cancer 11, 83 (2011).
Srijakotre, N. et al. P-Rex1 and P-Rex2 RacGEFs and cancer. Biochemical Soc. Trans. 45, 963–977 (2017).
Ciofani, M. et al. A validated regulatory network for Th17 cell specification. Cell 151, 289–303 (2012).
Kurachi, M. et al. The transcription factor BATF operates as an essential differentiation checkpoint in early effector CD8+ T cells. Nat. Immunol. 15, 373–383 (2014).
Karwacz, K. et al. Critical role of IRF1 and BATF in forming chromatin landscape during type 1 regulatory cell differentiation. Nat. Immunol. 18, 412–421 (2017).
Shaffer, A. L., Emre, N. C., Romesser, P. B. & Staudt, L. M. IRF4: immunity. malignancy! therapy? Clin. Cancer Res. 15, 2954–2961 (2009).
Christophersen, M. K. et al. SMIM1 variants rs1175550 and rs143702418 independently modulate Vel blood group antigen expression. Sci. Rep. 7, 40451 (2017).
Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).
Weinhold, N. et al. The CCND1 c.870 G > A polymorphism is a risk factor for t(11;14)(q13;q32) multiple myeloma. Nat. Genet. 45, 522–525 (2013).
Johnson, D. C. et al. Genome-wide association study identifies variation at 6q25.1 associated with survival in multiple myeloma. Nat. Commun. 7, 10290 (2016).
Ali, M. et al. Sequence variation at the MTHFD1L-AKAP12 and FOPNL loci does not influence multiple myeloma survival in Sweden. Blood Cancer J. 9, 57 (2019).
Morgan, G. J. et al. First-line treatment with zoledronic acid as compared with clodronic acid in multiple myeloma (MRC Myeloma IX): a randomised controlled trial. Lancet 376, 1989–1999 (2010).
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate illumina paired-end reAd mergeR. Bioinformatics 30, 614–620 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Wakefield, J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Hum. Genet. 81, 208–227 (2007).
Jackson, G. H. et al. Lenalidomide maintenance versus observation for patients with newly diagnosed multiple myeloma (Myeloma XI): a multicentre, open-label, randomised, phase 3 trial. Lancet Oncol. 20, 57–73 (2019).
Morris, T. J. et al. ChAMP: 450k chip analysis methylation pipeline. Bioinformatics 30, 428–430 (2014).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Li, N. et al. Genetic predisposition to multiple myeloma at 5q15 is mediated by an ELL2 enhancer polymorphism. Cell Rep. 20, 2556–2564 (2017).
Orlando, G., Kinnersley, B. & Houlston, R. S. Capture Hi-C library generation and analysis to detect chromatin interactions. Curr. Protoc. Hum. Genet. https://doi.org/10.1002/cphg.63 (2018).
Cairns, J. et al. CHiCAGO: robust detection of DNA looping interactions in capture Hi-C data. Genome Biol. 17, 127 (2016).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).
Andrews, N. C. & Faller, D. V. A rapid micropreparation technique for extraction of DNA-binding proteins from limiting numbers of mammalian cells. Nucleic Acids Res. 19, 2499 (1991).
Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281–2308 (2013).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21 29 21–21 29 29 (2015).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Nilsson, B., Johansson, M., Al-Shahrour, F., Carpenter, A. E. & Ebert, B. L. Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution. Bioinformatics 25, 1078–1079 (2009).
Jarvstrat, L., Johansson, M., Gullberg, U. & Nilsson, B. Ultranet: efficient solver for the sparse inverse covariance selection problem in gene network modeling. Bioinformatics 29, 511–512 (2013).
Nilsson, B., Johansson, M., Heyden, A., Nelander, S. & Fioretos, T. An improved method for detecting and delineating genomic regions with altered gene expression in cancer. Genome Biol. 9, R13 (2008).
Nilsson, B., Hakansson, P., Johansson, M., Nelander, S. & Fioretos, T. Threshold-free high-power methods for the ontological analysis of genome-wide gene-expression studies. Genome Biol. 8, R74 (2007).
Taslaman, L. & Nilsson, B. A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data. PLoS ONE 7, e46331 (2012).
Niroula, A. & Nilsson, B. Source code for caQTLseg. GitHub https://doi.org/10.5281/zenodo.5239301 (2021).
Acknowledgements
This work was supported by grants from the Knut and Alice Wallenberg Foundation (2012.0193 and 2017.0436), the Swedish Research Council (2017-02023 and 2018-00424), the Swedish Cancer Society (2017/265), the Nordic Cancer Union (R217-A13329-18-S65), Arne and Inga-Britt Lundberg’s Stiftelse (2017-0055), European Research Council (EU-MSCA-COFUND grant no. 754299 and 847583) Myeloma UK and Cancer Research UK (C1298/A8362), The National Institute of Health (R01 DK103794 and R01HL146500), the New York Stem Cell Foundation, a gift from the Lodish Family to Boston Children’s Hospital, and Mr. Ralph Stockwell. We thank Ellinor Johnsson for her assistance between 2011 and 2020. We are indebted to the patients who participated in the study.
Funding
Open access funding provided by Lund University.
Author information
Authors and Affiliations
Contributions
R.A., A.N., M.P., and B.N. designed the project. C.C., M.T., E.L.B., L.D.-L., A.L.d.L.P., I.J., T.R., U.T., V.S., and R.H. contributed to design. R.A., M.P., C.C., L.D.L., O.M., G.N., and K.G. performed experiments. S.K., A.S., M.K., K.A., N.M., N.W., K.H., H.G., A.F., I.J., T.R., F.v.R., A.W., U.T., V.G.S., K.S., and R.H. contributed data or samples. R.A., A.N., M.P., C.C., M.T., M.W., E.L.B., L.D.-L., A.L.d.L.P., T.O., N.U.-D., M.S., C.A.L., G.H.H., G.T., N.W., and B.N. carried out statistical analyses or analyzed the data. R.A., A.N., M.P., C.C., M.T., M.W., L.D.-L., A.L.d.L.P., R.H., and B.N. wrote the manuscript. All authors contributed to the final manuscript.
Corresponding author
Ethics declarations
Competing interests
Authors T.O., O.M., G.H.H., G.T., G.L.N., K.G., I.J., T.R., U.T., and K.S. are employed by deCODE Genetics/Amgen Inc. The remaining authors declare no competing interests.
Peer review information
Nature Communications thanks Yue Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ajore, R., Niroula, A., Pertesi, M. et al. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat Commun 13, 151 (2022). https://doi.org/10.1038/s41467-021-27666-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-021-27666-x
This article is cited by
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.