Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
Multiple myeloma is an incurable malignancy of mature B-lymphoid cells, and its pathogenesis is only partially understood. About 40% of cases harbour chromosome translocations resulting in overexpression of genes (including CCND1, CCND3, MAF, MAFB, WHSC1 (also called MMSET) and FGFR3) via their juxtaposition to the immunoglobulin heavy chain (IgH) locus1. Other cases exhibit hyperdiploidy. However, these abnormalities are probably insufficient for malignant transformation because they are also observed in the pre-malignant syndrome known as monoclonal gammopathy of uncertain significance. Malignant progression events include activation of MYC, FGFR3, KRAS and NRAS and activation of the NF-κB pathway1,2,3. More recently, loss-of-function mutations in the histone demethylase UTX (also called KDM6A) have also been reported4.
A powerful way to understand the molecular basis of cancer is to sequence either the entire genome or the protein-coding exome, comparing tumour to normal from the same patient to identify the acquired somatic mutations. Recent reports have described the sequencing of whole genomes from a single patient5,6,7,8,9. Although such single-patient studies are informative, we hypothesized that a larger number of cases would permit the identification of biologically relevant patterns that would not otherwise be evident.
Landscape of multiple myeloma mutations
We studied 38 multiple myeloma patients (Supplementary Table 1), performing whole-genome sequencing (WGS) for 23 patients and whole-exome sequencing (WES; assessing 164,687 exons) for 16 patients, with one patient analysed by both approaches (Supplementary Information). WES is a cost-effective strategy to identify protein-coding mutations, but cannot detect non-coding mutations and rearrangements. We identified tumour-specific mutations by comparing each tumour to its corresponding normal, using a series of algorithms designed to detect point mutations, small insertions/deletions (indels) and other rearrangements (Supplementary Fig. 1). On the basis of WGS, the frequency of tumour-specific point mutations was 2.9 per million bases, corresponding to approximately 7,450 point mutations per sample across the genome, including an average of 35 amino-acid-changing point mutations plus 21 chromosomal rearrangements disrupting protein-coding regions (Supplementary Tables 2 and 3). The mutation-calling algorithm was found to be highly accurate, with a true positive rate of 95% for point mutations (Supplementary Text, Supplementary Tables 4 and 5, and Supplementary Fig. 2).
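The two WGS figures quoted above (rate per million bases and mutations per sample) can be cross-checked with simple arithmetic. The sketch below assumes the rate applies uniformly across the assessed bases; the implied number of assessed bases is derived here and is not a figure stated in the text.

```python
# Back-of-the-envelope check of the WGS mutation burden quoted above.
RATE_PER_MB = 2.9             # tumour-specific point mutations per million bases (from the text)
MUTATIONS_PER_SAMPLE = 7450   # approximate genome-wide count per sample (from the text)


def implied_covered_bases(mutations: float, rate_per_mb: float) -> float:
    """Number of assessed bases needed for the two quoted figures to agree."""
    return mutations / rate_per_mb * 1e6


def expected_mutations(covered_bases: float, rate_per_mb: float) -> float:
    """Expected mutation count for a given number of assessed bases."""
    return covered_bases * rate_per_mb / 1e6


covered = implied_covered_bases(MUTATIONS_PER_SAMPLE, RATE_PER_MB)
print(f"implied assessed bases: {covered / 1e9:.2f} Gb")
```

The implied figure is smaller than the full human genome length, which is what one would expect if only confidently covered bases enter the denominator; that interpretation is an assumption.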
The mutation rate across the genome varied greatly depending on base composition, with mutations at CpG dinucleotides occurring fourfold more commonly than mutations at A or T bases (Supplementary Fig. 3a). In addition, even after correction for base composition, the mutation frequency in coding regions was lower than that observed in intronic and intergenic regions (P < 1 × 10⁻¹⁶; Supplementary Fig. 3b), potentially owing to negative selective pressure against mutations disrupting coding sequences. There is also a lower mutation rate in intronic regions compared to intergenic regions (P < 1 × 10⁻¹⁶), which may reflect transcription-coupled repair, as previously suggested10,11. Consistent with this explanation, we observed a lower mutation rate in introns of genes expressed in multiple myeloma compared to those not expressed (Fig. 1a).
Frequently mutated genes
We next focused on the distribution of somatic, non-silent protein-coding mutations. We estimated statistical significance by comparison to the background distribution of mutations (Supplementary Information). Ten genes showed statistically significant rates of protein-altering mutations (‘significantly mutated genes’) at a false discovery rate (FDR) of ≤0.10 (Table 1). To investigate their functional importance, we compared their predicted consequence (on the basis of evolutionary conservation and nature of the amino acid change) to the distribution of all coding mutations. This analysis showed a dramatic skewing of functional importance (FI) scores12 for the ten significantly mutated genes (P = 7.6 × 10⁻¹⁴; Fig. 1b), supporting their biological relevance. Even after RAS and p53 mutations were excluded from the analysis, the skewing remained significant (P < 0.01).
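The FDR threshold above is expressed through q-values. The text does not say which procedure produced them, so the following is only a minimal sketch of one common choice, the Benjamini-Hochberg adjustment; the function names are hypothetical and the procedure is an assumption, not the authors' documented method.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_indices(p_values: List[float], fdr: float = 0.10) -> List[int]:
    """Indices of tests whose q-value is at or below the chosen FDR."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= fdr]
```

In a per-gene analysis, `p_values` would hold one p-value per tested gene, and `significant_indices(p_values, fdr=0.10)` would return the genes passing the FDR ≤ 0.10 cut-off used in the text.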
We also examined the non-synonymous/synonymous (NS/S) mutation rate for the significantly mutated genes. The expected NS/S ratio was 2.82 ± 0.15, whereas the observed ratio was 39:0 for the significant genes (P < 0.0001), further strengthening the case that these genes are probably drivers of the pathogenesis of multiple myeloma, and that their mutations are unlikely to simply be passengers.
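To see why a 39:0 split is so unlikely under the expected NS/S ratio, a simple binomial calculation is enough. This is a sketch under the assumption that each mutation is independently non-synonymous with probability NS/(NS + S); the text does not state which test the authors used, and the uncertainty on the expected ratio (± 0.15) is ignored here.

```python
EXPECTED_NS_TO_S = 2.82   # expected NS/S ratio (from the text)
OBSERVED_NS = 39          # observed non-synonymous mutations (from the text)
OBSERVED_S = 0            # observed synonymous mutations (from the text)


def prob_all_nonsynonymous(ns_to_s: float, n_mutations: int) -> float:
    """Probability that every one of n independent mutations is non-synonymous."""
    p_ns = ns_to_s / (1.0 + ns_to_s)   # convert the ratio to a per-mutation probability
    return p_ns ** n_mutations


p = prob_all_nonsynonymous(EXPECTED_NS_TO_S, OBSERVED_NS + OBSERVED_S)
print(f"P(all {OBSERVED_NS} mutations non-synonymous under neutrality) = {p:.1e}")
```

Under these assumptions the probability falls below the reported threshold of 0.0001.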
The significantly mutated genes include three previously reported to have point mutations in multiple myeloma: KRAS and NRAS (10 and 9 cases, respectively (50%), P < 1 × 10⁻¹¹, q < 1 × 10⁻⁶), and TP53 (3 cases (8%), P = 5.1 × 10⁻⁶, q = 0.019). Interestingly, we identified two point mutations (5%, P = 0.000027, q = 0.086) in CCND1 (cyclin D1), which has long been recognized as a target of chromosomal translocation in multiple myeloma, but for which point mutations have not been observed previously in cancer.
The remaining six genes have not previously been known to be involved in cancer, and indicate new aspects of the pathogenesis of multiple myeloma.
RNA processing and protein homeostasis mutations
A striking finding of this study was the discovery of frequent mutations in genes involved in RNA processing, protein translation and the unfolded protein response. Such mutations were observed in nearly half of the patients.
The DIS3 (also called RRP44) gene harboured mutations in 4 out of 38 patients (11%, P = 2.4 × 10⁻⁶, q = 0.011). DIS3 encodes a highly conserved RNA exonuclease which serves as the catalytic component of the exosome complex involved in regulating the processing and abundance of all RNA species13,14. The four observed mutations occur at highly conserved regions (Fig. 2a) and cluster within the RNB domain facing the enzyme’s catalytic pocket (Fig. 2b). Two lines of evidence indicate that the DIS3 mutations result in loss of function. First, three of the four tumours with mutations exhibited loss of heterozygosity via deletion of the remaining DIS3 allele. Second, two of the mutations have been functionally characterized in yeast and bacteria, where they result in loss of enzymatic activity leading to the accumulation of their RNA targets15,16. Given that a key role of the exosome is the regulation of the pool of mRNAs available for translation17, these results indicate that DIS3 mutations may dysregulate protein translation as an oncogenic mechanism in multiple myeloma.
Further support for a role of translational control in the pathogenesis of multiple myeloma comes from the observation of mutations in the FAM46C gene in 5 out of 38 (13%) patients (P = 1.8 × 10⁻¹⁰, q = 1 × 10⁻⁶). There is no published functional annotation of FAM46C, and its sequence lacks obvious homology to known proteins. To gain insight into its cellular role, we examined its pattern of gene expression across 414 multiple myeloma samples and compared it to the expression of 395 gene sets curated in the Molecular Signatures Database (MSigDB), using the GSEA algorithm18,19,20. The expression of FAM46C was highly correlated (q = 0.034 after multiple hypothesis correction; Fig. 2c) to the expression of the set of ribosomal proteins that are known to be tightly co-regulated21. Strong correlation with eukaryotic initiation and elongation factors involved in protein translation was similarly observed. Although the precise function of FAM46C remains unknown, this striking correlation provides strong evidence that FAM46C is functionally related in some way to the regulation of translation. Consistent with this observation, FAM46C was recently shown to function as an mRNA stability factor (M. Fleming, manuscript submitted).
Notably, although not statistically significant on their own, we found mutations in five other genes related to protein translation, stability and the unfolded protein response (Supplementary Table 6), further supporting a role of translational control in multiple myeloma. Of particular interest, two patients had mutations in the unfolded protein response gene XBP1. Overexpression of a particular splice form of XBP1 has been shown to cause a multiple-myeloma-like syndrome in mice, although no role of XBP1 in the pathogenesis of human multiple myeloma has been described22.
Of related interest, mutations of the LRRK2 gene were observed in 3 out of 38 patients (8%; Supplementary Table 6). LRRK2 encodes a serine-threonine kinase that phosphorylates translation initiation factor 4E-binding protein (4EBP). LRRK2 is best known for its role in the predisposition to Parkinson’s disease23,24. Parkinson’s disease and other neurodegenerative diseases such as Huntington’s disease are characterized in part by aberrant unfolded protein responses25. Protein homeostasis may be particularly important in multiple myeloma because of the enormous rate of production of immunoglobulins by multiple myeloma cells26,27,28. The finding is also of clinical significance because of the success of the drug bortezomib (Velcade), which inhibits the proteasome and which shows remarkable activity in multiple myeloma compared to other tumour types29.
Together, these results indicate that mutations affecting protein translation and homeostasis are extremely common in multiple myeloma (at least 16 out of 38 patients; 42%), thereby indicating that additional therapeutic approaches that target these mechanisms may be worth exploring.
Identical mutations suggest gain-of-function oncogenes
Another way to recognize biologically significant mutations is to search for recurrence of identical mutations indicative of gain-of-function alterations in oncogenes. Two patients had an identical mutation (K123R) in the DNA-binding domain of the interferon regulatory factor IRF4. Interestingly, a recent RNA interference screen in multiple myeloma showed that IRF4 was required for multiple myeloma survival, consistent with its role as a putative oncogene30. Genotyping for this mutation in 161 additional multiple myeloma samples identified two more patients with this mutation. IRF4 is a transcriptional regulator of PRDM1 (also called BLIMP1), and two of 38 sequenced patients also exhibited PRDM1 mutations. PRDM1 is a transcription factor involved in plasma cell differentiation, loss-of-function mutations of which occur in diffuse large B-cell lymphoma31,32,33,34,35.
Clinically actionable mutations in BRAF
Some mutations deserve attention because of their clinical relevance. One of the thirty-eight patients harboured a BRAF kinase mutation (G469A). Although BRAF G469A has not previously been observed in multiple myeloma, this precise mutation is known to be activating and oncogenic36. We genotyped an additional 161 multiple myeloma patients for the 12 most common BRAF mutations and found mutations in 7 patients (4%). Three of these were K601N and four were V600E (the most common BRAF mutation in melanoma37). Our finding of common BRAF mutations in multiple myeloma has important clinical implications because such patients may benefit from treatment with BRAF inhibitors, some of which show marked clinical activity38. Our results also support the observation that inhibitors acting downstream of BRAF (for example, on MEK) may have activity in multiple myeloma39.
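Because the 4% estimate rests on 7 mutated cases among 161 genotyped patients, it carries appreciable sampling uncertainty. The sketch below computes a Wilson score interval for that proportion; the choice of interval and the 95% level are assumptions made here for illustration and are not part of the reported analysis.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 7 BRAF-mutated patients among 161 genotyped (both figures from the text).
low, high = wilson_interval(7, 161)
print(f"BRAF mutation frequency: {7 / 161:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```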
Gene set mutations: NF-κB pathway
Another approach to identify biologically relevant mutations in multiple myeloma is to look not at the frequency of mutation of individual genes, but rather of sets of genes.
We first considered gene sets based on existing insights into the biology of multiple myeloma. For example, activation of the NF-κB pathway is known in multiple myeloma, but the basis of such activation is only partially understood2,3. We observed 10 point mutations (P = 0.016) and 4 structural rearrangements, affecting 11 NF-κB pathway genes (Supplementary Table 7): BTRC, CARD11, CYLD, IKBIP, IKBKB, MAP3K1, MAP3K14, RIPK4, TLR4, TNFRSF1A and TRAF3. Taken together, our findings greatly expand the mechanisms by which NF-κB may be activated in multiple myeloma.
Gene set mutations: histone modifying enzymes
We next looked for enrichment in mutations in histone-modifying enzymes. This hypothesis arose because of our observation that the homeotic transcription factor HOXA9 was highly expressed in a subset of multiple myeloma patients, particularly those lacking known IgH translocations (Supplementary Fig. 4a). HOXA9 expression is regulated primarily by histone methyltransferases (HMT) including members of the MLL family. Sensitive polymerase chain reaction with reverse transcription (RT–PCR) analysis showed that HOXA9 was in fact ubiquitously expressed in multiple myeloma, with most cases exhibiting biallelic expression consistent with dysregulation via an upstream HMT event (Supplementary Fig. 4b, c). Accordingly, we looked for mutations in genes known to regulate HOXA9 directly. We found significant enrichment (P = 0.0024), with mutations in MLL, MLL2, MLL3, UTX, WHSC1 and WHSC1L1.
HOXA9 is normally silenced by histone 3 lysine 27 trimethylation (H3K27me3) chromatin marks when cells differentiate beyond the haematopoietic stem-cell stage40,41. This repressive mark was weak or absent at the HOXA9 locus in most multiple myeloma cell lines (Fig. 3a). Moreover, there was inverse correlation between H3K27me3 levels and HOXA9 expression (Fig. 3b), consistent with HMT dysfunction contributing to aberrant HOXA9 expression.
To establish the functional significance of HOXA9 expression in multiple myeloma cells, we knocked down its expression with seven shRNAs (Supplementary Fig. 5). In 11 out of 12 multiple myeloma cell lines, HOXA9-depleted cells exhibited a competitive disadvantage (Fig. 3c and Supplementary Fig. 6).
These experiments indicate that aberrant HOXA9 expression, caused at least in part by HMT-related genomic events, has a role in multiple myeloma and may represent a new therapeutic target. Further supporting a role of HOXA9 as a multiple myeloma oncogene, array-based comparative genomic hybridization identified focal amplifications of the HOXA locus in 5% of patients (Supplementary Fig. 7).
Discovering new gene set mutations
We next asked whether it would be possible to discover pathways enriched for mutations in the absence of previous knowledge. Accordingly, we examined 616 gene sets in the MSigDB Canonical Pathways database. One top-ranking gene set was of particular interest because it did not relate to genes known to be important in multiple myeloma. This gene set encodes proteins involved in the formation of the fibrin clot in the blood coagulation cascade. There were 6 mutations, in 5 of 38 patients (16%, q = 0.0054), encoding 5 proteins (Supplementary Table 8). RT–PCR analysis confirmed expression of 4 of the 5 coagulation factors in multiple myeloma cell lines (Supplementary Fig. 8). The coagulation cascade involves a number of extracellular proteases and their substrates and regulators, but their role in multiple myeloma has not been suspected. However, thrombin and fibrin have been shown to serve as mitogens in other cell types42, and have been implicated in metastasis43. These observations suggest that coagulation factor mutations should be explored more fully in human cancers.
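The text does not spell out how each gene set was scored. One generic way to ask whether a set carries more mutations than expected is a length-weighted permutation test, sketched below. Everything here is a simplifying assumption: mutations are redistributed across genes in proportion to gene length only, ignoring base composition and expression, and the function and parameter names are hypothetical.

```python
import random
from typing import Dict, Iterable


def gene_set_permutation_p(
    gene_lengths: Dict[str, int],
    gene_set: Iterable[str],
    observed_in_set: int,
    total_mutations: int,
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Empirical P(mutations landing in gene_set >= observed) under random placement."""
    rng = random.Random(seed)
    genes = list(gene_lengths)
    weights = [gene_lengths[g] for g in genes]
    members = set(gene_set)
    hits = 0
    for _ in range(n_permutations):
        # Place every mutation on a gene with probability proportional to its length.
        placed = rng.choices(genes, weights=weights, k=total_mutations)
        in_set = sum(1 for g in placed if g in members)
        if in_set >= observed_in_set:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Running such a test over many gene sets (616 in the analysis above) would yield one p-value per set, which would then need multiple-hypothesis correction before being reported as q-values.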
Mutations in non-coding regions
Analyses of non-coding portions of the genome have not previously been reported in cancer. We focused on non-coding regions with highest regulatory potential. We defined 2.4 × 10⁶ regulatory potential regions (Supplementary Fig. 9), averaging 280 base pairs (bp). We then treated these regions as if they were protein-coding genes, subjecting them to the same permutation analysis used for exonic regions.
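The scale of this search space helps explain why recurrent hits in a single region stand out. The sketch below combines figures quoted in the text with a uniform-rate Poisson model; that model is an assumption made here for illustration (it ignores base composition and somatic hypermutation) and is not the permutation analysis the authors describe.

```python
import math

N_REGIONS = 2.4e6        # regulatory potential regions (from the text)
MEAN_REGION_BP = 280     # average region length in bp (from the text)
RATE_PER_MB = 2.9        # WGS point-mutation rate per million bases (from the text)
N_WGS_PATIENTS = 23      # patients with whole-genome data (from the text)

# Total sequence covered by the regions, and the expected number of mutations
# in one average-sized region summed over all WGS patients.
total_bp = N_REGIONS * MEAN_REGION_BP
expected_per_region = MEAN_REGION_BP * RATE_PER_MB / 1e6 * N_WGS_PATIENTS


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


print(f"total regulatory sequence: {total_bp / 1e6:.0f} Mb")
print(f"expected mutations per average region across the cohort: {expected_per_region:.4f}")
```

Calling `poisson_tail(k, expected_per_region)` gives the chance of seeing at least `k` mutations in an average-sized region under this simplified model, before any correction for testing millions of regions.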
We identified multiple non-coding regions with high frequencies of mutation which fell into two classes (Table 2 and Supplementary Table 9). The first corresponds to regions of known somatic hypermutation. These have a 1,000-fold higher than expected mutation frequency, as expected for post-germinal centre B cells (Supplementary Table 9). These regions comprise immunoglobulin-coding genes and the 5′ UTR of the lymphoid oncogene, BCL6, as reported44. Interestingly, we also found previously unrecognized mutations in the intergenic region flanking BCL6 in five patients, indicating that somatic hypermutation probably occurs in regions beyond the 5′ UTR and first intron of BCL6 (Table 2). Whether such non-coding BCL6 mutations contribute to multiple myeloma pathogenesis remains to be established.
The second class consisted of 18 non-coding regions with mutation frequencies beyond that expected by chance (q < 0.25) (Table 2 and Supplementary Table 10). Four of the 18 regions flanked genes that also harboured coding mutations. Interestingly, we observed 7 mutations in 5 of 23 patients (22%) within non-coding regions of BCL7A, a putative tumour suppressor gene discovered in the B-cell malignancy Burkitt lymphoma45, and which is also deleted or hypermethylated in cutaneous T-cell lymphomas46,47. The function of BCL7A is unknown, and the effect of its non-coding mutations in multiple myeloma remains to be established.
Our preliminary analysis of non-coding mutations indicates that non-exonic portions of the genome may represent a previously untapped source of insight into the pathogenesis of cancer.
The analysis of multiple myeloma genomes reveals that mechanisms previously suspected to have a role in the biology of multiple myeloma (for example, NF-κB activation and HMT dysfunction) may have broad roles by virtue of mutations in multiple members of these pathways. In addition, potentially new mechanisms of transformation are suggested, including mutations in the RNA exonuclease DIS3 and other genes involved in protein translation and homeostasis. Whether these mutations are unique to multiple myeloma or are common to other cancers remains to be determined. Furthermore, frequent mutations in the oncogenic kinase BRAF were observed—a finding that has immediate clinical translational implications.
Importantly, most of these discoveries could not have been made by sequencing only a single multiple myeloma genome—the complex patterns of pathway dysregulation required the analysis of multiple genomes. Whole-exome sequencing revealed the substantial majority of the significantly mutated genes. However, we note that half of total protein-coding mutations occurred via chromosomal aberrations such as translocations, most of which would not have been discovered by sequencing of the exome alone. Similarly, the recurrent point mutations in non-coding regions would have been missed with sequencing directed only at coding exons.
The analysis described here is preliminary. Additional multiple myeloma genomes will be required to establish the definitive genomic landscape of the disease and determine accurate estimates of mutation frequency in the disease. The sequence data described here will be available from the dbGaP repository (http://www.ncbi.nlm.nih.gov/gap) and we have created a multiple myeloma Genomics Portal (http://www.broadinstitute.org/mmgp) to support data analysis and visualization.
Informed consent from multiple myeloma patients was obtained in line with the Declaration of Helsinki. DNA was extracted from bone marrow aspirate (tumour) and blood (normal). WGS libraries (370–410-bp inserts) and WES libraries (200–350-bp inserts) were constructed and sequenced on an Illumina GA-II sequencer using 101- and 76-bp paired-end reads, respectively. Sequencing reads were processed with the Firehose pipeline, identifying somatic point mutations, indels and other structural chromosomal rearrangements. Structural rearrangements affecting protein-coding regions were then subjected to manual review to exclude alignment artefacts. True positive mutation rates were estimated by Sequenom mass spectrometry genotyping of randomly selected mutations. HOXA9 short hairpin (sh)RNAs were introduced into multiple myeloma cell lines by lentiviral infection using standard methods.
A complete description of the materials and methods is provided in the Supplementary Information.
Sequence data have been deposited in the dbGaP repository (http://www.ncbi.nlm.nih.gov/gap) under accession number phs000348.v1.p1. Additional data have been submitted to the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE26760 (GEP data) and GSE26849 (aCGH data); both data sets are also combined under accession code GSE26863. We have also created a multiple myeloma Genomics Portal (http://www.broadinstitute.org/mmgp) to support data analysis and visualization.
This project was funded by a grant from the Multiple Myeloma Research Foundation. M.A.C. was supported by a Clinician Scientist Fellowship from Leukaemia and Lymphoma Research (UK). We are grateful to all members of the Broad Institute’s Biological Samples Platform, Genetic Analysis Platform, and Genome Sequencing Platform, without whom this work would not have been possible.
The file contains Supplementary Figures 1-14 with legends, Supplementary Methods and Supplementary Tables 1-16.
Nature Communications (2018)