Characterization of rare germline variants in familial multiple myeloma

Introduction: The risk of developing Multiple Myeloma (MM) is 2-4 fold higher in first-degree relatives of patients with MM compared to the general population, suggesting genetic predisposition to this cancer. Indeed, recent genome-wide association studies have identified common risk alleles that predispose for MM. Yet, the impact of these variants on MM risk is too low to explain familial aggregation of MM. High-impact alleles have been identified for other cancers such as ovarian and breast cancer (BRCA1,-2) and melanoma (CDKN2A) but the search for such alleles in MM is still in its infancy. In order to identify high-impact alleles in MM we have performed whole genome/exon sequencing (WGS/WES) in members of MM high risk families.
 Methods: We included 21 families with multiple cases of MM/MGUS. Whole genome/exome sequencing was performed on a total of 46 affected and 20 unaffected family members. Filtering and prioritization of the variants were performed in accordance with the criteria of our in-house familial cancer variant prioritization pipeline version 2 (FCVPPv2). Loss-of-function variants were further screened using MutPred-LOF, Translate tool and IntOGen/c-BioPortal in order to discriminate pathogenic and neutral variants, to translate a nucleotide sequence to a protein sequence and to visualize the domain affected by the variant and the portion of the protein lost after the newly formed stop codon. Variants were analyzed for predicted effects on splicing by using Human Splicing Finder.
 Results: We found a total of 148 potentially pathogenic variants, 109 non-synonymous and 39 LOF, in 18 out of 21 MM families. Among our genes, many affect protein metabolism, immune system, and other have known links to carcinogenesis. Additionally, some of them are known to interact with key signaling pathways in MM, including PI3K/Akt/mTOR, Ras/Raf/MEK/MAPK, JAK/STAT, NF-κB, Wnt/β-catenin, and RANK/RANKL/OPG, showing congruency with previously reported literature. Interestingly, we also found different missense variants in the same two genes in two unrelated families.
 Conclusions: We have identified potentially pathogenic gene variants in 85% of MM/MGUS families. Our results can offer a useful reference to gene finding efforts by others in order to improve screening, early diagnosis and personalized therapy of individuals at risk of developing MM.
 
 
 Durie: Amgen, Celgene, Johnson & Johnson, and Takeda: Consultancy. Goldschmidt:Merck Sharp and Dohme (MSD): Research Funding; Molecular Partners: Research Funding; Incyte: Research Funding; Sanofi: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other, Research Funding; Johns Hopkins University: Other: Grants and/or provision of Investigational Medicinal Product; Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Grants and/or provision of Investigational Medicinal Product:, Research Funding; Dietmar-Hopp-Foundation: Other: Grants and/or provision of Investigational Medicinal Product:; Chugai: Honoraria, Other: Grants and/or provision of Investigational Medicinal Product:, Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Grants and/or provision of Investigational Medicinal Product:, Research Funding; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Grants and/or provision of Investigational Medicinal Product:, Research Funding; University Hospital Heidelberg, Internal Medicine V and National Center for Tumor Diseases (NCT), Heidelberg, Germany: Current Employment; GlaxoSmithKline (GSK): Honoraria; Adaptive Biotechnology: Membership on an entity's Board of Directors or advisory committees; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Grants and/or provision of Investigational Medicinal Product, Research Funding; Novartis: Honoraria, Research Funding; Mundipharma GmbH: Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding.


variants were called using multi sample processing mode of the Unified Genotyper tool from GATK. To calculate background frequencies, we used pre-existing WES data from the NHLBI GO Exome Sequencing Project, which were subjected to variant calling in the same run (3). Genotypes with quality <10 or read depth <8 were marked as missing data. Variants with >10% missing data were excluded, as were samples with >5% missing data. A total of 3,597 controls of European ancestry were selected from the ESP data using the first two principal components calculated using -mds-plot option in PLINK (4).

Variant annotation and filtering
The processed list of WES variants and raw WGS variants were analyzed together in the following downstream steps. The variants were annotated with Gencode v19 gene definitions using ANNOVAR (5) and further with dbSNP (6), 1000 Genomes phase III (7), dbNSFP v2.9 (8), and ExAC (9) read depth >10. Minor allele frequency (MAF) of 0.1% was used with respect to 1000 Genomes phase III and non-TCGA exome aggregation (ExAC, version 0.3) data to remove common variants, and variant frequency of 2% from the local data sets was used to remove technical artefacts. In order to control for family relatedness and sample swaps, a pairwise comparison of variants among the cohort was carried out.

Variant prioritization: missense variants
Variant prioritization was performed using our in-house developed pedigree-based pipeline, Familial Cancer Variant Prioritization Pipeline (FCVPP) version 2 (10). Pedigree segregation was the first criterion the variants were screened for. Family members diagnosed with MM, MGUS or AL amyloidosis were considered as cases, also members with plasma cell dyscrasia, solitary plasmacytoma and aberrant plasma cell clone were considered as variant carriers and unaffected members as non-carriers, unless they were more than 10 years younger than the earliest age of diagnosis of cases in the family. After the pedigree segregation filtering, all the variants ranking within the top 1% of potentially deleterious variants in the human genome were selected using the Combined Annotation Dependent Depletion (CADD) tool v1.3; a scaled PHRED-like CADD score greater than 20 was applied (11).
As the following step, based on the assumption that variants within genes intolerant to variation are likely to be deleterious, only variants located in genes predicted to be intolerant by at least two of the Residual Variation Intolerance Scores (RVISs) based on NHLBI-ESP6500 (12) and ExAC (9) datasets and a local dataset, were selected. Additionally, they should be located in genes intolerant for missense variants according to Z-score, developed by the ExAC consortium (9).
As next, the variants should locate at an evolutionary conserved position, which was evaluated by  (22). Variants predicted to be deleterious by at least 60% of these tools were selected for further analyses.

Loss-of-function variant analysis
Frameshift, stop-gain/loss and splice-site variants affecting the canonical splice sites were considered if pedigree segregation and CADD score criteria were met. It is well known that also healthy people carry genetic variants predicted to cause loss-of function (LoF) (23). In order to discriminate pathogenic and neutral variants, we used MutPred-LOF (http://mutpredlof.cs.indiana.edu/index.html) (24). For each variant, it returns a score between zero and one; higher scores denote variants that are more likely to be pathogenic. In our analysis a threshold score of 0.50 at 5% false positive rate was used as suggested by Pagel et. al. (24). In addition, it shows up to five structural and functional mechanisms that are impacted in the affected region of the protein, accompanied by significant prior-corrected P-values. Variants that passed the filtering were further analyzed using the Translate tool (https://web.expasy.org/translate/) to translate a nucleotide (DNA/RNA) sequence to a protein sequence and IntOGen/c-BioPortal (https://www.intogen.org/search) in order to visualize the domain affected by the variant and the portion of the protein lost after the newly formed stop codon. Splice site variants were analyzed by using Human Splicing Finder (http://www.umd.be/HSF/HSF.shtml), a tool used to predict the effects of variants on splicing signals (25).

Additional variant quality control
Using the Integrative Genomics Viewer (IGV; version 2.4.10) (26), WGS data of all cases and controls were visually checked for correctness in order to increase the confidence of variant calls and reduce the risk of false positives.

Germline Copy Number Variant (gCNV) analysis
GATK gCNV module (version 4.1.7.0) was used to call CNVs from the WGS samples individually against a background of 200 WGS samples sequenced from the sample platform. The gCNVs were called based on the best practice recommended by the GATK (https://gatk.broadinstitute.org/hc/enus/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants). The major deviation was that the gCNVs were called only on the Gencode v19 exonic regions by considering them as the target regions. This decreased the turnaround time for the analysis of gCNVs from WGS data.
The resulting CNV segments with QS score above 30 were selected and annotated with the subset of gnomAD structural variant (SV) data (version 2.1, variants with 'PASS' filter tags and 'DUP' or 'DEL' SV types) using vcfanno (27). The segments with at least 80% overlap with a common gnomAD SV (popmax MAF > 0.1%) of same SV subtype were considered as common and removed. In addition, at least 50% of the targets (exons here) in the gCNV segments should have the denoised ploidies among the bottom (in the case on deletion) or top (in the case of duplication) 5% denoised cohort ploidies to be considered as a rare gCNVs. Subsequently, the candidate rare gCNVs were selected if they followed the disease inheritance pattern in the family.

Protein function
We used the UniProt Knowledgebase (UniProtKB, https://www.uniprot.org/) to evaluate the general function of the proteins, whose sequence was affected by the variants identified in our study (28).

Details of identified candidate genes, proteins and their function
In the main text we referred to some genes and gene variants and here we give functional details with references. The functions of the gene products were collected from the UniProtKB database and literature search.

Missense variants: genes, proteins and their function
After the FCVPPv2 application, a total of 109 potential pathogenic missense variants were identified; in most families several candidates were found and in four families none (Supplementary Table 2). All variants were private for each family, except for two genes, KIF1B (kinesin family member 1B) and DCHS1 (dachsous cadherin-related 1), in which two different missense variants were found in two unrelated families (Families 10 and 18 for KIF1B and 15 and 17 for DCHS1). KIF1B is involved in the transport of mitochondria and synaptic vesicles (29, 30).
In Family 18, the variant (ENST00000263934, p.Asn1594Lys) was located between a domain of unknown function (DUF 3694) and the pleckstrin homology (PH) domain, which plays a role in recruiting proteins to different membranes and targeting them to appropriate cellular compartments.
In   Our candidate list included two genes, KMT2A and USP28, functionally related to the recently reported MM predisposing genes, LSD1/KDM1A, encoding a lysine-specific demethylase, and USP45, an apoptosis-related gene regulating DNA repair (42,43). KMT2A (alias MLL1) is a histone H3 lysine 4 (H3K4) methyltransferase, which plays an essential role in early development and hematopoiesis and which mediates chromatin modifications associated with epigenetic transcriptional activation (44). Somatic mutations in KMT2 gene family are reported to be among the most frequent variants in many types of cancers, including MM (45). Carcinogenic mechanism for KMT2A mutations and the common fusion genes, which KMT2A is a part of, may be related to transcription of homeobox (HOX) target genes (45). USP28 is a deubiquitinase involved in the DNA damage-induced apoptosis. It regulates MYC protein stability in response to DNA damage (46). In MM, overexpression of MYC, mainly through complex chromosomal rearrangements, has been shown to promote myeloma cell survival and to lead to poor prognosis (47).
We checked our gene list also for the presence of the 82 somatically mutated driver genes in MM,

LoF variants: genes, proteins and their function
A total of 36 LoF variants were identified in the MM families (Supplementary Table 3). If we would apply a MutPred-LOF score higher than 0.50 at a 5% false positive rate, as suggested by Variants in the two genes related to immune function, IL3RA and IL17REL, had a low MutPred score. This score is not applicable to splice site variants. Of the eight splice site variants, five were predicted by Human Splicing Finder to alter the splicing motifs (indicate by 'yes' in Supplementary Table 3), however with no link to MM. Many of the genes with LoF mutations encode proteins with housekeeping functions. LONP2 is an ATP-dependent protease that plays a role in maintaining peroxisome homeostasis. CSGALNACT2 is a member of the chondroitin Nacetylgalactosaminyltransferase family. HMGCLL1 is a non-mitochondrial 3-hydroxymethyl-3methylglutaryl-CoA lyase involved in ketogenesis. FUK catalyzes the utilization of free L-fucose in glycoprotein and glycolipid synthesis.

Copy number variants: genes, proteins and their function
We identified seven CNVs that segregated with MM in the families (Supplementary Table 4 (52). FGFBP1 and FGFBP2 encode proteins that are involved in FGF ligand bioactivation by releasing them from extracellular matrix. Thus, duplication of these two genes may lead to activation of the FGF signaling, enhanced MM cell proliferation and survival and affect bone homeostasis (53,54). PROM1 is considered a marker of both hematopoietic progenitor and stem cells and cancer stem cells and it is overexpressed in acute lymphoblastic leukemia and many solid cancers contributing to the growth of the cancer cells (55,56).
In a review of cancer predisposing genes it was observed that over 40% of germline variants were in genes that functioned also as somatic drivers (57). In the above, we referred to some somatic drivers, and some of the observed genes are known to interact with key signaling pathways in MM, including PI3K/Akt/mTOR, Ras/Raf/MEK/MAPK, JAK/STAT, NF-κB, Wnt/β-catenin, and RANK/RANKL/OPG (58). Among the relevant genes in our list, DAB2IP, encoding a Ras-GTPase activating protein, modulates key oncogenic pathways such as PI3K/Akt, NF-κB, and Wnt/βcatenin (31); FOXO1 encodes for a downstream effector of Akt signaling (36); the LRP1B gene product negatively regulates the Wnt/β-catenin/TCF signaling, through its interaction with DVL2 (59   Pedigrees of the multiple myeloma families. For cancer patients age of diagnosis is shown, for healthy family members age at sampling. Families 18-21 consisted of two first-or second-degree relatives diagnosed with MM and are not shown.