A polygenic risk score for multiple myeloma risk prediction

There is overwhelming epidemiologic evidence that the risk of multiple myeloma (MM) has a solid genetic background. Genome-wide association studies (GWAS) have identified 23 risk loci that contribute to the genetic susceptibility of MM, but have low individual penetrance. Combining the SNPs in a polygenic risk score (PRS) is a possible approach to improve their usefulness. Using 2361 MM cases and 1415 controls from the International Multiple Myeloma rESEarch (IMMEnSE) consortium, we computed a weighted and an unweighted PRS. We observed associations with MM risk with OR = 3.44, 95% CI 2.53–4.69, p = 3.55 × 10−15 for the highest vs. lowest quintile of the weighted score, and OR = 3.18, 95% CI 2.34–4.33, p = 1.62 × 10−13 for the highest vs. lowest quintile of the unweighted score. We found a convincing association of a PRS generated with 23 SNPs and risk of MM. Our work provides additional validation of previously discovered MM risk variants and of their combination into a PRS, which is a first step towards the use of genetics for risk stratification in the general population.


INTRODUCTION
Multiple myeloma (MM) is the third most common hematological malignancy with a worldwide incidence rate of 2.1/100,000 new cases each year (https://gco.iarc.fr/today/home) [1]. MM is preceded by monoclonal gammopathy of undetermined significance (MGUS), an asymptomatic premalignant condition [2,3], and by smoldering myeloma (SM), a more advanced precursor of the disease [4].
Considering also the rarity of the disease, the identified variants have limited clinical utility for predicting individual risk, especially in the general population. A possible approach to improve the usefulness of genetic risk markers is to combine the SNPs in a polygenic risk score (PRS), in order to better estimate their cumulative effect on the risk of developing the disease. This method has been successfully applied to several diseases including breast, prostate, colorectal, and pancreatic cancer [22][23][24][25][26][27][28]. For myeloma, a PRS was briefly mentioned in the latest GWAS publication [17]. An earlier study compared a 16-SNP PRS in familial and sporadic MM cases [29]. A PRS including all the known risk SNPs has also been evaluated in African-Americans [30].
The aim of this work is to use the International Multiple Myeloma rESEarch (IMMEnSE) consortium to establish a PRS for MM and provide an evaluation of the PRS performance in an independent set of MM cases and controls.

MATERIALS AND METHODS

Study population
We used DNA samples from 2361 MM patients and 1415 controls from 7 countries (Denmark, France, Hungary, Israel, Italy, Poland, and Portugal) within the IMMEnSE consortium [6], for whom information on sex and age was available. Cases were defined by a confirmed diagnosis of MM according to the International Myeloma Working Group criteria [31]. Controls were selected from the general population, from hospitalized subjects with different diagnoses excluding cancer, or from blood donors. Characteristics of the study population are summarized in Table 1.

SNP selection
To build the PRS we used 23 SNPs shown to be associated with MM risk at genome-wide significance level (p < 5 × 10−8) by previous GWAS [5,7,14,15,17]. We did not include variants reported to be associated with MM risk but not at genome-wide level of significance (e.g., those reported by Erickson et al. [9]). Characteristics of the SNPs included in the PRS are summarized in Supplementary Table 1.

Genotyping and PRS computation
Genotyping was performed using TaqMan technology (ThermoFisher Applied Biosystems, Waltham MA, USA) according to the manufacturer's recommendations. TaqMan assays were not available for some SNPs; therefore, we replaced them with surrogates in high linkage disequilibrium (r² > 0.9), as detailed in Supplementary Table 1.
For each SNP, the number of alleles associated with higher MM risk was counted, and the counts were added up for each study subject, resulting in an unweighted PRS with a theoretical range from 0 (no MM risk alleles) to 46 (homozygous for the risk allele at every SNP). In addition, we built a weighted PRS by using the ORs of the codominant model of the association of each variant with MM risk in the IMMEnSE population as coefficients to weight the relative effects of the risk SNPs. For each SNP in the weighted PRS, a value of 0 was assigned if no risk alleles were present, the ln(OR) of the heterozygous genotype was assigned if one risk allele was present, and the ln(OR) of the homozygous genotype was assigned if two risk alleles were present. These values were then summed for each subject. We built alternative weighted PRSs by using ORs from the literature, or values calculated in our dataset. Only a subset of the study subjects (1426 cases and 969 controls) had a 100% SNP call rate. Therefore, in order to be able to compute comparable score values for all study subjects, we also considered "scaled" scores, in which the PRS value for each subject was multiplied by the ratio between 23 (total number of SNPs) and the number of SNPs successfully genotyped for that subject. For both PRSs (weighted and unweighted), we calculated quintiles based on the distribution of values in the controls.
The formulas for the unweighted and weighted scores are, respectively, $\sum_{1}^{m} a_j$ and $\sum_{1}^{m} aX_j$, where a = number of risk alleles (0, 1, 2), m = total number of SNPs (23), j = jth subject, X = ln(OR). Supplementary Table 2 shows an example of how the scores were generated.
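The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `prs_scores`, the dictionary-based input layout, and the SNP labels and OR values in the example are all hypothetical. Uncalled genotypes are represented as `None`, and the "scaled" scores multiply each raw score by the ratio of the total number of SNPs to the number of SNPs called for the subject.

```python
import math


def prs_scores(genotypes, weights):
    """Compute raw and scaled PRS values for one subject.

    genotypes: dict mapping SNP id -> risk-allele count (0, 1, 2),
               or None when the genotype call is missing.
    weights:   dict mapping SNP id -> (OR_heterozygous, OR_homozygous)
               from a codominant model (hypothetical input layout).
    Returns (unweighted, weighted, scaled_unweighted, scaled_weighted).
    """
    total_snps = len(weights)
    called = [snp for snp in weights if genotypes.get(snp) is not None]
    if not called:
        raise ValueError("no genotyped SNPs for this subject")

    unweighted = 0
    weighted = 0.0
    for snp in called:
        a = genotypes[snp]
        unweighted += a  # count of risk alleles
        if a > 0:
            # ln(OR) of the heterozygous (a == 1) or homozygous (a == 2) genotype
            weighted += math.log(weights[snp][a - 1])

    # Scaling factor: total SNPs / SNPs actually genotyped for this subject
    scale = total_snps / len(called)
    return unweighted, weighted, unweighted * scale, weighted * scale


# Hypothetical three-SNP example (identifiers and ORs are made up)
example_weights = {"snpA": (1.2, 1.5), "snpB": (1.1, 1.3), "snpC": (1.3, 1.6)}
example_genotypes = {"snpA": 2, "snpB": 1, "snpC": None}
example_result = prs_scores(example_genotypes, example_weights)
```

In this example the subject carries three risk alleles across two called SNPs, so the scaled unweighted score is 3 × 3/2 = 4.5.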

Data filtering and statistical analysis
Samples with call rate less than 80% were not included in subsequent analysis. Pearson chi square was used to test departure from Hardy-Weinberg equilibrium (HWE) in the overall control group and in the individual countries.
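As an illustration of the HWE check, the following Python sketch computes a Pearson chi-square statistic (1 degree of freedom) from the three genotype counts of a biallelic SNP. The function name and argument layout are assumptions made for this example, not the software used in the study. The p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expected count (monomorphic SNP)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```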
To validate the associations between the individual SNPs and MM risk, we used logistic regression according to the log-additive and codominant models, using the more common allele in controls as the reference category.
We analyzed the association between the PRSs and MM risk by logistic regression. Age-stratified analyses were performed by comparing all controls with younger or older cases, with cutpoints at 55 (to distinguish between early onset and non-early onset cases), 61 (median age at onset of the cases in this study), or 69 years of age (median age at onset of MM, https://seer.cancer.gov/statfacts/html/mulmy.html) [32]. All analyses were adjusted for age, sex, and geographic region of origin.
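The quintile comparison can be sketched in Python as follows. This is a deliberate simplification: it computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2 × 2 table, whereas the analyses reported here used logistic regression adjusted for age, sex, and geographic region. Function names are hypothetical, and assigning a score that ties with a cut point to the lower quintile is an assumption.

```python
import bisect
import math
import statistics


def quintile_cutpoints(control_scores):
    """Four cut points splitting the control score distribution into fifths."""
    return statistics.quantiles(control_scores, n=5)


def quintile_of(score, cutpoints):
    """Quintile index: 0 = lowest, 4 = highest.

    Scores equal to a cut point fall in the lower group (an assumption).
    """
    return bisect.bisect_left(cutpoints, score)


def crude_odds_ratio(cases_high, controls_high, cases_low, controls_low):
    """Unadjusted OR (high vs. low group) with a Wald 95% CI.

    All four counts must be greater than zero.
    """
    odds_ratio = (cases_high * controls_low) / (controls_high * cases_low)
    se = math.sqrt(
        1 / cases_high + 1 / controls_high + 1 / cases_low + 1 / controls_low
    )
    z = 1.96  # normal quantile for a two-sided 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

Deriving the cut points from controls only, as in the study design, keeps the quintile boundaries independent of the case distribution.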
We set up receiver operating characteristic (ROC) curves and calculated the areas under the curve (AUC), to determine the performance of the PRSs in discriminating MM cases from individuals without the disease.
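The AUC has a simple interpretation that a short Python sketch can make concrete: it is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The function below is an illustrative pairwise implementation under that definition, not the software used for the analyses; its name is hypothetical, and its quadratic running time is acceptable only for small examples.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly.

    A pair counts 1 if the case score exceeds the control score,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation.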

RESULTS
We genotyped a total of 3776 subjects (2361 cases and 1415 controls). Genotypes of controls from Portugal deviated from HWE for SNPs rs877529 and rs4325816 in one 384-well plate (using a Bonferroni-corrected threshold of p < 0.002). Therefore, genotypes of Portuguese subjects for those two SNPs were dropped from the dataset. The remaining data were used for further statistical analyses. Duplicated samples (8% of the total) showed a concordance rate higher than 99%.
The associations between 12 of the SNPs and MM risk were replicated in IMMEnSE (p < 0.05) (Table 2). Regardless of statistical significance, all SNPs showed ORs going in the same directions as originally reported in the literature.
We observed strong associations between the PRS and MM risk (Table 3). When we computed the association between the PRSs and MM risk considering only 1426 cases and 969 controls with a call rate of 100%, we observed an OR = 3.18, 95% CI 2.34–4.33, p = 1.62 × 10−13 for the highest vs. lowest quintile of the unweighted score and OR = 3.44, 95% CI 2.53–4.69, p = 4.86 × 10−15 for the highest vs. lowest quintile of the weighted score. Results were very similar when we considered the whole dataset including 2361 cases and 1415 controls and "scaled" PRSs (Table 3), as well as when we built weighted scores using ORs for each SNP from the original GWASs (Table 3).
A histogram showing the difference in number of risk alleles (unweighted PRS) between cases and controls is shown in Supplementary Fig. 1.
In order to focus on the extreme parts of the risk distribution, we also calculated the difference in risk of subjects in the 95th percentile compared to subjects in the 5th percentile, and we found a substantial difference in risk (OR = 5.77, 95% CI 2.37–14.06, p = 1.12 × 10−4). Furthermore, we compared the subjects in the 95th percentile with subjects in the middle of the score distribution (third quintile) and we obtained an OR = 4.22, 95% CI 2.11–8.44, p = 4.52 × 10−5. All the tail distribution results are shown in Table 4.
In addition, we performed case-control analyses stratifying the cases by age at diagnosis. We used three age cutpoints: 55, 61, and 69. The PRS was associated with MM risk in all strata, without differences in risk due to age of onset (data not shown).
The AUCs for each score are shown in Table 5. The best performance was observed for the unweighted PRS when considering only subjects with 100% call rate (AUC = 0.64, 95% CI = 0.62–0.67).

DISCUSSION
Twenty-three SNPs affecting risk of MM were identified through GWAS. Since individually they do not explain a large proportion of the disease risk, we combined them in a PRS, which showed a highly statistically significant association with MM risk. Our results are encouraging, since when comparing the tails of the PRS distribution we observed a fourfold or greater increase in risk. The best area under the curve associated with the PRS was modest (AUC = 0.64, 95% CI = 0.62–0.67). However, this test could show a much better predictive ability in a selected population at already increased risk, such as individuals with MGUS or SM. We expect that the PRS performance will improve as more variants associated with MM are discovered, as shown by studies on other cancer types [23,26,27]. A further step toward the clinical use of PRSs is to combine them with environmental or lifestyle risk factors, as well as family history.
We can envisage that in the middle/long term an enhanced MM risk PRS could become a powerful prediction tool for individualized risk stratification. Genotyping of risk loci could be done quickly and inexpensively in large groups of the population. Information on risk loci could be combined with questionnaire data on non-genetic risk factors, and specialized algorithms could estimate disease risk in a personalized manner. This would allow the adoption of preventive measures, such as enhanced surveillance or intensified screening of people at high risk.
A limitation of this work is that the individuals studied are all of European origin, making it difficult to generalize the results to other ethnicities. The same PRS was recently studied in African-Americans, with results comparable to those observed in people of European descent [30]. Another limitation is that we examined only genetic polymorphisms. It would be worth exploring whether a multifactorial score that also includes non-genetic risk factors could have better predictive power. Unfortunately, we do not have complete data on known MM risk factors in IMMEnSE; therefore, we cannot explore multifactorial risk scores with meaningful numbers of cases and controls.
In conclusion, we found a convincing association between a 23-SNP PRS and MM risk. Our work provides additional validation of previously discovered MM risk variants and of their combination into a PRS, which is a first step toward the use of genetic background in the prevention of the disease. The discovery of additional risk SNPs will allow the generation of PRSs with better accuracy and clearer usefulness.

DATA AVAILABILITY
The dataset underlying this manuscript has been submitted to the European Genome-phenome Archive (EGA) under accession number EGAS00001005654.