Polygenic risk score and risk of monoclonal B-cell lymphocytosis in Caucasians and risk of chronic lymphocytic leukemia (CLL) in African Americans

Monoclonal B-cell lymphocytosis (MBL) is a precursor to CLL. Other than age, sex, and CLL family history, little is known about factors associated with MBL risk. A polygenic risk score (PRS) of 41 CLL-susceptibility variants has been found to be associated with CLL risk among individuals of European ancestry (EA). Here, we evaluate these variants, the PRS, and environmental factors for MBL risk. We also evaluate these variants and the CLL-PRS among African-American (AA) and EA CLL cases and controls. Our study included 560 EA MBLs, 869 CLLs (696 EA/173 AA), and 2866 controls (2631 EA/235 AA). We used logistic regression, adjusting for age and sex, to estimate odds ratios (OR) and 95% confidence intervals within each race. We found significant associations with MBL risk among 21 of 41 variants and with the CLL-PRS (OR = 1.86, P = 1.9 × 10⁻²⁹, c-statistic = 0.72). Little evidence of any association between MBL risk and environmental factors was observed. We observed significant associations of the CLL-PRS with EA-CLL risk (OR = 2.53, P = 4.0 × 10⁻⁶³, c-statistic = 0.77) and AA-CLL risk (OR = 1.76, P = 5.1 × 10⁻⁵, c-statistic = 0.62). Inherited genetic factors, and not environmental factors, are associated with MBL risk. In particular, the CLL-PRS is a strong predictor of risk for both MBL and EA-CLL, but less so for AA-CLL, supporting the need for further work in this population.


INTRODUCTION
Chronic lymphocytic leukemia (CLL) is a neoplasm of mature B-cells, with at least 5 × 10⁹ B-cells/L in the peripheral blood [1]. These CLL cells typically co-express CD5, CD19, CD20dim and CD23, and exhibit a decrease in expression of surface immunoglobulin, CD20, and CD79b as compared to normal B cells [2,3]. Leukemic B-cells also show restricted expression of either kappa or lambda immunoglobulin light chains, reflecting the clonal nature of such cells [2].
Monoclonal B-cell lymphocytosis (MBL) is a pre-malignant condition with a clonal absolute B-cell count of <5 × 10⁹/L in the peripheral blood, with the notable absence of lymphadenopathy, cytopenias, or organomegaly [1], and an immunophenotype that is similar to that of CLL. MBL is a precursor state to CLL [4,5]. MBL clones are present in ~5-12% of the general population [6-8], with the prevalence rising to 15-22% in unaffected first-degree relatives of CLL patients [4,9,10]. MBL is also sub-classified into low-count MBL (LC-MBL) or high-count MBL (HC-MBL) according to whether the B-cell clone size is below or above the 0.5 × 10⁹/L threshold, respectively [4,6,11]. Other than age, sex, and family history of CLL, little is known about factors associated with risk of MBL.
To date, 41 single nucleotide polymorphisms (SNPs) have been found to be associated with risk of CLL among European ancestry (EA) individuals, and they explain ~25% of the additive heritable risk [12-19]. We previously showed that a polygenic risk score (PRS), computed as the weighted average of the number of risk alleles of these 41 SNPs, is associated with CLL risk using cases and controls of EA from the International Lymphoma Epidemiology (InterLymph) Consortium [20]. However, these InterLymph cases and controls were used to identify over half of the SNPs, potentially inflating the association. Thus, we evaluated this CLL-PRS in an independent sample of CLL cases and controls from the Genetic Epidemiology of CLL (GEC) Consortium, a cohort of families each with ≥2 members with CLL. This analysis demonstrated that the CLL-PRS, along with age and sex, has high discrimination (c-statistic = 0.78) for CLL risk. In these CLL families, we also reported an association of these 41 SNPs and the CLL-PRS with MBL risk in a small cohort of 95 familial MBLs; the vast majority (93%) of these were LC-MBL [20].
Here we evaluate these 41 SNPs, the CLL-PRS, and environmental factors in a large screening cohort of 560 EA MBLs (including 396 LC-MBLs and 164 HC-MBLs) and 2631 EA controls known not to have MBL, all of whom were ascertained agnostic to family history status. Because the CLL-PRS has not been evaluated in non-EA individuals, particularly in African Americans (AA), we also evaluate the CLL-PRS in 173 AA CLL cases and 235 AA controls and compare these results to another independent cohort of 696 EA CLLs.

Study population
MBL and control individuals. To identify individuals with MBL, we had two EA cohorts: a screening cohort and a clinical cohort (Supplementary Fig. 1). For the screening cohort, we used stored cryopreserved peripheral blood mononuclear cells (PBMC) from 3041 asymptomatic adults participating in the Mayo Clinic Biobank to screen for MBL using highly sensitive flow cytometry. Each consented participant in the Mayo Clinic Biobank was asked to complete a self-reported health-history questionnaire, provide a blood sample, and allow access to their Mayo Clinic medical record [21]. The baseline health-history questionnaire included domains around medical history, lifestyle factors, family history of hematological malignancies (any non-Hodgkin lymphoma, Hodgkin lymphoma, multiple myeloma, or leukemia), reproductive history, and occupational exposures [21] (Supplementary Table 1). We screened for MBL using a highly sensitive, 8-color (CD38, CD45, Kappa, Lambda, CD19, CD23, CD5 and CD20) flow-cytometry assay with the capacity to detect clonal B-cell counts to the 0.005% level (1/20,000 events); for each individual, 500,000 PBMC events were typically captured [22]. Based on our MBL screening, we identified 410 individuals with CLL-phenotype MBL (i.e., CD5+ CD20dim), with the remaining 2631 individuals without MBL serving as controls. Because the Mayo Clinic Biobank participants did not all have a complete blood count, we used the percent of clonal B-cells out of total B-cells to categorize participants as LC- and HC-MBL [4]. Based on prior evidence, those MBL individuals with percent clonal B-cells <85% were defined as LC-MBL and those with percent clonal B-cells ≥85% as HC-MBL [4]. Our second MBL cohort is a clinical cohort of predominantly (99%) HC-MBL from the Mayo Clinic CLL Resource. This resource is comprised of individuals with a clonal B-cell population of CLL immunophenotype who are seen on a routine basis for clinical evaluations in the Division of Hematology at Mayo Clinic (Rochester, MN). All diagnoses were confirmed by a Mayo hematopathologist based on the 1996 NCI working group criteria and then updated to the 2008 International Workshop CLL criteria. From this CLL Resource, we identified 150 MBLs who had available DNA collected within 2 years of the initial MBL diagnosis. MBL was classified as LC-MBL or HC-MBL according to whether the B-cell clone size was below or above the 0.5 × 10⁹/L threshold, respectively [6,11] (Supplementary Fig. 1).
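The two classification rules described above (absolute clonal B-cell count in the clinical cohort, percent clonal B-cells in the screening cohort) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function and constant names are hypothetical, the handling of a count exactly at the 0.5 × 10⁹/L threshold is an assumption, and the clinical criteria (absence of lymphadenopathy, cytopenias, or organomegaly) and immunophenotyping are not modeled.

```python
# Thresholds quoted in the text; the constant and function names are hypothetical.
CLL_COUNT_THRESHOLD = 5.0e9      # clonal B-cells per litre; below this is the MBL range
HC_MBL_COUNT_THRESHOLD = 0.5e9   # clonal B-cells per litre separating LC-MBL from HC-MBL
HC_MBL_PERCENT_CLONAL = 85.0     # percent clonal B-cells of total B-cells (screening cohort)


def classify_by_count(clonal_b_cells_per_litre: float) -> str:
    """Clinical-cohort rule: classify from an absolute clonal B-cell count."""
    if clonal_b_cells_per_litre >= CLL_COUNT_THRESHOLD:
        return "CLL-range count"
    # Assumption: a count exactly at the threshold is treated as high-count.
    if clonal_b_cells_per_litre >= HC_MBL_COUNT_THRESHOLD:
        return "HC-MBL"
    return "LC-MBL"


def classify_by_percent_clonal(percent_clonal: float) -> str:
    """Screening-cohort rule: <85% clonal B-cells is LC-MBL, >=85% is HC-MBL."""
    if not 0.0 <= percent_clonal <= 100.0:
        raise ValueError("percent_clonal must be between 0 and 100")
    return "HC-MBL" if percent_clonal >= HC_MBL_PERCENT_CLONAL else "LC-MBL"


def clonal_events_at_detection_limit(total_events: int = 500_000,
                                     events_per_clonal_event: int = 20_000) -> int:
    """Clonal events that a 1-in-20,000 detection limit corresponds to."""
    return total_events // events_per_clonal_event
```

With the default arguments, `clonal_events_at_detection_limit()` evaluates to 25, so the stated 0.005% sensitivity corresponds to roughly 25 clonal events among the 500,000 PBMC events typically captured.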
CLL patients and control individuals. CLL patients of EA or AA were ascertained from four studies (Supplementary Fig. 1). CLL diagnoses were made based on the 1996 NCI working group criteria and updated to the 2008 International Workshop CLL criteria wherever possible. AA controls (N = 235) with no history of CLL were identified from the Mayo Clinic Biobank (Supplementary Fig. 1).
All individuals provided written informed consent approved by the respective institutional review board.

Genotyping
Genotyping of the study cohort was done using Illumina genotyping arrays, and genotypes were called using Illumina GenomeStudio software. Extensive quality control was applied, including removal of monomorphic SNPs, SNPs with call rates <95%, and SNPs with extreme Hardy-Weinberg disequilibrium (P < 1.0 × 10⁻⁵). We also dropped individuals with call rates <90%, gender discordance, or a genotyped relative. Duplicates showed >99% concordance. From these data, we pulled the 41 SNPs previously found to be associated with CLL (Supplementary Tables 2 and 3). Using ADMIXTURE [26], we determined genetic ancestry for each individual using HapMap as the reference. Individuals with percent of African ancestry ≥50% were considered AA, and individuals with >80% Caucasian ancestry were considered EA. We correlated minor allele frequencies (MAF) between EA and AA across the 41 CLL SNPs using the 1000 Genomes Project data [27].
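The SNP-level filters and the ancestry thresholds above amount to a few simple rules, sketched here under stated assumptions. The function names and input representation are hypothetical: call rates and ancestry proportions are taken as fractions between 0 and 1, the Hardy-Weinberg P value is assumed to have been computed elsewhere, and the sample-level filters (individual call rate, gender discordance, relatedness) are not shown.

```python
from typing import Optional


def snp_passes_qc(call_rate: float, hwe_p_value: float, minor_allele_count: int) -> bool:
    """Return True if a SNP survives the three SNP-level filters in the text."""
    if minor_allele_count == 0:   # monomorphic SNP
        return False
    if call_rate < 0.95:          # call rate below 95%
        return False
    if hwe_p_value < 1.0e-5:      # extreme Hardy-Weinberg disequilibrium
        return False
    return True


def assign_ancestry_group(african_fraction: float,
                          european_fraction: float) -> Optional[str]:
    """Map estimated ancestry proportions to the study's analysis groups.

    Returns "AA" for >=50% African ancestry, "EA" for >80% European ancestry,
    and None for individuals who meet neither threshold (how such individuals
    were handled is not described in the text, so None is an assumption).
    """
    if african_fraction >= 0.50:
        return "AA"
    if european_fraction > 0.80:
        return "EA"
    return None
```

For example, an individual with estimated proportions of 0.62 African and 0.35 European ancestry would be grouped as AA under these thresholds, while one with 0.30 and 0.65 would fall into neither group.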

Statistical analyses
We evaluated differences in the distribution of demographic characteristics and self-reported environmental exposures (including medical, lifestyle, family history, and occupational exposures) between cases and controls, using the two-sided χ² test or Student's t test, where appropriate. Logistic regression was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs), adjusted for age and sex. We computed the CLL-PRS based on the 41 CLL SNPs (Supplementary Tables 2 and 3) as previously published [20]. Specifically, the PRS was computed as a weighted average of the number of risk alleles across the 41 CLL SNPs, with the weights being the log of the OR previously reported for each SNP (Supplementary Tables 2 and 3) [20]. We evaluated the CLL-PRS as a continuous or categorical predictor. For the EA analyses, we categorized the PRS by quintiles based on cutoffs previously used with 7983 controls from the InterLymph Consortium [20]. We also calculated an unweighted CLL-PRS and evaluated this unweighted PRS with CLL risk. For the AA analyses, we categorized the PRS into quintiles based on the 235 AA controls from this study and used the same weights as in the EA analyses; an unweighted CLL-PRS was also evaluated. We used logistic regression, adjusted for age and sex, to evaluate associations of the PRS with risk of CLL, MBL, or MBL subtypes, stratified by race. The middle quintile served as the reference category. Among EA individuals, we calculated a trend test among LC-MBL, HC-MBL, and CLL risk using the P value for heterogeneity from a polytomous logistic regression analysis. Moreover, we plotted boxplots of the PRS among controls, LC-MBL, HC-MBL, and EA CLL, and evaluated the statistical difference using the Kruskal-Wallis test. To evaluate model discriminatory ability, we computed a c-statistic and 95% CIs [28] for the adjusted regression models. Two-sided P values < 0.05 indicated statistical significance.
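To make the PRS construction concrete, the following is a minimal sketch of a weighted (log OR) and unweighted score and of quintile assignment from control-derived cutoffs. It is an illustration only: the function names are hypothetical, the example odds ratios are invented rather than taken from Supplementary Tables 2 and 3, and dividing by the number of SNPs is an assumption about how the "weighted average" is normalized (any constant normalization rescales the score without changing quintile membership). The quantile method used by the authors is not stated; the sketch uses the default of Python's `statistics.quantiles`.

```python
import math
import statistics
from bisect import bisect_right
from typing import List, Sequence


def polygenic_risk_score(risk_allele_counts: Sequence[int],
                         odds_ratios: Sequence[float],
                         weighted: bool = True) -> float:
    """Average of risk-allele counts (0, 1 or 2 per SNP), weighted by log(OR).

    With weighted=False every SNP gets weight 1, giving the unweighted score.
    Assumption: the sum is divided by the number of SNPs.
    """
    if len(risk_allele_counts) != len(odds_ratios) or not odds_ratios:
        raise ValueError("need one odds ratio per SNP and at least one SNP")
    weights = [math.log(o) if weighted else 1.0 for o in odds_ratios]
    total = sum(w * g for w, g in zip(weights, risk_allele_counts))
    return total / len(weights)


def quintile_cutoffs(control_scores: Sequence[float]) -> List[float]:
    """Four cut points (20th, 40th, 60th, 80th percentiles) from control scores."""
    return statistics.quantiles(control_scores, n=5)


def quintile_of(score: float, cutoffs: Sequence[float]) -> int:
    """Quintile index from 1 (lowest) to 5 (highest); 3 is the reference category."""
    return bisect_right(list(cutoffs), score) + 1
```

In an analysis of this kind, the quintile index would then enter the age- and sex-adjusted logistic regression as a categorical predictor with quintile 3 as the reference level.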
In addition to the PRS, we evaluated each of the 41 CLL SNPs with the risk of MBL overall, LC-MBL, HC-MBL, EA CLL, and AA CLL assuming a log-additive model in logistic regression. Because these SNPs were selected a priori, we used the nominal level (P < 0.05) for statistical significance. The data were analyzed using IBM SPSS Statistics version 25 (IBM Corp, Armonk, NY, USA) and R 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).
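The c-statistic used in these analyses to summarize discrimination can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is illustrative only: the function name is hypothetical, the inputs stand in for predicted risks from a fitted model (in the paper, a model including the PRS, age, and sex), and the confidence interval calculation cited as [28] is not reproduced.

```python
from typing import Sequence


def c_statistic(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case-control pairs in which the case has the higher score.

    Ties contribute 0.5. A value of 0.5 means no discrimination and 1.0 means
    every case is ranked above every control.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; rank-based formulations give the same value more efficiently for cohorts of several thousand individuals.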

RESULTS
We evaluated associations of the individual SNPs and the CLL-PRS in 3887 individuals of EA and 408 AA individuals. Collectively, this included 560 EA MBLs (396 LC-MBLs and 164 HC-MBLs), 696 EA CLLs, 173 AA CLLs, 2631 EA controls, and 235 AA controls. The demographics of these individuals are shown in Table 1.

Individual CLL-susceptibility SNPs and risk of CLL and MBL
The results for each of the 41 individual CLL-susceptibility SNPs for risk of CLL, MBL and MBL subtypes (LC-MBL and HC-MBL) are shown in Supplementary Tables 2 and 3. Among CLL cases and controls of EA, the ORs of 40 (98%) SNPs out of the 41 were directionally consistent with those reported in the larger CLL GWAS [12-19], and 32 (78%) SNPs out of the 41 were statistically significant at P < 0.05 (Supplementary Table 3). We also evaluated the 41 individual CLL-susceptibility SNPs among AA CLL cases and controls (Supplementary Table 3). Among the 41 SNPs, ORs of 22 SNPs (54%) were directionally consistent with those reported in CLL GWAS of EA, and only two SNPs, rs7690934 (OR = 1.41, CI: 1.03-1.95, P = 0.03) and rs1679013 (OR = 1.56, CI: 1.08-2.25, P = 0.02), were nominally significant (Supplementary Table 3). The lack of statistical significance for the other SNPs in the AA appears to be due in part to the variability of minor allele frequencies (MAF) across EA and AA. The median difference in the MAF between EA and AA across the 41 CLL SNPs was 7.2% (range: 0.2-26%) in the 1000 Genomes data, with the majority of the MAF in the AA being lower than that of EA (Supplementary Table 4, Supplementary Fig. 2). The lower MAF in the AA then translates to attenuated ORs in the AA compared to those in the EA (Supplementary Fig. 3). Among MBL overall, the observed ORs for 39 (95%) of the 41 SNPs were directionally consistent with those reported in CLL, and 21 (51%) of the 41 SNPs were nominally statistically significant at P < 0.05; 15 of the 41 SNPs showed little evidence of an association (OR < 1.1) (Supplementary Table 2).

Environmental exposures and risk of MBL in the Mayo Clinic Biobank
In the Mayo Clinic Biobank, we had 2512 controls and 365 MBL individuals who completed a self-reported questionnaire. Because the vast majority of MBLs from the Biobank were LC-MBL (only 9 individuals were HC-MBL), we evaluated the effect of these exposures on MBL risk overall (Supplementary Table 1). As expected, age per 10 years (OR = 1.83, CI: 1.64-2.04, P < 0.0001) and male sex (OR = 1.73, CI: 1.38-2.15, P < 0.0001) were associated with higher risk of MBL. Family history of leukemia/lymphoma was more common among MBL cases (N = 40, 13.1%) than controls (N = 205, 9.5%); however, the association did not reach statistical significance (OR = 1.44, CI: 0.99-2.09, P = 0.06, adjusted for age and sex; Supplementary Table 1). Prior history of cancer other than leukemia or lymphoma was significantly more common (P < 0.0001) among MBL cases (N = 156, 43%) than controls (N = 769, 31%); however, the association was not statistically significant after adjusting for age and sex (Supplementary Table 1). Within specific cancers, prior history of melanoma and non-melanoma skin cancers, prior history of sarcoma, and, among women, prior history of breast cancer were significantly more common in MBL cases than controls; however, none of these specific prior cancers were associated with MBL risk after adjusting for age and sex (Supplementary Table 1). No other exposures were found to be statistically associated with MBL risk, including prior history of type 2 diabetes, prior history of any autoimmune condition, or prior diagnosis of hepatitis A, B, or C.

DISCUSSION
Our study clearly demonstrated that an inherited genetic component exists for the development of MBL, both among a cohort of 410 asymptomatic individuals from the Mayo Clinic Biobank who were screened for MBL and among a cohort of 150 MBLs who were clinically identified in the Division of Hematology.
We observed that ~50% of the known 41 SNPs from 37 CLL-susceptibility loci and the CLL-PRS comprised of these 41 SNPs were associated with overall MBL risk. Two prior studies evaluated risk of MBL with SNPs from 10 CLL-susceptibility loci among 419 MBLs [29] and from 8 CLL-susceptibility loci among 60 familial MBLs from CLL families [30]. All three studies found statistically significant associations with SNPs in the 2q37.1 locus, and two of the three studies (excluding the familial MBLs) found significant associations at the 6p25.3, 8q24.21, 11q24.1 and 16q24.1 loci. Our study also found associations at these loci. With the additional 32 CLL-susceptibility loci evaluated herein, we found significant SNP associations with MBL risk from 12 more loci. Of particular interest, we found no or limited evidence of association (OR < 1.10 and P > 0.05) for 12 known CLL risk loci. Because the SNPs in these loci have been repeatedly found to be associated with risk of CLL, this suggests that these loci may be associated with progression from MBL to CLL rather than with initiation of the B-cell clone. Further studies are needed to evaluate this hypothesis.
We previously reported that the CLL-PRS was associated with MBL risk among a cohort of 95 familial MBLs with a 2.3-fold increased risk [20]. Herein, among a cohort ascertained agnostic to family history of CLL, we also reported an association of the CLL-PRS with risk of MBL. In both studies, the CLL-PRS had good discrimination (after adjusting for age and sex) with an estimated c-statistic of 0.77 in the family study and 0.72 in this study. We next evaluated the CLL-PRS among the LC-MBL and HC-MBL subsets. We observed a significant association with a 1.75-fold and 2.14-fold increased risk for LC-MBL and HC-MBL, respectively. Moreover, the increase in the effect size from LC-MBL to HC-MBL to CLL was statistically significant.
Because not all MBLs progress to CLL, the next needed study is to determine whether the CLL-PRS could discriminate progression to CLL among individuals with MBL. Based on our data, there is strong evidence that those MBLs with high PRS will have a greater chance of progression to CLL compared to those MBLs with a low PRS.
For the first time, we evaluated the CLL-PRS in AA CLL cases and controls based on genetic ancestry and found a significantly increased risk for CLL, though with an attenuated effect (1.76-fold) and less discrimination (c-statistic = 0.62) compared to our EA CLL cases and controls. These findings are not surprising given the known differences in the genetic landscape (i.e., allele frequencies and linkage disequilibrium) between populations of EA and AA. Moreover, the CLL-PRS is comprised of SNPs that were identified through GWAS of individuals of EA and uses the estimated ORs from these EA GWAS as the weights in the PRS calculation instead of weights obtained from AA GWAS of CLL, which have yet to be done. When we used an unweighted PRS, we also observed a significant, although attenuated, association. These results highlight that the EA PRS is a weaker predictor for AA individuals than for EA individuals. Thus, there is a need for a GWAS of CLL among AA in order to identify CLL-susceptibility SNPs that may be unique to AA CLL or SNPs that are more informative within known CLL loci. A PRS can then be developed based on these more representative SNPs.
We previously reported that the CLL-PRS had a 2.49-fold increased risk among CLL cases and controls of EA from the InterLymph Consortium [20], but because these individuals were used to identify at least 50% of the CLL-susceptibility SNPs, the statistical significance and the effect size of the PRS would have been inflated (i.e., winner's curse [31]). Thus, we used an independent cohort of EA CLL cases and controls and reported consistent results of the CLL-PRS with a 2.53-fold increased risk. We also previously evaluated the CLL-PRS among CLL cases and controls ascertained from CLL families that had at least 2 family members with CLL and also found a consistent effect of the CLL-PRS with a 2.44-fold increased risk [20]. Importantly, across these three sets of CLL cases and controls, we also see strong and consistent discriminatory ability of the CLL-PRS, along with age and sex, with c-statistics of 0.79, 0.80, and 0.77, respectively. Collectively, these results affirm that the CLL-PRS is a strong predictor of CLL risk.
Among the environmental exposures evaluated beyond age and sex, we observed suggestive evidence, although not statistically significant after adjusting for age and sex, that a prior history of cancer or a family history of leukemia or lymphoma may be associated with MBL risk. A prior study by Casabonne et al. of 72 MBLs and 380 controls screened not to have MBL also found suggestive, albeit not significant, evidence that a prior history of cancer increased risk of MBL [32]. In addition, several family studies reported elevated prevalence rates of MBL among relatives of CLL families compared to that of the general population [9,10,33]. Casabonne et al. also found evidence that exposures to infectious agents (e.g., history of pneumonia) increased MBL risk and that prior history of vaccination (e.g., vaccinated against pneumococcal or influenza) decreased MBL risk [32]. No other medical, occupational, or lifestyle exposures evaluated herein were found to be associated with risk of MBL.
In conclusion, inherited genetic factors, and not environmental factors, are associated with risk of MBL. We reported that some, but not all, of the CLL-susceptibility SNPs and the CLL-PRS were associated with risk of initiation of the MBL clone among individuals of EA, suggesting the possibility that the remaining SNPs are associated with progression to CLL. We also demonstrated that the CLL-PRS is a strong and significant predictor of risk for CLL among individuals of EA agnostic to family history and a somewhat weaker predictor of risk among AA individuals, supporting the need for further work in this population. Most importantly, the results of this study may help identify individuals at higher risk of developing MBL and CLL beyond the known risk associated with age, male sex, and family history of CLL in individuals of EA.