Introduction

Chronic lymphocytic leukemia (CLL) is a neoplasm of mature B-cells, with at least 5 × 10⁹ B-cells/L in the peripheral blood [1]. These CLL cells typically co-express CD5, CD19, CD20dim and CD23, and exhibit decreased expression of surface immunoglobulin, CD20, and CD79b compared to normal B cells [2, 3]. Leukemic B-cells also show restricted expression of either kappa or lambda immunoglobulin light chains, reflecting the clonal nature of such cells [2].

Monoclonal B-cell lymphocytosis (MBL) is a pre-malignant condition with a clonal absolute B-cell count of <5 × 10⁹/L in the peripheral blood, with the notable absence of lymphadenopathy, cytopenias, or organomegaly [1], and an immunophenotype that is similar to that of CLL. MBL is a precursor state to CLL [4, 5]. MBL clones are present in ~5–12% of the general population [6,7,8], with the prevalence rising to 15–22% in unaffected first-degree relatives of CLL patients [4, 9, 10]. MBL is also sub-classified into low-count MBL (LC-MBL) or high-count MBL (HC-MBL) according to whether the B-cell clone size is below or above the 0.5 × 10⁹/L threshold, respectively [4, 6, 11]. Other than age, sex, and family history of CLL, little is known about factors associated with risk of MBL.

To date, 41 single nucleotide polymorphisms (SNPs) have been found to be associated with risk of CLL among European ancestry (EA) individuals, and they explain ~25% of the additive heritable risk [12,13,14,15,16,17,18,19]. We previously showed that a polygenic risk score (PRS), computed as the weighted average of the number of risk alleles of these 41 SNPs, is associated with CLL risk using cases and controls of EA from the International Lymphoma Epidemiology (InterLymph) Consortium [20]. However, these InterLymph cases and controls were used to identify over half of the SNPs, potentially inflating the association. Thus, we evaluated this CLL-PRS in an independent sample of CLL cases and controls from the Genetic Epidemiology of CLL (GEC) Consortium, a cohort of families each with ≥2 members with CLL. This analysis demonstrated that the CLL-PRS, along with age and sex, has high discrimination (c-statistic = 0.78) for CLL risk. In these CLL families, we also reported an association of these 41 SNPs and the CLL-PRS with MBL risk in a small cohort of 95 familial MBLs; the vast majority (93%) of these were LC-MBL [20].

Here we evaluate these 41 SNPs, the CLL-PRS, and environmental factors in a large screening cohort of 560 EA MBLs (including 396 LC-MBLs and 164 HC-MBLs) and 2631 EA controls known not to have MBL, all of whom were ascertained agnostic to family history status. Because the CLL-PRS has not been evaluated in non-EA individuals, particularly in African Americans (AA), we also evaluate the CLL-PRS in 173 AA CLL cases and 235 AA controls and compare these results to another independent cohort of 696 EA CLLs.

Methods

Study population

MBL and control individuals

To identify individuals with MBL, we had two EA cohorts: a screening cohort and a clinical cohort (Supplementary Fig. 1). For the screening cohort, we used stored cryopreserved peripheral blood mononuclear cells (PBMC) from 3041 asymptomatic adults participating in the Mayo Clinic Biobank to screen for MBL using highly sensitive flow cytometry. Each consented participant in the Mayo Clinic Biobank was asked to complete a self-reported health-history questionnaire, provide a blood sample, and allow access to their Mayo Clinic medical record [21]. The baseline health-history questionnaire included domains covering medical history, lifestyle factors, family history of hematological malignancies (any non-Hodgkin lymphoma, Hodgkin lymphoma, multiple myeloma, or leukemia), reproductive history, and occupational exposures [21] (Supplementary Table 1). We screened for MBL using a highly sensitive, 8-color (CD38, CD45, Kappa, Lambda, CD19, CD23, CD5 and CD20) flow-cytometry assay with the capacity to detect clonal B-cell counts to the 0.005% level (1/20,000 events), and for each individual, 500,000 PBMC events were typically captured [22]. Based on our MBL screening, we identified 410 individuals with CLL phenotype MBL (i.e., CD5+ CD20dim), with the remaining 2631 individuals without MBL serving as controls. Because the Mayo Clinic Biobank participants did not all have a complete blood count, we used the percent of clonal B-cells out of total B-cells to categorize participants as LC- and HC-MBL [4]. Based on prior evidence, those MBL individuals with percent clonal B-cells <85% were defined as LC-MBL and those with percent clonal B-cells ≥85% as HC-MBL [4]. Our second MBL cohort is a clinical cohort of predominantly (99%) HC-MBL from the Mayo Clinic CLL Resource.
This resource is comprised of individuals with a clonal B-cell population of CLL immunophenotype who are seen on a routine basis for clinical evaluations in the Division of Hematology at Mayo Clinic (Rochester, MN). All diagnoses were confirmed by a Mayo hematopathologist based on the 1996 NCI working group criteria and then updated to the 2008 International Workshop CLL criteria. From this CLL Resource, we identified 150 MBLs who had available DNA collected within 2 years of the initial MBL diagnosis. MBL was classified as LC-MBL or HC-MBL according to whether the B-cell clone size was below or above the 0.5 × 10⁹/L threshold, respectively [6, 11] (Supplementary Fig. 1).
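The two classification rules and the assay's stated detection limit can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the study's pipeline; the thresholds (85% clonal B-cells, 0.5 × 10⁹/L, 1 in 20,000 events) are those stated in the text. How a clone count of exactly 0.5 × 10⁹/L is handled is an assumption, since the text only says "below or above".

```python
def events_at_detection_limit(total_events: int, one_in: int = 20_000) -> float:
    """Number of clonal events corresponding to the stated detection limit
    (0.005%, i.e. 1 in 20,000 events) for a given number of captured events."""
    return total_events / one_in


def classify_by_percent_clonal(percent_clonal: float) -> str:
    """Screening-cohort rule: <85% clonal B-cells is LC-MBL, >=85% is HC-MBL."""
    if not 0.0 <= percent_clonal <= 100.0:
        raise ValueError("percent_clonal must be between 0 and 100")
    return "HC-MBL" if percent_clonal >= 85.0 else "LC-MBL"


def classify_by_clone_count(clone_count_e9_per_l: float) -> str:
    """Clinical-cohort rule: clone size below or above 0.5 x 10^9/L.

    Assumption: a value of exactly 0.5 is treated as HC-MBL.
    MBL itself requires a clonal B-cell count below 5 x 10^9/L.
    """
    if not 0.0 < clone_count_e9_per_l < 5.0:
        raise ValueError("MBL requires a clonal count above 0 and below 5 x 10^9/L")
    return "HC-MBL" if clone_count_e9_per_l >= 0.5 else "LC-MBL"


if __name__ == "__main__":
    # With 500,000 captured events, the detection limit corresponds to 25 events.
    print(events_at_detection_limit(500_000))
    print(classify_by_percent_clonal(40.0))
    print(classify_by_clone_count(1.2))
```

Under these assumptions, the 500,000 events typically captured per individual correspond to a clonal population of about 25 events at the detection limit.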

CLL patients and control individuals

CLL patients of EA or AA were ascertained from four studies (Supplementary Fig. 1): Mayo Clinic, Duke University, Weill Cornell Medical College, and the CLL Research Consortium (CRC). We identified 433 CLL patients (417 EA, 16 AA) from the Mayo Clinic CLL Resource who were diagnosed between 2002 and 2019 and who had available DNA collected within 2 years of CLL diagnosis. From Duke University, a total of 338 CLL patients (258 EA, 80 AA) were accrued from the CLL Clinic from 1999 through 2019 [23, 24]. From the CLL Research Consortium (CRC), we included 71 CLL patients (67 AA and 4 EA) [25]. Finally, from Weill Cornell Medical College, we included 27 CLL patients (17 EA, 10 AA) (Supplementary Fig. 1). CLL diagnoses were made based on the 1996 NCI working group criteria and updated to the 2008 International Workshop CLL criteria wherever possible. AA controls (N = 235) with no history of CLL were identified from the Mayo Clinic Biobank (Supplementary Fig. 1).

All individuals provided written informed consent approved by the respective institutional review board.

Genotyping

Genotyping of the study cohort was done using Illumina genotyping arrays and genotypes were called using Illumina GenomeStudio software. Extensive quality control metrics were utilized, including removing monomorphic SNPs, SNPs with call rates <95%, or SNPs with extreme Hardy–Weinberg disequilibrium (P < 1.0 × 10⁻⁵). We also dropped individuals with call rates <90%, gender discordance, or those who had a relative genotyped. Duplicates showed >99% concordance. From these data, we pulled the 41 SNPs previously found to be associated with CLL (Supplementary Tables 2 and 3). Using ADMIXTURE [26], we determined genetic ancestry for each individual using HapMap as the reference. Individuals with percent of African ancestry ≥50% were considered AA, and individuals with >80% Caucasian ancestry were considered EA. We correlated the minor allele frequencies (MAF) between EA and AA across the 41 CLL SNPs using the 1000 Genomes Project data [27].
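The quality-control filters and ancestry assignment described above amount to a set of threshold rules, sketched below. This is an illustration only: the function names and argument layout are hypothetical, and the sketch takes per-SNP and per-individual summary values as inputs rather than reading GenomeStudio or ADMIXTURE output.

```python
from typing import Optional


def snp_passes_qc(call_rate: float, hwe_p_value: float, is_monomorphic: bool) -> bool:
    """Keep a SNP unless it is monomorphic, has a call rate <95%,
    or shows extreme Hardy-Weinberg disequilibrium (P < 1.0e-5)."""
    return (not is_monomorphic) and call_rate >= 0.95 and hwe_p_value >= 1.0e-5


def sample_passes_qc(call_rate: float, sex_discordant: bool,
                     has_genotyped_relative: bool) -> bool:
    """Keep an individual unless the call rate is <90%, the recorded and
    genotype-inferred sex disagree, or a relative was also genotyped."""
    return call_rate >= 0.90 and not sex_discordant and not has_genotyped_relative


def assign_ancestry(african_fraction: float, european_fraction: float) -> Optional[str]:
    """Assign AA if estimated African ancestry is >=50%, EA if estimated
    European ancestry is >80%, and None otherwise (neither group)."""
    if african_fraction >= 0.50:
        return "AA"
    if european_fraction > 0.80:
        return "EA"
    return None


if __name__ == "__main__":
    print(snp_passes_qc(call_rate=0.99, hwe_p_value=0.2, is_monomorphic=False))
    print(assign_ancestry(african_fraction=0.72, european_fraction=0.25))
    print(assign_ancestry(african_fraction=0.30, european_fraction=0.65))
```

Individuals meeting neither ancestry threshold return `None` in this sketch; how such individuals were handled is not described in this section.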

Statistical analyses

We evaluated differences in the distribution of demographic characteristics and self-reported environmental exposures (including medical, lifestyle, family history, and occupational exposures) between cases and controls, using the two-sided χ² test or Student’s t test, where appropriate. Logistic regression was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs), adjusted for age and sex. We computed the CLL-PRS based on the 41 CLL SNPs (Supplementary Tables 2 and 3) as previously published [20]. Specifically, the PRS was computed as a weighted average of the number of risk alleles across the 41 CLL SNPs, with the weights being the log of the OR previously reported for each SNP (Supplementary Tables 2 and 3) [20]. We evaluated the CLL-PRS as a continuous or categorical predictor. For the EA analyses, we categorized the PRS by quintiles based on cutoffs previously used with 7983 controls from the InterLymph Consortium [20]. We also calculated an unweighted CLL-PRS and evaluated this unweighted PRS with CLL risk. For the AA analyses, we categorized the PRS into quintiles based on the 235 AA controls from this study and used the same weights as in the EA analyses; an unweighted CLL-PRS was also evaluated. We used logistic regression, adjusted for age and sex, to evaluate associations of the PRS with risk of CLL, MBL, or MBL subtypes, stratified by race. The middle quintile served as the reference category. Among EA individuals, we calculated a trend test among LC-MBL, HC-MBL, and CLL risk using the P value for heterogeneity from a polytomous logistic regression analysis. Moreover, we generated boxplots of the PRS among controls, LC-MBL, HC-MBL, and EA CLL, and evaluated the statistical difference using the Kruskal–Wallis test. To evaluate model discriminatory ability, we computed a c-statistic and 95% CIs [28] for the adjusted regression models. Two-sided P values < 0.05 indicated statistical significance.
In addition to the PRS, we evaluated each of the 41 CLL SNPs with the risk of MBL overall, LC-MBL, HC-MBL, EA CLL, and AA CLL assuming a log-additive model in logistic regression. Because these SNPs were selected a priori, we used the nominal level (P < 0.05) for statistical significance. The data were analyzed using IBM SPSS Statistics version 25 (IBM Corp, Armonk, NY, USA) and R 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).
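The PRS construction and quintile categorization described above can be sketched as follows. This is a minimal illustration, not the published implementation: the function names are hypothetical, the example odds ratios are invented for demonstration, and the score is computed as a plain weighted sum of risk-allele counts because this section does not state whether any further normalization is applied. The quintile cut points here are estimated from a list of control scores; the handling of scores that fall exactly on a cut point is also an assumption.

```python
import math
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def polygenic_risk_score(risk_allele_counts: Sequence[float],
                         odds_ratios: Sequence[float],
                         weighted: bool = True) -> float:
    """Sum over SNPs of (weight x number of risk alleles).

    weighted=True uses log(OR) per SNP as the weight;
    weighted=False gives the unweighted score (a risk-allele count).
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    weights = [math.log(o) if weighted else 1.0 for o in odds_ratios]
    return sum(w * g for w, g in zip(weights, risk_allele_counts))


def quintile_cutoffs(control_scores: Sequence[float]) -> List[float]:
    """Four cut points that split the control score distribution into fifths."""
    return quantiles(control_scores, n=5)


def assign_quintile(score: float, cutoffs: Sequence[float]) -> int:
    """Return 1 (lowest) to 5 (highest); quintile 3 is the reference category."""
    return bisect_right(list(cutoffs), score) + 1


if __name__ == "__main__":
    # Hypothetical three-SNP example (the study uses 41 SNPs).
    counts = [2, 1, 0]
    ors = [1.5, 1.2, 1.3]
    print(polygenic_risk_score(counts, ors))
    print(polygenic_risk_score(counts, ors, weighted=False))
    cuts = quintile_cutoffs(list(range(1, 101)))
    print(assign_quintile(50, cuts))
```

Defining the cut points in controls and then applying them to cases, as in the text, means that a quintile's share of cases can depart from 20%, which is what the quintile-based odds ratios capture.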

Results

We evaluated associations of the individual SNPs and the CLL-PRS in 3887 individuals of EA and 408 AA individuals. Collectively, this included 560 EA MBLs (396 LC-MBLs and 164 HC-MBLs), 696 EA CLLs, 173 AA CLLs, 2631 EA controls, and 235 AA controls. The demographics of these individuals are shown in Table 1.

Table 1 Demographic characteristics by phenotype.

Individual CLL-susceptibility SNPs and risk of CLL and MBL

The results for each of the 41 individual CLL-susceptibility SNPs for risk of CLL, MBL and MBL subtypes (LC-MBL and HC-MBL) are shown in Supplementary Tables 2 and 3. Among CLL cases and controls of EA, the ORs of 40 (98%) SNPs out of the 41 were directionally consistent with those reported in the larger CLL GWAS [12,13,14,15,16,17,18,19], and 32 (78%) SNPs out of the 41 were statistically significant at P < 0.05 (Supplementary Table 3). We also evaluated the 41 individual CLL-susceptibility SNPs among AA CLL cases and controls (Supplementary Table 3). Among the 41 SNPs, ORs of 22 SNPs (54%) were directionally consistent with those reported in CLL GWAS of EA and only two SNPs, rs7690934 (OR = 1.41, CI: 1.03–1.95, P = 0.03) and rs1679013 (OR = 1.56, CI: 1.08–2.25, P = 0.02), were nominally significant (Supplementary Table 3). The lack of statistical significance for the other SNPs in the AA appears to be due in part to the variability of minor allele frequencies (MAF) across EA and AA. The median difference in the MAF between EA and AA across the 41 CLL SNPs was 7.2% (range: 0.2–26%) in the 1000 Genomes Project data, with the majority of the MAFs in the AA being lower than those in the EA (Supplementary Table 4, Supplementary Fig. 2). The lower MAF in the AA then translates to attenuated ORs in the AA compared to those in the EA (Supplementary Fig. 3). Among MBL overall, the observed ORs for 39 (95%) of the 41 SNPs were directionally consistent with those reported in CLL, and 21 (51%) of the 41 SNPs were nominally statistically significant at P < 0.05, whereas 15 of the 41 SNPs showed little evidence of an association (OR < 1.1) (Supplementary Table 2).

CLL-PRS and risk of MBL overall

The median CLL-PRS was 7.90 and 7.46 among 560 MBLs and 2631 controls of EA, respectively (Table 2). The PRS distribution among controls was consistent and overlapped with the distribution of 7983 controls from the InterLymph Consortium [20] (Supplementary Fig. 4). The continuous PRS had a 1.86-fold increased risk for MBL (CI: 1.67–2.07, P = 1.9 × 10⁻²⁹), with a c-statistic of 0.72 (CI: 0.69–0.73) (Table 2). Compared to the middle quintile, the highest quintile had 2.38-fold increased risk for MBL (CI: 1.81–3.13, P = 5.5 × 10⁻¹⁰), and the lowest quintile had a 54% reduced risk (OR = 0.46, CI: 0.32–0.66, P = 2.9 × 10⁻⁵) (Table 2). The 99th percentile (5.5% of MBL) compared to the middle quintile had a 4.83-fold increased risk for MBL (CI: 2.81–8.31, P = 1.3 × 10⁻⁸).
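Two quantities recur in these results: the c-statistic and the percent change in risk implied by an odds ratio. The sketch below shows one standard way to read each, under stated assumptions. The c-statistic is computed as the proportion of case-control pairs in which the case has the higher predicted risk, with ties counted as one half; in the study, the predicted risks come from logistic models adjusted for age and sex, whereas this sketch simply takes a list of scores per group. The function names and example scores are hypothetical.

```python
from typing import Sequence


def c_statistic(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control (ties count as half). 0.5 means no discrimination."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


def percent_risk_change(odds_ratio: float) -> float:
    """Percent change in odds relative to the reference category.
    An OR of 0.46 corresponds to a 54% reduction."""
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical predicted risks for three cases and three controls.
    print(c_statistic([0.8, 0.6, 0.4], [0.5, 0.3, 0.2]))
    print(round(percent_risk_change(0.46)))
    print(round(percent_risk_change(2.38)))
```

The pairwise loop is quadratic in sample size, which is adequate for illustration; rank-based formulas give the same value more efficiently for large cohorts.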

Table 2 PRS and association with MBL risk among individuals of European ancestry.

CLL-PRS and risk of LC-MBL

Among 396 LC-MBL, only 10% were in the lowest PRS quintile, while 34% were in the highest quintile. The median PRS was 7.84, and the continuous PRS had a 1.75-fold increased risk for LC-MBL (CI: 1.55–1.98, P = 7.5 × 10⁻¹⁹) compared to the Biobank controls, with a c-statistic of 0.72 (CI: 0.70–0.75) (Table 3). Compared to the middle quintile, the highest quintile had 2.10-fold increased risk for LC-MBL (CI: 1.53–2.88, P = 4.0 × 10⁻⁶), and the lowest quintile had a 49% reduced risk (OR = 0.51, CI: 0.34–0.76, P = 0.001) (Table 3). The 99th percentile (4.3% of LC-MBL) compared to the middle quintile had a 3.69-fold increased risk for LC-MBL (CI: 1.94–7.02, P = 6.8 × 10⁻⁵).

Table 3 PRS and association with MBL subtypes and CLL among individuals of European ancestry.

CLL-PRS and risk of HC-MBL

Among 164 HC-MBL individuals, only 6% were in the lowest PRS quintile, while 44% were in the highest quintile. The median PRS was 8.05, which was higher than the LC-MBL PRS (Fig. 1). When comparing the HC-MBL individuals to the 2631 Biobank controls, the continuous PRS had a 2.14-fold increased risk for HC-MBL (CI: 1.80–2.56, P = 3.9 × 10⁻¹⁷), with a c-statistic of 0.73 (CI: 0.69–0.77) (Table 3). Compared to the middle quintile, the highest quintile had 3.13-fold increased risk for HC-MBL (CI: 1.97–4.98, P = 1.0 × 10⁻⁶), and the lowest quintile had a 67% reduced risk (OR = 0.33, CI: 0.15–0.70, P = 0.004) (Table 3). The 99th percentile (8.5% of HC-MBL) compared to the middle quintile had an 8.18-fold increased risk for HC-MBL (CI: 3.85–17.4, P = 4.6 × 10⁻⁸).

Fig. 1: Polygenic risk score distribution among controls, LC-MBL, HC-MBL, and CLL European ancestry and African-American individuals.

A Boxplots representing the CLL-PRS distribution among EA controls, LC-MBL, HC-MBL, and CLL. The white line in the box represents the median score of 7.46, 7.84, 8.05, and 8.24 for controls, LC-MBL, HC-MBL, and CLL, respectively. P value represents the statistical difference of the CLL-PRS between the four groups. B Boxplots representing the CLL-PRS distribution among AA controls and CLL. The white line in the box represents the median score of 7.25 and 7.53 for controls and CLL, respectively. P value represents the statistical difference between the CLL-PRS in AA controls and CLL. Y-axis (CLL-PRS) represents a weighted average across 41 CLL risk SNPs. AA African-American, CLL chronic lymphocytic leukemia, EA European Ancestry, HC high-count, LC low-count, MBL Monoclonal B-cell lymphocytosis, PRS polygenic risk score.

CLL-PRS and risk of CLL among individuals of EA

Among 696 CLL patients, only 5% were in the lowest PRS quintile, while 49% were in the highest quintile. The median PRS was 8.24, which was higher than both the LC-MBL and HC-MBL PRS (Fig. 1). When comparing the CLL cases to the Biobank controls, the continuous PRS had a 2.53-fold increased risk for CLL (CI: 2.27–2.81, P = 4.0 × 10⁻⁶³), with a c-statistic of 0.77 (CI: 0.75–0.79) (Table 3). Compared to the middle quintile, the highest quintile had 3.49-fold increased risk for CLL (CI: 2.70–4.51, P = 1.2 × 10⁻²¹), and the lowest quintile had a 69% reduced risk (OR = 0.31, CI: 0.21–0.46, P = 1.0 × 10⁻⁸) (Table 3). The 99th percentile (6.6% of CLL) compared to the middle quintile had a 5.98-fold increased risk for CLL (CI: 3.61–9.93, P = 4.3 × 10⁻¹²).

When comparing the PRS between controls, LC-MBL, HC-MBL, and CLL, we found a significant difference (P = 4.3 × 10⁻⁸⁵, Fig. 1A). There was also a significant positive trend between the PRS effect sizes and risk of LC-MBL, HC-MBL, and CLL, with the association increasing as the clone size increases (P for heterogeneity = 1.5 × 10⁻⁵).

CLL-PRS and risk of CLL among African-American individuals

We calculated the EA CLL-PRS among 173 AA CLL and 235 AA controls. The median PRS was 7.53 and 7.25 among CLL and controls, respectively (Table 4, Fig. 1B). We observed a 1.76-fold increased risk of CLL (CI: 1.34–2.31, P = 5.1 × 10⁻⁵) with a c-statistic of 0.62 (CI: 0.57–0.68). Moreover, when eliminating the weights that were generated from the EA, the unweighted CLL-PRS effect size was attenuated but remained statistically significant (continuous OR = 1.07, CI: 1.01–1.13, P = 0.03) (Table 4).

Table 4 Association between the CLL-PRS and risk of CLL among African Americans.

Environmental exposures and risk of MBL in the Mayo Clinic Biobank

In the Mayo Clinic Biobank, we had 2512 controls and 365 MBL individuals who completed a self-reported questionnaire. Because the vast majority of MBLs from the Biobank were LC-MBL (only 9 individuals were HC-MBL), we evaluated the effect of these exposures on MBL risk overall (Supplementary Table 1). As expected, age per 10 years (OR = 1.83, CI: 1.64–2.04, P < 0.0001) and male sex (OR = 1.73, CI: 1.38–2.15, P < 0.0001) were associated with higher risk of MBL. Family history of leukemia/lymphoma was higher among MBL cases (N = 40, 13.1%) compared to controls (N = 205, 9.5%); however, it did not cross the threshold of significance (OR = 1.44, CI: 0.99–2.09, P = 0.06, adjusted for age and sex, Supplementary Table 1). Prior history of cancer other than leukemia or lymphoma was significantly higher (P < 0.0001) among MBL cases (N = 156, 43%) compared to controls (N = 769, 31%); however, the association was not statistically significant after adjusting for age and sex (OR = 1.22, CI: 0.96–1.55, P = 0.10, Supplementary Table 1). Within specific cancers, prior history of melanoma and non-melanoma skin cancers, prior history of sarcoma, and, among women, prior history of breast cancer were significantly higher in MBL cases compared to controls; however, none of these specific prior cancers were associated with MBL risk after adjusting for age and sex (Supplementary Table 1). No other exposures were found to be statistically associated with MBL risk, including prior history of type 2 diabetes, prior history of any autoimmune condition, or prior diagnosis of hepatitis A, B, or C.

Discussion

Our study clearly demonstrated that an inherited genetic component exists for the development of MBL, both among a cohort of 410 asymptomatic individuals from the Mayo Clinic Biobank who were screened for MBL and among a cohort of 150 MBLs who were clinically identified in the Division of Hematology. We observed that ~50% of the known 41 SNPs from 37 CLL-susceptibility loci and the CLL-PRS comprised of these 41 SNPs were associated with MBL overall risk. Two prior studies evaluated risk of MBL with SNPs from 10 CLL-susceptibility loci among 419 MBLs [29] and from 8 CLL-susceptibility loci among 60 familial MBLs from CLL families [30]. All three studies found statistically significant associations with SNPs in the 2q37.1 locus, and two of the three studies (excluding the familial MBLs) found significant associations at the 6p25.3, 8q24.21, 11q24.1 and 16q24.1 loci. Our study also found associations at these loci. With the additional 32 CLL-susceptibility loci evaluated herein, we found significant SNP associations with MBL risk from 12 more loci. Of particular interest, we found no or limited evidence of association (OR < 1.10 and P > 0.05) for 12 known CLL risk loci. Because the SNPs in these loci have been repeatedly found to be associated with risk of CLL, this suggests that these loci may be associated with progression from MBL to CLL rather than associated with initiation of the B-cell clone. Further studies are needed to evaluate this hypothesis.

We previously reported that the CLL-PRS was associated with MBL risk among a cohort of 95 familial MBLs with a 2.3-fold increased risk [20]. Herein, among a cohort ascertained agnostic to family history of CLL, we also report an association of the CLL-PRS with risk of MBL. In both studies, the CLL-PRS had good discrimination (after adjusting for age and sex) with an estimated c-statistic of 0.77 in the family study and 0.72 in this study. We next evaluated the CLL-PRS among the LC-MBL and HC-MBL subsets. We observed a significant association with a 1.75-fold and 2.14-fold increased risk for LC-MBL and HC-MBL, respectively. Moreover, the increase in the effect size from LC-MBL to HC-MBL to CLL was statistically significant. Because not all MBLs progress to CLL, further study is needed to determine whether the CLL-PRS could discriminate progression to CLL among individuals with MBL. Based on our data, there is strong evidence that those MBLs with high PRS will have a greater chance of progression to CLL compared to those MBLs with a low PRS.

For the first time, we evaluated the CLL-PRS in AA CLL cases and controls based on genetic ancestry and found a significantly increased risk for CLL, though with an attenuated effect (1.76-fold) and less discrimination (c-statistic = 0.62) compared to our EA CLL cases and controls. These findings are not surprising given the known differences in the genetic landscape (i.e., allele frequencies and linkage disequilibrium) between populations of EA and AA. Moreover, the CLL-PRS is comprised of SNPs that were identified through GWAS of individuals of EA and includes the estimated ORs from these EA GWAS as the weights in the PRS calculation instead of weights obtained from AA GWAS of CLL, which has yet to be done. When we used an unweighted PRS, we also observed a significant, although attenuated, association. These results highlight that the EA PRS is a weaker predictor for AA individuals than for EA individuals. Thus, there is a need for a GWAS of CLL among AA in order to identify CLL-susceptibility SNPs which may be unique to AA CLL or SNPs that are more informative within known CLL loci. A PRS can then be developed based on these more representative SNPs.

We previously reported that the CLL-PRS had a 2.49-fold increased risk among CLL cases and controls of EA from the InterLymph Consortium [20], but because these individuals were used to identify at least 50% of the CLL-susceptibility SNPs, the statistical significance and the effect size of the PRS would have been inflated (i.e., winner’s curse [31]). Thus, we used an independent cohort of EA CLL cases and controls and reported consistent results of the CLL-PRS with a 2.53-fold increased risk. We also previously evaluated the CLL-PRS among CLL cases and controls ascertained from CLL families that had at least 2 family members with CLL and also found a consistent effect of the CLL-PRS with a 2.44-fold increased risk [20]. Importantly, across these three sets of CLL cases and controls, we also see strong and consistent discriminatory ability of the CLL-PRS, along with age and sex, with c-statistics of 0.79, 0.80, and 0.77, respectively. Collectively, these results affirm and again demonstrate that the CLL-PRS is a strong predictor of CLL risk.

Among the environmental exposures evaluated beyond age and sex, we observed suggestive, although not statistically significant, evidence after adjusting for age and sex that a prior history of cancer or a family history of leukemia or lymphoma may be associated with MBL risk. A prior study by Casabonne et al. of 72 MBLs and 380 controls screened not to have MBL also found suggestive, albeit non-significant, evidence that a prior history of cancers increased risk of MBL [32]. In addition, several family studies reported elevated prevalence rates of MBL among relatives of CLL families compared to that of the general population [9, 10, 33]. Casabonne et al. also found evidence that exposures to infectious agents (e.g., history of pneumonia) increased MBL risk and that prior history of vaccination (e.g., vaccinated against pneumococcal or influenza) decreased MBL risk [32]. No other medical, occupational, or lifestyle exposures evaluated herein were found to be associated with risk of MBL.

In conclusion, inherited genetic factors, and not environmental factors, are associated with risk of MBL. We reported that some, but not all, of the CLL-susceptibility SNPs and the CLL-PRS were associated with risk of initiation of the MBL clone among individuals of EA, suggesting the possibility that the remaining SNPs are associated with progression to CLL. We also demonstrated that the CLL-PRS is a strong and significant predictor of risk for CLL among individuals of EA agnostic to family history and a somewhat weaker predictor of risk among AA individuals, supporting the need for further work in this population. Most importantly, the results of this study may help identify individuals at higher risk of developing MBL and CLL beyond the known risk associated with age, male sex, and family history of CLL in individuals of EA.