B-cell lymphomas (BCL) are an etiologically, clinically, and histologically heterogeneous group of malignant diseases of B lymphocytes. Immunodeficiency and autoimmunity are strong B-cell lymphoma risk factors. Chronic B-cell activation is suspected to be an important mechanism contributing to the accumulation of genetic errors that can lead to lymphomagenesis1. Increased serum/plasma levels of molecules involved in B-cell activation, among which soluble (s)CD23, sCD27, sCD30, sCD44, and CXCL13, have been associated with the development of acquired immune deficiency syndrome (AIDS)-related BCL2,3,4,5,6,7. Recently, studies within general population cohorts incorporating serologic measurements of cytokines, chemokines, and other immune markers have provided important evidence supporting a role for subtle immunologic effects in lymphomagenesis even among non-immunocompromised individuals8,9,10,11,12. Elevated serum levels of sCD23, sCD27, sCD30, and CXCL13 have subsequently been shown to be associated with BCL development in immunocompetent individuals8,9,10,11,12.

The risk of BCL has been associated with lifestyle, viral, and environmental factors13,14. A large study from the International Lymphoma Epidemiology Consortium (InterLymph) showed several risk/protective factors for Non-Hodgkin lymphoma subtypes13. These included a family history of hematologic malignancy, history of autoimmune disease, hepatitis virus C (HCV) infection, body mass index (BMI), height, smoking, and occupation, which were all associated with increased risk of NHL and/or one of its subtypes (CLL, FL, or DLBCL), while alcohol intake (≥ 1 drink per month), better socio-economic position (SEP), history of atopic disease, and recreational sun exposure were linked with reduced risks. Given the central role of the immune system in lymphomagenesis, most risk factors may influence BCL risk through modulation of the immune function15. Indeed, diet and obesity have an important influence on the immune system as immune functions are sensitive to both under- and over-nutrition16. Obesity promotes increased production of cytokines and leptin16, and the latter, in turn, has been shown to enhance B-cell survival17.

Infection with hepatitis B virus (HBV) and HCV correlates with high sCD3018,19,20 and sCD2321,22 serum levels. In a study of HCV-infected patients with and without BCL, a signature involving sCD27, sIL-2Rα, gamma globulins, and complement factor 4 was associated with the presence of overt BCL23.

Moreover, exposure to environmental factors suspected to be acting as lymphomagens (i.e. trichloroethylene and 2,3,7,8-tetrachlorodibenzo-p-dioxin) altered sCD27 and sCD30 plasma levels24,25. Studies have shown that alcohol consumption may be associated with a decreased risk of BCL13. Alcohol is associated with dysregulation of cytokines and chemokines, which may mediate alcohol-induced tumor promotion. Alcohol may also affect transcription factors and signaling pathways that regulate the expression/function of cytokines and chemokines26. Alcohol abuse impairs both the number and function of B cells. Chronic alcohol consumption reduces B-cell numbers, decreases antigen-specific antibody responses, increases the production of auto-antibodies, and interferes with B-cell development and maturation. Moreover, alcohol’s impact on T and B cells increases the risk of infections (e.g., pneumonia, HIV infection, hepatitis C virus infection, and tuberculosis)27. However, to date, the direct mediating effect of immune markers on the association between these factors and BCL risk has not been investigated in a prospective setting.

In our previous study among participants of two prospective cohorts (32 BCL cases overlap with cases included in the present study), which also included a meta-analysis of published data on sCD27 and sCD30, we reported a highly consistent association between these markers and increased risk of BCL subtypes12. Pre-diagnostic sCD23 was a strong predictive marker (area under the curve (AUC) = 0.88) for diagnosis of CLL (179 CLL cases overlap with cases included in the present study)28. So far, previous studies have limitations in terms of sample size, in particular for BCL subtypes, limiting their power to detect associations of moderate magnitude.

In the present study, we aimed to extend our previous findings in a much larger sample from the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort by examining the relationship between pre-diagnostic levels of the most promising previously reported B-cell activation markers (sCD23, sCD27, sCD30, and CXCL13) and subsequent development of the major BCL subtypes, i.e. diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), and chronic lymphocytic leukemia (CLL). Moreover, we aimed to explore the potential clinical utility of these markers for screening and their possible mediating effects on the association between anthropometric and lifestyle factors and BCL subtypes.


Table 1 shows the characteristics of the study population. Overall, case and control subjects did not differ with regard to risk factors and covariates. Median time between recruitment (i.e., blood collection) in the study and diagnosis of BCL was 9 years (range, 0.2–19). Blood levels of all B-cell activation markers were significantly higher in cases compared to controls (Table 1). Spearman correlations were calculated between the various immune markers among the total BCL cases, controls, and by lymphoma subtypes, ranging from 0.14 to 0.63 (Supplementary Table 1). We found the lowest correlations between these immune markers among controls, while sCD30 was highly correlated with sCD27 in CLL, and with all other markers in FL. Levels of the markers were positively correlated with age in both cases and controls except for CXCL13 in control subjects (Supplementary Fig. 1). Compared to male participants, female subjects had slightly higher levels of sCD30 and CXCL13 that met the threshold of statistical significance (Supplementary Table 2). There was no significant difference in levels of the markers in different countries except for CXCL13 (Supplementary Table 2).

Table 1 General characteristics of B-cell lymphoma cases and controls.

Association between immune markers and lymphoma development

Multivariable conditional logistic regression analyses for all BCL cases showed a significant association for all markers when analyzed as categorized variables (Table 2). The subtype-specific analyses rendered similar results except for a non-significantly increased risk of CLL with elevated levels of CXCL13 (Table 2). To account for the correlation between the markers, all markers were also modeled together. The combined multivariable models showed a significant association between sCD23 and all three BCL subtypes, and between CXCL13 and FL and DLBCL (Table 3). Moreover, we found a borderline significant association between sCD27 and all three BCL subtypes.

Table 2 Multivariable conditional analyses: odds ratio (OR) and 95% confidence interval (CI) for individual immune marker (categorical variable) and B-cell lymphoma and histological subtypes.
Table 3 Odds ratio (OR) and 95% confidence interval (CI) for combined immune marker (categorical variables) and B-cell lymphoma and histological subtypes.

Analyses using continuous measures of the markers showed similar results to the categorical analyses (Supplementary Table 3).

Analyses stratified by time-to-diagnosis (TTD)

To explore the possibility of reverse causation, associations of the markers with risk of BCL and subtypes were further stratified by median (9 years) duration of time between blood donation and diagnosis of BCL. Unconditional logistic regression among subjects diagnosed ≤ 9 years from the time of blood collection essentially showed similar associations as presented in Table 2. Subjects diagnosed more than 9 years from the time of blood collection with elevated levels of sCD23 and sCD27 showed an increased risk of CLL, while increased levels of all markers were associated with higher risk of DLBCL (Supplementary Table 4). In the combined model (including all markers together), associations of sCD23 with CLL and CXCL13, sCD23, and sCD27 with DLBCL remained significant.

The correlation between marker levels and time to diagnosis was evaluated. These analyses revealed higher levels of the markers in those cases with blood drawing closer to the diagnosis date (Fig. 1) except for sCD23 and sCD27 among DLBCL cases, which may suggest that serum levels of these markers are impacted by the disease itself.

Figure 1

Correlation between serum level of immune markers and time to diagnosis (TTD) for different BCL subtypes and controls; TTD for controls was calculated based on the diagnosis date of the index cases.

Receiver operating characteristic (ROC) and test performance analysis

The AUC, as a measure of how well a marker (or a group of markers) predicts the development of a BCL subtype, was calculated by tenfold cross-validation. We found an AUC of 0.80 for a model including sCD23 as a predictor of CLL. Addition of other markers to the model did not significantly increase the predictive ability for CLL (Supplementary Table 5). sCD23 levels showed the highest predictive ability for CLL among male participants (AUC = 0.85, p = 0.0003), and more particularly among those older than 60 years, as indicated by an AUC of 0.88 (p = 0.02) (Supplementary Table 6). sCD23 and CXCL13 showed a lower predictive ability for FL (AUC ~ 0.60) and DLBCL (AUC ~ 0.63) than for CLL, and addition of other markers did not significantly increase the AUCs for these subtypes (Supplementary Table 5).

While the AUCs gave an overall picture of the behavior of the markers across all cutoff values, there remains a practical need to determine the specific cutoff value that could be used for individuals requiring screening. Therefore, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were determined for different cut-off values of sCD23 for the prediction of CLL, FL, and DLBCL, and for different cut-off values of CXCL13 for the prediction of FL and DLBCL (Supplementary Tables 7 and 8). Generally, there is a trade-off between sensitivity and specificity. Using the cut-off value showing the highest specificity, subjects with sCD23 levels ≥ 3,608 pg/mL appeared to have a 70% probability of developing CLL after 9 years (Fig. 2a, Supplementary Table 7), the probability being slightly higher for male (0.76) than for female (0.62) participants (Fig. 2b, c, Supplementary Table 7). When analyzing sCD23 and CXCL13 test accuracy for DLBCL and FL, we observed that although the specificity of sCD23 was the same for FL and DLBCL, PPVs were low compared to CLL (Supplementary Table 8).

Figure 2

ROC curve and test accuracy parameters of sCD23 for cutoffs at the 30th (≥ 1,832 pg/mL), 60th (≥ 2,453 pg/mL), and 90th (≥ 3,608 pg/mL) percentiles for CLL (a) and stratified by gender (b, c); ROC receiver operating characteristic, PPV positive predictive value, NPV negative predictive value.

Mediation analyses

Finally, to evaluate the hypothesis that immune markers act as mediators on the causal pathway between known risk factors and B-cell lymphoma, a causal mediation analysis was conducted. Mediation analyses were performed only if (1) the risk factors were significantly associated with immune markers (Supplementary Table 9, model M) and (2) immune markers were found to be significantly associated with lymphoma subtypes in our combined models (Table 3). Selection of possible risk factors for mediation analyses was based on a large pooled study of 20 case–control studies13. Although most of these associations were only suggestive and not significant in EPIC (Model X in Supplementary Table 9), the cohort can still help to understand potential mechanisms owing to its prospective framework. Therefore, results of the mediation analysis should be seen as suggestive mediated associations. We found a positive association between BMI and DLBCL (average causal mediation effect (ACME) = 0.02) mediated through both sCD23 and CXCL13 (Table 4). We found a lower risk of DLBCL (ACME = − 0.02) with physical activity mediated through CXCL13. Finally, there was a trend toward significance for a protective effect of alcohol intake (ACME = − 0.05) mediated through sCD23 on CLL and for a protective effect of physical activity (ACME = − 0.01) mediated through CXCL13 on FL (Table 4). Sensitivity analyses were conducted to evaluate the robustness of the results from the causal mediation analysis. The analyses showed that as long as ρ was 0.4 or lower, the estimated mediated effects still had the same sign, indicating good robustness.

Table 4 Average direct effect and causal mediation effect (mediated through B-cell activation markers) of known risk factors on B-cell lymphoma subtypes.


In this prospective study, we determined the serum levels of previously reported lymphoma-associated immune markers and investigated how these markers correlate with the future risk of BCL histological subtypes and whether the risk of lymphoma is mediated by these markers. From these results, we confirmed the association between pre-diagnostically measured B-cell activation markers (sCD23, sCD27, sCD30, and CXCL13) and subsequent development of major BCL subtypes8,9,10,11,12,28,29. After adjustment for other immune markers, sCD23 remained significantly associated with all subtypes, while CXCL13 was found to be associated with FL and DLBCL only. The associations between sCD23 and CLL and DLBCL, and between CXCL13 and DLBCL, persisted among cases sampled more than 9 years before diagnosis, although the associations were attenuated in comparison to the findings from the models that covered the entire follow-up. sCD23 showed the highest predictive value for CLL. In addition, we assessed for the first time whether these markers may mediate the causal pathway between several risk/protective factors and later lymphoma risk. Our results suggest that sCD23 and CXCL13 partly mediated the positive association between BMI and DLBCL risk, while lower levels of CXCL13 were associated with higher physical activity, which partly explains the inverse association between physical activity and DLBCL risk. It should be noted that the measured biomarkers are individual markers in a complicated signaling milieu. Therefore, any mediating effect reported for a marker should be interpreted as a proxy for the underlying biologic milieu captured by circulating levels of that molecule, which partly mediates the reported associations.

We observed strong associations between sCD23, sCD27, sCD30, and CXCL13 in blood samples collected up to 19 years before diagnosis and risk of BCL development, consistent with previous studies8,9,10,11,12,28,29. The associations between sCD23 levels and development of BCL subtypes were the most stable associations in this study, particularly for CLL, for which the association was significant even more than 9 years before BCL diagnosis. In a recent study using the EPIC cohort with a maximum TTD of 12.5 years, the ROC curve for the prediction or diagnosis of CLL indicated that sCD23 is a strong predictive marker (AUC = 0.88)30, which is consistent with our current finding of AUC = 0.80. The earlier study included 179 CLL cases that were also present in our study. The AUC based on new cases only (maximum 19 years follow-up) was 0.73. Moreover, our study showed that sCD23 is apparently more predictive of future CLL risk in men than in women. Little is known about how sex-specific factors influence CLL incidence. Although androgen receptors play a part in lymphopoiesis, it is unclear how this relates to the sex differences in CLL. As a second explanation, sex-specific somatic alterations in the non-pseudoautosomal and pseudoautosomal regions on chromosomes X and Y have also been suggested to influence CLL incidence30.

CD23 is an integral membrane glycoprotein involved in IgE binding, and is found on mature B-cells, activated macrophages, eosinophils, follicular dendritic cells, and platelets. CD23 is expressed on the membrane in CLL, in some cases of FL, and primary mediastinal large B-cell lymphoma. CLL cells have a characteristic phenotype of sIglow/CD19+/CD5+/CD23+31. CD23 has been widely used as a marker in the differential diagnosis of CLL versus mantle cell lymphoma. Its soluble form, sCD23, is released from activated B-cells and can itself induce further B-cell stimulation as well as function as a potent mitogenic growth factor. Many reports suggest that elevated CD23, either on neoplastic cell surfaces or as a soluble form, is a useful marker in either diagnosis or prognosis of BCL31. Assuming that increased levels of sCD23 are caused by early stages of disease, our results further support that sCD23 would act as a marker for early detection of CLL28.

In a study within the Northern Sweden Health and Disease Study (NSHDS), B-cell activation markers were measured in two pre-diagnostic blood samples donated by 170 individuals before BCL diagnosis, along with 170 matched cancer-free controls29. The study showed that regardless of baseline B-cell activation marker concentration, BCL future risk was also associated with an increase of marker concentrations over time (slope). The predictive ability of these markers for response to treatment as well as their prognostic value for disease progression in particular in the trajectory from monoclonal B-cell lymphocytosis (MBL), an asymptomatic condition in which small numbers of clonal B cells are detectable in blood, to CLL must be further assessed.

CXCL13 is a CXC subtype member of the chemokine superfamily and acts via its receptor CXCR5 as one of the most potent B-cell chemo-attractants. CXCL13 expression is observed in BCL and diseases with B-cell activation. We reported that higher levels of CXCL13 were associated with increased risk of DLBCL, independently of sCD23, even among subjects diagnosed > 9 years after study initiation. Our finding is consistent with a previous report within the NSHDS (42 DLBCL cases)29, the Women’s Health Initiative Observational Study cohort (138 DLBCL cases)8, and Nurses’ Health Study and Health Professionals Follow-up Study (107 DLBCL cases)32. This would argue against the idea that increased marker concentrations would be attributable to undiagnosed disease, considering that the median survival of DLBCL is expected to be only a few months if left untreated29.

The fact that measured B-cell activation marker levels were higher in those cases with blood drawing closer to the diagnosis date may indicate the existence of undiagnosed lymphoproliferative disease, in particular for indolent BCL subtypes. On the other hand, this may also reflect biological processes involved in the onset of the disease, e.g. as a measure of the allostatic load on the B-cell compartment related to non-heritable factors such as lifestyle and environmental factors18,19,20,21,22,23,24,25. Previous studies showed that elevated concentrations of these markers were associated with disease states related to immune system activation, such as autoimmune diseases, hepatitis C, and HIV infection18,19,20,21,22,23. However, such information was not available on our subjects, except for data on HBV infection from only a minority of the subjects (n = 335).

Currently, no established screening programs for BCL development exist. Pre-diagnostic screening for risk factors of BCL in the general population has currently little clinical benefit. Moreover, cut-offs of marker levels to be used as a measure for disease progression have not been established yet. This would require the conduct of a large number of studies within various populations in order to establish normal levels and resulting in a more uniform reference for “abnormal” levels for use in clinical practice. In our view, the value of pre-diagnostic biomarkers is potentially much larger in groups of patients that are already at elevated risk of developing BCL, such as those with a family history of lymphoma and immunocompromised patients (i.e., primary immunodeficiency disorders, PID). PID patients carry an eightfold increased risk of lymphoma compared to the general population. Patients who have higher levels of the markers may thus be recommended to start intensive follow-up, while individuals with lower level of the markers may be advised for a less intensive follow-up. However, usefulness of these markers must be examined and validated among those patients before they could be rationally employed by physicians to improve human health.

To our knowledge, this is the first study to assess the hypothesis of B-cell activation markers as mediators on the potential pathway between risk factors and later lymphoma development. BMI was associated with a significant increased risk of DLBCL mediated through sCD23 and CXCL13. There is no clear consensus on how obesity impacts B-cell development and function, but several studies point to an involvement of B-cells in adipose tissue inflammation associated with obesity33,34,35. A further study in obese and non‐obese women revealed that body fat mass was positively correlated with total leukocyte, neutrophil, monocyte and lymphocyte counts36. While T‐cell function was comparable between obese and non‐obese women, B-cell function was about 50% higher in the obese group.

Physical activity was associated with decreased risk of DLBCL and FL mediated through CXCL13. There is evidence that moderate exercise is beneficial for the immune system37. Exercise induces changes in the number and function of cell subsets involved in the innate (e.g., neutrophils, monocytes, and natural killer cells) and the adaptive immune system (e.g., T and B cells)37,38. It has been shown that exercise can reduce insulin, glucose, and insulin-like growth factors, which may influence the proliferation of tumor cells in general. Physical activity also plays a role in the prevention of obesity and reduces the percentage of body fat38. However, it is not clear whether modulation of the immune system contributes to the potential antitumor properties of exercise.

The adaptive immune response, also called acquired immunity, is characterized by antigen‐specific T‐cell proliferation, immunological memory, B-cell activation, and production of antibodies. Evidence indicates that alcohol exposure can interfere with various aspects of the immune response and affect the different cellular components of the innate and/or adaptive immune system. Several studies reported that the number and function of B-cells are reduced by alcohol26,27. In our study, sCD23 mediated the association between alcohol intake and decreased risk of CLL. A meta-analysis provided evidence for a favorable role of alcohol drinking on NHL risk39. However, there is no clear biological explanation for this association.

A strength of our study is the relatively large number of incident cases of newly diagnosed major BCL subtypes in a cohort of cancer-free individuals with pre-diagnostic blood samples and prospective follow-up times of up to 19 years. This enabled us to carry out specific analyses according to BCL subtypes. This is particularly relevant since there is growing evidence that lymphoma subtypes have different pathological and epidemiological features13. However, limitations of our study should be considered when interpreting the results. We measured blood immune markers at a single time point, which may not reflect accurately the long-term B-cell activation status and may not capture the most important etiologic timing. We cannot exclude potential measurement errors derived from dietary questionnaires, which could lead to systematic and random errors when estimating alcohol intake. Likewise, anthropometric measures were ascertained at recruitment (with the exception of the Oxford, France and Norway cohorts—self-reported). Moreover, we cannot exclude the possible bias due to unmeasured confounders (e.g., immune diseases and infections), which are well known risk factors of BCL. If an unmeasured confounder is related to several mediators this may have resulted in a bias in the mediation results40. Due to the limited sample size, the results of this sensitivity analysis should be interpreted with caution. Notably, the measured immune markers are not only produced by those cell types considered to play pivotal roles in the immune system (lymphocytes), but also by fibroblasts, neutrophils, eosinophils, follicular dendritic cells, and platelets. So, blood levels of the markers may not necessarily reflect activity in the target tissue (lymph nodes)41.

In conclusion, increased B-cell activation marker levels present in blood years before BCL diagnosis suggest a role of B-cell activation in BCL development at early stages. These elevated levels may reflect a constitutional predisposition with shared underlying mechanisms for both indolent and aggressive lymphoma subtypes. Further studies investigating the biological and clinical impact of these markers are required. The mediating role of immune function in the association between lifestyle factors and BCL also needs further examination.

Materials and methods

Study population

The EPIC study is a prospective cohort study involving 23 centers from ten European countries (Denmark, France, Germany, Greece, the Netherlands, Italy, Norway, United Kingdom, Spain and Sweden) that was designed to investigate the potential relationships between diet, nutritional status, lifestyle and environmental factors and the incidence of cancer and other chronic diseases42. Over 500,000 healthy subjects in the age range 35–70 were recruited in the study during 1992–2000. The rationale, complete methodology and study design of the EPIC study have been described previously42,43. The ethical review boards from the International Agency for Research on Cancer and all local participating centers approved the study and all participants gave their informed consent. The study was conducted in accordance with the approved guidelines.

Standardized lifestyle, medical and personal history, and diet questionnaires were collected from the participants, and a blood sample was taken at enrollment. Anthropometric measurements were taken at recruitment (except in France, Oxford and Norway, where self-reported data were collected). Within 2 h of blood collection, blood samples were processed for the isolation of buffy coats and other fractions. Samples were transported on dry ice to the laboratory and stored at − 80 °C before analyses were performed.

Incident lymphoma cancer cases were identified by population cancer registries for Denmark, Italy, The Netherlands, Norway, Spain, Sweden and the United Kingdom. A combination of methods was used in France, Germany and Greece, as detailed previously42. Originally, the diagnosis of lymphoma cases was based on the ICD-O-2. All lymphoma cases were subsequently reclassified according to the SEER ICD-O-3 morphology codes44. For each incident BCL case identified within the cohort by December 2012, one random control was selected among all cohort members alive and free of cancer at the time of diagnosis of the index case, matched by country, center, gender, age at recruitment (± 12 months), date of blood collection (± 3 months), time of blood collection (± 3 h), and fasting at blood collection.

The current analysis was based on 516 case–control pairs for which a blood sample was available, consisting of 174 DLBCL, 132 FL, and 210 CLL (including small lymphocytic lymphoma) cases. For one CLL case and 9 controls, paired samples were missing. These subjects were included only in unconditional logistic analyses.

Inclusion of etiological factors for FL, DLBCL, CLL subtypes

Previously known BCL risk and protective factors13 that were available in EPIC were included in the study. These included BMI, height, smoking, education as proxy for SEP, alcohol intake and physical activity. The physical activity assessment included occupation as well as average recreational and household activity.

Measurement of immune markers

Serum levels of sCD23, sCD27, sCD30, and CXCL13 were measured by ELISA for all samples (eBioscience, USA: BMS286INST, BMS240INST kits, and R&D Systems, USA: DCD230 and DCX130 kits). Assays were performed in duplicate and according to the manufacturers’ protocols. All laboratory personnel were blinded with regard to case–control status. Matched case–control sets were assayed next to each other on the same plate in the same batch and quality control samples were run in duplicate along with the case–control sets in each batch. Inter- and intra-assay coefficients of variation were 2.2% and 3.1% for sCD23, 8.9% and 8.8% for sCD27, 4.8% and 7.3% for sCD30, and 5.2% and 6.8% for CXCL13.

Statistical analysis

Marker levels measured out of range of the calibration curve (sCD23 = 6.3%, sCD30 = 0.8%, sCD27 = 0.5%, CXCL13 = 19.1%) and missing values of smoking status (2%), education (7%), alcohol intake (0.4%), physical activity (2.7%) covariates (Supplementary Table 10) were imputed based on a maximum likelihood estimation method which was informed by the observed correlation structure within the data45. Blood levels of soluble markers were log transformed to normalize their distributions. Differences between cases and controls in baseline continuous covariates were assessed using the paired t test, and by χ2-test, for categorical variables. Spearman rank correlation was used to measure the degree of correlation between markers.
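As an illustration of the rank correlation step, the following is a minimal standard-library sketch of Spearman correlation with average ranks for ties. The helper names and the marker values are hypothetical, not study data. Because the logarithm is strictly increasing, the log transformation leaves the ranks, and therefore the Spearman coefficient, unchanged.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical marker concentrations, for illustration only.
marker_a = [45.0, 60.0, 52.0, 80.0, 71.0]
marker_b = [20.0, 33.0, 25.0, 30.0, 41.0]

rho_raw = spearman(marker_a, marker_b)
rho_log = spearman([math.log(v) for v in marker_a],
                   [math.log(v) for v in marker_b])
print(rho_raw, rho_log)
```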

Odds ratios (OR) and 95% confidence intervals (95% CI) for the subtypes of BCL in relation to immune markers (as continuous variables) were calculated by conditional logistic regression (CLR). The models were adjusted for BMI (kg/m2, continuous), alcohol intake (g/day; continuous), smoking status (never, former, current), physical activity levels based on the Cambridge Physical Activity Index (inactive, moderately inactive, moderately active, active)46 and educational level (none, primary, technical/professional, secondary, university/college). Quartiles of immune marker concentrations were calculated based on the distribution in control subjects, and CLR models were used to estimate the association between quartiles of marker levels and risk of BCL subtypes (first quartile as reference category). Tests for trend were calculated using the quartile number as a continuous variable. All markers were also modeled together (combined multivariable model) as it may be possible that one marker serves as a surrogate of another.
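The quartile categorization can be sketched as follows. This is a minimal illustration using Python's standard library, not the authors' code: the function names and control values are hypothetical, the cut points use the default interpolation of `statistics.quantiles`, and assigning a level equal to a cut point to the upper category is an assumption made here.

```python
import bisect
import statistics


def quartile_cutpoints(control_levels):
    """Three cut points (25th, 50th, 75th percentiles) from controls only."""
    return statistics.quantiles(control_levels, n=4)


def quartile_of(level, cutpoints):
    """Quartile number 1..4; a level equal to a cut point goes to the upper
    category (an assumed convention)."""
    return bisect.bisect_right(cutpoints, level) + 1


# Hypothetical control marker levels, for illustration only.
controls = [10, 20, 30, 40, 50, 60, 70, 80]
cuts = quartile_cutpoints(controls)

# Cases and controls are both classified against the control-based cut points.
for level in (15, 30, 50, 90):
    print(level, "-> quartile", quartile_of(level, cuts))
```

The resulting quartile number (1 to 4) is the quantity that would enter a model as a continuous term for a trend test, with quartile 1 as the reference category in the categorical analysis.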

Associations of the markers with risk of BCL and subtypes were additionally stratified by median duration of time between blood donation and diagnosis of BCL (time-to-diagnosis: TTD) to explore the possibility of reverse causation. In these analyses, to preserve statistical power, subtype cases were compared to all controls, and the (unconditional) models were additionally adjusted for matching variables (i.e., country, gender, and age at recruitment) and plate number.

Receiver operating characteristic (ROC) analysis and AUC comparisons were used to determine the discriminative ability of the markers separately or in combination with other markers. The AUCs were corrected for bias due to overfitting by tenfold cross-validation. For each fold, the AUC was calculated, and the mean of the fold AUCs was the cross-validated AUC estimate. Four objective measures of test performance were further calculated, namely, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for different cut-off values of the markers found to be informative for future risk of BCL47. Cut-off values were calculated based on deciles of ranked levels of the markers in control subjects.
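The fold-averaging step can be sketched as below. This is a simplified illustration under stated assumptions, not the study's procedure: a single marker is used directly as the score (so no model is refitted within folds, which is reasonable only when higher levels indicate higher risk), folds are stratified by case status, and all names and values are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a random case outscores a random control
    (ties count one half); equivalent to the area under the ROC curve."""
    cases = [s for s, is_case in zip(scores, labels) if is_case]
    controls = [s for s, is_case in zip(scores, labels) if not is_case]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def cross_validated_auc(scores, labels, n_folds=10, seed=0):
    """Mean of the per-fold AUCs. Requires at least n_folds cases and
    n_folds controls so that every fold contains both groups."""
    idx = list(range(len(scores)))
    random.Random(seed).shuffle(idx)
    case_idx = [i for i in idx if labels[i]]
    control_idx = [i for i in idx if not labels[i]]
    fold_aucs = []
    for f in range(n_folds):
        held_out = case_idx[f::n_folds] + control_idx[f::n_folds]
        fold_aucs.append(auc([scores[i] for i in held_out],
                             [labels[i] for i in held_out]))
    return sum(fold_aucs) / n_folds


# Hypothetical marker levels: 20 controls and 20 cases with overlapping ranges.
control_scores = list(range(1000, 3000, 100))
case_scores = list(range(2000, 4000, 100))
scores = control_scores + case_scores
labels = [False] * 20 + [True] * 20

full_auc = auc(scores, labels)
cv_auc = cross_validated_auc(scores, labels)
print(full_auc, cv_auc)
```

When several markers are combined, a model would have to be fitted on the training folds and evaluated on the held-out fold; that step is omitted here.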

Causal mediation analysis was applied to study the average causal mediation effect (ACME) and the average direct (unmediated) effect (ADE) of immune markers linking risk factors to lymphoma48. The effect estimates represent the change in probability that the subject develops lymphoma when moving the exposure variable from the reference category to the exposure category via the mediated or direct paths. Further, the analysis provides an estimate of the proportion of the total effect of exposure on lymphoma development mediated through the measured marker. Included exposure variables were smoking: non-smoker at recruitment (0) vs. smoker (1), alcohol intake: non-drinker (0) vs. drinker (1), physical activity: first 3 categories (0) vs. active (1), education: primary school or lower/technical/vocational school (0) vs. secondary school/university/college (1), BMI: < 30 (0) vs. ≥ 30 (1), and height: < country median (0) vs. ≥ country median (1). We fitted two statistical models: the mediator (M) linear model for the conditional distribution of the mediator M given the risk factor X and a set of covariates C, f(M | X, C), and the outcome (Y) logistic model for the conditional distribution of the outcome Y given X, M, and C, f(Y | X, M, C). These models were fitted separately, and their fitted objects comprised the main inputs to the mediate function, which computes the estimated ACME and other quantities of interest under these models and the sequential ignorability assumption. Mediation analyses were applied only for the risk factors significantly associated with immune markers and for the immune markers found to be significantly associated with lymphoma subtypes in our combined models (sCD23 and CXCL13). Adjustments were made for country, sex, and age. Models for each risk factor were additionally adjusted for other risk factors. There was no significant interaction between the risk factors and sCD23 and CXCL13. Quasi-Bayesian confidence intervals were determined49. Sensitivity analyses were performed for deviations from the sequential ignorability assumption (which in particular implies no unmeasured pre-sample collection confounders), with deviations measured by the correlation ρ between the errors in the mediator and the outcome models. In the presence of confounders which affect both the mediator and the outcome, we expect that the sequential ignorability assumption is violated and ρ is no longer zero48. A large critical ρ value reversing the sign of ACME indicates the violation of ignorability assumption49.
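To make the two-model setup concrete, the sketch below computes the ACME, the ADE, and the total effect by Monte Carlo averaging for a linear mediator model and a logistic outcome model. It is an illustration under stated assumptions only: it is not the mediation package's algorithm, the covariates C are omitted, no exposure-mediator interaction is included, the residual of the mediator model is assumed normal, and all coefficient values are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def mediation_effects(a0, a1, sigma, b0, b1, b2, n_draws=20000, seed=1):
    """Mediator model: M = a0 + a1*X + e, with e ~ N(0, sigma^2).
    Outcome model:  P(Y = 1) = expit(b0 + b1*X + b2*M).
    X is binary (0 = reference category, 1 = exposure category)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in range(n_draws)]

    def mean_outcome(x_direct, x_mediator):
        # Average of P(Y = 1) with the exposure set to x_direct on the direct
        # path and the mediator taking the value it has under x_mediator.
        total = 0.0
        for e in noise:
            m = a0 + a1 * x_mediator + e
            total += expit(b0 + b1 * x_direct + b2 * m)
        return total / n_draws

    # ACME(t): change the mediator from M(0) to M(1), exposure fixed at t.
    acme = {t: mean_outcome(t, 1) - mean_outcome(t, 0) for t in (0, 1)}
    # ADE(t): change the exposure from 0 to 1, mediator fixed at M(t).
    ade = {t: mean_outcome(1, t) - mean_outcome(0, t) for t in (0, 1)}
    total_effect = mean_outcome(1, 1) - mean_outcome(0, 0)
    return acme, ade, total_effect


# Hypothetical coefficients (mediator on the log scale), for illustration only.
acme, ade, total_effect = mediation_effects(
    a0=7.5, a1=0.2, sigma=0.5, b0=-8.0, b1=0.1, b2=0.8)

average_acme = (acme[0] + acme[1]) / 2
proportion_mediated = average_acme / total_effect
print(average_acme, total_effect, proportion_mediated)
```

In this sketch the total effect decomposes as ACME(1) + ADE(0), or equivalently ACME(0) + ADE(1), and the effects are expressed as differences in probability, matching the interpretation given in the text.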

Statistical analyses were performed using the R 3.4.1 language and environment (The R Foundation for Statistical Computing, Vienna, Austria) and SAS (version 9.4; SAS institute, USA). The R package mediation (4.1.2) was used for causal mediation analysis49. All p values are two-sided, with p < 0.05 considered as statistically significant.