Introduction

Blood group systems are collections of polymorphic proteins and oligosaccharides expressed on the surface of red blood cells and other tissues. Each system comprises antigenic variants, commonly referred to as “blood types,” that are defined by alloantibody reactions among phenotypic variants [1, 2]. Although some of these antigenic families are defined at the protein level, most are determined in a monogenic fashion, including some of the more commonly studied variants of the ABO, H, Rhesus (Rh), Duffy, Lewis, MNS, and Lutheran blood groups [3].

While our understanding of the clinical significance of many less-studied blood group antigens continues to evolve, several blood group systems have well-established clinical associations. ABO’s roles in mediating acute hemolytic reactions and modulating hyperacute solid organ rejection have been extensively described and now form much of the scientific basis for blood product administration and allotransplantation, respectively [4]. Similarly, Rh antigenicity remains crucial in the management and prevention of fetal hemolytic disease. More recent data have also revealed associations between various blood group systems and susceptibility to infectious pathogens such as Plasmodium vivax malaria, Helicobacter pylori, norovirus, and severe acute respiratory syndrome coronavirus (SARS-CoV) [5,6,7,8]. In findings more germane to the present pandemic, individuals with non-O blood types, and particularly carriers of the A antigen, have consistently been shown to have a higher risk of SARS-CoV-2 infection [9,10,11] and of severe complications from it [12, 13] than individuals with type O blood, although the mechanisms underlying these associations are unclear.

Beyond the role of blood group antigens in immunologic reactions and initial susceptibility to infection, the functions of these antigens in host immunity and inflammation are poorly understood. Histo-blood group antigens (i.e., ABO and Lewis) have been found to affect the stability or organization of other glycans on the cell surface [14], possibly modulating binding of other antigens or marking cells for degradation [15]. For instance, it is well known that individuals expressing the A or B antigen have lower rates of clearance of von Willebrand factor, and consequently a higher risk of venous thromboembolism, than individuals with type O blood [16, 17]. Studies measuring inflammatory markers directly, however, have not yielded such unequivocal results. Specifically, while genome-wide association studies (GWAS) have identified associations of the ABO locus and ABO methylation with levels of circulating inflammatory markers [16, 18,19,20,21], these relationships failed to replicate in validation samples [16, 20]. Beyond ABO, the Duffy antigen, a marker with less well-defined clinical significance, has been shown to act as an atypical chemokine receptor that helps regulate circulating CXC and CC motif chemokine levels [22, 23].

These findings have raised interest in detecting and describing associations between circulating marker levels and blood group antigens. While the existing data justify further investigation, the markers evaluated so far were all selected a priori and represent only a small subset of the immune and inflammatory processes that may involve blood groups. Characterizing additional associations between blood types and immune markers may clarify the broader role of blood group antigens in host immunity and disease susceptibility and contribute to a fuller description of an individual’s inflammatory state. We therefore cataloged associations between blood group genotypes and a broad spectrum of circulating immune and inflammation markers in an exploratory reanalysis of existing data. To our knowledge, this is the first study to examine associations between blood groups and inflammation using marker data measured on multiplex panels. Ultimately, our aim is to improve understanding of how blood group antigens relate to systemic inflammation.

Results

The final analytic sample size was 3537. No participants were missing information on age, sex, or smoking status. As shown in Table 1, the weighted percentage of cancer cases in the study sample was 3.7%, while 9.7% were current smokers, 41% were former smokers, and 56.5% were male, corresponding to the respective prevalence values in the underlying Prostate, Lung, Colorectal, and Ovarian (PLCO) cohort.

Table 1 Selected characteristics (%) of participants in component case–control studies nested in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial.

Of the 1152 associations tested (12 blood types × 96 markers), ten markers were associated with a blood type at the Bonferroni-adjusted significance level of p < 4.3 × 10−5 (Fig. 1). The Duffy antigen group was associated with seven inflammatory markers (CXCL6/GCP2, CXCL5/ENA78, CCL11/EOTAXIN, CXCL1/GRO, CCL2/MCP1, CCL13/MCP4, and CCL17/TARC), and ABO was associated with three markers (soluble vascular endothelial growth factor receptor 2 (sVEGFR2), sVEGFR3, and soluble glycoprotein 130 (sGP130)). All associations remained statistically significant in additive models (Supplementary Table 1). No other blood types appeared to be strongly associated with any inflammatory marker (Supplementary Fig. 1). Complete results from the categorical and additive models are provided in Supplementary Tables 2 and 3. Findings were similar when associations were evaluated with Tobit models (Supplementary Tables 4 and 5).
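The Bonferroni threshold follows directly from the number of tests. The sketch below reproduces it, assuming a family-wise error rate of α = 0.05 (the conventional choice; α is not stated explicitly in the text, but it is the value consistent with the reported threshold).

```python
# Bonferroni-adjusted threshold for the blood type x marker scan.
# ALPHA = 0.05 is an assumption (conventional family-wise error rate).
N_BLOOD_TYPES = 12   # blood group systems evaluated
N_MARKERS = 96       # markers retained after excluding six poorly detected analytes
ALPHA = 0.05

n_tests = N_BLOOD_TYPES * N_MARKERS
threshold = ALPHA / n_tests

def is_significant(p_value: float) -> bool:
    """True if a joint Wald p value passes the Bonferroni threshold."""
    return p_value < threshold

print(n_tests, f"{threshold:.1e}")  # 1152 4.3e-05
```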

Fig. 1: Observed versus expected distribution of p values for associations between circulating inflammatory markers and blood antigen groups.

Each dot represents the p value from the joint Wald test of a linear or logistic model predicting marker concentration from blood type, adjusting for sex, age (5-year category), smoking status (never/former/current), study, and the first five principal components of population substructure. Labeled points denote associations that were significant after Bonferroni correction. Shaded bands represent a bootstrapped 95% confidence interval generated from 10,000 replicates, and dotted lines represent the expected distribution. Note the different y-axis scales for the top and bottom panels.

Among the ten blood type-marker associations identified among non-Hispanic whites, only the associations of Duffy with CXCL6/GCP2, CXCL1/GRO, and CXCL5/ENA78 were evident in other racial and ethnic groups (Supplementary Table 6 and Supplementary Fig. 2).

Post hoc evaluation of individual terms showed that individuals with Duffy types Fy(a−b+) and Fy(a+b+) had higher levels of all associated circulating chemokines than individuals with Duffy type Fy(a+b−), although the pairwise comparison between Fy(a+b+) and Fy(a+b−) for CXCL5/ENA78 was not significant after Bonferroni correction (Fig. 2). Individuals with type A blood had significantly lower levels of sVEGFR2, sVEGFR3, and sGP130 than individuals with type O or B blood; no significant differences were noted between type A and AB individuals (Fig. 3). Most correlations among the ten significantly associated markers were weak (|r| < 0.4), although moderate correlations were observed between sVEGFR2 and sGP130 (r = 0.55), CCL2/MCP1 and CCL11/EOTAXIN (r = 0.55), and CXCL5/ENA78 and CXCL1/GRO (r = 0.48).

Fig. 2: Estimated marginal mean (EMM) concentrations of circulating biomarkers among individuals with different Duffy phenotypes, adjusting for age (5-year categories), sex, study, smoking status (never/former/current), and the first five principal components of population substructure.

Bars represent 95% confidence intervals of the EMM. Overlapping black lines between bars represent pairwise comparisons that are not significantly different after Bonferroni correction.

Fig. 3: Estimated marginal mean (EMM) concentrations of circulating biomarkers among individuals with different ABO phenotypes, adjusting for age (5-year categories), sex, study, smoking status (never/former/current), and the first five principal components of population substructure.

Bars represent 95% confidence intervals of the EMM. Overlapping black lines between bars represent pairwise comparisons that are not significantly different after Bonferroni correction.

Discussion

We evaluated a wide range of circulating inflammation-related proteins and found that ten inflammatory markers were associated with Duffy and ABO antigenic variants. The associations were robust to alternative model specifications (additive genetic models and Tobit models) and were estimated with control for population stratification and nonrepresentative sampling. These findings may reflect baseline differences in the inflammatory state of individuals with these blood types and could help explain differences in immune reactivity and infection susceptibility among the corresponding subgroups.

All seven markers associated with Duffy antigen type were CC or CXC motif chemokines, five of which (CCL11/EOTAXIN, CXCL1/GRO, CCL2/MCP1, CCL13/MCP4, and CCL17/TARC) are known to have high affinity for the Duffy antigen receptor [24]. Mounting evidence suggests that the Duffy antigen receptor acts as a homeostatic reservoir for chemokines, buffering their circulating concentrations and sequestering them for release under conditions of immune challenge [22]. Prior GWAS have also noted associations between rs12075, which distinguishes the Fy(a) and Fy(b) antigens of the Duffy blood group, and circulating CCL2/MCP1 [25]. Clinically, inverse associations have been noted between Duffy antigen expression and metastatic potential in breast [26], pancreatic ductal [27], and non-small cell lung carcinomas [28], possibly owing to lower concentrations of pro-angiogenic chemokines in the tumor microenvironment. Notably, however, these models of tumor aggressiveness are based on comparisons between Duffy-positive (carriers of Fy(a) and/or Fy(b)) and Duffy-null individuals (carriers of neither Fy(a) nor Fy(b)). Given the heterogeneity within the Duffy-positive subgroup observed in the current study, our results suggest that distinctions between the Fy(a) and Fy(b) alleles may also be etiologically relevant to metastatic potential.

Two of the markers associated with ABO were soluble receptors for molecules in the VEGF superfamily, a class of proteins that regulate angiogenesis [29]. The soluble forms of VEGF receptors 2 and 3 compete with membrane-bound VEGF receptors for binding of pro-angiogenic VEGF-A and VEGF-C, thereby inhibiting outgrowth of blood and lymphatic vessels [30, 31]. The finding that individuals with type A blood had lower levels of circulating sVEGFR2 and sVEGFR3 is consistent with the higher risks of acute cardiovascular events, such as venous thromboembolism and myocardial infarction, observed in those with type A blood relative to those with type O [32]. Decreased expression of sVEGFR2 also correlates with metastatic potential in laboratory models of neuroblastoma [33] and breast cancer [34], suggesting that ABO-associated differences in angiogenic capacity may have implications for cancer progression and prognosis. For instance, lower prediagnostic levels of sVEGFR2 and sVEGFR3 have been associated with higher lung cancer risk in never smokers [35], while type A blood is associated with increased risks of pancreatic [36, 37], ovarian [38], and gastrointestinal cancers [32]. The extent to which these associations reflect inflammation and angiogenesis rather than other plausible mechanisms, such as H. pylori colonization [39], requires additional study.

Differences in circulating levels of sVEGFR2 and sVEGFR3 among individuals with type A blood may also have prognostic implications for COVID-19. For instance, an autopsy study of lungs from COVID-19 patients who died of acute respiratory distress found upregulated mRNA expression of both VEGF-A and VEGF-C [40]. Given that individuals with type A blood are also at increased risk of severe COVID-19 (defined as requiring a ventilator, death, or development of respiratory distress [9, 12, 13]), metabolic pathways involving sVEGFR2 and sVEGFR3 may underlie the higher susceptibility of individuals with type A blood to severe COVID-19.

Like the other two markers associated with ABO, sGP130 is a soluble receptor with an inhibitory function: it has an anti-inflammatory role, inhibiting the IL6/soluble IL6 receptor (sIL6R) trans-signaling cascade [41, 42]. IL6 trans-signaling induces T-cell recruitment to local sites of inflammation and plays an important role in the development of atherosclerosis when activated in vascular tissue [43]. Exogenous administration of sGP130 attenuates this effect in mice [44]; in humans with pre-existing cardiovascular conditions, however, elevated sGP130 levels are correlated with higher blood pressure and an increased risk of adverse cardiovascular events [45]. We found that individuals with type A blood had lower levels of sGP130 than those with type O or B blood, which, given the higher cardiovascular risk associated with type A blood, is consistent with an anti-atherosclerotic activity of the molecule. However, additional research is needed to understand the complex regulatory loops governing this association and its apparent pleiotropic effects.

Notably, we did not find associations between levels of inflammatory markers and blood groups other than Duffy or ABO. However, mechanisms other than inflammation may mediate some associations between blood types and diseases, such as the Kidd system with bladder cancer [46].

The strengths of our study include its large sample size, use of a well-characterized, population-based cohort, comprehensive assessment of inflammatory biomarkers using a reproducible method, the broad spectrum of blood group phenotypes inferred from high-quality genetic data, and statistical control for potential nonrepresentativeness arising from the use of case–control data. Nonetheless, our results should be interpreted in light of several limitations. First, because PLCO participants were predominantly non-Hispanic white, we had limited power to detect meaningful differences in inflammation levels by blood type among individuals of non-European ancestry. Thus, the findings may not generalize to racial and ethnic groups other than non-Hispanic whites. Second, our data are based on single measurements of circulating biomarker levels; serial assessments may better characterize an individual’s inflammatory state. In addition, because assays were conducted separately for each component study, laboratory variation may have limited our power to detect weaker associations. Third, several of the assays used to measure marker levels, including that for sCD30, may have been less sensitive than those used in previous studies, given that fewer of our samples were above the lower limits of detection. Finally, blood group phenotypes may have been misclassified because of rare variants in relevant genomic regions, which we did not assess, although any such misclassification is likely to be small and nondifferential.

Overall, our study suggests that Duffy and ABO blood types are associated with differences in levels of several circulating inflammatory markers. Further studies are warranted to determine whether differences in circulating levels of chemokines and angiogenic factors are etiologically or clinically relevant for Duffy- and ABO-associated conditions.

Materials and methods

Study population

Data for this study come from the PLCO Cancer Screening Trial. The design of the PLCO has been described in detail elsewhere [47, 48]. Briefly, ~155,000 men and women aged 55–74 years were recruited for the PLCO from the general population between 1992 and 2000 at ten sites across the United States. Participants randomized to the screening arm received periodic cancer screenings in the first 6 years of follow-up. In addition to basic demographic and behavioral information, nonfasting blood plasma samples were collected from screening arm participants at baseline and annually for five follow-up visits [48]. All participants provided informed consent.

We included participants from one laboratory pilot (n = 78) and seven cancer case–control studies nested within PLCO from prior investigations of non-Hodgkin lymphoma (NHL; 358 cases/295 controls) and of lung (study 1: 347 cases/513 controls; study 2: 411 cases/533 controls), ovarian (124 cases/127 controls), endometrial (280 cases/352 controls), colorectal (432 cases/358 controls), and upper gastrointestinal (57 cases/62 controls) cancers [49,50,51,52,53,54,55]. For participants included in multiple studies (n = 219), we used marker data from the study with fewer missing marker values or, given equal numbers of missing values, the first study of inclusion. Because of small numbers of participants with other racial and ethnic backgrounds in the PLCO cohort, we limited our primary analyses to self-identified non-Hispanic white participants, as confirmed by genetically determined ancestry. Additional details regarding individual study inclusion/exclusion and matching criteria are provided in Supplementary Table 7. Sample size was determined by the availability of extant data.
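The rule for participants who appear in more than one component study amounts to a two-level sort key: fewest missing marker values first, then earliest study of inclusion. A minimal sketch follows; the `StudyRecord` fields (`inclusion_order`, `n_missing_markers`) are hypothetical names for illustration, not the study's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class StudyRecord:
    participant_id: str
    study: str
    inclusion_order: int      # hypothetical: 0 = first study the participant entered
    n_missing_markers: int    # markers with no usable value in this study

def select_records(records):
    """Keep one record per participant: fewest missing markers, then earliest study."""
    best = {}
    for rec in records:
        key = (rec.n_missing_markers, rec.inclusion_order)
        current = best.get(rec.participant_id)
        if current is None or key < (current.n_missing_markers, current.inclusion_order):
            best[rec.participant_id] = rec
    return best

records = [
    StudyRecord("p1", "lung (study 1)", 0, 5),
    StudyRecord("p1", "colorectal", 1, 2),   # fewer missing values wins
    StudyRecord("p2", "NHL", 0, 3),
    StudyRecord("p2", "ovarian", 1, 3),      # tie, so the first study wins
]
print({pid: rec.study for pid, rec in select_records(records).items()})
# {'p1': 'colorectal', 'p2': 'NHL'}
```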

Laboratory analyses

Using prediagnostic serum samples from either recruitment (NHL, first lung, endometrial, colorectal, and upper gastrointestinal studies) or follow-up (second lung and ovarian studies) visits, serum levels of 102 immune and inflammation-related markers were measured on Luminex fluorescent bead-based assay panels (Millipore Inc., Billerica, MA). As described previously, assays were run at a single centralized laboratory according to the manufacturer’s protocol, with samples from cases and controls included in the same batches. Laboratory staff were blinded to patient data and clinical outcomes. Concentrations were determined with four- or five-parameter standard curves, and samples were run in duplicate and the measurements averaged. In addition, blinded replicates were included within and across batches to assess reproducibility. Intraclass correlation coefficients (ICCs) of log-transformed marker levels were >0.80 for 91% of markers in the lung and NHL studies, 98% in the lung replication study, 84% in the endometrial study, and 76% in the ovarian cancer study [50,51,52,53,54]. The upper gastrointestinal cancer study itself comprised multiple case–control studies and thus did not report overall ICCs. Because multiple assay lots were used for markers in the colorectal cancer study, we included results for each analyte only from the lot with the largest number of samples. Depending on the analyte, the number of included participants from the colorectal cancer study ranged from 668 (319 cases/349 controls) to 768 (365 cases/403 controls).
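Reproducibility was summarized with ICCs on the log scale, but the text does not say which ICC form was used. The sketch below assumes the one-way random-effects ICC(1), computed from one-way ANOVA mean squares over blinded replicates; the duplicate concentrations are hypothetical.

```python
import numpy as np

def icc_oneway(replicates):
    """One-way random-effects ICC(1) for an (n_samples, k_replicates) array.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), from one-way ANOVA mean squares.
    """
    x = np.asarray(replicates, dtype=float)
    n, k = x.shape
    sample_means = x.mean(axis=1)
    msb = k * np.sum((sample_means - x.mean()) ** 2) / (n - 1)   # between samples
    msw = np.sum((x - sample_means[:, None]) ** 2) / (n * (k - 1))  # within samples
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical blinded duplicate concentrations (pg/mL) for four samples,
# log-transformed before computing the ICC.
duplicates = np.array([[10, 11], [20, 19], [40, 42], [80, 78]])
icc = icc_oneway(np.log(duplicates))
print(round(icc, 3), icc > 0.80)
```

Other ICC forms (e.g., two-way models that separate batch effects) could give somewhat different values on the same data.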

After pooling the data, we excluded six markers (interleukin 3, leukemia inhibitory factor, soluble CD30, soluble interleukin 1 receptor 1, soluble receptor for advanced glycation end products, and soluble vascular endothelial growth factor receptor 1) that were detectable in fewer than 10% of samples. A complete list of the markers assessed and the studies in which they were measured is available in Supplementary Table 8.

Genotyping

Participants were genotyped on the Global Screening Array, Oncoarray, OmniX, or Omni 2.5 platform. We evaluated twelve blood group systems based on the commonly encountered single-nucleotide polymorphisms (SNPs) that define their phenotypic variation. After excluding samples and SNPs with call rates <95%, we pre-phased SNPs using Eagle, version 2.4 [56] with the TOPMed reference panel 5b [57]. We then imputed genotypes against the same reference panel with Minimac, version 4 [58], and removed variants with Rsq values <0.3. We partitioned the remaining variants into bins based on minor allele frequency ([0, 0.0005], (0.0005, 0.002], (0.002, 0.005], (0.005, 0.01], (0.01, 0.03], (0.03, 0.05], (0.05, 0.5]) and then iteratively removed the variants in each bin with the lowest Rsq values until the average Rsq in the bin was at least 0.9 [59]. Of the SNPs used to infer blood types, only one (rs12075) was genotyped directly on any of the four platforms; the remaining SNPs were imputed against the TOPMed reference panel.
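The two-stage imputation-quality filter can be sketched as follows: a hard Rsq cutoff of 0.3, then, within each minor allele frequency bin, repeated removal of the worst-imputed variant until the bin's mean Rsq reaches 0.9. This is an illustrative sketch operating on plain arrays; the function names are hypothetical and the toy variants are made up.

```python
import numpy as np

# Minor allele frequency bin edges from the text: [0, 0.0005], (0.0005, 0.002], ..., (0.05, 0.5].
MAF_EDGES = [0.0, 0.0005, 0.002, 0.005, 0.01, 0.03, 0.05, 0.5]

def maf_bin(maf):
    """Index of the MAF bin containing maf (first bin closed, others right-closed)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"MAF {maf} outside [0, 0.5]")
    for i, upper in enumerate(MAF_EDGES[1:]):
        if maf <= upper:
            return i

def filter_imputed(mafs, rsqs, min_rsq=0.3, target_mean=0.9):
    """Boolean mask of imputed variants retained after Rsq filtering."""
    mafs = np.asarray(mafs, dtype=float)
    rsqs = np.asarray(rsqs, dtype=float)
    keep = rsqs >= min_rsq                    # step 1: drop Rsq < 0.3
    bins = np.array([maf_bin(m) for m in mafs])
    for b in np.unique(bins):
        idx = np.flatnonzero((bins == b) & keep)
        idx = idx[np.argsort(rsqs[idx])]      # lowest Rsq first
        # step 2: drop the worst-imputed variant until the bin mean reaches the target
        while idx.size and rsqs[idx].mean() < target_mean:
            keep[idx[0]] = False
            idx = idx[1:]
    return keep

mafs = [0.2, 0.2, 0.2, 0.2, 0.001, 0.001]
rsqs = [0.99, 0.95, 0.60, 0.25, 0.92, 0.50]
print(filter_imputed(mafs, rsqs))
```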

We also generated principal components to account for potential population stratification. The Genetic Relationships and Fingerprinting method was used to assign participants to genetically determined ancestry groups, separately by genotyping platform [60, 61]. For groups with at least ten individuals, we removed variants with minor allele frequencies <0.01, call rates <98%, or Hardy–Weinberg equilibrium p values <0.001. We then pruned the remaining SNPs in a stepwise manner, removed heterozygosity outliers (|F| > 0.2), and generated principal components from the remaining individuals and SNPs using EIGENSTRAT smartpca [62].
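The heterozygosity screen relies on a per-individual inbreeding coefficient F. The sketch below uses a simplified method-of-moments estimate, F = (O_hom − E_hom)/(N − E_hom), similar in spirit to the statistic reported by PLINK’s `--het`; which software computed F in this study is not stated, so this formula is an assumption. The toy data use only four SNPs rather than a pruned genome-wide set.

```python
import numpy as np

def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments F per individual: (O_hom - E_hom) / (N - E_hom).

    genotypes: (n_individuals, n_snps) allele counts coded 0/1/2, NaN = missing.
    allele_freqs: per-SNP allele frequency p; expected homozygosity is 1 - 2p(1 - p).
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(allele_freqs, dtype=float)
    called = ~np.isnan(g)
    obs_hom = np.sum(called & (g != 1), axis=1)                       # observed homozygous calls
    exp_hom = np.sum(called * (1.0 - 2.0 * p * (1.0 - p)), axis=1)    # expected under HWE
    n_called = called.sum(axis=1)
    return (obs_hom - exp_hom) / (n_called - exp_hom)

nan = np.nan
genotypes = [[1, 1, 1, 1],     # all heterozygous
             [0, 2, 1, 1],     # as expected under HWE with p = 0.5
             [0, 2, 0, 2],     # all homozygous
             [0, nan, 1, 2]]   # one missing call
f = inbreeding_f(genotypes, allele_freqs=[0.5, 0.5, 0.5, 0.5])
outliers = np.abs(f) > 0.2     # heterozygosity outliers, |F| > 0.2
print(f.round(3), outliers)
```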

Table 2 depicts the blood antigens assessed in this study, as well as the SNPs used to infer them [63,64,65,66]. Except for Lewis null (le1/le2), Secretor null (se), and the O allele of ABO, all blood group alleles were considered codominant.
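As an illustration of the inheritance assumptions, the ABO phenotype can be derived from two allele calls with A and B codominant and O recessive. This sketch assumes the SNP-to-allele calls from Table 2 have already been made (that step is not shown), and it does not distinguish ABO subgroups.

```python
def abo_phenotype(allele1: str, allele2: str) -> str:
    """Map an ABO genotype to its phenotype: A and B codominant, O recessive."""
    alleles = {allele1, allele2}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError(f"unknown ABO allele(s): {sorted(alleles)}")
    expressed = alleles - {"O"}  # the O allele adds no A or B antigen
    return "".join(sorted(expressed)) if expressed else "O"

for genotype in [("A", "O"), ("B", "B"), ("A", "B"), ("O", "O")]:
    print(genotype, "->", abo_phenotype(*genotype))
```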

Table 2 Single-nucleotide polymorphisms (SNPs) used to define blood antigen phenotypes of participants in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

Statistical analyses

Our objective was to estimate associations between blood types and levels of circulating inflammatory markers. Because we analyzed individuals selected for case–control studies of cancer rather than a random sample, we first developed a set of propensity score-adjusted sampling weights [67, 68], as used previously in similar studies [69], to account for each participant’s probability of being included in each of the eight component studies. Weights were generated separately by sex and case–control status. To estimate weights, we fit logistic regressions on (a) all PLCO screening arm cases (for case models) or (b) all PLCO screening arm potential controls (for control models), in which the outcome was inclusion in a given study and the predictors were age (5-year categories), smoking status (never/former/current), and pack-years for former and current smokers (<15, 15+). We imposed basic inclusion criteria (having a valid blood draw, providing consent, having questionnaire data, complete smoking history, no history of cancer, and non-Hispanic white race/ethnicity) on all subjects. In addition, we applied study-specific inclusion criteria to the applicable models, such as limiting controls in the endometrial cancer study to women with an intact uterus. We estimated predicted probabilities from each regression and constructed weights as the inverse of the predicted probabilities. Thus, cancer cases, all of whom were included in our study given the case–control sampling scheme, received low weights compared with controls, of whom only 2064 (3.8%) were selected. Because our analysis was weighted to the entire PLCO screening arm, individual weights summed to the total number of participants meeting eligibility criteria (N = 56,709).
In addition, because we performed weighting separately by sex and case–control status, summed weights in each sex and case/non-case stratum of our study equaled the total number of eligible PLCO screening arm participants with the same sex and case/non-case status (male case N = 1054, female case N = 1042; male non-case N = 28,244; female non-case N = 26,369).
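The inverse-probability weighting step can be sketched with a hand-rolled logistic regression (Newton–Raphson) in NumPy. This is a simplified, unweighted illustration on a hypothetical toy pool with one binary covariate, not the study’s code; in the study, one such model would be fit per component study within each sex and case/non-case stratum. With a saturated model, the weights in each covariate cell sum exactly to the cell size, which mirrors how the summed weights reproduce the eligible population. The last lines check the stratum totals reported above.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already contain an intercept column; y is a 0/1 vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def inverse_probability_weights(X_pool, selected):
    """Return 1 / P(selected | covariates) for the selected members of the pool."""
    beta = fit_logistic(X_pool, selected.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-X_pool @ beta))
    return 1.0 / p_hat[selected]

# Toy pool of 20 eligible participants with one hypothetical binary covariate;
# 2 of 10 (covariate = 0) and 5 of 10 (covariate = 1) are selected.
covariate = np.array([0] * 10 + [1] * 10)
X = np.column_stack([np.ones(20), covariate])
selected = np.zeros(20, dtype=bool)
selected[[0, 1, 10, 11, 12, 13, 14]] = True
weights = inverse_probability_weights(X, selected)
print(weights.round(3), weights.sum())  # total equals the pool size of 20

# Stratum totals reported in the text are internally consistent.
strata = {"male case": 1054, "female case": 1042,
          "male non-case": 28_244, "female non-case": 26_369}
non_cases = strata["male non-case"] + strata["female non-case"]
print(sum(strata.values()), round(100 * 2064 / non_cases, 1))  # 56709 3.8
```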

Virtually all markers were measured in more than one study, so we then generated marker-specific weights by taking the average of study-specific weights for studies including that marker (sixteen different combinations). For instance, interferon gamma was measured in the lung, NHL, and ovarian cancer studies. Weights for this marker were computed as the average of lung, NHL, and ovarian cancer weights for women and the average of lung and NHL weights for men, as men were ineligible for the ovarian cancer subsample.
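The marker-specific weight is an average of a participant’s study-specific weights over the studies that measured that marker, skipping studies for which the participant was ineligible. A minimal sketch using the interferon gamma example; the weight values are hypothetical.

```python
import math

def marker_weight(study_weights, marker_studies):
    """Average a participant's study-specific weights over studies that measured a marker.

    study_weights maps study name -> weight; studies for which the participant was
    ineligible (e.g., the ovarian study for men) are simply absent from the dict.
    """
    relevant = [study_weights[s] for s in marker_studies if s in study_weights]
    return sum(relevant) / len(relevant) if relevant else math.nan

# Hypothetical weights for the interferon gamma example in the text.
ifng_studies = ["lung", "NHL", "ovarian"]
woman = {"lung": 20.0, "NHL": 30.0, "ovarian": 40.0}
man = {"lung": 20.0, "NHL": 30.0}
print(marker_weight(woman, ifng_studies), marker_weight(man, ifng_studies))  # 30.0 25.0
```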

We next fit weighted linear or logistic regressions to assess associations between log-transformed marker concentrations (outcome) and blood types (exposure), accounting for the multiple sampling schemes with the survey package in R [70]. The type of regression depended on the percentage of a marker’s samples that fell below the assay’s lower limit of detection (LLOD): for analytes with fewer than 50% of samples below the LLOD (64 markers), we used linear regression, imputing concentrations below the LLOD as one-half the LLOD; for analytes with 50–90% of samples below the LLOD (31 markers), we used logistic regression, dichotomizing concentrations as detectable or undetectable. One marker (soluble CD40 ligand; CD40L), with 71% of samples above the upper limit of quantitation, was also analyzed with a logistic model, although in this instance quantitated samples were set as the reference group. For all other markers, the percentage of samples with concentrations above the upper limit of quantitation was small and ignorable (<5%; Supplementary Table 8), and we set the values of these samples to the upper limit of quantitation.
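The analysis strategy for each marker depends only on the share of samples below the LLOD, and the linear-model outcome requires a simple transformation. The sketch below encodes those rules; the natural log is an assumption (the log base is not stated), and the special handling of CD40L (dichotomized at the upper limit of quantitation) is not covered.

```python
import numpy as np

def choose_model(frac_below_llod):
    """Assign an analysis strategy from the share of samples below the assay LLOD."""
    if frac_below_llod < 0.50:
        return "linear"     # log concentration, values < LLOD imputed as LLOD / 2
    if frac_below_llod <= 0.90:
        return "logistic"   # detectable vs. undetectable
    return "excluded"       # detectable in fewer than 10% of samples

def linear_outcome(conc, llod, uloq):
    """Impute LLOD / 2 below detection, cap at the ULOQ, and take natural logs."""
    x = np.asarray(conc, dtype=float)
    x = np.where(x < llod, llod / 2.0, x)
    x = np.minimum(x, uloq)
    return np.log(x)

print([choose_model(f) for f in (0.10, 0.50, 0.90, 0.95)])
# ['linear', 'logistic', 'logistic', 'excluded']
print(linear_outcome([0.4, 12.0, 250.0], llod=1.0, uloq=100.0))
```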

In accordance with phenotypic inheritance patterns, we evaluated associations for all blood types using categorical genetic models. Wald tests were used to determine the joint statistical significance of all levels of a given blood group combined. Because few participants were homozygous recessive for the Colton (n = 3), Knops (n = 5), and Kell (n = 4) blood groups, we combined heterozygous and homozygous recessive individuals for these blood groups, thus comparing marker levels between carriers of any recessive allele and carriers of two dominant alleles. Because of small numbers, we excluded participants with weak (Fyx; n = 1) or null (n = 4) Duffy antigen types and participants with null Lewis types (n = 30) from our models. To examine the robustness of findings to alternative model specifications, we also estimated associations using additive genetic models, in which the primary predictor was the number of minor alleles for a given blood group. All models were adjusted for age (5-year categories), sex, study, the first five principal components of population substructure, and smoking status (never/former/current). In addition, to determine whether imputation of marker concentrations below an assay’s LLOD affected our findings, we performed sensitivity analyses by fitting Tobit models for all markers evaluated with linear regression [71]. Each Tobit model was left-censored at the assay’s LLOD plus a small constant.
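For reference, the joint Wald statistic and the Tobit log-likelihood take the following standard forms. These are textbook unweighted versions shown as a sketch; the fitted models additionally incorporated the sampling weights, and the notation (K_g levels for blood group g, censoring point c) is introduced here for illustration.

```latex
% Joint Wald test for blood group g, with coefficient vector \hat{\beta}_g
% (one entry per non-reference level) and estimated covariance \hat{V}_g;
% the chi-squared reference distribution holds asymptotically under H_0.
W_g = \hat{\beta}_g^{\top} \hat{V}_g^{-1} \hat{\beta}_g
  \;\overset{H_0}{\sim}\; \chi^2_{K_g - 1}

% Tobit model left-censored at c (the LLOD plus a small constant, on the
% analysis scale): latent y_i^* = x_i^{\top}\beta + \varepsilon_i,
% \varepsilon_i \sim N(0, \sigma^2), observed y_i = \max(y_i^*, c).
\log L(\beta, \sigma) =
  \sum_{i:\, y_i > c} \left[ \log \phi\!\left(\frac{y_i - x_i^{\top}\beta}{\sigma}\right) - \log \sigma \right]
  + \sum_{i:\, y_i = c} \log \Phi\!\left(\frac{c - x_i^{\top}\beta}{\sigma}\right)
```

Under the additive coding, the K_g − 1 categorical contrasts are replaced by a single coefficient on the minor allele count (0, 1, or 2), so the corresponding Wald test has one degree of freedom.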

To further explore which blood group antigens contributed to significant results, we conducted post hoc pairwise t-tests comparing the estimated marginal mean marker levels of different groups. Marginal means were calculated from the coefficients of the respective regressions. To explore the relationships among significantly associated markers, we report Pearson’s partial correlations, controlling for study, for all markers associated with the same blood group. Finally, to determine whether results were generalizable to other racial and ethnic groups, we modeled all significant blood type–marker associations identified among non-Hispanic whites in Black (n = 322), Asian (n = 107), and Hispanic (n = 66) participants using the same covariate adjustments as in the main analyses. All statistical tests were two-sided, and p < 4.3 × 10−5 (the Bonferroni threshold for 12 blood types × 96 markers = 1152 tested associations) was considered statistically significant.
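A partial correlation controlling for study can be computed by regressing each marker on study indicators and correlating the residuals, which is equivalent to removing study-specific means. The sketch below is unweighted; whether the reported correlations incorporated the sampling weights is not stated, so that detail is left open. The toy data show how between-study offsets can mask a within-study correlation.

```python
import numpy as np

def partial_corr_given_study(x, y, study):
    """Pearson correlation of x and y after removing study-specific means.

    Regressing each variable on a full set of study indicators is equivalent
    to subtracting the study mean, so the residuals carry within-study variation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    study = np.asarray(study)
    levels = np.unique(study)
    Z = (study[:, None] == levels[None, :]).astype(float)  # indicator matrix
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical log marker levels in two studies with different assay offsets.
study = ["lung"] * 3 + ["NHL"] * 3
x = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
y = [1.0, 2.0, 3.0, -9.0, -8.0, -7.0]
print(round(np.corrcoef(x, y)[0, 1], 2), round(partial_corr_given_study(x, y, study), 2))
```

Here the raw correlation is negative because of the between-study offsets, while the within-study (partial) correlation is perfect.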