Introduction

The novel coronavirus disease (COVID-19, caused by the SARS-CoV-2 virus) has spread rapidly across the globe and had caused over 21.1 million confirmed infections and over 761,000 deaths worldwide as of August 17, 2020 [1]. Within the United States, New York suffered among the worst outbreaks during the early phases of the pandemic. As of August 22, New York City had recorded 228,144 confirmed infections and 19,014 deaths [2]. A number of risk factors for COVID-19 morbidity and mortality are known, including age, sex, smoking, hypertension, diabetes, and chronic cardiovascular and respiratory diseases [3,4].

Recent work has demonstrated an association between ABO blood types and COVID-19 risk. Using data from Wuhan and Shenzhen, Zhao et al. found a greater proportion of A and a lower proportion of O blood types among COVID-19 patients, relative to the general populations of Wuhan and Shenzhen [5]. Similarly, using a meta-analysis of data from Italy and Spain, Ellinghaus et al. [6] found a higher risk of COVID-19 among A and a lower risk among O blood types. Conversely, however, they estimated lower odds of mechanical ventilation for all non-O types, though the estimated odds ratios were not statistically significant at the 5% level for this outcome.

The ABO blood type trait reflects polymorphisms within the ABO gene. This gene is associated with a number of other traits, including risk factors for COVID-19 morbidity and mortality. For example, genome-wide association studies have associated variants within ABO with activity of the angiotensin converting enzyme [7], red blood cell count, hemoglobin concentration, hematocrit [8,9,10,11], von Willebrand factor [12,13,14,15], myocardial infarction [16,17], coronary artery disease [17,18,19,20,21], ischemic stroke [13,19,22], type 2 diabetes [23,24,25], and venous thromboembolism [26,27,28,29,30,31,32,33]. A 2012 meta-analysis found that, in addition to individual variants, a non-O blood type is among the most important genetic risk factors for venous thromboembolism [34]. These conditions are also relevant for COVID-19. For example, coagulopathy is a common issue for COVID-19 patients [35,36,37,38,39,40,41], and risk of venous thromboembolism must be carefully managed [42].

The numerous associations between conditions and both blood type and COVID-19 provide reason to believe that true associations may exist between blood type and morbidity and mortality due to COVID-19. In addition, previous work has identified associations between ABO blood groups and a number of different infections or disease severity following infections, including SARS-CoV-1 [43], P. falciparum [44], H. pylori [45], Norwalk virus [46], hepatitis B virus [47], and N. gonorrhoeae [48].

Rh(D) phenotypes (positive and negative Rh blood types) are associated with very few diseases compared to ABO [49]. Like ABO, Rh type is important for type compatibility and immune response. For example, hemolytic disease of the newborn is a concern when Rh(D) is mismatched between mother and offspring [50]. Other studies have found evidence that Rh-positive individuals are protected against the effects of latent toxoplasmosis [51], though Toxoplasma gondii is a eukaryotic parasite [52], not a virus like SARS-CoV-2.

In this study, we sought to understand the association between SARS-CoV-2 infection/COVID-19 and blood type using electronic health record (EHR) data from NewYork-Presbyterian/Columbia University Irving Medical Center (NYP/CUIMC) hospital in New York City, USA. We compared both ABO and Rh(D) blood types, and we investigated initial infection status and two severe COVID-19 outcomes: intubation and death. We evaluated potential confounding due to population stratification using a multivariate analysis, and we report clinically meaningful measures of effect.

Results

Data collection and cohort selection

We determined blood types using laboratory measurements recorded in the NYP/CUIMC EHR system. After removing likely errors, such as individuals with contradictory blood-type results, we identified 14,112 adult individuals with known blood types who received at least one SARS-CoV-2 swab test (Table 1 and Supplementary Fig. 1). We performed chi-squared tests of independence and found insufficient evidence to conclude that the blood-group frequencies differ between SARS-CoV-2-tested and non-tested groups (Supplementary Table 1). Individuals were considered initially SARS-CoV-2-positive (COV+) if they tested positive on their first recorded test or within the following 96 h. We evaluated associations between blood types and outcomes using three comparisons: infection prevalence among initial tests, and survival analyses for intubation and for death among individuals with infections confirmed by the swab test. We report on clinical data as of August 1, 2020.

Table 1 Summary demographics for SARS-CoV-2-tested individuals at NYP/CUIMC, stratified by blood type.

Infection prevalence

The unadjusted prevalence of initial infection was higher among A and B blood types and lower among AB types, compared with type O (Table 2 and Fig. 1). To avoid bias with respect to healthcare utilization, length of hospital stay, and potential in-patient infection, we evaluated the prevalence of infection among individuals only during the first encounters in which they were tested. In addition, to account for the considerable risk of false-negative tests [53,54] and the fact that providers would repeat the test in patients with high clinical suspicion for COVID-19 [55], any positive test during the first 96 h of an encounter was considered evidence of initial infection.
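
The 96 h rule described above can be sketched as a small classification function. This is a minimal illustration, not the authors' implementation: the function name `initially_positive`, the input layout (an encounter start time plus a list of test timestamps and results), and the choice to measure the window from the encounter start are all assumptions made here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Window from the text: a positive test within 96 h counts as initial infection.
WINDOW = timedelta(hours=96)


def initially_positive(
    encounter_start: datetime,
    tests: List[Tuple[datetime, bool]],
) -> bool:
    """Return True if any test taken <96 h after the encounter start was positive.

    `tests` holds (time the test was taken, whether it was positive) pairs.
    This input layout is hypothetical and chosen only for illustration.
    """
    return any(
        positive and timedelta(0) <= taken_at - encounter_start < WINDOW
        for taken_at, positive in tests
    )
```

Under this sketch, a patient whose first test is negative but who tests positive 50 h into the encounter is still counted as initially infected, which is the behavior the retesting rationale calls for.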

Table 2 Effect size estimates for blood types with and without correction for race and ethnicity.
Fig. 1: Estimated risk differences for blood types during the period from March 10 to August 1, 2020.

Values represent risk differences for each blood type relative to the reference groups: O for ABO and positive for Rh(D). Prevalence differences were computed using linear regression, while risk differences for intubation and death were computed using the Fine-Gray model. Estimated differences are represented as points. 95% confidence intervals (CI, represented as bars) were computed using Austin’s method with n = 1000 bootstrap iterations. Adjusted models include race and ethnicity as covariates.

Blood-type frequencies vary across ancestry groups [56], so we evaluated the confounding effect of ancestry by adjusting for race/ethnicity (proxies for ancestry). We compared infection prevalence with linear regression, using reference groups O for ABO and Rh-positive for Rh(D) and using bootstrap to compute 95% confidence intervals for each estimate [57]. With adjustment for patient race and ethnicity, prevalences among types A, AB, and B were higher than among type O. Rh(D)-negative individuals had a 2.7% lower risk of initial infection after adjustment for ancestry, consistent with a lower risk before adjustment.

Survival analysis for intubation and death

Next, we examined intubation and death using a survival framework to understand how blood type affects progression to disease outcomes over time. Specifically, we used the Fine-Gray model [58] to estimate cumulative incidence functions by blood type while accounting for competing risks and adjusting for covariates. Death and recovery were competing events for intubation, and recovery was a competing event for death. Cohort entry was defined as the time of a patient’s first positive test result or the start of a hospital encounter if the first positive test occurred during the first 96 h of the hospitalization. In accordance with CDC guidelines for returning to work [59], we defined a patient as having recovered only when 10 days had passed since the patient entered the cohort and only once the patient had been discharged. Patients appearing before July 30 were considered, and outcomes beyond August 1, 2020 were censored.
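
For readers unfamiliar with the Fine-Gray model, its standard textbook form is summarized below. The notation is introduced here for illustration and is not taken from the original analysis: T is the time to the first event, D is the event type, k is the event of interest (for example, intubation), and x is the covariate vector (blood type plus race and ethnicity indicators).

```latex
% Cumulative incidence function for event type k
F_k(t \mid x) = \Pr(T \le t,\ D = k \mid x)

% Subdistribution hazard: individuals who already experienced a competing
% event remain in the risk set
\lambda_k(t \mid x) = \lim_{\Delta t \to 0}
  \frac{\Pr\left(t \le T < t + \Delta t,\ D = k \mid
        T \ge t \ \text{or}\ (T < t,\ D \ne k),\ x\right)}{\Delta t}

% Proportional subdistribution hazards model
\lambda_k(t \mid x) = \lambda_{k0}(t) \exp\left(x^\top \beta\right)

% Cumulative incidence implied by the model
F_k(t \mid x) = 1 - \exp\left(-\Lambda_{k0}(t) \exp\left(x^\top \beta\right)\right),
\qquad \Lambda_{k0}(t) = \int_0^t \lambda_{k0}(s)\, ds
```

Under these definitions, an absolute risk difference between two covariate profiles at a fixed time t is the difference of the two predicted cumulative incidences, F_k(t | x_1) - F_k(t | x_0).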

Individuals with blood type A were at decreased risk of both intubation and death relative to type O, while individuals with type AB were at increased risk of both outcomes (Fig. 1 and Table 2). Conversely, we found that type-B individuals were at higher risk of intubation but at lower risk of death, compared with type O. Individuals negative for Rh(D) were at decreased risk for both intubation and death, consistent with a lower risk of initial infection. Overall, we estimate absolute risk differences between blood groups of between 0.1% and 8.2%, after adjusting for race and ethnicity.

Discussion

Better understanding COVID-19 is imperative given the current pandemic’s toll. We investigated whether blood type is relevant for risk of infection, intubation, and death. Overall, we found modest but consistent risk differences between blood types. After adjusting for ancestry (the relevant confounder for this analysis), estimated risk differences were larger for intubation and death outcomes than for initial infection. We estimate larger risk differences between Rh blood types than between ABO types, with Rh-negative individuals being at lower risk of all three outcomes: SARS-CoV-2 infection, intubation, and death. Type A had lower risk of intubation and death compared with types AB and O. Only type B had inconsistent effects between intubation and death: relative to type O, type B was associated with increased risk of intubation and decreased risk of death. Overall, blood type appears to have a consistent effect, though the magnitudes of these effects on risk of intubation or death are modest, and our estimates have large uncertainties relative to their magnitudes. These relatively large uncertainties indicate that greater sample sizes or meta-analyses are needed to estimate the effects more precisely.

After adjusting for ancestry by proxies of race and ethnicity, we found that types A and B conferred greater risk of an initial positive test compared to type O, while type AB (the rarest) conferred a very small risk decrease (0.2%). These results are consistent with an association discovered for SARS-CoV-1, in which O blood groups were less common among SARS patients [43]. Our results are also mostly consistent with the results reported by Zhao et al. [5], where non-O types appear to be at greater risk of infection, and with Ellinghaus et al. [6], where non-O appears to be at greater risk of infection but at lesser risk of mechanical ventilation, though the authors note that this decreased risk is not statistically significant at the 5% level. Unlike Ellinghaus et al., though, we estimate slightly higher risk for types B and AB relative to O for intubation.

Our results are based on data collected as part of hospital care during the early course of the pandemic, when outpatient testing was severely limited due to testing capacity and supply limitations. As such, our data are highly enriched for severely ill patients, and the absolute risk values we report are not generalizable to all SARS-CoV-2-infected individuals. A considerable fraction of infections is mild or asymptomatic [60,61,62,63], while our data represent predominantly the most severe cases. Selection bias is a fundamental limitation of our study, so all our effect estimates are conditional on presentation to the hospital. Nonetheless, we minimized additional selection bias by making cohort criteria for cases and controls differ only with respect to the outcome of interest. Moreover, we found concordance between SARS-CoV-2-tested individuals and the general population at NYP/CUIMC in terms of blood type (Supplementary Table 1). Consequently, our results are not affected by selection bias with respect to blood type, unlike some other blood-type case/control study designs, particularly those using blood donors as controls, where enrichment of type O can be expected [6].

False negatives and time delay between test administrations and the return of their results both introduce noise to this analysis. We attempted to account for these biases by setting cohort entry at the time of first contact with the hospital when the patient tested positive <96 h thereafter. This definition is imperfect, as 96 h is sufficient for an individual infected shortly after admission to test positive (albeit with probability roughly 0.33) [64], but it is necessary to allow sufficient adjustment for the considerable time delay and retesting following false negatives. Another source of noise is the fact that not all intubations and deaths following a confirmed infection are related to COVID-19 (e.g., intubation during unrelated surgery). We defined recovery in an attempt to minimize this issue, though we recognize that our definition is imperfect. Patients may be discharged prematurely and later return following onset of severe symptoms. Moreover, our 10-day cutoff for recovery is based on CDC guidelines for returning to work [59], which may be refined as additional evidence becomes available. Further work is needed to refine the definition of recovery and to determine which outcomes may be causally linked to COVID-19.

The ABO gene is highly polymorphic [65], and blood types have considerably different distributions across ancestry groups [56]. Like ABO, Rh groups are not distributed equally across race/ethnicity groups, with enrichment of Rh-negative among white and non-Hispanic individuals (Table 1). In addition, negative Rh blood groups are less common, representing only 9% of individuals in our data. While genetic data were not available for the patients included in our study, we used self-reported race and ethnicity as imperfect proxies for genetic ancestry. Adjusting for these covariates had a noticeable effect on our comparison of infection prevalence, but did not have an equally relevant effect on intubation or death (Fig. 1 and Table 2). This suggests that blood type may have a lesser, more confounded effect on infection prevalence than on intubation or death following confirmed infection. Nonetheless, race and ethnicity cannot fully capture ancestry, so the associations between blood types and COVID-19 that we report may still be confounded by ancestry, even after adjustment. Further work is needed to better understand any potential residual confounding due to ancestry, not captured by race and ethnicity.

In this study, we found evidence for associations between ABO and Rh blood groups and COVID-19. Using data from NYP/CUIMC, we found moderately increased infection prevalence among non-O blood types and among Rh-positive individuals. Intubation risk was increased among AB and B types, and decreased among type A and Rh-negative individuals. Risk of death was slightly increased among type AB individuals and was decreased among type A, type B, and Rh-negative individuals. All estimates were adjusted for patient ancestry using self-reported race and ethnicity. Our results add further evidence to the previously discovered associations between blood types and COVID-19.

Methods

Data collection and cohort selection

We identified the cohort for this study by filtering the NYP/CUIMC data warehouse for patients with both a recorded SARS-CoV-2 test and a recorded blood type. Next, we removed any individual with multiple, contradictory blood-type measurements, reflecting likely errors in the data. Finally, we excluded individuals below age 18 from our analysis.
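
The three filtering steps above can be sketched as follows. This is an illustration only: the original cohort was extracted from a clinical data warehouse, and the record layout and the names `Patient`, `resolve_blood_type`, and `select_cohort` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout, used only for this illustration.
    patient_id: str
    age: int
    has_sars_cov_2_test: bool
    blood_type_results: List[str] = field(default_factory=list)  # e.g. ["A+", "A+"]


def resolve_blood_type(results: List[str]) -> Optional[str]:
    """Return the single recorded blood type, or None if missing or contradictory."""
    distinct = set(results)
    return distinct.pop() if len(distinct) == 1 else None


def select_cohort(patients: List[Patient]) -> List[Patient]:
    """Keep tested adults who have exactly one consistent blood type on record."""
    return [
        p
        for p in patients
        if p.has_sars_cov_2_test
        and resolve_blood_type(p.blood_type_results) is not None
        and p.age >= 18
    ]
```

Repeated measurements that agree are kept as a single blood type; only disagreeing measurements cause exclusion, matching the description of contradictory results as likely data errors.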

Blood group was determined using laboratory measurements coded using descendant concepts of LOINC LP36683-8 (ABO and Rh group). Intubation was assessed using completed procedures having the procedure description “intubation”. We grouped race into five categories and ethnicity into two. Specifically, we retained Asian, Black/African-American, and White as distinct categories, categorized all other listed races (each of which was a small minority) as “other”, and categorized missing or declined race as “missing”. Ethnicity was grouped as either Hispanic or non-Hispanic. This study is approved by the Columbia University IRB (#AAAL0601).
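
The five-category race grouping can be expressed as a small mapping function. This is a sketch under stated assumptions: the raw label strings and the set of values treated as missing or declined are hypothetical, since the actual EHR codes are not given in the text.

```python
from typing import Optional

# Races retained as their own categories, per the description above.
NAMED_RACES = {"Asian", "Black/African-American", "White"}

# Hypothetical raw values treated as missing or declined (an assumption;
# the real EHR encodings are not specified in the text).
MISSING_VALUES = {"", "unknown", "declined"}


def group_race(raw: Optional[str]) -> str:
    """Map a raw race label to one of five categories.

    Returns "Asian", "Black/African-American", "White", "other", or "missing".
    """
    if raw is None or raw.strip().lower() in MISSING_VALUES:
        return "missing"
    label = raw.strip()
    return label if label in NAMED_RACES else "other"
```

Keeping "missing" as its own level, rather than dropping those records, preserves the sample size when race is included as a covariate.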

Covariate adjustment

We sought to estimate total effects of blood type on COVID-19 outcomes. Using a graphical model (Supplementary Fig. 2), we identified ancestry as the only confounding variable for an estimate of total effect, since blood type is genetic and varies across ancestry groups. As genetic data were not available, we used self-reported race and ethnicity as proxies for ancestry. We were unable to identify a method to alleviate selection bias in our data, so the effects we report are conditional on presence at NYP/CUIMC.

Infection prevalence

We considered three outcomes: initial infection, intubation, and death. Our evaluation of initial infection sought to assess the infection prevalence differences among individuals presenting to the hospital, not those potentially infected at the hospital or long after their first test. Due to the high risk for false negatives [53,54], we considered any positive test <96 h after the start of an encounter as evidence of initial infection. Initial infection risk differences between blood types were assessed using linear regression, and we adjusted for race and ethnicity by including them as covariates. We used Austin’s bootstrap method to compute 95% confidence intervals for all risk estimates [57], using 1000 bootstrap iterations.
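
To make the risk-difference estimate concrete, the sketch below computes an unadjusted risk difference and a bootstrap confidence interval. It is a simplified stand-in, not the method used in the study: it omits the race and ethnicity covariates, and it uses a plain percentile bootstrap with resampling within each group rather than Austin's method. Without covariates, the linear-regression coefficient on a blood-type indicator equals the difference in outcome proportions, which is what `risk_difference` computes. The function names and the toy inputs are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def risk_difference(group: Sequence[int], reference: Sequence[int]) -> float:
    """Difference in outcome proportion (group minus reference); outcomes are 0/1."""
    return sum(group) / len(group) - sum(reference) / len(reference)


def bootstrap_ci(
    group: Sequence[int],
    reference: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the unadjusted risk difference."""
    rng = random.Random(seed)
    estimates: List[float] = []
    for _ in range(n_boot):
        # Resample individuals with replacement within each blood-type group.
        resampled_group = rng.choices(group, k=len(group))
        resampled_reference = rng.choices(reference, k=len(reference))
        estimates.append(risk_difference(resampled_group, resampled_reference))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, calling `bootstrap_ci(type_a_outcomes, type_o_outcomes)` with two lists of 0/1 infection indicators returns the lower and upper bounds of a 95% interval for the type A versus type O difference; the variable names here are placeholders.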

Survival analysis for severe outcomes

We assessed intubation and death as severe outcomes of COVID-19, and evaluated blood-type effects using survival analysis. Individuals entered the at-risk cohort either at the time of their first positive test result, or at the time of first contact with the hospital when the first positive test occurred within 96 h of the start of a hospital encounter. Patients with Do-Not-Intubate orders were excluded from consideration for the intubation outcome. We defined a patient as recovered only after being discharged from the hospital and only once 10 days had passed since cohort entry. Death and recovery are competing risks for intubation, and recovery is a competing risk for death. Finally, outcomes beyond August 1 were censored, as this was the last date for which we had data available. Intubation and death were assessed using Fine-Gray models, which can estimate cumulative incidences. As before, we adjusted for race and ethnicity by including them as covariates, and confidence intervals were computed with 1000 bootstrap iterations.
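
The competing-risks setup can be illustrated with a nonparametric cumulative incidence estimate. The sketch below is not the Fine-Gray regression used in the study: it implements the Aalen-Johansen estimator, which handles competing events but does not adjust for covariates. The event codes, function names, and time unit (days since cohort entry) are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical event codes for this illustration.
CENSORED, INTUBATION, DEATH, RECOVERY = 0, 1, 2, 3


def recovery_day(discharge_day: Optional[float], min_days: float = 10.0) -> Optional[float]:
    """Day since cohort entry at which a patient counts as recovered.

    Recovery requires both discharge and at least `min_days` since entry,
    so it is the later of the two. Returns None if never discharged.
    """
    if discharge_day is None:
        return None
    return max(discharge_day, min_days)


def cumulative_incidence(
    times: Sequence[float], events: Sequence[int], cause: int
) -> List[Tuple[float, float]]:
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    Returns (time, cumulative incidence) pairs at each time with any event.
    """
    by_time: Dict[float, Counter] = defaultdict(Counter)
    for t, e in zip(times, events):
        by_time[t][e] += 1

    at_risk = len(times)
    survival = 1.0  # probability of being free of every event type so far
    incidence = 0.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(by_time):
        counts = by_time[t]
        n_events = sum(c for e, c in counts.items() if e != CENSORED)
        if n_events:
            # Event-free probability just before t times the cause-specific hazard at t.
            incidence += survival * counts[cause] / at_risk
            survival *= 1 - n_events / at_risk
            curve.append((t, incidence))
        at_risk -= sum(counts.values())
    return curve
```

Treating death and recovery as competing events, rather than as censoring, keeps the estimated incidence of intubation from being inflated by patients who can no longer experience it.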

Software

We conducted our analyses using the R language (version 4.0.1), the cmprsk package (version 2.2-10) [66] implementation of the Fine-Gray model, MySQL version 5.6, and the tidyverse meta-package version 1.1.2. The manuscript was written openly on GitHub using Manubot [67].

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.