A population-based cohort study of obesity, ethnicity and COVID-19 mortality in 12.6 million adults in England

Obesity and ethnicity are known risk factors for COVID-19 outcomes, but their combination has not been extensively examined. We investigate the association between body mass index (BMI) and COVID-19 mortality across different ethnic groups using linked national Census, electronic health records and mortality data for adults in England from the start of the pandemic (January 2020) to December 2020. There were 30,067 (0.27%), 1,208 (0.29%), 1,831 (0.29%) and 845 (0.18%) COVID-19 deaths in white, Black, South Asian and other ethnic minority groups, respectively. Here we show that BMI was more strongly associated with COVID-19 mortality in ethnic minority groups, resulting in an ethnic risk of COVID-19 mortality that was dependent on BMI. The estimated risk of COVID-19 mortality at a BMI of 40 kg/m2 in white ethnicities was equivalent to the risk observed at a BMI of 30.1 kg/m2, 27.0 kg/m2, and 32.2 kg/m2 in Black, South Asian and other ethnic minority groups, respectively.

Obesity has emerged as one of the most characterised risk factors internationally for coronavirus disease 2019 (COVID-19) severity and mortality in both community and in-patient settings [1][2][3][4][5][6][7]. The strong association between obesity and COVID-19 outcomes has been suggested to result from a deleterious change in the role of circulating adipocytokines leading to a pro-inflammatory state with subsequent predisposition to thrombosis, incoordination of innate and adaptive immune responses, inadequate antibody responses, and the cytokine storm 1.
There is growing evidence that the strength of association between BMI and COVID-19 outcomes may be modified by key sociodemographic factors, most notably ethnicity 6,7, which is also an important risk factor for COVID-19 severity and mortality, with risk up to four times greater in Black and South Asian ethnicities [8][9][10]. In a study of 65,932 in-patients admitted with COVID-19 7, a coding of obesity was associated with a higher risk of intensive care, mechanical ventilation or in-hospital mortality in all ethnic groups, but with the greatest risk observed in Black ethnicities with obesity 7. A community study of 6.9 million adults from general practices in England also found the association between BMI and COVID-19 mortality at the start of the pandemic was strongest in Black ethnicities 6. However, whilst ethnicity has been shown to modify associations between BMI and COVID-19 outcomes, previous research has not quantified how this interaction affects both within-ethnicity and between-ethnicity risk across the spectrum of BMI. An early analysis of 5,623 community and in-hospital test results suggested the potential importance of this by showing the risk of SARS-CoV-2 positivity was not different between ethnic groups at low BMI, but was over twofold higher in ethnic minority groups compared to white ethnicities at high BMI 11. This has not been explored in larger representative community cohorts or with COVID-19 outcomes.
Previous analyses with cardiometabolic outcomes have used the differential associations between ethnicity and BMI to calculate thresholds for obesity in ethnic minority groups where risk is equivalent to white ethnicities at established thresholds for obesity (e.g. 30 kg/m2) [12][13][14], with current guidelines suggesting that thresholds for ethnic minority groups should be reduced by 2.5 kg/m2 15,16. It is unclear whether these guidelines are applicable to COVID-19 outcomes. Therefore, elucidating the within-ethnicity and between-ethnicity risk of COVID-19 mortality has important implications for public health policy and guidelines in relation to infectious disease.
The aim of this study was to use linked national Census, electronic health care records and mortality datasets to investigate the interaction between BMI and ethnicity in the risk of COVID-19 mortality, quantify how the difference in risk between ethnic groups varies by BMI, and generate risk equivalency at established BMI thresholds for class I, II, and III obesity.

Results
Cohort characteristics. This analysis included 11
Associations of BMI, ethnicity and COVID-19 mortality. BMI was associated with COVID-19 mortality in all ethnic groups. However, compared to white ethnicities, the J-shaped associations were steeper in Black, South Asian and other ethnic minority groups (P < 0.001 for interaction) (Fig. 1A, with specific values highlighted in Table 2).
There are 32,844 Lower Super Output Areas (LSOAs) in England, with a mean population of 1500 and a minimum of 1000. We calculated density as LSOA population divided by LSOA area. Household deprivation is defined across four dimensions: employment (at least one household member is unemployed or with long-term sickness, not including full-time students); education (no household member has at least Level 2 education, and no one aged 16-18 years is a full-time student); health and disability (at least one household member reported their health status as being 'bad'/'very bad' or has a long-term health problem); and housing (the household's accommodation is overcrowded, with an occupancy rating of −1 or less, or is in a shared dwelling, or has no central heating). Approximate Social Grade is a socio-economic classification based on the occupation, employment, qualification, and tenure of the household reference person. Key worker type is defined based on the occupation and industry code. 'Exposure to disease' and 'proximity to others' are derived from the O*NET database, which collects a range of information about individual working conditions linked to specific occupational codes. To calculate the proximity and exposure measures, the questions asked were: (i) How physically close to other people are you when you perform your current job? (ii) How often does your current job require that you be exposed to diseases or infection? Scores ranging from 0 (no exposure) to 100 (maximum exposure) were calculated based on these questions.
Health data were extracted from primary care records, apart from solid organ transplant and stroke which were extracted from hospital records.
When the analysis was repeated using a broader range of ethnic classifications, the pattern of results mirrored the main finding, with South Asian ethnicities (Bangladeshi and Pakistani) having the greatest risk at higher BMI (Supplementary Fig. 7). White ethnicities had the lowest risk. However, Chinese ethnicities may not reflect the wider trend for other ethnic minority groups, with associations similar to white ethnicities.

Discussion
In 12.6 million adults with linked Census, electronic health care records and mortality data, BMI was associated with COVID-19 mortality amongst all ethnic groups, but with a stronger association in ethnic minority groups. The interaction between BMI and ethnicity with COVID-19 mortality revealed an ethnic risk that was also dependent on BMI. There was no difference in risk between Black and other ethnic minority groups compared to white ethnicities at a low BMI of 20 kg/m2, and only a modestly elevated risk in South Asians (HR = 1. 21
This is the first large-scale population-based study to show the continuous association between BMI and COVID-19 mortality across different ethnic groups on a population level, and to provide BMI values that show equivalent risk at commonly used thresholds for obesity classifications. Our findings are consistent with previous observations in a community setting at the start of the pandemic and a later in-hospital study 6,7, which also observed an interaction between BMI and ethnicity with COVID-19 outcomes. We extend these previous studies by quantifying the shape of the interaction across a continuous measure of BMI using linked Census and health care records up to the end of 2020, which allowed for the adjustment of detailed sociodemographic characteristics and comorbidities in a population-level dataset. Our findings suggest that, unlike other health outcomes such as type 2 diabetes 12-14, it may not be possible to achieve BMI threshold equivalency in the risk of COVID-19 mortality for class I obesity. Current guidelines suggest that BMI thresholds for obesity classifications should be reduced by 2.5 kg/m2 in ethnic minority groups 15,16. This study suggests that applying these criteria to COVID-19 mortality will only have a marginal impact and still produce thresholds where risk is substantially elevated in ethnic minority groups compared to white ethnicities.
The shape of association between BMI and COVID-19 mortality or hospital admissions was J-shaped, particularly in white and other ethnic minority groups, suggesting that the positive association between BMI and COVID-19 outcomes does not extend to lower levels of BMI, where low BMI may also be associated with an elevated risk. This is consistent with meta-analyses for all-cause mortality which have reported that the nadir in risk occurs between a BMI of 25 and 30 kg/m2 17,18. The shape of association in the present study could be explained by the fact that low levels of BMI are associated with malnutrition and higher levels of frailty and sarcopenia 19, which are in themselves associated with a greater risk of COVID-19 20,21. The finding that the association between BMI and COVID-19 mortality was stronger in those under 70 years of age is consistent with previous observations from the United States and Europe and provides further evidence for the importance of BMI as a risk factor in younger populations [5][6][7]. It is plausible that weaker associations between BMI and COVID-19 in older individuals may reflect the greater absolute risk of COVID-19 with age and the risk profile in older normal-weight or underweight individuals with frailty or other factors 5,22. The reasons underpinning the observed ethnicity by obesity interaction are unclear. It has previously been suggested that ethnic minority groups may have a stronger innate inflammatory response to viral infection or chronic disease [23][24][25], thus potentially increasing the risk of severe COVID-19 23. It is possible that the presence of greater levels of adiposity interacts with and accelerates this inflammatory response in ethnic minority groups 7. However, unlike previous findings from the start of the pandemic or from hospital settings where the risk with obesity was found to be greatest in Black ethnicities 6,7, this research using national level data from primary care during the first year of the pandemic suggests that the risk with obesity is greater in all minority ethnic groups compared to white ethnicities, with South Asian ethnicities at greatest risk. This mirrors the interaction between ethnicity and BMI with cardiometabolic disease where risk has also consistently been shown to be greatest in South Asian ethnicities [12][13][14]. Further research in this area, including the potential of genetic and epigenetic factors, is warranted.
Reports the risk of an event relative to white ethnicities at specific BMI values. Therefore at each BMI value the modelled risk is provided for Black, South Asian and other ethnic minority groups relative to white ethnicities.
All data adjusted for region, population density, urban/rural classification, deprivation (area and household), social grade, qualification, household size and tenure, household composition (multigenerational, with children), key worker status and type, occupational exposure to disease, occupational exposure to others.
A major strength is the large population-level dataset linking national Census and health care record data, making it the largest analysis of its kind to date. Linkage between clinical records and Census data allowed for the extraction of BMI from primary care records and ethnicity from Census data, which is a major strength as ethnicity is not universally coded within primary care 26, with a previous study investigating COVID-19 risk factors in England finding ethnicity coding was missing in over 25% of clinical records 10. We were also able to extract detailed descriptive and covariate data from a wide range of sociodemographic and clinical factors, allowing for the adjustment of potentially confounding variables including household and area indicators of deprivation and established clinical risk factors for COVID-19 mortality. However, there were limitations. Most notably, this analysis is generalisable to the 52.4% of the English population with coded BMI data within their health care records in the 10 years preceding the pandemic. In England, height and weight are collected as part of routine care. Nevertheless, family practice incentivisation schemes and differential take-up rates to population-level vascular screening programmes mean that data are not missing at random 27. Previous analysis has shown that women, those who attend their family doctor more often, those who come from more deprived areas, those who have a high or low BMI and those who have a greater number of comorbidities are more likely to have a coded BMI value 27. Nevertheless, the pattern of overweight and obesity in this study (66.9% for white ethnicities, 77.4% for Black ethnicities, 65.4% for South Asian ethnicities and 59.4% for other ethnic minority groups) was similar to national survey data, where the highest rates (consistently above 70%) are reported in Black ethnicities 28.
It has also been demonstrated that complete case analysis excluding missing data within clinical records can provide unbiased estimates of adjusted exposure-outcome associations under a wide range of missing data assumptions 29, particularly when missingness is independent of the outcome, as was demonstrated for this study. In addition, primary care data in England provide some of the most detailed electronic health care records internationally and are routinely used to identify individuals at risk of chronic and infectious diseases, including COVID-19 mortality 30,31, giving this study real-world utility. However, as these are real-world administrative data, it is possible that not all COVID-19 deaths or hospital admissions were captured, or conversely, that some deaths or hospital admissions may have been coded as attributable to COVID-19 in error. It is notable though that a high proportion of COVID-19 deaths (94.0%) were coded as U07.1, therefore relatively few deaths relied on a symptom-based or epidemiological diagnosis. Although we report a secondary outcome of hospital admissions as a marker of disease severity, data were not available for in-hospital treatment. This study utilised data from the 2011 Census, therefore any sociodemographic changes within the last decade will not be reflected in the analysis. Although we adjusted for factors related to the risk of SARS-CoV-2 exposure, including household composition, key worker status and exposure to others, it is not possible to verify whether the associations observed with BMI and ethnicity were due to greater disease severity, greater SARS-CoV-2 exposure and infection rates, or a combination of both. Therefore, results should be interpreted simply as the population-level risk of dying from COVID-19 during the first year of the pandemic.
In conclusion, this study of linked Census, electronic health records and mortality data demonstrated a notable interaction between ethnicity and obesity in the risk of COVID-19 mortality and hospitalisation, with obesity having a stronger association in all ethnic minority groups compared to white ethnicities. These results further emphasise the importance of public health messages to reduce levels of obesity within the population, particularly within ethnic minority groups. Future work is needed to investigate how these risk factors interact with post COVID-19 vaccination infection and mortality risk.
was then used to match records that were not linked deterministically, using 13 different combinations of personal identifiers 33.
Our analysis was restricted to those over 40 years of age on December 31, 2019 due to poor coverage of BMI values in GDPPR in younger populations.
Of the 32,755,633 people enumerated at the 2011 Census in England and Wales aged ≥40 years on December 31, 2019, 31,498,128 people were linked deterministically or probabilistically to the NHS Patient register, and of these, 27,477,607 individuals were alive on 24 January 2020. As linked family practice data were only available for England, the English population with linked GDPPR data included 24,026,950 people (see sample flow diagram in Supplementary Fig. 8). Of these, 12,591,137 (52.4%) had valid BMI data and were included within the primary analysis.
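The sample flow above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the stage labels and the `retention` helper are illustrative names, not part of the study code.

```python
# Counts are taken from the text; stage labels are illustrative wording.
FLOW = [
    ("Enumerated at 2011 Census, aged >=40 years", 32_755_633),
    ("Linked to NHS Patient register", 31_498_128),
    ("Alive on 24 January 2020", 27_477_607),
    ("England, with linked GDPPR data", 24_026_950),
    ("Valid BMI (primary analysis)", 12_591_137),
]


def retention(stages):
    """Return (label, n, share of previous stage) for each stage."""
    rows = []
    previous = None
    for label, n in stages:
        share = None if previous is None else n / previous
        rows.append((label, n, share))
        previous = n
    return rows


rows = retention(FLOW)
for label, n, share in rows:
    pct = "" if share is None else f" ({share:.1%} of previous stage)"
    print(f"{label}: {n:,}{pct}")
```

The final stage reproduces the 52.4% coverage figure quoted in the text.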
Exposure. In England, height and weight are collected during routine primary care consultations by trained staff using medical grade equipment, with BMI calculated as weight (kg) divided by height (m) squared and coded as a continuous value within electronic health care records. For this study, BMI was available within the GDPPR extract of primary care records, reflecting the BMI value coded within primary care that was closest to December 31, 2019, with data available from January 2010. Participants without a recorded BMI in primary care within this 10-year window were coded as missing. In order to remove outliers and potentially spurious values, a data-driven approach was used, restricting the analysis from the 2.5th (17.4 kg/m2) to 97.5th (41.0 kg/m2) percentile of the distribution.
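As a minimal sketch of the exposure definition, the code below computes BMI and restricts a list of values to the 2.5th to 97.5th percentile range. The percentile method (linear interpolation) and the inclusive bounds are assumptions; the text does not state which convention was used, and the function names are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m2."""
    return weight_kg / height_m ** 2


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def trim_to_central_95(values):
    """Keep values between the 2.5th and 97.5th percentiles (inclusive)."""
    ordered = sorted(values)
    low = percentile(ordered, 2.5)
    high = percentile(ordered, 97.5)
    kept = [v for v in values if low <= v <= high]
    return kept, (low, high)


kept, bounds = trim_to_central_95(list(range(1, 102)))
print(len(kept), bounds)
```

In the study the resulting bounds were 17.4 and 41.0 kg/m2; the toy input here only demonstrates the mechanics.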
Outcome. COVID-19 related death (either in hospital or out of hospital) was the primary outcome for the analysis and defined as confirmed or suspected COVID-19 death, which was identified by ICD-10 codes U07.1 (lab-confirmed COVID-19) or U07.2 (clinically/epidemiologically-diagnosed COVID-19 when a lab-confirmed test is inconclusive or not available) anywhere on the death certificate from 24 January 2020 until December 28, 2020.
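The outcome definition can be expressed as a simple rule over the codes on a death certificate. The sketch below is illustrative only: the function name and the input shape (a date plus a list of ICD-10 code strings) are assumptions, while the codes and the study window come from the text.

```python
from datetime import date

COVID_CODES = {"U07.1", "U07.2"}   # lab-confirmed / clinically diagnosed
STUDY_START = date(2020, 1, 24)
STUDY_END = date(2020, 12, 28)


def is_covid_death(death_date, certificate_codes):
    """True if the death falls in the study window and any position on
    the certificate carries U07.1 or U07.2."""
    if death_date is None:
        return False
    in_window = STUDY_START <= death_date <= STUDY_END
    codes = {code.strip().upper() for code in certificate_codes}
    return in_window and bool(codes & COVID_CODES)


print(is_covid_death(date(2020, 4, 10), ["J18.9", "U07.1"]))
```

Matching on any position, rather than only the underlying cause, follows the "anywhere on the death certificate" wording.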
Hospital admissions for COVID-19, using a primary admission for COVID-19 (U07.1 or U07.2), were also extracted from HES from 24 January 2020 until December 28, 2020 and were included as an indicator of disease severity as a secondary outcome, as has been reported for other studies 34.
Effect modifier. Self-reported ethnicity was coded from the 2011 Census, which asked respondents to select their ethnicity from 18 categories. For the purposes of this analysis we derived four categories: white (defined as British, Irish, other White), South Asian (Asian/Asian British defined as Indian, Pakistani, Bangladeshi), Black (defined as Black African, Black Caribbean, Black British, other Black) or other (all other classifications). Ethnicity was imputed in 3.0% of 2011 Census returns due to item non-response using nearest-neighbour donor imputation, the methodology employed by the Office for National Statistics across all 2011 Census variables 33 .
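The four-category derivation can be sketched as a lookup. The category labels below are copied from the wording in the text and are assumptions about how the values appear in the data; they are not the official 2011 Census code list, and `broad_ethnicity` is a hypothetical helper.

```python
# Labels follow the wording of the text, not the official Census coding.
BROAD_GROUPS = {
    "white": {"British", "Irish", "Other White"},
    "South Asian": {"Indian", "Pakistani", "Bangladeshi"},
    "Black": {
        "Black African",
        "Black Caribbean",
        "Black British",
        "Other Black",
    },
}


def broad_ethnicity(census_category):
    """Map a detailed Census category to one of the four analysis groups."""
    for group, members in BROAD_GROUPS.items():
        if census_category in members:
            return group
    return "other"  # all remaining classifications


print(broad_ethnicity("Pakistani"), broad_ethnicity("Chinese"))
```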
Covariates. Our analysis included key Census extracted sociodemographic data, including measures of household deprivation, household tenure and composition, occupation status (including key worker status) and social grade, educational attainment and exposure to disease or others, defined within Table 1. We also used the linkage to GDPPR and HES to extract up-to-date geographical information (including population density and area deprivation) and data on chronic diseases that have been shown to be associated with COVID-19 outcomes in the QCovid® prediction models 30 (Table 1).
Statistical analysis. Cox proportional hazards models were fitted with time to event measured in days from 24 January 2020 to the date of COVID-19 death, death from other causes or December 28, 2020, whichever came first. Non-COVID-19 mortality was analysed as a censoring event. A priori covariates were adjusted for in two models. Model 1 was adjusted for age, sex, ethnicity, geographic region, and other key sociodemographic factors (detailed in Table 1). Model 2 was adjusted for the same factors as Model 1, plus clinical factors (Table 1). A BMI by ethnicity interaction term was included in both models. Given the potential for included clinical factors to act as mediators between BMI and COVID-19 mortality, the primary interpretation from the analysis was taken from Model 1. The proportional hazards assumption was assessed visually using log-log survival plots across quartiles of BMI. The strength of interaction was tested using a likelihood-ratio test. Restricted cubic splines were fitted with 3 knots at the 25th (23.2 kg/m2), 50th (26.3 kg/m2) and 75th (29.8 kg/m2) BMI percentiles. A BMI of 22.5 kg/m2 (representing a value within the normal range) in white ethnicities, the largest group, was specified as the reference to which all other ethnic minority groups and BMI values were compared. Model fit was determined using the concordance statistic (c-index), with values over 0.8 interpreted as a strong model fit. Models were repeated for the outcome of hospital admissions to investigate whether associations of ethnicity and BMI with hospital admissions were consistent with the pattern of associations observed for mortality.
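To illustrate what a 3-knot restricted cubic spline contributes to the model, the sketch below builds the spline basis for a single BMI value using the truncated-power form with the common normalisation by the squared knot range. This parameterisation is an assumption; the text does not say which software or basis the authors used. With three knots the basis has one linear and one nonlinear term, and the curve is constrained to be linear beyond the outer knots.

```python
KNOTS = (23.2, 26.3, 29.8)  # 25th, 50th, 75th BMI percentiles from the text


def cube_plus(u):
    """Truncated cubic: u**3 if u > 0, else 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots=KNOTS):
    """Return [linear term, nonlinear term(s)] for a restricted cubic spline."""
    if len(knots) < 3:
        raise ValueError("at least three knots are required")
    t_last, t_prev = knots[-1], knots[-2]
    scale = (knots[-1] - knots[0]) ** 2  # keeps terms on the scale of x
    terms = [x]
    for t_j in knots[:-2]:
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
            + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev)
        )
        terms.append(term / scale)
    return terms


for value in (20, 22.5, 30, 40):
    print(value, rcs_basis(value))
```

A BMI by ethnicity interaction would then multiply each basis term by the ethnicity indicators, giving every group its own curve.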
For descriptive purposes, values generated by the restricted cubic spline models were used to quantify the within and between ethnic risk in COVID-19 outcomes at specific BMI values (20, 22.5, 25, 30, 35, 40 kg/m2). For COVID-19 mortality these data were also used to generate BMI values in minority ethnicity groups that would produce an equivalent risk to white ethnicities at the thresholds for class I (30 kg/m2), II (35 kg/m2), and III (40 kg/m2) obesity.
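One way to obtain a risk-equivalent BMI is to solve for the point where one group's modelled log hazard ratio equals the white-ethnicity value at a given threshold. The sketch below uses bisection on the rising limb of the curve. The two curves are toy straight lines invented for illustration and are not the study's estimates; the function names are hypothetical, and the method itself is an assumption about how such values could be derived.

```python
def equivalent_bmi(target_log_hr, curve, lo, hi, tol=1e-6):
    """BMI in [lo, hi] at which `curve` equals `target_log_hr`.

    Assumes `curve` is increasing on [lo, hi]. Returns None when the
    target lies outside the range of the curve on that interval.
    """
    if not (curve(lo) <= target_log_hr <= curve(hi)):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if curve(mid) < target_log_hr:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy log hazard ratio curves (NOT the study's fitted values).
def white_curve(b):
    return 0.04 * (b - 22.5)


def minority_curve(b):
    return 0.10 + 0.07 * (b - 22.5)


print(equivalent_bmi(white_curve(40), minority_curve, 27, 41))
print(equivalent_bmi(white_curve(30), minority_curve, 27, 41))
```

The second call returns None because the toy minority curve already exceeds the target across the whole search interval, which is the situation in which no equivalent threshold exists on the rising limb.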
When fitting the Cox models, we included all individuals who died (or were admitted to hospital when using hospital admission as an outcome) during the analysis period and a weighted random sample of those who did not, with a sampling rate of 1% for those of white British ethnicity and 10% for adults from ethnic minority groups. We applied case weights (defined as the inverse of the sampling rate) to all analyses.
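The sampling scheme can be sketched as follows: every person with an event is kept with weight 1, and people without an event are sampled at the stated rate and weighted by its inverse. The record layout, the group labels and the function name are assumptions for illustration; the text does not describe how the sample was drawn in code.

```python
import random

# Sampling rates from the text; the group labels are illustrative.
SAMPLING_RATE = {"white British": 0.01, "ethnic minority": 0.10}


def build_analysis_sample(people, rng):
    """Keep all events (weight 1) and a weighted random sample of non-events.

    `people` is an iterable of dicts with keys 'had_event' and
    'sampling_group' (a hypothetical layout).
    """
    sample = []
    for person in people:
        if person["had_event"]:
            sample.append({**person, "weight": 1.0})
            continue
        rate = SAMPLING_RATE[person["sampling_group"]]
        if rng.random() < rate:
            # Case weight is the inverse of the sampling rate.
            sample.append({**person, "weight": 1.0 / rate})
    return sample


population = (
    [{"had_event": True, "sampling_group": "white British"}] * 5
    + [{"had_event": False, "sampling_group": "white British"}] * 2000
    + [{"had_event": False, "sampling_group": "ethnic minority"}] * 2000
)
sample = build_analysis_sample(population, random.Random(0))
print(len(sample), sum(p["weight"] for p in sample))
```

Each sampled non-event then stands in for 100 (white British) or 10 (ethnic minority) people in the weighted Cox model.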
In order to assess the pattern of results across sex and age, analyses for COVID-19 mortality were repeated stratified by sex and age (<70 years, ≥70 years). In order to assess whether the pattern of results for the broad ethnic categories of white, Black, South Asian and other mirrored the pattern of results in more detailed subcategories, the analysis was repeated using ten categories of ethnicity.
As BMI is likely to be missing not at random and influenced by many factors 27, not all of which were captured in this analysis, multiple imputation of missing data was not attempted. Nevertheless, to assess whether the pattern of missingness varied by ethnic group, we examined the proportion of missing data by ethnicity across regions. There was no clear systematic pattern of missingness by ethnicity (Supplementary Fig. 9). We also undertook logistic regression to quantify whether ethnicity, covariates or outcome predicted missing data, using the pseudo R2 or area under the curve statistic (Supplementary Table 3). Missingness was found to be conditionally independent of the outcome, indicating a lower risk of bias with the complete case analysis 29.
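The area under the curve used in such a missingness check can be computed directly from predicted probabilities and the missing/observed indicator. The sketch below is a plain pairwise (Mann-Whitney) implementation with illustrative inputs; it is not the authors' code, and the logistic regression that would produce the scores is omitted.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    `labels` are 1 (BMI missing) or 0 (BMI observed); `scores` are the
    model's predicted probabilities of missingness. Ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A value close to 0.5 would indicate that the predictor barely separates people with and without a recorded BMI. The pairwise loop is quadratic, so a rank-based version would be needed at cohort scale.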
Data are reported as mean (± SD) or hazard ratio (95% CI) unless detailed otherwise.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Analysed data are controlled by the Office for National Statistics, UK. Technical details of the Public Health Research Database (PHRD) incorporating the 2011 Census data for England and Wales, linked to Mortality Data, Hospital Episode Statistics (HES) data, and GP Extraction Service (GPES) data for Pandemic Planning and Research Data can be found through the Health Data Research UK Innovation Gateway https://web.www.healthdatagateway.org/dataset/a325f33e-bac8-49af-896f-1e025941dae8. Given the sensitive nature of the data, organisations and individuals will need to demonstrate they meet strict data security and information governance standards. The application form can be accessed and completed through Health Data Research UK Innovation Gateway https://web.www.healthdatagateway.org/dataset/a325f33e-bac8-49af-896f-1e025941dae8.

Code availability
The statistical code developed for this study has been archived and published separately 35 .