The interaction of ethnicity and deprivation on COVID-19 mortality risk: a retrospective ecological study

Black, Asian and Minority Ethnic (BAME) populations are at an increased risk of developing COVID-19 and consequently of more severe outcomes compared to White populations. The aim of this study was to quantify how much of the disproportionate disease burden can be attributed to ethnicity and deprivation as well as their interaction. An ecological study was conducted using data derived from the Office for National Statistics at a Local Authority District (LAD) level in England between 1st March and 17th April 2020. The primary analysis examined how age-adjusted COVID-19 mortality depends on ethnicity, deprivation, and the interaction between the two using linear regression. The secondary analysis used spatial regression methods to quantify the extent of the LAD spillover effect of COVID-19 mortality. We find that in LADs in the highest deprivation quartile, a 1 percentage point increase in the 'Black-African' (regression coefficient 2.86; 95% CI 1.08–4.64), 'Black-Caribbean' (9.66; 95% CI 5.25–14.06) and 'Bangladeshi' (1.95; 95% CI 1.14–2.76) communities is associated with significantly higher age-adjusted COVID-19 mortality compared to respective control populations. In addition, the spatial regression results indicated a positive, significant correlation between the age-adjusted mortality in one LAD and the age-adjusted mortality in a neighbouring LAD, suggesting a spillover effect. Our results suggest targeted public health measures to support those who are deprived and belong to BAME communities as well as to encourage restricted movement between different localities to limit disease propagation.

association with socio-economic characteristics in more detail and also for the urgent need to identify cultural risk factors which may be susceptible to intervention 10,11 . When considering the interaction between ethnicity and deprivation, it is also important to analyse "neighbourhood effects", which, as seen even in the Global Burden of Disease work, have a substantial effect on population health 12,13 . A neighbourhood effect is defined as the impact of living in a particular locality 14 ; in this case the locality is a LAD. A large literature suggests that one's neighbourhood can affect individual outcomes, including health 15 . Spillovers refer to effects percolating from one LAD to a contiguous one. Therefore, an effective analysis of the interaction between ethnicity, deprivation and COVID-19 risk must consider the neighbourhood effect as well as spillover effects between neighbouring regions. Not considering these spillover effects would potentially underestimate the full effect of the interaction between ethnicity, deprivation and COVID-19 mortality. Although some work has started to examine the spatial distribution of COVID-19 even within the UK, to our knowledge no studies have examined the spatial distribution of infection rates between LADs in the UK, which would give an indication of how COVID-19 is being transmitted due to social interaction and local travel 16-18 .
Therefore, the first aim of this study was to examine the interaction between ethnicity and deprivation when associating ethnicity to COVID-19 mortality. In addition, we aimed to examine spillover effects between neighbouring LADs in England.

Methods
Study design and participants. In this study we used open source LAD-level data from England (a LAD is a subnational division of England used for the purposes of local government) to undertake a cross-sectional analysis of our variables of interest. There are 317 LADs in England, and we used data from 315 of them, excluding the City of London and the Isles of Scilly, which were dropped due to missing observations. The primary focus of this study was to analyse the disparity in COVID-19 deaths for individuals belonging to BAME groups vis-à-vis the White population (while neither of these categories is homogeneous, we use sub-groups as used by the Census) 19 . Secondly, we assessed the importance of the index of multiple deprivation (IMD) in explaining the age-adjusted mortality across the different local areas in England after controlling for ethnicity. Finally, we quantified the extent of a spillover effect of COVID-19 disease burden, which helped us to assess the impact of the social-distancing measures introduced by the government.
Data sources and definitions. The data are taken from different sources. Our dependent variable, the age-standardised death rate for all persons in each LAD in England, is taken from the ONS for 315 LADs from 1 March to 17 April 2020 20 . This is defined as the age-adjusted mortality per 100,000 of the population of the LAD. We took into account LAD boundary changes which occurred in April 2019, for example Bournemouth, Poole and Christchurch combining into one LAD. The average score for the index of multiple deprivation, the ethnicity data and the educational attainment data were taken from the Office for National Statistics 19,21 . Data on population density, expressed as total population per hectare, were taken from Census 2011, Table PHP01. A-level educational qualification (proportion aged 16-64 with GCE A level or equivalent) was taken from https://www.nomisweb.co.uk/query/construct/components/stdSearch.asp?menuopt=7&subcomp=131.
We classified ethnicity using the broad headings used in the 2011 Census 22 . Our dependent variable and independent variables are described in Table 1. The dependent variable is the age-adjusted COVID-19 death rate per 100,000 in the population. The 'Quartile Dummy j (j = 1, 2, 3, 4)' presents the quartile dummies for the index of multiple socio-economic deprivation. We use 'Quartile Dummy 1', representing the bottom 25% (least deprived), as the base. In addition, two other explanatory variables were incorporated to capture population per hectare within each LAD (i.e. pop-density) and education status within a LAD (edu-status), measured as the proportion of individuals within the LAD who have completed A-level. Each ethnicity variable represented the % of people with that ethnicity in that LAD. Our categorisation of the ethnicity variable departs from previous studies by disaggregating within broad ethnic groups 10 . Instead of using Black as a whole, we divide 'Black' into 'Black-African', 'Black-Caribbean' and 'Black-Others'. For the Asian ethnicity, we use the following categories: 'Bangladeshi', 'Indian', 'Pakistani', 'Arab', 'Chinese' and 'Asian-Other'. The 'Mixed' category is a combination of 'Mixed White & Asian', 'Mixed White & Black African', 'Mixed White and Black Caribbean' and 'Mixed Other'. Along with the above, we also have 'Other Ethnic group'; the excluded group is the 'White' community, which hence serves as the reference category.
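As an illustration of the variable construction described above, the following minimal Python sketch builds the deprivation quartile dummies (quartile 1 as the omitted base) and their interactions with one ethnicity share. The function names, the toy IMD scores and the use of sample quartile cut points are assumptions made for illustration; the source does not state how the cut points were computed.

```python
from statistics import quantiles


def imd_quartiles(imd_scores):
    """Assign each LAD an IMD quartile: 1 = least deprived, 4 = most deprived."""
    q1, q2, q3 = quantiles(imd_scores, n=4)  # three sample cut points (assumed method)

    def quartile(score):
        # Booleans add as 0/1, so this counts how many cut points are exceeded.
        return 1 + (score > q1) + (score > q2) + (score > q3)

    return [quartile(score) for score in imd_scores]


def design_row(ethnic_share, quartile):
    """Regressors for one LAD and one ethnic group.

    Returns [share, dummy Q2, dummy Q3, dummy Q4, share*Q2, share*Q3, share*Q4].
    Quartile 1 is the omitted base category.
    """
    dummies = [1.0 if quartile == j else 0.0 for j in (2, 3, 4)]
    interactions = [ethnic_share * d for d in dummies]
    return [ethnic_share] + dummies + interactions


# Hypothetical IMD scores and ethnicity shares (%) for eight LADs.
imd = [8.0, 12.5, 15.0, 19.0, 22.0, 27.5, 31.0, 40.0]
share = [0.5, 1.0, 0.2, 3.0, 1.5, 4.0, 2.5, 6.0]

assigned = imd_quartiles(imd)
rows = [design_row(s, q) for s, q in zip(share, assigned)]
for q, row in zip(assigned, rows):
    print(q, row)
```

In a LAD in the base quartile all dummies and interactions are zero, so the coefficient on the share alone applies; in the other quartiles the matching interaction term adds to it.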
Statistical analysis. Two types of statistical analysis were performed.
A: An OLS (ordinary least squares) regression model was estimated to see the impact of ethnicity, particularly the interaction of ethnicity with economic deprivation in explaining the mortality differential across the LADs while controlling for population density and education status that may also affect mortality.
We note that the sum of the ethnic population percentages in each LAD equals 100. Therefore, we have excluded the 'White' community, which acts as the reference category, to eliminate the collinearity problem. The estimated equation takes the following form: where the dependent variable is the age adjusted COVID-19 death rate per 100,000 in the population in LAD i. The independent variables are as described above. The term u represents the error term.
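A specification consistent with the variable definitions above can be sketched as follows. This is an assumed reconstruction, not the authors' printed equation: the coefficient symbols (β, γ, δ, θ), the shorthand E for the ethnicity shares and Q for the quartile dummies are labels chosen here, with k indexing the non-'White' ethnic groups and j the deprivation quartiles other than the base.

```latex
\[
\text{mortality}_i
  = \beta_0
  + \sum_{k} \beta_k \, E_{ki}
  + \sum_{j=2}^{4} \gamma_j \, Q_{ji}
  + \sum_{k} \sum_{j=2}^{4} \delta_{kj} \, \bigl( E_{ki} \times Q_{ji} \bigr)
  + \theta_1 \, \text{pop-density}_i
  + \theta_2 \, \text{edu-status}_i
  + u_i
\]
```

Under this form, the effect of a 1 percentage point increase in group k within quartile j would be the sum of β_k and δ_kj.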
To justify the linear specification and rule out a non-linear relationship, we performed a specification test in which the predicted value and its square from the estimated OLS regression were used as explanatory variables in a separate regression with the same dependent variable. The square of the predicted value had a p-value of 0.416, implying that our OLS model was correctly specified. Therefore, we excluded the possibility of a non-linear relationship. To test for the presence of multicollinearity, we ran a model without the interaction terms. The average variance inflation factor was 5.60, which is lower than the tolerance value of 10.
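The specification test described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and toy data: it returns the coefficient on the squared fitted value, whereas the test in the text relies on that coefficient's p-value, which this sketch does not compute. The helper names `solve`, `ols` and `link_test_coefficient` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(columns, y):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def link_test_coefficient(x, y):
    """Fit y on x, then regress y on the fitted value and its square.

    Returns the coefficient on the squared fitted value; a value close to
    zero is what a correctly specified linear model would produce.
    """
    b0, b1 = ols([x], y)
    fitted = [b0 + b1 * xi for xi in x]
    _, _, on_square = ols([fitted, [f * f for f in fitted]], y)
    return on_square


# Toy data: a truly linear outcome and a truly quadratic one.
x = [float(v) for v in range(10)]
linear = link_test_coefficient(x, [2.0 + 3.0 * v for v in x])
curved = link_test_coefficient(x, [v * v for v in x])
print(linear, curved)
```

For the linear outcome the squared term contributes essentially nothing, while for the quadratic outcome it is clearly non-zero, which is the pattern the test is designed to detect.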
The goodness of fit of the model (R²) is 0.782. We also performed an information matrix test for the regression model and an orthogonal decomposition into tests for heteroskedasticity, skewness, and kurtosis. The p-values for the tests of heteroskedasticity, skewness, and kurtosis were 0.474, 0.991 and 0.150 respectively, supporting our specified model.
As a robustness check, given that the dependent variable is censored between a lower limit (minimum) and an upper limit (maximum), we ran a censored regression model. The results obtained from the censored regression model were very similar to those reported in the paper from the linear regression.
B: Spatial analysis. Figure 1 displays the age-adjusted mortality rates along with the index of multiple deprivation across 315 LADs in England, comprising an area with 54,706,877 inhabitants and 173,673 inhabitants per LAD on average. The colour gradients show the lowest to highest values of age-adjusted mortality rates (in shades of blue) and of the index of multiple deprivation (in shades of green).
The COVID-19 age standardised mortality rate is clustered in the London metropolitan area and its surroundings, the Midlands, the Liverpool-Manchester area, the south of the Cumbria region, and in the North East (especially Durham and Newcastle), and is lower in the South, the South West, the Yorkshire and Humber region, the East, and in the East Midlands. This picture contrasts with the existing disparities in IMD across
To investigate the spillover effect from neighbouring LADs (keeping in mind the greater likelihood of social interaction, even during the lockdown, across neighbouring LADs), we undertook a spatial regression analysis using IMD and percentages of ethnic population as the explanatory variables. We allowed for spatial dependence in the age standardised mortality rate (especially given the North-South divide in England). In particular, we estimated the following equation: where W denotes a spatial contiguity matrix, and e the i.i.d. disturbance term (Drukker et al. 23 ).
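To make the role of the contiguity matrix W concrete, the sketch below builds W for three hypothetical LADs and computes the spatial lag of mortality, i.e. the term that enters a spatial lag model of the general form y = φ·W·y + X·β + e. Row-normalising W (so each row sums to one) is an assumption made here for readability; the source does not state which normalisation was used, and the LAD names and mortality values are invented.

```python
def row_normalised_contiguity(neighbours):
    """Build a row-normalised contiguity matrix from a neighbour list.

    neighbours maps each LAD name to the names of the LADs it borders.
    Returns (names, w), where w[i][j] = 1 / (number of neighbours of i)
    if j borders i, and 0 otherwise. Row-normalisation is an assumption.
    """
    names = sorted(neighbours)
    index = {name: k for k, name in enumerate(names)}
    w = [[0.0] * len(names) for _ in names]
    for name, adjacent in neighbours.items():
        for other in adjacent:
            w[index[name]][index[other]] = 1.0 / len(adjacent)
    return names, w


def spatial_lag(w, y):
    """Spatial lag W y: for each LAD, the weighted average of its neighbours' y."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


# Three hypothetical LADs in a line: A borders B, B borders A and C, C borders B.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
names, w = row_normalised_contiguity(neighbours)

mortality = [10.0, 20.0, 40.0]  # invented age-adjusted rates for A, B, C
lagged = spatial_lag(w, mortality)
print(dict(zip(names, lagged)))
```

A positive coefficient on this lagged term means that a LAD's mortality tends to be higher when the average mortality of its contiguous neighbours is higher.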
We also estimated a spatial regression model without the interaction terms. The estimated coefficient on the spatial lag of age-adjusted mortality is 0.407 (CI 0.292-0.522, p < 0.001), indicating positive correlation between the age-adjusted mortality in one LAD and the age-adjusted mortality in a neighbouring LAD. The total effect remained qualitatively the same as the model with interaction terms as reported in the paper.
Table 2 reflects the aggregate effect of being in the particular deprivation quartile and belonging to the corresponding ethnicity. It is clear that even after controlling for ethnicity, deprivation matters: the most deprived LADs also exhibited significantly larger mortality compared to the most affluent quartile (see 'Quartile Dummy 4' in Table 2). Interaction of economic deprivation with dis-aggregated ethnicity confirmed considerable heterogeneity within the BAME community. For example, a 1 percentage point increase in the 'Black-African' population in the poorest LADs (Quartile 4) increased the mortality rate by 2.861 per 100,000 population (CI 1.080-4.642, p = 0.002) and the corresponding increase for the 'Black-Caribbean' population in the fourth Quartile is 9.655 (CI 5.248-14.061, p < 0.001). The results were not as clear for the 'Bangladeshi' community: the second and third Quartile effects were negative, whereas a 1 percentage point increase in the 'Bangladeshi' community in the poorest LADs (Quartile 4) significantly increased COVID-19 mortality by 1.952 per 100,000 population (CI 1.144-2.760, p < 0.001) compared to the 'White' community. A similar positive, significant impact was also established for the 'Indian' community in the second and third Quartiles. 'Other ethnic group' across all three deprivation quartiles recorded a significantly higher age-adjusted mortality rate compared to the 'White' group. A unit increase in population density (see pop-density in Table 2) increased mortality by 0.275 per 100,000 population (CI 0.111-0.439, p = 0.001).
Spatial regression results. The spatial regression results based on dis-aggregated ethnicity are reported in Table 3. The estimated coefficient on the spatial lag of age-adjusted mortality (φ) was 0.462 (CI 0.355-0.569, p < 0.001), indicating positive correlation between the age-adjusted mortality in one LAD and the age-adjusted mortality in a neighbouring LAD.
Instead of regression coefficients, we focused on the marginal effects obtained for selected ethnic groups. The results are reported in Table 3, in a more disaggregated manner, for five ethnicity groups: 'Black-African', 'Black-Caribbean', 'Bangladeshi', 'Indian' and 'Pakistani'. Here we compared three scenarios for each deprivation quartile: keeping the ethnicity share at the 25th percentile, at the median and at the 75th percentile within each deprivation category. The results show that if the 'Black-Caribbean' population stood at the 75th percentile in comparison to the 25th percentile, the mortality rate would increase by 8.111 (CI −0.464 to 16.687, p = 0.064), 11.966 (CI 4.274 to 19.658, p = 0.002), and 13.398 (CI 6.102 to 20.695, p < 0.001) for the 'Quartile dummy' 2, 3 and 4 respectively. When we compared the median with the 25th percentile, the 'Black-Caribbean' population showed an increase of 2.096 (CI −0.120 to 4.312, p = 0.064), 3.092 (CI 1.105 to 5.080, p = 0.002) and 3.462 (CI 1.577 to 5.348, p < 0.001). The other statistically significant increases were observed for 'Black-African' and 'Indian' for the second and third Quartiles of deprivation and for 'Pakistani' only for the second Quartile of deprivation. Contrary to previous literature, 'Bangladeshi' had a significantly lower age-adjusted mortality with deterioration in deprivation status. For example, if the 'Bangladeshi' population stood at the 75th percentile in comparison to the 25th percentile, the mortality rate would decrease by 8.184 (CI −11.955 to −4.413, p < 0.001), 5.004 (CI −8.049 to −1.959, p < 0.001) and 2.287 (CI −4.505 to −0.069, p = 0.043) for the 'Quartile dummy' 2, 3 and 4 respectively.
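As a quick arithmetic check on the 'Black-Caribbean' figures quoted above, the sketch below divides each 75th-versus-25th percentile effect by the matching median-versus-25th percentile effect. A near-constant ratio is what one would expect if the effects scale linearly with the size of the change in the ethnicity share and the same percentile gaps apply in each quartile; that reading is an assumption made here, not something the source states. The variable names are illustrative.

```python
# Reported increases in age-adjusted mortality per 100,000 for 'Black-Caribbean',
# keyed by deprivation quartile (values as quoted in the text).
p75_vs_p25 = {2: 8.111, 3: 11.966, 4: 13.398}
p50_vs_p25 = {2: 2.096, 3: 3.092, 4: 3.462}

# Ratio of the larger comparison to the smaller one, quartile by quartile.
ratios = {q: p75_vs_p25[q] / p50_vs_p25[q] for q in p75_vs_p25}

# How far apart the three ratios are from one another.
spread = max(ratios.values()) - min(ratios.values())

for quartile, ratio in sorted(ratios.items()):
    print(f"Quartile {quartile}: ratio = {ratio:.3f}")
print(f"Spread across quartiles = {spread:.4f}")
```

The three ratios agree closely, so the two sets of reported effects are internally consistent with one another under the stated assumption.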

Discussion
We found that COVID-19 mortality disproportionately affects local areas with an over-representation of individuals who are relatively socio-economically deprived and also belong to an ethnic minority, particularly 'Black-Caribbean' and 'Black-African'. Linear regression estimates showed that considering the 'South-Asian' community as one homogeneous entity can be misleading because there was considerable discrepancy among the ethnic subgroups ('Bangladeshi', 'Indian' and 'Pakistani') across the different deprivation quartiles. The impact of living in a particular LAD (neighbourhood effect) is also confirmed by the positive, significant impact of population density within each LAD on the age-adjusted COVID-19 mortality. The spatial regression results suggest a strong spillover effect in the disease burden. The spatial regression estimates also corroborate the higher risks for the 'Black-Caribbean' population in particular: for each deprivation quartile, the age-adjusted mortality rate became significantly higher if the percentage of 'Black-Caribbean' increased in that LAD. A similar pattern was also seen for 'Black-African' and 'Indian' in the second and third Quartiles, and for 'Pakistani' in the second Quartile. Amongst the 'South-Asian' population, the 'Bangladeshi' group behaves differently.
Pareek et al. 7 have rightly called for an urgent public health approach to understand the differential effect of ethnicity and the interplay of ethnicity with several factors. Beyond any inherent genetic predisposition, our analysis sheds light on the role of deprivation (e.g. inequality in access to resources) in the higher mortality seen in certain segments of the population. Our findings largely agree with existing evidence suggesting that economic deprivation combined with dis-aggregated ethnicity exhibits considerable heterogeneity within the BAME community. Consistent with ONS 4 and Apea et al. 24 , individuals from the 'Black-African' and 'Black-Caribbean' communities who are also economically more deprived record a significantly higher age-adjusted COVID-19 mortality compared to the 'White' group.
However, our results are particularly interesting when we disaggregate by ethnic group. In contrast to previous literature 10 , no consistent pattern emerges when we compare the combined effect of ethnicity and economic deprivation across the different 'South Asian' communities. The impact for the 'Bangladeshi' group is negative and significant, contrary to previous findings 10 . While we cannot fully explain this, many of these areas may have other compensating factors such as higher social capital and other LAD-specific factors that we cannot observe in the data. An empirical analysis of how differences in behavioural norms and choices across ethnicities have affected their risk of contracting the virus is left for future investigation. Despite some similarities with pre-existing literature, our results suggest that a differential public health approach would be warranted to capture the nuances within the different ethnic minority groups.
The most important limitation of this type of study is the possibility of ecological fallacy, whereby the effect seen at a LAD level may not be generalisable to individuals in the region. Another important limitation is that we were unable to adjust for important covariates such as occupation and comorbidity status, which may be risk factors for COVID-19 6 , although we were able to account for population density and educational attainment. It is important to highlight that ethnic groups do have differing occupational profiles, which may have an impact on their COVID-19 risk. Of the total employed Black population, the predominant shares were in 'Caring Personal Service Occupations' (15.4%), 'Elementary Administration and Service Occupations' (14.3%) and 'Health Professionals' (9%). In comparison, for the employed White population there was a more uniform distribution across different occupations (for example, while 6.9% were working in 'Caring Personal Service Occupations', a comparable proportion (8.0%) were 'Corporate Managers and Directors') 25 . The largest share of the employed Asian population is in 'Elementary Administration and Service Occupations' (50.5%). If we look at the NHS per se, the latest data reveal that "Asian people made up a higher percentage of medical staff (at 29.7%) than non-medical staff (at 8.0%)" 26 . These occupational roles may put BAME communities at a greater risk of COVID-19 and may make it more difficult to practise social distancing 27 . Additionally, BAME communities have a higher level of general comorbidity, which may pose an independent risk for COVID-19 28 .
In conclusion, our results clearly show that deprivation and its interaction with ethnicity play an important role in explaining COVID-19 mortality. The presence of spatial effects and spillover suggests that family structures and social networks play an important role too. Social interactions between people across neighbouring regions can also spread the disease.

Data availability
The data used in this study was open source provided by the Local Authority. Any requests for the extracts used during this study period can be made to the corresponding author.