The use of Enhanced Vegetation Index for assessing access to different types of green space in epidemiological studies

• It is a widely implemented assumption in epidemiological studies that an increase in EVI is equivalent to an increase in greenness and/or green space.

• We used linear regression models to test associations between EVI and potential sources of green reflectance at a neighbourhood level using satellite imagery from 2018.

• We compared EVI measures with a 'gold standard' vector-based dataset that defines publicly accessible and private green spaces.

INTRODUCTION
Exposure to green space in the home neighbourhood has been associated with positive impacts on physical and mental health outcomes, including mortality, cardiovascular disease and well-being [1][2][3]. Evidence suggests that behavioural mechanisms such as viewing and spending time in green space support good health through physical activity, stress reduction, social connectedness and time to relax [4,5]. Factors such as socioeconomic deprivation may modify this relationship, and studies have highlighted inequalities in the quality and accessibility of green space [6,7]. However, longitudinal evidence is lacking [8] and there are no accepted frameworks for quantifying exposure to green space.
Therefore, definitions of exposure to green space are often nuanced, context-specific, and application-dependent [9]. Past epidemiological and public health studies have assessed neighbourhood exposure to green space using many different measures. Vegetation indices based on remotely sensed data from satellite imagery [10][11][12] represent green vegetation and are widely used as a measure of greenness or green space [13].
Vector-based measures such as land cover maps [14][15][16], mapping agency data, crowdsourced data (e.g. OpenStreetMap) [17] and local government audits tend to be used to represent area-based measures of exposure to green space, such as the size of the nearest green space to the home location or the proportion of an area-level boundary that contains green space. Survey data that record self-reported visits tend to represent exposure to green space as time spent in a green space or distance travelled to a green space [11,18,19]. Furthermore, different approaches to defining and managing green space among local and national government bodies can present challenges to understanding the impact of green space on health and well-being outcomes, and to translating evidence into policy and action [20,21].
Remote sensing is widely used for extracting information about the environment [22] through satellite sensors recording reflected and emitted radiance from the Earth's surface. This radiance is classified into different wavelength ranges and the reflective range (0.4-2.5 µm) is used to identify the presence of vegetation within a pixel. The nature of remotely sensed data means that there are numerous advantages compared to other approaches [23]. Data are: easily obtainable for large spatio-temporal ranges; open-source (e.g. 30 m from Landsat [24]); uniformly collected and therefore subject to less variability in the way data are defined; collected for municipal and administrative areas; and able to measure exposure in an objective and uniform way [25]. Conversely, the challenges of working with satellite data include spatial and temporal resolution, cloud cover, shadows cast by buildings in dense urban areas [26], and missing smaller urban green spaces such as trees and pocket parks [27,28]. Furthermore, finding appropriately cloud-free satellite data at the correct time of year can be particularly challenging for northern-hemisphere climates. Satellite data processing includes adjustments and masks to mitigate some issues of cloud cover, but the data remain challenging to work with at a national level and may lead to exposure misclassification [27].
Satellite-derived vegetation indices (VIs) such as the Normalised Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil-adjusted Vegetation Index (SAVI) and Leaf Area Index (LAI) have been found to be positively associated with mental and physical health outcomes and health-promoting behaviours [29][30][31][32][33]. However, when defining exposure to green space, VIs have been implemented as a measure of both exposure to greenness [34][35][36] and green spaces [37][38][39][40], with studies interpreting a greater index score to represent greater access to green space [41,42]. Greater VI values have been interpreted as representing greater access to green space by area because VIs are indicators of green vegetation, which is inherently what makes a green space. However, VIs are a dimensionless measure of green reflectance and therefore may not be linearly associated with 2-dimensional measures of access to green space [43]. Recent studies have begun to acknowledge that there is a difference between greenness and green spaces when defining exposure to green space [11,44,45]. Despite this, there have been few attempts to examine the association between VIs and objective area-based measures of green space [27,46] to help policy makers and planners understand what changes in mean VI values represent. As studies begin to investigate multidimensional aspects of green space attributes (e.g. type and quality) to understand more specifically how green spaces support good health and wellbeing, it is important to understand how to interpret changes in VIs in relation to changes in vegetation amount and type [47].

Study overview
We wanted to evaluate the assumption that an increase in EVI is equivalent to an increase in accessible green space. To do this we used linear regression models to test associations between EVI and potential sources of green reflectance at a neighbourhood level using satellite imagery from 2018. We compared EVI measures with a 'gold standard' vector-based dataset that defines publicly accessible and private green spaces for 2018. We used EVI because it was developed to optimise the vegetation signal compared to NDVI by improving its sensitivity to high biomass regions and vegetation monitoring by reducing atmospheric noise [48,49]. This was pertinent when considering Wales' climate and topography.
We hypothesised that EVI would be positively associated with the amount of publicly accessible and private green space by area (m²). For publicly accessible green spaces we predicted that the association would vary by green space type. Specifically, we defined our research questions as:
1. Is mean EVI positively associated with green space found within 300 m of households in Wales?
2. Are any associations modified by green space type?
3. Is mean EVI positively associated with private green space (i.e. average garden size) within 300 m of households in Wales?

Study background
The cross-sectional work reported in this paper was conducted as part of a wider longitudinal study called the Green-Blue Spaces project [45,50,51], where we developed a national-level annual exposure variable for 1.49 million households over 11 years (2008-2019) using satellite-derived EVI. Interpreting EVI scores in space and over time in relation to different types of green space was challenging and prompted us to conduct this cross-sectional evaluation to support the interpretation of the impacts of green space on mental health and wellbeing. We used 300 m buffers to coincide with World Health Organisation guidelines on green space access [52]. We implemented a cross-sectional design for this investigation because we wanted to understand baseline associations before including more complex methodological considerations in the study design, such as temporally aligning the EVI and access to green space data, and seasonal changes in EVI values.

Study area and study subjects
This study was based in Wales (Fig. 1), a small nation (population: 3.17 million people; total area: 20,735 km²) with a temperate maritime climate [53]. Wales has relatively mild winters and precipitation all year, with regions of elevated terrain and coastal exposure to prevailing westerly winds contributing to its high rainfall and cloud cover. Two thirds of the population live in cities and urban settlements.

Generating exposure data
Satellite image data processing. We used Landsat 8 (30 m resolution) to create a measure of EVI for every residential address (n = 1.49 million) in Wales in 2018. We acquired images captured between May and July to temporally align EVI measures with peak greenness and minimise data gaps through cloud cover [54]. We pre-processed the images using the Semi-Automatic Classification Plugin tool in QGIS [55] and applied DOS1 atmospheric correction to each image [22]. We created cloud masks using the Cloud Masking for Landsat Products plugin [56] to set pixels covered by cloud in the satellite imagery to NULL. This prevented these values from influencing the final greenness metrics. We produced an annual composite image of Wales in QGIS by mosaicking different coverages together for the same year.
Calculating exposure metrics. EVI was calculated using the red, blue and NIR reflectance bands found within the Landsat satellite imagery and processed using the vegetation index GRASS tool in QGIS [57]. This resulted in a raster dataset which contained EVI values for the whole of Wales with a range of −1 (water) to +1 (vegetation) [58], with healthy vegetation values typically found in the 0.2 to 0.8 range [58]. To assign a neighbourhood EVI exposure value to each residential address location in Wales (n = 1.49 million), we created a Euclidean buffer of 300 m and averaged the EVI values found within the assigned area (area of buffer with 300 m radius = 282,743 m²). For coastal households, the buffer was clipped to the coastline to avoid underestimates of greenness. Using this buffer layer, we performed an intersection analysis with the EVI layer to estimate the density of green vegetation (Fig. 2).
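The calculation described above can be illustrated with a minimal sketch. It assumes the standard EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1); the text does not state the constants used by the GRASS tool, so these are an assumption. The function names, the pixel-centre inclusion rule and the use of `None` for cloud-masked pixels are hypothetical choices for illustration, and coastline clipping is not shown.

```python
import math

# Standard EVI coefficients. Assumption: the source does not list the
# constants applied by the GRASS vegetation index tool.
G, C1, C2, L = 2.5, 6.0, 7.5, 1.0


def evi(nir, red, blue):
    """EVI for one pixel from surface reflectance values on a 0-1 scale."""
    denom = nir + C1 * red - C2 * blue + L
    if denom == 0:
        return None  # undefined for this pixel
    return G * (nir - red) / denom


def mean_evi_in_buffer(pixels, home, radius_m=300.0):
    """Mean EVI of pixels whose centre lies within radius_m of the home point.

    pixels: iterable of (x, y, evi_value) in projected metres;
            evi_value is None for cloud-masked (NULL) pixels, which are skipped.
    home:   (x, y) of the residential address.
    """
    hx, hy = home
    values = [
        v for x, y, v in pixels
        if v is not None and math.hypot(x - hx, y - hy) <= radius_m
    ]
    return sum(values) / len(values) if values else None


# Area of a 300 m Euclidean buffer, as quoted in the text.
buffer_area_m2 = math.pi * 300.0 ** 2
```

Skipping masked pixels rather than treating them as zero mirrors the stated intent that cloud-covered values should not influence the final greenness metric.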
Topographic vector data processing. We produced a UK-relevant typology to classify green spaces for urban and rural areas (supplementary material Table A, [45]). This addressed the need for typologies that facilitate cross-disciplinary and inter-sectoral work by developing a peer-reviewed typology of green space. We consulted with more than 30 stakeholders from across Wales working in Policy, Planning and the Third Sector to iteratively develop our final typology. Farmland was not included in the typology. Although farmland constitutes large areas in rural regions, it is privately owned land and therefore not publicly accessible. We only included publicly accessible green spaces and private gardens because our overarching aim was to contribute evidence about modifiable aspects of the built environment for planning and policy guidelines. Although private gardens are not publicly accessible, we included this private green space because private garden space will make up the majority of green space within 300 m of an individual's home location.
We used topographic vector data from multiple sources to create a map of green spaces in Wales for 2018. In summary, vector data from the UK's national mapping agency (Ordnance Survey's Master Map [59]) and local government audits were collated to create a dataset of all publicly accessible green spaces in Wales [60]. We categorised land parcels according to the typology (see 2.2) and extracted private garden size from OS Master Map Wales [61]. Finally, we calculated access to green space in terms of publicly accessible and private spaces within 300 m network distance of each household in Wales to coincide with World Health Organisation guidelines [52]. We defined green space access as: (1) the total area of green spaces (subset by type) and (2) average garden size, within a 300 m linear buffer of the household point location.
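The two access measures defined above can be sketched as a simple aggregation. This is an illustration only: it assumes the set of green space parcels and gardens within 300 m of a household has already been identified by a separate GIS step, and the function name, dictionary keys and type labels are hypothetical rather than taken from the study's typology.

```python
from collections import defaultdict


def access_metrics(parcels, garden_areas_m2):
    """Summarise green space access for one household.

    parcels:         list of (green_space_type, area_m2) for publicly
                     accessible green spaces within 300 m of the household.
    garden_areas_m2: areas of private gardens within the same distance.
    """
    area_by_type = defaultdict(float)
    for gs_type, area in parcels:
        area_by_type[gs_type] += area

    total_area = sum(area_by_type.values())
    avg_garden = (
        sum(garden_areas_m2) / len(garden_areas_m2) if garden_areas_m2 else None
    )
    return {
        "total_area_m2": total_area,
        "area_by_type_m2": dict(area_by_type),
        "avg_garden_m2": avg_garden,
    }


# Hypothetical household with two parks, one allotment and two gardens.
example = access_metrics(
    [("park", 1000.0), ("allotment", 250.0), ("park", 500.0)],
    [200.0, 300.0],
)
```

Returning `None` for the average garden size when no gardens are found keeps missing comparators distinguishable from a genuine value of zero.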

Statistical analysis
We employed linear regression to investigate associations between EVI and total area of green space and average garden size (m²). We stratified our analyses by urban and rural settings as classified by the Office for National Statistics (ONS) urban-rural classification [62]. We also used Pearson correlation to explore the association between EVI and green space by type using our typology.
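A minimal sketch of this kind of analysis is given below: a single-predictor least-squares fit with a Pearson correlation, repeated within each urban/rural stratum. It is not the software used in the study, which the text does not name. The function names and the toy rows are hypothetical and are not study data.

```python
import math


def ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, pearson_r). Assumes x and y each vary.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def fit_by_stratum(rows):
    """rows: iterable of (stratum, green_area_m2, mean_evi) per household."""
    groups = {}
    for stratum, area, evi_value in rows:
        xs, ys = groups.setdefault(stratum, ([], []))
        xs.append(area)
        ys.append(evi_value)
    return {stratum: ols(xs, ys) for stratum, (xs, ys) in groups.items()}


# Toy, perfectly linear rows used only to exercise the functions.
toy_rows = [
    ("urban", 0, 0.20), ("urban", 1000, 0.30), ("urban", 2000, 0.40),
    ("rural", 0, 0.30), ("rural", 1000, 0.35), ("rural", 3000, 0.45),
]
fits = fit_by_stratum(toy_rows)
```

In this setup the intercept is the predicted EVI when the green space measure is zero, and the slope is the change in EVI per additional square metre.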

RESULTS
Table 1 shows the three exposure assessment measures for 1.4 million households in Wales in 2018. We successfully linked 1.4 million households (95%) with an EVI and comparator vector-based green space measures. We lost comparators where garden size metrics were not available (n = 5753) or where there was no access point to publicly accessible green space (n = 60,930), as defined by the typology, within a 300 m network distance of a household. A CONSORT diagram of data linkage is included in the supplementary material (Fig. A).

National temporal and spatial variation
The average EVI for 300 m buffers around residential addresses in Wales was 0.28 (Table 1). This falls within the healthy-vegetation range for EVI (0.2-0.8). The spatial distribution of household greenness across Wales is consistent with the theoretical principles of an EVI estimate, with rural areas having higher EVI values than urban regions. Figure 3 shows the distribution of greenness for households. Areas where there are no households are shown in white. Figure 3 shows that rural areas have higher average EVI scores than those found in coastal and more populated areas. The northwest, southwest and mid-Wales have the highest greenness scores. These are the most rural regions of the country and there are several managed green spaces, including national parks and forests, in these areas. Residential locations in south Wales cities and coastal towns consistently had the lowest EVI values across the study period.

Associations between EVI values and green space type
Table 2 shows that for a nation-scale model, when total green space area is 0, EVI is predicted to be 0.24. This is comparable with the mean EVI of 0.28 reported in Table 1. The Pearson correlation coefficient is 0.33 and the adjusted r² highlights that total green space area accounts for 11% of the variation in EVI for Wales. The difference between urban and rural mean EVI is described by the intercept values of 0.25 for urban areas and 0.34 for rural areas. In urban areas, the Pearson correlation coefficient is 0.57 and the adjusted r² highlights that total green space area accounts for 32% of the variation in EVI. In rural areas, the Pearson correlation coefficient is 0.26 and the adjusted r² highlights that total green space area can account for 7% of the variation in EVI.
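For a single-predictor model, the share of variation explained is the square of the Pearson correlation coefficient. The short check below shows how the correlations reported above correspond to the stated percentages. The helper name is hypothetical. Adjusted r² is in principle slightly smaller than r², but with around 1.4 million observations and one predictor the difference is negligible.

```python
def variance_explained(r):
    """Share of variance explained by a single predictor with correlation r."""
    return r ** 2


# Correlations reported in the text for total green space area.
reported = {"Wales": 0.33, "urban": 0.57, "rural": 0.26}
shares = {label: variance_explained(r) for label, r in reported.items()}

for label, share in shares.items():
    print(f"{label}: r = {reported[label]:.2f}, r^2 = {share:.0%}")
```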
Table 3 shows that when average garden size is 0, the model predicts that EVI will be 0.25 (mean EVI = 0.28, Table 1). The Wales-wide model predicts that when average garden size increases by 1 m², EVI increases by 0.0001. Equivalently, to see a 0.1 unit increase in EVI, garden size would need to increase by roughly 1000 m² (0.1/0.0001), which is several times the average garden size of 275 m² (Table 1). The Pearson correlation coefficient is 0.45 and the adjusted r² highlights that average garden size can account for 20% of the variation in EVI. This means that 80% of the variation in EVI cannot be explained by average garden size alone. The difference between urban and rural mean EVI is described by the intercept values of 0.21 for urban areas and 0.33 for rural areas. In urban areas, as average garden size increases by 1 m², EVI increases by 0.0002. Therefore, in urban areas, to see a 0.1 unit increase in EVI, garden size would need to increase by roughly 500 m² (0.1/0.0002). The Pearson correlation coefficient is 0.40 and the adjusted r² highlights that average garden size accounts for 16% of the variation in EVI in urban areas. In rural areas, average garden size does not contribute to a measurable increase in EVI (β = 0.0000, 95% CI: 0.0000, 0.0000). The Pearson correlation coefficient is 0.34 and the adjusted r² highlights that average garden size can account for 11% of the variation in EVI in rural areas.
The very small β values in Tables 2 and 3 suggest no measurable, real-world relationship. When stratifying total publicly accessible green space by type, we observed no moderate or strong positive associations with EVI using Pearson correlation (Table 4).
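To make the scale of these slopes concrete, the change in garden size implied by a given change in EVI can be obtained by dividing the EVI increment by the slope. The sketch below takes the reported coefficients at face value; because they are rounded to four decimal places, the results are approximate. The function name is hypothetical.

```python
def area_needed_m2(delta_evi, slope_per_m2):
    """Increase in area (m2) implied by a linear slope for a given EVI change.

    Returns None when the slope is zero or negative, where no finite
    increase in area would produce the requested rise in EVI.
    """
    if slope_per_m2 <= 0:
        return None
    return delta_evi / slope_per_m2


wales = area_needed_m2(0.1, 0.0001)   # slope reported for the Wales-wide model
urban = area_needed_m2(0.1, 0.0002)   # slope reported for urban areas
rural = area_needed_m2(0.1, 0.0000)   # slope reported for rural areas
```

Under these slopes a 0.1 increase in EVI corresponds to an increase in average garden size that is larger than the average garden itself, which illustrates why such small coefficients have little practical meaning.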

DISCUSSION
In this paper, we calculated three exposure assessment measures for a national population located across rural and urban regions. We compared a satellite-derived greenness exposure measure (EVI) with two vector-based measures of access to public and private green space. Our results indicated that satellite-derived measures such as EVI offer the opportunity to measure exposure to greenness for populations across large spatial and temporal scales in an objective and uniform way. EVI quantifies vegetation greenness and is an indicator of biomass [63]; therefore, greater EVI values may indicate more vegetation by area and/or by volume (i.e. a greater EVI value does not necessarily equal a larger green space by area). Care should therefore be taken when interpreting defined incremental changes of EVI (e.g. 0.1 or interquartile range) within a 300 m buffer zone, as it is not possible to translate what incremental changes in EVI represent beyond changes in overall greenness. Our results are generalisable for temperate climates in the Northern Hemisphere.
Our work supports recent findings where a measure of access to green space was weakly correlated with NDVI [64]. Our results also support previous research findings that satellite-derived measures may be the most efficient way to measure population-wide, longitudinal exposures. Increases in both greenness and access to green space are positively associated with health outcomes around the globe [65][66][67][68]. Previous findings have reported that a defined increment of EVI (e.g. 0.1 or interquartile range) within a 300 m buffer zone is associated with improvements in health outcomes [69]. However, these studies do not indicate how these incremental changes can be translated for policy and practice in how to specifically modify the built environment to provide health-promoting environments [47]. Beyond promoting general greening policy, current evidence is not able to articulate which modifiable aspects of the built environment should be promoted or invested in by planners and policy makers. Therefore, the results of this study are an important contribution in interpreting epidemiological evidence on the relationship between EVI and health outcomes.
Our results highlight that EVI values do not readily map on to planning and policy defined green space types because these green space types are generally not characterised by a single vegetation type. This highlights the challenge of translating vegetation indices into actionable recommendations for planners and policy makers. Given current data availability for longitudinal research, satellite-derived EVI measures have limited ability to identify hyperlocal variations (< 300 m) in public green spaces where multiple facilities or features may be present within the vicinity (e.g., a park, roadside trees, or allotments). Our findings suggest that greenness and total green space are not linearly related and that both measures should be acknowledged as distinct exposures. It is challenging to produce policy that improves complex public health issues when there is heterogeneity in the methods used to define exposure to greenness and access to green space [7]. Distinguishing more clearly between greenness and access to green space will help researchers, policy makers and practitioners to better understand exposures and mechanisms that drive health outcomes. A further implication of this finding is that multi-exposure models should be implemented to better understand the cumulative impact of different aspects of nature on health outcomes (i.e. household greenness and local neighbourhood access to green spaces). A final implication for future research is the need to investigate more detailed features of green spaces, including man-made features (e.g. footpaths, kiosks, toilets).
Estimating green space exposure using satellite imagery was challenging because Wales does not experience many cloud-free days, and even fewer when considering the cycle of a satellite recording data. As such we adopted a flexible approach to estimating EVI (i.e., we used different Landsat sensors to enable EVI measures to be calculated throughout the study period). We acknowledge that finer-resolution satellite data may yield different results, but it was not possible to obtain cloud-free images for the wider study period [45,50] (2008-2019) with any other open-source satellite data. We found that either the data were not recorded for the entire study period, or it was not possible to create an annual image of Wales with the data available. However, satellite schemes such as Sentinel [70] offer the potential for higher resolution data and have recorded data more frequently since 2016. We also excluded land that was privately owned, such as farmland. Although this rural land type potentially provides valuable opportunities for exposure (e.g. via views), we chose to reflect the potential to access a parcel of land. We acknowledge that 'accessibility' is in fact a much more complex construct dependent on multiple characteristics of spaces, individuals, communities, and transport/pedestrian networks. Our classification is necessarily pragmatic and restricted to the data available at a national scale. However, it allows a nuanced understanding of green spaces which can inform the protection, improvement, management, planning and funding of green and blue spaces. A final limitation to note is that the EVI buffers were Euclidean distances, whereas the access buffers were calculated from 300 m network distance. Although the buffers are not a like-for-like comparison in shape, at this scale we are confident that this did not significantly impact our results. The buffers were appropriate representations of how individuals would engage with greenness (e.g., viewing green space in a straight line) and with publicly and privately accessible green spaces (e.g., walking along a footpath to a park).
Our study explores associations within the hyperlocal environment (300 m). Further work should be undertaken to explore whether the relationships reported remain for other vegetation indices and commonly defined activity spaces, e.g., 500 m, 800 m and 1600 m around the home environment. Future studies should also focus on qualities of green spaces and facilities within the green spaces to shed light on which modifiable aspects of green spaces should be focussed on by planners, to enable local planning authorities to consider design quality. More could be drawn from EVI as an indicator of biomass in future studies because areas of greater biomass tend to be associated with areas of greater biodiversity [71,72]. This may prove particularly useful in providing evidence to support health policy, as evidence suggests that biodiversity may support pathways linked with positive health outcomes [73].
Satellite-derived measures such as EVI offer the opportunity to calculate objective and uniformly measured estimates of exposure to green space. There are many advantages of satellite-derived green space exposures, and currently these are the only feasible option for studies investigating large spatial and temporal scales. However, differences between EVI values do not necessarily reflect greater or lesser access, or different types of publicly accessible green space, nor do they capture green space signatures in three dimensions. Our results suggest that when characterising the hyperlocal green space environment, exposure to greenness and access to green spaces are distinct features of the environments that we live, work and play in. When investigating the impact of exposure to green spaces on health outcomes, particularly when seeking to understand mechanisms that rely on using a green space, satellite-derived measures should be supplemented with alternative data sources such as administrative and crowd-sourced data to characterise green space boundaries and the facilities within them.

Fig. 2 Methodological steps to calculate household-level EVI. (a) represents the raw satellite data, which contain 30 m × 30 m grid squares. (b) represents the processed satellite data. Each grid square has been assigned an EVI value. (c) shows the EVI values overlain with a 300 m circular buffer around a household location.

Fig. 1 Map of Wales including urban and rural regions. The purple regions with white lines indicate urban areas in Wales. The dark green regions represent rural areas. Wales shares a border with England, which is represented in a light green colour.

Fig. 3 Mean EVI scores per household for Wales in 2018. Low EVI values are represented with purples and blues. Higher EVI values are represented with greens and yellows. The total range of EVI is 0-1.

Table 1. Descriptive statistics for Enhanced Vegetation Index and total green space area (m²) within 300 m of household locations in Wales and average garden area (m²).

Table 3. Regression coefficients for Enhanced Vegetation Index and average garden size (m²) for Wales and stratified by urban/rural status.

Table 2. Regression coefficients for Enhanced Vegetation Index and total green space area (m²) for Wales and stratified by urban/rural status.

Table 4. Pearson correlation coefficient for Enhanced Vegetation Index and green space types with 95% confidence intervals.