Air pollution exposure disparities in US public housing developments

Fine particulate matter 2.5 microns or less in diameter (PM2.5) is widely recognized to be a major public health concern. While ethnic/racial minority and lower socioeconomic status individuals in the US experience higher PM2.5 exposure, previous research on social disparities in PM2.5 exposure has not examined residents of federally-assisted public housing developments (PHDs). Here we present the first national-scale analysis of the relationship between outdoor PM2.5 exposure and PHD residency in the US, as well as exposure disparities within the population of households residing in PHDs. We integrated data on average annual PM2.5 concentrations (2011–2015) with US Department of Housing and Urban Development data on PHDs (2015), and socio-demographic information from the 2011–2015 American Community Survey. Results from multivariable generalized estimating equations indicated that PHD locations, units, and residents are significantly overrepresented in neighborhoods with greater PM2.5 exposure, after accounting for clustering, urbanization, and other socio-demographic factors. Additionally, significantly higher percentages of Black, Hispanic, disabled, and extremely low-income households reside in PHDs with greater PM2.5 exposure. Findings represent an important starting point for future research and emphasize the urgent need to identify gaps in environmental, public health, and housing policies that contribute to disproportionate air pollution exposures among PHD residents.

www.nature.com/scientificreports/ living in PHDs 24 , more systematic research is urgently needed to determine if PHDs and their residents are disproportionately exposed to higher air pollution levels and specifically, higher concentrations of PM 2.5 .
Our study responds to this need by presenting the first national-scale assessment of the relationship between outdoor exposure to fine particulate air pollution and PHD residency, as well as exposure disparities within the population of households residing in PHDs, in the US. This research has two related objectives. First, we seek to determine whether PHD locations, units, and residents, respectively, are significantly overrepresented in neighborhoods burdened by higher outdoor PM 2.5 concentrations, after controlling for spatial clustering and relevant socio-demographic factors. Following this neighborhood level analysis, we investigate ethnic/racial and socioeconomic inequalities within PHDs. This second objective thus focuses on determining whether outdoor PM 2.5 concentrations are significantly greater around locations of PHDs where higher percentages of socially disadvantaged households reside. We conduct analyses at the census tract-level in counties containing public housing to address our first research objective and at the PHD-level to address our second objective, with the continental US (lower 48 states and Washington DC) representing our study area.
For this study, we combined high-resolution data on the average annual concentrations of ambient PM 2.5 developed by the Center for Air, Climate and Energy Solutions 25 with US Department of Housing and Urban Development (HUD) data 26 for PHDs, and census tract-level socio-demographic information from the American Community Survey (ACS). Our statistical analyses are based on multivariable generalized estimating equations (GEEs) that account for spatial clustering of census tracts and locations of PHDs, respectively.

Results
The spatial distribution of PM 2.5 exposure and PHD locations in the continental US are depicted in Fig. 1. This map groups census tracts in counties with public housing into four quartiles based on the 2011-2015 annual average PM 2.5 concentrations. A large percentage of tracts in the highest quartile (top 25%) of PM 2.5 exposure are concentrated within metropolitan areas of Illinois, Pennsylvania, Texas, and California. Tracts in the lowest quartile (bottom 25%), in contrast, are dispersed across states of the US Midwest and within Florida.
Before conducting multivariable analysis, we examined the distribution of PHDs across the four quartiles of PM 2.5 exposure shown in Fig. 1. For this purpose, we first identified the PHDs located within census tracts in each of the four quartiles of PM 2.5 concentration. The number of housing units and residents in each of these four groups of PHDs were then summed to derive the PHD unit and population totals for each PM 2.5 concentration quartile. Figure 2 depicts the number of units and people residing in PHDs within each quartile (as a percentage of each total) and compares them to the corresponding percentages of all housing units and total population in each quartile of PM 2.5 exposure. About 25% of all housing units and 24% of the total population in the continental US reside within tracts in the lowest quartile (Q1) of PM 2.5 exposure. However, only about 13% of PHD   Table 1. Results from GEEs summarizing the association between tract-level PM 2.5 concentrations and PHD indicators (presence, units, and population) are presented in Table 2. After controlling for spatial clustering of tracts and socio-demographic factors known to be associated with PM 2.5 exposure in the US (see Methods for further details), the annual average PM 2.5 concentration indicates a statistically significant and positive relationship (p < 0.05) with the presence of PHDs (Model A), percentage of housing units that are in PHDs (Model B), and the percentage of population residing in PHDs (Model C). With regards to the socio-demographic, urbanization, and housing variables used as controls, all three GEE models indicate similar statistical associations. Specifically, PM 2.5 concentrations are significantly greater (p < 0.05) in tracts characterized by higher percentages of non-Hispanic Black residents, people with disabilities, female-headed households, and individuals below poverty, but lower in tracts with higher percentages of people aged 62 or more years and single-family homes. Percent below poverty level also indicates a positive association with PM 2.5 concentration  www.nature.com/scientificreports/ (p < 0.10), but this relationship was not significant (p > 0.05). Tracts in both metropolitan and micropolitan areas are exposed to significantly higher average PM 2.5 concentrations (p < 0.001) than those in rural areas. The names, definitions, and summary statistics for all variables used in our PHD-level multivariable GEE are listed in Table 3. Results from the GEE summarizing the association between annual average PM 2.5 concentrations and socio-demographic characteristics of households residing in public housing are presented in Table 4. After controlling for spatial clustering of PHDs based on their city of location, we found positive and significant relationships (p < 0.05) with the annual average PM 2.5 concentration for the percentages of PHD households headed by Hispanic and non-Hispanic Black residents, as well as the percentages of people with disabilities and extremely low-income households. The average PM 2.5 concentration is positively related to the percentages of PHD households headed by females and those aged 62 or more years (p < 0.10) , but not significantly (p > 0.05). The average PM 2.5 concentration is also significantly higher around PHDs located in metropolitan and micropolitan tracts, and in tracts with a higher percentage of single-family homes (p < 0.01).

Discussion
This study sought to extend research on social disparities in the distribution of air pollution by focusing on exposure to fine particulate air pollution for federally subsidized PHDs in the US. Our first research objective focused on determining whether PHD locations, units, and residents are significantly overrepresented in neighborhoods burdened by significantly greater exposure to outdoor PM 2.5 . Our analysis indicated that the presence of PHDs,  www.nature.com/scientificreports/ as well as the overall percentages of both PHD units and residents are significantly greater in neighborhoods with greater annual average PM 2.5 concentrations, after accounting for spatial clustering, urbanization level, and the presence of socially disadvantaged groups. This spatial concentration of PHD units and residents in more polluted census tracts cannot be explained solely by their disproportionate location in segregated neighborhoods containing significantly higher percentages of ethnic/racial minority and lower socioeconomic status residents, since these variables were used as controls in our statistical models. Our results for the non-Hispanic Black percentage align with previous studies that reported significant racial inequities in the distribution of fine particulate pollution in the US [17][18][19][20][21][27][28][29] . For disability status, our results are consistent with studies that found a significantly higher percentage of people with disabilities to live near pollution sources 30,31 and children with disabilities to reside in US school districts with greater air toxics exposure 32 . Although individuals aged 62 or more years were significantly underrepresented in neighborhoods with greater PM 2.5 exposure, these results are similar to previous studies that found a higher proportion of elderly residents in areas with lower air pollution exposure 33,34 .
The second research objective focused on investigating whether PM 2.5 concentration levels around PHDs were greater for those characterized by higher percentages of socially disadvantaged households. Multivariate GEEs revealed that PHDs with greater PM 2.5 exposure contained significantly higher percentages of non-Hispanic Black, Hispanic, disabled, and extremely low-income households, after controlling for spatial clustering, urbanization level, and relevant characteristics of their host neighborhoods. The disproportionately large concentration of minority, disabled, and economically disadvantaged residents in PHDs most exposed to higher levels of outdoor PM 2.5 pollution point to environmental and social injustices that go beyond our previous research finding regarding the significantly higher presence of PHD units and residents in more polluted neighborhoods.
Public housing in the US has been associated with poorer health outcomes, and the negative health experiences of PHD residents compared to non-PHD residents are well-documented [35][36][37] . Residents of federally assisted rental housing experience higher rates of asthma and other chronic diseases, and have lower access to health-promoting resources such as safe exercise spaces and healthy food outlets [38][39][40][41] . Evidence also indicates that health risks, such as pests, mold, combustion-related toxins, and inadequate ventilation, are considerably greater in subsidized rental housing developments compared to other rental housing 37,42 . These findings suggest that PHD residency in areas exposed to higher levels of outdoor air pollution could plausibly contribute to or worsen health outcomes for this marginalized group. In this context, the results of our study suggest that many PHD residents in the US are at a 'multiple jeopardy' 43,44 based on the convergence of their minority, disability, and poverty status, that is potentially amplified by the adverse effects of higher PM 2.5 exposure at both their neighborhood and housing development locations.
While this study contributes important findings on exposure to air pollution for PHD residents, it is important to consider three limitations. First, our analyses are cross-sectional and cannot be used to explain the processes that led to the observed social disparities. Since longitudinal data was not used, the results cannot be used to infer the sequence of events that caused increased PM 2.5 exposure for neighborhoods containing a significantly higher proportion of PHDs or PHDs with a higher proportion of socially disadvantaged households. More research is necessary to analyze policy decisions, as well as changing patterns of residential segregation, industrial and economic development, racial/ethnic migration, and other socioeconomic processes that are potentially responsible for the concentration of PHDs in areas exposed to greater PM 2.5 . However, our findings represent an important starting point for more detailed longitudinal analysis of the relationship between air pollution exposure and PHD residency.
Second, our study focuses specifically on federally subsidized rental housing developments associated with HUD's public housing program, which included more residents (2.24 million) than any other type of HUDassisted multifamily housing in 2015. Future research should consider extending this work to analyze PM 2.5 exposure for properties affiliated with other HUD programs such as Section 202 (supportive housing for the www.nature.com/scientificreports/ elderly), Section 811 (supportive housing for persons with disabilities), and Section 8 project-based rental assistance, as well as other housing subsidy programs such as Community Development Block Grants, HOPE, and Indian Housing. We also did not examine subsidized rental housing affiliated with Section 8 Housing Choice Vouchers (HCVs), a program used by tenants to subsidize their rents in the private market housing of their choice. Although locations of households using Section 8 HCVs are not available in the HUD database utilized for this study, it is important to consider that more than 50% of residents receiving rental assistance from HUD used vouchers in the private housing market in 2015. Future research should explore additional or local data sources that can be used to evaluate air pollution exposure disparities for residents of tenant-based programs such as Section 8 HCVs. Third, our study analyzed disparities associated with outdoor exposure to air pollution, but did not examine indoor air quality problems faced by public housing residents. A major impediment here is data availability since indoor air pollution estimates for PHDs are not currently available in the US. However, both outdoor and indoor pollution sources have been found to worsen indoor air quality in multifamily rental housing 45,46 . Residents of PHDs could be additionally burdened because households of lower socioeconomic status tend to spend more time at home due to lower employment levels 46 . Additionally, higher housing density in PHDs could lead to increased pollution exposure from neighboring housing units 46,47 . Recent research indicates while both housing characteristics (e.g., building quality, volume, and ventilation) and occupant behavior (e.g., smoking and cooking practices) contribute to indoor air pollution exposure, these risks vary across socioeconomic groups, with lower income households (e.g., those in PHDs) exposed to higher levels of indoor air pollution 35,46,47 .
In conclusion, our study reveals that public housing developments, units, and residents in the continental US are disproportionately located in neighborhoods with significantly higher outdoor PM 2.5 concentrations. Within the population residing in PHDs, we also found that Black, Hispanic, disabled, and extremely low-income households are more likely to reside in developments around which PM 2.5 exposure is significantly higher. Although this study did not examine PM 2.5 pollution sources, our results could be potentially linked to previous assertions that a confluence of historical policies and practices have encouraged the construction of public housing in areas of environmental contamination and also allowed pollution sources such as hazardous industries to locate near these housing developments 23,24 . The findings of our study emphasize the growing need to analyze and reverse these patterns, as well as identify and address gaps in environmental, public health, and housing policies that place a significantly higher risk burden on public housing inhabitants.

Materials and methods
Fine particulate air pollution data. For assessing exposure to fine particulate air pollution, we use estimates of outdoor PM 2.5 concentrations developed by the Center for Air, Climate and Energy Solutions (CACES), using national-scale empirical models as described in Kim et al. 25 . These models are based on publicly available concentration measurements from EPA regulatory monitors, and combine land use information and satellitederived estimates of air pollution to predict concentrations for criteria air pollutants at multiple spatial scales, including census tracts and block groups. Census tracts are relatively permanent statistical subdivisions of a US county that generally encompass between 1500 and 8000 people, with an optimum size of 4000 people 48 . Tracts are designed to be fairly homogeneous with respect to the demographic and socioeconomic characteristics of the local population when they are first established. A census block group is a subdivision of a census tract that generally contains between 600 and 3000 people, with an optimum size of 1500 people 48 .
For this study, both census tract and block group level values of predicted PM 2.5 concentrations were downloaded from the CACES Land Use Regression database for 2011 to 2015 (latest years available). While this database also provides ambient concentrations of other air pollutants (carbon monoxide, ozone, nitrogen dioxide, sulfur dioxide, and PM 10 ), we focused specifically on PM 2.5 because of its well-documented adverse health outcomes 1-10 and higher exposure levels for socially disadvantaged groups in the US [13][14][15][16][17][18][19][20][21] .
For each census tract and block group in the continental US, the five-year average annual PM 2.5 concentration was calculated as the arithmetic mean of five annual concentration values, from 2011 to 2015. This information was used to develop separate dependent variables to address each of our two research objectives. Tract-level mean values of PM 2.5 concentration (2011-2015) were used for the tract-level analysis. For the PHD-level analysis, the average annual PM 2.5 concentration (2011-2015) around each housing development was estimated using values from the block group in which their geographic coordinates were located. Specifically, we utilized a spatial join function in ArcGIS Desktop 10.8.1 software that allocates selected attributes from a target polygon feature (i.e., PM 2.5 concentration from a block group) to all point features located within the boundary of the target polygon (i.e., PHDs located inside or on the block group boundary).
Public housing data. Since the US Housing Act of 1937, the US federal government has provided housing assistance to low-income renters under programs administered by the US Department of Housing and Urban Development (HUD). These HUD programs provide subsidies that reduce rents for lower income tenants who meet specific program eligibility requirements. Generally, households pay rent equaling 30% of their incomes, while the federal government pays the remainder of the rental costs. Public housing, the focus of this study, includes housing developments owned and operated by local Public Housing Authorities for which HUD provides capital and operating assistance 49 . In 2015, the total number of people residing in PHDs was about 2.24 million-higher than those living in any other type of HUD-assisted multifamily project-based rental housing.
Information on federally subsidized public housing for this study was derived from the online HUD Picture of Subsidized Households 26  It should be noted that PHDs in the HUD database used for this study are depicted as a distinct address that is chosen to represent the general location of an entire PHD. When a PHD comprises multiple buildings, the building with the largest number of units is selected to represent the location of the development.
In the Picture of Subsidized Households database, HUD suppresses data for count variables with values smaller than 11 (coded as − 4) to preserve confidentiality and data for specific variables are not reported (coded as − 5) when reporting rates for the PHD are smaller than 50%. To address suppressed data on the number of units and resident population in 311 of the 7,000 PHDs in the continental US, we substituted a value of 1 to replace values of − 4 and − 5. This is likely to undercount the actual number of units and residents in these 311 PHDs, but provides a conservative non-zero estimate for our census tract-level analysis 50 . Variables. The first phase of our study sought to examine whether PHD locations, units, and residents, respectively, are overrepresented in census tracts with greater PM 2.5 exposure, after controlling for spatial clustering and relevant socio-demographic factors. This necessitated the construction of three different independent variables to assess the public housing status of each tract. The first variable indicated the presence or absence of PHDs in the tract, a dichotomous measure coded as either 1 or 0. The second and third variables are continuous and estimate the number of PHD units and residents, respectively, expressed as a percentage of the total number of all housing units and population in each tract from the 2015 American Community Survey (ACS) five-year estimates. We excluded tracts with low population (< 500) and housing (< 200) counts to ensure stable proportions for our variables, as well as tracts located in counties without PHDs. This led to the removal of 11,146 (15.4%) of 72,263 tracts in the continental US. The first phase of our analysis thus encompasses a total of 61,117 tracts with at least 500 people and 200 housing units, from the 1,892 counties that contained PHDs in 2015.
To control for tract-level socio-demographic characteristics, we used variables from the 2015 ACS fiveyear estimates that have been found to be significantly related to PM 2.5 exposure in previous national-scale US studies 14,15,[17][18][19]21,27 and are comparable to variables available in the HUD dataset on public housing utilized for PHD-level analysis. To examine the effects of ethnicity and race, we included the percentages of individuals who identified themselves as Hispanic or Latino (of any race) and non-Hispanic Black alone-the two largest ethnic/ racial minority groups in the US. We combined the percentages of non-Hispanic American Indian/Alaska Native alone, non-Hispanic Asian alone, non-Hispanic Pacific Islander/Native Hawaiian alone, non-Hispanic another race, and multiple races into a single category that represents non-Hispanic other non-White (minority) individuals. We excluded the percentage of non-Hispanic White residents from our multivariable models to ensure that results for the minority ethnic/racial groups are interpretable relative to this category. The percentage of people with an annual income below the poverty level was used to represent socioeconomic status. Additional measures of social disadvantage comprised the percentage of people with a disability, those aged 62 or more years, and female-headed households. The disability variable in the ACS encompasses civilian non-institutionalized individuals reporting at least one of these difficulties: hearing, vision, cognitive, ambulatory, self-care, and independent living. In addition to these socio-demographic characteristics, three control variables were included in our multivariable models. To account for urban-rural differences, we used Rural-Urban Commuting Area (RUCA) codes developed by the US Department of Agriculture that classify census tracts using measures of population density, urbanization, and daily commuting 51 . We created two dummy variables to identify tracts located in metropolitan (RUCA codes 1-3) and micropolitan areas (RUCA codes 4-6), respectively. We excluded RUCA codes 7-10 (small towns and rural areas) from our models, making them the interpretive reference for variables representing the metropolitan and micropolitan categories. Since PHDs are classified as multifamily housing, we included the percentage of housing units in the tract that are single-family detached homes to adjust for the prevalence of non-PHD multifamily housing.
The second phase of our study focused on determining if the average PM 2.5 concentration (2011-2015) in census block groups hosting PHDs was greater for PHDs containing higher percentages of socially disadvantaged households. This phase encompasses a total of 6,685 PHDs with at least 11 residents for which detailed sociodemographic data on resident household characteristics were available (no missing data for included variables) in the HUD Picture of Subsidized Households database for December 31, 2015 13 . For ethnic/racial minority status, we included the percentages of households where the head of the household was Hispanic and non-Hispanic Black. Information on the percentage of households belonging to other non-Hispanic minority categories (e.g., non-Hispanic Asian, Native American, and Other Race) were combined into a single variable. Our analysis included three additional indicators of social disadvantage that align closely with the tract-level independent variables used in Phase 1: the percentage of female-headed households, those in which the eldest of the household head or spouse was aged 62 or older, and residents with a disability. To measure socioeconomic status, we used the percentage of extremely low-income households (HUD definition), as described in Table 3. We also used the same set of control variables utilized for our tract-level analysis in the previous phase: metropolitan or micropolitan status of the tract hosting the PHD based on RUCA codes, and percentage of single-family housing units to adjust for the prevalence of non-PHD multifamily housing.
Statistical models. For the first phase of analysis, three multivariable models were constructed to independently examine the association of the presence of PHDs, percentage of PHD units, and percentage of population residing in PHD units, respectively, with average tract-level PM 2.5 concentrations, after controlling for relevant socio-demographic variables and spatial clustering. In the second phase, a single multivariable model was constructed to examine the relationship between average PM 2.5 concentrations in the census block group hosting the PHD and selected socio-demographic characteristics of PHDs, after controlling for a set of variables similar www.nature.com/scientificreports/ to those used in Phase 1. Our multivariable models for both phases are based on generalized estimating equations (GEEs) with robust (i.e., Huber/White) covariance estimates, which extend the generalized linear model to accommodate clustered data 52 . GEEs relax several assumptions of traditional regression models and impose no strict distributional assumptions for the variables analyzed (e.g., normality), while accounting for clustering of analytic units. Recent studies indicate that GEEs based on theoretically informed clusters are more advantageous for analyzing social disparities in the distribution of environmental hazards compared to other techniques such as spatial autoregressive models, mixed models with random effects, and multilevel modeling approaches 17,50,53 .
To estimate a GEE model, clusters of observations are defined based on the assumption that observations from within a cluster are correlated, whereas observations from different clusters are independent 54 . The cluster definition for the tract-level analysis was based on the US county within which the tract is located by median decade of housing construction, which ranged from "2010 or later" to "before 1939". This definition corresponds to historical-geographical formation of environmental inequalities 55 and is consistent with previous national-scale US studies on social disparities in air pollution 17,56 . This combination of county (1,892) and median decade of housing stock (10) for each tract yielded a total of 7,492 clusters for the tract-level GEEs. Clusters were defined for the PHD-level analysis using the city in which the building was located, since federally subsidized rental housing within a particular city can be expected have to have certain similarities compared to those located elsewhere 46 . This resulted in a total of 3337 clusters of PHDs in the PHD-level GEE model. An intra-cluster dependency correlation matrix also needs to be specified for estimating a GEE 54 . After experimentation with several different specifications, the 'independent' correlation matrix was chosen for the three GEEs in Phase 1 and the 'exchangeable' correlation matrix was selected for the GEE in Phase 2. We selected these specifications because they yielded the best statistical fit based on the QIC (quasi-likelihood under the independence model criterion). To further improve model fit, several different distributions (e.g., normal, gamma, and inverse Gaussian) appropriate for non-zero continuous dependent variables were tested with identity and logarithmic link functions. An identity link function models relationships between the dependent and independent variables linearly, while a log link function represents natural logarithmic relationships. The gamma distribution with identity link function was chosen for all GEEs because this function indicated the best fit based on the QIC.
All independent variables were standardized, and standardized coefficients are provided in tables summarizing the GEE results. Two-tailed p values from the Wald Chi-Square test were used to test the statistical significance of each variable coefficient. Potential multicollinearity among these variables were also examined using the multicollinearity condition index for the combination of all independent variables included in each GEE. This multicollinearity condition index was smaller than 10.0 in all multivariable models, confirming that the GEEs are not affected by multicollinearity. We also tested the residuals from our four GEE models for spatial autocorrelation using a contiguity-based (queen) spatial weights matrix. Non-significant (p > 0.05) values of the Moran's I statistic indicated that spatial autocorrelation did not affect our statistical model results.

Data availability
Publicly available data from the CACES, HUD, and ACS were used for this study, as described in the Materials and methods section of this paper. Additional data related to this paper may be requested from the authors.