Exposure indices for the National Children’s Study: application to inhalation exposures in Queens County, NY


Characterization of environmental exposures of population subgroups within the National Children’s Study (NCS), or other large-scale human environmental health studies, is essential for developing a high-quality data platform for subsequent investigations. A computational formulation utilizing the tiered exposure ranking framework is presented for calculating inhalation exposure indices (EIs) for population subgroups. This formulation employs a probabilistic approach and combines information from diverse, publicly available exposure-relevant databases with information on biological mechanisms, for ranking study locations or population subgroups with respect to potential for specific end point-related environmental exposures. These EIs capture and summarize, within a set of numerical values/ranges, complex distributions of potential exposures to multiple airborne contaminants. These estimates capture spatial and demographic variability within each study segment, and allow for the relative comparison of study locations based on different statistical metrics of exposures. The EI formulation was applied to characterize and rank segments within Queens County, NY, which is one of the Vanguard centers for the NCS. Inhalation EI estimates relevant to respiratory outcomes, and potentially to pregnancy outcomes (low birth weight and preterm birth rates), were calculated at the study segment level. Results indicate substantial variability both across and within the study segments in Queens County, NY, and show an exposure gradient across the study segments that can help guide and target environmental and personal exposure sampling efforts in this county. The results also serve as an example application of the EI for use in other exposure and outcome studies.


The National Children’s Study (NCS) is a large-scale, longitudinal birth cohort study focusing on the effects of environmental exposures and genetics on growth, development, and health. One of the goals of the NCS is the development of a high-quality data platform encompassing a large number of environmental, biological, and health outcome-related factors for use in assessing exposure–outcome relationships relevant to children’s health. The wide range of extant data on environmental, behavioral, socio-economic, and biological factors collected by federal, state, and local agencies may provide information that can support the development of such a high-quality data platform. In some cases, extant data may also offer feasible, cost-effective alternatives to study-specific data-collection efforts within the NCS or a similarly designed study.

Exposure characterization within the NCS can be enhanced by incorporating diverse types of extant data, but is complicated because of various interacting exposure-modifying factors that are highly variable in both space and time. These factors include physiological, biochemical, environmental, and behavioral factors, such as genetic background, gender, developmental period of exposure, diet, stress, and so on, that may enhance or mitigate the effects of exposures to contaminants, over time. Additionally, characterization of exposures requires the systematic retrieval, analysis, and integration of diverse types of information from a wide variety of sources. The Tiered Exposure Ranking framework (TiER)1 provides a flexible, “multi-Tier”, framework designed to address these issues both within the context of the NCS efforts and overall exposure rankings in general.

The TiER framework allows for either “discovery-driven” or “hypothesis-driven” analysis of extant data in a tiered manner.1 The “Tier 1”, discovery-driven approaches focus on multivariate exploratory analyses, whose outcomes can be combined with available mechanistic knowledge relevant to the particular phenomena that are studied, to target data-collection efforts and to formulate hypotheses regarding these phenomena.1 “Tier 2” characterization involves using extant data in conjunction with exposure modeling using systems such as PRoTEGE (Prioritization/Ranking of Toxic Exposures with GIS Extension)2 and MENTOR (Modeling Environment for Total Risk).3, 4, 5, 6, 7 “Tier 2” characterization allows for simple numerical or distribution-based characterizations and rankings of ambient exposures associated with particular locations (at the county, segment, or even finer level), and for summarizing them via exposure indices (EIs).

The EIs provide simplified metrics of potential cumulative and aggregate exposures (and intakes) and facilitate ranking of different locations in the context of exposures of concern related to specific end point(s) of interest. It must be noted that the potential exposures are different from “actual exposures”, which are driven by individual- or household-specific behavioral, activity, biological, and other time-dependent exposure factors. Therefore, actual exposures can be quantified only after individual specific information is obtained by personal or microenvironmental measurements. In the absence of subject-specific information, analysis of extant data provides a starting point for comparatively evaluating different locations with respect to potential exposures, in the form of location-based EIs, which are represented in terms of probability distributions that reflect uncertainties associated with lack of subject-specific information. The main utility of these location-based EIs is in: (a) identifying areas that show substantial differences in potential environmental exposures, which can provide information useful in targeting environmental sampling studies, and (b) aggregating groups of similar segments or locations (based on EI values) across the NCS for cross county evaluation of potential exposure–response relationships.

This study presents the formulation of inhalation EIs relevant to respiratory outcomes, and potentially to pregnancy outcomes (low birth weight and preterm birth rates). As an example, these are developed at the NCS study segment level for Queens County, NY. The inhalation-specific EI formulation incorporates information on ambient concentration fields, reference concentrations for respiratory effects, and demographics. The following sections describe the development of the EI formulation and the application to Queens County, NY. The application demonstrates how extant exposure-relevant data can be incorporated into the NCS to understand differences and similarities in potential exposures across multiple locations; this information can be used to focus field microenvironmental and personal monitoring, which will reduce exposure misclassification.


General Formulation of the EIs

The general formulation of EIs for a specific end point of interest, j, can be represented in terms of the exposure index vector (Ej) for a set of specific study areas. An element Eij of this vector represents the estimates for a specific study area, i, which can be tailored to a specific exposure characterization study. For example, the study area can be a county (which is applicable for ranking exposure-relevant factors at different counties), zip code(s), census tract(s), census block(s), and so on. In some cases, one study area can encompass a group of disconnected subareas that share similar attributes (e.g., rural counties versus urban counties). Within the current analysis, the EI calculations focus on a typical NCS county, NCS study segments, or a subarea within a segment.

The EI vector is dependent on multiple environmental, demographic, and behavioral attributes. An inhalation-specific EI formulation can be represented in a simplified manner in terms of these factors via Eq. (1).

The terms of Eq. (1) represent ambient concentration fields, microenvironmental fields, exposure-activity pattern fields, and biological factors; the variable s represents the spatial coordinates, and t denotes time. The generalized “weighting function” gjk is typically contaminant specific and can also be population specific.

Ambient concentration fields are typically calculated via a combination of the following methods: (a) spatiotemporal interpolation of ambient air quality monitoring data, (b) prognostic air quality modeling, and (c) Bayesian fusion of model outputs and monitor data. Subsequently, these terms can be corrected for local neighborhood scale effects.6 Although the above approach is appropriate for air pollutants, cases of multiroute exposures to multimedia pollutants (e.g., pesticides and metals) require additional characterization: the concentration fields in these cases must be developed for multiple media and pathways (food, drinking water, surface dust, and ambient air) using predictive models, as well as estimates from statistical/probabilistic modeling and analyses of available data, including those from local-scale field measurements and surveys.

Microenvironmental fields are estimated via integration of study-specific data and supporting databases on population demographics, housing types and ages, commuting patterns, and so on, and by utilizing ambient concentration fields, as required. Typically these fields are estimated using microenvironmental modifiers applied to ambient air concentrations, and contributions of indoor sources and sinks.

Exposure-activity pattern fields are typically developed from study-specific data and/or aggregation of representative time-activity diary information, databases of travel and commuting profiles, and data on usage of different consumer products.

Biological factors are estimated from study-specific or representative population distributions of age, gender, ethnicity, and databases of physiological and biochemical properties.8, 9

Estimation of EIs for Inhalation Exposures at an NCS Study Area Level

The overall approach for developing inhalation EIs for characterizing/ranking exposures in the NCS Queens County involved: (a) retrieval and aggregation of inhalation exposure relevant data from extant national, regional, and local sources; (b) application of the EI formulations, employing screening level modules from the PRoTEGE system (Figure 1; Georgopoulos et al.2), using as inputs distributions of environmental and demographic parameters. A review of estimated EI values helps in assessing whether significantly sharp or unrealistic variations in estimated potential exposures across segments occur due to real differences in environmental quality or due to confounding factors such as sharp differences in population densities across segments. These calculations are expected to subsequently utilize site-specific measurement data for participants from various study centers along with detailed exposure modules.

Figure 1

The exposure index formulation takes advantage of the PRoTEGE framework2; PRoTEGE utilizes simplified components of the screening level PRoTEGE system and the more comprehensive MENTOR (Modeling Environment for Total Risk) system. It provides a screening modeling platform employing extant data and screening modules to examine human exposures associated with environmental toxics.

Estimation of inhalation EIs relevant to birth outcomes

The Queens case study presented here focuses on the development of EIs relevant to pregnancy outcomes, such as low birth weight and preterm birth.10, 11 The NCS study segments within Queens County, NY, are shown in Figure 2, denoted by arbitrary segment designation numbers and masked boundaries, as per Lioy et al.12 Examples of corresponding national and regional datasets that were utilized in the development of EIs for characterizing exposures at segment levels are shown in Figure 3. The selection of specific environmental contaminants was guided by the published literature showing relationships between exposures to these contaminants and adverse birth outcomes (Figure 3).10, 11, 13, 14, 15 One of the major biological pathways associated with preterm birth that can be influenced by environmental exposures is inflammation, as shown schematically in Figure 4, adapted from Behrman and Butler.16 Inhalation exposures relevant to this pathway are selected for study in the Queens NCS Center case study presented here, because adequate data are available for characterizing these exposures, and the exposure characterization process remains tractable. Including other pathways, such as endothelial cell dysfunction and infection, would have increased the complexity of the problem and left it underspecified, as these pathways are driven primarily by subject-specific characteristics. General conclusions would then have been very difficult to draw because of the need for multiple variables associated with birth outcomes. Therefore, the analysis presented here is exploratory, as it focuses only on inhalation exposures that could cause inflammation via respiratory effects. Rankings based on these EI estimates should be interpreted in relation to environmental chemical exposures causing inflammation, with an understanding that this pathway is only one of many known pathways for causing adverse birth outcomes.

Figure 2

Map showing the NCS (National Children’s Study) Queens Study area with approximate locations of the Queens Segments; arbitrary numbers replace actual segment numbers, and the size of the segment is also arbitrary as per Lioy et al.19

Figure 3

Examples of representative datasets at different levels used in the EXIS for the estimation of exposure indices. Top left panel shows the population distributions of women of child bearing age (Census data), top right panel shows the annual average ambient concentration of toluene in Queens, bottom left panel shows traffic density, and bottom right panel shows the pesticide usage data at the state level.

Figure 4

Common biological pathways associated with preterm birth (chart adapted from Behrman and Butler).21 This study focuses on inflammation due to ambient exposures to environmental chemicals.

The formulation for the inhalation EIs builds on the information from a large set of existing environmental indices that have typically focused on a limited set of contaminants. Although the concept of an “index” for representing ambient levels of complex mixtures of environmental contaminants has been widely employed, the indices published in the literature have been primarily limited to general characterization of environmental quality in relation to regulatory limits and fixed reference concentrations in relation to health effects. For example, the selection of contaminants in typical air quality or general environmental index formulations is not based on specific end points of interest; they often include only the major contaminants that are monitored, and they do not incorporate variability in demographic factors, inter-individual variability, and so on. Supplementary Table S.1 presents a summary of major environmental indices used predominantly in environmental quality analysis. A key element for multi-chemical EIs is the incorporation of reference concentrations for developing the relative weights of the pollutants. These are developed by considering available reference concentrations or doses associated with the specific health outcome (or the general biological mode of action) considered. These weights can be updated based on new scientific findings and can also be varied as part of sensitivity or uncertainty analysis of the indices with respect to the reference concentrations. The approach pursued in this EI application is unique because it brings together for the first time, a large number of contaminants that are considered important for inflammation processes, and presents a formulation that can be generalized to other exposure routes and health end points. 
Other aspects of the EI formulation include consideration of specific subpopulations (women of childbearing age using census block level demographic data), variation in physiological characteristics (via probabilistic modeling based on exposure factor distributions from representative age- and gender-specific distributions), and description of fine-scale variations for locations within a county (using data from monitors and model estimates of concentrations).

The inhalation EIs were estimated using information on concentrations of multiple air pollutants (criteria air pollutants and air toxics) that are relevant to respiratory effects. Equation 2 presents a simplified representation of the exposure index calculation:

where Einhalation,i,p, inhalation exposure index of segment i in relation to end point p; i, geographic region of concern (e.g., segment or county); p, end point of concern (e.g., respiratory effects); C̄i,j, average concentration of the air toxic j in area i; Ci,k(t), concentration of criteria pollutant k in area i at time t; τ1 and τ2, start and end of averaging period for concentrations of criteria pollutant k in area i; ωtoxics,p, relative weight of air toxics in relation to end point p; ωcrit,p, relative weight of criteria pollutants considered in relation to end point p; ωpop,i,p, target population weight of segment i in relation to end point p (e.g., fraction of women of childbearing age); Ntoxics,p, number of air toxics considered in relation to end point p; Ncrit,p, number of criteria pollutants considered in relation to end point p; RfCj,p, reference concentration for air toxic j in relation to end point p; RpCk,p(τ1, τ2), a reference concentration for criteria pollutant k in relation to end point p for the averaging period between τ1 and τ2.
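The display equation itself is missing from this copy of the text. One form consistent with the symbol definitions above is sketched here. It is a reconstruction under stated assumptions (concentration-to-reference ratios are averaged within each pollutant group, the two group averages are combined linearly with the group weights, and the result is scaled by the population weight), and it may differ from the published expression. The symbol \bar{C}_{i,j} is used here for the average concentration of air toxic j in area i.

```latex
E_{\mathrm{inhalation},i,p}
  = \omega_{\mathrm{pop},i,p}
    \left[
      \frac{\omega_{\mathrm{toxics},p}}{N_{\mathrm{toxics},p}}
      \sum_{j=1}^{N_{\mathrm{toxics},p}}
        \frac{\bar{C}_{i,j}}{\mathrm{RfC}_{j,p}}
      +
      \frac{\omega_{\mathrm{crit},p}}{N_{\mathrm{crit},p}}
      \sum_{k=1}^{N_{\mathrm{crit},p}}
        \frac{\dfrac{1}{\tau_2-\tau_1}\displaystyle\int_{\tau_1}^{\tau_2} C_{i,k}(t)\,dt}
             {\mathrm{RpC}_{k,p}(\tau_1,\tau_2)}
    \right]
```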

Four criteria pollutants (O3, PM2.5, NO2, and SO2) and 42 air toxics were considered in this formulation. The reference concentrations for respiratory effects for each of the pollutants were based on the information used in EPA’s NATA assessments and ambient air quality standards; these are listed in Supplementary Table S.2. The group of criteria pollutants and air toxics were equally weighted in this study. The formulation of equation (2) is a simplification of equation (1) and omits individual-specific activity information. The rationale is that when study- and subject-specific information on activity patterns, residence characteristics, and other behavioral/biological attributes is not available, locations can still be ranked based on potential exposures and exposure-related metrics. With respect to time, the averaging time used here is an annual average; this can be specified based on the outcome of interest and on the nature of dominant exposure of interest (e.g., acute versus chronic).
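As a minimal sketch of how a calculation of the kind described for equation (2) might look in code, the following Python function computes a screening-level index for one area. It assumes that concentration-to-reference ratios are averaged within each pollutant group and that the two groups are then combined with equal weights, as stated above. The function name, argument names, and all numbers are hypothetical and chosen only for illustration; the concentrations are taken to be already averaged over the chosen period.

```python
from statistics import fmean


def inhalation_ei(toxics_conc, toxics_rfc, crit_conc, crit_rpc,
                  w_toxics=0.5, w_crit=0.5, w_pop=1.0):
    """Screening-level inhalation index for one area (illustrative helper).

    toxics_conc / toxics_rfc: period-average air toxic concentrations and
    their reference concentrations, in matching units and order.
    crit_conc / crit_rpc: the same for criteria pollutants.
    w_toxics, w_crit: group weights (equal by default, an assumption).
    w_pop: target population weight for the area.
    """
    toxics_ratios = [c / r for c, r in zip(toxics_conc, toxics_rfc)]
    crit_ratios = [c / r for c, r in zip(crit_conc, crit_rpc)]
    return w_pop * (w_toxics * fmean(toxics_ratios)
                    + w_crit * fmean(crit_ratios))


# Made-up inputs: two air toxics and two criteria pollutants.
ei = inhalation_ei([1.0, 4.0], [2.0, 8.0], [10.0, 30.0], [10.0, 20.0])
```

Dividing each concentration by its own reference concentration before averaging puts pollutants with very different toxicities on a common, dimensionless scale, which is what allows them to be combined in one index.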

In general, each study area or segment of interest will have different subareas with varying pollutant levels. For example, substantial variability can exist in air pollutant levels within a county. Further, in the case of locations within a county, such variability occurs when a segment includes census blocks from multiple census tracts, which potentially have variations in environmental sources and demographics. In such situations, the term Einhalation,i,p will be different for each individual population subunit. Additionally, when a characterization of inter-individual variability is desired, the corresponding EIs are estimated by generating a large number of statistical samples representing the populations within each subunit. The statistical samples can then be aggregated to provide different statistical measures for the EIs. The formulation shown in equation (2) is general in nature, and the approach is applicable to a wider range of end points such as asthma, neurodevelopmental effects, and so on.

For exposure-based ranking of different segments, the EI application here used annual average concentration estimates for the chemicals considered in this study, except for ozone, for which a summer season average was used. Although this approach can be used for ranking different segments, refined treatment of time periods will be required when developing estimates for specific study subjects. For example, when information on the pregnancy time period is available for different subjects, the calculations of the rankings of exposures for different subjects will include concentration terms corresponding to the time windows relevant to each individual and will use available corresponding time-specific concentration information.
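The subject-specific refinement described above amounts to averaging a concentration time series over each individual's own window rather than over a calendar year. A minimal sketch follows, assuming a daily concentration series keyed by date; the function name, the dates, and the concentration values are hypothetical.

```python
from datetime import date, timedelta


def window_average(daily_conc, start, end):
    """Average concentration over the inclusive window [start, end].

    daily_conc: dict mapping datetime.date to a concentration value.
    """
    values = [c for day, c in daily_conc.items() if start <= day <= end]
    if not values:
        raise ValueError("no concentration data inside the requested window")
    return sum(values) / len(values)


# Made-up daily series: 10 units for ten days, then 20 units for ten days.
first_day = date(2009, 1, 1)
series = {first_day + timedelta(days=n): (10.0 if n < 10 else 20.0)
          for n in range(20)}

# A subject whose window of interest straddles the change in level.
subject_avg = window_average(series, date(2009, 1, 6), date(2009, 1, 15))
```

Two subjects in the same segment can therefore receive different concentration terms if their windows of interest differ.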

The computational implementation of the exposure index formulation employed here utilizes Monte Carlo methods to adequately incorporate variabilities by the following steps:

  1. Generate a set of samples (“virtual individuals”) for each population group within each subarea of a study area (e.g., 100 samples for each person in the target population subgroup). By aggregating these samples, spatial variations in population distributions are taken into consideration.

  2. For each sampled individual, assign an inhalation rate based on representative distributions of inhalation rates for the specific age–gender combination,9 and estimate the potential inhalation and aggregate intakes. For the specific application here, the population group involved only women of childbearing age.

  3. Normalize these estimates through a reference inhalation rate (e.g., average inhalation rate for the population subgroup) and reference concentration.

These provide distributions of exposure index estimates at the resolution of the study area (segments or counties) and allow for relative comparison of segments with each other. For the Queens segments, the EI specifically focused on the inhalation exposure route and air pollutants. The underlying concentration data were derived from national monitoring data,8 local field campaigns,17 and/or modeled estimates from USEPA.18 All information was retrieved through the EXIS. Specifically, for the calculations presented here, the data include latest available NATA data (year 2005), along with monitoring data for year 2009. These were fused with demographic data from Census 2000 and field data from NYC DOHMH.17
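The three Monte Carlo steps listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MATLAB implementation used in the study: the inhalation-rate distribution is taken to be lognormal with made-up parameters, each subarea is described by a single concentration and reference concentration, and every name in the block is hypothetical.

```python
import math
import random

# Hypothetical lognormal inhalation-rate parameters (m^3/day); illustrative only.
RATE_MEDIAN = 12.0
RATE_GSD = 1.3


def monte_carlo_ei(subareas, samples_per_person=100, seed=0):
    """Return normalized index samples for one study area.

    subareas: list of dicts with keys 'population' (people in the target
    subgroup), 'concentration', and 'reference_concentration'.
    """
    rng = random.Random(seed)
    mu, sigma = math.log(RATE_MEDIAN), math.log(RATE_GSD)
    # Reference inhalation rate: the mean of the assumed lognormal.
    ref_rate = math.exp(mu + sigma ** 2 / 2)
    samples = []
    for area in subareas:
        # Step 1: virtual individuals in proportion to the subarea population.
        for _ in range(area["population"] * samples_per_person):
            # Step 2: sample an inhalation rate and estimate potential intake.
            rate = rng.lognormvariate(mu, sigma)
            intake = rate * area["concentration"]
            # Step 3: normalize by reference rate and reference concentration.
            samples.append(intake / (ref_rate * area["reference_concentration"]))
    return samples


# Made-up segment: a populous cleaner subarea and a small, more polluted one.
segment = [
    {"population": 20, "concentration": 5.0, "reference_concentration": 10.0},
    {"population": 5, "concentration": 30.0, "reference_concentration": 10.0},
]
ei_samples = monte_carlo_ei(segment)
```

Because the number of samples drawn from each subarea scales with its population, the pooled distribution is population weighted, and a small but highly exposed subarea appears as a long upper tail rather than shifting the whole distribution.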

Computational modules for estimating the EIs were coded using MATLAB and directly utilized the geographically coded datasets from the EXIS in the ESRI shapefile format. The distributions of the EIs were developed by aggregating about 200,000 Monte Carlo samples for each of the 18 NCS segments in Queens, NY. However, as the calculation procedures primarily involved algebraic computations, the large number of Monte Carlo samples was not a limitation in the estimation process.


For the Queens segments case study, the EI focused on the inhalation exposure route and related criteria air pollutants and air toxics. The EI formulations employed were designed in a generalizable manner that can be tailored for addressing specific health end points of interest. In the present study, EIs related to pregnancy outcomes (specifically, preterm birth and/or low birth weight) for Queens County focused on inhalation exposures to criteria air pollutants and air toxics relevant to inflammation and respiratory effects. The list of these contaminants is presented in Supplementary Table S.2. The pollutants and exposure pathways considered in defining the inhalation exposure index formulation were selected based upon (a) literature reviews of proposed hypotheses regarding the particular health end point, (b) literature reviews of related (epidemiological and clinical) studies, and (c) supplementary exploratory multivariate analyses (of national scale data at county resolution) involving available data on the health outcome and on exposure-relevant factors and contaminant levels.1, 2, 19

The results of the EI calculations are based on the aggregation of a large set of probabilistic samples where each sample corresponds to a single estimate of an EI. When aggregated, these estimates capture, via probability distributions, the spatial variability in EIs due to variability in environmental concentrations and in physiological attributes of subpopulations of interest within each study segment.8, 9 For the application here, the factors considered were the fraction of women of childbearing age in each segment, stratified by age groups, and corresponding distributions of body weights and inhalation rates. These probability distributions provide the flexibility for ranking locations based on different statistical metrics of exposure concentrations (e.g., peak, mean, median, variance, and so on) based on the purpose of a specific hypothesis-driven study or analysis. For example, to identify segments where there is a need to perform more extensive sampling, the variance may be an appropriate metric, whereas for assessing whether two segments are similar, the entire distribution may be of interest.
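To illustrate how the choice of statistical metric can change a ranking, the sketch below summarizes per-segment sample sets and ranks segments by a selected metric. The segment labels and sample values are made up, and the helper names are hypothetical.

```python
from statistics import mean, median, pvariance, quantiles


def summarize(samples):
    """Summary statistics for one segment's index samples."""
    # With n=40 the cut points fall every 2.5%, so the last one is the
    # 97.5th percentile.
    cuts = quantiles(samples, n=40, method="inclusive")
    return {
        "mean": mean(samples),
        "median": median(samples),
        "variance": pvariance(samples),
        "p97_5": cuts[-1],
    }


def rank_segments(samples_by_segment, metric="median"):
    """Segment labels ordered from highest to lowest value of the metric."""
    stats = {seg: summarize(s) for seg, s in samples_by_segment.items()}
    return sorted(stats, key=lambda seg: stats[seg][metric], reverse=True)


# Made-up samples: segment B has a lower median but a long upper tail.
example = {
    "A": [0.8, 0.9, 1.0, 1.1, 1.2],
    "B": [0.5, 0.6, 0.7, 0.8, 4.0],
}
by_median = rank_segments(example, "median")
by_variance = rank_segments(example, "variance")
```

In this toy case segment A ranks first by median while segment B ranks first by variance, which mirrors the point that the appropriate metric depends on the purpose of the analysis.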

The statistical summary of EI estimates for all the NCS study segments in Queens, NY, is shown in the form of box plots in Figure 5. The medians of the normalized EI estimates ranged from 0.8 to 2.3. The EI estimates showed substantial variation across segments: the roughly threefold difference between the lowest and the highest median EIs highlights the potential differences in environmental quality and exposures associated with the health outcomes highlighted in Figure 4 for the target population under consideration (women of childbearing age) across the segments.

Figure 5

Ambient inhalation exposure index estimates for birth outcomes for segments in Queens County; arbitrary numbers replace actual segment numbers, as per Lioy et al.19

As shown in Figure 5, there is substantial variation within each segment, reflecting the heterogeneity within each segment. The general shape of the distributions appears to be normal to lognormal, with a wide range in the estimates, especially above the 97.5th percentile. In this range, the estimates were above 4, indicating a long upper tail for the distribution of EI values. This variability can be attributed to underlying factors such as variability in demographics and environmental quality within each segment. For example, Segment 9856 includes census blocks covering highly industrialized areas with low residential population, and it also includes areas with large populations and fewer environmental sources. This variability is captured by the unusually long tails of the distribution.

Figure 6 shows the detailed distribution of the EI estimates for each of the 18 segments. The variation in the EI estimates captures the effects of spatial variation in concentrations as well as variation in the demographics within each segment. To easily identify the distributions, these results are presented in panels containing three segments each. An inspection of the distributions for individual segments indicated that for segments with lower median estimates, the distributions tended toward normal with relatively low variance. However, for segments that had wider distributions, the tails of the distributions were substantially long, indicating that particular attention must be paid to these long tails associated with areas with a relatively high median exposure index estimate. These distributions allow for relative comparison of locations based on different statistical metrics (e.g., percentiles) of exposure index estimates most appropriate for specific hypotheses.

Figure 6

Distributions of ambient inhalation exposure index estimates for the segments in Queens County, with each panel showing distributions for three segments. Arbitrary numbers replace actual segment numbers, as per Lioy et al.,19 and the variation across the segments is captured by the relative differences in the distributions.

It must be noted that the EI estimates are particularly sensitive to the quality of input data employed in the calculations. Although numerous statistical techniques are available for automatic fusion of data at multiple scales, particular attention must be paid to the underlying focus of the field or monitoring study associated with a specific dataset. The example here highlights the potential inconsistencies in exposure rankings that arise from merging data from different sources, each having a different focus. Ambient air quality measurements from EPA AirData8 focus on estimating regional air quality, whereas locally focused air quality field campaigns target local variability and areas with potentially higher than usual levels of contaminants. Figure 7 shows a map for one such local field effort, the New York City Community Air Survey (NYCCAS),17 which measured PM2.5 concentrations for 2008–2009 (figures obtained from NYC DOHMH17). When these data are fused with AQS data through a standard application of the Bayesian maximum entropy technique,20 the resulting maps of average ambient PM2.5 concentrations in Queens show substantially higher levels across the entire county, as shown in Figure 8. Inconsistencies in concentration estimates, and consequently in exposure estimates or rankings, are then likely, because one of the four criteria pollutants considered shows an artificially high level that is an artifact of the statistical approach used. This occurs because data from the targeted, local NYCCAS measurements, which aimed to capture high concentrations near roadways, dominated the data from monitoring networks that aim to capture representative levels across a larger region.
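The dominance effect described above can be reproduced with a toy calculation. The sketch below is not the Bayesian maximum entropy technique; it uses a plain weighted average, with made-up concentrations, only to show how a larger set of targeted near-roadway measurements can pull a pooled estimate upward, and how assigning relative weights to each dataset changes the result. All names and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up PM2.5 values (ug/m^3): two regional monitors and six targeted
# near-roadway sites from a local campaign.
regional = [10.0, 11.0]
local = [15.0, 16.0, 17.0, 16.0, 15.0, 17.0]
values = regional + local

# Naive pooling: every measurement counts equally, so the more numerous
# targeted sites dominate.
pooled = weighted_mean(values, [1.0] * len(values))

# Dataset-level weighting: each dataset contributes equally overall.
weights = ([1.0 / len(regional)] * len(regional)
           + [1.0 / len(local)] * len(local))
balanced = weighted_mean(values, weights)
```

Equal dataset-level weights are only one possible choice; the appropriate relative weights depend on what each dataset was designed to measure.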

Figure 7

Specific aspects of each local and regional dataset need to be considered when merging them with national data to “build” an EI (example of local extant data: NYCCAS 2009). Left: New York City Community Air Survey (NYCCAS) monitoring locations; Right: PM2.5 concentrations, NYC, winter 2008–2009 (figures from NYC DOHMH17).

Figure 8

Inconsistencies in the estimates of concentration fields by direct incorporation of data from multiple levels. Shown are the maps of annual average ambient PM2.5 concentrations in Queens County, NY, using the Bayesian maximum entropy approach for spatio-temporal interpolation. Left panel shows the estimates derived by using only monitoring data from ambient monitoring stations. Right panel shows the estimates derived by merging the national-level data with city-level data from targeted field studies. The discrepancies between the two highlight the need to employ relative weights for different scales and types of datasets even when they provide the same environmental metric.


There continues to be rapid growth in publicly available exposure-relevant information from various federal, state, and local agencies, encompassing data from direct measurements, indirect estimates, future projections, and outputs from predictive simulation models. These data can potentially be used to supplement or augment data obtained from field studies, or to provide screening-level estimates that will guide new, focused exposure data-collection efforts. This study therefore investigated the feasibility and utility of using extant data, and of optimizing the resources required for retrieval and analysis of exposure-relevant data. Additionally, the diverse nature of the exposure-relevant data, spanning a wide range of spatial and temporal scales, necessitates the development of “indices” that can summarize a large number of attributes into a set of simple metrics that can be used in classifying or ranking exposures. The effort and costs associated with the development of the exposure information systems are primarily for data retrieval, processing, and management, along with analysis of information from the literature on biological mechanisms related to end points of concern. This effort is expected to be substantially lower than the effort required for collection and analysis of new data for large-scale studies, and also provides the initial data for the calculation of EIs. With recent advances in geodatabase technologies, it is expected that data collected by various federal, state, and local agencies will become more easily available in the future for use by the scientific community, and that the costs associated with assembly and maintenance will decrease substantially.

The focus of the EI-based analysis presented here has been primarily on environmental exposures that can be associated with inflammation via respiratory irritation, although other major pathways such as infection may be more important in relation to inflammation. Likewise, the underlying calculations and data presented in this study were limited to a screening-level “Tier 2” analysis. This simplified application of the EI was necessary because of the limited availability of subject-specific data on exposures and health outcomes. Despite the limited scope of the current EI application, generalizations can be made with respect to (a) scenarios where this framework is most appropriate, (b) major limitations and potential data quality issues to be aware of, and (c) directions for improving the framework.

The main advantage of the EI framework is that it is defined in a flexible manner that can potentially take advantage of data from multiple scales and from study-specific data and can address issues of variability within population subgroups. Large-scale epidemiological hypotheses can take advantage of the relatively simplified classification provided by the TiER framework (e.g., ranking of high, mean, and low exposures). For hypotheses focused on local-scale phenomena, the distributions of EIs, along with corresponding geospatial maps and analysis results of environmental attributes, can provide insight into areas with a potential for unusually high exposures, or where co-exposures may occur. This will lead to more clearly defined field studies on individuals and fill important data gaps for health outcomes of concern.
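As a rough illustration of the simplified classification described above, the following Python sketch assigns each segment to one of three classes using tertiles of a single EI summary statistic. The segment identifiers, the EI values, the class labels, and the choice of tertile cut points are all hypothetical assumptions made for illustration; they are not taken from the study and do not represent the TiER ranking procedure itself.

```python
from statistics import quantiles


def rank_segments(ei_by_segment):
    """Label each segment 'low', 'medium', or 'high' by tertile of its EI value.

    `ei_by_segment` maps a segment identifier to one EI summary statistic
    (for example, a median). Tertile cut points are an illustrative choice.
    """
    values = list(ei_by_segment.values())
    # Two cut points that split the segment-level EI values into three groups.
    lower_cut, upper_cut = quantiles(values, n=3)
    labels = {}
    for segment, ei in ei_by_segment.items():
        if ei <= lower_cut:
            labels[segment] = "low"
        elif ei <= upper_cut:
            labels[segment] = "medium"
        else:
            labels[segment] = "high"
    return labels


# Hypothetical EI medians for six made-up segments (not study results).
example_ei = {"S1": 0.4, "S2": 1.9, "S3": 0.9, "S4": 1.2, "S5": 0.6, "S6": 2.5}
labels = rank_segments(example_ei)
```

A ranking based on a different statistic of the EI distribution (such as an upper percentile) could be obtained by passing that statistic instead of the median, which may change the class a segment falls into.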

The TiER framework is most appropriate when some degree of knowledge exists on the underlying biological mechanisms (or reference concentrations), linking a set of exposures and biological attributes to health end points of concern. This is especially important when contributions from a large number of relatively minor factors together will be significant, and can affect conclusions based on statistical inference of epidemiological data; the use of EIs in such situations can help avoid exposure misclassification. EIs can also be used for generating exposure-related hypotheses, based on analysis of health outcome data and overall exposure attributes, taking into account variability across and within study segments. Additionally, when only small amounts of representative data are available on health outcomes, the EIs can be first evaluated using both the exposure and health outcome data, and can be extrapolated through simulations, with a higher degree of confidence because of the underlying mechanistic basis for the EIs. In cases where data exist for only a subset of exposure-relevant attributes, the EI approach can still be useful because field studies can be designed with primary focus on the remaining attributes.

One of the current limitations of the EIs is that they are applicable only within the constraints imposed by the selection of health end points, and assumptions related to key exposure-relevant attributes and biological mechanisms considered. Additionally, as shown in the example in Figure 8, the EIs can be affected by the representativeness of extant data. This also points to opportunities for improvements in characterizing local-scale exposures when local datasets are combined with regional data via spatial interpolation techniques that can allocate appropriate weights based on the types of data. Although the current EI formulations have used a “default” annual exposure period to rank the Queens segments, ongoing efforts focus on incorporating exposure windows (e.g., days, months) that are consistent with hypotheses relevant to other health end points associated with children’s health. Additionally, exposure characterization and ranking for longer periods of time require separate EI estimates for each time period of concern, which should be verified with locally collected participant samples and data. Likewise, for ranking individual subjects, narrower, subject-specific time windows and corresponding exposure and concentration metrics need to be considered.
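The idea of allocating weights to datasets of different types can be illustrated with a much simpler technique than the Bayesian maximum entropy interpolation used in this study: an inverse-variance (precision-weighted) average of two estimates at a single location. The sketch below is only an assumption-laden stand-in. The concentration values and error variances are hypothetical, and the function name `combine_estimates` is introduced here for illustration.

```python
def combine_estimates(estimates):
    """Return the precision-weighted mean and its variance.

    `estimates` is a list of (value, error_variance) pairs for the same
    quantity at the same location. Each estimate is weighted by the inverse
    of its assumed error variance, so less certain data count for less.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    return mean, 1.0 / total_weight


# Hypothetical annual PM2.5 estimates (ug/m3) with assumed error variances.
regional_estimate = (12.0, 4.0)  # interpolated from sparse regional monitors
local_estimate = (15.0, 1.0)     # from a denser local field survey
combined_mean, combined_variance = combine_estimates([regional_estimate, local_estimate])
```

With these assumed variances the local estimate receives four times the weight of the regional one, so the combined value lies closer to the local measurement. A spatial interpolation method would additionally account for distance and spatial correlation, which this sketch ignores.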

The general formulation of the EIs is geared toward characterization of multimedia, multiroute exposures. Inputs that would be required to calculate corresponding EIs include environmental and microenvironmental concentration distributions, pathway-specific intake rates, general demographic distribution characteristics, and so on (a summary table listing a selected subset of environmental and demographic datasets currently incorporated in the EXIS is provided in Supplementary Table S.3 and a list of acronyms used in Table S.3 is expanded in Supplementary Table S.4). Thus, for multimedia or multi-exposure route contaminants, the required parameters will be concentrations or concentration distributions in drinking water, food, and so on; inhalation rate distributions (age- and gender-specific); water and food intake rate distributions (age- and gender-specific); recipe files (e.g., for estimating what food items the study population consumes, and at what levels), and so on. Depending on the resolution of the study, there are often spatial and temporal resolution limits that are imposed by the quality of extant data: as an example, extant national-level databases from food surveys do not allow characterization of sub-county variability (segment-level estimation), while ambient monitoring data and modeling estimates from national-level air quality databases allow segment-level characterization. Ongoing efforts on compilation and analysis of extant data beyond air quality data aim to identify feasible methods for incorporating additional data on environmental conditions, socio-economic characteristics, and so on, into the EI framework. Such improvements to the EI approach and the underlying data platform can potentially provide a cost-effective means for simultaneously analyzing these additional datasets and for supporting ongoing and future NCS and other field studies, as well as improving the design of future exposure measurement studies.
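To make the role of these inputs concrete, the sketch below combines a concentration distribution with an inhalation rate distribution by simple Monte Carlo sampling to obtain a distribution of daily inhaled amounts. It is a minimal illustration, not the EI formulation: the lognormal form, all parameter values, and the function name `simulate_inhaled_dose` are assumptions introduced here, and the two inputs are sampled independently for simplicity.

```python
import random
import statistics


def simulate_inhaled_dose(n_draws, conc_mu, conc_sigma, rate_mu, rate_sigma, seed=0):
    """Sample daily inhaled amounts (ug/day) as concentration x inhalation rate.

    Both inputs are drawn from lognormal distributions whose parameters are
    the mean and standard deviation of the underlying normal (log scale).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    doses = []
    for _ in range(n_draws):
        concentration = rng.lognormvariate(conc_mu, conc_sigma)    # ug/m3
        inhalation_rate = rng.lognormvariate(rate_mu, rate_sigma)  # m3/day
        doses.append(concentration * inhalation_rate)
    return doses


# Hypothetical log-scale parameters, chosen only for illustration.
doses = simulate_inhaled_dose(10_000, conc_mu=2.5, conc_sigma=0.3,
                              rate_mu=2.7, rate_sigma=0.2)

# quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(doses, n=20)
median_dose = cuts[9]   # 50th percentile
p95_dose = cuts[18]     # 95th percentile
```

Age- and gender-specific inhalation rates could be represented by calling the function once per demographic group with group-specific parameters and pooling the draws in proportion to the group sizes in a segment.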

Improvements to the TiER framework and EI estimates are expected to come through iterative refinements, as data relevant to environmental and biological factors are collected in population studies and linked with corresponding data on health outcomes. Within the NCS, this linkage can be established when segment- and subject-specific data (e.g., data collected in and around an individual’s home and emissions microinventories) become available along with corresponding information on health and adverse birth outcomes; it can then provide insight into the relative importance of different exposure attributes (e.g., air pollution-related inflammation versus infection-induced inflammation).

Lioy et al.12 have summarized the overall scope and range of extant, exposure-relevant data in relation to the NCS. Incorporation and use of timely and current data are important components in the development of EIs. Any lags in development and release of environmental databases will affect corresponding applications of the EIs. This implies that information from major environmental and demographic databases should be regularly updated with new data from primary sources (e.g., air quality concentration data from monitoring networks). Additionally, high-resolution data relevant to the locations and individuals that will be part of a new or long-term study should be aggregated and maintained. In some cases, the primary data sources are updated less frequently than annually (e.g., NEI emission estimates); modeling or projection may be required to estimate values for years in between.
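One simple way to estimate values for the years between releases of an infrequently updated source is linear interpolation between the reported years. The sketch below shows this under stated assumptions: the years and emission totals are hypothetical, the function name `interpolate_years` is introduced here, and linear change between releases is only one possible choice (a mechanistic model or projection may be more appropriate in practice).

```python
def interpolate_years(known):
    """Linearly interpolate annual values between the years present in `known`.

    `known` maps a year to a reported value and must contain at least one
    entry. Years outside the reported range are not extrapolated.
    """
    years = sorted(known)
    filled = {}
    for start, end in zip(years, years[1:]):
        span = end - start
        for year in range(start, end):
            fraction = (year - start) / span
            filled[year] = known[start] + fraction * (known[end] - known[start])
    # The last reported year is copied as-is.
    filled[years[-1]] = known[years[-1]]
    return filled


# Hypothetical emission totals (tons/year) reported for two inventory years.
reported = {2005: 120.0, 2008: 90.0}
annual = interpolate_years(reported)
```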

Data gaps in extant measurement data can be addressed through the use of mechanistic environmental and microenvironmental models, provided that sufficient supporting information (e.g., source, release and media characterization in lieu of concentrations) is available to develop reliable modeled estimates of the missing data. In some cases, high-resolution concentration metrics (such as peak concentrations) may be required, which can be obtained from outputs of environmental quality simulation models that are evaluated using available, coarser resolution data. In addition, model/data fusion techniques such as the Bayesian maximum entropy20 allow the combination of sparse data with outcomes from mechanistic models. With more evaluation and refinement, the EIs can provide metrics for exposure characterization in many types of studies and can be employed either as a simple screening tool or as a robust indicator of exposures experienced by individuals or populations.


This study demonstrates the utility of the integrated TiER toward supporting large-scale population exposure studies such as the National Children’s Study. The application of the EIs to rank NCS segments in Queens County, NY, illustrates the potential informative value of extant data to the NCS, and provides a significant first step toward effective usage of information available from extant databases for exposure estimation across a number of diverse types of locations in NCS and other studies. Although the EI estimates are highly sensitive to the quality of environmental data and to the adequacy of the mechanistic knowledge of exposure–response relationships, and there are limitations imposed on the EIs by the assumptions related to biological mechanisms and reference concentrations, the EI estimates nevertheless provide insights into designing field studies and can reduce exposure misclassification in epidemiological studies. The primary utility of the underlying TiER framework is in providing a high-quality data platform (EXIS) that can support both exploratory and hypothesis-driven analyses. The flexible formulation of the EIs implies that they can be augmented to consider multiple routes of exposure, and behavioral and socioeconomic factors that may be important in health outcomes of interest.


  1. Georgopoulos PG, Brinkerhoff CJ, Isukapalli S, Dellarco M, Landrigan PJ, Lioy PJ. A framework to support characterization and ranking of exposures for the National Children’s Study (NCS). Risk Anal 2012 (under review).

  2. Georgopoulos PG, Isukapalli SS, CCL co-workers. Exposure-based prioritization of a “test set” of environmental chemicals using PRoTEGE (Prioritization/Ranking of Toxic Exposures with GIS Extension). USEPA, the Computational Chemodynamics Laboratory, EOHSI Piscataway, NJ. 2011.

  3. Georgopoulos PG, Wang SW, Vyas VM, Sun Q, Burke J, Vedantham R et al. A source-to-dose assessment of population exposures to fine PM and ozone in Philadelphia, PA, during a summer 1999 episode. J Expo Anal Env Epid 2005; 15: 439–457.

  4. Georgopoulos PG, Lioy PJ. From a theoretical framework of human exposure and dose assessment to computational system implementation: the Modeling ENvironment for TOtal Risk Studies (MENTOR). J Toxicol Environ Health Pt B, Crit Rev 2006; 9: 457–483.

  5. Georgopoulos P. A multiscale approach for assessing the interactions of environmental and biological systems in a holistic health risk assessment framework. Water Air Soil Pollut: Focus 2008; 8: 3–21.

  6. Georgopoulos PG, Isukapalli SS, Burke J, Napelenok S, Palma T, Langstaff J et al. Air quality modeling needs for exposure assessment from the source-to-outcome perspective. Environ Manager 2009, 26–35.

  7. Sasso A, Isukapalli S, Georgopoulos P. A generalized physiologically-based toxicokinetic modeling system for chemical mixtures containing metals. Theor Biol Med Model 2010; 7: 17.

  8. USEPA. AirData: Access to Air Pollution Data. US Environmental Protection Agency Washington, DC, (cited 23 August 2011).

  9. USEPA. Exposure Factors Handbook 2011 Edition (Final). US Environmental Protection Agency Washington, DC. 2008 (2011 EPA/600/R-09/052F).

  10. Zhang Y, Lin L, Cao Y, Chen B, Zheng L, Ge RS. Phthalate levels and low birth weight: a nested case-control study of Chinese newborns. J Pediatr 2009; 155: 500–504.

  11. Baibergenova A, Kudyakov R, Zdeb M, Carpenter DO. Low birth weight and residential proximity to PCB-contaminated waste sites. Environ Health Perspect 2003; 111: 1352–1357.

  12. Priyanka C, Dibyendu B. Biomonitoring of air quality in the industrial town of asansol using the air pollution tolerance index approach. Res J Chem Environ 2009; 13: 46–51.

  13. Miranda ML, Maxson P, Edwards S. Environmental contributions to disparities in pregnancy outcomes. Epidemiol Rev 2009; 31: 67–83.

  14. Sram RJ, Binkova B, Dejmek J, Bobak M. Ambient air pollution and pregnancy outcomes: a review of the literature. Environ Health Perspect 2005; 113: 375–382.

  15. Windham G, Fenster L. Environmental contaminants and pregnancy outcomes. Fertil Steril 2008; 89 (2 Suppl): e111–e116 discussion e7.

  16. Behrman RE, Butler AS (eds). Preterm Birth: Causes, Consequences, and Prevention. The National Academies Press Washington, DC. 2007.

  17. NYC DOHMH. New York City Community Air Survey. Results From Year One Monitoring 2008–2009. New York City Department of Health and Mental Hygiene New York, NY. 2009.

  18. USEPA. National Air Toxics Assessments. US Environmental Protection Agency. 2011 (cited 18 October 2011); Available from

  19. Lioy PJ, Isukapalli S, Trasande L, Thorpe L, Dellarco M, Weisel C et al. Using national and local extant data to characterize environmental exposures in the National Children’s Study (NCS): Queens County, New York. Environ Health Perspect 2009; 117: 1494–1504.

  20. Christakos G. Modern Spatiotemporal Geostatistics. Oxford University Press New York, NY. 2000 p 288.

  21. Cheng WL, Chen YS, Zhang JF, Lyons TJ, Pai JL, Chang SH. Comparison of the revised air quality index with the PSI and AQI indices. Sci Total Environ 2007; 382: 191–198.



Support for the work presented here is provided by the National Children’s Study Queens Vanguard Center, which is funded in whole or in part by the National Institute of Child Health and Human Development, National Institutes of Health, under Contract Number 0258-325-4609. Further support is provided by the NIEHS sponsored UMDNJ Center for Environmental Exposures and Disease, under Grant Number NIEHS P30ES005022. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government.

Author information



Corresponding author

Correspondence to Paul J Lioy.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies the paper on the Journal of Exposure Science and Environmental Epidemiology website

About this article

Cite this article

Isukapalli, S., Brinkerhoff, C., Xu, S. et al. Exposure indices for the National Children’s Study: application to inhalation exposures in Queens County, NY. J Expo Sci Environ Epidemiol 23, 22–31 (2013).


  • National Children’s Study
  • exposure index
  • birth outcomes
  • inhalation exposures
