Developing meaningful cohorts for human exposure models

Abstract

This paper summarizes numerous statistical analyses focused on the US Environmental Protection Agency's Consolidated Human Activity Database (CHAD), used by many exposure modelers as the basis for data on what people do and where they spend their time. In doing so, modelers tend to divide the total population being analyzed into “cohorts”, to reduce extraneous interindividual variability by focusing on people with common characteristics. Age and gender are typically used as the primary cohort-defining attributes, but more complex exposure models also use weather, day-of-the-week, and employment attributes for this purpose. We analyzed all of these attributes and others to determine if statistically significant differences exist among them to warrant their being used to define distinct cohort groups. We focused our attention mostly on the relationship between cohort attributes and the time spent outdoors, indoors, and in motor vehicles. Our results indicate that besides age and gender, other important attributes for defining cohorts are the physical activity level of individuals, weather factors such as daily maximum temperature in combination with months of the year, and combined weekday/weekend with employment status. Less important are precipitation and ethnic data. While statistically significant, the collective set of attributes does not explain a large amount of variance in outdoor, indoor, or in-vehicle locational decisions. Based on other research, parameters such as lifestyle and life stages that are absent from CHAD might have reduced the amount of unexplained variance. At this time, we recommend that exposure modelers use age and gender as “first-order” attributes to define cohorts followed by physical activity level, daily maximum temperature or other suitable weather parameters, and day type possibly beyond a simple weekday/weekend classification.

Introduction

This paper elaborates on the concepts enunciated and analyzed in McCurdy and Graham (2003), which focused on developing and testing an activity factors typology — a condensation of numerous factors thought to affect human activity choices — useful for understanding those factors affecting individual choices regarding the type and duration of activities undertaken. That paper presented the typology and tested it against a single-individual longitudinal activity database and a cross-sectional activity database of people similar to that person. While not all factors in the typology were testable, it was found that a number of them could explain a modest amount of variability (30–40%) in the time spent in three selected locations: time spent indoors, outside, and in motor vehicles. This paper has much the same orientation, but uses all of the information contained in the US Environmental Protection Agency's (EPA's) human activity database developed by the Agency's National Exposure Research Laboratory (NERL). NERL's activity database is known as the Consolidated Human Activity Database, or CHAD for short (McCurdy et al., 2000). CHAD is available on the internet at: www.epa.gov/chadnet1/.

CHAD is used as an input data file in a number of exposure models that utilize a time-series of activity/location patterns for their calculations.Footnote 1 These models essentially separate the CHAD diaries into user-defined categories based on demographic characteristics (age, gender, etc.), and randomly sample from the diaries to develop multiday (longitudinal) activity patterns for simulated individuals. These categories are used to group cross-sectional activity information from similar individuals into an activity “pool” or “bin” to increase the sample size available for longitudinal simulation purposes (Xue et al., 2003). This paper critically considers and statistically analyzes key attributes that should be considered when developing groups of individuals for exposure modeling purposes, whether utilized in constructing cohorts or when simulating individuals longitudinally.Footnote 2

The terminology used in this paper needs further elaboration. Activity pattern data are a compilation of complete day (midnight-to-midnight) sequential activity/location data and other information, such as the presence of combustion sources (e.g., wood fireplace) and the relative exertion rate (e.g., jogging is high exertion; typing is low) undertaken by an individual for every activity. Exertion rate is typically designated as the activity level. An event is any change in a respondent's location, activity, activity level, combustion status (cigarette smoking, gas stove usage, etc.), or clock hour. In human activity pattern studies, subjects are instructed to provide data for every event during the day(s) surveyed, and are supposed to note when each event started (sometimes to the minute). Even if a person were in the same location doing the same thing for an entire day, there would be 24 events for that day due to clock-hour changes. Finally, it should be noted that when an activity factor in the typology is analyzed as an isolated, noncausal variable in this paper, we designate it as an attribute.
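The clock-hour rule in the event definition above can be illustrated with a minimal sketch. It assumes a hypothetical diary layout in which each segment is a tuple of start minute, end minute, location, and activity, with minutes counted from midnight (0 to 1440); this layout is an illustration only and is not the CHAD record format. Changes in activity level or combustion status are assumed to be already reflected as separate segments.

```python
def split_into_events(segments):
    """Split diary segments at clock-hour boundaries.

    Each segment is (start_min, end_min, location, activity), with minutes
    counted from midnight (0) to the following midnight (1440).
    This tuple layout is hypothetical, chosen only for illustration.
    """
    events = []
    for start, end, location, activity in segments:
        t = start
        while t < end:
            next_hour = (t // 60 + 1) * 60  # first clock-hour boundary after t
            stop = min(end, next_hour)
            events.append((t, stop, location, activity))
            t = stop
    return events


# One unchanged location/activity for a full day still yields 24 events.
whole_day = [(0, 1440, "home", "sleep")]
print(len(split_into_events(whole_day)))
```

Under these assumptions, a single location or activity change at a time that is not on the hour adds exactly one event to the 24-event minimum.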

Most of the analyses conducted in this paper focus on the locational aspects of exposure events. Understanding why individuals spend time in specific locations assists in bridging the gap between using cross-sectional data to address a longitudinal problem, that is, how best to assemble 1-day diaries of seemingly unrelated persons to simulate a longer-term diary for an individual. Thus, the purpose of this paper is to extricate statistically significant factors that determine where individuals spend time, to ultimately be used as a technique that provides a more appropriate representation of both intra- and interperson variability when coupling seemingly unrelated individual human activity diaries in longitudinal exposure analyses. Subsequent papers planned by the authors will focus on specific activity and activity-level choices made by individuals, each of which can greatly affect dose-rate estimation.

Materials and methods

Data Set Description

The version of CHAD used for this paper contains 22,968 person-daysFootnote 3 of activity pattern data.Footnote 4 Of the 11 factors developed in the typology mentioned above (McCurdy and Graham, 2003), eight are testable, at least in part. Unfortunately, lifestyle (a person's use of leisure time) and lifestage factors (a person's role in the family as it changes over time) that are known to greatly affect people's activities and locational decisions (Henderson et al., 1996; Zemel et al., 1996) are not available for analyses. Specific attributes that are testable follow; many of them are already used to frame exposure model cohort definitions. These include age, gender, race/ethnicity, educational level, employment status/occupation, health status (cardio-respiratory only)/potential exposure conditions, home-type and housing characteristics, day-of-the-week, and season of the year. The first task is to determine exactly how many days of data exist in CHAD given these attributes, followed by an analysis of their relative distribution across the various studies, and finally an evaluation of their potential explanatory power and use in developing exposure cohorts.

Statistical Analysis

Several statistical methods were used in the evaluation of the CHAD data set and are described briefly here. All statistical analyses undertaken in this paper were conducted using SAS© Version 8.02 for the PC (SAS Institute, 2002).

The Kolmogorov–Smirnov (K–S) test is a nonparametric test used to judge whether two ordinal-scale samples have the same continuous distribution, as would be expected if they were drawn from the same population (SAS, 2002). While it requires that the samples be independent and random, it does not assume any particular sampling distribution form. The test statistic is the maximum “distance” between cumulative distributions of the two samples (DN) and the null hypothesis (H0) of no difference in the distributions is rejected when DN becomes too large as evaluated by a χ2-test at α=0.050.
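As a rough illustration of the two-sample statistic described above, the following sketch computes the maximum distance between two empirical cumulative distributions. It is not the SAS procedure used in this paper: the p-value function swaps in the standard asymptotic Kolmogorov series rather than the χ2 evaluation named in the text, and the function names are hypothetical.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum is attained at an observed value
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_asymptotic_p(d, n_a, n_b, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution.

    Assumption: large-sample approximation, not the chi-square evaluation
    described in the text.
    """
    lam = d * math.sqrt(n_a * n_b / (n_a + n_b))
    if lam < 0.2:  # the series converges slowly here; p is essentially 1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))
```

Because the scaling term grows with sample size, even a small distance between distributions can be declared significant when both samples are very large.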

Due to the unbalanced nature of the data set, a general linear model (GLM) was required to perform an analysis of variance (ANOVA). An F-test based on the Type III mean squares, which assesses each independent variable adjusted for every other term, was used to assess parameter significance (Cody and Smith, 1997). Multiple pairwise comparisons of record count were evaluated using least-squares means (adjusted for main effects) and Tukey's method (SAS, 2002).

The Pearson χ2-test statistic involves the differences between observed and expected frequencies and indicates homogeneity or independence for each stratum (SAS, 2002). To avoid assessing significant differences noted simply because there were zero observed frequencies in a given cell, studies or parameters were either dropped or combined before the analyses were performed, where possible, to allow for sound interpretation.
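The Pearson statistic can be sketched for a study-by-attribute table of counts as follows. This is an illustrative implementation with hypothetical names, not the SAS routine used here. It refuses tables with an empty row or column, mirroring the practice described above of dropping or combining sparse categories before testing.

```python
def pearson_chi2(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows, each a list of observed frequencies.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("drop or combine empty rows/columns before testing")
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof
```

The statistic is then compared with a χ2 distribution having the returned degrees of freedom.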

Time spent outdoors was evaluated as a dependent variable in a multiple regression analysis containing several categorical and continuous independent variables. Only statistically significant explanatory parameters (p<0.050) were retained in models.

Results and discussion

Person-days of Data Available in CHAD for Selected Attributes

Table 1 presents selected information about the various studies that comprise CHAD; study characteristics are more fully described in McCurdy et al. (2000). The studies in CHAD vary from a random probability study of most US continentalFootnote 5 residents as a whole (NHAPS-Air and -Water, and the UMC study) or a large state (the three California studies), to panel studies of narrowly defined age cohorts (the Baltimore elderly and the two Los Angeles school studies). Falling somewhere between these are the random probability studies of specific metropolitan areas (DEN, CIN, WAS, and VAL). The method of data collection also varies from a structured telephone interview that elicited “yesterday's” activities to a single- or multi-page paper “activity diary” that subjects carried with them throughout the day. Thus, the studies in CHAD differ in both design and method of data collection.

Table 1 Study structure comprising CHAD, the number and frequency of individual subjects, and person days of data with under 30 diary records.

While combining data from dissimilar studies may seem inappropriate, end-users such as probabilistic exposure modelers need to maximize the sample size of activity pattern data available for modeling, particularly when attempting to maintain cohort homogeneity in a model. In fact, one of the important aspects of this paper is to indicate exactly where the studies differ so that a modeler knows in advance what potential problems may arise if disparate data from different studies are combined, without conducting the necessary statistical analyses themselves.

One of the columns in Table 1 requires further elaboration here: the number of days with <30 events recorded. Based on the standardization of all study data before input to CHAD, having 30 event records is only 25% more than the barest minimum possible (24 clock-hour events). We believe that having <30 records can indicate that a subject was not fully engaged in reporting precise time increments for what he or she did, or where he or she was, on the day in question; such time increments are the basis for much of the statistical analysis performed here. This is the same criterion used in our earlier comparative analyses of activity typology factors, and it is used here in screening the CHAD data (McCurdy and Graham, 2003).
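The screening criterion can be expressed as a short sketch. The mapping from person-day identifiers to record counts, and the identifiers themselves, are hypothetical and serve only to illustrate the rule.

```python
MIN_RECORDS = 30        # screening threshold used in this paper
CLOCK_HOUR_FLOOR = 24   # minimum possible events in a complete day


def screen_person_days(record_counts, min_records=MIN_RECORDS):
    """Split person-days into those kept and those flagged by the screen.

    `record_counts` maps a person-day identifier to its number of event
    records (hypothetical layout).
    """
    kept = {pid: n for pid, n in record_counts.items() if n >= min_records}
    flagged = {pid: n for pid, n in record_counts.items() if n < min_records}
    return kept, flagged


# The threshold sits 25% above the clock-hour floor.
print(MIN_RECORDS / CLOCK_HOUR_FLOOR)
```

Flagged person-days are not necessarily invalid; the screen only marks where less precise reporting may be more prevalent.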

There are no standard criteria determining whether an individual's diary is valid for an exposure modeling exercise (other than that it contains the necessary cohort information). As a whole, in constructing relevant cohorts for use in exposure models, we are essentially data-poor, and until appropriate studies are conducted to augment CHAD, exposure modelers will likely have to use all of the data in CHAD. From a probabilistic exposure modeling perspective, cohort sample size is of greater importance compared to the “microquality” of the data contained in CHAD.

Approximately 10% of the person-days of data in CHAD have <30 records per day (Table 1). The studies with the greatest proportion of diaries with <30 records are primarily those using the recall method, for example, the NHAPS study (comprising 69% of all diaries with <30 records), likely the direct result of individuals approximating time spent performing activities in various locations. Only BAL of this group is a diary study, and the low number of records per individual may indeed be “real” here since its subjects were all elderly (72–93 years old) and resided in a single facility having individual apartments. Residents could share meals in a common dining facility if desired, and rarely left the building anytime during the study. In fact, the vast majority of their time was spent in their own apartments, performing only a few discrete activities. Given the parameters of the diary used (a 15-min block of time was the smallest unit of analysis) and the definition of an event, having <30 records per day may be an accurate accounting of their time, and does not necessarily mean that the activity pattern data in Baltimore are perfunctory. Thus, this criterion does not attempt to indicate that all study data with <30 records are imprecise, inaccurate, or contrived, but it serves as a screen for where this type of data may be more prevalent.

An example of a decision-tree overview of CHAD that an exposure modeler may conceptualize in developing cohorts is given in Figure 1. It depicts the cumulative effect on the total available sample size when selecting various personal and residential attributes. The figure is first divided by gender, with males branching to the top and females towards the bottom. As shown, CHAD has 10,666 person-days of data for males (♂), 12,276 for females (♀), and activity pattern data for 26 people of unknown gender. With respect to age, 149 ♀ and 61 ♂ have missing age data, both <1.3% of their gender class. The sample begins to contract when ethnicity information is included (known for 82.8% ♂ and 80.7% ♀). Cumulatively, there are data on education, ethnicity, age, and gender in CHAD for 8394 ♂ and 9204 ♀, 78.7% and 75% of the two gender groups, respectively. Occupational information (determined by two-digit SIC categoriesFootnote 6) causes an extreme drop in available cohort data when the preceding factors are known: it is available for only 564 ♀ and 537 ♂ when considered in this hypothetical cohort development sequence. Even if only adults ≥18 years old are considered (12,800 entries in CHAD), occupational information is known for only 8.6% of the database. Thus, developing a cohort based on all of the attributes shown in Figure 1 will result in fairly small group sizes, particularly if age itself is subdivided, as it often is in an exposure model (Johnson, 1989, 1995; Lurmann et al., 1990; Burke et al., 2001; Zartarian et al., 2002). The example posed may be somewhat dramatic relative to what will typically occur when selecting for particular attributes without a priori knowledge of the limitations of CHAD; however, sample size can be optimized by choosing an appropriate attribute selection order and, possibly, by substituting a similar attribute. For example, if being employed outside the home is of interest (as a Yes/No variable) instead of a specific occupational category, then more data exist in CHAD on that attribute; employment status is known for 12,702 adults ≥18 in CHAD (>99%).
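The cumulative contraction of the sample pool can be traced with a small sketch. The record layout (one dictionary per person-day, with None marking a missing attribute) and the field names are assumptions made for illustration, not the CHAD schema.

```python
def cumulative_attrition(records, attribute_order):
    """Count person-days that remain as each attribute is required in turn.

    `records` is a list of dicts (hypothetical layout); a value of None
    means the attribute is missing for that person-day.
    """
    remaining = list(records)
    counts = []
    for attribute in attribute_order:
        remaining = [r for r in remaining if r.get(attribute) is not None]
        counts.append((attribute, len(remaining)))
    return counts


# Toy data, not CHAD values.
toy = [
    {"gender": "F", "age": 34, "ethnicity": "White", "occupation": None},
    {"gender": "M", "age": None, "ethnicity": "Black", "occupation": "52"},
    {"gender": "M", "age": 41, "ethnicity": "Hispanic", "occupation": "80"},
]
print(cumulative_attrition(toy, ["gender", "age", "ethnicity", "occupation"]))
```

Running the same function with a broader substitute attribute, such as a yes/no employment flag in place of a specific occupational code, shows how much of the pool the substitution would retain.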

Figure 1 Hierarchical analysis of cohort development effect (cumulative) on sample pool. (a) Decision tree begins with M=male and F=female; N=no data on cumulative attribute, Y=yes data are available to carry through to next attribute. (b) Note that 9480 individuals under 16 years of age are ineligible for this attribute.

A complete summary of the number of person-days of data available in CHAD for all relevant attributes, characteristics, and factors central to the activity typology, in addition to other personal indices, is provided in Tables A1, A2, A3, and A4. These tables also contain the sample size reduction associated with cohort development when considering multiple attributes. Table A1 presents the number of person-days of data available in CHAD for age, gender, ethnicity, educational level, and employment, considering each attribute alone and in certain combinations with other attributes. This table covers most of the same items as Figure 1, but provides detailed attribute/characteristic information rather than the yes/no dichotomous data of the figure. Tables A2, A3, and A4 contain additional personal attributes such as smoking and health status; information about housing attributes such as type (single-family detached, apartments); and the day-of-week, month-of-year, and weather conditions associated with the activity pattern. They consider attributes individually as well as cumulatively.

Table A1 Person-days of data available in CHAD for selected personal attributes.
Table A2 Person-days of data available in CHAD for additional personal attributes.
Table A3 Person-days of data available in CHAD for housing attributes and characteristics.
Table A4 Person-days of data available in CHAD for temporal and climatological factors.

These tables present a systematic examination of selected data available in the CHAD database and the major sources of missing information for specific attributes and characteristics, some of which are known to affect people's choices regarding where they spend time and what they are doing. Much more detail on some of these attributes is reported in individual study publications; see McCurdy et al. (2000) for a full citation. The NHAPS study, in particular, has been extensively analyzed (Klepeis et al., 1996, 2001; Tsang and Klepeis, 1996, 1997). We recommend that interested readers review these citations for a broader perspective on available activity pattern data. The next two sections evaluate whether or not there are systematic differences among the studies with respect to selected attributes. If there are (and some were found in our previous analysis of a specific cohort; McCurdy and Graham, 2003), then modelers should be cautious when pooling activity diaries from different studies in CHAD to develop specific cohort groups.

Analysis of Recorded Events by Study and Selected Activity Typology Factors

Study effect

The number of person-days of data that remain by study after removing days having <30 events is shown in Table 2. Note that the two California studies of children and adolescents (CAC & CAY) are combined, as are the two small Los Angeles studies of the same groups (LAE & LAH). This was done based on their study design structure, similar study subjects, and relatively low individual sample sizes. LAE/LAH stands out as it has by far the largest median number of events per day recorded: 65.5, about 60% more than the next most-detailed paper diary studies: CIN, DEN, and WAS. The remaining studies have a median of between 32 and 39 events per day, and all of the studies have relatively modest COVs. We generally analyzed the events-per-day metric by study and by other attributes/characteristics in two ways, and the two outcomes were consistent with one another.

Table 2 Summary statistics for the number of records >29 by study.

A two-sample K–S test of the distribution of record frequencies was undertaken for all possible pairs of studies shown in Table 2 (e.g., CAA versus CAC/CAY combined; NHA versus CIN). The null hypothesis (H0) that the two samples are from the same continuous distribution can be rejected at p<0.050 for all of them except for the UMC:CAA, NHA:VAL, NHW:VAL, and DEN:WAS pairs. In each of the first three pairs, both studies used a recall format but covered widely different geographic scales (national versus state or local level), so the K–S test outcome is puzzling. Intuitively, H0 should be rejected for those three pairs. Perhaps it indicates that the geographic extent of a study is not important. The DEN and WAS studies both used the same paper diary form and collected data on the same general population during the same season of the year: winter. Thus, there is a logical reason why those two studies cannot be distinguished. Again, it may indicate that geography is not important. What is not easily explained is the rejection of H0 for the NHA and NHW pair (DN=0.05; χ2=2.02; p<0.001), since they were conducted by the same sampling group using the same recall format during the same time period; the significance outcome may be due to the very large sample size available for both studies, which magnifies small differences.

A second analysis approach was to construct a GLM using study and selected personal attributes and weather conditions to assess whether these parameters could “explain” differences in the number of events per day recorded. Two separate models were considered as an attempt to maximize sample size for interpretive purposes: the first model considered the variables “study”, “age group”, “gender”, “day-type”, and “ethnicity” (n=17,119 for individuals with ≥30 records) and within-study (interaction) analyses for these variables. A second model added the “weather” variable and “precipitation” (n=11,408) and also included interaction terms with “study” (these results are only discussed in the weather and precipitation subsections since the sample size is reduced; the outcome was the same for the other variables).

As noted above, when assessing main effects, the parameter “study” had significant explanatory power (F=120, p<0.001) in determining the number of records in the ANOVA model described, which also contained the other selected attributes (results for each are discussed separately below). All pairwise study comparisons were significant at p<0.001, except for NHA:NHW (p=0.082).

Table 3 presents the distributions of record number for several of the personal attributes and other characteristics, including the method of data collection (see also Table 1 for its relationship with the studies). The diary format has more event records than the recall method, as expected. According to the K–S test, the difference is statistically significant (DN=0.24; χ2=17.2; p<0.001). This parameter (i.e., “method”) was not included in the aforementioned GLM/ANOVA, since it is already accounted for by the variable “study”. A separate ANOVA model containing only the variables “random” and “method” showed that the largest difference in record number attributable to these structural components of “study” was in fact due to whether the method of data collection was real-time diary or recall.

Table 3 Summary statistics for the number of records by selected attributes and factors (daily records >29).

Gender effect

The first step before testing for differences in records between genders was to evaluate the gender structure of CHAD across the different studies. The difference in gender distribution between studies was tested using a contingency-table approach and a χ2-test of the study/gender cell count. The gender breakdown differs significantly by study (χ2=210; p<0.001). While the CHAD gender breakdown is 54.1% ♀/45.9% ♂ for persons with ≥30 records (53.5% ♀/46.5% ♂ considering all data), the split for particular studies varies between 79.3% ♀/20.7% ♂ for the elderly persons in the Baltimore study and 47.7% ♀/52.3% ♂ for the two California recall studies of children and adolescents. The Baltimore result is understandable, but the California split is not. For comparison purposes, the gender split for the entire US from the 2000 Census is 50.9% ♀/49.1% ♂ (US Census, 2002).

In testing for record differences due to gender, females record slightly more events per day than males (Table 3), and this difference is significant according to the ANOVA (F=7.4; p=0.007) and confirmed by an independent K–S test (DN=0.04; χ2=3.06; p<0.001). The interaction of the study parameter and gender was also significant (F=4.2; p<0.001); however, the only gender differences (symbolically: ♀:♂) noted by pairwise comparisons within studies were for NHAPS (NHA and NHW; p=0.004 and p<0.001, respectively). Thus, based on this analysis, gender differences noted for record keeping are mostly due to the differences within this study alone.

Age effects

Age is subdivided into 14 groups. Ten of them are used for children and adolescents <18 years old, and their bounds are taken from a draft EPA guidance document for use in assessing environmental exposures (Risk Forum, 2000). The Forum makes the distinctions because of important developmental, physiological, and/or activity differences among the age groups.

As far as the distribution of persons within the recommended age categories is concerned, CHAD data are quite sparse (see Table 3), particularly in the <1 month (121 days), 1 to <3 months (14 days), and 3 to <6 months (115 days) age categories (each for persons with ≥30 records). There is an obvious problem with sample size in CHAD for cohorts <6 months when it is used to propagate activity pattern data for exposure modeling. In particular, activity data for 1 to <3-month-old children would have to be obtained outside of CHAD to adequately model their patterns. Note that data collected for children less than 1 year old in the CIN and NHAPS studies were classified as age equals “0”; thus, the newborn (“0”) age category indicated here likely mixes individuals of ages other than “0”, up to and including 11 months.

There are obvious differences among the studies in CHAD with respect to their age distributions since some of them were explicitly focused on specific age categories by design (see Table 1). For comparison with the 2000 US Census age breakdowns, we aggregated the CHAD age groups in Table 3 to fit the age categories available in the 2000 US Census and show them as proportions here (see Table 12).

Table 8 Table 12

While there are differences in the age distributions — in particular, CHAD has proportionally more children/adolescents <15 years old than its share of the US population would “warrant” — this is not a problem from an exposure modeling perspective. This is because the proportion of ages included in a model is usually — and always should be — controlled by the Census distribution for the area under analysis. That is the explicit modus operandi used by all time-series exposure models that the authors know of (Lurmann et al., 1990; Johnson et al., 1997; Glen and Shadwick, 1998; Burke et al., 2001; Richmond et al., 2002; Zartarian et al., 2002). The proportional differences in age structure would be a problem if CHAD statistics were used to make inferences to the entire US population, but that is not how CHAD should be used.Footnote 7

On average, CHAD contains between 520 and 700 person-days of activity data for each year of age until 12, including <1-year-old children, and 200 or so per year for the 12 to <18-year-old group. There are between 150 and 250 days per year, on average, for 18–64 year olds, but fewer for the elderly: 100–140 days per year of age for people aged 65–76, but only 50–80 per year of age above 76. It should be noted that CHAD contains monthly age breakdowns only for children ≤2. Age for all other persons is coded as a yearly integer, using whatever convention the original studies used to define age.

As one may anticipate, the GLM/ANOVA procedure confirms that record numbers are influenced by the age of the individual (F=21.8; p<0.001), along with a significant interaction between study and age group (F=8.4; p<0.001). However, within-study age differences are limited to CIN (mostly due to differences in the 21–44 age group vis-a-vis several other groups; all p<0.002), LAE/LAH (both the 6–10 age group and 11–15-year-old youths compared to each other and to the 16–18-year-old group; all p<0.001), and NHW (3–5-year-olds compared to both the 11–15-year-old and 21–45-year-old age categories; all p<0.035).

Ethnicity effects

Based on the contingency-table test, distributions of ethnic data were different among the studies (χ2=3060; p<0.001). The overall breakdown of ethnicity where known in CHAD is 1.9% Asian; 17.3% Black; 5.3% Hispanic; 73.2% White; and 2.2% Other (virtually the same for the ≥30 records subset). The corresponding single-race breakdownFootnote 8 in the 2000 US Census is 3.6% for Asians, 12.3% for Blacks, and 75.1% for Whites. Thus, Asians are under-represented in CHAD while Blacks are overrepresented. Again, this is not a problem since Census Bureau figures are used to allocate populations to the areas being modeled for exposures. With respect to studies in CHAD, the White/Black proportions — as an example — vary from 50.2%/37.5% in the UMC study, which oversampled minorities, to 93.6%/5.4% in the CIN study.

Table 3 next indicates the number of ≥30 event-days available for people with an identified ethnic/“Other” group. Ethnicity was also a significant attribute in explaining differences in record counts (F=9.7; p<0.001). In general, Whites have significantly more records per person, on average, than the other groups (all comparisons; p<0.002), followed by Asians, with little difference noted between the other groups. However, significant within-study ethnicity differences were limited to the CIN (Black:White; p<0.001) and UMC (White:Other, White:Black, and White:Hispanic; all p<0.001) studies.

Day-type effects

Day type stands for the “type” of day for which the activity information was collected. Two kinds of day-type distribution analyses were conducted: one using a simple weekday (WD)/weekend (WE) split, and the other using a “composite” factor. The composite factor combines information on day-of-the-week with data on paid work for that day (working ≥2 h on that day and for individuals ≥16 years of age). A working day is coded “W” after the day-of-the-week symbol, and “NW” for nonworking. Thus, WDW means that the person's activities are for a (paid) working weekday; WENW indicates a nonworking weekend. While weekday/weekend is coded directly in CHAD, the combined day type can be obtained only by reviewing activity information to determine if a person was in a paid work location on that day. We determined in McCurdy and Graham (2003) that the combined definition of day type had more discriminatory power to explain time spent in critical locations of concern from an exposure perspective; so we use it here for the statistical analyses. If day-of-week sampling were truly random, one would expect a 71.4%/28.6% WD/WE proportion (5/7 and 2/7 of days, respectively). The proportions by study in CHAD vary between 51.1%/48.9% for UMC and 79.7%/20.3% in BAL. Weekday/weekend proportion study differences are statistically significant using the contingency-table test (χ2=663; p<0.001).
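The composite day-type coding can be sketched as a small function. The day-of-week labels and argument names are hypothetical, and coding persons under 16 years of age as nonworking is an assumption made for this illustration; the text above states only that the paid-work criterion applies to individuals ≥16 years of age.

```python
WEEKEND_DAYS = {"Sat", "Sun"}  # hypothetical day labels


def day_type(day_of_week, paid_work_hours, age):
    """Return WDW, WDNW, WEW, or WENW for one person-day.

    A day counts as working when the person is >=16 years old and spent
    >=2 h in paid work. Assumption: persons under 16 are coded nonworking.
    """
    prefix = "WE" if day_of_week in WEEKEND_DAYS else "WD"
    worked = age >= 16 and paid_work_hours >= 2
    return prefix + ("W" if worked else "NW")


# Expected weekday share if days of the week were sampled uniformly.
expected_weekday_pct = round(5 / 7 * 100, 1)
print(day_type("Mon", 8, 35), day_type("Sat", 0, 35), expected_weekday_pct)
```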

Statistical analyses were performed on the composite day-type variable discussed above, rather than on the simple weekend/weekday classification. The main effect of day type on number of records per day was significant (F=14.4; p<0.001), as was the interaction of “study” and day type (F=5.7; p<0.001). Least-squares means were significantly different for all possible day-type combinations except for the comparison of weekday nonworking and weekend nonworking (this pair, p=0.607; all others, p<0.160). However, within-study day-type differences were limited to CAC/CAY (WDNW:WENW; p=0.002), NHA (WDNW:WEW; p<0.001), and NHW (WDW:WDNW; p<0.001).

Neither of the results described above affects how exposure models use either day-type datum, since models sample from discrete activity pattern “bins” that are keyed to the day-of-the-week (or combined characteristic) being modeled; skewed proportions therefore do not affect an exposure outcome. The correct activity set will be chosen for each day-type metric used in a model, but the available sample size for randomly choosing an activity day will not be proportional to individual study sample sizes. It is when workdays are not distinguished that nonsensical simulated individuals could result; for example, when only WD/WE is considered in developing a longitudinal diary, an individual could end up with full-time paid work on both WD and WE.
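The bin-keyed sampling described above can be sketched as follows. This is a simplified illustration and does not reproduce any particular exposure model; the diary fields (`diary_id`, `cohort`, `day_type`) and the cohort label are hypothetical.

```python
import random
from collections import defaultdict


def build_pools(diaries):
    """Group diary identifiers into bins keyed by (cohort, day type)."""
    pools = defaultdict(list)
    for d in diaries:
        pools[(d["cohort"], d["day_type"])].append(d["diary_id"])
    return dict(pools)


def simulate_sequence(pools, cohort, day_types, rng):
    """Draw one diary per simulated day from the matching bin."""
    sequence = []
    for dt in day_types:
        pool = pools.get((cohort, dt))
        if not pool:
            raise LookupError(f"no diaries for cohort={cohort!r}, day type={dt!r}")
        sequence.append(rng.choice(pool))
    return sequence


# Toy diaries, not CHAD records.
toy = [
    {"diary_id": "d1", "cohort": "F35-44", "day_type": "WDW"},
    {"diary_id": "d2", "cohort": "F35-44", "day_type": "WDW"},
    {"diary_id": "d3", "cohort": "F35-44", "day_type": "WENW"},
]
pools = build_pools(toy)
print(simulate_sequence(pools, "F35-44", ["WDW", "WDW", "WENW"], random.Random(0)))
```

Because each draw is restricted to the bin for the day type being modeled, skewed weekday/weekend proportions in the source studies change only how many diaries are available in each bin, not which bin is used.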

Weather effects

Weather in Table 3 is another combined variable deemed important by McCurdy and Graham (2003) in explaining differences made in locational decisions by a particular cohort. The variable combines monthly identifiers with daily maximum temperature (DMT) information, both available for 14,609 person-days of data with ≥30 events (63.6% of all days). Cold (C) days have a DMT <55°F in October–June; not cold (NC) days have a DMT ≥55°F during the same months; not hot (NH) days have a DMT <84°F in July–September; and hot (H) days have a DMT ≥84°F for the same months. This temperature-by-season classification is used by OAQPS in its exposure modeling work, and is based upon analyses of weather effects on human locational decisions undertaken by Johnson et al. (1992, 1995). As can be seen, there are relatively fewer hot days than any of the others, and NC days dominate. This distribution is a direct function of where and when the original studies were undertaken.
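The four weather classes defined above reduce to a simple rule on month and DMT. The following sketch is illustrative only; the function name and the integer month encoding (1=January through 12=December) are assumptions introduced here.

```python
def weather_class(month, dmt_f):
    """Classify a day as C, NC, NH, or H.

    month: calendar month as an integer, 1-12 (assumed encoding)
    dmt_f: daily maximum temperature in degrees Fahrenheit
    """
    if month in (7, 8, 9):               # July-September
        return "H" if dmt_f >= 84 else "NH"
    return "C" if dmt_f < 55 else "NC"   # October-June
```

Under this rule a 90°F day in June is still classed as NC, because June falls in the October–June group.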

The distribution of weather classes was analyzed to determine if there are differences among the studies, as is expected, since many were restricted to specific seasons of the year in specific areas (BAL, CIN, DEN, LAE, LAH, VAL, and WAS). The contingency-table test indicates this to be true (χ2=1100; p<0.001). As before, this problem only affects a study's relative sample size availability for activity pattern pooling purposes, and not how an exposure model uses the information.

As for record numbers, the main effect of weather on the number of records was significant (F=2.7; p=0.044), with C significantly different from NH and H days (both p<0.001) and NC significantly different from NH and H (both p<0.048). Although the weather-within-study interaction was significant (F=2.7; p<0.001), only two significant within-study differences were noted in the least-squared means: LAE/LAH (NC:NH; p=0.023) and NHA (C:NH; p=0.025).

Precipitation effects

The last characteristic listed in Table 3 is precipitation, divided into three classes: none, trace (>0 to <0.5 in), and ≥0.5 in. There is a statistically significant study difference according to the contingency-table test for the distribution of the three precipitation classes (χ2=560; p<0.001). The California studies (CAA, CAC/CAY, LAE/LAH) and DEN contained considerably more days without rain (about 90% of all days) when compared with the other studies containing precipitation information (about 75% of days where no rain was recorded).
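The contingency-table comparisons reported in this section can be illustrated with a small sketch, assuming the test is a Pearson chi-square on a study-by-class table of person-day counts. The function name is hypothetical and the counts are invented for illustration; they are not CHAD values.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    table: list of rows, e.g. one row per study group and one column per class.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if class proportions were identical across rows.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Invented counts: columns are none, trace, and >=0.5 in precipitation.
example_counts = [
    [900, 80, 20],   # hypothetical group with about 90% dry days
    [750, 180, 70],  # hypothetical group with about 75% dry days
]
statistic, dof = pearson_chi_square(example_counts)
```

With samples as large as those pooled in CHAD, even modest differences in class proportions produce a large statistic, which is consistent with every distributional comparison here being significant.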

Precipitation considered as a main effect does not significantly contribute to the variation observed in the number of recorded events per day for each class in the GLM/ANOVA (F=2.3; p=0.102) and none of the pairwise comparisons were significantly different. However, within-study differences for precipitation were significant (F=2.4; p=0.012), but least-squared means were significantly different only for the CIN study (Trace: ≥0.5 in; p=0.001).

It should be noted that another precipitation metric is available in CHAD: hours of precipitation experienced for the activity pattern day. This duration metric is closely related to the precipitation amount that was analyzed (as classes); the two are linearly related (R2=0.50; p<0.001). We therefore did no further analysis of this other precipitation indicator, as it would give essentially the same result.

Other attribute effects

Although not listed in Table 3, we also evaluated whether the distribution of educational levels (see Table A1) and housing types (see Table A2) varied significantly by study. Each of these attributes was significant based upon the contingency-table results (χ2's=47–190; p<0.001). Education level was significant only in explaining the number of records as a main effect, while housing type was not significant as either a main or an interaction effect. Neither of these variables added enough explanatory power to warrant inclusion in any further evaluations of the number of records.

In summarizing this subsection of the paper, every attribute and characteristic tested varied significantly in its relative distribution in CHAD studies according to a contingency-table approach (some of which was not entirely unexpected due to the study design), and some attributes contained significant study-related differences in explaining record numbers according to the GLM/ANOVA procedure. That approach included as independent variables all of the attributes given to determine if they contribute significantly in “explaining” total and model variance in the dependent variable (number of events recorded per day) vis-a-vis the entire set of attributes/characteristics. While the resulting predictive models were not strong (the two separate models explained only around 35–40% of the variability in events recorded), each was statistically significant (F=72–95; both p<0.001). In addition, all of the attributes/characteristics listed in Table 3 contribute significantly to the model (F-tests vary between 2.4 and 105; all p<0.050) except precipitation. The GLM/ANOVA results remain essentially the same if precipitation is removed from the model and reanalyzed (not shown).

By far the largest contributing factor determined by the ANOVA is the main effect of study itself (F=122), followed by age group (F=22) and day type (F=14). The remaining attributes/characteristics and their interaction within study follow closely behind. The main conclusion to be drawn from the analyses presented so far is that there are distinct differences regarding the number of events recorded per day among the studies themselves for gender, age, ethnicity, day type, and weather classifications; but, in general, the differences within studies remain limited, based on the least-squared mean comparisons. This means that following study aggregation, and when considering only main effects, significance can be assigned where significance may not have existed previously for a given category either based on the increased sample size or perhaps from the effect of an “influential” study (that may contain either “true” effects or “biased” effects). It should be noted that these issues also extend beyond the record number parameter and may in fact play a role in other CHAD parameters when considered collectively.

In every case, differences in the relative distribution of attributes across studies indicate how much data are available from any one study in a “pooled” sampling bin to be used as input into a time-series exposure model. The exposure modeler must be aware that when a particular cohort is established, the data for it may come preponderantly from only one or two studies. If those studies do not reasonably match the population distribution for an area being analyzed in the important attributes/characteristics of interest, then the cohorts developed may not be appropriate for the task. Exposure modelers using CHAD — or any other activity database for that matter — should investigate this “input” issue before undertaking a major modeling effort.
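One way to investigate this “input” issue is to tabulate, for each proposed cohort, the share of pooled diaries that each study contributes. The sketch below is a minimal illustration; the record layout, field names, attribute labels, and example person-days are all hypothetical, and only the study abbreviations come from the text.

```python
from collections import Counter

def study_shares(diaries, cohort):
    """Fraction of a cohort's pooled diaries contributed by each study.

    diaries: iterable of dicts, each with a "study" key plus attribute keys
    cohort:  dict mapping attribute name to the required value
    """
    studies = [
        d["study"]
        for d in diaries
        if all(d.get(key) == value for key, value in cohort.items())
    ]
    counts = Counter(studies)
    total = sum(counts.values())
    return {study: n / total for study, n in counts.items()} if total else {}

# Invented person-days for illustration only.
diaries = [
    {"study": "BAL", "age_group": "65+", "gender": "F"},
    {"study": "BAL", "age_group": "65+", "gender": "F"},
    {"study": "BAL", "age_group": "65+", "gender": "F"},
    {"study": "CIN", "age_group": "65+", "gender": "F"},
    {"study": "CIN", "age_group": "21-44", "gender": "F"},
]
shares = study_shares(diaries, {"age_group": "65+", "gender": "F"})
```

A cohort whose shares are dominated by one or two studies is a candidate for the closer scrutiny recommended above.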

Analysis of the time spent in aggregate microenvironments by attribute/characteristic

While CHAD contains information on 144 activity codes and 115 location codes, no one study used them all. Thus, to compare studies on the choices people make over time, a “common denominator” of major activities and locations has to be first established. As mentioned, we focus on locational decisions rather than on activity decisions in this paper. Typically five or six aggregate locations are analyzed or modeled: the home, workplace, bar/restaurants, all other indoor places, motor vehicles, and outdoors (e.g., Klepeis et al., 2001). On occasion, time spent in gas stations is included as a separate location, but most often it is put into the outdoor category. Based on our previous work (McCurdy and Graham, 2003), we focus on only three locations: outdoors, indoors, and motor vehicles. Most of our analyses are done on the outdoor location because of its relatively high level of inter- and intraindividual variability. Descriptive statistics for indoor and motor vehicle time for selected attributes are presented in the next section, but these data are not as thoroughly analyzed due to the zero-sum nature of the time accounting (total time=indoor time+motor vehicle time+outdoor time). Doing so would be statistically redundant (i.e., there are significant negative correlations, p<0.001, among all time pairings).

Time spent outdoors

Table 4 presents descriptive data on outdoor time organized by the same attributes and characteristics used before. It only includes data for people who spend >0 time outdoors; they are known as habitués (and also only those individuals with ≥30 records). The last column of the table shows the proportion of the attribute class who is an outdoor habitué — or participant — on the activity pattern day(s) coded. In general, habitués appear to be about 60–70% of each attribute class. The other entries in Table 4 are the same as described earlier.

Table 4 Daily time spent outdoors for selected attributes and factors (habitués only, daily records >29).

A GLM/ANOVA model similar to that above for determining influential attributes on number of records was constructed for the time spent outdoors using age group, gender, ethnicity, day-type, weather, and precipitation as independent variables. Time spent outdoors did not fit a normal or lognormal distribution well. Since time spent outdoors fit a lognormal distribution better (DN=0.05 for lognormal as opposed to DN=0.18 for normal), the natural logarithm of outdoor time was used in the statistical analysis.
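The distribution check described above can be sketched as follows, assuming DN denotes a Kolmogorov–Smirnov distance between the empirical distribution and a normal distribution fitted by the sample mean and standard deviation. The synthetic data, the seed, and the function name are hypothetical and are not drawn from CHAD.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to values."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Compare against the empirical CDF just before and just after x.
        distance = max(distance, cdf - i / n, (i + 1) / n - cdf)
    return distance

random.seed(1)
# Synthetic, right-skewed "minutes outdoors" values for illustration.
outdoor_minutes = [random.lognormvariate(4.0, 1.2) for _ in range(2000)]

d_normal = ks_distance_to_normal(outdoor_minutes)
d_lognormal = ks_distance_to_normal([math.log(m) for m in outdoor_minutes])
```

A smaller distance on the log scale is the pattern that motivates analyzing the natural logarithm of outdoor time. Because the distribution parameters are estimated from the same data, standard K–S critical values do not strictly apply, so the distances are best read as a relative comparison.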

“Study” was not used as an independent variable in the model since, considering the distribution of several parameters discussed above, differences among studies would automatically be statistically significant due to the unequal distribution of attributes that may affect time spent outdoors. For example, based on the age distribution, the BAL study would certainly stand out from the others due to the fact that all are within one age category (>64 years) that contains some of the lowest time spent outdoors. Only the main effects of the attributes were evaluated since “study” was not included in the model. Each of the personal attributes and weather factors were determined to be significant in explaining variation in time spent outdoors (all p<0.001). In general, the most important variables in the model in explaining time spent outdoors are gender, weather, day type, and precipitation. Further, more detailed analyses were conducted on each of the attributes individually using a K–S test as follows.

Overall, there is a gender difference in the time spent outdoors using the K–S test (DN=0.13; χ2=7.79; p<0.001). There is also a significant difference when comparing most age groups to one another, but not all (see Note 9). The time-spent-outdoors distributions were indistinguishable for the following contiguous age cohorts using the K–S test at the p<0.050 significance level:

  1. All possible pairs of babies: <1 month, 1–2 months, 3–5 months, 6–11 months, and 1 year (DN all >0.08; χ2<1.10; p>0.177).

  2. 16–17 years versus 18–20 years (DN=0.06; χ2=0.84; p>0.482) (see Note 10).

  3. 18–20 years with all older age cohorts (DN all >0.02; χ2<0.73; p>0.655) and children >1 year (DN all >0.13; χ2<1.25; p>0.087).

  4. 21–44 years with 45–64 years (DN=0.03; χ2=1.17; p=0.129) (see Note 11).

  5. 45–64 years with 65+ (DN=0.04; χ2=1.07; p=0.202).

In addition, babies <6 months were generally indistinguishable from cohorts >2 years, among a few others. Given these results, we analyzed whether there were gender differences by age cohort. Descriptive statistics and K–S test results are shown in Table 5 for this analysis. As seen, there were consistently no differences in the distributions of time spent outdoors by gender until the 6–10-year-old age category is reached, and following this age, significant differences are the rule. This finding for children is consistent with the only exercise physiology article that we could find that presented data on the location and intensity of physical activity, instead of focusing exclusively on activity levels (Baranowski et al., 1993; see Note 12).
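The pairwise cohort comparisons above rest on the distance between two empirical distributions. A minimal sketch of that two-sample K–S distance follows, assuming DN is the maximum absolute difference between the two empirical cumulative distribution functions; the function name is hypothetical, and the calculation of the associated p-value is omitted.

```python
from bisect import bisect_right

def ks_two_sample_distance(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The gap can only change at observed values, so check each one.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance
```

The function would be applied to the outdoor-time values of two cohorts, for example adjacent age groups, with a small distance indicating distributions that are hard to tell apart.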

Table 5 Comparison of daily time spent outdoors considering gender and age cohort (habitués only, daily records >29).

We then analyzed whether there were differences in the time spent outdoors by age cohort when gender is controlled (agegender). As expected, given the preceding two results, there are statistically significant differences for contiguous agegender groups (and for most of the possible pairs of these groups). The four baby age groups listed above in #1 are indistinguishable for both females and males independently (DN all <0.06; χ2 all <1.14; p>0.150). Time spent outdoors is also indistinguishable for all possible cohort pairings of female children <12 months with all ages ≥2 years, except for both 0–1 and 6–11 months versus 3–5 years and 6–10 years (DN all >0.25; χ2 all >1.53; p<0.019). For females, outdoor time can be distinguished for all possible cohort pairs between the ages of 1 and 15 years except for 2 versus 3–5, 6–10, and 11–15 years (DN all <0.11; χ2 all <1.34; p>0.056) and 3–5 versus 6–11 years (DN=0.03; χ2=0.54; p=0.929). None of the other older cohort pairs from age 16 and beyond are distinguishable except for one: 21–45 versus 65+ years (DN=0.07; χ2=1.41; p=0.038).

For males, time spent outdoors is indistinguishable for all possible cohort pairings of children <6 months with all ages ≥2 years, except for both 0–1 and 6–11 months versus 3–5 and 6–10 years (DN all >0.28; χ2 all >1.40; p<0.041) and for the newborns versus 2–3 and 3–6 years (DN all >0.25; χ2 all >1.38; p<0.045). Significant age-related differences in time spent outdoors occur earlier for males than for females; outdoor time can be distinguished for all possible cohort pairs of males between the ages of 6 months and 15 years, except for 2 versus 3–5 and 16–17 years (DN all <0.10; χ2 all <1.17; p>0.130), 3–5 versus 11–15 and 16–17 years (DN all <0.09; χ2 all <1.22; p>0.104), and 11–15 versus 16–17 years (DN=0.11; χ2=1.30; p=0.070). Just as with the females, none of the other older cohort pairs from age 16 and beyond are distinguishable except for one: 21–45 versus 65+ years (DN=0.10; χ2=1.81; p=0.002).

Continuing with K–S tests of the attribute class, the distributions of time spent outdoors were significantly different for all possible ethnic pairs except Black-Hispanic (DN=0.04; χ2=0.86; p=0.450), Black-Other (DN=0.08; χ2=1.19; p=0.118), and Hispanic-Other (DN=0.10; χ2=1.27; p=0.081). For day type, the K–S test indicates that all possible pairs have significantly different distributions, as is hinted at by the descriptive statistics presented in the table. On average, people spend more time outdoors on weekends and less on weekdays, whether they work or not, and this has been generally noted since the early time budget studies on human activities (Chapin, 1974). Of the six possible weather pairs, the K–S test indicates that all have significant impacts on outdoor time (DN all >0.05; χ2 all >1.62; all p<0.011). While Chapin (1974) and others have long noted a geographical and seasonal effect on the allocation of “discretionary” time, the precise metrics used here are unique and impossible to put into direct perspective. Tsang and Klepeis (1996) certainly showed a seasonal effect in the time spent outdoors, and their pattern of mean time outdoors follows ours if Table 4 can be read as winter, spring, fall, and summer (although our weather classifications were not derived on a calendar-year basis). Finally, there was a significant difference in the outdoor time distribution for the none:trace and none:≥0.5 in precipitation pairs using the K–S test, but there was not for the trace:≥0.5 in pair (DN=0.06; χ2=1.18; p=0.123), indicating that time spent outdoors is affected by both trace and significant rain-/snow-fall. To our knowledge, this phenomenon has only been tested in our previous paper (McCurdy and Graham, 2003), as precipitation metrics generally are not available or not reported, with only the none:≥0.5 in pairing significantly different.

CHAD contains a distribution of METS (see Note 13) values for each activity contained in the database. See McCurdy (2000) for a full explanation of how METS distributions are obtained and used in an exposure model. Normally, the modeler would sample from this distribution to assign an event-specific value for each activity as the simulated person goes through their day. We developed an “average” daily index of physical activity level (PAI) for each individual activity-day in CHAD by summing the median time-weighted METS value from each activity distribution and dividing by 1440, the number of minutes in a day. The resulting values generally vary from 1.5 to 2.25 per individual-day, with active people having a higher PAI. PAI was found in McCurdy and Graham (2003) to be a statistically significant predictor of time spent outdoors in the cohort analyzed.
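The PAI calculation described above amounts to a time-weighted average of median METS values over the 1440 minutes of a diary day. The sketch below assumes each diary event is available as a (duration in minutes, median METS) pair; the function name and the example METS values are hypothetical and are not taken from CHAD.

```python
MINUTES_PER_DAY = 1440

def physical_activity_index(events):
    """Time-weighted mean of median METS over one diary day.

    events: list of (duration_minutes, median_mets) pairs covering the full day
    """
    total_minutes = sum(minutes for minutes, _ in events)
    if total_minutes != MINUTES_PER_DAY:
        raise ValueError("diary events must sum to 1440 minutes")
    weighted_mets = sum(minutes * mets for minutes, mets in events)
    return weighted_mets / MINUTES_PER_DAY

# Invented diary day: sleep, sedentary time, walking, and active play.
example_day = [(480, 0.9), (780, 1.5), (120, 3.0), (60, 5.0)]
pai = physical_activity_index(example_day)
```

Requiring the events to cover the whole day keeps the index comparable across diaries, since a partial day would otherwise be divided by more minutes than it contains.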

Using multiple linear regression analyses, the most robust equation for time spent outdoors by habitués follows (Ra2=0.19; SEEst=1.22), with intercept and each individual regression coefficient significant (p<0.001):

ln(OUT) = 0.91 + 0.02 × T − 0.18 × PR + 0.19 × D + 0.93 × PA + 0.11 × A − 0.42 × G + 0.08 × E

where ln(OUT) is the natural logarithm of the time spent outdoors (min/day), T the daily maximum temperature (°F), PR the precipitation (0=none; 1=trace to <0.5 in; 2=≥0.5 in), D the day type (1=WDW; 2=WEW; 3=WDNW; 4=WENW), PA the physical activity index (unitless, median), A the age group (1=≥72 years; 2=16–54 years; 3=55–71 years; 4=≤2 years; 5=11–15 years; 6=3–10 years), G gender (1=♂ 2=♀), and E ethnicity (1=Asian; 2=Hispanic; 3=Black; 4=White).
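As a worked illustration, the regression equation and the variable codings given above can be evaluated for a single person-day. The function name and the example inputs are hypothetical. Given the low Ra2, the result is a rough central tendency on the log scale rather than a usable individual prediction, and exponentiating it gives a value on the minutes scale that is not the arithmetic mean.

```python
import math

def predicted_ln_outdoor(T, PR, D, PA, A, G, E):
    """Evaluate the reported regression for ln(minutes outdoors per day).

    Arguments use the codings defined in the text: T in degrees F,
    PR 0-2, D 1-4, PA unitless, A 1-6, G 1-2, E 1-4.
    """
    return (0.91 + 0.02 * T - 0.18 * PR + 0.19 * D
            + 0.93 * PA + 0.11 * A - 0.42 * G + 0.08 * E)

# Invented example: 80 F, no precipitation, working weekday, PAI of 1.8,
# age group 2, gender code 1, ethnicity code 4.
ln_out = predicted_ln_outdoor(T=80, PR=0, D=1, PA=1.8, A=2, G=1, E=4)
minutes_outdoors = math.exp(ln_out)
```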

Based upon the partial sums of squares, the following order is determined for most important to least important in explaining time spent outdoors: PAI, temperature, gender, day type, age, precipitation, ethnicity. The regression results presented here are similar to coefficients determined for a narrow cross-section of the CHAD database (McCurdy and Graham, 2003), although results cannot be directly compared due to the different variables used (previously age, gender, and ethnicity were controlled for) and certain levels not used (day type included only WDW and “other” in the earlier work). As we found earlier (see McCurdy and Graham, 2003), dropping PAI from the initial equation inflates the constant term (from 0.91 to 2.53) and lowers the Ra2 (0.14), which is not desired. Thus, PAI is a very important determinant of outdoor time. A similar model was constructed using the weather variable (i.e., 1=C, 2=NC, 3=NH, 4=H) instead of temperature as a continuous variable, yielding nearly identical model results (Ra2=0.18) and only slightly different coefficients (1.54, 0.12, −0.43, 0.27, 0.92, 0.19, −0.20, 0.07 for the intercept, age group, gender, weather, PAI, day type, precipitation, and ethnicity, respectively). Furthermore, the weather variable was investigated in this model for optimization (yielding the highest Ra2), and the following was determined: altering the C/NC cutoff to 59°F (rather than 54°F) and including June in the months July–September did increase the Ra2 to 0.19.

Time spent indoors

Turning to Table 6, the most time per day is spent indoors for all ages/both genders/all ethnic groups/etc. (Johnson, 1987; Schwab et al., 1990, 1992; Wiley et al., 1991; Klepeis et al., 1996, 2001; Tsang and Klepeis, 1996, 1997). On average, total indoor time approaches 21–22 h/day. As people generally sleep and undertake other personal care activities (e.g., eat, bathe, groom) indoors mostly at home, there is not much variability among population subgroups in this regard. The COV for indoor time therefore is relatively low for both individuals and cohorts: around 10–15% on a cohort basis.

Table 6 Daily time spent indoors for selected attributes and factors (habitués only, daily records >29).

However, we still found that all of the attributes described above were statistically significant explanatory parameters (all p<0.001), regarding total daily time spent indoors using the GLM/ANOVA procedure, although the overall model explained only half the variance of that constructed for the outdoor time. Again, only the main effects of the attributes (i.e., age, gender, ethnicity, day type, weather, and precipitation) were assessed, with gender, weather and day type as the most significant attributes in the model.

Time spent in motor vehicles

With respect to time spent in motor vehicles (Table 7), the median values are all <1.3 h/day, but the COVs for all attributes are quite high: about 90–100%. The means generally are 30–40% higher than the same-class medians, indicating a lognormal, or other long-tailed, distribution. We found statistically significant effects using the GLM/ANOVA procedure for each of the parameters used except for precipitation (p=0.347), as expected and consistent with the analysis by McCurdy and Graham (2003), although the weather factor was also found not to be significant in their analysis of a longitudinal subject using the K–S test. Employing this test on all possible pairwise comparisons here did not yield any significant differences among the weather groupings (all p>0.100).

Table 7 Daily time spent in motor vehicles for selected attributes and factors (habitués only, daily records >29).

Summary and conclusions

The main focus of this paper is to analyze key attributes of an activity factors typology to determine which ones are important when developing exposure cohorts or when simulating individuals (using “cohort-type” characteristics) to be used in an event-oriented exposure model. Attributes are needed to relate cross-sectional data to an inherently longitudinal problem, and to appropriately represent both intra- and interindividual variability within each developed cohort or simulated individual. Most of EPA's recent exposure models are of this type, as are numerous others used in regulatory decision-making.

As discussed, all analyses were conducted on human activity data containing ≥30 events recorded per day. We consider this to be a minimally acceptable record length, as it is only 25% more than the smallest possible. It should be noted that including person-days of activity having <30 records does not markedly affect any of the results presented herein (analyses not reported). Given this, it is probable that using all of the activity data in CHAD will not alter exposure estimates significantly; however, we recommend that these criteria be utilized to screen for potentially “complacent” diaries, where averaging of time spent in locations/performing activities is likely more prevalent.

On the other hand, the fact that studies comprising the CHAD database are quite different in many important attributes — age, gender, season of the year, method of gathering data, and so on — indicates that studies will be differentially represented in any model whatever cohort structure is decided upon (including the one that selects for simply ≥30 records). This limitation of CHAD will always be present and will not be reduced until additional suitable data are generated and incorporated into the database. It can be an additional modeling flaw if the CHAD database is used as it is to make inferences concerning the entire population; this will not be an issue if US Census data are used to control the sample size and makeup of the attributes chosen. However, if the attributes are chosen on a highly detailed basis, say using ethnicity, education, and work status along with age and gender, the modeler will quickly run into a “small sample size” problem associated with the number of diaries available for a cohort or simulated individual. The information presented in Tables A1, A2, A3 and A4 indicates precisely how many records will result for individual and cumulative possible attributes; as noted, only a handful of records exist for some detailed cohorts. This is not to say that using all of CHAD data weighted by the US Census for a given population will be truly representative of that population's activities, but that selecting for several sample size-limiting attributes will only lead to the modeling of these particular individuals within CHAD.

Based mostly on physiology and metabolism considerations (McCurdy, 2000), we recommend that exposure modelers use age and gender as “first-order” attributes to define cohorts or when simulating individuals. After that, our analyses of the time spent outdoors (an important location for receiving high airborne exposures) indicate that PAI level, daily maximum temperature (or weather classification), and day type (either weekend/weekday dichotomy or, better, a combination of day-of-the-week and working status) are important attributes in the order presented. Precipitation and ethnicity, while statistically significant for outdoor time, are less important attributes (and precipitation does not affect time spent in motor vehicles). The same attributes are important for explaining variability in time spent indoors.

These results must be tempered by the fact that none of the regression or GLM equations developed here explain large amounts of the variance associated with any of the dependent locational variables tested. There are a number of reasons for that. One is that both techniques assume a linear relationship between the set of independent variables and the dependent variable. Time spent outdoors (or indoors) is not linear with age, nor is the amount of relative variability consistent across all age levels. The aged spend the least amount of time outdoors, while children and adolescents spend the most (but not babies). Another reason is that none of the variables available to us capture the most important attributes of why people spend time the way they do: lifestyle and life stages considerations. Lifestyle usually refers to a person's use of leisure time (Henderson et al., 1996), while life stages are mainly associated with one's role within the family as it changes over time (Altergott and McCreedy, 1993). Life stages are often called life cycle changes (Zemel et al., 1996). Active people allocate their time differently than sedentary persons; people with children spend time differently than those without, even of the same age and gender. None of the studies in CHAD provide data on these attributes; thus they could not be tested. Age, gender, and PAI indirectly capture aspects of lifestyle and life stages considerations, but obviously not enough to explain differences in how people allocated time to different (and general) locations. Furthermore, the day-type parameter developed here could be considered somewhat incomplete. There are several other conditions that would dictate human activities that were not considered in the day-type analyses due to lack of information, such as illness (not exclusive to one's own health status, whether temporary or chronic condition), holidays and vacations, business travel, etc.

These shortcomings in human activity databases useful for exposure modeling can only be overcome by developing new studies that ask better questions about lifestyle and life stage. We hope that planners of new studies that gather human activity pattern information will do a better job of understanding the lifestyle and life cycle context of the people chosen to participate in their surveys and studies. At this point, however, CHAD remains as the most comprehensive tool for cohort development and simulating individuals within exposure modeling (and other uses) with essentially no other alternative, and considering the results presented here will strengthen its current and future utilization.

Disclaimer

This work has been funded wholly by the United States Environmental Protection Agency. It has been subjected to Agency review and approved for publication. Mention of trade names or commercial products does not constitute an endorsement or recommendation for use.

Notes

  1. 1.

    These models include, among others, pNEM (Johnson, 1995; McCurdy, 1995); HAPEM (Glen and Shadwick, 1998); SHEDS (Burke et al., 2001; Zartarian et al., 2002); and TRIM.Expo (APEX 3.0) (Richmond et al., 2002).

  2. 2.

    The act of simulating individuals by assembling diaries based on selected attributes is a similar methodology to creating/extracting cohorts since one uses typical “cohort-type” characteristics (such as age, gender, etc.). “Cohort” in this paper refers to the result of the grouping process.

  3. 3.

    Six studies have more than 1 day of activity pattern data per person. Rather than to continually use the phrase “person-days” of data when discussing specific attributes, we use days, people, or subjects. The basis for all of the data is a person-day, and about 23% of the data available for an attribute may include more than 1 day of data from a single individual.

  4. 4.

    The most recent internet version of CHAD has 1.1% fewer days (n=22,716) due to the removal of diaries that either had missing age/gender information or had “excessive” missing location and activity data. However, the downloadable Microsoft® Access version of CHAD, which many exposure models use as an input file, still contains the original CHAD data set. The small reduction in person-days of information that will be present in CHAD/Access after 2004 will not materially affect any of the results discussed here.

  5. 5.

    None of the national studies included non-English-speaking citizens or residents of Alaska and Hawaii, or Americans living abroad.

  6. 6.

    Standard Industrial Classification (SIC) codes are used to categorize all businesses and industries into classes. The two-digit codes are general in nature, such as “mining/forestry/fishing” and “construction”. The system was developed by the US Bureau of the Budget and is used by all branches of the US government for job and economic classification purposes.

  7. 7.

    Only the NHAPS studies in CHAD comes close to being able to provide inferential activity pattern information for the US population as a whole; however, they do not apply to Alaska and Hawaii residents or people who cannot speak English; see Klepeis et al. (2001).

  8. 8.

    The Census Bureau now allows people to say they have more than one race, an option not allowed by studies comprising CHAD. Hispanics and Latinos may or may not be coded in the US Census as one race; so its data are not comparable with CHAD’s, which makes CHAD’s “Other” classification problematic from a comparative viewpoint also.

  9. 9.

    Using the actual age boundaries used in Table 4 made for awkward-looking ranges when typed in text. Therefore, all upper age ranges were rounded off to the nearest age included in the range. For example, for purposes of this subsection, 16–17 means 16 to <18-year olds, and 1-year olds are 1 to <2 years.

  10. 10.

    And all older age cohorts except for 21–44 years, which probably is a statistical artifact.

  11. 11.

    But not the 65–100 years cohort!; another probable artifact.

  12. 12.

    It is interesting that human activity pattern analysts emphasize location and not levels, while the opposite is true for exercise epidemiologists and physiologists. The latter disciplines do state that most active play occurs outdoors in both genders (e.g., Sallis et al., 1990; Baranowski et al., 1993) and that ♂ children 3 years or older (and adolescents) are more active than ♀ children and adolescents of the same age (e.g., Armstrong et al., 1990, 1991; Sallis et al., 1996; Beunen and Thomis, 1999; Bradley et al., 2000). In general, they also find ethnic and socioeconomic “effects” in active physical activity (e.g., Sallis et al., 1996), similar to our attributes classes.

  13.

    Metabolic equivalents of work are sometimes called metabolic equivalents of tasks. The metric is a unitless ratio of the energy expended (EE) performing an individual task to the person's basal metabolic rate (BMR). It better accounts for differences seen in EE by age and gender, since both parameters go into the BMR denominator.
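
    The ratio described in note 13 can be illustrated with a minimal Python sketch. The function name `met_ratio` and the numerical values are hypothetical and chosen only for illustration; they are not taken from CHAD. EE and BMR are assumed to be expressed in the same units so that the ratio is unitless.

```python
def met_ratio(energy_expended: float, basal_metabolic_rate: float) -> float:
    """Return the unitless ratio EE / BMR.

    Both arguments must be in the same units (for example kcal/min).
    The function name and signature are illustrative assumptions.
    """
    if basal_metabolic_rate <= 0:
        raise ValueError("basal_metabolic_rate must be positive")
    return energy_expended / basal_metabolic_rate


# Hypothetical values: a task costing 4.5 kcal/min for a person
# whose BMR is 1.5 kcal/min gives a ratio of 3.0.
example_ratio = met_ratio(4.5, 1.5)
```

    Because the person's own BMR is the denominator, two people with different BMRs who perform the same task at the same relative intensity would receive the same ratio even though their absolute EE differs.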

References

  1. Altergott K., and McCreedy C. Gender and family status across the life course: constraints on five types of leisure. Soc Leisure 1993: 16: 151–180.

  2. Armstrong N., Balding J., Gentle P., Williams J., and Kirby B. Peak oxygen uptake and physical activity in 11- to 16-year olds. Pediatr Exer Sci 1990: 2: 349–358.

  3. Armstrong N., Williams J., Balding J., Gentle P., and Kirby B. Cardiopulmonary fitness, physical activity patterns, and selected coronary risk factor variables in 11- to 16-year olds. Pediatr Exer Sci 1991: 3: 219–228.

  4. Baranowski T., Thompson W.O., DuRant R.H., Baranowski J., and Puhl J. Observations on physical activity in physical locations: age, gender, ethnicity, and month effects. Res Q Exer Sports 1993: 64: 127–133.

  5. Beunen G., and Thomis M. Genetic determinants of sports participation and daily physical activity. Int J Obesity 1999: 23(S3): 55–63.

  6. Bradley C.B., McMurray R.G., Harrell J.S., and Deng S. Changes in common activities of 3rd through 10th graders: the CHIC Study. Med Sci Sports Exer 2000: 32: 2071–2078.

  7. Burke J.M., Zufall M.J., and Özkaynak H. A population exposure model for particulate matter: case study results for PM2.5 in Philadelphia, PA. J Expos Anal Environ Epidemiol 2001: 11: 470–489.

  8. Chapin Jr, F.S. Human Activity Patterns in the City. John Wiley & Sons, New York, 1974.

  9. Cody R.P., and Smith J.K. Applied Statistics and the SAS Programming Language. Prentice Hall, Inc., Upper Saddle River, NJ, 1997.

  10. Glen G., and Shadwick D. Final Technical Report on the Analysis of Carbon Monoxide Exposures for Fourteen Cities Using HAPEM-MS3. ManTech Environmental Technology, Research Triangle Park, NC, 1998.

  11. Henderson K.A., Bialeschki M.D., Shaw S.M., and Freysinger V.J. Both Gains and Gaps: Feminist Perspectives on Women's Leisure. Venture Publishing, State College, PA, 1996.

  12. Johnson T. A methodology for estimating carbon monoxide and resulting carboxyhemoglobin levels in Denver, Colorado. In: Starks T.H. (Ed.). Proceedings of the Research Planning Conference on Human Activity Patterns. US Environmental Protection Agency, Las Vegas, 1989, pp. 17-1–11-22 (EPA-600/4-89-004).

  13. Johnson T. Recent advances in the estimation of population exposure to mobile source pollutants. J Expos Anal Environ Epidemiol 1995: 5: 551–571.

  14. Johnson T., Capel J., and McCoy M. An Analysis of Ten Time/Activity Databases: The Effects of Selected Factors on the Apportionment of Time Among Microenvironments and Breathing Rate Categories. IT Corporation, Durham, NC, 1995.

  15. Johnson T., McCoy M., Capel J., Wijnberg L., and Ollison W. A comparison of ten time/activity databases: effects of geographic location, temperature, demographics group, and diary recall method. Paper presented at AWMA International Conference on Tropospheric Ozone Nonattainment and Design Value Issues, October 1992.

  16. Johnson T., Weaver M., and Capel J. Development and Demonstration of the Probabilistic NAAQS Exposure Model for Application to Hazardous Air Pollutants (pNEM/HAP). IT Corporation, Cary, NC, 1997.

  17. Klepeis N.E., Nelson W.C., Ott W.R., Robinson J.P., Tsang A.M., Switzer P., Behar J.V., Hern S.C., and Engelmann W.H. The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants. J Expos Anal Environ Epidemiol 2001: 11: 231–252.

  18. Klepeis N., Tsang A., and Behar J.V. Analysis of the National Human Activity Pattern Survey (NHAPS) Respondents from a Standpoint of Exposure Assessment. National Exposure Research Laboratory, US Environmental Protection Agency, Las Vegas, NV, 1996 (EPA-600-R-96/074).

  19. Lurmann F.W., Winer A.M., and Colomé S.D. Development and application of a new regional human exposure (REHEX) model. In: Total Exposure Assessment Methodology. Air & Waste Management Association, Pittsburgh, 1990, pp. 478–498.

  20. McCurdy T. Estimating human exposure to selected motor vehicle pollutants using the NEM series of models: lessons to be learned. J Expos Anal Environ Epidemiol 1995: 5: 533–550.

  21. McCurdy T. Conceptual basis for multi-route intake dose modeling using an energy expenditure approach. J Expos Anal Environ Epidemiol 2000: 10: 86–97.

  22. McCurdy T., Glen G., Smith L., and Lakkadi Y. The National Exposure Research Laboratory's Consolidated Human Activity Database. J Expos Anal Environ Epidemiol 2000: 10: 566–578.

  23. McCurdy T., and Graham S.E. Using human activity data in exposure models: analysis of discriminating factors. J Expos Anal Environ Epidemiol 2003 (accepted).

  24. Richmond H.M., Palma T., Langstaff J., McCurdy T., Glenn G., and Smith L. Further refinements and testing of APEX (3.0): EPA's population exposure model for criteria and air toxic inhalation exposures. Joint meeting of the Society of Exposure Analysis and International Society of Environmental Epidemiology, Vancouver, Canada, 11–15 August, 2000.

  25. Risk Forum. Draft Guidance on Selecting Appropriate Age Groups for Assessing Childhood Exposures to Environmental Contaminants. US Environmental Protection Agency, Washington, DC, 2000.

  26. Sallis J.F., Hovell M.F., Hofstetter C.R., Elder J.P., Hackley M., Caspersen C.J., and Powell K.E. Distance between homes and exercise facilities related to frequency of exercise among San Diego residents. Public Health Rep 1990: 105: 179–185.

  27. Sallis J.F., Zakarian J.M., Hovell M.F., and Hofstetter C.R. Ethnic, socioeconomic and sex differences in physical activity among adolescents. J Clin Epidemiol 1996: 49: 125–134.

  28. SAS Institute. SAS/STAT Software (Version 8.02). SAS Institute, Cary, NC, 2002.

  29. Schwab M., Colomé S.D., Spengler J.D., Ryan P.B., and Billick I.H. Activity patterns applied to pollutant exposure assessment: data from a personal monitoring study in Los Angeles. Tox Ind Health 1990: 6: 517–532.

  30. Schwab M., McDermott A., and Spengler J.D. Using longitudinal data to understand children's activity patterns in an exposure context: data from the Kanawha County health study. Environ Inter 1992: 18: 173–189.

  31. Tsang A.M., and Klepeis N.E. Descriptive Statistics Tables from a Detailed Analysis of the National Human Activity Pattern Survey (NHAPS) Data. US Environmental Protection Agency, Las Vegas, 1996 (EPA/600/R-96/148).

  32. Tsang A.M., and Klepeis N.E. Three Telephone Surveys of Human Activities in California: The 1992–94 National Human Activity Pattern Survey; The 1987–88 California Activity Pattern Survey of Adults and Teenagers; and the 1989–90 California Activity Pattern Survey of Children. US Environmental Protection Agency, Las Vegas, NV, 1997.

  33. US Census. “Quick Tables” from http://factfinder.census.gov, 2002 (obtained in December 2002).

  34. Wiley J.A., Robinson J.P., Cheng Y.-T., Plazza T., Stork L., and Pladsen K. Study of Children's Activity Patterns. Survey Research Center, University of California, Berkeley, CA, 1991.

  35. Xue J., McCurdy T., Spengler J., and Özkaynak H. Intra- and inter-individual activity considerations in human exposure modeling. J Expos Anal Environ Epidemiol 2003 (submitted).

  36. Zartarian V.G., Xue J., Özkaynak H., Glen G., Stallings C., Smith L., Dang W., Cook N., Aviado D., Mostaghimi S., and Chen J. Technical manual: using SHEDS-Wood for the assessment of children's exposure and dose from treated wood preservatives on playsets and residential decks. Prepared for August 30, 2002 EPA/OPP FIFRA SAP meeting.

  37. Zemel B.S., Ulijaszek S.J., and Leonard W. Energetics, lifestyles, and nutritional adaptation: an introduction. Am J Hum Biol 1996: 8: 141–142.

Acknowledgements

We thank the anonymous reviewers and two of our EPA colleagues, Drs. Valerie Zartarian and Harvey Richmond, for helpful suggestions regarding the paper's content and format. Responding to all of the issues and comments that arose hopefully improved the clarity, presentation, and applicability of this article.

Author information

Corresponding author

Correspondence to Stephen E Graham.

Appendix

A complete summary of the number of person-days of data available in CHAD for all relevant attributes, characteristics, and factors central to the activity typology in addition to other personal indices is provided in Table 1, A2, A3 and A4.

About this article

Cite this article

Graham, S., McCurdy, T. Developing meaningful cohorts for human exposure models. J Expo Sci Environ Epidemiol 14, 23–43 (2004). https://doi.org/10.1038/sj.jea.7500293

Keywords

  • activity data
  • cohorts
  • exposure modeling
  • cross-sectional
  • outdoor time
  • indoor time
  • motor vehicle time
