Main

Recent increases in the prevalence of childhood obesity in the United States and in many other countries (1,2) are a major public health concern. Childhood obesity tracks into adulthood (3,4) and is associated with increased risk of psychological and respiratory conditions and with cardiovascular disease, cancer, diabetes, and other chronic conditions later in life (5,6). There is variation in child obesity rates by race/ethnicity, age, income, and parental education, and the relationship of sociodemographic factors to child obesity is complex. For example, although obesity risk tends to be inversely associated with income and parental education among non-Hispanic white children, these associations are not observed consistently among Mexican American and non-Hispanic black children, aged 2–19 y, and may vary by gender (7). At preschool ages, obesity rates in the United States are highest among American Indians/Alaskan natives and Hispanics (8). Given the limited success of treatments (9) and the health consequences of childhood obesity, prevention in early life is key. Thus, there is a need for timely and reliable data sources for tracking growth in early childhood and identifying modifiable risk factors for childhood obesity.

In the United States, the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) serves children up to 5 y of age and pregnant, breastfeeding, or postpartum women who are classified as having low income (<185% of the federal poverty level) and at nutritional risk (10). WIC serves nearly 50% of all infants and ~25% of all preschool-aged children in the United States (10). Regulations require that children who participate in WIC have their height and weight measured at least every 6 mo. These measurements are used by WIC to assess program eligibility and to screen and intervene with families regarding potential health and nutritional risks such as excess weight.

The longitudinal anthropometric measurements of young children in the WIC program represent a large data resource for monitoring trends in childhood obesity and identifying risk factors. They have been used by the Pediatric Nutrition Surveillance System of the Centers for Disease Control and Prevention (CDC) (http://www.cdc.gov/pednss/what_is/pednss/index.htm) to monitor trends in child overweight, risk of overweight, and other health indicators. WIC data have also been targeted for wider release as part of the Community Health Data Initiative of the U.S. Department of Health and Human Services and Institute of Medicine (http://www.hhs.gov/open/datasets/about.html), whose objective is to provide greater public access to health data. Given their nationwide coverage of an at-risk population and high concentrations of individuals in diverse geographic areas, anthropometric data in WIC administrative data systems also represent a largely untapped, inexpensive resource for epidemiological research on childhood obesity, such as research on the impact of community food and physical activity environments (11,12).

Although the accuracy of height and weight measurements collected by WIC is critical to the success of current and potential future uses of WIC data for monitoring and research purposes, to our knowledge, the validity of these measurements has not been investigated. The objective of this study was to examine the accuracy of height, weight, and body mass index (BMI) measurements derived from WIC administrative data systems for children aged 2–5 y. We sought to determine whether such measurements are sufficiently accurate to enable WIC administrative data systems to join other health data sources as valid and useful sources of information for tracking and investigating childhood obesity.

Results

As shown in Figure 1, caregivers of 382 children who appeared to meet the age criteria were approached, and consent was obtained for 367 (96%) children to participate in the study. Of these, 350 children met the age criteria (1.9–5 y). WIC measurements recorded within 30 d of measurement by research staff were obtained from 329 of these children, and 287 of these children cooperated with research staff to provide valid height and weight measurements (allowed their shoes and outerwear to be removed and provided at least two height and two weight measurements).

Figure 1

Flowchart of participant inclusion. *Refers to children who allowed their shoes and outerwear to be removed and provided at least two height and two weight measurements to research staff. WIC, Special Supplemental Nutrition Program for Women, Infants, and Children.

Table 1 summarizes characteristics of the children participating in the study. Boys and girls were about equally represented. About half of the children had parents who spoke Spanish as a preferred language, and 80% were Latino. These proportions are comparable to those for WIC families in California (13). BMI percentiles covered nearly the entire possible range, from 0.01 to 99.99.

Table 1 Characteristics of children participating in the study (N = 287)

Table 2 provides intraclass correlation coefficients (ICCs) estimating the inter-rater reliability of WIC measurements in the sample overall and for strata defined by sex, race/ethnicity, preferred language (of parent), WIC site, and source of WIC measurements (WIC clinic or health-care provider record), as well as by child height, weight, BMI, and BMI percentile. Height, weight, BMI measurements, and percentiles were all highly reliable in the overall sample, with ICCs ranging from 0.90 to 0.99. In most strata, height measurements were somewhat less reliable than weight measurements, BMI measurements were less reliable than height or weight, and height, weight, and BMI percentiles were less reliable than their corresponding raw values; African Americans and non-Hispanic whites were notable exceptions to these patterns. For almost all strata and measurements, ICCs were greater than 0.80, and most exceeded 0.90. Height measurements for African Americans and non-Hispanic whites and weight measurements for African Americans had lower reliability than for other race/ethnicities; this may have been a reflection of the clustering of these racial groups within certain clinics. In general, measurements from taller (≥100 cm) or heavier (≥16 kg) children were more reliable than measurements from shorter or lighter children; however, all ICCs in height and weight strata exceeded 0.80. The lowest ICCs were for BMI percentiles for children with BMIs ≥16.5 kg/m² or ≥75th percentile, which were in the 0.50–0.59 range. These ICCs may have been reduced because of attenuation of between-person variation within subgroups. The reliability of WIC measurements taken from provider records was lower than that of in-clinic measurements, especially for height-for-age percentile (0.91 vs. 0.63), BMI (0.95 vs. 0.64), and BMI percentile (0.91 vs. 0.74).

Table 2 Intraclass correlation coefficients assessing inter-rater reliability of WIC usual care measurements, overall and by child factors and source

Limits of agreement (LOAs) for the overall sample are presented in Table 3. About 95% of WIC height measurements were within ±2–3 cm of research protocol–measured height, 95% of WIC weights were within ±0.7–0.8 kg, and 95% of WIC BMIs were within about ±1 kg/m². The 95% LOA for height percentile was wider than that for weight percentile; 95% of WIC BMI percentiles were within about ±20 percentile points of research measurement–based BMI percentile.

Table 3 Limits of agreement of research protocol and WIC usual care measurements and estimated mean bias of WIC usual care measurements, overall and by child factors and source

Estimates of the mean bias of WIC measurements are also provided in Table 3. On average, WIC height measurements were 6 mm higher than the research protocol measurements, and WIC weight measurements were 54 g higher. As a percentage of total height or total weight, the mean error in WIC measurements was 0.6% and 0.4%, respectively. WIC BMI measurements were 0.15 kg/m² lower than the research protocol BMIs on average; this translated into a mean downward bias of 2.3 percentage points in BMI percentile. Bias did not significantly vary by age, sex, race/ethnicity, or preferred language, nor by child height or weight. Differences in bias were observed based on child BMI and BMI percentile. Children with higher BMI or BMI percentile tended to have their heights more overestimated than children with lower BMI or BMI percentile, but their weight percentiles were less overestimated; as a result, their BMI percentiles tended to have more downward bias. Some differences in bias were also detected by source of WIC measurements (weight and weight percentile).

Table 4 provides weight status classifications of the children (normal/underweight vs. overweight/obese) based on research protocol measurements and WIC measurements. Only five children, i.e., 1.8% of the sample, met the criteria for underweight. Based on research protocol measurements, 39% (113/287) of the children were overweight or obese. Based on WIC measurements, 37% (105/287) would be classified as overweight or obese. Based on estimates from a mixed-effects logistic regression model accounting for between-clinic variance, 86% (95% confidence interval (CI): 72%–99%) of overweight or obese children (by research protocol classification) were correctly classified as such by WIC measurements (sensitivity). About 92% (95% CI: 88%–97%) of normal or underweight children were correctly classified as such by WIC measurements (specificity). The positive predictive value of a WIC classification as overweight/obese was 87% (95% CI: 80%–93%), and the negative predictive value was 91% (95% CI: 83%–99%). About 89% (95% CI: 82%–95%) of the children were similarly classified by research protocol and WIC measurements (concordance).

Table 4 Comparison of weight status classifications defined using research protocol measurements and WIC usual care measurements of 287 children

Discussion

Overall, our findings suggest that height and weight measurements of children aged 2–5 y recorded by PHFE WIC staff, and the BMIs and percentiles based on these measurements, agree sufficiently well with “gold standard” research protocol measurements to support most research and monitoring purposes. ICCs for height, weight, BMI, and corresponding percentiles were all 0.89 or higher in the overall sample, which is in the “almost perfect” range according to the criteria of Landis and Koch (14) and the “substantial” reliability range according to the criteria of Shrout (15). On average, WIC height measurements were only 6 mm higher than research protocol measurements, and WIC weights were only about 50 g higher. BMI percentile was underestimated by about 2.3 percentage points on average, and WIC BMI percentiles were less reliable for children with high BMIs (BMI ≥16 kg/m² or BMI percentile ≥75) than for children with low BMIs, with reliabilities of 0.50–0.59, which are in the “moderate” range according to Landis and Koch (14) and the “fair” range according to Shrout (15). The 95% LOAs for BMI percentile were fairly wide; however, the sensitivity and specificity of WIC measurements for classifying the children as overweight/obese or normal/underweight were high, at 86% and 92%, respectively, indicating that children were correctly classified as above or below the 85th BMI percentile with very good accuracy.

Of the measures we examined, BMI and BMI percentile are the quantities of greatest interest with regard to childhood obesity. BMI is the ratio of weight to height squared, and its reliability is a function of the reliability of the height and weight measurements. The work of Cronbach (16) and others (17) has shown that the reliability of a ratio, as measured by the ICC, is typically lower than the reliability of the numerator and denominator, and this was evident in our overall sample ICCs. We found that WIC height errors were slightly greater than weight errors when expressed as a percentage of research protocol values. A multiplicative error in height measurement is more influential than an equivalent multiplicative error in weight in the calculation of BMI, which uses an inverse squared term for height. As a result, WIC BMIs, on average, were slightly underestimated. Overall, height measurements were somewhat less reliable than weight measurements, which could be because of the greater difficulty of obtaining compliance from young children during height measurements, which generally require longer periods of standing still than do weight measurements.
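The effect of the inverse squared height term can be illustrated with a short calculation. The sketch below is not part of the study's analysis; it simply plugs the mean percentage errors reported in the Results (0.6% for height, 0.4% for weight) into the BMI formula, and the function names are hypothetical. It treats the mean errors as if they applied jointly to a single child, which is a simplifying assumption.

```python
def bmi_relative_error(height_rel_error: float, weight_rel_error: float) -> float:
    """Relative error in BMI = weight / height**2 when height and weight
    carry multiplicative errors (0.006 means a 0.6% overestimate)."""
    return (1.0 + weight_rel_error) / (1.0 + height_rel_error) ** 2 - 1.0


def bmi_relative_error_first_order(height_rel_error: float, weight_rel_error: float) -> float:
    """First-order approximation: the height error counts twice and with
    the opposite sign, because height enters BMI as an inverse square."""
    return weight_rel_error - 2.0 * height_rel_error


# Illustrative inputs: mean percentage errors reported in the Results.
exact = bmi_relative_error(0.006, 0.004)
approx = bmi_relative_error_first_order(0.006, 0.004)
print(f"exact: {exact:.4f}, first-order: {approx:.4f}")
```

Under these assumptions the net effect is a BMI underestimate of a little under 1%, which agrees in direction with the small downward bias in WIC BMI reported in the Results.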

Insofar as other WIC programs use similar height and weight protocols, our findings may be generalizable to other WIC programs. Informal assessment of protocols in New York, Illinois, and Oklahoma suggests very similar protocols across regions, although in some cases measurements are taken to an eighth of an inch and an eighth of a pound, which may increase measurement precision compared with the tolerances of a quarter inch and a quarter pound used in California.

Some of the upward systematic error in WIC height and weight measurements may be due to deviation from the WIC protocol of removing shoes and outerwear prior to measurement. We observed that implementation of the WIC standardized measurement protocol varied somewhat by site, which may have reduced the reliability of the measurements. As expected, accuracy and reliability tended to be lower for measurements derived from health-care provider records as compared with in-clinic WIC measurements. Provider measurements were typically taken days or weeks earlier than research protocol measurements; hence some discrepancy may be due to child growth, which would not reflect a lack of accuracy per se. However, some discrepancies may have reflected inaccuracy due to lack of use of a standardized protocol or calibrated equipment or inadequate training of health workers, which have been reported as concerns for child height measurement in health provider settings (18). Hence, source of measurements should also be considered when using anthropometric measurements from WIC.

It is informative to compare our findings with the accuracy of child height and weight measurements reported by other authors in other settings. A study in the United Kingdom that evaluated the accuracy of height and weight measurements of children at ages 4–43 mo, taken from child health records, observed little systematic bias in these measurements but noted slight variations in accuracy between younger infants’ and older children’s measurements (19). We did not find differences by age in this group of children aged 2–5 y but did find somewhat lower ICCs for shorter (<100 cm) and lighter (<16 kg) children. A study of the accuracy of height measurements of children taken in primary care clinics in the United States reported that 30% of measurements were within 0.5 cm of correct height (18). We found that about 36% (103/287) of WIC height measurements were within this tolerance.

Data for this study were collected from seven PHFE WIC sites in southern California and thus may not fully generalize to other WIC sites. This study did not examine the validity of height and weight measurements of infants aged 0–2 y nor of pregnant women; thus our findings should not be extended to infants and women participating in WIC. We conclude that height and weight measurements of children aged 2–5 y taken at WIC sites using standardized protocols that include training and annual evaluations of staff adherence to measurement protocols are suitable for monitoring and research uses.

Methods

Study Design

This study was the validation component of a larger project whose objective was to study the effects of neighborhood environment on the development of early childhood obesity using data from children aged 2–5 y extracted from PHFE WIC databases. PHFE WIC (http://www.phfewic.org) is the largest local WIC agency in the country and administers WIC clinics in Los Angeles, Orange, and San Bernardino counties in Southern California. It serves over 300,000 clients every month, comprising 23% of California’s and 4% of the nation’s WIC participants. Over 80% of the families served by PHFE WIC are Hispanic.

The validation study sought to assess the accuracy of measurements obtained by WIC staff in accordance with their usual standard of care by comparing them to “gold standard” measurements obtained by research staff using a standard measurement protocol. Accordingly, the study design was to have WIC staff measure and record the height and weight of children who were being recertified, have trained research staff obtain height and weight measurements of the same children using a standard research protocol, calculate BMI from both sets of measurements, and compare the two sets of measurements. To ensure that the study was a validation of WIC’s usual standard of care, WIC clinic staff were blinded to the true purpose of the study and were neither reminded of height/weight protocols nor retrained prior to the study. WIC clinic staff were told that research staff were conducting a student-initiated survey that required taking height and weight measurements. The study protocol was approved by the institutional review board of the University of California Los Angeles. Parents or guardians of participating children gave verbal consent before inclusion in the study.

Recruitment

Children were recruited from seven PHFE WIC sites in Los Angeles and Orange counties in California in the spring of 2010. WIC child participants must recertify for WIC eligibility every 6 mo, at which time height and weight measurements must be obtained by WIC staff. Children aged 1.9–5 y who were attending the WIC clinic for the purpose of recertification were invited to participate in the study. The seven WIC sites were selected to achieve representation of the racial/ethnic composition of WIC clientele and a diversity of site characteristics, such as staff turnover rates, that might be associated with measurement accuracy.

Data Collection

PHFE WIC paraprofessional clinic staff are trained upon hire to follow a written standardized height/weight measurement protocol and take required height and weight measurements of all children. Hands-on training with children is conducted in the clinic setting. To confirm adherence to height/weight protocols, staff are observed annually when taking height and weight measurements and asked to review the protocol if any deviations are observed. Seven to 12 staff members per site are involved in taking height and weight measurements.

The WIC protocol for obtaining height and weight measurements requires WIC staff to either measure the child at the clinic during the recertification visit or obtain measurements from health-care provider records (typically pediatrician visit records) brought to the recertification visit by the caregiver. The health-care provider record can only be used if the visit was within 60 d of the WIC recertification appointment. WIC protocol calls for removal of shoes and outerwear before measuring children; height is measured to the nearest quarter inch and weight to the nearest quarter pound. At each of the WIC clinics in the study, WIC height measurements were taken using a wall-mounted stadiometer (Model PE-WM-60-76; Prospective Enterprises, Portage, MI), and WIC weight measurements were taken using a Health-O-Meter 402LB scale (Prospective Enterprises). Scales are calibrated every 6 mo by WIC administrative staff. Height and weight data obtained during recertification are entered into the California Integrated Statewide Information System. For measurements obtained from health-care provider records, the date of the measurement recorded in the Integrated Statewide Information System is the date of the provider visit. At PHFE WIC, about 80% of height and weight measurements are taken by WIC staff and about 20% are taken from pediatric provider records.

Three research assistants, two of whom were bilingual, were trained to obtain height and weight measurements; another three were trained to obtain only weight measurements. The training involved use of a stadiometer (Shorr Board; Shorr Production, Olney, MD) for measuring height and a digital scale (Model HD-314, Tanita, Arlington Heights, IL) for measuring weight, using standardized protocols. Interobserver error was assessed at the start of and midway through the 6-mo data collection period and was determined to be negligible (<0.1 cm for height and 0.1 kg for weight).

During the child’s recertification visit, research staff measured the child in a separate area of the WIC site, using the research study’s equipment as previously described (Shorr Board and Tanita scales). The research protocol called for two measurements each of height and weight, taken by the same assistant; a third measurement was taken if the first two measurements differed by more than 0.5 cm (height) or 0.3 kg (weight). Standing height was measured to the nearest 0.1 cm, and weight was measured to the nearest 0.1 kg. The digital scale was calibrated every 8 wk using standard weights. The ICCs for measurements within child were 0.9989 for heights and 0.9997 for weights, which suggests that intrarater reliability was well above acceptable levels (20). Final height and weight values were obtained as means of the two or three measurements.
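The repeat-measurement rule described above can be sketched as follows. This is a minimal illustration, not the study's data collection software; the function and constant names are hypothetical, and only the thresholds (0.5 cm, 0.3 kg) and the averaging rule come from the protocol.

```python
from statistics import mean
from typing import Sequence

# Thresholds from the research protocol described in the text.
HEIGHT_TOLERANCE_CM = 0.5
WEIGHT_TOLERANCE_KG = 0.3


def needs_third_measurement(first: float, second: float, tolerance: float) -> bool:
    """Return True when the first two readings differ by more than the tolerance."""
    return abs(first - second) > tolerance


def final_value(measurements: Sequence[float]) -> float:
    """Final value is the mean of the two or three readings taken."""
    if len(measurements) not in (2, 3):
        raise ValueError("expected two or three measurements")
    return mean(measurements)


# Hypothetical readings for one child (cm).
heights = [100.0, 100.75]
if needs_third_measurement(heights[0], heights[1], HEIGHT_TOLERANCE_CM):
    heights.append(100.5)  # third reading, also hypothetical
print(final_value(heights))
```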

Statistical Analysis

Analysis was limited to children who had had WIC and research protocol measurements taken within 30 d of each other to minimize lack of concordance attributable to child growth. This resulted in the exclusion of some observations for which WIC measurements were based on pediatric provider records.
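The 30-d inclusion window could be applied as in the sketch below. The function name is hypothetical, and treating a gap of exactly 30 d as eligible is an assumption, since the text does not say whether the bound is inclusive.

```python
from datetime import date

MAX_GAP_DAYS = 30  # window stated in the text


def within_window(wic_date: date, research_date: date,
                  max_gap_days: int = MAX_GAP_DAYS) -> bool:
    """True when the two measurement dates are at most max_gap_days apart.

    Assumption: the bound is inclusive (a gap of exactly 30 d is kept).
    """
    return abs((wic_date - research_date).days) <= max_gap_days


# Hypothetical dates: a provider-record measurement and a research measurement.
print(within_window(date(2010, 4, 1), date(2010, 5, 1)))
```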

WIC height and weight measurements of children participating in this study were extracted from the Integrated Statewide Information System for purposes of comparison with measurements obtained by research staff. Child age, sex, race/ethnicity, and preferred language of the family were also extracted from the Integrated Statewide Information System. For both WIC and research protocol measurements, height and weight measurements were used to compute BMI, defined as weight (kg)/height (m)², and height, weight, and BMI percentiles were obtained using Centers for Disease Control and Prevention sex- and age-specific reference values (19).
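As a minimal sketch of the BMI definition above, the following converts imperial readings to metric units and computes BMI. It assumes WIC values are stored in inches and pounds (the protocol records to the nearest quarter inch and quarter pound); the function names are hypothetical. Percentile lookup is omitted because it requires the CDC reference tables, which are not reproduced here.

```python
CM_PER_INCH = 2.54           # exact by definition
KG_PER_POUND = 0.45359237    # exact by definition


def inches_to_cm(inches: float) -> float:
    return inches * CM_PER_INCH


def pounds_to_kg(pounds: float) -> float:
    return pounds * KG_PER_POUND


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Hypothetical WIC record: 40.0 in and 35.25 lb.
print(round(bmi(pounds_to_kg(35.25), inches_to_cm(40.0)), 2))
```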

Statistical analyses assessed the reliability and accuracy of WIC measurements of height, height percentile, weight, weight percentile, BMI, and BMI percentile as continuous variables, and the accuracy of WIC BMI percentiles to classify children as underweight or normal weight vs. overweight or obese.

Several methods are available for assessing measurement reliability, with major methods including the ICC approach (21) and LOAs (22). The first approach assesses reliability as the ICC, i.e., the percentage of the overall data variance that is due to true variability as opposed to variability due to measurement error or other sources. ICCs are unitless coefficients taking values between 0 and 1, which facilitates comparison across measurements on different scales. The LOA approach provides an interval within which 95% of differences between measurements by two raters are expected to lie. The limits are on the scale of the data, which facilitates interpretation of the clinical importance of differences but makes comparisons across different scales difficult. The two approaches provide complementary information, and we provide both.
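The LOA approach can be sketched in a few lines. This is an illustration of the standard calculation (mean difference ± 1.96 s.d. of the differences), not the study's code; the function name and the sample values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(wic_values: Sequence[float],
                        research_values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower limit, mean difference, upper limit).

    Differences are taken as WIC minus research measurement.
    """
    if len(wic_values) != len(research_values):
        raise ValueError("both raters must have one value per child")
    if len(wic_values) < 2:
        raise ValueError("at least two children are needed")
    diffs = [w - r for w, r in zip(wic_values, research_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample s.d. (assumed)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical heights (cm) for four children.
wic = [101.0, 99.5, 104.0, 98.0]
research = [100.0, 99.0, 103.0, 98.5]
print(limits_of_agreement(wic, research))
```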

We calculated the 95% LOA for height, weight, BMI, and their percentiles as mean difference ± 1.96 s.d. of the differences (22), with differences calculated as WIC measurement minus research measurement. We estimated ICCs using variance components from the mixed model

$$Y_{ijk} = \mu + S_i + C_j + B_k + E_{ijk} \qquad (1)$$

where $Y_{ijk}$ is the measurement of child $i$ at clinic $j$ by rater $k$, $\mu$ is a global mean, $S_i$ is a child random effect with mean 0 and variance $\sigma^2_S$, $C_j$ is a clinic random effect with mean 0 and variance $\sigma^2_C$, $B_k$ is a rater effect (WIC vs. research protocol), and $E_{ijk}$ is random error with mean 0 and variance $\sigma^2_E$. This approach is similar to Case 3 of Shrout and Fleiss (21) and Model C of Muller and Buttner (23), in which each target (i.e., child) is measured by a fixed number of raters (i.e., two raters, WIC and research staff), but it adds a clinic effect to account for sampling of children within a sample of clinics. Although research protocol measurements were taken by different research assistants, interobserver error among research staff was negligible, as previously described, so we did not include an additional term for research staff variance. Although $B_k$ is a so-called fixed effect, it can be modeled as a random variable with mean 0 and variance $\sigma^2_B$, and a consistent estimate of the ICC can be obtained as $\sigma^2_S / (\sigma^2_S + \sigma^2_C + \sigma^2_B + \sigma^2_E)$ (24,25). We estimated ICCs for height, height percentile, weight, weight percentile, BMI, and BMI percentile in the overall sample and within strata defined by age, sex, race/ethnicity, preferred language, and source of WIC values (in-clinic vs. provider record). We also estimated ICCs within strata defined by child height, weight, BMI, and BMI percentile to assess whether reliability varied with child size. These strata were defined by dichotomizing the research protocol measurements approximately at their medians.
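Once the variance components have been estimated, the ICC expression given above reduces to a simple ratio. The sketch below shows only that final step; it does not fit the mixed model, and the function name and the variance values in the usage line are hypothetical.

```python
def icc_from_variance_components(var_child: float, var_clinic: float,
                                 var_rater: float, var_error: float) -> float:
    """ICC = child variance / (child + clinic + rater + error variance)."""
    components = (var_child, var_clinic, var_rater, var_error)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance is zero")
    return var_child / total


# Hypothetical variance components, for illustration only.
print(icc_from_variance_components(9.0, 0.5, 0.1, 0.4))
```

Because the child variance is the numerator, restricting the sample to a narrow stratum (for example, only children with high BMI) shrinks it and lowers the ICC even when measurement error is unchanged, which matches the attenuation noted in the Results.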

We assessed the accuracy of the WIC measurements as continuous variables by estimating the mean bias of the WIC measurements using the mixed model in equation (1) with the rater effect specified as fixed. We estimated mean bias for height, height percentile, weight, weight percentile, BMI, and BMI percentile in the overall sample and within strata defined by age, sex, race/ethnicity, preferred language, source of WIC values, and child height, weight, BMI, and BMI percentile. We tested for differences in mean bias among strata using likelihood ratio tests comparing models with and without an interaction between rater and factor.

We assessed the accuracy of WIC measurements as binary classifiers of overweight/obese vs. underweight/normal by estimating sensitivity, specificity, positive and negative predictive values, and concordance between WIC and research protocol classifications. Underweight was defined as BMI below the 5th percentile, overweight as BMI ≥85th percentile and <95th percentile, and obesity as BMI ≥95th percentile (19). These estimates were obtained using mixed-effects logistic regression models, with random intercepts for clinic. Models were formulated such that sensitivity, specificity, positive and negative predictive values, and concordance could be estimated as marginal predictions. The delta method was used to obtain CIs.
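The weight status cutoffs and the accuracy measures can be illustrated with simple proportions from a 2 × 2 table. Note that this is a simplification: the study estimated these quantities from mixed-effects logistic regression models with clinic random intercepts, whereas the sketch below computes crude, unadjusted proportions. Function names and the percentile values are hypothetical.

```python
from typing import Dict, Sequence

UNDERWEIGHT_CUTOFF = 5.0   # BMI percentile cutoffs from the text
OVERWEIGHT_CUTOFF = 85.0
OBESE_CUTOFF = 95.0


def weight_status(bmi_percentile: float) -> str:
    if bmi_percentile < UNDERWEIGHT_CUTOFF:
        return "underweight"
    if bmi_percentile < OVERWEIGHT_CUTOFF:
        return "normal"
    if bmi_percentile < OBESE_CUTOFF:
        return "overweight"
    return "obese"


def classification_metrics(research_percentiles: Sequence[float],
                           wic_percentiles: Sequence[float]) -> Dict[str, float]:
    """Crude accuracy of WIC overweight/obese classification, taking the
    research protocol classification as the reference."""
    if len(research_percentiles) != len(wic_percentiles):
        raise ValueError("inputs must have the same length")
    tp = fp = fn = tn = 0
    for ref, wic in zip(research_percentiles, wic_percentiles):
        ref_pos = ref >= OVERWEIGHT_CUTOFF
        wic_pos = wic >= OVERWEIGHT_CUTOFF
        if ref_pos and wic_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif wic_pos:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "concordance": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical BMI percentiles for six children.
print(classification_metrics([90, 96, 50, 10, 88, 30], [86, 97, 60, 3, 80, 87]))
```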

Statement of Financial Support

Funding for this study was provided by the American Heart Association Grant-in-Aid Program. C.M.C. was also supported by National Institutes of Health grant U54 RR 031268.