Predicting future weight status from measurements made in early childhood: a novel longitudinal approach applied to Millennium Cohort Study data

Background/Objective: There are reports that childhood obesity tracks into later life. Nevertheless, some tracking statistics such as correlations do not quantify individual agreement, whereas others such as diagnostic test statistics can be difficult to translate into practice. We aimed to employ a novel analytic approach, based on ordinal logistic regression, to predict weight status of 11-year-old children from measurements at age 5 years. Subjects/Methods: The UK 1990 growth references were used to generate clinical weight status categories of 12 076 children enrolled in the Millennium Cohort Study. Using ordinal regression, we derived the predicted probability (percent chances) of 11-year-old children becoming underweight, normal weight, overweight, obese and severely obese from their weight status category at age 5 years. Results: The chances of becoming obese (including severely obese) at age 11 years were 5.7% (95% confidence interval: 5.2 to 6.2%) for a normal-weight 5-year-old child and 32.3% (29.8 to 34.8%) for an overweight 5-year-old child. An obese 5-year-old child had a 68.1% (63.8 to 72.5%) chance of remaining obese at 11 years. Severely obese 5-year-old children had a 50.3% (43.1 to 57.4%) chance of remaining severely obese. There were no substantial differences between sexes. Nondeprived obese 5-year-old boys had a lower probability of remaining obese than deprived obese boys: −21.8% (−40.4 to −3.2%). This association was not observed in obese 5-year-old girls, in whom the nondeprived group had a probability of remaining obese 7% higher (−15.2 to 29.2%). The sex difference in this interaction of deprivation and baseline weight status was therefore −28.8% (−59.3 to 1.6%). Conclusions: We have demonstrated that ordinal logistic regression can be an informative approach to predict the chances of a child changing to, or from, an unhealthy weight status. This approach is easy to interpret and could be applied to any longitudinal data set with an ordinal outcome.


INTRODUCTION
The increasing prevalence of childhood obesity has become a major public health issue worldwide in both developing and developed countries. 1 The consequences of childhood obesity can be severe, with an increased risk of developing conditions such as diabetes, cardiovascular disease and psychosocial disorders. 2,3 Furthermore, there is some evidence that children who are overweight or obese are more likely to be overweight or obese adults; hence, they are more likely to suffer from comorbidities when they reach adulthood. 4 Nevertheless, most adults who are overweight or obese now were of normal weight as children. 5 In England, ∼ 1 in 5 children aged 4-5 years and 1 in 3 children aged 10-11 years are either overweight or obese (defined using the UK90 population monitoring cut points for overweight (⩾85th centile) and obesity (⩾95th centile)). These figures are from the National Child Measurement Programme (NCMP) that was introduced into England in 2006 to measure the height and weight of children in Reception (4-5 years old) and Year 6 (10-11 years old). The rationale for introducing the NCMP included the gathering of population-level data on growth trends, informing service planning and delivery and increasing awareness of weight issues in children. 6 The results from the programme are routinely fed back to parents via letters. 7 There is a standard template that may be used by each local authority in England; however, some areas make changes to the letter or do not use the letter at all. This variation in practice leads to a lack of consistency in how local authorities present the results and whether they offer further support to the parents/children. In some local authorities the letter suggests that children who are overweight/obese during primary school are more likely to be overweight/obese in adulthood; some letters have previously stated that overweight or obese children are more likely to develop disorders such as cancer, diabetes and cardiovascular disease. 8 Such information can be distressing and also confusing for parents; therefore, it is important to provide parents with information that is acceptably accurate, informative and easy to understand.
The NCMP allows the annual prevalence of childhood obesity to be reported. The NCMP also has the potential to provide prognostic information, that is, to ascertain whether an individual child is likely or not to have an unhealthy weight status when measured again later in life. Nevertheless, this issue of 'tracking' is currently difficult to explore using NCMP data, which up until 2013 was anonymised before the annual upload to the national data collection system, thus prohibiting any data linkage on an individual level. 9 A statistic that is used commonly in body mass index (BMI) tracking research is the correlation coefficient. In a recent metaanalysis, 10 tracking correlations were synthesised from 48 studies that varied in their duration between initial and follow-up measurements. The authors of this review concluded that a high degree of tracking existed for follow-up durations of 1, 10 and 20 years, with respective correlation coefficients of 0.78-0.86, 0.67-0.78 and 0.27-0.47. However, a correlation coefficient does not quantify the prediction error for individual children. 11 Odds ratios, derived from binary logistic regression models, are also commonly reported in BMI tracking research. For example, in a recent secondary analysis of the NCMP data for South Gloucestershire, England, 12 multiple binary logistic models were used to derive over 20 separate odds ratios for boys, girls and the pooled sample across various weight categories. In this latter study, one odds ratio was cited to infer, incorrectly, that children who were overweight in Reception (85th-94th percentile, UK 1990 growth reference charts) were '13 times more likely' to be overweight or obese in year 6, compared with children who were between the 2nd and 49th percentile in Reception. It is not uncommon for odds ratios and relative risks to be misrepresented in research, rendering them difficult to translate to practitioners and patients. 13 Furthermore, the analysis by Pearce et al. 12 only used the population monitoring cutoffs for overweight and obesity; in the NCMP feedback letters the clinical cutoffs are used. Pearce et al. 12 also did not predict the odds of a child becoming severely obese, which has shown to be an increasing concern in England. 14 Lastly, BMI weight categories are clearly ordinal-level data, rendering the use of many binary logistic regression models across multiple pairs of weight categories non-parsimonious.
Finally, diagnostic test statistics such as sensitivity and specificity can help ascertain the individual agreement between two different measurements of status. 15 Nevertheless, several additional statistics (for example, positive predictive value, negative predictive value and positive and negative likelihood ratios) are required for a full interpretation, rendering results that are sometimes difficult to explain to a layperson, such as a child's parent. Steurer et al. 16 reported that even general practitioners can struggle to apply the statistics from the appraisal of a diagnostic test. 16 The aim of this secondary analysis of longitudinal data was to develop a robust analytic approach to predict the individual weight status of 11-year-old-children from weight status data collected at age 5 years, and to explore the influences of sex and deprivation.

SUBJECTS AND METHODS
Subjects in this secondary data analysis are from the Millennium Cohort Study (MCS) that recruited over 19 000 children born in the United Kingdom between 1 September 2000 and 11 January 2002. Children were identified from the Child Benefit register and were recruited, along with their families, when they were ∼ 9 months old. 17 The study used disproportionately stratified sampling to overrepresent disadvantaged populations and areas with a high prevalence of BME (Black and Minority Ethnic) communities. 18 Data were downloaded from the UK data archive, from sweep 1 and sweep 5 of the data collection, to select children who were of similar ages to those taking part in the NCMP (it is also possible that the children resident in England were also measured in the NCMP). The following variables were obtained: MCS research serial number, cohort member number, sex, age, BMI and index of multiple deprivation (IMD) decile (by country). 19,20 Height and weight were measured by study investigators at each time point, and were not self-reported. Because of the sample stratification and clustering, the data needed to be set for analysis using an attrition/nonresponse weight (whole of UK-level analysis), a finite population correction factor, a stratum variable and a ward variable to account for clustering. These variables were also obtained from the data set. 20 As variables were required from multiple data sets, files were merged together based on the MCS research serial number and cohort member number (used to represent twins/triplets). Raw BMI values were converted into BMI z-scores/centiles using the LMS growth Microsoft Excel add-in 21 where UK 1990 growth references were selected. These centiles were then converted into weight status categories using the UK 1990 clinical cutoff points: underweight ( o2nd centile), normal weight (⩾2nd but o91st centile), overweight (⩾91st centile but o 98th centile) and obese (⩾98th centile). 22 These categories are also used in the NCMP feedback letters to parents. 6 An additional category for severely obese children was also generated using the ⩾ 99.6th centile cutoff. 14 IMD scores were used to assess the level of deprivation and were presented in quintiles. Ordinal logistic regression was applied to generate the predicted probability (% chances) of a child becoming underweight, normal weight, overweight, obese and severely obese at age 11 years, with weight status at age 5 years, sex, deprivation and their three-way interaction as predictors. Interaction analyses presented are exploratory. All analyses were performed using Stata software (StataCorp, 2013. Stata Statistical Software: Release 13, College Station, TX, USA: StataCorp LP). Point estimates are presented together with 95% confidence intervals. These intervals are not adjusted for multiple comparisons. 23 Three sensitivity analyses were conducted. The first simply removed the second-and third-born twins/triplets to explore whether these had a substantial effect on the estimates. The second relaxed the constraint of the proportional odds assumption underpinning ordinal logistic regression and repeated all analyses using generalised ordinal logistic regression. 24 This model allows the effects of the predictor variables to vary with the point at which the categories of the age 11 weight status variable are dichotomised, rather than enforcing parallel lines. Finally, we explored the effect of missing data, given that 3116 BMI values were missing at follow-up. Under a missing at random assumption, a complete case analysis-our primary analysis-is unbiased in this context and methods such as multiple imputation can only exacerbate problems by introducing additional random variation. However, multiple imputation can be used for a sensitivity analysis to examine the effects of substantial departures from the missing at random assumption. In the current study, it is plausible that those children lost to follow-up had substantially higher BMI values-that is, data missing not at random. We imputed the 3116 missing follow-up BMI values predicted from baseline BMI using the Stata 'MI' module with predictive mean matching (random selection from 10 nearest neighbours). Twenty imputations were made by sex and deprivation strata to preserve relationships for the higher-order interactions in the analysis model. Using a pattern mixture modelling approach, 25 each imputed follow-up BMI value was then inflated by 25% to simulate data missing not at random, with higher follow-up BMI in those not presenting for measurement at age 11 years. We then converted these inflated BMI values into weight status categories using the same method previously described. The identical ordinal logistic regression model was then applied to the 20 imputed data sets, with results combined using Rubin's rules. 26

RESULTS
A total of 12 076 children were included in the analyses who had a BMI measurement along with complete data for sex and IMD score. The NCMP cleaning protocol 27 was used to explore whether there were any BMI outliers: only two BMI measurements were slightly outside the acceptable ranges given in the protocol; hence, these were retained in the analysis. Half (50.3%) of the sample were boys, and 25.8% and 19.2% of children were in the most deprived (0 to o 20%) and least deprived (80 to 100%) IMD categories, respectively. The mean BMI at baseline was 16.3 ± 1.9 kg m 2 and the mean age was 5.2 ± 0.3 years. The mean BMI at follow-up was 19.2 ± 3.7 kg m 2 and the mean age was 11.2 ± 0.3 years. At baseline (age 5 years), the percentage of children who were underweight, normal weight, overweight and obese (including severely obese) were as follows: 1.1% (n = 127), 82.4% (n = 9954), 10.3% (n = 1249) and 6.2% (n = 746). At follow-up (age 11 years), the percentages were as follows: 1.6% (n = 188), 71.0% (n = 8577), 15.1% (n = 1819) and 12.4% (n = 1492). The percentage of children who were severely obese at age 5 and 11 years were 2.9% (n = 347) and 4.1% (n = 494), respectively. The tracking of raw BMI between age 5 and age 11 years produced a correlation coefficient of 0.61.
Results from the full factorial ordinal logistic regression model are shown in Table 1, split by sex (the Stata code required to run this model is provided in Supplementary Table 1). Sex was shown to have little influence on these associations. Interestingly, overweight children had around a one-third chance of remaining overweight, one-third chance of returning to the normal-weight category and one-third chance of becoming obese. Obese (including severely obese) children at age 5 years had nearly a 70% chance of remaining obese at 11 years.
When the analysis was performed with an additional category for severe obesity, severely obese 5 year olds had a 52.8% (45.3 to 60.3%) chance of remaining severely obese at 11 years, and a 31.3% (27.4 to 35.1%) chance of decreasing their weight status and returning to the obese category (⩾98th but o99.6th centile). There were no substantial differences between sexes: severely obese boys had 49.5% (39.4 to 59.5%) chance of remaining severely obese compared with a 56.6% (46.0 to 67.2%) chance for severely obese girls. Severely obese boys and girls had a 32.3% (28.1 to 36.5%) and 30.0% (24.1 to 35.8%) chance of decreasing their weight status and becoming obese, respectively. Boys who were obese (not severe) at age 5 years had a 23.0% (17.2 to 28.8%) chance of becoming severely obese, whereas obese girls had a 27.2% (19.7 to 34.7%) chance.
Results stratified by sex and deprivation are shown in Table 2. Nondeprived obese boys had a lower chance of remaining obese at age 11 years compared with deprived obese boys; a difference of − 21.8% (−40.4 to − 3.2%). The opposite association was found in obese girls, where nondeprived girls were more likely to remain obese than deprived obese girls; however, this difference was not substantial. The sex difference in this specific interaction of deprivation and baseline weight status was − 28.8% (−59.3 to 1.6%). No other substantial differences were found between deprived and nondeprived boys/girls or when comparing boys and girls; this was also the case when normal-weight and overweight status were predicted at follow-up (data not shown). We were unable to include underweight children in the analysis split by sex and deprivation as there were too few underweight children in the sample. Table 3 shows the predicted percent chances of becoming severely obese by sex and deprivation. We also performed the analysis using the population monitoring cut points instead of the clinical cut points and found a slightly greater increase in the percent chances of becoming overweight or obese (results not shown). This was expected because the cut points are lower; hence, more children will have been categorised as overweight or obese.
When second-and third-born twins/triplets were removed from the analysis, there were no substantial differences in any of the predicted percent chances (data not shown). Similarly, relaxation of the constraint of the proportional odds assumption had no material effect on the findings. Results from the sensitivity analysis with missing data are shown in Table 4 for predicting obesity by sex and deprivation. When comparing the original analysis (data missing at random assumption) against the multiple imputation analysis (missing not at random assumption), no material differences were found.

DISCUSSION
This secondary analysis of data from the MCS has shown how a robust statistical approach can be used to predict a child's future weight status in an informative way using baseline weight status, sex and deprivation as predictor variables. This technique could be applied to NCMP data and predictions could be incorporated into the parental feedback letters, to better inform parents of the chances of their child becoming or remaining at an unhealthy   Abbreviations: CI, confidence interval; IMD, index of multiple deprivation; inc., including. Numbers are rounded to 1 decimal place.
Tracking weight status during childhood E Mead et al weight status. In fact, this statistical technique could be applied to any longitudinal data set, and additional predictor variables could be included in the model. Furthermore, as we had a considerable proportion of missing outcome data, we have demonstrated an approach to sensitivity analysis for substantial departures from the missing at random assumption. The main findings from the MCS analysis included showing that sex does not strongly influence the tracking of weight status from age 5 and 11 years. However, our exploratory interaction analyses suggest that deprivation might influence whether obese boys at age 5 years will remain obese at age 11 years, with nondeprived boys substantially less likely to remain obese. This association was not evident in girls. This finding is subject to replication and confirmation, but it suggests that nondeprived obese boys have a protective effect against remaining obese in later childhood, perhaps mediated by environmental and psychological factors.
Some of the children included in the MCS would have been measured in the English National Child Obesity Dataset (NCOD) in 2005-2006, which was then renamed the NCMP the following year after improvements were made. 28 Children in the MCS would have also taken part in the NCMP in 2011-2012 when they were in year 6 of primary school. Analyses of NCMP cohort trends have shown that obesity prevalence in the most deprived children is nearly double the prevalence in the least deprived children. This inequality gap has shown to significantly increase by ∼ 0.5% every year, showing inequalities are continuing to widen. 29 Analysis of cohort trends is limited because it does not explore how the weight status of individuals changes over time, and is unable to explore the influence of sex and deprivation in depth. The analysis of individual children in the MCS identified a protective effect against obesity in more affluent obese boys that would not have been seen in an analysis of cohort trends. Hence, this finding highlights the importance of obtaining linked NCMP data.
Following a change in NCMP legislation in 2013, 30 it is now possible to upload identifiable data through an NHS number that, if submitted, will facilitate data linkage and future tracking analyses. As there are 7 years between the two measurements, the earliest any national tracking analyses could be undertaken is 2019. That said, NCMP data can be obtained locally in those areas where data have been stored on the Child Health System (CHIS), although there are lengthy and time-consuming governance procedures to overcome in order to access these data. Examples of local authorities that have obtained data via CHIS include Hull 31 and Southampton; 32 however, not all data were collected through the NCMP as some measurements were collected before the start of the NCMP.
The main limitation to this analysis was the large amount of missing data between baseline (age 5 years) and follow-up (age 11 years) where it was possible that these data might be missing not at random. However, we were able to conduct a sensitivity analysis that showed only small differences in predicted probabilities when data were imputed under missing not at random assumption. This finding is noteworthy, as we allowed for a large departure from the missing at random assumption, with imputed follow-up BMI values inflated by 25%. A second limitation was that some children were 45 years old at baseline and 11 years old at follow-up; however, the majority of children were close to these ages. In addition, only 1.1% of the cohort were underweight at age 5 years and only 1.6% were underweight at age 11 years. Furthermore, only 2.9% and 4.1% of children were categorised as severely obese at age 5 and age 11 years, respectively. Hence, even though we analysed over 12 000 cases, a much larger sample would be required to be able to make robust predictions using these two categories. In addition, BMI may not be the most accurate measure of a child's weight status as it has shown to not always strongly correlate with body fat distribution. 33 However, BMI is the preferred method to use in a large sample as it is relatively quick to measure, less invasive than many other body fat assessments and has shown to be a relatively robust measurement at a population level. 34 A final limitation of the analysis is that the majority of the sample was of white ethnicity; hence, we were unable to explore the influence of ethnicity that has shown to strongly affect the likelihood of developing obesity. 35,36 Furthermore, a majority of children were sampled from England; hence, we were unable to conduct a country-by-country analysis.
At present, MCS data are only freely available up age 11 years; it will be interesting to explore what effect a longer follow-up period has on predicting whether children will become  Abbreviations: CI, confidence interval; IMD, index of multiple deprivation; inc., including. Numbers are rounded to 1 decimal place.
Tracking weight status during childhood E Mead et al overweight or obese in later life, especially as adolescence is anticipated to be an important predictor of adult weight status. 37 In addition, it would be worthwhile to perform further analyses looking at the effect of physical activity and nutrition on changes in BMI, and also explore what factors contribute to the protective effect against obesity in nondeprived obese boys. To conclude, this secondary data analysis has demonstrated how weight status can be tracked robustly and informatively over time. Such methods could be applied to other longitudinal data sets such as the NCMP.

CONFLICT OF INTEREST
Dr Louisa Ells is seconded to Public Health England 2 days per week as a specialist academic advisor.