Worldwide, an estimated 38 million children under the age of 5 years are above the body mass index (BMI) cut-off for what is considered a healthy weight1. In New Zealand, 14.9% of 4–5-year-olds have obesity, and 2.9% have extreme obesity2, with Māori and Pacific children experiencing disproportionately higher levels of obesity than children from other ethnic groups2. Childhood obesity tracks into later life3, even in the very young; having a BMI or weight-for-length at or above the World Health Organization (WHO) 85th percentile before the age of 18 months predicts obesity at 6 years of age4. Such findings highlight the importance of early intervention.

There are a number of potentially modifiable risk factors associated with early childhood obesity, including maternal smoking during pregnancy5,6,7, high maternal pre-pregnancy BMI5,6,7, excessive gestational weight gain5,6, high birth weight5,7, rapid infant weight gain5,7, high-protein infant formula7, and poor infant sleep5,7. Despite the identification of these potentially modifiable risk factors, early childhood obesity interventions have reported inconsistent results8, although a recent meta-analysis of four trials, including one based in New Zealand, reported that early intervention reduced infants’ BMI z-scores by 18–24 months9. However, health professionals report they lack the knowledge required to confidently identify obesity risk in young children10,11,12,13. Providing health professionals with a tool that enables accurate prediction of an infant’s risk of obesity could increase the effectiveness of early childhood obesity interventions by enabling timely intervention. Importantly, any such model would need to be accurate enough to warrant telling families what could be worrying information for them14. In particular, a prediction model with a low positive predictive value (PPV, i.e. the probability that those considered to be at risk by the model will actually go on to develop obesity) would likely create considerable unwarranted anxiety.

Internationally, several prediction models have been developed to determine risk of early childhood obesity15, but to date no such model has been developed for the ethnically diverse New Zealand population. Prediction models should be developed and validated using participant data that are representative of the population in which the model will be used16. Thus, we aimed to develop, internally validate, and externally validate a prediction model for obesity at 4–5-years-of-age for New Zealand children. Additionally, because the prevalence of severe childhood obesity is of increasing concern, and children with severe obesity tend to have poorer clinical outcomes compared to those with less severe forms17, we also aimed to derive and validate models for severe childhood obesity in the same population.


Primary study population

The derivation and internal validation samples were obtained using data from the Growing Up in New Zealand (GUiNZ) study, a longitudinal, prospective birth cohort that recruited 6853 children via their pregnant mothers with an estimated due date between 25 April 2009 and 25 March 2010, who were resident in the greater Auckland, Counties-Manukau, and Waikato District Health Boards regions of New Zealand18. The cohort characteristics and recruitment strategy have been described elsewhere18. The generalizability of the recruited cohort to all current New Zealand births has also been demonstrated, especially with regard to ethnicity and socio-demographics19. Multiple data collection waves have occurred from pregnancy and throughout the pre-school period. Sociodemographic information including parental education, ethnicity, and socioeconomic deprivation was collected during pregnancy, as were maternal pre-pregnancy BMI and paternal BMI. Consent for linkage to routine birth records was also sought from parents to complement parental reported size at birth measures18. Socioeconomic status was determined using the NZDep200618,20, an area-level measure of deprivation, which provides scores ranging from 1 (least deprived areas) to 10 (most deprived areas)20. At ages 2 and 4 years, trained GUiNZ researchers obtained height and weight measurements from the children using standardised protocols during face-to-face home interviews21.

Populations for external validation

Two populations were used for external validation of the prediction models. External validation is vital to determine the generalisability of a prediction model to populations that are reflective of, but not identical to, the population used to derive the model22. Prevention of Overweight in Infancy (POI) was a randomised controlled trial of interventions to prevent overweight during infancy23. In brief, 802 mothers and infants living in Dunedin (New Zealand) were recruited between 2009 and 2012. Pregnant women were eligible to participate if they were over 16 years of age, could communicate in English or the indigenous Māori language, and were planning to live in the Dunedin area for the following 2 years23. Anthropometric measurements were taken by trained researchers when the children were 5 years old24. The majority of children in POI (79%) were of New Zealand European ethnicity23.

The Pacific Islands Families (PIF) study recruited newborn infants of Pacific Island ethnicity born at Middlemore Hospital (Auckland, New Zealand) between March and December 200025. Infants were considered to be of Pacific Island ethnicity if at least one parent identified themselves as such. Participants joined the study at their 6-week follow-up, when 1376 eligible mothers (1398 infants) consented to and completed an interview. Children’s heights and weights were measured by trained researchers using standardized equipment at approximately 4 years of age26.


Children's BMI measures were transformed into z-scores as per WHO standards28,29. The primary outcome for the childhood obesity model was a binary outcome of obesity at age 4–5 years, which was defined as a BMI z-score ≥ 1.645 (or ≥ 95th percentile for age and sex)30.

A further two models were derived to predict severe childhood obesity. The outcome for the first was severe childhood obesity at age 4–5 years, defined as a BMI z-score ≥ 1.974 (i.e. ≥ 120% of the 95th percentile for age and sex); for the second model, it was defined as a BMI z-score ≥ 2.326 (or ≥ 99th percentile for age and sex)17.
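For illustration, the outcome definitions above can be expressed as a simple classification rule. This is a sketch only (the function name and return labels are ours, not from the study), assuming the WHO BMI z-score has already been computed:

```python
def classify_weight_status(bmi_z: float) -> str:
    """Classify a child's weight status from a WHO BMI z-score,
    using the cut-offs described in the text."""
    if bmi_z >= 2.326:   # >= 99th percentile for age and sex
        return "severe obesity (>=99th percentile)"
    if bmi_z >= 1.974:   # i.e. >= 120% of the 95th percentile
        return "severe obesity (>=120% of 95th percentile)"
    if bmi_z >= 1.645:   # >= 95th percentile: primary obesity outcome
        return "obesity"
    return "not obese"
```

Each threshold is inclusive, matching the "≥" in the definitions above.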

Model development

Participants from GUiNZ with complete maternal and paternal data were randomly split into derivation (70%) and validation (30%) cohorts. A large number of categorical and continuous variables were selected from GUiNZ (Supplementary Table S1). The associations between categorical variables and obesity in childhood were examined using Chi-square tests. For continuous variables, univariable logistic regressions were used. All parameters displaying an association with p < 0.10 were selected for the next step.
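For a binary candidate variable cross-tabulated against obesity status, this screening step can be sketched as follows (an illustrative stdlib-only implementation; function names are ours, and the chi-square p-value uses the closed form for 1 degree of freedom):

```python
import math

def chi2_stat_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_p_df1(stat):
    """p-value for a chi-square statistic with 1 degree of freedom,
    via the identity P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def passes_screen(a, b, c, d, alpha=0.10):
    """Retain a binary variable if its association with obesity has p < 0.10."""
    return chi2_p_df1(chi2_stat_2x2(a, b, c, d)) < alpha
```

Continuous variables would instead be screened with a univariable logistic regression, as described above.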

Pairwise associations between all continuous variables were examined using Pearson's correlation coefficients, with high collinearity defined as |r| ≥ 0.5 (ref. 31). Similarly, associations between categorical variables were also examined to identify cases of high collinearity based on the phi (φ) coefficient: |φ| ≥ 0.5. Whenever high collinearity between two variables was identified, one of them was eliminated, primarily based on practicality for use in routine clinical practice. The remaining parameters were selected for further investigation. In addition, two-way interactions between all remaining variables were examined, and those that were statistically significant at p < 0.05 were also selected.
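The collinearity screen can be sketched with stdlib-only implementations of the two coefficients (function names are ours; for binary variables, the phi coefficient is computed from the 2x2 cross-tabulation counts):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]]."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

def is_collinear(coef, threshold=0.5):
    """Flag high collinearity as defined in the text: |r| or |phi| >= 0.5."""
    return abs(coef) >= threshold
```

When a pair is flagged, one variable is dropped on practicality grounds, as described above.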

Ethnicity in GUiNZ was self-reported, and participants were able to select multiple ethnic groups with which they identified, and to self-prioritise their main ethnicity. Children’s expected ethnicity was also reported by their mothers19, but for a number of reasons, maternal prioritised ethnicity was used to represent children’s ethnicity in this study. Firstly, paternal demographic information is not always reliably available during pregnancy. Secondly, previous work on GUiNZ data has demonstrated that children’s reported ethnicity can vary by parent32. Lastly, maternal contributions to overweight or obesity in offspring, or risk factors such as high birth-weight, have been shown to be equal to33,34 or greater than35,36 that of paternal contributions. Ethnicity was defined using a hierarchical system of classification, such that if multiple ethnicities were selected, the participant was assigned to a single category37. Given the small numbers of participants in the categories ‘Other’ and ‘MELAA’ (Middle Eastern, Latin American and African), these were combined as ‘Other ethnicities’.

Model discrimination was defined as its ability to adequately differentiate those with high risk of an event from those with a low risk38, and it was estimated using the area under the receiver operating characteristic curve (AUROC). The following thresholds to assess model discrimination based on AUROC were adopted: poor (< 0.60), possibly useful (≥ 0.60 and < 0.70), acceptable (≥ 0.70 and < 0.80), excellent (≥ 0.80 and < 0.90), and outstanding (≥ 0.90)38,39.
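These bands can be expressed as a simple lookup (an illustrative sketch; the function name is ours):

```python
def discrimination_category(auroc: float) -> str:
    """Map an AUROC value to the qualitative bands adopted in the text."""
    if auroc >= 0.90:
        return "outstanding"
    if auroc >= 0.80:
        return "excellent"
    if auroc >= 0.70:
        return "acceptable"
    if auroc >= 0.60:
        return "possibly useful"
    return "poor"
```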

Model calibration was defined as its goodness of fit or extent to which the model correctly estimated the absolute risk38, and was determined using the Hosmer–Lemeshow test39. A p value < 0.05 was considered evidence of poor model calibration.
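A minimal sketch of the Hosmer–Lemeshow procedure is given below. This is our simplified implementation, not the study's code: observations are sorted by predicted probability, binned into risk groups (ten by default), and observed versus expected event counts compared; the chi-square p-value is computed in closed form, which requires even degrees of freedom (as with the default ten groups, giving 8 df):

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function for even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test; p < 0.05 suggests
    poor calibration, as in the text."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        m = len(chunk)
        obs = sum(y for _, y in chunk)      # observed events in group
        exp = sum(p for p, _ in chunk)      # expected events in group
        pbar = exp / m
        if 0 < pbar < 1:                    # skip degenerate groups
            stat += (obs - exp) ** 2 / (m * pbar * (1 - pbar))
    return stat, chi2_sf_even_df(stat, groups - 2)
```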

In addition, the following parameters were obtained to assess model accuracy and predictive capacity:

$$\text{Sensitivity}=\frac{n_{\text{true positives}}}{n_{\text{true positives}}+n_{\text{false negatives}}}$$
$$\text{Specificity}=\frac{n_{\text{true negatives}}}{n_{\text{true negatives}}+n_{\text{false positives}}}$$
$$\text{Positive predictive value (PPV)}=\frac{n_{\text{true positives}}}{n_{\text{true positives}}+n_{\text{false positives}}}$$
$$\text{Negative predictive value (NPV)}=\frac{n_{\text{true negatives}}}{n_{\text{true negatives}}+n_{\text{false negatives}}}$$
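These four quantities follow directly from confusion-matrix counts, as in this sketch (the function name is ours):

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix
    counts, matching the four formulas above."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Illustrative counts only: tp and fp are taken from the PIF example
# quoted in the Discussion (52 true positives, 34 false positives,
# 17 false negatives); tn here is hypothetical.
metrics = confusion_metrics(tp=52, fp=34, tn=100, fn=17)
```

With these counts, sensitivity = 52/69 ≈ 75.4% and PPV = 52/86 ≈ 60.5%, matching the figures reported for the PIF cohort at the highest threshold.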

Using the derivation cohort, selected variables were included in a logistic regression model. Models were progressively modified with the addition and/or removal of parameters. This step was repeated multiple times, with model discrimination and calibration assessed at each step, until the most parsimonious model with the best discrimination possible was obtained, while accounting for a predictor's usability in routine practice. Using the method described by Moons et al.22, the model was then internally and externally validated, with the same equation applied to the GUiNZ internal validation cohort, and POI and PIF external validation cohorts.
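Validation of this kind amounts to evaluating the frozen derivation equation on each new cohort. A minimal sketch follows; the coefficients and predictor names are placeholders for illustration, not the study's fitted values:

```python
import math

def predicted_risk(intercept: float, coefs: dict, predictors: dict) -> float:
    """Apply a fitted logistic regression equation to one child's data:
    risk = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear = intercept + sum(coefs[name] * predictors[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients (NOT the study's fitted values)
coefs = {"maternal_bmi": 0.08, "paternal_bmi": 0.05,
         "birth_weight_kg": 0.30, "maternal_smoking": 0.50,
         "infancy_weight_gain_z": 0.40}
risk = predicted_risk(-5.0, coefs,
                      {"maternal_bmi": 28.0, "paternal_bmi": 27.0,
                       "birth_weight_kg": 3.6, "maternal_smoking": 0,
                       "infancy_weight_gain_z": 1.2})
```

Internal and external validation then compare these predicted risks against observed outcomes in cohorts the equation was not fitted to.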

Analyses were performed in SPSS v25 (IBM Corp, Armonk, NY, USA) and SAS v9.4 (SAS Institute, Cary, NC, USA). All tests were two-tailed, with statistical significance set at p < 0.05 and without adjustment for multiple comparisons. There was no imputation of missing data.

Ethics approval

This study solely involved the use of anonymized data. Ethics approval for GUiNZ was granted by the Ministry of Health Northern Y Regional Ethics Committee, New Zealand18; for POI by the New Zealand Lower South Regional Ethics Committee, New Zealand27; and for PIF by the Auckland branch of the National Ethics Committee, New Zealand26. This study was performed in accordance with all appropriate institutional and international guidelines and regulations for research. Written informed consent was obtained from a parent or legal guardian of all participants from the individual studies.


The authors have no financial or non-financial conflicts of interest to disclose that may be relevant to this work. The funders had no role in study design, data analysis or interpretation, decision to publish, or preparation of this manuscript.



The selected derivation cohort consisted of 1731 participants, while the selected internal validation cohort consisted of 713 participants (Supplementary Figure S1). The final sample size was reduced mostly by missing paternal and/or infancy weight gain data (Supplementary Figure S1), which were identified as key parameters during variable selection. Of note, Supplementary Table S2 shows some differences between included and excluded GUiNZ participants. In particular, maternal ethnicity and sociodemographic status differed, with more participants with mothers of Māori or Pacific ethnicity and/or from areas of greater socioeconomic deprivation in the excluded group. The exclusion of Māori and Pacific mother-infant pairs, primarily due to missing paternal data, is likely to have introduced some bias. However, ethnic-specific models (New Zealand European, Māori, and Pacific) were also derived, and yielded no clear advantage in comparison to the overall model (data not shown). For included participants, the prevalence of obesity was similar in the GUiNZ derivation and validation cohorts (15.8% and 16.1%, respectively), as were other demographic factors (Table 1).

Table 1 Demographic information on the populations used for model derivation and validation.

Model predictors

The final prediction model included maternal pre-pregnancy BMI, paternal BMI, birth weight, maternal smoking during pregnancy, and infant weight gain as significant independent predictors of childhood obesity (Table 2).

Table 2 Parameters for the final childhood obesity prediction model derived from the Growing Up in New Zealand subset.


Discrimination accuracy was acceptable for the derivation model [AUROC = 0.74 (0.71, 0.77), p < 0.0001], and there was no evidence of poor model calibration (p value for Hosmer–Lemeshow test = 0.27). Table 3 shows the sensitivity, specificity, PPV, and NPV of the derivation model at various risk thresholds ranging from 0.20 to 0.95. PPVs ranged from 18.6 to 54.0% (Table 3).

Table 3 Accuracy and predictive capacity of a prediction model for childhood obesity for New Zealand among the derivation and validation cohorts.

Internal validation

Discrimination accuracy was acceptable for the internal validation model [AUROC = 0.73 (0.68, 0.78), p < 0.0001]. There was no evidence of poor model calibration (p for Hosmer–Lemeshow test = 0.75). Model accuracy parameters are shown in Table 3. Similar to the derivation model, PPVs ranged from 19.0 to 47.7% (Table 3).

External validation

The demographic information for each of the external validation cohorts is provided in Table 1. Notably, the prevalence of childhood obesity was 7.0% in the POI cohort and 51.1% in PIF (Table 1).

Discrimination accuracy was excellent for the POI cohort [AUROC = 0.80 (0.71, 0.90), p < 0.0001] and acceptable for PIF [AUROC = 0.74 (0.66, 0.82), p < 0.0001]. There was no evidence of poor model calibration (all p > 0.05 for Hosmer–Lemeshow test).

Table 3 provides the sensitivity, specificity, PPV, and NPV of the external validation models at various percentile risk thresholds ranging from 0.20 to 0.95. PPVs were particularly low for POI, ranging from 7.8 to 23.5%, but higher and more consistent for PIF, ranging from 51.9 to 60.5% (Table 3).

Severe childhood obesity prediction

Both models with severe childhood obesity as the outcome included 2408 participants in their derivation cohorts (Supplementary Table S3). The predictors included for each model and their associations with the outcome are reported in Supplementary Tables S4 and S5. Discrimination accuracy was acceptable for the model predicting severe childhood obesity at BMI ≥ 120% of the 95th percentile [AUROC = 0.75 (0.72, 0.78); Supplementary Table S6] and at BMI ≥ 99th percentile [AUROC = 0.76 (0.72, 0.80); Supplementary Table S7]. There was no evidence of poor calibration for either derivation model (both p > 0.05 for Hosmer–Lemeshow test) or for the validation models (all p > 0.05 also), with the exception of the POI validation model for severe childhood obesity at a BMI z-score ≥ 99th percentile (p = 0.002).

The sensitivity, specificity, PPV, and NPV of the derivation and validation models for severe childhood obesity are reported in Supplementary Tables S6 and S7. PPVs for both derivation models were low, ranging from 15.0 to 47.9% for the model predicting severe childhood obesity at BMI ≥ 120% of the 95th percentile (Supplementary Table S6), and 9.0 to 33.1% at BMI ≥ 99th percentile (Supplementary Table S7). The PPVs for the internal validation model predicting severe childhood obesity at BMI ≥ 120% of the 95th percentile were similarly low (14.2–51.0%, Supplementary Table S6), as were those for the model at BMI ≥ 99th percentile (7.8–34.5%, Supplementary Table S7). Regarding the external validations, the PPVs for both POI models were particularly poor (5.2–16.7%, Supplementary Table S6; 0–8.7%, Supplementary Table S7), although markedly higher for the PIF models (38.3–55.5%, Supplementary Table S6; 41.5–66.7%, Supplementary Table S7).


New Zealand has a high prevalence of early childhood obesity, with one in three children starting school already overweight or with obesity2. Furthermore, New Zealand’s population is ethnically diverse40, with obesity rates inequitably distributed across ethnicities. For example, 30.2% of Pacific Island children have obesity by the age of four, versus 12.7% of New Zealand European children2. We developed a model to predict the likelihood of obesity at 4–5 years based on infancy data, starting with a cohort that is representative of the New Zealand birth population19. The derivation model, internal validation model, and two external validation models all produced AUROCs that were either acceptable or excellent. However, despite the encouraging AUROCs, PPVs were almost invariably low across the risk threshold range, indicating high rates of false positives. Derivation and validation of two models to predict severe childhood obesity produced similar findings.

The performance of our childhood obesity prediction model was comparable with previous international early childhood obesity or overweight prediction models, with derivation AUROCs ranging from 0.67 to 0.87 (ref. 15). Our final model used five predictors: birth weight, maternal pre-pregnancy BMI, paternal BMI, infancy weight gain, and maternal smoking during pregnancy. The inclusion of infancy weight gain data in early childhood obesity prediction regression models has been suggested to produce higher AUROC values than models solely reliant on predictors available at birth15. Our model produced a higher AUROC than one previous regression model supplemented by infancy weight gain data, but a lower AUROC than two others15. However, our infancy weight gain data were obtained using only two time points: birth and anywhere between 6 and 12 months. It is possible that data collected at more regular intervals would have improved our model’s discrimination.

Our model’s discrimination capacity was excellent for the POI cohort, while it remained acceptable for the PIF cohort. This may be explained by differences between the populations. POI was a cohort consisting of mostly New Zealand Europeans, with a low prevalence of obesity at 7.0%. By comparison, the PIF cohort included only Pacific Island children, with a markedly higher prevalence of obesity at 51.1%. However, of particular relevance, there were also substantial contrasts in PPVs when the model was applied to these external validation cohorts. For example, the highest PPV achieved when the model was applied to the POI cohort was 23.5% compared to 60.5% in the PIF cohort. In actual numbers, at the 95th probability percentile threshold (where the PPV was highest), 52 of 69 participants in the PIF cohort who did develop obesity were correctly identified as being likely to do so (sensitivity = 75.4%), while 34 were incorrectly identified as at risk. By contrast, at the same threshold, the model correctly identified only 4 of 27 participants that went on to develop obesity in the POI cohort (sensitivity = 14.8%), with 13 false positives. Choosing a lower risk threshold would only serve to increase the rate of false positives. Of note, derivation and validation of models to predict severe childhood obesity according to two definitions (known to be associated with poorer long-term clinical outcomes compared to less severe obesity17) did not substantially improve PPVs for any included cohort.

Risk thresholds should be determined following consideration of a number of factors, including the sensitivity and specificity of the model, the financial cost of any proposed intervention, and any potential risks of intervening or not intervening in misclassified cases15. For obvious reasons, the weight given to these individual aspects will vary markedly depending on the condition in question; for example, a relatively benign condition versus a disease with a very high mortality rate (e.g. pancreatic cancer). In this study, the observed rates of false positives across the risk threshold continuum indicate that in New Zealand’s multi-ethnic population, a large number of children would be incorrectly classified as at risk of childhood obesity. There is considerable social stigma associated with childhood obesity41, and a high number of false positives (i.e. children being wrongly labelled as at risk of developing obesity) could create considerable unwarranted anxiety for many families, and thus should be avoided. In addition, screening for obesity may be unethical in the absence of proven treatments for early childhood obesity. Nonetheless, a recent meta-analysis reported that randomised controlled trials of interventions to prevent early childhood obesity conducted in Australia and New Zealand (n = 4) resulted in lower BMI z-scores at 18–24 months9.

There has been some research into the acceptability of early childhood obesity prediction to families. In New Zealand, parents and other caregivers were hypothetically accepting of communication regarding early childhood obesity prediction, although many anticipated feeling upset or worried in response14. UK parents were also hypothetically accepting of such communication, but were fearful of judgement from health professionals and others42. One study has explored parental views of actual risk communication43, reporting that some parents did not trust the prediction and believed the risk was not likely to be relevant for their baby43. Of note, some of the health professionals delivering the risk communication also expressed similar beliefs43. Indeed, parental awareness of current childhood obesity in older children is not necessarily associated with action to reduce the child’s weight44. Furthermore, there is some evidence that in the long term, young children considered to have overweight by their parent gain more weight than children not considered to have overweight45,46,47. In one study, this was regardless of the child’s initial weight status (i.e. normal weight or overweight)45. Therefore, accurate parental perception of children’s weight status could even be detrimental to children’s long-term weight outcomes. Nonetheless, it is important to note that all three of these studies were observational in design, and no intervention was offered to children considered to have overweight (as would be the case if a prediction model like ours were to be used in clinical practice). Furthermore, a prediction model does not require parents to perceive their child as having a current weight issue, but rather to consider them at risk of developing obesity in the future. Therefore, use of an early childhood obesity prediction model could have different long-term outcomes to the previously mentioned longitudinal studies. A randomised controlled trial on the use of a prediction model (with lower false positive rates than ours), alongside supportive intervention, would be needed to assess if this was the case.

Our study has several limitations. Parental anthropometric and infancy weight gain data were missing for many participants in all cohorts, meaning these otherwise suitable participants were excluded from the final analysis. While the overall GUiNZ cohort was representative of New Zealand's population19, the exclusion of participants with missing data (in particular missing paternal data) meant that a greater proportion of children from lower socioeconomic backgrounds and/or with mothers of Māori or Pacific ethnicity were excluded. Thus, this exclusion of participants should be considered when interpreting the applicability of the model to the wider New Zealand population. Also, parental BMI was based on self-reported data and was therefore potentially inaccurate48, although a New Zealand study found no differences between self-reported and measured BMI in adults49. In addition, our model was derived using historical data that might not be reflective of current trends. Nonetheless, all GUiNZ infants were born between 2009 and 201018, and the prevalence of obesity in 4–5-year-old New Zealand children has not increased since 2010 (regardless of ethnicity or socioeconomic status)2. Furthermore, the prevalence of obesity among New Zealand adults has been stable since 2012/13, with only a slight rise since 2011/1250. Taken together, this suggests that the prevalence of obesity among children included in our derivation cohort is likely to be reflective of current early childhood obesity trends in New Zealand children, and at least two of the models’ parameters (maternal pre-pregnancy BMI and paternal BMI) are also likely to be closely reflective of current trends. Additionally, while BMI ≥ 95th percentile is a commonly used threshold for obesity in children, it is acknowledged that BMI does not directly measure body fat30. Thus, it cannot be considered a definitive diagnostic tool for obesity, which the WHO defines as “abnormal or excessive fat accumulation that may impair health”51. Nonetheless, its practicality means it is commonly used to diagnose obesity in clinical practice30; in New Zealand, for example, BMI z-score is a key tool to identify children with weight issues in the Ministry of Health's B4 School Check, a nationwide programme that offers free health and development assessments for all children at 4 years of age52. Strengths of our study include the internal and external validations, and the fact that all children’s anthropometric measurements were taken using consistent protocols by trained professionals, which reduces the potential for error.

In conclusion, we have derived and validated an early childhood obesity prediction model using a cohort that is representative of the diversity of the contemporary New Zealand birth population, although children of Māori and Pacific mothers and those from lower socioeconomic backgrounds were under-represented in our selected subset. While AUROCs for the derivation and validation models were at least acceptable, PPVs were generally low, meaning that high rates of false positives could create considerable unwarranted anxiety for many families. This would be a particular issue in a context where there remains uncertainty regarding what constitutes a successful intervention to prevent early childhood obesity; thus, we are hesitant to recommend the use of our model in clinical practice.