Introduction

It is becoming increasingly evident that preterm birth increases the incidence of risk factors for cardiovascular disease in early as well as later life. Preterm birth is associated with a higher percentage of body fat, higher blood pressure, and an increased risk of dysglycemia from infancy into adulthood.1,2,3,4,5,6 Furthermore, preterm birth has been associated with an increased risk of ischemic heart disease in adulthood.7 Therefore, it is of utmost importance to find suitable screening tools that identify risk factors for cardiovascular disease at an early age, so that preventive measures can be implemented accordingly.

As body fat percentage and fat mass index have been shown to positively correlate with the occurrence of metabolic syndrome components,8,9 monitoring body composition during early life could help to implement timely preventive measures. To gain more insight into which methods should be used to monitor body composition in early life in preterm infants, this systematic review will assess validation studies in preterm infants from birth up to 6 months corrected age.

Several methods to assess body composition are available, ranging from inexpensive bedside techniques, such as skinfold thickness (SFT) measurement, to expensive and bulky equipment, such as air displacement plethysmography (ADP).10,11 Currently, ADP and dual energy x-ray absorptiometry (DXA) are frequently used in research settings and are considered reliable methods.12 Nevertheless, these methods are not widely implemented in clinical practice. Furthermore, there is no consensus on which method should preferentially be used to assess body composition in preterm infants. This systematic review aims to determine the validity of different methods used to measure body composition in preterm infants and to show whether validated methods yield comparable results.

Methodology

This systematic review primarily assessed the accuracy of various methods used to measure or estimate body composition in infants born preterm.

A systematic literature review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement (www.prisma-statement.org). The review was registered and the protocol was published on PROSPERO International prospective register of systematic reviews under ID CRD42018107821. Searches were performed in PubMed, Embase.com, and Wiley/Cochrane Library from inception (1809) up to 29 September 2020 by D.F.J.Y. and J.C.F.K. The search included keywords and free-text terms for “premature” and “body composition” or “adiposity” or “lean.” Animal studies, conference papers and editorial letters were excluded. No language or publication date restrictions were applied. The full search strategies can be found in Table 1. In addition, the reference lists of relevant articles and Google Scholar were checked. Where needed, authors were contacted for clarification or additional information.

Table 1 Search strategies.

Study eligibility criteria

Studies were included if they reported on body composition measurement in infants born before 37 weeks of gestation. The body composition measurement had to take place between birth and 6 months corrected age. Studies needed to evaluate methods which measure or estimate fat (-free) mass in (kilo)grams or percentage. Studies measuring or estimating total body water (TBW) were also included. See Table 2 for a description of the included methods.12,13 In addition to validation studies, randomized controlled trials, cohort studies, and epidemiologic studies were included if these studies reported the accuracy or predictive values of body composition measurements.

Table 2 In vivo techniques for measuring body composition in preterm born infants.12,13

Definitions

A method was deemed validated if it showed good statistical agreement with a reference method. Currently, there is no gold standard for the measurement of body composition. Hence, studies of all possible reference methods were accepted.

Good agreement was defined as a maximum allowed difference of 10% of the mean value of the body composition parameter in the study population. For example, if the mean of the fat free mass in a study population was 2000 g, then the bias ± limits of agreement had to be smaller than ±200 g.
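The criterion above can be written out as a small check. The following is a minimal sketch in Python, not the procedure used in the review: the function name and parameters are hypothetical, and it assumes one reading of the definition, namely that both the lower and the upper limit of agreement (bias minus and plus the half-width) must fall strictly within ±10% of the population mean.

```python
def has_good_agreement(mean_value, bias, loa_half_width, max_fraction=0.10):
    """Check the 10%-of-mean agreement criterion (assumed interpretation).

    mean_value     -- mean of the body composition parameter in the study population
    bias           -- mean difference between the method and the reference method
    loa_half_width -- half-width of the limits of agreement (e.g. 1.96 * SD of differences)
    max_fraction   -- maximum allowed difference as a fraction of the mean
    """
    if loa_half_width < 0:
        raise ValueError("loa_half_width must be non-negative")
    allowed = max_fraction * abs(mean_value)
    lower_limit = bias - loa_half_width
    upper_limit = bias + loa_half_width
    # Both limits of agreement must lie inside the allowed band around zero.
    return -allowed < lower_limit and upper_limit < allowed
```

With the example from the text (mean fat free mass of 2000 g, hence an allowed band of ±200 g), a hypothetical bias of 20 g with limits of agreement of ±150 g would meet the criterion, whereas limits of ±190 g would not.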

For studies where agreement was not assessed, effect size of the different methods was determined by the r-squared value. A value below 0.5 was considered a poor predictive value, 0.5–0.7 as a moderate predictive value, and >0.7 was considered as good predictive value.14
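These cut-offs translate directly into a simple classification. The sketch below is illustrative only; the function name is hypothetical, and the handling of the boundary values (0.5 and 0.7 are both treated as moderate) follows the wording of the definition above.

```python
def classify_r_squared(r_squared):
    """Classify predictive value from an r-squared value using the review's cut-offs."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    if r_squared < 0.5:
        return "poor"
    if r_squared <= 0.7:
        return "moderate"
    return "good"
```

For instance, a model explaining 24% of the variance in an outcome (r-squared of 0.24) would be classified as having a poor predictive value.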

Data extraction

Two reviewers (D.F.J.Y. and D.d.J.) separately screened the studies, initially based on title and abstract, followed by full-text review of the relevant studies.

Data extraction was performed by these two reviewers. The data collected included the method and reference method, details on the study setting, methods, and results. In case of any discrepancies between the two reviewers, the two reviewers came to agreement through discussion.

Reviewers were not blinded to authors or journal details. Where needed, authors were contacted for clarification or additional information.

Risk of bias (quality) assessment

Two reviewers (D.F.J.Y. and D.d.J.) primarily assessed bias and the quality of individual studies using the Critical Appraisal Skills Programme (CASP) checklists.15 In case of any discrepancies, the two reviewers discussed them and sought the expert opinion of the two other reviewers until agreement was reached. The synthesis was based on the final decision agreed upon by all reviewers. In addition, the Oxford Centre for Evidence-based Medicine’s Levels of Evidence was used to grade the level of evidence of each manuscript.16

Strategy for data synthesis

A narrative synthesis was primarily done by two researchers (D.F.J.Y. and M.M.v.W.) and was reviewed by D.d.J., J.C.F.K., and H.N.L. before finalization.

Results

Out of the 1884 identified records, 48 full-text articles were assessed for eligibility and 19 were included in this synthesis (Fig. 1). Nine studies (n = 1539) reported on the predictive value or validity of body proportionality measures. Five studies (n = 319) investigated the validity of bioelectrical impedance analysis (BIA), three studies (n = 90) investigated the validity of SFT, two studies (n = 24) investigated the validity of ADP, one study (n = 63) investigated the predictive value of ultrasound, and one study (n = 15) investigated the validity of MRI. No human studies reported on the validation of DXA or isotope dilution in preterm infants (Table 3). Body composition measurements were performed at various postnatal ages, ranging from 24 h postpartum to 4 months corrected age.

Fig. 1: PRISMA Flow diagram of included studies.

Table 3 Body proportionality calculations to estimate body composition in preterm infants.

Body proportionality measures

Table 3 shows our findings on the predictive value and validity of body proportionality measures. Weight and length indices had a moderate to good predictive value for fat-free mass (in grams). In contrast, the predictive value of weight and length indices for fat mass was poor to moderate, and fat mass percentage was poorly predicted by weight/length indices.17,18,19,20,21

Larcade et al. and Simon et al. assessed predictive equations with clinical parameters, such as caloric and macronutrient intake, and z-scores for weight, length and head circumference.22,23 They found that fat-free mass (g) could be predicted by the amount of human milk feeding, respiratory support, antenatal corticosteroid use, growth parameters and sex. A newly modeled equation by Larcade et al. showed good agreement for fat free mass (g). However, Larcade and colleagues did not assess fat mass percentage, while Simon and colleagues could only explain 24% of the variance in fat mass percentage with their predictive model22,23 (see Table 3).

Daly-Wolfe and colleagues reported that mid-arm circumference had a moderate predictive value for fat mass percentage measured by ADP24 (see Table 3). On the other hand, Koo et al. measured chest, abdomen, mid-thigh, and mid-arm circumference. After sex, race, gestation, weight, and length were included in a predictive equation for fat mass (g) and fat free mass (g), these body and limb circumferences did not explain any additional part of the variance in fat (free) mass (g).17 Of note, these findings were based on the entire study population, which included both term and preterm infants (n = 68 and n = 52, respectively) who were large, appropriate, or small for gestational age. Pereira-Da-Silva and colleagues, however, investigated a group of exclusively preterm infants and reported that mid-arm circumference had a poor predictive value for arm fat area (mm²) measured by MRI25 (see Table 3).

SFT measurements

Table 4 describes the predictive value of SFT. Schmelzle et al. showed that the sum of SFT measured at four sites had a good predictive value for fat mass (g) in a study population that included both preterm as well as term infants.26 However, only 10 out of 104 infants in this study population were born preterm.

Table 4 Skinfold thickness measurements to estimate body composition in preterm infants.

Koo and colleagues also assessed the predictive value of SFT in a mixed population of term and preterm infants (respectively, n = 68 and n = 52). They reported that SFT, when added to weight and length, explained an additional 13% of the variance in fat free mass (g).17 Thus, SFT had a poor predictive value for fat free mass (g).

Schmelzle et al. as well as Koo et al. did not assess the predictive value of SFT for fat (free) mass percentage. In contrast, Daly-Wolfe et al. did and reported a poor predictive value of SFT for fat mass percentage (see Table 4).24

Bioelectrical impedance measurements

Table 5 describes the predictive value and validity of bioelectrical impedance measurements. The impedance index (height² in cm²/impedance in Ω) measured with BIA adds little to the variance in fat-free mass already explained by weight27 (see Table 5). Indeed, Raghavan et al. reported that the least bias was obtained when weight alone was used to estimate TBW.28 Table 5 shows that models used to estimate body composition based on the impedance index alone showed poor agreement. In contrast, the predictive equation by Dung et al. based on weight and the impedance index showed good agreement27 (see Table 5).
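For readers unfamiliar with the impedance index, the following minimal Python sketch shows how it is calculated from the definition given above (squared height in cm² divided by impedance in Ω). The function name is hypothetical, and the sketch does not reproduce any of the predictive equations from the included studies.

```python
def impedance_index(height_cm, impedance_ohm):
    """Return the impedance index in cm^2 per ohm (height squared / impedance)."""
    if height_cm <= 0:
        raise ValueError("height_cm must be positive")
    if impedance_ohm <= 0:
        raise ValueError("impedance_ohm must be positive")
    return height_cm ** 2 / impedance_ohm
```

As a purely hypothetical example, an infant with a length of 45 cm and a measured impedance of 600 Ω would have an impedance index of 2025/600 cm²/Ω.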

Table 5 Bioelectrical impedance to estimate body composition in preterm infants.

Ultrasound measurements

Table 6 describes the predictive value of ultrasound measurements. Ultrasound measurements of muscle and fat mass showed high reliability (intraclass correlation coefficient 0.874–0.975; technical error of measurement 0.251–0.628 mm), but had a poor predictive value for fat mass percentage measured by ADP29 (see Table 6).

Table 6 Ultrasound to estimate body composition in preterm infants.

MRI measurements

Table 7 describes the predictive value of MRI. Dyke and colleagues assessed the accuracy of body composition measured with rapid whole-body MRI. Repeated scans showed good agreement of fat mass percentage (95% limits of agreement 1.3%).30 However, body composition measurement was not compared to a reference method (see Table 7).

Table 7 Magnetic resonance imaging to estimate body composition in preterm infants.

ADP measurements

Table 8 describes the validity of ADP. Compared to isotope dilution, ADP showed good agreement when measuring fat free mass density.31 Forsum et al. and Roggero et al. demonstrated that there was a small bias in the measurement of fat mass percentage.31,32 Nevertheless, the limits of agreement were relatively wide and thus there was poor agreement between fat mass percentage measured with ADP compared to fat mass percentage measured by isotope dilution (see Table 8).

Table 8 Air displacement plethysmography to estimate body composition in preterm infants.

Discussion

Numerous studies have addressed how to measure body composition in preterm infants.17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35 Indeed, there is an urgent need to monitor body composition in this population given the less favorable body composition in infancy and childhood as well as the increased risk of adverse cardiometabolic outcomes in later life.1,2,3,4,5,6,7 Nevertheless, to date, there is no consensus on which method should preferentially be used to assess body composition in preterm infants.

Reference methods

The studies included in this review used ADP, DXA, isotope dilution, and MRI as reference methods. In our opinion, ADP, DXA, and isotope dilution are acceptable reference methods.

ADP, DXA, and isotope dilution have been validated against chemical carcass analysis in piglets.36,37,38,39,40 The body composition of piglets is considered to be comparable to the body composition of preterm infants.37 Therefore, in practice, these methods are accepted as accurate measures. Nevertheless, we should take into consideration that there is variation within and between these methods. For example, different types of software are used to analyze DXA. It has been reported that pediatric and infant software rely on different assumptions and yield varying results.41 Furthermore, as discussed below (under the accuracy of ADP), the statistical agreement between ADP and isotope dilution may be interpreted as poor.31,32 Therefore, we believe that some reservation is needed when comparing different reference methods.

In contrast to ADP, DXA, and isotope dilution, MRI has not been validated against chemical carcass analysis in subjects comparable to the neonatal population, but it has been found accurate in adult human cadaver and animal studies.42,43 Over the years, MRI has been increasingly used to measure body composition in the neonatal population as well.30,44,45,46 However, it is yet to be widely implemented and we consider it premature to use MRI as a reference method in comparative studies with preterm infants.

In our opinion, due to the lack of a gold standard and the difference between reference methods, some reservation is needed when drawing conclusions on the validation studies included in this review.

Assessment of validity

To assess whether two methods agree, Bland–Altman analyses are an accepted and widely implemented method.47 Agreement should be based on a maximum difference between the two methods that is clinically acceptable.47,48,49,50 Nevertheless, this so-called predefined clinical agreement limit was omitted in all the studies included in this review. Hence, the interpretation of these studies is limited.
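To make the Bland–Altman approach concrete, the sketch below computes the bias and the 95% limits of agreement from paired measurements and then compares them with a predefined clinical agreement limit. It is a minimal illustration in Python under stated assumptions: the function names are hypothetical, the limits are taken as bias ± 1.96 standard deviations of the differences, and the differences are assumed to be approximately normally distributed without a trend across the measurement range.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    if len(method_a) < 2:
        raise ValueError("at least two pairs are needed")
    # Differences between the two methods for each subject.
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    # 95% limits of agreement: bias +/- 1.96 * sample SD of the differences.
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


def within_clinical_limit(lower, upper, clinical_limit):
    """True when both limits of agreement fall inside +/- clinical_limit."""
    return -clinical_limit <= lower and upper <= clinical_limit
```

The second function makes explicit why the clinical limit has to be chosen in advance: the same limits of agreement can be judged acceptable or unacceptable depending on the threshold that is applied.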

Accuracy of body proportionality measures

Studies conducted so far have mainly assessed the predictive value of body proportionality measures.17,18,20,21 Two studies assessed the agreement between ADP and a predictive equation including weight and clinical parameters.19,22 Liotto and colleagues found poor agreement between ADP and their predictive equation which estimated fat (free) mass adjusted by length (g/cm).19 However, they included both preterm and term infants in their analysis, which makes it difficult to extrapolate their findings to only preterm infants—the target population of this review. Larcade et al.22 investigated a study population of exclusively preterm infants and could not validate the predictive equation for fat free mass (g) made by Simon and colleagues.23 A difference in nutritional care and ensuing better growth in Larcade’s population may have been the cause of an underestimation of fat free mass by the previously modeled equation.

In our opinion, it is difficult to develop a predictive equation that can be validated externally. Just as Larcade et al. found nutritional practices to influence the prediction model, changes in neonatal care over time and across neonatal intensive care units (NICUs) influence the predictive equations and limit their universal application. Moreover, investigators have used mixed study populations which include small, appropriate as well as large for gestational age infants.17 Meanwhile, Koo et al. demonstrated that associations between weight/length indices and body composition differ for those born large for gestational age, which makes the use of a mixed study population inappropriate.17

In addition, it is important to note that predictive equations generally found that a large proportion of the variance in fat (free) mass (g) could be explained by weight or BMI. This logically follows from the fact that fat mass (g) and fat free mass (g) together make up total body weight. However, fat mass percentage was poorly explained by weight or length indices. Meanwhile, in our opinion, fat (free) mass percentage may be a more relevant parameter when it comes to comparing the body composition of an individual or groups because it takes the subject’s weight into account.
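The distinction between absolute mass and percentage can be illustrated with a short calculation. The Python sketch below is illustrative only; the function name is hypothetical and it relies solely on the two-compartment relation stated above (fat mass plus fat free mass equals total body weight).

```python
def fat_mass_percentage(fat_mass_g, fat_free_mass_g):
    """Return fat mass as a percentage of total body weight (two-compartment model)."""
    if fat_mass_g < 0 or fat_free_mass_g < 0:
        raise ValueError("masses must be non-negative")
    total_weight_g = fat_mass_g + fat_free_mass_g
    if total_weight_g == 0:
        raise ValueError("total body weight must be positive")
    return 100.0 * fat_mass_g / total_weight_g
```

As a hypothetical example, an infant weighing 2000 g with 200 g of fat and an infant weighing 3000 g with 300 g of fat differ by 100 g in fat mass and 900 g in fat free mass, yet both have a fat mass percentage of 10%. Absolute masses therefore track body weight closely, whereas the percentage does not.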

All in all, we conclude that the predictive equations based on weight and length indices currently cannot be implemented in clinical practice, because of the lack of external validation and a poor predictive value for fat (free) mass percentage.

Daly-Wolfe and colleagues found that mid-arm circumference had a moderate predictive value for fat mass percentage measured with ADP.24 Koo et al., on the other hand, found that mid-arm circumference together with chest, abdomen, and mid-thigh circumference, added <5% to the variance in fat mass percentage already explained by weight and length. In their study, however, fat mass was determined by DXA. Pereira-Da-Silva and colleagues compared upper arm anthropometry to regional fat mass measured with MRI and found it to be an inaccurate predictor of regional body composition.25 In our view, currently there is inconclusive evidence on the predictive value of body area circumferences and more research is needed to assess the potential of mid-arm circumference as a predictor of whole body or regional fat mass.

Accuracy of skinfold measurements

Several studies have assessed SFT in preterm infants.24,51,52,53,54,55,56 For example, Daly-Wolfe and colleagues investigated the predictive value of SFT and found that, together with the mid-arm circumference, SFT explained 49% of the variance in fat mass percentage.24 However, only one study included in the review assessed the validity of SFT in preterm infants.26 Unfortunately, only 10 late preterm infants were included in this study and the analysis included term infants as well, so no robust conclusions can be drawn from their findings.26 Recently, we found poor agreement between SFT and body composition in a study with exclusively preterm infants.56

Moreover, SFT is influenced by fluid status and there is a high interobserver variability.13 In addition, it could be deemed controversial to use SFT calipers in extremely preterm infants in light of their vulnerable skin. Therefore, there is insufficient evidence to support SFT as a clinically useful measure of fat mass in preterm infants at this time.

Accuracy of bioelectrical impedance

Both BIA and bioelectrical impedance spectroscopy have a poor predictive value for TBW measured with isotope dilution analyses28,35,57 and body composition measured with DXA.27 BIA did not seem to provide an additive predictive effect for fat-free mass or TBW, compared to body weight alone.27,28 Though both Kushner and colleagues as well as Tang and colleagues found the impedance index (cm²/Ω) to significantly improve the prediction of TBW, the majority of variance in TBW measured with isotope dilution was still explained by weight.33,57 Kushner and colleagues concluded that the impedance index (cm²/Ω) explained 99% of the variance in TBW. However, in their subgroup of preterm infants there was a significant bias: a higher variance was found for higher values of the impedance index. This bias was only eliminated by the addition of weight to the prediction equation,33 implying that bioelectrical impedance on its own is not an adequate predictor of body composition in preterm infants.

Accuracy of ultrasound

Ahmad et al. previously demonstrated that ultrasound measurements correlate with fat mass in preterm infants.58 Depending on the site of ultrasound measurement, the intraobserver variability was reported to be up to 14.7% in preterm infants.59 To our knowledge, interobserver variability has not been investigated in preterm infants, but high interobserver agreement (0.89–0.95) has been reported in 1- and 2-year-olds.60 Nevertheless, we only found one study which assessed the predictive value of ultrasound measurement for body composition in preterm infants, and it found a poor predictive value.29 Others did report on ultrasound measurements as a means to estimate body composition in preterm infants, but they did not assess the predictive value or validity.58,59 Hence, in our opinion, there seems to be insufficient evidence for the use of ultrasound as a reference method for body composition. However, since ultrasound measurements showed high reliability, it may be of interest to investigate whether other body sites are a better representation of body composition.

Accuracy of MRI

Though several authors suggested the use of MRI to measure body composition in preterm infants,44,61 only one study was found that assessed the validity of MRI for the assessment of body composition in a small study population.30 Despite a sound assessment of repeatability, the actual fat mass (g) measurement was not compared with other techniques. Therefore, it should be concluded that more studies are necessary to draw conclusions on the use of MRI in determining fat mass in preterm infants.

Accuracy of ADP

Roggero and colleagues found a small bias when comparing ADP to isotope dilution.32 However, the limits of agreement were relatively wide, resulting in poor agreement between the two methods. Of note, the accuracy of ADP was only assessed in a small subgroup of 10 preterm infants. In line with their findings, Forsum et al. also found a small bias when comparing ADP with deuterium dilution in 14 preterm infants.31 Fat mass percentage, however, had relatively wide limits of agreement and thus poor agreement. Fat free mass density (g/ml), in contrast, agreed well. Precision was studied by Roggero in a larger group of 57 preterm infants and also showed a small bias for fat mass percentage between repeated ADP measurements.32 Despite wide limits of agreement, the authors concluded that ADP shows good agreement with isotope dilution for fat mass percentage as well as fat free mass density.31,32 Likewise, carcass analyses showed small bias, with relatively wide limits of agreement.36 Nevertheless, it is generally accepted that ADP is a reliable method in infants. Taking into account the relatively small study populations and small bias, it is to be expected that larger studies would yield better agreement. Therefore, we conclude that ADP is a reliable method to assess fat mass in preterm infants.

Accuracy of isotope dilution

Isotope dilution is a well-established method for the measurement of TBW, from which the fat free mass can be derived.62 As a result, there were no studies validating isotope dilution against another reference method, such as DXA, in preterm infants. Hartnoll and colleagues, however, did show results similar to those of late nineteenth-century cadaver studies, even though no comparative analysis was done.63 Thus, no conclusion can be drawn from human studies in preterm infants, but the method is deemed reliable based on carcass analysis of piglets.37

Accuracy of DXA

There are no comparative studies with preterm infants where DXA was compared with other methods, such as isotope dilution. DXA has been validated in piglets38,40 and in practice is accepted as an accurate measure. Nevertheless, in human as well as in animal studies DXA has been reported to overestimate fat mass, especially in lower weights.40,64 Moreover, different software algorithms yield varying results in body composition.41

It would be insightful to investigate the agreement between isotope dilution, ADP, and DXA in preterm infants. This would help to give us guidance on which method should be preferentially used to assess body composition in preterm infants.

Patient-friendliness, ease of use, and costs

Methods are chosen based on local experience and available resources. In a clinical setting, there is a preference for quick, easy-to-use, but accurate methods, which could be used at the bedside. In contrast, in a research setting there is more room for less flexible methods.

Body proportionality measures are quick, low-cost, and minimally invasive methods. They are ideal in the intensive care setting as well as in the outpatient department for follow-up. Unfortunately, studies so far have not confirmed that these methods have sufficient accuracy. Likewise, SFT and bioelectrical impedance techniques are easy bedside methods, but robust evidence supporting their use is lacking. Moreover, there is a high interobserver variability, and in extremely preterm infants SFT measurements should be taken with caution to prevent injury of their vulnerable skin. Nevertheless, when used with caution SFT is a safe, non-invasive method.10

DXA, ADP, and isotope dilution are accurate methods but have some practical downsides. For DXA, infants need to be clinically stable and free from respiratory support and monitoring, making DXA more appropriate from term age onwards. Taking into account that movement is not allowed, it could be used in infants up to 6 months corrected age who can be swaddled or nursed to sleep during the procedure.

ADP could be used in the NICU if the infant is clinically stable and not on respiratory support. Nonetheless, the machinery is bulky and recalibration is needed after movement, making it less suitable for such use.

In contrast, isotope dilution can be used in small infants who are not yet stable, which would make it very suitable for use in the NICU were it not for its relatively high workload. In addition, oral administration of the isotope solution is challenging in older infants.

MRI is yet to become a widely used method to measure fat mass in preterm infants, but seems promising with high precision. Furthermore, MRI is a safe, radiation-free method. Nevertheless, infants on respiratory support and monitoring cannot easily undergo an MRI procedure. All in all, at this time there is no prospect of an accurate, easy, low-cost point-of-care instrument that could be used during the NICU stay or the follow-up period in a clinical setting. In a research setting, ADP may be the most practical, yet reliable, method to use in infants up to 6 months corrected age.

Limitations

This review included all potential reference methods, which made it challenging to come to a concise conclusion. Furthermore, it is important to note that, overall, validation studies were conducted in a limited number of study subjects with a wide range of gestational ages and varying postnatal ages at the time of assessment. Hence, it was even more difficult to draw definitive conclusions on the assessment of body composition at different gestational and postnatal ages. Moreover, the reference methods used in the various studies have not been validated in humans or were only validated in a small number of subjects. This lack of a solid gold standard further undermines any conclusions drawn from these studies. In addition, only formally published data were included, leaving potential publication bias unassessed. There is a need for larger cross-sectional studies comparing these instruments at different time points as well as longitudinal studies investigating the accuracy of the use of the instruments over time.

Conclusions

Monitoring body composition remains important in the light of the increased cardiometabolic disease risk in adults born prematurely.1,2,3,4,5,6,7 Therefore, the quest for not only accurate but also practical methods to assess body composition should continue.

This review reaffirmed that weight and length indices, body area circumferences, SFT, BIA, and ultrasound do not adequately reflect body composition. MRI looks promising for the use in preterm infants but has not been validated for the measurement of body composition. On the other hand, DXA, ADP, and isotope dilution methods are considered trustworthy and validated techniques. Nevertheless, this review showed that these methods may not yield comparable results. Therefore, caution should be taken when comparing body composition measured with different methods. Moreover, to facilitate future studies and support clinical practice it would be valuable for researchers and physicians to come to an agreement on which reference should preferentially be used to measure body composition in preterm infants.