Main

Dual-energy X-ray absorptiometry (DXA), a technique that allows investigators to measure bone mineral and soft tissue in vivo, has come into common use for medical research and clinical evaluation (1,2). In body composition research, DXA is used for scanning the entire human body to obtain values for three composition compartments: bone mineral content (BMC), lean soft tissue, and fat mass. Subjects studied using DXA have ranged from premature infants to obese adults (3,4). Various scan modes are available, tailored to the size of the subject. For small subjects such as neonates, investigators normally use an infant scan mode, which has a finer resolution than that of the adult whole-body scan mode.

In pediatrics, one of the trends raising interest in DXA technology is the increased incidence of childhood fractures and concerns over early bone health (5). Because early growth affects health later in life, early testing and monitoring may be wise in certain cases (6,7,8). One may apply similar logic to the current obesity epidemic, where early intervention may be needed to moderate future health issues (9). These observations increase the demand for reliable diagnostic tools.

Custom-made standards called phantoms are usually required for evaluating DXA scanning methodology more accurately (10). Phantoms are necessary for system calibration, precision testing, quality control (QC), cross-calibration, and stability evaluation, all of which are necessary for reliable instrument performance (11,12). For bone studies, for example, Emaus et al. (13) found that not only was the accuracy of bone loss estimates affected by densitometer performance, but that a calibration phantom itself could influence the quality of the results. DXA measurement results are also highly dependent on the manufacturer (14). Results can even vary within a single manufacturer. Different models and software versions can produce significantly different results (15,16).

The DXA instruments currently used most often in pediatric populations are manufactured by Hologic (Bedford, MA). Hologic has commercially available phantoms representing the lumbar spine, proximal femur, and adult whole body. Other phantoms, such as the European Spine Phantom (QRM, Moehrendorf, Germany), are available from other independent manufacturers (17). There are no commercially available phantoms specifically for use in infant whole-body scans.

Earlier research efforts in the development of infant DXA phantoms by Shypailo et al. (18) used sealed, fluid-filled plastic bags for simulating roughly anthropomorphic infant shapes. Although the results were successful in demonstrating a discrepancy between adult and infant scan modes for BMC, the phantoms themselves were fragile and difficult to construct. Later work by Hammami et al. (19) produced infant phantoms made from liquid-filled plastic bottles. These phantoms were more durable but somewhat less anthropomorphic, and the long-term stability of the phantom liquids is unknown. The investigators scanned the phantoms on different instruments. Individual results appeared to differ between the instruments, although statistical significance was not reported. Picaud et al. (20) presented the first all-solid phantom. A similar discrepancy between adult and infant scan modes was reported. Although solid, the phantoms were not anthropomorphic but, instead, were rectangular slabs.

None of the previous attempts at constructing suitable infant DXA phantoms were ideal; each lacked long-term stability, anthropomorphism, or ease of manufacture and transport. These requirements are important. Long-term stability of research instruments must be documented. Semi-anthropomorphic phantoms mimic in vivo results better than their nonanthropomorphic counterparts do (13). Any phantoms used for cross-calibration studies must be sturdy and easily transported (21).

Our goal was to develop an all-solid infant phantom made of readily obtainable materials. The phantom should be durable, roughly anthropomorphic, and transportable. The phantom’s dimensions (e.g., length, circumference) should represent an infant at a particular age, with body compartment values (i.e., lean tissue, fat, and bone) that fall within the expected range for that infant. Our intention was to construct a prototype of the infant phantom and test it on our own in-house instruments to assess the suitability of the infant phantom design.

We also wished to test the phantom at other collaborating sites involved in the scanning of neonates. This would provide information on precision and compatibility of different DXA instruments and allow us to assess the degree of agreement of infant body composition data obtained at multiple centers involved in pediatric studies.

Results

Figure 1 displays the outline of the infant phantom and shows the outline as an overlay on a DXA infant scan of comparable size (i.e., ~7 kg). Dashed lines on the phantom schematic represent the polyvinyl chloride (PVC) bones inset within the phantom. The skull PVC spans the entire head section as a thin layer. Figure 2 shows a three-dimensional representation of the phantom.

Figure 1

Schematic outline of the infant phantom overlaid on a dual-energy X-ray absorptiometry scan of a comparably sized infant.


Figure 2

Three-dimensional infant phantom sketch showing layering of materials alongside a dual-energy X-ray absorptiometry image of the infant phantom showing inset polyvinyl chloride bones. Materials depicted in the sketch are high-density polyethylene (white), nylon (gray), and Nylatron MoS2 (black).


The three-dimensional sketch displays the layered materials used for constructing the phantom. Layers of high-density polyethylene (HDPE) make up the anterior and posterior surfaces of the phantom, surrounding the layers of nylon and Nylatron MoS2 (nylon containing particles of molybdenum disulfide (MoS2)). The inset PVC is not shown but can be seen on the DXA scan image.

Table 1 displays the initial precision testing results and also lists the actual gram values of the materials serving as proxies for the body compartments: PVC, HDPE, and nylon + MoS2 for BMC, fat, and lean tissue, respectively. These were the predicted target values we hoped to match with the scan results. Means and precisions (SD and coefficient of variation (percentage CV)) are listed for each compartment, as well as measured:predicted ratios indicating how closely the scan results matched the actual mass of each material.

Table 1 Infant phantom precision testing

Lean tissue compartment results showed the lowest percentage CV values (~2.5–3%) and were in closest agreement with predictions, coming within ~5% of the predicted values. Bone mineral results had slightly higher percentage CV values (~3.5%), and were about 12% lower than the predicted BMC based on PVC mass. The fat compartment showed the highest percentage CV values (~5.5–7.5%), and deviated most from the predicted values, overestimating the HDPE mass by ~30%. Overall, the percentage CV for the total phantom weight was very low (<0.2%).
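The precision quantities reported in Table 1 (mean, SD, percentage CV, and measured:predicted ratio) can be sketched in a few lines. The following is a minimal illustration only: the function name and the scan values are hypothetical placeholders rather than data from this study, and the use of the sample SD (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def precision_summary(measured_g, predicted_g):
    """Summarize repeat scans of one compartment.

    measured_g  -- list of repeat-scan results in grams
    predicted_g -- actual mass (g) of the tissue-substitute material
    """
    m = mean(measured_g)
    sd = stdev(measured_g)  # sample SD (n - 1 denominator); an assumption
    return {
        "mean": m,
        "sd": sd,
        "cv_percent": 100.0 * sd / m,   # coefficient of variation
        "ratio": m / predicted_g,       # measured:predicted ratio
    }


# Hypothetical repeat-scan values (g); NOT data from Table 1.
lean_scans = [5000.0, 5100.0, 4900.0, 5050.0, 4950.0]
summary = precision_summary(lean_scans, predicted_g=5200.0)
print(summary)
```

A ratio below 1 indicates that the scan underestimates the actual material mass, and a ratio above 1 indicates an overestimate.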

In almost all cases, long-term results were slightly less precise than short-term results. The exception was BMC, for which the long-term percentage CV was 0.2% lower. However, there were no significant differences between the long- and short-term results for any of the compartments (P > 0.05).

Figures 3 and 4 graphically display comparisons of phantom scanning results among the different sites as a series of box plots. Paired site labels (i.e., 1 and 1A; 3 and 3A; 4 and 4A) identify a single site housing more than one instrument. The mean at each site is identified by a ‘+’. The minimum, first quartile, median, third quartile, and maximum are also displayed for each site. The horizontal dotted lines represent ± 2 SD of the mean for site 1, used as the reference site (n = 30 runs). All other sites represent data from 10 phantom scans. The lowercase letters in parentheses below the site numbers identify site groups that do not differ significantly from each other based on multiple comparisons using the REGWQ method. For example, BMC results (Figure 3a) from sites 1A and 3 (group ‘b’) are not significantly different. Separate graphs are shown for bone and soft tissue.

Figure 3

Comparison box plot of (a) bone mineral content (BMC) results and (b) bone mineral density (BMD) results from multiple sites. The mean at each site is identified by a ‘+’ sign. The horizontal dotted lines represent ± 2 SD of the mean for site 1—the reference site. Lowercase letters in parentheses identify sites that do not differ within a group.


Figure 4

Comparison of (a) total body fat results and (b) lean tissue results from multiple sites. The horizontal dotted lines represent ± 2 SD of the mean for site 1—the reference site. Lowercase letters in parentheses identify sites that do not differ within a group.


Significant differences between sites occurred more often with bone compartment results (BMC and bone mineral density (BMD)). More post hoc groups were identified, and a greater number of points appeared outside the range established by the initial reference site. Results for total body fat and lean mass showed better agreement. Most sites were assigned to the same group and did not differ significantly from each other.
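The reference range used in Figures 3 and 4 (mean ± 2 SD of the site 1 scans) and the count of points falling outside it can be illustrated as follows. This is a sketch under stated assumptions: the function names and all scan values are hypothetical, and the sample SD is assumed.

```python
from statistics import mean, stdev


def reference_band(reference_scans, k=2.0):
    """Return (low, high) limits: mean +/- k * SD of the reference-site scans."""
    m = mean(reference_scans)
    sd = stdev(reference_scans)  # sample SD; an assumption
    return m - k * sd, m + k * sd


def count_outside(scans, band):
    """Count scans from another site that fall outside the reference band."""
    low, high = band
    return sum(1 for x in scans if x < low or x > high)


# Hypothetical values, NOT data from this study.
reference = [100.0, 102.0, 98.0, 101.0, 99.0]
other_site = [97.0, 104.0, 100.0, 103.5]

band = reference_band(reference)
print(band, count_outside(other_site, band))
```

A count like this is only a descriptive screen; it does not replace the post hoc group comparisons reported in the figures.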

Almost all of the results from each site followed a normal distribution. The only exceptions were site 7 BMC (P = 0.001); site 4 BMD (P = 0.041); site 2 fat and lean mass (P = 0.02); and site 3A fat and lean mass (P = 0.01). The results of the nonparametric Kruskal–Wallis tests agreed with those of one-way analysis of variance in determining that the scanning site effect was significant (P < 0.0001), and the same site groups were identified for each measured parameter. The results of analysis of variance are presented in Table 2.

Table 2 One-way ANOVA multisite comparison

In Table 2, the control or reference category is site 1. Overall, for all compartments, the scanning site had a significant effect on the measured parameter (F-test P < 0.05). Coefficients of determination (R2) were fairly high for BMC and BMD, indicating that the model explained a large percentage of the variability. R2 values were much lower for the fat and lean tissue compartments. Comparing each site with site 1, most sites differed significantly in BMC and BMD results, whereas the opposite was true for fat and lean mass results. Sites 1A, 3A, and 6 were the sites not in agreement with the reference site for fat and lean mass.

In addition to multisite comparisons, differences between the instrument models were assessed (Table 3). The model (Hologic Delphi vs. Hologic Discovery) significantly affected all compartments with the exception of BMD. BMC and fat results agreed within ~4% on the two instruments, with Discovery results lower in both cases. The lean mass results were in closest agreement (~2% difference), although the results were still significantly different.

Table 3 DXA instrument model comparison

Discussion

Earlier efforts applicable to our project gave little detail about actual phantom dimensions or masses, other than perhaps overall phantom weight (18,19,20). These studies also did not provide comparisons of absolute amounts of construction materials with measured quantitative results, which would be an indication of the success of the tissue substitutes used. None of the previous infant phantom attempts were truly anthropomorphic. While an exact infant replica would be difficult to achieve and cost prohibitive, our phantom design moves closer to emulating a basic infant shape.

The infant phantom scan results presented in this article are encouraging for two reasons: precision of scanning results and success of tissue substitute materials. We might have obtained better tissue surrogate matches, but this would have required us to create our own or alter existing materials, which would have violated our objective of using only commercially available common materials. As evidenced in Table 1, the combination of nylon materials emulating lean tissue came within ~5% of the expected values. The PVC results did not emulate BMC as effectively, but were still within ~12% of the predicted values. This was not unexpected because the relevant physical properties of PVC, namely density and mass attenuation (μ/ρ, an index of X-ray energy absorption and scattering), did not perfectly match those of bone (Table 4). For example, the density of PVC is only 70% of that of bone, and although μ/ρ is slightly higher by comparison, we expected that the measured BMC values would underestimate the predicted values. The HDPE was, surprisingly, the furthest away from matching the predicted values for fat, yet this is a commonly used DXA phantom construction material. Its density and mass attenuation coefficients are very close to those of adipose tissue (Table 4). It is important to note that DXA calculations of fat and lean mass are influenced by both fat and lean tissues in tandem. A given high- and low-energy pair of measured X-ray attenuation values is produced by a unique combination of fat and lean mass (22). Thus, phantom soft tissue results will be dependent on the performance of both materials, HDPE and nylon. An underestimation of the lean compartment will cause an overestimation of the fat compartment. Because lean tissue is the largest compartment (73% of body weight in the infant phantom), any error in lean mass measurement will cause a disproportionately greater error in fat measurement. This suggests that the fat compartment is the most difficult to measure. This supposition is confirmed in the sense that repeatability results were poorest for total body fat. Nevertheless, percentage CV values were still fairly low: ~6% for fat and fat percentage in the short term.
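The way a small lean mass error becomes a larger relative fat error can be illustrated with simple arithmetic. The sketch below is a deliberate simplification, not the DXA algorithm: it assumes that total soft tissue mass is conserved, so every gram missing from the lean estimate is assigned to fat. The function name and the fat mass are hypothetical; only the 7 kg phantom weight and the 73% lean fraction come from the text.

```python
def fat_error_from_lean_error(lean_g, fat_g, lean_rel_error):
    """Relative fat error when soft tissue mass is conserved.

    Assumption (simplification): a lean error in grams is offset by an
    equal and opposite fat error in grams.
    """
    lean_abs_error = lean_g * lean_rel_error  # grams, negative = underestimate
    fat_abs_error = -lean_abs_error           # grams shifted into fat
    return fat_abs_error / fat_g


total_g = 7000.0          # phantom weight (from the text)
lean_g = 0.73 * total_g   # lean is 73% of body weight (from the text)
fat_g = 1700.0            # HYPOTHETICAL fat mass, for illustration only

# A 5% underestimate of lean mass...
fat_rel_error = fat_error_from_lean_error(lean_g, fat_g, -0.05)
print(f"fat overestimated by {100 * fat_rel_error:.1f}%")
```

Under these assumptions, the relative fat error is the relative lean error scaled by the lean:fat mass ratio, which is why the smaller compartment absorbs the larger percentage error.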

Table 4 Candidate material characteristics: comparison with tissue to be emulated

Precision values in the longer term were slightly higher for most parameters (aside from BMC), although the differences were not significant. Other investigators have noted poorer long-term precision (23); thus, the 40-d time period may have been too short for changes in instrument calibration to appear. Any changes in performance would be attributable to instrument drift or maintenance issues because the phantom itself is an unchanging inanimate object and the materials are not expected to degrade.

When viewing infant phantom scan results from different research centers (Figures 3 and 4), one can see that in some cases, the repeated measures cluster together very closely (e.g., site 3 BMC; percentage CV = 1.6%). Other sites show a wider range of results, and thus poorer reproducibility, with no obvious explanation at this point (e.g., site 4A BMC; percentage CV = 5.3%). Poor precision may indicate a need for system maintenance or recalibration. We did not have data on any site’s maintenance schedule or service reports; therefore, we cannot speculate on a particular instrument’s condition. It also seems evident that when replacing a DXA instrument, as was the case with two of the sites, some calibration and correction may be required to bring new and old results into closer agreement. As shown in Figure 3, BMC and BMD results between the old and new instruments at sites 3 and 4 (i.e., 3 vs. 3A and 4 vs. 4A) were significantly different. Continuity of measurement results is important, and the International Society for Clinical Densitometry recommends cross-calibration using phantoms following hardware changes (11). Hologic uses multiple spine phantom scans for calibrating replacement systems during system upgrades to bring new instruments into agreement with older systems. Yet, agreement within a single scan mode (spine) does not appear to guarantee agreement within other scan modes such as infant or adult whole body, which may affect result reliability, especially for follow-up visits done on different machines.

When reviewing daily QC procedures that use a standard Hologic spine phantom, one expects BMD percentage CV to be <0.6%. Our in-house Hologic adult whole-body phantom shows a higher percentage CV, but it is still <2% (data not shown). Our infant phantom precision data show percentage CV for bone area of <2%, whereas percentage CV for BMC and BMD are closer to 4% (Table 1). Because of the lack of a validated phantom recommended specifically for the infant scan mode, we do not know whether this relatively poorer repeatability is due to the nature of the infant scan mode procedure or due to flaws in our phantom design. Infant scan precision data in the literature are sparse, with most findings based on repeat scans of piglets. Precision values noted by other investigators include percentage CV for BMC of 2.0–2.8%, BMD of 1.5–2.5%, and lean mass of 1.3–2.5%, with fat consistently showing the poorest repeatability with values approaching 6.9% (24,25,26,27). These value ranges are not far removed from the percentage CV data from our infant phantom. The poorer precision of certain phantom or scan types vs. the spine phantom highlights the differences between the performances of the various DXA scan modes. The results from our infant phantom seem to support this observation.

Recent work by Fields et al. (28) compared infant body composition results from DXA with air displacement plethysmography (Pea Pod, COSMED USA, Concord, CA). Although results were strongly correlated, there was a significant disparity in absolute values of fat, percentage fat, and fat-free mass between the two techniques. This becomes critical if the two techniques are to be used interchangeably. This circumstance is made plausible by the fact that the Pea Pod technique, validated for infants weighing between 1.5 and 8 kg (29,30), is limited to infants below ~6 mo of age. Thus, any longitudinal body composition study working with infants beyond an age of 6 mo may depend heavily on the use of DXA. The long-term repeatability results presented in this article (Table 1) are encouraging in the sense that results from a single instrument did not vary to a great extent. Thus, as long as a longitudinal study utilizes the same instrument throughout, results may indeed be dependable.

Longitudinal assessment, just as with a single time point, also often requires interpretation of Z-scores, which is not possible at this time for infant DXA scanning (5). More data are still needed to produce valid normative references, something more readily achieved by a multicenter data pooling approach. However, to combine results from various instruments and software versions, some sort of cross-calibration appears necessary. In view of the multisite infant phantom results presented in this article, DXA use may be limited in multicenter studies without a reliable cross-calibration procedure. One solution is to use an infant phantom. Adult-based phantoms exist for DXA scans of the lumbar spine and hip, and scan precision and validation are considered critical aspects of result interpretation. The same is needed for infant DXA scans, if results are to be accepted as accurate.

For the QC or cross-calibration to be truly useful, a range of relative and absolute values should be measured (31). Changes in measurement results due to machine maintenance or long-term instability may be influenced by density values, thus a single infant phantom may be insufficient for instrument calibration. Our future efforts will therefore involve the construction and evaluation of additional infant phantoms varying in size and composition. We also intend to scan the present phantom on instruments made by other manufacturers; otherwise, the results presented here may only benefit those centers using Hologic instruments.

Another limitation worth noting is that these data do not take into account certain factors often associated with the scanning of human neonates, such as movement artifacts, swaddling blankets or restraints, positioning issues, etc. These factors may also influence scanning precision and are difficult to correct for using an inanimate phantom.

Growth and development in early life are strongly associated with life-long health, and adverse influences can persist and manifest later in life (6,7,8). In the study of bone densitometry, most disorders associated with increased fracture risks, as identified by the International Society for Clinical Densitometry, manifest in infancy, if not in utero (32). Primary bone disorders such as osteogenesis imperfecta and juvenile osteoporosis, along with secondary diseases affecting the skeleton, such as cystic fibrosis and cerebral palsy, all affect early growth and development. A recent review by Bachrach et al. (5), however, indicated that there are insufficient data to recommend routine densitometry for infants in assessing bone fragility. This may only be remedied by a more thorough review of infant scanning technology and more stringent requirements for QC and calibration, as has been done for other scan modalities such as the lumbar spine. In addition, many studies rely on DXA to provide reference values for validation of alternate measurement techniques or the development of prediction equations (33,34). In these instances, the reference standard (DXA in this case) should be accurate and reliable for the study outcomes to be valid.

This infant phantom design, composed of PVC, nylon + MoS2, and HDPE as tissue substitutes representing BMC, lean tissue, and fat respectively, came fairly close to emulating those body compartments, judging by a series of DXA scans. Precision values from DXA instruments at several sites showed good reproducibility, which is a validation of the phantom itself as well as an assessment of the individual instruments. Significant differences were found between the results from the various DXA instruments and also between results from two different DXA models from the same manufacturer (Hologic Delphi vs. Discovery). Thus, cross-calibration appears to be a requirement for any multicenter study involving DXA infant scanning. Any DXA site using the infant whole-body scan mode should incorporate a QC phantom specific to that scan mode into its normal routine to evaluate machine precision and long-term stability.

Methods

This study was approved by the Institutional Review Board of Baylor College of Medicine, Houston, TX. There were no human or animal subjects involved in the study.

The choice of tissue-equivalent materials used for representing lean tissue, fat, and bone is critical to the creation of a successful DXA phantom design. An effective tissue substitute must closely match the tissue it simulates in certain physical properties, such as effective atomic number and material density (35). Therefore, for a given thickness or mass of tissue substitute, radiation absorption and scattering of photons should be similar to that found in the tissue itself (36). The suitability of any proposed tissue-equivalent materials must then be verified empirically (36). On the basis of these parameters, our phantom design process involved the following steps: (i) selection of materials, (ii) testing of materials, (iii) phantom design, and (iv) phantom construction, leading to actual precision and cross-calibration testing.

Materials/Properties

An extensive list of potential tissue-equivalent materials can be found in an article by White (36). Report 44 from the International Commission on Radiation Units and Measurements provides additional information, with detailed interaction data for body tissues and tissue substitutes (37). The potential material candidates are many and varied, but not all are practical, cost effective, or available. We did not wish to work with overly sophisticated polymers, or epoxy resins that required special processing or additives and instead focused on readily available plastics and metals that we could easily machine and shape. We tested many materials to identify the ones that would empirically match expected values based on mass. For example, 100 g of material meant to represent bone should ideally produce a BMC of 100 g on a DXA scan.

On the basis of the closest agreement in density, μ/ρ, and our preliminary empirical testing, we chose to use PVC to represent bone and HDPE to represent fat. None of the candidate lean tissue materials we tested were ideal individually, but a combination of nylon and Nylatron MoS2 mimicked the lean compartment quite well.

Table 4 lists the important material characteristics along with comparisons with their target tissues. The μ/ρ terms shown are for 60 and 100 keV. The Hologic infant DXA scan nominally uses two X-ray energies: 100 and 140 kVp, representing potential maximum voltages. We do not know the exact energy distribution of the actual X-rays produced during a scan but expect that the peaks will be somewhat less than the listed kVp values (38). Thus, the values 60 and 100 keV are presented as estimates for comparison. It is important to note that a tissue-equivalent material must match the tissue it is emulating for photon attenuation over a broad energy range, not just at a specific energy. This is shown in Figure 5, which also illustrates how the combination of two materials (nylon and MoS2 in this case) together produces a combined attenuation profile. The combined mass attenuation coefficients are derived using the basic data method as outlined by White, which calculates a weighted mean based on the weight fractions of the compounds in the mix (36). Neither nylon nor Nylatron MoS2 matches the mass attenuation profile of muscle over the relevant energy range, but a 1:1 mixture of these two materials, represented by the large open circles in Figure 5, matches the target muscle attenuation almost perfectly.
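The weighted-mean calculation described above can be sketched as follows. This is an illustrative reading of the mixture rule (the combined μ/ρ is the sum of each component's μ/ρ multiplied by its weight fraction), not the authors' actual computation. The function name is hypothetical, and the coefficient values are placeholders rather than values from Table 4 or Figure 5.

```python
def mixture_mass_attenuation(weight_fractions, coefficients):
    """Weighted mean of mass attenuation coefficients at one photon energy.

    weight_fractions -- fraction by weight of each component (must sum to 1)
    coefficients     -- mu/rho (cm^2/g) of each component at that energy
    """
    if len(weight_fractions) != len(coefficients):
        raise ValueError("one coefficient is needed per component")
    if abs(sum(weight_fractions) - 1.0) > 1e-9:
        raise ValueError("weight fractions must sum to 1")
    return sum(w * c for w, c in zip(weight_fractions, coefficients))


# PLACEHOLDER coefficients (cm^2/g) at a single energy; not measured values.
mu_rho_nylon = 0.19
mu_rho_nylatron = 0.23

# 1:1 mixture by weight of nylon and Nylatron MoS2.
mu_rho_mix = mixture_mass_attenuation([0.5, 0.5], [mu_rho_nylon, mu_rho_nylatron])
print(mu_rho_mix)
```

Repeating the calculation at each tabulated energy would give a combined attenuation profile that can be compared with the target tissue across the energy range.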

Figure 5

Mass attenuation coefficients for Nylatron MoS2 (black squares), muscle (open diamonds), nylon (black triangles), and nylon + MoS2 (open circles), representing lean tissue, over a wide energy range.


Phantom Dimensions/Construction

The physical characteristics of the infant phantom were based largely on anthropometric data from the Reference Man publication (39) and on representative DXA images from subjects involved in previous studies at our center. In general, the reference data provided head, trunk, arm, and leg circumferences and diameters. We calculated most lengths from DXA images, except for trunk and total, which were available from the Reference Man publication. Most of the bone measurements—lengths and diameters—were based on DXA images.

The final infant phantom design, composed of PVC, HDPE, and nylon materials representing bone, fat, and lean tissues, respectively, is representative of an average-sized (7 kg, 65 cm) normal infant at about 5 mo of age.

Phantom Test Runs

In accordance with recommended guidelines (11), the phantom was scanned on one instrument 30 times over 3 d (10/d), and precision values (SD and percentage CV) were calculated. The phantom was not repositioned between scans; thus, precision results were influenced by the phantom only, not by movement or positioning inconsistencies. All scans were analyzed using Hologic V12.1. These scans served both as a precision/sensitivity assessment for our first DXA instrument and as a reference data set for comparison with other instruments. These scans also provided data allowing us to see how well the tissue-equivalent materials performed compared with expected values.

We evaluated longer-term precision by scanning the phantom 30 times on our first DXA over a period of 40 d. We compared the long- and short-term precision values for evaluating machine stability.

We scanned the phantom 10 times on our second DXA (site 1A), then shipped it to six additional research sites throughout the USA (sites 2–7). Two of those sites had recently purchased a new instrument and had two instruments available; thus, the phantom was scanned on both machines. The phantom was scanned 10 times, usually in 1 d, at each of the six sites. All of the instruments were either a Hologic Delphi or Discovery model. All of the scans were analyzed at our center using Hologic Analysis Version 12.1 software. The data sets from each site were tested for normality (Shapiro–Wilk test). Analysis of variance was used to review differences between individual sites. Post hoc analysis using the REGWQ method was done for testing multiple intersite comparisons. Additional nonparametric tests (Kruskal–Wallis, including multiple pairwise comparisons with the Bonferroni correction applied) were carried out for verification of the analysis of variance to account for any nonnormal data distributions. Comparisons of results from the two different DXA models (Delphi, Discovery) were done using nonparametric Mann–Whitney tests to account for the nonnormal distributions.
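For readers unfamiliar with the one-way analysis of variance used for the site comparison, the F statistic and the coefficient of determination (R2) can be computed from between-site and within-site sums of squares. The sketch below is a plain-Python illustration, not the XLSTAT procedure used in this study; the function name and the data are hypothetical, and it does not compute P values or the REGWQ post hoc groups.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, R^2) for a list of groups of measurements.

    Each inner list holds the repeat scans from one site.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of site means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scans around their own site mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Hypothetical scans from three sites; NOT data from this study.
sites = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, r_squared = one_way_anova(sites)
print(f_stat, r_squared)
```

In practice, a statistics library would normally be used to obtain P values as well; for example, SciPy provides `scipy.stats.f_oneway`, `scipy.stats.shapiro`, `scipy.stats.kruskal`, and `scipy.stats.mannwhitneyu` for the tests named above.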

All statistical analyses were performed using XLSTAT Version 2011.4.02 (Addinsoft, New York, NY). For all statistical comparisons, a P value of <0.05 was considered significant, except in cases where the Bonferroni correction was called for.

Statement of Financial Support

Funding for this work has been provided from the US Department of Agriculture/Agricultural Research Service under cooperative agreement 58-6250-6-001.