We performed a systematic review and meta-analysis of studies that assessed the performance of body mass index (BMI) to detect body adiposity.
Data sources were MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, Cochrane CENTRAL, Web of Science, and SCOPUS. To be included, studies must have assessed the performance of BMI to measure body adiposity, provided standard values of diagnostic performance, and used a body composition technique as the reference standard for body fat percent (BF%) measurement. We obtained pooled summary statistics for sensitivity, specificity, positive and negative likelihood ratios (LRs), and diagnostic odds ratio (DOR). Potential heterogeneity was assessed with the inconsistency statistic (I²).
The search strategy yielded 3341 potentially relevant abstracts, and 25 articles met our predefined inclusion criteria. These studies evaluated 32 different samples totaling 31 968 patients. Commonly used BMI cutoffs to diagnose obesity showed a pooled sensitivity of 0.50 (95% confidence interval (CI): 0.43–0.57) for detecting high adiposity and a pooled specificity of 0.90 (CI: 0.86–0.94). The positive LR was 5.88 (CI: 4.24–8.15), I²=97.8%; the negative LR was 0.43 (CI: 0.37–0.50), I²=98.5%; and the DOR was 17.91 (CI: 12.56–25.53), I²=91.7%. Studies that used BMI cutoffs ≥30 had a pooled sensitivity of 0.42 (CI: 0.31–0.43) and a pooled specificity of 0.97 (CI: 0.96–0.97). Cutoff values and the regional origin of the studies only partially explain the heterogeneity seen in pooled DOR estimates.
Commonly used BMI cutoff values to diagnose obesity have high specificity but low sensitivity for identifying adiposity, as they fail to identify half of the people with excess BF%.
Obesity has become one of the most important threats to human health worldwide. According to data derived from the Third National Health and Nutrition Examination Survey, the prevalence of obesity in the United States of America is 31.1% in men and 33.2% in women.1 Despite the multiple efforts made to address this public health issue, the prevalence of obesity continues to rise.1
Abundant scientific evidence supports the associations between obesity and various diseases, including diabetes mellitus, hypertension, coronary artery disease, cancer, and sleep apnea.2 It should also be noted that the consequences of obesity extend beyond physical ailments into the psychosocial and economic aspects of life.3
The most commonly used anthropometric method to diagnose obesity is the body mass index (BMI), which is calculated as an individual's weight in kilograms divided by the square of the height in meters. It was first described in the 19th century by a Belgian mathematician who noticed that, in people he considered to be of ‘normal frame’, weight was proportional to the square of height.4
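The index itself is simple arithmetic. A minimal Python sketch of the calculation (the function name and the 85 kg / 1.75 m example values are hypothetical):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical adult: 85 kg, 1.75 m tall
print(round(bmi(85.0, 1.75), 1))  # 27.8
```

Because the numerator is total body weight, two people with the same BMI can differ widely in how much of that weight is fat.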
BMI has been used extensively in epidemiological studies and incorporated into clinical practice because of its simplicity. A major shortcoming of BMI is that its numerator (weight) does not distinguish between lean and fat mass.5, 6, 7 By contrast, techniques that accurately measure body fat, such as dual energy X-ray absorptiometry, hydrostatic weighing, air-displacement plethysmography, isotope dilution, and bioelectrical impedance analysis, are rarely used in clinical practice.
Over the last few decades, several studies have analyzed the performance of BMI in detecting body adiposity compared with techniques known to measure body composition accurately. Indices of diagnostic performance used in such studies include sensitivity, defined as the probability that a person who has the condition of interest will have a positive test result; specificity, defined as the probability that a person who does not have the condition of interest will have a negative test result; and the likelihood ratio (LR), which expresses how much more (or less) likely a given test result is in individuals with the condition than in individuals without it. The results of these studies have been diverse, some showing good diagnostic performance and others showing poor sensitivity of BMI for detecting high levels of adiposity. Other studies have suggested that the failure of many epidemiological studies to show a higher risk of adverse events in overweight people (BMI 25–29 kg m⁻²) compared with normal-weight individuals can be explained by the limited ability of BMI to differentiate body fat from lean mass in different populations.8, 9 Furthermore, excess body fat percent (BF%) has been shown to be associated with metabolic dysregulation regardless of body weight.10 Thus, the accuracy of BMI in identifying body adiposity must be known to justify its use in clinical practice to either diagnose or rule out excess body adiposity at the individual patient level.
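To make these definitions concrete, the sketch below computes sensitivity, specificity, and both likelihood ratios from a 2×2 table of an index test (BMI above or below a cutoff) against a reference standard (high or normal BF%). The class name `TwoByTwo` and the counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of the BMI result (index test) against BF% (reference standard)."""
    tp: int  # high BF% and BMI at or above the cutoff
    fp: int  # normal BF% but BMI at or above the cutoff
    fn: int  # high BF% but BMI below the cutoff
    tn: int  # normal BF% and BMI below the cutoff

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # P(positive | condition) / P(positive | no condition)
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # P(negative | condition) / P(negative | no condition)
        return (1 - self.sensitivity) / self.specificity


# Hypothetical counts, not taken from any included study
t = TwoByTwo(tp=50, fp=10, fn=50, tn=90)
print(t.sensitivity, t.specificity)  # 0.5 0.9
print(round(t.lr_positive, 2), round(t.lr_negative, 3))  # 5.0 0.556
```

LR+ is undefined when specificity is exactly 1 (and LR− when it is 0); meta-analytic software often applies a continuity correction to zero cells in such cases.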
We performed a systematic review and meta-analysis to calculate the pooled sensitivity and specificity of BMI to identify excessive body adiposity.
Materials and methods
Selection criteria and search strategy
The predefined inclusion criteria were that the study must have (1) assessed the performance of BMI to identify excess body fat; (2) provided standard values of diagnostic performance (for example sensitivity, specificity, positive predictive value, negative predictive value); and (3) used a body composition technique (for example dual energy X-ray absorptiometry, air-displacement plethysmography, hydrostatic weighing) as the gold standard.
We searched the databases MEDLINE (1950 to June 2008), EMBASE (1988 to June 2008), Cochrane Database of Systematic Reviews (from inception), Cochrane CENTRAL (from inception), Web of Science (1993 to June 2008), and SCOPUS (1996 to June 2008). The search was conducted at the Mayo Clinic Plummer Library in Rochester, MN, by a librarian with expertise in systematic reviews, and was designed to find all studies that assessed the ability of BMI to detect excess adiposity as determined by BF%. In conducting the search, three domains were specified as absolute criteria: (1) BMI or equivalent (for example BMI, Quetelet Index); (2) diagnostic performance or equivalent (for example sensitivity, specificity, predictive values); and (3) body fat or equivalent (for example dual energy X-ray absorptiometry, bioelectrical impedance, air-displacement plethysmography, body composition). The results of the individual domain searches were combined using the Boolean operator AND or its equivalent.
On the basis of the title and abstract alone, we eliminated irrelevant articles identified by our primary search (Figure 1). The remaining studies were then read in their entirety by a single investigator, and those that did not meet the inclusion criteria were excluded. Our search was supplemented with cross-references from the selected articles and with correspondence with researchers. We made a particular effort to contact the authors of articles whose information was equivocal or seemed incomplete, to request the additional data needed for inclusion in this systematic review and meta-analysis. We also contacted investigators known to do research on BMI diagnostic performance.
Quality assessment/data abstraction
Ten studies, selected at random from those not excluded from our primary search on the basis of title and abstract, were independently reviewed for inclusion by two investigators (DO and ARC), and the agreement coefficient was determined.
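The agreement coefficient reported in the results is a κ statistic. Assuming the standard Cohen's κ for two raters, a minimal sketch with hypothetical include/exclude decisions for ten articles (the function name and the decisions are illustrative only):

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance, from each rater's marginal frequencies
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical decisions by two reviewers on the same ten articles
reviewer_1 = ["in", "in", "out", "out", "out", "in", "out", "out", "in", "out"]
reviewer_2 = ["in", "in", "out", "out", "out", "in", "out", "in", "in", "out"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.8
```

Kappa corrects raw agreement (9 of 10 here) for the agreement expected by chance alone; it is undefined when chance agreement is 1, for example when both reviewers give every article the same label.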
The quality of studies eligible for review was assessed on a 6- to 16-point scale that considered factors determining the validity of body composition studies and factors related to the validity of diagnostic test performance. The criteria used to evaluate study quality included (1) standardization and accuracy of height measurement, (2) standardization and accuracy of weight measurement, (3) gold standard used to assess body fat, (4) time between BMI and BF% measurement, (5) blinding of BMI measurement from BF% measurement, and (6) instructions given to subjects regarding diet and exercise before measurements of body composition. Studies were classified as of excellent quality (15–16 points), good quality (12–14 points), fair quality (9–11 points), or low quality (6–8 points).
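Only the banding of the total score into categories is specified above; the points assigned to each of the six criteria are not, so this sketch (with a hypothetical function name) covers the banding step alone:

```python
def quality_category(total_score: int) -> str:
    """Map a total quality score (6-16 points) to the quality category bands."""
    if not 6 <= total_score <= 16:
        raise ValueError("total score must lie between 6 and 16")
    if total_score >= 15:
        return "excellent"  # 15-16 points
    if total_score >= 12:
        return "good"  # 12-14 points
    if total_score >= 9:
        return "fair"  # 9-11 points
    return "low"  # 6-8 points


print([quality_category(s) for s in (16, 13, 10, 7)])
```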
Data were abstracted from the articles by a single investigator, who recorded information on the population studied, the gold standard used to measure BF%, the BF% cutoff values used to define overweight or obesity, the BMI cutoff values for overweight or obesity, and the values of diagnostic performance of BMI to detect high BF%.
The primary outcome for analysis was the performance of BMI to identify excess body fat compared with the gold standard measuring BF%. Sensitivity, specificity, and LRs were either collected or calculated using information provided in the original publications.
The heterogeneity of diagnostic test parameters was evaluated initially by graphic examination of forest plots for each parameter. Statistical assessment was then performed using the inconsistency statistic (I²). The I² statistic is the percentage of total variation across studies that is due to heterogeneity rather than chance.11 A value of 0% indicates no observed heterogeneity, whereas values >50% suggest substantial heterogeneity. We then calculated pooled summary statistics for the sensitivities, specificities, LRs, and diagnostic odds ratios (DORs) of the individual studies. The DOR, computed as the positive likelihood ratio (LR+) divided by the negative likelihood ratio (LR−), is the odds of a positive test result in patients with the disease relative to the odds of a positive test result in patients without the disease.12 Owing to a priori assumptions about the likelihood of heterogeneity between primary studies, the random-effects model of DerSimonian and Laird was used for pooled analysis.13
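The pooling described above can be sketched as follows. This is an assumed, textbook implementation of the DerSimonian and Laird estimator applied to the log DOR, with I² derived from Cochran's Q; Meta-DiSc's internal details (for example, its handling of zero cells) may differ, and the function names and study counts are hypothetical.

```python
import math


def log_dor_and_var(tp, fp, fn, tn, cc=0.5):
    """Return the log diagnostic odds ratio and its variance for one study.

    DOR = (tp * tn) / (fp * fn), which equals LR+ / LR-. If any cell is zero,
    `cc` is added to every cell (an assumed, commonly used continuity correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled, se, tau2, i2_percent)."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2, i2


# Hypothetical 2x2 counts (tp, fp, fn, tn) for three studies
studies = [(50, 10, 50, 90), (120, 15, 80, 285), (30, 4, 45, 121)]
effects, variances = zip(*(log_dor_and_var(*s) for s in studies))
pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"DOR {math.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f}), I2 {i2:.1f}%")
```

When the between-study variance tau² is zero, the random-effects weights reduce to the fixed-effect weights; large tau² spreads weight more evenly across studies and widens the confidence interval.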
Predefined subgroup analyses were performed for the following potential causes of between-study heterogeneity: (1) BMI cutoff values used to define obesity, (2) BF% cutoff values used to define obesity, (3) gold standard used to assess BF%, (4) regional origin of the studies, and (5) quality assessment score. Studies were grouped into one of three subgroups based on the BMI cutoff value used to define obesity: ≤24.9 kg m⁻², from 25 to <30 kg m⁻², or ≥30 kg m⁻². Studies were also grouped by their BF% definition of obesity into (1) BF% <30 in females and <25 in males, (2) BF% equal to 30 in females and 25 in males, or (3) BF% >30 in females irrespective of BF% in males. The studies included in the meta-analysis used different methods as the gold standard to assess body composition. Owing to their comparable accuracy, studies that used dual energy X-ray absorptiometry, hydrostatic weighing, air-displacement plethysmography, or isotope dilution measurement of total body water were grouped together, and their pooled estimates were reported and compared with those of studies that used less accurate measures as their gold standard (bioelectrical impedance and skin fold). Owing to the reported ethnic and geographic differences in body composition,14, 15, 16 studies were grouped by regional origin as North American, South-East Asian, or European. Finally, studies were grouped into two subgroups based on the quality assessment score described above.
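Assigning studies to the BMI-cutoff subgroups is a simple binning step, after which each subgroup would be pooled separately. A sketch with a hypothetical function name, hypothetical study labels, and hypothetical cutoffs (in kg m⁻²):

```python
from collections import defaultdict


def bmi_cutoff_subgroup(cutoff: float) -> str:
    """Return the BMI-cutoff subgroup (kg/m^2) for a study's obesity cutoff.

    Cutoffs strictly between 24.9 and 25 fall into the middle group here.
    """
    if cutoff <= 24.9:
        return "<=24.9"
    if cutoff < 30:
        return "25 to <30"
    return ">=30"


# Hypothetical study labels and the BMI cutoff each used to define obesity
study_cutoffs = {"A": 23.0, "B": 25.0, "C": 27.5, "D": 30.0}
groups = defaultdict(list)
for study, cutoff in study_cutoffs.items():
    groups[bmi_cutoff_subgroup(cutoff)].append(study)
print(dict(groups))  # {'<=24.9': ['A'], '25 to <30': ['B', 'C'], '>=30': ['D']}
```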
For studies that did not report the prevalence of obesity according to the gold standard used, we ascribed the national prevalence of obesity in the United States derived from the Third National Health and Nutrition Examination Survey,1 and then conducted sensitivity analyses using a lower and a higher prevalence to assess the effect on the overall pooled parameters.
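When prevalence is not reported, the 2×2 counts needed for pooling can be back-calculated from the reported sensitivity, specificity, sample size, and an assumed prevalence. The sketch below assumes that approach (the function name and all inputs, including the three prevalence values, are hypothetical). Changing the assumed prevalence leaves sensitivity and specificity essentially unchanged (up to rounding) but alters the cell counts, and hence each study's variance and weight in the pooled DOR.

```python
def reconstruct_2x2(sens, spec, n, prevalence):
    """Back-calculate (tp, fp, fn, tn) from sensitivity, specificity,
    sample size, and an assumed prevalence of excess adiposity.

    Counts are rounded to whole people, so they only approximate the
    original study's table.
    """
    with_condition = round(n * prevalence)
    without_condition = n - with_condition
    tp = round(sens * with_condition)
    tn = round(spec * without_condition)
    return tp, without_condition - tn, with_condition - tp, tn


# Hypothetical study: sensitivity 0.5, specificity 0.9, n = 1000.
# Re-run under a lower, a central, and a higher assumed prevalence.
for prevalence in (0.2, 0.3, 0.4):
    tp, fp, fn, tn = reconstruct_2x2(0.5, 0.9, 1000, prevalence)
    var_log_dor = 1 / tp + 1 / fp + 1 / fn + 1 / tn  # drives the study's weight
    print(prevalence, (tp, fp, fn, tn), round(var_log_dor, 4))
```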
Analyses were performed using version 1.4 of the statistical software Meta-DiSc.13
The search strategy yielded 3341 potentially relevant abstracts (Figure 1), of which 25 articles met all our inclusion criteria and were included in the systematic review and meta-analysis.17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41 Interobserver agreement on article selection, measured with the κ statistic, was 0.90. The 25 studies evaluated 32 different samples with a total of 31 968 adults and were published between 1990 and October 2008. Study and population characteristics are summarized in Table 1.
BMI showed a pooled sensitivity of 0.50 (95% confidence interval (CI): 0.43–0.57) and a pooled specificity of 0.90 (CI: 0.86–0.94) for identifying excess body adiposity (Figure 2). The positive LR was 5.88 (CI: 4.24–8.15), I²=97.8%; the negative LR was 0.43 (CI: 0.37–0.50), I²=98.5%; and the DOR was 17.91 (CI: 12.56–25.53), I²=91.7% (Figures 3 and 4).
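One way to read these pooled LRs is through Bayes' theorem in odds form, where the post-test odds equal the pre-test odds multiplied by the LR. The 30% pre-test probability below is a hypothetical value chosen for illustration, not an estimate from this study, and the function name is likewise illustrative:

```python
def post_test_probability(pretest_p: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pretest_p / (1 - pretest_p)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


LR_POS, LR_NEG = 5.88, 0.43  # pooled estimates reported above
pretest = 0.30  # hypothetical pre-test probability of excess adiposity
print(round(post_test_probability(pretest, LR_POS), 2))  # 0.72
print(round(post_test_probability(pretest, LR_NEG), 2))  # 0.16
```

Under that assumption, a BMI below the obesity cutoff still leaves a post-test probability of excess adiposity of roughly 16%, which is another way of expressing the low pooled sensitivity.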
Graphic examination of forest plots of the different parameters (sensitivity, specificity, LRs, and DOR) revealed considerable heterogeneity among the studies (Figures 3 and 4). As anticipated, this was also shown by the high inconsistency index values for the pooled estimates. Potential causes of this between-study heterogeneity were explored through subgroup analyses, the results of which are presented in Table 2. The BF% cutoff values and the regional origin of the studies can partially explain the heterogeneity seen in the pooled DOR estimates (Table 2).
This meta-analysis assessed the diagnostic performance of BMI in 31 968 individuals from 32 samples in 25 studies conducted in 12 different countries. The results show that BMI has good specificity but poor sensitivity for identifying excess adiposity. The pooled sensitivity of around 50% suggests that many individuals not labeled as obese may in fact have excess adiposity.
These results have several implications. The low sensitivity of the current BMI cutoff values indicates that excess adiposity is being underdiagnosed in many individuals. Because the first step in dealing with a risk factor is the accurate identification of the underlying pathophysiological problem, failing to diagnose obesity in individuals with excess adiposity represents a missed opportunity to initiate lifestyle changes in people at risk. Recent studies have shown that the amount of adipose tissue in subjects with normal BMI provides incremental prognostic value, particularly in women.10
The results of this meta-analysis suggest that the current definition of obesity at the individual level needs to be reassessed. Although obesity means excess body fat, the current definition of obesity is based on body weight regardless of its composition. Many years after the original description of BMI, its predictive value was shown and validated in multiple epidemiological studies showing that a high BMI was associated with increased mortality. Further studies showed high correlation coefficients between body fat and BMI. These findings, together with the simplicity of obtaining BMI, resulted in the widespread use and acceptance of this index to diagnose obesity and to identify subjects at risk for obesity-related comorbidities. However, the results of this study suggest that BMI has limitations in diagnosing excess adiposity at the individual level, particularly when BMI values are below 30. The inability of BMI to distinguish fat from lean mass can also lead to an inappropriate diagnosis of obesity. This shortcoming has been shown in many studies, including that of Ode et al.38 in which the specificity of BMI to diagnose excess adiposity in varsity male athletes was only 27%, whereas the sensitivity was excellent at 100%. In the present meta-analysis, the pooled specificity was 90%, indicating that about 10% of individuals without excess adiposity were misclassified as obese. Ongoing studies assessing the prognostic value of different body composition methods will help to elucidate whether simple techniques that better distinguish fat from lean mass, such as air-displacement plethysmography, may replace BMI in the clinical evaluation of obesity.
Furthermore, the results of this meta-analysis may explain why BMI discriminates cardiovascular risk poorly in people with intermediate BMI values. Multiple studies have shown a log-linear association between BMI and both ischemic heart disease and mortality, in which risk increments at intermediate BMI values are very small but become substantial when BMI exceeds 30 or 35.42 Studies assessing the correlation between BMI and body fat have shown that people with intermediate BMI values form a heterogeneous group with respect to body fat content: some have preserved lean mass and little fat mass, whereas others have high body fat and limited lean mass, the so-called ‘normal weight obese’.43 These latter individuals have been shown to have more metabolic dysregulation than those with normal weight and low fat content.10, 43 In addition, measures of fat distribution have been shown to discriminate cardiovascular risk well in individuals with intermediate BMI values, suggesting that BMI does not account for all the adiposity-related risk in these individuals.
A similar log-linear association has been observed between total cholesterol and cardiovascular mortality. Very high total cholesterol values confer a disproportionately high risk compared with minor cholesterol elevations.42 This occurs because very high total cholesterol generally reflects high very low-density lipoprotein and low-density lipoprotein content, whereas intermediate total cholesterol values do not discriminate well between lipoproteins associated with higher versus lower cardiovascular risk.
Our results also suggest that, if obesity is to be defined by the amount of body fat, the BMI cutoff with the maximum diagnostic performance will fall between 25 and 30. As with any diagnostic test reported as a continuous variable, there is a tradeoff between sensitivity and specificity depending on where the cutoff value falls: higher cutoff values yield higher specificity, and lower cutoff values yield higher sensitivity.
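The tradeoff can be illustrated by sweeping candidate cutoffs and computing sensitivity and specificity at each. The sketch below uses a small, made-up data set and selects a cutoff by Youden's index (J = sensitivity + specificity − 1), which is one common criterion for an "optimal" cutoff and not necessarily the one implied in the text; the function names are hypothetical.

```python
def sens_spec_at_cutoff(values, has_condition, cutoff):
    """Treat value >= cutoff as a positive test; return (sensitivity, specificity)."""
    pairs = list(zip(values, has_condition))
    tp = sum(1 for v, d in pairs if d and v >= cutoff)
    fn = sum(1 for v, d in pairs if d and v < cutoff)
    tn = sum(1 for v, d in pairs if not d and v < cutoff)
    fp = sum(1 for v, d in pairs if not d and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_best_cutoff(values, has_condition, candidates):
    """Return the candidate cutoff that maximizes Youden's J = sens + spec - 1."""
    def youden(cutoff):
        sens, spec = sens_spec_at_cutoff(values, has_condition, cutoff)
        return sens + spec - 1
    return max(candidates, key=youden)


# Made-up BMI values (kg/m^2) and excess-adiposity flags from a reference method
bmi_values = [21, 22, 23, 24, 25, 26, 26, 25, 26, 27, 28, 29, 31, 33]
excess_fat = [False] * 7 + [True] * 7
candidates = [25, 27, 30]

for c in candidates:
    sens, spec = sens_spec_at_cutoff(bmi_values, excess_fat, c)
    print(c, round(sens, 2), round(spec, 2))
print("best by Youden's J:", youden_best_cutoff(bmi_values, excess_fat, candidates))
# best by Youden's J: 27
```

In this toy data, raising the cutoff from 25 to 30 moves specificity from about 0.57 to 1.0 while sensitivity falls from 1.0 to about 0.29, mirroring the pattern described above.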
Our study has several limitations. As with any meta-analysis, there is a risk of publication bias, in which positive results or results with ‘expected’ findings are more likely to be published. We made every possible effort to minimize this type of bias by contacting investigators in the field of BMI and researchers known to be working on body fat measurement. If editors were more likely to publish manuscripts showing the ‘expected’ result of good diagnostic performance for BMI, our results may overestimate the real diagnostic performance of BMI.
Our study also showed significant heterogeneity. Inconsistency indices were substantial for the pooled LRs as well as for the pooled DOR. Sources of heterogeneity included the BMI cutoff values used to define obesity, the BF% cutoff values used to define obesity, the gold standard used to assess body fat, and the regional origin of the studies. However, although regional origin did contribute to heterogeneity, subgroup analysis shows a significant amount of inconsistency between results even within specific regions. Publication bias may also have contributed to the heterogeneity if editors were prone to accept studies with extreme performance, that is, those showing either outstanding or very poor diagnostic performance.
Another major limitation was the use of different gold standards to define excess adiposity. Some techniques for measuring body composition are clearly more reliable than others; therefore, our pooled results already reflect some of the measurement error inherent in techniques known to be suboptimal for measuring BF%. However, our subgroup analysis did not show major differences in the pooled estimates when we limited the analysis to the most valid techniques for measuring body fat. For this reason, we do not think this limitation invalidates our results. The inclusion of studies performed in different geographic areas is a strength because it increases generalizability, but it is also a limitation because body composition techniques have not been well validated in non-Caucasian populations.
Although our study illustrates some of the limitations of using BMI to diagnose excess adiposity, it is important to stress that BMI is of significant value. Our results confirm that when BMI is ≥30 kg m⁻², it has near-perfect specificity and an excellent predictive value to detect excess adiposity in both sexes. In addition, BMI, or even plain body weight, is most likely the best way to evaluate changes in body adiposity over time, because changes in body weight most likely reflect changes in the volume of adipose tissue, with the exception of body builders or patients with conditions that increase third-space volume, such as renal or liver failure.
In conclusion, this study shows that the use of BMI to identify excess body adiposity at the individual patient level has good specificity but poor sensitivity, with approximately half of individuals who have excess BF% being labeled as non-obese. As excess BF% has been associated with metabolic dysregulation regardless of body weight, BMI should not be considered the only measure of obesity in patient care settings, particularly in those with BMI <30 kg m⁻².
Dr Somers is supported by NIH grants HL-65176, HL-70302, HL-73211, and M01RR00585. Dr Lopez-Jimenez was the recipient of a Clinical Scientist Development Award from the American Heart Association at the time of performing this study. Dr Somers, Dr Lopez-Jimenez, and Dr Romero-Corral are recipients of an unrestricted grant from Select Research to assess the clinical value of assessing regional body volumes.
International Journal of Obesity (2018)