The performance of anthropometric tools to determine obesity: a systematic review and meta-analysis

The aim of this systematic review was to assess the performance of anthropometric tools to determine obesity in the general population (CRD42018086888). Our review included 32 studies. To detect obesity with body mass index (BMI), the meta-analyses rendered a sensitivity of 51.4% (95% CI 38.5–64.2%) and a specificity of 95.4% (95% CI 90.7–97.8%) in women, and 49.6% (95% CI 34.8–64.5%) and 97.3% (95% CI 92.1–99.1%), respectively, in men. For waist circumference (WC), the summary estimates were 62.4% (95% CI 49.2–73.9%) for the sensitivity and 88.1% (95% CI 77.0–94.2%) for the specificity in women, and 57.0% (95% CI 32.2–79.0%) and 94.8% (95% CI 85.8–98.2%), respectively, in men. The data were insufficient to pool the results for waist-to-hip ratio (WHR) and waist-to-height ratio (WHtR), but the results were similar to those for BMI and WC. In conclusion, BMI and WC have serious limitations for use as obesity screening tools in clinical practice despite their widespread use. No evidence supports that WHR and WHtR are more suitable than BMI or WC to assess body fat. However, due to the lack of more accurate and feasible alternatives, BMI and WC might still have a role as initial tools for assessing individuals for excess adiposity until new evidence emerges.

In older age, overweight and obesity as defined by BMI might even be protective against mortality [17–19]. Indeed, one of the main deficiencies of BMI is that it does not differentiate between fat mass and fat-free mass. Not all people with high levels of body fat have a BMI of 30 or greater, and some people with very high BMIs may have little fat mass. The proportion of body fat also differs across ethnic populations, sexes and age groups. For example, South Asian populations have a higher proportion of body fat than Caucasians for the same BMI [20]. Women have a significantly higher percentage of total and subcutaneous fat stores than their male counterparts [21]. The proportion of internal fat increases and muscle mass decreases with age, which can lead to sarcopenic obesity, the combination of obesity and muscle impairment [22]. In older populations, research even suggests that fat mass is associated with a decreased risk of morbidity and mortality [17,19,23,24], while a low fat-free mass might be a risk factor for mortality [25,26].
Another main deficiency of BMI is that it does not account for body fat distribution. The distribution of body fat is associated with the risk of metabolic syndrome and other cardiometabolic complications [10]. Longitudinal data have shown that the distribution of excess fat (resulting in a so-called apple or pear shape) has a greater influence on certain health risks, such as cardiovascular diseases or cancer, than total body fat [27,28]. Indices assessing the distribution of body fat include waist circumference (WC), waist-to-hip ratio (WHR) and waist-to-height ratio (WHtR) (Table 1). A growing body of evidence suggests that such indices are independently associated with cardiometabolic diseases and mortality [29–31]. They could thus provide additional value in determining obesity and the risk for associated comorbidities in clinical practice.
Imaging techniques allow for the measurement of body fat, its distribution, and body composition but are rarely used in clinical practice. They are generally considered more precise than anthropometric methods and continue to serve as "reference standards" in many research studies [14] until the concept of obesity is fully understood.
Despite the definitional problems with BMI, it remains the routine measurement to classify obesity in clinical practice. Within the last two decades, only two systematic reviews on the performance of anthropometric tools compared to that of body composition techniques have been published. The review by Okorodudu et al. [32] focused on the performance of BMI, and McTigue et al. [33] reviewed the performance of BMI, WC and WHR in older adults. Both reviews are relatively old, with Okorodudu et al. [32] searching for studies until June 2008 and McTigue et al. [33] until February 2003. Given the emergence of new research evidence and the development of anthropometric tools other than BMI, we aimed to provide an up-to-date systematic review of four anthropometric tools (BMI, WC, WHR and WHtR) for determining obesity in the adult population.

Methods
This systematic review was conducted following the Cochrane Methods for Systematic Reviews of Diagnostic Test Accuracy [34] and reported according to the Preferred Reporting Items for a Systematic Review and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [35]. The protocol is registered with the International Prospective Register of Systematic Reviews (PROSPERO), registration number CRD42018086888.
Information sources and searches. We searched the electronic databases Ovid MEDLINE, Embase.com (Elsevier), CINAHL (Ebsco) and PubMed (non-MEDLINE content), as well as the dissertation databases ProQuest Dissertations & Theses Global (ProQuest) and WorldCat dissertations, from 1 January 2000 to 16 January 2018. In addition, we manually searched the reference lists of recent and relevant systematic reviews. Searches were limited to English and German language documents. An experienced information specialist developed a search strategy for Ovid MEDLINE, adapted it to the other electronic databases and performed all searches (see Supplementary file 1). In line with the Peer Review of Electronic Search Strategies (PRESS) statement [36], the Ovid MEDLINE search strategy was peer-reviewed by another information specialist.
Inclusion criteria. We included randomised controlled trials and prospective cohort or cross-sectional diagnostic studies assessing the performance of anthropometric tools (BMI, WC, WHR and WHtR) to determine obesity in adults (≥ 18 years) from any country. We did not exclude studies of adults with diseases or disabilities that could have an impact on body fat distribution. We used imaging techniques, including computed tomography (CT), magnetic resonance imaging (MRI), dual-energy X-ray absorptiometry (DXA) and ultrasound scanning (US), as reference standards because they are currently considered the most precise methods for assessing body composition [37]. We included studies that reported sensitivity, specificity, predictive values, likelihood ratios, diagnostic odds ratios, positivity

Body mass index (BMI): a person's weight in kilograms divided by the square of their height in meters (kg/m²)
Waist circumference (WC): waist circumference measured at the approximate midpoint between the lowest rib and the top of the iliac crest
Waist-to-hip ratio (WHR): a person's waist measurement divided by their hip measurement, taken around the widest portion of the buttocks
Waist-to-height ratio (WHtR): a person's waist circumference divided by their height

Data collection process and data items. We designed and pilot-tested a structured data abstraction form. One reviewer extracted data, and another checked them for completeness and accuracy. For studies that met our inclusion criteria, we abstracted information related to (a) population; (b) index tests; (c) reference test; (d) obesity; (e) diagnostic values and (f) funding source.
We extracted or reconstructed the original classification data (2 × 2 table) at or close to WHO's recommended cut-offs (BMI: ≥ 30 kg/m²; WC: ≥ 88 cm in women and ≥ 102 cm in men; WHR: ≥ 0.85 in women and ≥ 0.90 in men) [38] or utilised common definitions (body fat percentage: > 35% in women and > 25% in men) for further use in the meta-analyses. Otherwise, definitions of obesity as laid out in the articles were used. We contacted study authors via email if relevant data were not reported in an included publication.

Risk of bias and certainty of evidence assessment. Two independent reviewers assessed the risk of bias of diagnostic accuracy studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [39]. We dually assessed the certainty of evidence for relevant outcomes using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach for diagnostic tests [40]. We resolved disagreements by discussion and consensus or by consulting a third reviewer.
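The index definitions, the WHO cut-offs and the 2 × 2 classification described above can be illustrated with a minimal Python sketch. It is not the code used in the review: all function and class names are hypothetical, and the sketch only shows how a binary index-test result (for example, BMI ≥ 30 kg/m²) is cross-tabulated against a binary reference-standard result to obtain sensitivity and specificity.

```python
from dataclasses import dataclass
from typing import Iterable

# WHO cut-offs quoted in the text (obese if value >= cut-off).
BMI_CUTOFF = 30.0                               # kg/m^2
WC_CUTOFF_CM = {"female": 88.0, "male": 102.0}  # cm


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by squared height (m)."""
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist measurement divided by hip measurement (same unit)."""
    return waist_cm / hip_cm


def waist_to_height_ratio(waist_cm: float, height_cm: float) -> float:
    """Waist circumference divided by height (same unit)."""
    return waist_cm / height_cm


def obese_by_bmi(weight_kg: float, height_m: float) -> bool:
    return bmi(weight_kg, height_m) >= BMI_CUTOFF


def obese_by_wc(waist_cm: float, sex: str) -> bool:
    # sex must be "female" or "male" (hypothetical encoding).
    return waist_cm >= WC_CUTOFF_CM[sex]


@dataclass
class TwoByTwo:
    """Index test (rows) against reference standard (columns)."""
    tp: int = 0  # index positive, reference positive
    fp: int = 0  # index positive, reference negative
    fn: int = 0  # index negative, reference positive
    tn: int = 0  # index negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def tabulate(index_positive: Iterable[bool],
             reference_positive: Iterable[bool]) -> TwoByTwo:
    """Cross-tabulate paired index-test and reference-standard results."""
    table = TwoByTwo()
    for idx, ref in zip(index_positive, reference_positive):
        if idx and ref:
            table.tp += 1
        elif idx:
            table.fp += 1
        elif ref:
            table.fn += 1
        else:
            table.tn += 1
    return table
```

In this sketch, the reference-standard flag would come from an imaging-based body fat percentage compared with the chosen threshold (for example, > 35% in women and > 25% in men).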

Data synthesis. We conducted meta-analyses using the metandi command in Stata (version 15, StataCorp) when five or more studies were similar in terms of the index test, target condition and cut-offs used. The metandi command uses hierarchical logistic regression models to meta-analyse pairs of sensitivities and specificities. It reports the pooled estimates under both a bivariate model and a hierarchical summary receiver operating characteristic (HSROC) model [41,42]. For each index test, we produced a paired forest plot of each study's sensitivity and specificity, as well as a plot of the sensitivities versus specificities in the ROC space. We assessed heterogeneity by visually inspecting the CIs for sensitivity and specificity in the paired forest plots. For those index tests where we did not have sufficient studies to pool, we synthesised the data narratively. Because of differences in the definition of the target condition between men and women, we conducted all analyses separately by sex. When information was available, we analysed the data by ethnicity. We further conducted sensitivity analyses to determine the impact of study quality on the robustness of the overall test performance measures. Subgroup analyses by age were not possible due to dissimilarities in the age categories in the studies.
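To give a rough intuition for pooling on the logit scale, the following Python sketch pools a single proportion (for example, sensitivity) across studies with a univariate DerSimonian-Laird random-effects model. This is a deliberately simplified stand-in and not the bivariate/HSROC model fitted by metandi, which models sensitivity and specificity jointly. The function name, the 0.5 continuity correction and the normal-approximation interval are assumptions made for illustration; only the minimum of five studies comes from the text.

```python
import math
from typing import Sequence, Tuple


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_proportions(events: Sequence[int],
                           totals: Sequence[int],
                           min_studies: int = 5
                           ) -> Tuple[float, float, float]:
    """Univariate random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion, e.g. true positives
    over all reference-positive participants for sensitivity.
    Returns (pooled estimate, lower 95% limit, upper 95% limit).
    """
    if len(events) != len(totals):
        raise ValueError("events and totals must have the same length")
    if len(events) < min_studies:
        raise ValueError("too few studies to pool; synthesise narratively")

    # Per-study logit and its approximate variance, with a 0.5
    # continuity correction so that 0% or 100% does not break the log.
    ys, vs = [], []
    for e, n in zip(events, totals):
        pos, neg = e + 0.5, (n - e) + 0.5
        ys.append(math.log(pos / neg))
        vs.append(1.0 / pos + 1.0 / neg)

    # Fixed-effect (inverse-variance) mean and Cochran's Q.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # DerSimonian-Laird estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random-effects mean and its standard error.
    w_re = [1.0 / (v + tau2) for v in vs]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sw_re
    se = math.sqrt(1.0 / sw_re)

    return (inv_logit(y_re),
            inv_logit(y_re - 1.96 * se),
            inv_logit(y_re + 1.96 * se))
```

Pooling sensitivity and specificity separately in this way ignores the correlation between them across studies, which is the main reason hierarchical models are preferred for diagnostic accuracy data.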

Body mass index (BMI). The 27 included studies
Based on a meta-analysis of 16 studies with data on 14,008 women of any race or ethnicity, the combined sensitivity of BMI (with thresholds from 25 to 30 kg/m²) to detect obesity was 51.4% (95% CI 38.5–64.2%), with a corresponding specificity of 95.4% (95% CI 90.7–97.8%), as shown in the HSROC plot (Fig. 2). The HSROC plot shows the individual study estimates, a summary curve from the HSROC model, a summary estimate, a 95% confidence region for the summary estimate and a 95% prediction region. The confidence intervals of some studies failed to overlap for sensitivity, indicating considerable heterogeneity. The heterogeneity of the specificity was low (Fig. 3; see also Figures S1 and S2). We rated the certainty of evidence of the pooled studies as very low for the sensitivity and as moderate for the specificity. The reasons for downgrading the certainty of evidence included the wide range and confidence intervals of the results for the sensitivity and the risk of bias for the specificity.
In men, the results of a meta-analysis including 12 studies with data on 11,320 men of any race or ethnicity show a combined sensitivity of 49.6% (95% CI 34.8–64.5%) and a specificity of 97.3% (95% CI 92.1–99.1%) for BMI cut-offs from 25 to 30 kg/m² (Fig. 2). The sensitivity varied considerably across studies (Fig. 4; see also Figures S1 and S2). We considered the certainty of evidence as very low for the sensitivity and moderate for the specificity. The reasons for downgrading the certainty of evidence included risk of bias as well as the wide range and confidence intervals of the sensitivity results.

(Fig. 2). For both sensitivity and specificity, the heterogeneity of the included studies was high (Fig. 3). Excluding the study by Goh et al. [51] Figures S1 and S2). Because of methodological concerns and highly inconsistent and heterogeneous results, we rated the certainty of evidence as very low for both sensitivity and specificity.

In men, the pooled estimates of six studies including 3,590 male participants were 57.0% (95% CI 32.2–79.0%) for the sensitivity and 94.8% (95% CI 85.8–98.2%) for the specificity (Fig. 2). The cut-offs for WC ranged from 90.2 to 100.0 cm. The results of the included studies had a high heterogeneity for the sensitivity and a low heterogeneity for the specificity (Fig. 4). We were not able to perform sensitivity or subgroup analyses due to the low number of studies included in the meta-analyses. Due to serious methodological concerns and highly inconsistent and heterogeneous results, we considered the certainty of evidence for the sensitivity as very low and for the specificity as low.
We rated the certainty of evidence for the sensitivity and specificity as very low in women and in men. The reasons for downgrading the certainty of the evidence related to methodological concerns, heterogeneous results and wide confidence intervals.
Waist-to-height ratio (WHtR). We identified four studies [51–56] assessing WHtR. The data were insufficient to combine the results in a meta-analysis. Their cut-offs for defining obesity ranged from 0.50 to 0.59 in women and from 0.50 to 0.55 in men (see the characteristics of the included studies in Supplementary file 2, Table S1 and the results of all studies in Supplementary file 3, Table S4). In women, the sensitivity ranged from 51.0% [51] to 83.3% [55,56] and the specificity from 78.6% [55,56] to 95.2% [54] (Fig. 3 and Supplementary file 3, Table S4). The results in men were similar, from 46.7% [51] to 86.7% [54–56] for the sensitivity and from 71.0% [54] to 89.4% [51] for the specificity (Fig. 4 and Supplementary file 3, Table S4). Carneiro Roriz et al. [55,56] did not identify any differences in sensitivity and specificity between adults aged 20 to 59 years and adults aged 60 and older, irrespective of sex. Oreopoulos et al. [52,53] used slightly higher cut-offs for defining obesity (0.615 for women and 0.605 for men) in their study and reported a combined sensitivity of 77.4% and a specificity of 76.9%.
We rated the certainty of evidence for the sensitivity and specificity in both women and men as very low. The reasons for downgrading the certainty of evidence included methodological concerns, heterogeneous results and wide confidence intervals of the results.

Discussion
To the best of our knowledge, our work is the most recent and most comprehensive systematic review on the use of four anthropometric tools to determine obesity. Our findings, in general, indicate a lack of reliable scientific evidence on the performance of anthropometric tools to rule out or determine obesity as assessed by imaging techniques, which constitute the gold standard in obesity research until the concept of obesity is fully understood. Many of the included studies were fraught with methodological shortcomings. Consequently, we rated the certainty of evidence as low or very low, which indicates that we have little or very little confidence in the estimates of the effects.
The available studies focused mainly on BMI and WC to assess obesity. The pooled results of our meta-analyses consistently rendered low sensitivities and relatively high specificities for BMI and WC when compared to imaging techniques as reference standards. The sensitivities ranged from 49.6% (BMI for men) and 51.4% (BMI for women) to 57.0% (WC for men) and 62.4% (WC for women) in the pooled analyses. By contrast, the specificities ranged from 88.1% (WC for women) and 94.8% (WC for men) to 95.4% (BMI for women) and 97.3% (BMI for men).
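What a low sensitivity combined with a high specificity means for an individual result can be illustrated by converting the pooled estimates into likelihood ratios and predictive values. The Python sketch below uses the standard formulas; the function names are hypothetical, and the 30% prevalence in the usage note is an assumed value chosen purely for illustration, not a figure from the review.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float,
                      specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence of the target condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled BMI estimates in women, as reported in the text.
SENS_BMI_WOMEN = 0.514
SPEC_BMI_WOMEN = 0.954

lr_pos, lr_neg = likelihood_ratios(SENS_BMI_WOMEN, SPEC_BMI_WOMEN)

# ASSUMPTION: 30% prevalence is illustrative only.
ppv, npv = predictive_values(SENS_BMI_WOMEN, SPEC_BMI_WOMEN, prevalence=0.30)
```

With these inputs, the positive likelihood ratio is about 11 and the negative likelihood ratio is about 0.5. A BMI at or above the cut-off therefore raises the probability of obesity substantially, whereas a BMI below the cut-off lowers it only modestly, which is consistent with the limited ability of BMI to rule out obesity.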
These estimates are consistent with the findings from a previous systematic review by Okorodudu et al. [32], who reported a pooled sensitivity of 50% (95% CI 43–57%) and a pooled specificity of 90% (95% CI 86–94%) in their review of 25 studies. The studies included in that review went back to the 1990s and also used reference standards other than imaging techniques. For our systematic review, we employed more rigorous eligibility criteria than the Okorodudu review [32] and included 17 additional studies that were published after the literature searches by Okorodudu et al.
Our systematic review and the underlying evidence base have several notable limitations. A main limitation of the review is the substantial heterogeneity of the sensitivity estimates across studies. High heterogeneity is a common phenomenon in diagnostic test accuracy meta-analyses and is usually attributable to spectrum effects and methodological shortcomings of the included studies. The subgroup analyses in our review that stratified meta-analyses by sex and by countries with predominantly White, Latin or mixed populations rendered similar estimates for BMI and WC as the overall analyses that also included Asian populations. We would have expected a difference, as WHO recommends lower cut-off values for Asian populations than for White populations [79]. Similarly, removing studies with a high risk of bias had little impact on the results. Nevertheless, many other factors, the impact of which we did not have sufficient data to explore, could have introduced heterogeneity. For example, the age of the participants, which varied widely among the studies, could have had an influence on the results. Without access to individual patient data, we were unable to assess the impact of age. Another potential source is the spectrum of prevalence rates among the studies (5.7% [51] to 95.8% [67]). Studies with a higher disease prevalence most likely include more severely diseased patients, which ultimately leads to a better test performance in this population.
Heterogeneity could also stem from the use of different cut-offs both for determining obesity with the anthropometric measurement tools and for the reference tests in the primary studies. For BMI and WC, the majority of studies adhered to the cut-offs recommended by WHO (BMI: ≥ 30 kg/m²; WC: ≥ 88 cm in women and ≥ 102 cm in men) [1,38]. The cut-offs for the reference tests ranged from ≥ 30% to ≥ 43% body fat in women and from ≥ 20% to ≥ 34.6% in men using DXA, with most studies referring to a body fat percentage > 35% in women and > 25% in men as the standards for defining obesity. Even though these cut-offs are widely applied and recommended, it is important to note that they were chosen arbitrarily and lack a sound scientific basis [9]. BMI thresholds have only been based upon visual inspection of the relationship between BMI and mortality [82]. For body fat percentage, there is little evidence supporting the cut-offs due to the lack of studies investigating the relationship between a continuum of body fat percentage values and cardiometabolic disease and mortality [9,80]. In addition to the heterogeneity that is introduced by the application of various cut-off values, the cut-offs themselves remain an issue of debate and should be the focus of future research. However, their validity goes beyond the scope of this review. The use of various imaging techniques, including DXA, CT, and MRI, could have led to differences in performance estimates. However, imaging techniques are currently considered to be the most accurate tools for body composition analysis because of their ability to accurately discriminate tissues [37,83]. We excluded all studies that used other reference standards, such as bioelectrical impedance analysis or dilution techniques, to increase homogeneity. Also limiting this review is the absence of a "gold standard" to diagnose obesity.
Although imaging techniques are generally able to produce good-quality body composition data, they all have their shortcomings. For example, DXA does not differentiate between types of fat. Silver further argues that an accurate body composition analysis measuring excess body fat is insufficient for diagnosing obesity; it would rather need a tool that translates the interplay between body composition and metabolic risks into a new concept of obesity [84]. Nonetheless, until research has elucidated that interplay, obesity assessment relies on body composition data.
Finally, another major limitation of the underlying evidence base is the low methodological quality of the included studies, which, together with the inconsistency and heterogeneity of the results, has contributed to the mostly low or very low confidence that we have in the evidence. We rated only six of the 32 included studies as having a low risk of bias. Many studies used convenience sampling, applied inappropriate exclusion criteria for study participants, lacked predefined cut-offs for index and reference tests and failed to provide information about the numbers of participants included in the analysis.
The strengths of this systematic review include a comprehensive search strategy in four electronic databases combined with manual reference checking of pertinent research articles and a search for unpublished research studies. The search strategy was peer-reviewed by an additional information specialist. We contacted the authors of the included studies to obtain the data for the 2 × 2 tables when these were not reported. Throughout the systematic review process, we followed Cochrane methods [34], which are known to be methodologically sound and rigorous. Despite these efforts, we cannot entirely rule out the possibility that we have overlooked a relevant research study.
The findings of our review should be interpreted cautiously within the context of clinical practice. Thresholds between normal weight, overweight, and obesity are arbitrary and not based on universally agreed-upon standards. Our review emphasises the substantial uncertainties that obesity assessment with anthropometric tools brings with it. Methodologically sound studies with appropriate sampling strategies, predefined and valid cut-offs and complete analyses are needed for firm conclusions. Future research should focus on studies that differentiate between age groups, are conducted in a European setting and examine the combined use of anthropometric tools.

Conclusions
This systematic review shows that BMI and WC have serious limitations for use as obesity screening tools in clinical practice despite their widespread use, and no evidence supports that WHR and WHtR are more suitable than BMI or WC to assess body fat. However, due to the lack of alternatives, BMI and WC might still have a role as initial tools for assessing individuals for excess adiposity until new evidence emerges. Nonetheless, one should be aware of the limitations of these tools when interpreting the results. In some clinical circumstances, particularly for BMI or WC results that are borderline between overweight and obesity, it might be useful to conduct further examinations of obesity-related risk factors or to confirm results with imaging techniques (e.g. DXA scans).

Data availability
The 2 × 2 tables that support the findings of this study are available from the first author (IS) upon reasonable request.