Introduction

Numerous indices have been developed since the 1960s to rank or score the severity of a malocclusion relative to a preconceived orthodontic ideal or in terms of treatment need.1 Some indices were developed to assess orthodontic treatment need, such as the Index of Orthodontic Treatment Need (IOTN),2 the Handicapped Labio-Lingual Deviation index3 and the Dental Aesthetic Index,4 while others were developed to evaluate orthodontic treatment outcome, such as the Peer Assessment Rating.5 Each of these indices addresses only one aspect of orthodontic treatment, yet orthodontists and patients consider both treatment need and treatment outcome important. Therefore, the Index of Complexity, Outcome and Need (ICON),6 which assesses both elements, was developed and has been widely used in recent years.

With social and economic development in China, more and more people are receiving orthodontic treatment. There is a need in China for an overall objective assessment of orthodontic treatment need based on malocclusion severity. The ICON can achieve this purpose by evaluating orthodontic treatment need, complexity and outcome. If this system is adopted, it will help maximise the benefit of orthodontic treatment by directing limited health resources to those who need them most. In a recent study determining the malocclusion complexity and orthodontic treatment needs in urban Iranian schoolchildren, the authors used the ICON and IOTN to assess the relationship between the two indices and found that ICON is a good substitute for IOTN, yet ICON results in a lower treatment-need threshold.7,8

There is evidence that geographic location may affect orthodontists' determination of treatment need and outcome.9,10 It is necessary to validate any index purporting to identify treatment need with the opinions of orthodontists practicing within a limited geographic region. Therefore, the primary aim of this study was to validate ICON as an index of treatment need in southern China and to investigate whether the indicated cutoff point was acceptable to the orthodontists in southern China. The secondary aim was to plot a receiver operating characteristic (ROC) curve, and to find a new cutoff point for the Chinese population.

Materials and methods

Study samples

The participants were randomly selected from grade-7 students at 16 randomly selected middle schools in the main city and suburbs of Chengdu, the capital of Sichuan Province, China. Participants were enrolled from January 2008 to March 2008. The initial number of selected students was 350. This study was approved by the Committee for the Use of Human Subjects in Research, Sichuan University.

The exclusion criteria were a history of orthodontic treatment, physical and mental impairments, cleft lip and/or palate, dentofacial deformities and the presence of mixed dentition. Overall, 335 out of the 350 (95.7%) participants met the inclusion criteria, with 174 boys (51.9%) and 161 girls (48.1%). Their ages ranged from 12 to 13 years. The subjects represented a full range of malocclusions and severities. The orthodontic study casts were prepared for each subject and used for further assessments.

Examiners

Volunteers were recruited from among the specialist orthodontists working in the Department of Orthodontics, West China School of Stomatology. Three experienced associate professors were ultimately invited to score the casts with ICON. They received extensive training and underwent calibration.6,11

Procedure

In the first session, the three calibrated examiners assessed the 335 study casts under strict adherence to the ICON guidelines. Study casts were displayed on tables in a fixed order. Each examiner started with a different case. The examiners were allowed to work at their own pace with no time limit. This procedure was similar to that described by Louwerse et al.1 Five components of ICON (Table 1) were scored, multiplied by respective weights and summed up.12
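The weighted-sum scoring described above can be sketched as follows. This is a minimal illustration, not the examiners' actual worksheet: the weights are those generally published for ICON and are assumed here to match Table 1, and the component names and the example cast are hypothetical.

```python
# Component weights as generally published for ICON.
# ASSUMPTION: these match Table 1 of this paper; confirm before use.
ICON_WEIGHTS = {
    "aesthetic": 7,
    "upper_arch_crowding_spacing": 5,
    "crossbite": 5,
    "overbite_openbite": 4,
    "buccal_segment_ap": 3,
}


def icon_score(components):
    """Return the weighted sum of the five ICON component scores."""
    missing = set(ICON_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(ICON_WEIGHTS[name] * components[name] for name in ICON_WEIGHTS)


# Hypothetical cast, for illustration only.
example_cast = {
    "aesthetic": 4,
    "upper_arch_crowding_spacing": 1,
    "crossbite": 0,
    "overbite_openbite": 1,
    "buccal_segment_ap": 2,
}
print(icon_score(example_cast))  # 4*7 + 1*5 + 0*5 + 1*4 + 2*3 = 43
```

Under these assumed weights, a cast with the lowest aesthetic grade (1) and no other findings scores 7, which agrees with the minimum ICON score observed in this sample.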

Table 1 ICON Scoring System

At the second session, organized 4 weeks later, a random set of 50 models was assessed by the three examiners in order to test their reliability. The kappa value for the three examiners was calculated by assessing 50 sets of models twice within a 4-week interval.

Two experts from the Department of Orthodontics, West China School of Stomatology, Sichuan University, were asked to score the same 335 casts and to record the need for treatment of each on a seven-point scale, with 1 denoting ‘none/minimal’ need and 7 denoting ‘very great’ need. The experts were senior orthodontists who had made great contributions to the development of orthodontics in China. This procedure was similar to that used by Firestone et al.13 The experts were required to rate treatment need without considering the treatment cost or the orthodontists' skills required to treat the case.

The average score of the two experts was used. The resulting score was designated as the ‘clinical sense’. The raters were further asked to point out which score on the seven-point scale indicated the cutoff point above which they felt orthodontic treatment was indicated. This score was termed the Indicated Treatment Point (ITP) score, i.e., the ‘gold standard’.14 The method of determining the gold standard was similar to the approach employed by Firestone et al.13

To assess the representativeness and reliability of the gold standard, we randomly selected 50 casts 8 months later and asked another 23 orthodontic experts from China and abroad to rate the clinical sense of these casts using the method described above.

Statistical analysis

The Shapiro–Wilk test15 was used to test the normality of the ICON data. The rank-sum test16 was used to test for gender differences when using ICON in determining orthodontic need. The statistical analysis was done using SPSS 15.0 software (SPSS Inc, Chicago, IL, USA).
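As a rough illustration of the rank-sum comparison, the following sketch computes the Mann-Whitney U statistic with a normal approximation. The analysis in this study was run in SPSS; this hand-written function is not that implementation, it applies no tie or continuity correction, and the two small groups of scores are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for group_a, two-sided p-value).
    Simplification: no tie correction and no continuity correction.
    """
    pooled = sorted(
        (value, label)
        for label, group in (("a", group_a), ("b", group_b))
        for value in group
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the mean of ranks i+1..j.
        average_rank = (i + 1 + j) / 2
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == "a")
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# Hypothetical ICON scores, for illustration only.
boys = [21, 35, 28, 44, 30]
girls = [25, 33, 29, 40, 31]
u_stat, p_value = rank_sum_test(boys, girls)
print(u_stat, round(p_value, 2))
```

The normal approximation is crude for groups this small; it is more reasonable for group sizes such as those in this study (174 and 161).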

For evaluation of the reliability of the gold standard, the mean ITP and kappa value were used. The correlation between and within raters was estimated by the kappa coefficient. The gold standard was determined as follows. First, the ITP for the two experts was calculated. Then, the mean rater score of the two experts for each cast was calculated. Finally, the mean score for each cast was compared with the ITP score: if the cast score was equal to or greater than the ITP value, the cast was assigned to the ‘treatment’ category, and if the cast score was less than the ITP score, the cast was assigned to the ‘no treatment’ category.
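The assignment rule above can be sketched in a few lines. The cast identifiers and ratings below are hypothetical; the threshold of 4.20 is the mean ITP reported in the Results for the two experts.

```python
from statistics import mean


def classify_casts(expert_ratings, itp):
    """Assign each cast to a category by comparing its mean rating with the ITP.

    expert_ratings maps a cast id to the list of expert ratings (1-7 scale).
    """
    return {
        cast_id: "treatment" if mean(ratings) >= itp else "no treatment"
        for cast_id, ratings in expert_ratings.items()
    }


# Hypothetical ratings from two experts, for illustration only.
ratings = {
    "cast_A": [5, 4],    # mean 4.50
    "cast_B": [3, 4],    # mean 3.50
    "cast_C": [4, 4.5],  # mean 4.25
}
labels = classify_casts(ratings, itp=4.20)
print(labels)
```

With these example ratings, casts A and C fall in the ‘treatment’ category and cast B in the ‘no treatment’ category.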

To determine the reliability of ICON, correlations between and within raters were estimated by the kappa value. For evaluation of the validity of ICON, Spearman's correlation coefficient17 was applied to find the correlation between the ICON score and the gold ITP score. The overall agreement (simple kappa coefficient) of ICON with the gold standard (the decisions of the orthodontists) was calculated. Sensitivity and specificity were used to compare the casts' ICON scores recorded by the three examiners with the gold standard. The sensitivity was the percentage of those who were identified as needing treatment among all cases needing treatment. The specificity was the percentage of those who were identified as not needing treatment among all cases not needing treatment.

The optimum cutoff point of ICON for treatment need in this Chinese sample was assessed by plotting a ROC curve.18,19,20 The area under the ROC curve was used to quantify how well ICON discriminates randomly chosen cases in need of orthodontic treatment from cases with no treatment need.21
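A minimal sketch of reading a cutoff from ROC data is given below. It makes two assumptions that the text does not state: a cast is treated as test-positive when its ICON score is at or above the cutoff, and the "best compromise" is taken as the cutoff maximising Youden's J (sensitivity + specificity - 1). The scores and labels are hypothetical.

```python
def roc_points(scores, needs_treatment):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    ASSUMPTION: a cast is test-positive when score >= cutoff.
    """
    positives = sum(needs_treatment)
    negatives = len(needs_treatment) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, needs_treatment) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, needs_treatment) if not y and s < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def best_cutoff(points):
    """Pick the cutoff maximising Youden's J (an assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def area_under_curve(scores, needs_treatment):
    """Probability that a random 'treatment' cast outscores a random
    'no treatment' cast, counting ties as one half."""
    pos = [s for s, y in zip(scores, needs_treatment) if y]
    neg = [s for s, y in zip(scores, needs_treatment) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ICON scores and gold-standard labels, for illustration only.
scores = [10, 15, 22, 26, 31, 28, 30, 35, 41, 50]
labels = [False] * 5 + [True] * 5

cutoff, sensitivity, specificity = best_cutoff(roc_points(scores, labels))
auc = area_under_curve(scores, labels)
print(cutoff, sensitivity, specificity, auc)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cutoff chosen to match available resources, could select a different value from the same curve.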

As an important supplementary part of this study, the clinical sense scores of the 25 experts for the 50 casts were averaged to form a revised gold standard. The correlation between this revised gold standard and the ICON scores was evaluated. The ROC curve for the 50 casts was also drawn.

Results

Analysis and results of the 335 casts

ICON scores

The ICON scores ranged from 7 to 102, with the mean value being 35 (Figure 1). Study samples comprised 174 boys (51.9%) and 161 girls (48.1%). The distribution of ICON scores across genders is presented in Figure 2. There were no statistically significant differences in orthodontic treatment need between genders.

Figure 1

The distribution of the 335 models' ICON scores. ICON, Index of Complexity, Outcome and Need.

Figure 2

The distribution of the models' ICON scores relating to gender. The data were made up of 174 boys and 161 girls. ICON, Index of Complexity, Outcome and Need.

Reliability of the gold standard

The kappa coefficient for the two experts was 0.87. The mean indicated treatment point score was 4.20±1.95 (mean±SD) points. The casts with mean scores equal to or greater than 4.20 points were assigned to the ‘treatment’ category. The remaining casts, with scores less than 4.20 points, were assigned to the ‘no treatment’ category. One hundred and ninety-five (58%) casts belonged to the ‘treatment’ category, while 140 (42%) belonged to the ‘no treatment’ category.

Reliability of ICON

The kappa coefficient of the ICON scores of the three ICON raters was 0.82. For intra-rater reliability, the kappa coefficients were 0.89, 0.87 and 0.92. These results suggest a relatively high reliability of the ICON score in our study.

Validity of ICON

The Spearman’s correlation coefficient between the ICON score and the gold ITP score was 0.83. As mentioned before, 140 casts had mean scores below 4.20, and 195 casts had mean scores equal to or greater than 4.20. With the ICON cutoff point of 43, the corresponding numbers were 245 and 90, respectively. The prevalence of orthodontic treatment need according to the recommended cutoff point of 43 was 26.9%. The cross-tabulation of the ICON treatment need categories against the gold standard is shown in Table 2. The sensitivity and specificity of the ICON scores were 0.45 and 0.98, respectively. The overall agreement between the ICON and the gold standard treatment need categories (the kappa coefficient) was 0.38. This agreement was fair according to the Altman classification.22

Table 2 Distribution of the categorized scores of the ICON

The area under the ROC curve was 0.91, suggesting a high validity21 of the ICON. Thus, the index reflects the decisions of the gold-standard expert orthodontists to a high degree (Figure 3). The best compromise between sensitivity and specificity in Chengdu, considering our gold standard, may be found at a cutoff point of 29. Different cutoff points with their sensitivity and specificity are displayed in Table 3. Based on this cutoff point, the sensitivity was 0.88, and the specificity was 0.83. The prevalence of orthodontic need according to an adjusted cutoff point of 29 was 58.2%. The kappa value also improved to 0.71, which is higher than the value obtained with the international cutoff point (43). The cross-tabulation between ICON and the gold standard with the new cutoff point of 29 is shown in Table 2.

Figure 3

Sensitivity and specificity at different cutoff points for the ICON score (335 models). ICON, Index of Complexity, Outcome and Need.

Table 3 Different cutoff points with their sensitivity and specificity

Under the optimal cutoff point, the number of false negatives (i.e., cases recommended for no orthodontic treatment by the ICON index but judged as needing treatment by the gold standard) decreased from 108 to 24. The number of false positives (i.e., cases recommended to receive orthodontic treatment by the ICON index but not by the expert panel) increased from 3 to 24.
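The false-negative and false-positive counts above, combined with the 195 ‘treatment’ and 140 ‘no treatment’ casts of the gold standard, fix the full 2×2 table at each cutoff. The following sketch recomputes sensitivity, specificity and Cohen's kappa from those counts as a consistency check; the function name is illustrative, and the true-positive and true-negative counts are derived by subtraction rather than read from Table 2.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Derived counts: TP = 195 - FN, TN = 140 - FP.
at_43 = diagnostic_summary(tp=195 - 108, fn=108, fp=3, tn=140 - 3)
at_29 = diagnostic_summary(tp=195 - 24, fn=24, fp=24, tn=140 - 24)

for name, result in (("cutoff 43", at_43), ("cutoff 29", at_29)):
    print(name, [round(value, 2) for value in result])
```

With these counts the sketch gives approximately 0.45, 0.98 and 0.38 at the cutoff of 43, and 0.88, 0.83 and 0.71 at the cutoff of 29.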

Gender differences

The normality test revealed that the ICON scores did not follow a normal distribution (P<0.05). Therefore, we used the rank-sum test to examine gender differences in the ICON scores. Table 4 demonstrates that gender differences were not statistically significant in our study samples. Table 5 displays the comparison of the diagnostic performance of ICON at the international (43) and adjusted (29) ICON score cutoff points when applied to the 335 casts.

Table 4 Statistical analysis of the gender difference test of the ICON scores by using the rank-sum test
Table 5 Comparison of the diagnostic performance characteristics of the ICON at the international and adjusted ICON score cutoff point for determining orthodontic treatment need when applied to the 335 casts

Analysis and results of the 50 casts with the gold standard of 25 experts

ICON scores

The ICON scores ranged from 7.0 to 89.0, and the mean ICON score of the casts was 30.

Reliability of the gold standard

The kappa value for the 25 experts was 0.90. The mean indicated treatment point (mean±SD) was 4.15±1.26. Casts with mean scores equal to or greater than 4.15 were classified in the ‘treatment’ category. The remaining casts, with scores less than 4.15 points, were assigned to the ‘no treatment’ category.

Reliability of the ICON

The kappa value of the ICON score of the three ICON raters was 0.81.

Validity of the ICON

The Spearman's correlation coefficient between the ICON score and the gold ITP score was 0.84. The area under the ROC curve (0.90) suggested a high validity of the ICON (Figure 4). The best compromise between sensitivity and specificity in Chengdu, compared with our gold standard, may be found at a cutoff point of 29. At this cutoff point, the sensitivity and specificity were 0.86 and 0.79, respectively.

Figure 4

Sensitivity and specificity at different cutoff points for the ICON score (50 models). ICON, Index of Complexity, Outcome and Need.

Discussion

The methods for evaluating malocclusion include orthodontic screening exams, dental cast analysis and cephalometric records. There may be a lack of agreement among investigators who assess occlusion by orthodontic screening exams. Occlusal indices based mainly on the analysis of dental casts, however, serve epidemiological purposes more objectively.

Many investigators employed a subset of casts to test the reliability and validity of ICON.1,12,13,23,24,25,26,27,28,29 Most scholars concluded that ICON could be a substitute for IOTN, Peer Assessment Rating and Dental Aesthetic Index as a good index for the prediction and evaluation of orthodontic treatment need and treatment outcome.12,23,25,28,29,30,31 Some scholars believed that the relationship between ICON and Peer Assessment Rating needs to be studied further.32

There was relatively high inter- and intra-examiner agreement among the three associate professors using ICON to assess orthodontic treatment need. This may be because all of the professors received similar orthodontic training in China and work in the same orthodontic treatment centre, and because the ICON offers a very objective way to determine treatment need. With regard to validity, the relationship between ICON scores and ITP could be described as a relatively strong association, as the Spearman's correlation coefficient33 was 0.83. However, when using the international cutoff point of 43, the sensitivity, specificity and kappa value were 0.45, 0.98 and 0.38, respectively.

A diagnostic test is described by the ROC curve. By changing the cutoff point, one can change the number of true-positive diagnoses and true-negative diagnoses (i.e., to change the sensitivity and specificity).

Based on the present study, the cutoff point of ICON for 12–13-year-olds in Chengdu should be adjusted to 29. However, setting the cutoff point relies on many factors, including the opinion of experts regarding the need for treatment, available resources and the selection of a reliable gold standard. We first invited two experts. We then invited another 23 experts from China and abroad (not including the first two experts) to re-evaluate 50 randomly selected casts from this study. They all used the ITP. The scores given to the 50 casts by the first two experts, together with the 23 experts' scores, were collected and statistically analysed. The kappa value for the 25 experts and the mean indicated treatment point changed slightly, but the best compromise between sensitivity and specificity in Chengdu, determined from the ROC curve against the gold standard, remained at a cutoff point of 29.

With this new cutoff point, the sensitivity and the specificity were 0.88 and 0.83, respectively. The kappa value was 0.71, which was better than the previous kappa value for the international cutoff point. Lowering the cutoff to 29 would result in approximately 60% of individuals needing treatment, while the prevalence of malocclusion in children and adolescents in China is 67.8%.34 However, this prevalence was only 26.9% with the international cutoff point of 43; thus, the cutoff point of 29 may be in line with Chinese conditions.

One possible explanation for the lower recommended cutoff point is cultural and ethnic differences. The photos used by ICON were of Caucasian children. The aesthetic component comprises the most important part of the ICON system.35 According to a recent study (Borzabadi-Farahani 2010), with a cutoff of 43, nearly half of Iranian schoolchildren need orthodontic treatment; however, judging by the aesthetic component (IOTN), the value was 17.9%. Aesthetic assessment is to some extent impacted by culture and education, and the final ICON score with the adjusted cutoff point may therefore deviate from the international score. Some investigators have therefore recommended that, when ICON is used to determine treatment need locally, the cutoff point should be adjusted to optimal levels.1

In one study,36 the author found that the mean ICON value in the age group of 12–13 years was 35.8 and that it was highest (39.9) in the age group of 18 years. The majority of the individuals in the age group of 12–13 years had a complexity grade of easy or mild, and only a small proportion had a difficult complexity grade. In the age group of 18 years, the majority of individuals had a mild (46.9%) complexity grade, and easy and moderate complexity grades were observed 1.8 and 2.1 times less frequently, respectively. Only a very small proportion of individuals in this age group had a very difficult complexity grade. Overall, the orthodontic treatment complexity grade tended to increase with age. In our study, the ages of the 335 included participants ranged from 12 to 13 years, which may have led to relatively lower ICON scores according to Dr Ilga Urtane's study.36 A possible limitation of the present study is the use of a local panel of ICON examiners. A larger number of ICON raters would be necessary to obtain a more accurate, countrywide opinion.

The lower cutoff point may also be due to the high prevalence of bimaxillary protrusion in our region. This condition has a relatively high prevalence in certain provinces of China.37 The molar relationship is usually class I, with the incisor overjet and overbite being normal. The ICON scores of cases with bimaxillary protrusion are therefore often lower, even though the facial aesthetics are often perceived unfavourably. Some patients with this malocclusion have severe jaw deformities, which can be corrected only by orthognathic surgery. In this scenario, the ICON score does not reflect the treatment complexity and need. It would be more appropriate to consider bimaxillary protrusion separately when applying ICON.

The treatment needs for boys and girls were the same, although their self-perceived needs may be different, with girls feeling more need for treatment than boys.38

The distribution of the ICON scores in our study was not normal. A larger sample may yield data that are closer to a normal distribution.

ICON is recommended to be used with late mixed dentition and permanent dentition.6 We chose students in grade seven from 16 randomly selected middle schools in Chengdu with permanent dentition for the present study. For future studies in China, there is a need for a sample with wider age ranges to make the study findings applicable to a broader population. Multicentre clinical studies, especially studies including both northern and southern Chinese, are necessary if we want to make ICON a suitable index for assessing the orthodontic need in a larger geographic area of China.

We hope that the ICON system can be adopted in China based on this study, as this may help the provision of orthodontic treatment by directing the limited Chinese health resources to those who need them most.

Conclusions

It was feasible to obtain good agreement with ICON among trained orthodontists. However, the international cutoff point (43) had poor sensitivity, despite high specificity, relative to the experts' ratings in southern China. Adjusting the cutoff point to 29 gave a better balance of sensitivity and specificity for ICON in determining treatment need. Further studies are warranted to support or refute a lowering of ICON's cutoff point for treatment need in China. In summary, when used to evaluate the treatment need of 12–13-year-olds in southern China, the international ICON cutoff value did not correspond well with Chinese orthodontists' opinions of the need for orthodontic treatment. In this study, a lower cutoff value (29) agreed better with expert orthodontists' perception of treatment need.