I was interested to read the paper by Erdem et al. [1] published in Spinal Cord in July 2017. The authors aimed to investigate the validity and reliability of the neurogenic bowel dysfunction (NBD) score in a study of 42 patients with spinal cord injury. Although they correctly applied Cronbach’s alpha coefficient to determine internal consistency, in the test–retest reliability analysis they reported high correlations between the test and retest total NBD scores and between the test and retest answers to each question (r = 1.000, P < 0.001). Pearson r assesses only linearity, and a perfect linear correlation can coexist with no reliability at all: a shift in the location or scale of the regression line, which indicates a lack of reliability, cannot be detected by Pearson r. Moreover, reliability analysis should be based on individual measurements rather than on a global average, and Pearson r cannot provide this [2, 3].
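The point about location and scale can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the study; the retest values are constructed as a shifted and rescaled copy of the test values (retest = 2 × test + 3), so no individual obtains the same score twice, yet Pearson r equals 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test scores (illustrative only, not data from the study).
test_scores = [2, 5, 8, 11, 14, 17]
# Retest scores with a change in scale (x2) and location (+3).
retest_scores = [2 * s + 3 for s in test_scores]

r = pearson_r(test_scores, retest_scores)

# Individual-based view: how far apart are the two measurements per subject?
differences = [b - a for a, b in zip(test_scores, retest_scores)]
mean_difference = sum(differences) / len(differences)
exact_matches = sum(a == b for a, b in zip(test_scores, retest_scores))

print(f"Pearson r       = {r:.3f}")
print(f"Mean difference = {mean_difference:.1f}")
print(f"Exact matches   = {exact_matches} of {len(test_scores)}")
```

In this constructed case the correlation is perfect while every subject's retest score differs from the test score, which is the situation an individual-based agreement measure is meant to detect.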

They also reported that the consistency of the frequency distribution of all answers for each item was analyzed with kappa statistics, that very high consistency was found (κ = 1.000, P < 0.001), and that this revealed acceptable reliability [1]. However, kappa has two important weaknesses. First, it depends on the prevalence in each category, so different κ values can arise from the same percentages of concordant and discordant cells. Table 1 shows that in both situations (a) and (b) the concordant cells account for 90% and the discordant cells for 10%; however, the kappa values differ (0.44, moderate, and 0.80, very good, respectively). Second, the kappa value depends on the number of categories [2, 3].

Table 1 Limitation of kappa for comparison of two observers’ diagnoses with different prevalence in the two categories
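The prevalence dependence can be checked numerically. The sketch below is not a reconstruction of Table 1; the two 2 × 2 tables use hypothetical counts, chosen so that both have 90% concordant and 10% discordant cells and so that Cohen's kappa reproduces the values quoted above (0.44 and 0.80).

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: observer 1, columns: observer 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance-expected agreement from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts (assumed for illustration), 100 subjects each.
# (a) Unbalanced prevalence: most subjects fall in the first category.
table_a = [[85, 5],
           [5, 5]]
# (b) Balanced prevalence: subjects are split evenly between categories.
table_b = [[45, 5],
           [5, 45]]

kappa_a = cohen_kappa(table_a)
kappa_b = cohen_kappa(table_b)

print(f"(a) agreement 90%, kappa = {kappa_a:.2f}")
print(f"(b) agreement 90%, kappa = {kappa_b:.2f}")
```

Both tables share the same observed agreement; only the marginal prevalence, and therefore the chance-expected agreement, differs.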

Another methodological issue is the distinction between clinical importance and statistical significance. They reported that a statistically significant negative correlation was detected for the bodily pain (r = −0.3, P = 0.01), general health (r = −0.5, P < 0.001), vitality (r = −0.6, P < 0.001), social role functioning (SF) (r = −0.7, P < 0.001), emotional role functioning (r = −0.6, P < 0.001), and mental health (r = −0.6, P < 0.001) subscales, whereas no significant correlation was found with the physical functioning (PF) (r = −0.2, P = 0.13) and physical role functioning (RP) (r = 0.06, P = 0.67) subscales of SF-36 [1]. For both positive and negative associations, we should emphasize the strength of the association rather than its statistical significance, because the latter depends strongly on the sample size.
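The dependence of significance on sample size can be sketched as follows. The code uses the Fisher z transformation with a normal approximation, which is an assumption made here for simplicity and is not necessarily the test used by the authors, so the P values will not match those reported in [1]. The sample size of 400 is hypothetical; 42 is the sample size of the study.

```python
import math

def approx_two_sided_p(r, n):
    """Approximate two-sided P value for a correlation r with n subjects,
    using the Fisher z transformation and a standard normal reference."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

r = -0.2  # same weak association in both scenarios

p_small = approx_two_sided_p(r, 42)   # sample size of the study
p_large = approx_two_sided_p(r, 400)  # hypothetical larger sample

print(f"r = {r}, n = 42:  P = {p_small:.3f}")
print(f"r = {r}, n = 400: P = {p_large:.5f}")
```

The strength of the association is identical in both scenarios; only the sample size changes whether it crosses a conventional significance threshold.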

Finally, they reported that the Cronbach alpha coefficient for internal consistency of the NBD score was 0.54, whereas test–retest reliability of the Turkish version of the NBD score was high. They concluded that the Turkish version of the NBD score is a valid and reliable instrument. Such a conclusion may be a misinterpretation of the results [2, 3].

In conclusion, to assess reliability and validity, appropriate tests as well as correct interpretation should be considered. In this letter, I have discussed the limitations of the authors' statistical and methodological approach and the lack of evidence for such a sweeping conclusion. Their conclusion should take the above-mentioned statistical and methodological issues into account; otherwise, misdiagnosis and mismanagement of patients cannot be avoided.