Introduction

Laryngopharyngeal reflux disease (LPRD) is an inflammatory condition of the upper respiratory tract tissue associated with the reflux of gastroduodenal contents, which induces morphologic changes in the upper respiratory tract histology1. It is accompanied by non-specific symptoms, such as throat clearing, pain, and hoarseness2. A previous study demonstrated that approximately 10.15% of patients in Chinese otorhinolaryngology outpatient clinics had LPRD3, and the incidence of LPRD has risen with changes in diet and lifestyle4. In the USA, the average direct cost per LPRD patient is $5,438, with drug spending accounting for a major part of the first-year cost5, and LPRD also significantly impairs quality of life (QoL)6. Timely and accurate diagnosis of LPRD is therefore critical. Although 24-h multichannel intraluminal impedance-pH (MII-pH) monitoring is the gold standard for diagnosing LPRD7, it places high demands on healthcare settings. Screening with symptom scales therefore remains mainstream in clinical practice, especially in rural areas where laryngoscopy is unavailable. Nevertheless, previous studies found that symptom prevalence and the cutoff, sensitivity, and specificity values of the scales vary across regions8,9, so symptom-based scales need to be evaluated region by region.

Lechien JR et al. observed that 53.9% of patients with primary burning mouth syndrome had nonacid or mixed laryngopharyngeal reflux (LPR), and that the reflux contents can directly and indirectly damage the laryngopharyngeal or oral mucosa10. Another study found that the prevalence of dry throat and dry mouth in LPRD patients is higher than 80%11. Becker et al.12 found burning mouth sensations in 58 of 120 (48.3%) patients with gastroesophageal reflux disease (GERD). Moreover, the higher susceptibility of laryngeal and pharyngeal tissue compared with the esophagus may explain why some patients experience the LPR symptom of “dryness and burning sensation in the laryngopharynx or mouth” even in the absence of heartburn13. Because this symptom is absent from commonly used scales, such as the Reflux Symptom Index (RSI) and the Reflux Symptom Score-12 (RSS-12)14,15, some patients may be underdiagnosed, and adding the symptom may partially address this diagnostic challenge.

Hence, LPRD symptom-based scales need to be supplemented accurately and systematically across cultures, translated into versions for different countries, and applied to as many LPRD patients as possible. This study added the symptom “dryness and burning sensation in the laryngopharynx or mouth” to the RSI and RSS-12 scales and used a cross-sectional survey at an otorhinolaryngology clinic in a tertiary hospital to assess how reliability, validity, and diagnostic efficiency changed after the addition. It thereby explores the role of the new symptom in the diagnosis of LPRD and the optimal thresholds of the new scales in the Chinese population.

Methods

Patients and design

This study enrolled patients who attended the Ear Nose Throat (ENT) outpatient clinic of Fujian Provincial Hospital from December 2021 to December 2022, using a random selection of patients whose clinic numbers numbered 5. The inclusion criteria were as follows: (1) age ≥ 18 years and (2) ability to complete the survey accurately or to understand and follow the investigator's instructions. The exclusion criteria were as follows: (1) previous history of throat surgery, laryngeal cancer, vocal cord leukoplakia, or other pharyngeal diseases; (2) alcohol dependence, pregnancy, neurologic or psychiatric illness, or upper respiratory tract infection within the last month; and (3) allergy to PPI medications, or use of acid-inhibitory drugs, antibiotics, or hormones within the past four weeks. Questionnaires were distributed to the enrolled patients, who completed them on the spot under the guidance of an instructor. The instructor promptly alerted patients to missing data to ensure comprehensive and accurate responses.

Among the outpatients who completed the scales, volunteers were recruited to undergo laryngoscopy and 24-h MII-pH monitoring. Patients who were positive for LPRD on 24-h MII-pH monitoring received eight weeks of proton-pump inhibitor (PPI) treatment (esomeprazole, 20 mg twice daily). Patients were followed up after treatment, and the diagnosis of LPRD was confirmed by a 50% improvement in the RSI score16, as summarized in Fig. 1. Ethical approval for this study was obtained from the Fujian Provincial Hospital Ethics Committee (Ethics Review Approval No: K2021-11-011). All participants signed an informed consent form, and all methods were performed in accordance with the relevant guidelines and regulations.

Figure 1

The flow of participants. 24 h MII-pH, 24-h multichannel intraluminal impedance-pH monitoring; PPI, proton-pump inhibitor.

Questionnaires and sample size calculation

The RSI scale is a 9-item self-reported questionnaire developed to assess the subjective perception of LPRD. Patients rate the severity of each item from 0 (no symptoms) to 5 (very severe)14. We added a 10th item (dryness and burning sensation in the laryngopharynx or mouth) to the RSI to form the RSI-10 scale, whose total score ranges from 0 to 50.

The RSS-12 scale is a 12-item self-reported questionnaire that includes seven ear, nose, and throat (ENT) items, three digestive items, and two respiratory items. For each item, frequency, severity, and QoL impact over the previous month are evaluated, with frequency ranging from “I don’t have this complaint over the past month” to “complaint occurs daily” and severity ranging from “no symptoms” to “very severe at the time of the attack”. As in the original scale15, responses on each aspect range from 0 to 5. We added a 13th item (dryness and burning sensation in the laryngopharynx or mouth) to the RSS-12 to form the RSS-13 scale. The total symptom score is the sum of the frequency × severity products for the 13 symptoms and ranges from 0 to 325, while the total QoL score ranges from 0 to 65.
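The RSS-13 scoring rule can be illustrated with a minimal sketch. The function name `rss13_scores` and the list-based input layout are illustrative assumptions and are not part of the instrument; only the 0 to 5 rating range and the frequency × severity rule come from the scale description.

```python
from typing import Sequence, Tuple

N_ITEMS = 13     # the 12 RSS-12 items plus the added dryness/burning item
MAX_RATING = 5   # each aspect (frequency, severity, QoL) is rated 0..5


def rss13_scores(frequency: Sequence[int],
                 severity: Sequence[int],
                 qol: Sequence[int]) -> Tuple[int, int]:
    """Return (total symptom score, total QoL score) for one questionnaire.

    Hypothetical helper: the input layout is an assumption for illustration.
    """
    for name, ratings in (("frequency", frequency),
                          ("severity", severity),
                          ("qol", qol)):
        if len(ratings) != N_ITEMS:
            raise ValueError(f"{name} must contain {N_ITEMS} ratings")
        if any(not 0 <= r <= MAX_RATING for r in ratings):
            raise ValueError(f"{name} ratings must lie between 0 and {MAX_RATING}")

    # Symptom score: sum over items of frequency x severity (0..25 per item).
    symptom_score = sum(f * s for f, s in zip(frequency, severity))
    # QoL score: plain sum of the QoL ratings (0..5 per item).
    qol_score = sum(qol)
    return symptom_score, qol_score
```

With every aspect rated 5, the function returns the stated maxima of 325 and 65.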

The sample size was based on the principle that it should be at least 5 to 10 times the number of items17. The RSS-13 scale consists of 13 items, each rated on three aspects (severity, frequency, and QoL). Assuming that 10–20% of questionnaires would be invalid, 500 questionnaires were distributed.
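The text does not spell out how the figure of 500 was reached. One possible reading, shown here only as a hedged sketch, is that each of the 13 items contributes three rated aspects (39 rated entries), that the 5-to-10-times rule is applied to those entries, and that the result is inflated for invalid questionnaires. The function name and this interpretation are assumptions.

```python
def required_questionnaires(n_entries: int, ratio: int, invalid_percent: int) -> int:
    """Questionnaires to distribute so that n_entries * ratio valid ones remain.

    Assumed interpretation, not the authors' stated calculation.
    """
    valid_needed = n_entries * ratio
    numerator = valid_needed * 100
    denominator = 100 - invalid_percent
    return -(-numerator // denominator)  # integer ceiling division


# 13 items x 3 aspects = 39 rated entries (assumption).
lower_bound = required_questionnaires(39, 5, 10)    # least demanding case
upper_bound = required_questionnaires(39, 10, 20)   # most demanding case
```

Under these assumptions the most demanding case requires 488 questionnaires, which is compatible with distributing 500.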

24 h MII-pH monitoring and laryngoscopy

The 91 volunteers fasted for at least 8 h and took no water for 4 h before laryngoscopy and 24 h MII-pH monitoring. The 24 h MII-pH monitoring system (Jinshan Science and Technology, Chongqing, China) comprises a recorder, a monitoring catheter with 4 impedance channels and 1 pH channel, and analysis software. The impedance channels started 0.5 cm above the upper margin of the upper esophageal sphincter (UES) and were arranged sequentially downward at 1 cm intervals. The pH channel was located midway between the first two impedance channels. All catheter placements were performed under laryngoscopic positioning (Supplementary Fig. 1). The positive diagnostic criteria were as follows7: ≥ 1 LPR event or ≥ 5 full column reflux events (reflux reaching 2 cm distal to the UES) per day.
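The positivity rule above reduces to a simple either/or check on two daily event counts. The sketch below is illustrative only; the function and parameter names are assumptions, and event detection itself is done by the monitoring software, not by this code.

```python
def mii_ph_positive(lpr_events: int, full_column_events: int) -> bool:
    """Apply the stated criterion to event counts from one 24-h recording.

    Positive if there is at least 1 LPR event or at least 5 full column
    reflux events per day. Names are hypothetical.
    """
    if lpr_events < 0 or full_column_events < 0:
        raise ValueError("event counts cannot be negative")
    return lpr_events >= 1 or full_column_events >= 5
```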

Statistical analysis

The RSI, RSI-10, RSS-12, and RSS-13 scales were evaluated using multiple measures of reliability, validity, and diagnostic efficacy.

Reliability

Internal consistency reliability was evaluated within each scale using Cronbach's alpha and Spearman-Brown split-half reliability coefficients. Test–retest reliability was assessed with intraclass correlation coefficients (ICCs) by reassessing 30 untreated patients two weeks after the initial assessment. Cronbach's alpha, Spearman-Brown coefficients, and ICCs ≥ 0.6 were considered acceptable, and those ≥ 0.8 excellent18,19,20.
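For readers unfamiliar with the two internal consistency measures, the following minimal sketch shows how they are commonly computed from a respondents × items matrix. It is not the SPSS procedure used in this study; in particular, the odd/even item split is an assumption, and statistical packages may split the items differently.

```python
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown_split_half(rows: Sequence[Sequence[float]]) -> float:
    """Split-half reliability with the Spearman-Brown correction 2r / (1 + r).

    Assumption: items are split by odd/even position.
    """
    half_a = [sum(row[0::2]) for row in rows]
    half_b = [sum(row[1::2]) for row in rows]
    r = pearson(half_a, half_b)
    return 2 * r / (1 + r)
```

Each row holds one respondent's item ratings; both functions need at least two respondents and non-constant totals.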

Validity

Criterion validity

Criterion validity was analyzed using Spearman’s correlation coefficient between the RSI-10 and RSI and between the RSS-13 and RSS-12. Correlations were classified as weak (r < 0.3), moderate (0.3 ≤ r < 0.7), or strong (r ≥ 0.7)21.
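As an illustration of this step, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks and then applies the three strength bands. The function names are assumptions, and classifying by the absolute value of r is also an assumption, since the text gives the bands for r only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def correlation_strength(r: float) -> str:
    """Bands from the text, applied to |r| (assumption for negative values)."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.7:
        return "moderate"
    return "strong"
```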

Structural validity

To examine the factor structure of the RSS-12 and RSS-13 scales (more than 10 items), exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were performed. A Kaiser‒Meyer‒Olkin (KMO) value > 0.6 and a Bartlett’s test of sphericity P value < 0.01 indicate sampling adequacy and suitability for factor analysis22. Data were divided into two groups according to whether the questionnaire code was odd or even, one for EFA and one for CFA. In EFA, factor loadings and a cumulative variance contribution rate > 0.4 were considered satisfactory and interpretable23. In CFA, the model fit indexes were the comparative fit index (CFI), standardized root mean square residual (SRMR), and root-mean-square error of approximation (RMSEA)24. A χ2/df < 3 and RMSEA and SRMR values < 0.06 indicate a good fit, whereas RMSEA and SRMR values > 0.10 indicate an inadequate fit, in which case rejection of the model is recommended25. CFI compares the target model with the baseline (null) model, with values > 0.90 suggesting an acceptable fit26.
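The CFA fit thresholds listed above can be expressed as a small checklist. This is only a sketch of the decision rule; the function names are assumptions, and the label "intermediate" for RMSEA or SRMR values between 0.06 and 0.10 is introduced here because the text leaves that band unnamed. The input values in the usage line are hypothetical.

```python
def classify_residual_index(value: float) -> str:
    """Label an RMSEA or SRMR value using the thresholds quoted in the text."""
    if value < 0.06:
        return "good"
    if value > 0.10:
        return "inadequate"
    return "intermediate"  # assumed label for the unnamed 0.06..0.10 band


def cfa_fit_summary(chi2_df: float, cfi: float, rmsea: float, srmr: float) -> dict:
    """Check each fit index against its stated criterion."""
    return {
        "chi2_df_ok": chi2_df < 3,   # chi-square / df below 3
        "cfi_ok": cfi > 0.90,        # CFI above 0.90 is acceptable
        "rmsea": classify_residual_index(rmsea),
        "srmr": classify_residual_index(srmr),
    }


# Hypothetical values, for illustration only.
summary = cfa_fit_summary(chi2_df=2.1, cfi=0.93, rmsea=0.08, srmr=0.04)
```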

Diagnostic efficiency

The diagnostic ability of all scales was evaluated using receiver operating characteristic (ROC) analysis, specifically by calculating the area under the curve (AUC)27. A larger AUC indicates higher accuracy in detecting the presence of the disease, and an AUC greater than 0.7 suggests reasonable discrimination. Furthermore, the AUC values were compared using DeLong’s test28.
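The AUC has a simple interpretation that a short sketch can make concrete: it equals the probability that a randomly chosen patient with the disease scores higher than a randomly chosen patient without it, with ties counted as one half. The function below uses this pairwise formulation; it is an illustration, not the SPSS routine used in the study, and it does not implement DeLong's test. The function name is an assumption.

```python
from typing import Sequence


def auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scale scores."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # diseased patient correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.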

Data were entered in Epidata (version 3.1) and analyzed using SPSS 24.0 and AMOS 24.0 (SPSS Inc, Chicago, IL). Quantitative data are expressed as the mean ± standard deviation (\(\overline{x}\) ± s), and qualitative data as percentages (%). The Mann‒Whitney U test was used for nonparametric data, and the chi-square test for categorical variables. All tests were two-sided, and P < 0.05 was considered statistically significant.

Results

A total of 500 questionnaires were distributed; after screening and removal of outlier data, 479 valid questionnaires were obtained, an effective response rate of 95.8%. Of the respondents, 237 were male and 242 were female, aged 18 to 78 years (mean 39.5 ± 13.4 years). The mean scores were 8.16 ± 6.95 for the RSI, 9.41 ± 7.67 for the RSI-10, 29.8 ± 31.5 (symptom score) and 9.79 ± 10.43 (QoL score) for the RSS-12, and 33.28 ± 34.82 (symptom score) and 9.79 ± 10.43 (QoL score) for the RSS-13 (Table 1).

Table 1 The score and correlation analysis of the RSI-10 and RSS-13 scales (N = 479).

Variation of reliability index

The Cronbach’s alpha coefficients of the RSI, RSI-10, RSS-12, and RSS-13 scales were 0.788, 0.800, 0.811, and 0.825, respectively. The split-half reliability coefficient improved from 0.841 for the RSI to 0.851 for the RSI-10, and from 0.879 for the RSS-12 to 0.888 for the RSS-13. Additionally, the test–retest reliability coefficient, assessed in a cohort of 30 patients after a two-week interval, rose from 0.956 to 0.967 when the new item was incorporated into the RSI scale (Table 2).

Table 2 Reliability analysis and criterion validity.

Variation of validity index

The correlation analysis showed strong correlations between the RSI and RSI-10 and between the RSS-12 and RSS-13. For both the RSS-12 and RSS-13 scales, the KMO values were > 0.6 and the P values of Bartlett's test of sphericity were < 0.01. In EFA, the cumulative variance contributions of the RSS-12 and RSS-13 scale factors were 61.51% and 59.69%, respectively. Their standardized factor loadings ranged from 0.407 to 0.888, and the factor loading of the new item was 0.568 (Table 3). In CFA, the standardized regression coefficients of the three factors of the RSS-12 and RSS-13 ranged from 0.42 to 0.93, all above the standard value of 0.4 (Fig. 2). Furthermore, the fit indicators of the RSS-12 scale (χ2/df = 1.94, CFI = 0.95, RMSEA = 0.06, SRMR = 0.05) and the RSS-13 scale (χ2/df = 1.95, CFI = 0.95, RMSEA = 0.06, SRMR = 0.05) both met or closely approached the model criteria, indicating that the three dimensions are suitable for the diagnosis of LPRD (Supplementary Table 1).

Table 3 Exploratory factor analysis (EFA) of RSS-12 and RSS-13 (N = 239).
Figure 2

Confirmatory factor analysis (standardized measurement model). (a) Reflux Symptom Score-12 (RSS-12); (b) RSS-13. Ovals and rectangles represent latent variables and measured items, respectively; e1 to e13 represent the residual terms of the measured items; the numbers on the arrows are the standardized factor loadings.

Diagnostic efficiency

A total of 91 patients underwent 24 h MII-pH monitoring (mean age 34.01 ± 13.50 years). Of these, 60 were diagnosed as positive and treated with PPIs for eight weeks, and only 4 did not respond to treatment. The mean age in the LPRD group was 30.54 ± 11.31 years, and 23 (35.0%) were female; the median RSI, RSI-10, RSS-12, and RSS-13 scores were 14, 16, 49, and 55, respectively. The 35 non-LPRD patients had a mean age of 30.5 ± 11.3 years, and 18 (%) were male; their median RSI, RSI-10, RSS-12, and RSS-13 scores were 5, 5, 14, and 15, respectively (Table 4). In addition, symptom scores improved significantly after 8 weeks of PPI treatment, and the incidence of the dryness/burning sensation fell from 75.0% to 21.4%.

Table 4 Demographic and clinical characteristics of the patients and follow-up.

The AUC values of the RSI, RSI-10, RSS-12, and RSS-13 scales were 0.83 (sensitivity 73.2%; specificity 88.6%), 0.84 (sensitivity 75.0%; specificity 91.4%), 0.84 (sensitivity 78.6%; specificity 82.9%), and 0.85 (sensitivity 69.6%; specificity 91.4%), respectively (Fig. 3). DeLong's test showed a statistically significant difference between the RSI and RSI-10 scales (P = 0.04). The maximum Youden index values of the RSI, RSI-10, RSS-12, and RSS-13 scales were 0.618, 0.664, 0.614, and 0.611, respectively, and the corresponding optimal cutoff values were 11, 13, 29, and 36 (Supplementary Table 2).
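The Youden index is defined as sensitivity + specificity − 1, and the optimal cutoff is the score at which it is largest. The sketch below illustrates both steps. The function names are assumptions, and treating a score greater than or equal to the cutoff as a positive screen is also an assumption, since the text does not state the direction of the inequality.

```python
from typing import Sequence, Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(positive_scores: Sequence[float],
                negative_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J over all observed scores.

    Assumption: a score >= cutoff counts as a positive screen.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best = None
    for cutoff in candidates:
        sens = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        spec = sum(s < cutoff for s in negative_scores) / len(negative_scores)
        j = youden_index(sens, spec)
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Reported RSI sensitivity and specificity from this paragraph.
rsi_j = youden_index(0.732, 0.886)
```

Applied to the rounded sensitivities and specificities reported above, the formula reproduces the RSI and RSI-10 indices (0.618 and 0.664) and matches the RSS-12 and RSS-13 indices to within rounding.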

Figure 3

Receiver operating characteristic (ROC) curves. (a) ROC curves of the RSI and RSI-10; (b) ROC curves of the RSS-12 and RSS-13. P values were obtained using DeLong’s test.

Discussion

The symptom of “dryness and burning sensation in the laryngopharynx or mouth” may be a valuable addition to the diagnosis of LPRD. It ranked among the most common symptoms reported by patients visiting the ENT outpatient clinic, and its incidence decreased notably after 8 weeks of PPI treatment. Furthermore, our findings indicate that the RSI and RSS-12 scales exhibit high levels of reliability and validity, and that adding the new symptom enhanced the reliability, validity, and diagnostic efficacy of these scales in the Chinese population.

Although the symptom of “Tongue burning” was initially included among the items of the RSS, it was eliminated when the short version of the instrument (RSS-12) was designed because it did not reach a 50% prevalence in European LPRD patients29. However, Lechien JR et al.10 observed that 53.9% of patients with primary burning mouth syndrome had nonacid or mixed LPR. In our study, “dryness and burning sensation in the laryngopharynx or mouth” was one of the most frequently reported complaints (75%). This result aligns with the study by Chen et al.11, in which the three most common extraesophageal symptoms in Chinese LPRD patients were globus sensation, dry throat/pharyngeal itching, and dry mouth. The discrepancy between the Chinese and European populations may have two explanations. First, dietary and lifestyle habits, which differ geographically, play a vital role in the manifestation of symptoms. Second, the description “Tongue burning” in the original RSS is inaccurate, because the refluxate of LPR first injures the laryngopharynx or mouth rather than the tongue, and the lateral and posterior mucous membranes of the laryngopharynx and mouth are more susceptible to reflux burning than the tongue. We therefore advocate further qualitative investigations to better understand patients' experience of dryness/burning.

Consistent with prior studies, the reliability of the RSI and RSS-12 scales was excellent30,31,32. Notably, our cross-sectional sample was larger than that of the studies mentioned above (479 vs. 273), and we also explored the diagnostic efficacy of the scales through a longitudinal study. Furthermore, our findings indicate that including the new item may enhance the stability, internal consistency, validity, and diagnostic efficacy of the RSI and RSS-12 scales, as demonstrated by a multimethod approach that included Cronbach's alpha, Spearman-Brown split-half reliability, test–retest reliability, and factor analysis.

Factor analysis was employed to assess the validity of the RSS-12 and RSS-13 scales. The EFA revealed three factors (ENT, digestive tract, respiratory tract) consistent with the original scale15, indicating that the RSS-12 scale possesses satisfactory construct validity and that grouping the symptoms of laryngopharyngeal reflux into three systems is appropriate. However, the seventh item, “bad breath”, loaded on the digestive tract factor rather than the ENT factor. Some previous studies divided “bad breath” into oral and nonoral categories, and the pharynx and digestive tract are closely related to the nonoral category33. A recent study34 found that methyl mercaptan and hydrogen sulfide concentrations in the pharynx and upper esophagus are significantly higher in patients with pharyngeal reflux than in healthy individuals; these volatile compounds can cause “bad breath”. Assigning the “bad breath” item to the digestive tract may therefore reflect clinical reality. Similarly, “dryness and burning sensation in the laryngopharynx or mouth” loaded on the respiratory tract factor, although the larynx and mouth anatomically belong to the ENT domain. This result may also partly reflect reality, because the respiratory tract and ENT overlap in the upper respiratory tract. Consistent with Lechien’s hypothesis15, our CFA results showed that the three-system model had good structural validity, which suggests that the diagnosis of LPRD requires a comprehensive evaluation of the ENT, digestive tract, and respiratory tract.

Previous research has indicated that the RSI and RSS-12 scales have favorable diagnostic performance14,15. Robust criterion validity was observed when these scales were compared with the RSI-10 and RSS-13, as evidenced by strong correlations (r > 0.9, P < 0.01), demonstrating that the new scales agree closely with the original scales. In addition, ROC curves showed that adding the new item helped differentiate LPRD patients from non-LPRD subjects: the AUC increased significantly from the RSI to the RSI-10 (P < 0.05; Fig. 3a). Given that the new item is one of the complaints most frequently reported by patients and is significantly relieved by PPI treatment, we conclude that it is an essential supplement for diagnosing LPRD. Additionally, Beletsky et al.12 found that 13 was the optimal cutoff for the RSI scale, whereas in both the study by Kamani35 and ours the value was 11. Differences in diet, geographical circumstances, and even the frequency of symptom manifestation might lead to different results8,9. For this reason, future cross-sectional and longitudinal studies based on larger population samples are needed to determine the most appropriate cutoff values and the prevalence of dryness and burning sensations.

The RSS-12 scale encounters a comparable situation. Because the severity score is multiplied by the frequency score to obtain a symptom score ranging from 0 to 25 for each item, the cutoff of 11 for the RSS-12 scale is less than half of the maximum score of a single item, which might be too low to diagnose LPRD accurately. In our investigation, however, the cutoff value of the RSS-12 scale was 29. Because this exceeds the 25-point maximum of any single item, at least two symptoms are required to diagnose LPRD. This avoids overdiagnosis, for example misdiagnosing “abdominal pain or diarrhea” caused by gastroenteritis as LPRD, and thereby minimizes the waste of resources. Although our RSS-12 scale exhibited a lower AUC, sensitivity, and specificity than in the original study15, it is suitable for the Chinese population. Additionally, we found that the RSS-13 took more time to complete than the RSI-10, while there were no statistically significant differences in AUC between the RSS-12/RSS-13 and RSI-10 scales. Thus, the RSI-10 might be the more promising screening tool for LPRD, but it still needs to be translated and validated across cultures, and deploying the scales on e-health/m-health platforms could widen access to screening. Such additional studies will be instrumental in corroborating our findings and could potentially influence the development of global LPRD management guidelines.

Although the symptom of “dryness and burning sensation in the laryngopharynx or mouth” was highly prevalent, this study has several limitations. First, as this was a single-center study, selection bias is possible, and further international studies (recruiting patients of different nationalities) using standardized protocols are necessary to confirm the results. Second, patients with higher symptom scores may have been more likely to volunteer for 24 h MII-pH monitoring, which led to a positive rate of 61.5%. Third, we did not give PPI treatment to patients who were negative on 24 h MII-pH monitoring, which may have led to missed diagnoses in false-negative patients.

Conclusion

Validated and accurate questionnaires are essential for the global study of patients with LPRD. The item “dryness and burning sensation in the laryngopharynx or mouth” is highly prevalent, and with its addition the RSI-10 and RSS-13 scales show better reliability, validity, and diagnostic ability for evaluating patients with LPRD in the Chinese population. In addition, the RSI-10 might be the more promising screening scale.