Abstract
To evaluate the expiratory sounds produced during swallowing recorded simultaneously with videofluorographic examination of swallowing (VF) using fast Fourier transform (FFT), and to examine the relationship between dysphagia and its acoustic characteristics. A total of 348 samples of expiratory sounds were collected from 61 patients with dysphagia whose expiratory sounds were recorded during VF. The VF results were evaluated by one dentist and categorized into three groups: safe group (SG), penetration group (PG), and aspiration group (AG). The duration and maximum amplitude of expiratory sounds produced were measured as the domain characteristics on the time waveform of these sounds and compared among the groups. Time window-length appropriate for FFT and acoustic discriminate values (AD values) of SG, PG, and AG were also investigated. The groups were analyzed using analysis of variance and Scheffé's multiple comparison method. The maximum amplitude of SG was significantly smaller than those of PG and AG. The mean duration in SG (2.05 s) was significantly longer than those in PG (0.84 s) and AG (0.96 s). The AD value in SG was significantly lower than those in PG and AG. AD value detects penetration or aspiration, and can be useful in screening for dysphagia.
Similar content being viewed by others
Introduction
Japan is a super-aging society and has one of the longest life expectancies in the world. Dysphagia is a common pathologic condition in elderly people with or without geriatric diseases, such as stroke, Parkinson’s disease, and neurological disorders. Therefore, dysphagia assessments are widely performed to evaluate ingestion in several settings, such as medical fields and nursing facilities in Japan. The gold standards for the diagnosis of dysphagia are videoendoscopic examination of swallowing (VE) and videofluorographic examination of swallowing (VF). VE is probably the most frequently used tool for objective dysphagia assessment. It allows for the evaluation of the efficacy and safety of swallowing, determination of appropriate feeding strategies, and assessment of the efficacy of different swallowing maneuvers 1,2. However, the demerit of VE is that patients feel uncomfortable with a fiber passing through the nasal cavity during the examination. On the other hand, VF can be used to investigate the entire swallowing activity, from the oral stage to the esophageal stage2,3. The pathological, morphological, and functional aspects of swallowing are evaluated using VF findings. However, the inspection sites and study designs using VF have been limited because of radiation exposure. There are many alternative assessment methods for detecting dysphagia, such as cervical auscultation (CA)4,5,6, repetitive saliva swallowing test7, and water swallowing test7,8. CA has been used for detecting dysphagia and is a noninvasive screening tool for assessing aspiration, penetration, and pharyngeal retention9,10,11. It has been used widely in the clinical setting, including medical fields and nursing facilities12. Moreover, CA is used at meal rounds in nursing facilities and in home-visit medical care.
Several studies have evaluated the diagnostic accuracy of CA using subjective evaluation4,6,13,14. Studies focusing on acoustic data detected from the neck during swallowing attempts using VF or VE, reported that the diagnostic accuracy of CA for penetration or aspiration had a sensitivity of 62–94% and a specificity of 50–100%13,14,15. However, some studies have focused on the diagnostic accuracy of CA through objective evaluation. Hirano et al. analyzed the acoustic signals of expiratory sounds recorded by an accelerometer by octave band analysis, and reported that the 125.0-Hz band was the critical band for detecting dysphagia16. Yamashita et al. analyzed the acoustic signals of expiratory sounds recorded using a microphone by one-third octave band analysis, compared them with VF images, and calculated the value obtained by subtracting the averaged level of the 1000.0-Hz reference band of the expiratory sounds’ central frequency from the sound level of 125.0-Hz central frequency. They reported that the consistency rate of diagnosis for penetration/aspiration using expiratory sounds of 85.4% based on VF findings17. These studies investigated the relevance of the frequency characteristics for expiratory sounds and VF images using one-third octave band analysis or one octave band analysis to approach the auditory perception of humans. On the other hand, narrow-band analysis using the fast Fourier transform (FFT) can be used to analyze the characteristics of expiratory sounds in more detail than one-third octave band analysis or one octave band analysis. The aim of this study was to analyze expiratory sounds produced during swallowing recorded simultaneously with VF using the FFT focusing on the low frequency band, and to examine the relationship between the dysphagia and the acoustic characteristics of expiratory sounds.
Methods
Subjects
Patients with dysphagia and those who complained of dysphagic symptoms and underwent VF in the Department of Oral Rehabilitation, Showa University Dental Hospital were included in the study. The exclusion criteria were as follows: 1) patients wearing tracheal cannula; 2) patients who could not follow the instructions; 3) patients with fatigue, fever, and/or any other poor physical conditions that might influence swallowing function; and 4) patients who could not exhale constantly.
Compliance with ethical standards
All study protocols and procedures were conducted in accordance with the ethical guidelines and in compliance with the Declaration of Helsinki. Each subject gave written informed consent and the ethics committee of Showa University School of Dentistry granted approval for this study (no. SUDH0063).
Acquisition of acoustic data
At the same time as VF imaging, exhalation sounds were recorded. Before VF, the patient’s neck was cleaned with an alcohol pad, and the diaphragm chest piece of a double-faced stethoscope connected to a short tube with an inserted electret condenser microphone (Sanken Co., COS-11D-BP, Japan) was attached to a site over the lateral border of the trachea immediately inferior to the cricoid cartilage using a 1 cm2 piece of double-sided adhesive paper tape17 (Fig. 1). Moreover, residual secretions in the oral cavity, pharynx, and larynx of the patients were removed either by strong voluntary cough or by forced expiration. After an auditory impression by one dentist determined that the oral, pharyngeal, and laryngeal residues had been removed, the patients were instructed to practice exhaling three times with constant force. During VF, the patients swallowed test foods containing barium sulfate according to their swallowing ability. The patients were asked to exhale three times with constant force before the sample was inserted into the mouth and again three times immediately after swallowing the whole test sample. The detected acoustic signals, including expiratory sounds were amplified, digitally converted with 48 kHz sampling rate, and the number of quantization bits was 16 bits in wave file format. The characteristic of the electret condenser microphone (Sanken Co., COS-11D-BP, Japan) was flat from over 200 Hz of the whole.
Classification of subjects based on VF findings
The VF findings were analyzed by one dentist, who has been working in dysphagia rehabilitation for more than 10 years and was certified by the Japanese Society of Dysphagia Rehabilitation, and categorized into three groups: without penetration or aspiration (safe group: SG), with penetration (penetration group: PG), and with aspiration (aspiration group: AG).
Sound analysis
The maximum amplitude value of the expiratory sounds was measured by Multi speech 3000® (Pentax, Tokyo, Japan). The duration of expiratory sounds obtained from acoustic data were measured by auditory perception by one dentist. The expiratory sounds and noise were excluded.
FFT was used for sound analysis. In the analysis using FFT, it is necessary to use an appropriate sampling rate, because the use of an excessively high sampling rate will affect the frequency band and analysis time. For evaluation of CA, it was considered appropriate to use a frequency range from 62.5 Hz (slightly lower than the human audible range) to 4 kHz (the middle range). The wave files simultaneously recorded with VF images were analyzed at a sampling rate of 48 kHz; the number of quantization bits was 16 bits. After downsampling the wave files to a sampling rate of 8 kHz, we analyzed them using Multi speech 3000®. Expiratory sound before swallowing (ESBS) and expiratory sound after swallowing (ESAS) obtained from the wave files recorded simultaneously with the VF images were Hamming-windowed using four length-of-time windows, including acoustic signals with the maximum amplitude and transformed into frequency waveforms using FFT. The lengths of the time windows were 256 (0.032 s), 1024 (0.128 s), 2048 (0.256 s), and 8192 (1.024 s) (Fig. 2). There were differences in the length of the recording environment and the volume of the voluntary expiratory sounds for each subject. Therefore, normalization was performed by subtracting the average amplitude of the overall frequency band (62.5–2000 Hz) from the average amplitude of the lower frequency band (62.5–250 Hz). The following equation was used to find the average amplitude values for the low and total frequency bands (Fig. 3). The mean dB of each frequency band was obtained by
The acoustic discriminant vale (AD value) was calculated by subtracting the average volume of the total frequency band from the average volume of the low frequency band.
Statistical analyses
The maximum amplitude value, duration of expiratory sounds, and AD values for ESBS and ESAS in SG, PG, and AG were analyzed using analysis of variance (ANOVA) and Scheffe’s multiple comparison method. Results were accepted as statistically significant at the probability level of 5%. Data were analyzed using SPSS (ver.26.0 SPSS, Chicago, IL, USA).
Results
Details of subjects
Sixty-one patients (44 men) with dysphagia were included in this study. The details of the subjects (sex, age, and primary cause of dysphagia) are shown in Table 1. A total of 348 expiratory sound samples were collected in this study. Thirty samples were excluded from the analysis due to reasons, such as clothing abrasion and digital noise.
Classification of subjects
Subjects were divided into three groups based on VF findings. SG included 18, PG included 23, and AG included 20 subjects; SG included 91, PG included 148, and AG included 109 expiratory sounds samples.
Results of the maximum amplitude values for ESBS
The maximum amplitude values in SG, AG, and PG were 5521.28 (SD = 4335.64), 22231.12 (SD = 11019.50), and 19815.61 (SD = 11953.14), respectively (Table 2). The maximum amplitude in SG was significantly lower than those in PG and AG (Fig. 4).
Results of the maximum amplitude values for ESAS
The maximum amplitude value in SG, PG, and AG was 7538.24 (SD = 4460.23), 18751.82 (SD = 12159.01), and 17336.75 (SD = 11765.83), respectively (Table 2). The maximum amplitude in SG was significantly lower than those in PG and AG. (Fig. 4).
Results of the duration about expiratory sounds for ESBS
The mean of the duration of all expiratory sounds was 1.20 s (SD = 0.87). The mean of the duration of all expiratory sounds in SG, PG, and AG was 1.90 s (SD = 1.26), 0.81 s (SD = 0.33), and 1.08 s (SD = 0.47), respectively (Table 2). The mean of the duration of all expiratory sounds in SG was significantly longer than those in PG and AG (p = 0.000) (Fig. 4).
Results of the duration of expiratory sounds for ESAS
The mean of the length of all expiratory sounds was 1.20 s (SD = 0.96). The mean of the length of all expiratory sounds in SG, PG, and AG was 1.82 s (SD = 1.23), 0.85 s (SD = 0.39), and 1.26 s (SD = 1.16), respectively (Table 2). The mean of the length of all expiratory sounds in SG was significantly longer than those in PG and AG (p < 0.0001) (Fig. 4).
Results of the AD values for ESBS
In SG, the AD values for ESBS acquired using the 256, 1024, 2048, and 8192-points time windows were − 3.71 (SD = 5.00), − 2.84 (SD = 5.34), − 2.16 (SD = 5.67), and − 1.60 (SD = 4.96), respectively. In PG, the AD values for ESBS acquired using the 256, 1024, 2048, and 8192-points time windows were 2.38 (SD = 4.88), 1.42 (SD = 5.24), 2.18 (SD = 4.95), and 2.96 (SD = 4.68), respectively. In AG, the AD values for ESBS acquired using the 256, 1024, 2048, and 8192-points time windows were 2.78 (SD = 6.68), 4.00 (SD = 4.93), 4.82 (SD = 4.72), and 5.07 (SD = 4.72), respectively.
Regarding the results from the 256-, 2048-, and 8192-point time windows, the AD values of SG were significantly lower than those of PG (p = 0.001, p < 0.0001, and p = 0.003) and AG (p < 0.0001, p < 0.0001, and p < 0.0001). Regarding the results from the 1024-point time window, the AD value in SG was noted to be significantly lower than that in AG (p < 0.001); however, there was no significant difference compared with that in PG (p = 0.116) (Fig. 5).
Results of the AD values for ESAS
In SG, the AD values for ESAS acquired in the 256, 1024, 2048, and 8192-points time windows were 3.95 (SD = 5.39), -1.89 (SD = 6.00), -4.64 (SD = 5.46), and -2.39 (SD = 4.88), respectively. In PG, the AD values for ESAS acquired the 256, 1024, 2048, and 8192-points time windows were 1.21 (SD = 5.32), 1.21 (SD = 5.20), 1.07 (SD = 5.13), and 1.93 (SD = 5.32), respectively. In AG, the AD values for ESAS the 256, 1024, 2048, and 8192-points time windows were 1.68 (SD = 5.66), 3.43 (SD = 4.93), 3.65 (SD = 5.28), and 3.86 (SD = 5.57), respectively. Regarding the results of the 256, 1024, 2048, and 8192-points time windows, the AD value of SG was significantly lower than that of PG (256 points, p < 0.0001; 1024 points, p = 0.002; 2048 points, p < 0.0001; and 8192 points, p < 0.0001) and AG (256 points, p < 0.0001; 1024 points, p < 0.0001; 2048 points, p < 0.0001; and 8192 points, p < 0.0001) (Fig. 6).
At the 8192-point time window, 208 of 348 samples in the case could be analyzed for all of the expiratory durations. On the other hand, only four samples were analyzed for the entire expiratory duration at the 2048-point window, and no sample was able to cover the entire expiratory duration at the 1024- and 256- point time windows.
Discussion
Our research team focused on swallowing and respiratory sounds for diagnosing oropharyngeal dysphagia. In a series of our studies, methodology for detecting swallowing sounds was established9 and symmetry, reproducibility, and production sites of swallowing sounds were investigated19,20. Acoustic analysis of swallowing sounds was performed for examining the physiology of swallowing events or oropharyngeal dysphagic conditions by our research team and other researchers21,22,23,24. Effectiveness of acoustic and auditory characteristics of swallowing and expiratory sounds for detecting oropharyngeal dysphagia was evaluated by our research team17,18,25. However, former studies focusing on respiratory or expiratory sounds as a clinical tool for detecting oropharyngeal dysphagia were lacking in elucidating the acoustic characteristics of respiratory or expiratory sounds.
This study was conducted to clarify the acoustic characteristics of voluntary expiratory sounds by evaluating the maximum amplitude and duration of the expiratory sounds produced before and after swallowing attempts during VF. These acquired data were compared among three groups divided based on VF findings: safe swallow (SG), penetration (PG), and aspiration (AG). Frequency characteristics of the voluntary expiratory sounds were also evaluated using AD values and compared among SG, PG, and AG. Therefore, both the time- and frequency-domain characteristics of voluntary expiratory sounds produced before and after swallowing attempts during VF were obtained and compared among the groups.
The maximum amplitude in SG was significantly smaller than those in PG and AG and the mean duration in SG was significantly longer than those in PG and AG in this study. Dysphagic patients are thought to have difficulty in exhalation compared with normal subjects. A previous study showed that it was difficult for dysphagic patients to blow up an 80-cm-long party horn perfectly because of the close relationship between respiratory and swallowing functions26.
We used four separate time windows using the Hamming window to identify the frequency bandwidth and target time for analysis. Among the different time windows, the 256-point window showed the most significant inequality of AD values between SG and PG/AG, with an analysis time of 0.032 s. These results suggest that using a shorter analyzing time and including the maximum the wave pattern were important factors. However, there were significant differences in AD values between SG and PG/AG in the other window sizes as well. One of the possible reasons could be that windowing is not important for the low frequency band. It was thought that the duration of most acoustic samples used in this study were over 1 s, and 256 (0.032 s), 1024 (0.128 s), and 2048 (0.256 s) points were too short to analyze the sounds in detail. In this study, the entire expiratory sounds could be analyzed in 208 samples in 8192-point windows. Therefore, the best window size was 8192 points to perform frequency analysis in detail.
There were no significant differences in the AD values between the pre- and post-swallowing expiratory sounds of the SG and PG/AG in this study. Since the analysis was performed by a narrow band using FFT, it was possible to detect low frequencies that cannot be judged by human sensory correction characteristics. It is thought that because the residual secretions in the pharynx and larynx of subjects were cleared, either by strong voluntary cough or by forced expiration, before VF and analysis by FFT, AD values did not show significant differences between pre- and post-swallowing expiratory sounds between SG and PG/AG. This suggests that FFT had the ability to analyze expiratory sounds objectively. AD values acquired in this study might be a key to differentiate between dysphagic oropharyngeal condition and normal condition. The voluntary expiratory sounds were also verified as a useful target for detecting oropharyngeal dysphagia.
Further research is required for obtaining a more incisive discriminant index than the AD value for distinguishing between dysphagic and normal conditions. A large amount of expiratory sound data and sophisticated analyzing methods are needed to meet this requirement.
Conclusion
Analysis of expiratory sounds during swallowing using FFT provides a more detailed analysis of the low-frequency band compared to analysis using human auditory results. When analyzing the pre- and post-swallowing expiratory sounds using FFT, the frequency band of expiratory sounds was significantly lower in patients in PG/AG than in SG. The results suggested that the AD value detects penetration or aspiration and can be useful in screening for dysphagia. Among the time window sizes used in this study, the most effective was the 8192-point window when the sampling rate was 8000 Hz.
References
Espitalier, F. et al. International consensus (ICON) on assessment of oropharyngeal dysphagia. Eur. Ann. Otorhinolaryngol. Head Neck Dis. 135, S17–S21 (2018).
Pilz, W. et al. Observers’ agreement on measurements in fiberoptic endoscopic evaluation of swallowing. Dysphagia 31, 180–187 (2016).
Rugiu, M. G. Role of videofluoroscopy in evaluation of neurologic dysphagia. Acta Otorhinolaryngol. Ital. 27, 306–316 (2007).
Frakking, T. T., Chang, A. B., O’Grady, K. F., David, M. & Weir, K. A. Reliability for detecting oropharyngeal aspiration in children using cervical auscultation. Int. J. Speech Lang. Pathol. 19, 569–577 (2017).
Frakking, T. T. et al. The use of cervical auscultation to predict oropharyngeal aspiration in children: A randomized controlled trial. Dysphagia 31, 738–748 (2016).
Frakking, T. T., Chang, A. B., David, M., Orbell-Smith, J. & Weir, K. A. Clinical feeding examination with cervical auscultation for detecting oropharyngeal aspiration: A systematic review of the evidence. Clin. Otolaryngol. 44, 927–934 (2019).
Horii, N. et al. Validity of a dysphagia screening test following resection for head and neck cancer. Ir. J. Med. Sci. 190, 67–77 (2020).
Chen, P. C., Chuang, C. H., Leong, C. P., Guo, S. E. & Hsin, Y. J. Systematic review and meta-analysis of the diagnostic accuracy of the water swallow test for screening aspiration in stroke patients. J. Adv. Nurs. 72, 2575–2586 (2016).
Takahashi, K., Groher, M. E. & Michi, K. Methodology for detecting swallowing sounds. Dysphagia 9, 54–62 (1994).
Rumbach, A., Coombes, C. & Doeltgen, S. A survey of australian dysphagia practice patterns. Dysphagia 33, 216–226 (2018).
Bateman, C., Leslie, P. & Drinnan, M. J. Adult dysphagia assessment in the UK and Ireland are SLTs assessing the same factors?. Dysphagia 22, 174–186 (2007).
Vogels, B., Cartwright, J. & Cocks, N. The bedside assessment practices of speech-language pathologists in adult dysphagia. Int. J. Speech Lang. Pathol. 17, 390–400 (2015).
Borr, C., Hielscher-Fastabend, M. & Lucking, A. Reliability and validity of cervical auscultation. Dysphagia 22, 225–234 (2007).
Cichero, J. A. & Murdoch, B. E. Detection of swallowing sounds: Methodology revisited. Dysphagia 17, 40–49 (2002).
Leslie, P., Drinnan, M. J., Finn, P., Ford, G. A. & Wilson, J. A. Reliability and validity of cervical auscultation: A controlled comparison using videofluoroscopy. Dysphagia 19, 231–240 (2004).
Dudik, J. M., Jestrovic, I., Luan, B., Coyle, J. L. & Sejdic, E. A comparative analysis of swallowing accelerometry and sounds during saliva swallows. Biomed. Eng. Online 14, 3 (2015).
Kaoru, H., Koji, T., Risa, U. & Michi, K. Evaluation of accuracy of cervical auscultation for clinical assesment of dyphagia. Jpn J Oral Maxillofac Surg 47, 29–36 (2001).
Yamashita, M. et al. Acoustic characteristics of voluntary expiratory sounds after swallow for detecting dysphagia. J Oral Rehabil 41, 667–674 (2014).
Takahashi, K., Groher, M. E. & Michi, K. Symmetry and reproducibility of swallowing sounds. Dysphagia 9, 168–173 (1994).
Hiroshi, N., Koji, T., Risa, U., Kaoru, H., Miki, F., N M. Evaluation of the sites where swallowing sound produces and the acousitc characteristic of swallowing sound in healthy adults. J. Showa Univ. Dental Soc. 26, 163–174 (2006).
Santamato, A. et al. Acoustic analysis of swallowing sounds: A new technique for assessing dysphagia. J. Rehabil. Med. 41, 639–645 (2009).
Reynolds, E. W., Vice, F. L., Bosma, J. F. & Gewolb, I. H. Cervical accelerometry in preterm infants. Dev. Med. Child Neurol. 44, 587–592 (2002).
Khalifa, Y., Coyle, J. L. & Sejdic, E. Non-invasive identification of swallows via deep learning in high resolution cervical auscultation recordings. Sci. Rep. 10, 8704 (2020).
Furuya, N. et al. Identifying the swallowing using videoendoscopy findings in healthy adults. Showa univ. Japan med. Soc. 27, 271–284 (2015).
Nozue, S. et al. Accuracy of cervical auscultation in detecting the presence of material in the airway. Clin. Exp. Dent. Res. 3, 209–214 (2017).
Misako, H. Relationship between swallowing dysfunction and decreased respiratory function in Dementia patiens. J. Phys. Ther. Sci. 25, 941–942 (2013).
Acknowledgements
We would like to thank Cactus Communication and Editage for their support in editing this paper.
Author information
Authors and Affiliations
Contributions
All authors wrote the main manuscript text and Yoshiaki Ihara prepared all figures, S.H. prepared all tables. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hattori, S., Nozue, S., Ihara, Y. et al. A study of acoustic characteristics of voluntary expiratory sounds produced before and immediately after swallowing. Sci Rep 12, 765 (2022). https://doi.org/10.1038/s41598-021-04624-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-021-04624-7
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.