Abstract
Glucose levels in the body have been hypothesized to affect voice characteristics. One of the primary justifications for such voice changes is Hooke’s law, under which a variation in the tension, mass, or length of the vocal folds, mediated by the body’s glucose levels, alters their vibrational frequency. To explore this hypothesis, 505 participants were fitted with a continuous glucose monitor (CGM) and instructed to record their voice using a custom mobile application up to six times daily for 2 weeks. Glucose values from CGM were paired with voice recordings to create a sampled dataset that closely resembled the glucose profile of the comprehensive CGM dataset. Glucose levels and fundamental frequency (F0) had a significant positive association within an individual, and a 1 mg/dL increase in CGM-recorded glucose corresponded to a 0.02 Hz increase in F0 (CI 0.01–0.03 Hz, P < 0.001). This effect was also observed when the participants were split into non-diabetic, prediabetic, and type 2 diabetic classifications (P = 0.03, P = 0.01, and P = 0.01, respectively). Vocal F0 increased with blood glucose levels, but future predictive models of glucose levels based on voice may need to be personalized due to high intraclass correlation.
Introduction
Glucose is an essential energy source for the human body. It is obtained through the consumption of carbohydrates, from storage as glycogen in the liver, or from the breakdown and subsequent conversion of fat molecules. It is transported and distributed throughout the body via the bloodstream and regulated through metabolic hormone (i.e. insulin and glucagon) homeostasis. Insulin results in decreased blood glucose levels by signaling to the body’s cells to take up glucose, whereas glucagon signals to release more glucose into the blood. In healthy individuals, glucose levels are tightly controlled and sustained periods of hypoglycemia (low blood glucose) or hyperglycemia (high blood glucose) can indicate pathology. Unregulated blood glucose values or impaired insulin action can cause the development of metabolic disorders such as diabetes1.
Identifying reliable biomarkers for glucose is critical, given diabetes’ considerable impact on the population and its status as a leading cause of death. Current glucose monitoring methods, such as finger prick tests and continuous glucose monitors, are invasive and come with several limitations. These include the discomfort or pain associated with finger pricks, the inconvenience of having to rely on physical devices for monitoring, and the costs involved with purchasing the devices. Early detection through non-invasive, accessible methods, such as analyzing changes in voice, could greatly enhance monitoring and management, potentially reducing the high prevalence of undiagnosed diabetes and its associated risks.
Previous research has found that there may be detectable indicators of physiological changes within the voice. High blood pressure and hypertension, lung function, and heart rate have been shown to affect the properties of the voice2,3,4,5,6. There is limited but promising support of blood glucose levels affecting voice7,8,9,10,11,12,13,14. Furthermore, there have been indications that diseases such as type 2 diabetes (T2D) or cystic fibrosis related diabetes can affect the voice7,15,16, both of which are characterized by unregulated glucose levels.
The most prevalent hypothesis of voice modulation is that glucose concentrations affect the elastic properties of the vocal folds, subsequently altering the frequency of vocal fold vibration7,8. This hypothesis stemmed from Hooke’s law, in which the frequency of oscillation of a spring is proportional to the square root of the spring constant divided by the mass of the spring. This concept was expanded to simply reflect the properties of the vocal folds17 and is displayed as a proportional relationship in Eq. (1).
where f represents the frequency of oscillation of the vocal folds, T represents the tension of the vocal folds, l represents the vocal fold length, and m represents the vocal fold mass. In essence, the equation states that the frequency of vocal fold oscillation, and thus voice frequency, rises with increased vocal fold tension and falls with increased vocal fold length or mass. According to current understanding, vocal fundamental frequency is primarily controlled through changes in the length or tension of the vocal folds, driven by activation of the cricothyroid and thyroarytenoid muscles18,19,20.
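One way to write out the proportionality described above is sketched here. It assumes the vocal folds are idealized as a vibrating string of length l, total mass m, and tension T. This idealization is introduced for illustration only and is not necessarily the exact form of Eq. (1).

```latex
% Vibrating-string idealization (an assumption): linear density \mu = m / l
f = \frac{1}{2l}\sqrt{\frac{T}{\mu}}
  = \frac{1}{2}\sqrt{\frac{T}{l\,m}}
\quad\Longrightarrow\quad
f \propto \sqrt{\frac{T}{l\,m}}

% Taking logarithms and differentiating gives the relative sensitivities:
\frac{\mathrm{d}f}{f}
  = \frac{1}{2}\,\frac{\mathrm{d}T}{T}
  - \frac{1}{2}\,\frac{\mathrm{d}l}{l}
  - \frac{1}{2}\,\frac{\mathrm{d}m}{m}
```

Under this assumed form, a small relative increase in tension raises f by half that relative amount, and the same relative increase in length or mass lowers it by half. This matches the directions of change stated in the text.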
A gap exists in the literature on the direct relationship between glucose and voice fundamental frequency. Although Hooke’s law is cited in numerous sources7,8, there is no exploratory research relating voice frequency values to glucose within an individual. Previous research has split continuous glucose levels into discrete groups (hypoglycemia and hyperglycemia compared to normal glucose levels)7,9,10 or evaluated the continuous effects of blood glucose levels using a combination of vocal features11. Hypo- and hyperglycemia classifications are useful clinically for managing glucose levels, but do not provide information on the overall effect of glucose variation on the voice in each individual, as hypo- and hyperglycemia may only occur in 0.5% and 15% of a non-diabetic population, respectively21. Furthermore, researchers have evaluated voice features as a tool for prediction (i.e. they built a predictive model for blood glucose levels) rather than determining the associative effects of glucose on the voice, which are fundamental to understanding the effect of physiological changes on the voice12,13,14.
In this manuscript we provide a foundation for past and future research on the relationship between voice and glucose. We propose a sampling methodology to capture the overall behavior of blood glucose levels using only a few data recordings per day. The linear relationship between voice fundamental frequency and glucose was evaluated and discussed in the context of Hooke’s law for a more concrete understanding of what processes may occur for vocal changes. Moreover, we investigate this relationship independently for non-diabetic (ND), prediabetic (PD), and T2D individuals, enhancing our understanding of glucose effects on the voice across different pathologies. By doing so, we aim to evaluate two research questions:
1. Is the proposed sampling methodology sufficient for characterizing the overall glucose distributions for each diabetic group?
2. Is there a distinct linear relationship between the voice fundamental frequency and glucose levels?
Methods
Recruitment and data collection
A total of 524 participants (347 male, 177 female) were recruited for the study in India. Participants were classified as ND, PD, or T2D based on HbA1c levels and diagnosed by a practicing physician according to guidelines set by the American Diabetes Association22. Participant age, BMI, heart rate, blood pressure, and HbA1c were measured and recorded at recruitment. All participants were non-smokers, were fluent in English, and had no diagnosis of acute illnesses (e.g., upper respiratory infections) or chronic medical conditions other than T2D. All participants signed informed consent. The study protocol was approved by three ethics committees (Jasleen Hospitals Ethics Committee, Mavens Institutional Ethics Committee & Saanvi Ethical Research LLP; Clinical Trial Identifier: CTRI/2021/08/035957). All methods were conducted in accordance with relevant guidelines and regulations.
Participants were outfitted with a FreeStyle Libre Pro Continuous Glucose Monitor (CGM) and instructed to record their voice in a quiet environment using a custom voice recording app on their personal smartphones. For 2 weeks, participants were instructed to record up to six times daily, corresponding to before and after breakfast, lunch, and dinner. Voice recordings were captured at a sampling rate of 44.1 kHz and stored as uncompressed WAV files. They were uploaded from the app to a secure Google Firebase server, where they could only be accessed via a private API key by our researchers. Participant cell phone models are listed in Appendix 1. All participants were anonymized and were assigned an alphanumeric participant ID corresponding to the serial number on the CGM device. No identifying information was collected in the voice recordings. Upon completion of the study, each voice recording was paired with the glucose reading in the participant’s CGM data that was closest in time. As glucose data was measured every 15 min, all voice recordings were within 7.5 min of a glucose reading. Participants were excluded if they had a CGM sensor error, an error in the voice collection app, or fewer than five voice recordings over the entire study period. In total, 242 ND (152 male, 90 female), 89 PD (62 male, 27 female), and 174 T2D (122 male, 52 female) participants were included in the analysis, for a total of 505 participants (336 male, 169 female).
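The nearest-reading pairing described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function name, the data layout, and the example timestamps and glucose values are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Half of the 15 min CGM sampling interval, as stated in the text.
MAX_GAP = timedelta(minutes=7, seconds=30)


def pair_to_nearest_glucose(voice_time, cgm_times, cgm_values, max_gap=MAX_GAP):
    """Return (glucose, gap) for the CGM reading closest in time to voice_time.

    cgm_times must be sorted in ascending order and aligned with cgm_values.
    Returns None when there is no reading within max_gap.
    """
    if not cgm_times:
        return None
    i = bisect_left(cgm_times, voice_time)
    # The nearest reading is either the first one at/after voice_time
    # or the last one before it.
    candidates = []
    if i < len(cgm_times):
        candidates.append(i)
    if i > 0:
        candidates.append(i - 1)
    best = min(candidates, key=lambda j: abs(cgm_times[j] - voice_time))
    gap = abs(cgm_times[best] - voice_time)
    if gap > max_gap:
        return None
    return cgm_values[best], gap


# Hypothetical CGM trace sampled every 15 min (values in mg/dL).
example_times = [datetime(2021, 9, 1, 8, 0) + timedelta(minutes=15 * k) for k in range(3)]
example_values = [92, 101, 110]
paired = pair_to_nearest_glucose(datetime(2021, 9, 1, 8, 21), example_times, example_values)
```

Rejecting pairs farther apart than 7.5 min is a design choice made here so that gaps in the CGM trace are not silently bridged; the source only states that all recordings fell within that window.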
Audio recording specifications
The fixed speech segment “Hello, how are you? What is my glucose level right now?” was selected for participants to record at each recording time. A fixed phrase was selected rather than a sustained vowel to characterize the natural frequency of participants’ voices during speech. Fixed sentences in general have had success in characterizing and predicting blood glucose levels and T2D in previous work8,16. Furthermore, fundamental frequency obtained from sustained vowel sounds has been shown to have a decreased test–retest reliability and may not be an accurate representation of habitual fundamental frequency23. Spoken sentences can also contain information on frequency variation and intonation within speech that would not be accessible in a single phoneme or sustained vowel24. These particular speech segments were selected based on positive preliminary results observed with a speech segment that employed rising intonation11.
Fundamental frequency (F0) extraction
From the speech segments, we chose to assess two features: the mean fundamental frequency (F0) and the standard deviation of the fundamental frequency (vF0). Averaging the fundamental frequencies across these phrases provides a broader insight into habitual fundamental frequency, rather than attempting to extract F0 from a single phoneme.
The F0 and vF0 of each voice recording were extracted using Parselmouth (Version 0.4.3), a publicly available Python interface to Praat (Python Version 3.10.8)25,26.
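A minimal sketch of how the two features could be computed with Parselmouth is given below. The pitch-tracking settings used in the study are not stated, so Praat's defaults are assumed here. The helper names are hypothetical, and using the sample (rather than population) standard deviation is likewise an assumption.

```python
import math


def summarize_pitch(f0_values):
    """Mean and sample standard deviation (Hz) over voiced frames only.

    Praat reports unvoiced frames as 0 Hz, so those frames are skipped.
    Returns None when fewer than two voiced frames are available.
    """
    voiced = [f for f in f0_values if f > 0]
    if len(voiced) < 2:
        return None
    mean_f0 = sum(voiced) / len(voiced)
    variance = sum((f - mean_f0) ** 2 for f in voiced) / (len(voiced) - 1)
    return mean_f0, math.sqrt(variance)


def extract_f0_features(wav_path):
    """Return (F0, vF0) for one recording; requires the parselmouth package."""
    import parselmouth  # third-party; imported lazily so summarize_pitch works without it

    sound = parselmouth.Sound(wav_path)
    pitch = sound.to_pitch()  # assumption: Praat default pitch settings
    return summarize_pitch(pitch.selected_array["frequency"].tolist())
```

Calling `extract_f0_features("recording.wav")` on a hypothetical file would return the pair (F0, vF0) for that recording, or None if the recording contains almost no voiced speech.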
Statistical analysis
Demographic and physiological variable values are displayed as mean ± standard deviation. Glucose values when comparing data distributions are displayed as median (first quartile–third quartile). Individual interquartile ranges (iR) of glucose levels and voice variables were calculated by taking the difference between the first and third quartiles of F0, vF0, and glucose levels for each participant.
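The individual interquartile range described above can be sketched as follows. The quartile interpolation method used in the study is not stated, so the "inclusive" method of Python's standard library is an assumption. The participant IDs and values are hypothetical.

```python
from statistics import quantiles


def individual_iqr(values):
    """Difference between the third and first quartiles of one participant's values."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_by_participant(records):
    """Group (participant_id, value) pairs and compute one iR per participant.

    Participants with fewer than two values are skipped, because quartiles
    are not meaningful for a single observation.
    """
    grouped = {}
    for participant_id, value in records:
        grouped.setdefault(participant_id, []).append(value)
    return {pid: individual_iqr(vals) for pid, vals in grouped.items() if len(vals) >= 2}


# Hypothetical glucose readings (mg/dL) for two participants.
example_records = [("A", 70), ("A", 80), ("A", 90), ("A", 100), ("A", 110), ("B", 95)]
example_iqr = iqr_by_participant(example_records)
```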
Two-way analysis of variance (ANOVA) was performed for each dependent variable (the average number of recordings per participant, the range of glucose values per participant, F0, and vF0) to assess differences in the recording values, with T2D group and biological sex as group factors. The interaction effects of T2D group and biological sex were also included in the analysis. Two-way ANOVA was also performed on the iR values between diabetic groups. Tukey’s Honestly Significant Difference (HSD) test with the Bonferroni correction was performed post hoc on significant ANOVA results: across all combinations of sex and T2D group if the interaction was significant in the ANOVA, and across T2D groups if the interaction was not significant but differences between T2D groups were. Differences between the entire CGM glucose data and sampled data corresponding to voice recordings were assessed using Cliff’s Delta effect size (δ). Interpretation of Cliff’s δ followed the thresholds used by Bais and van der Neut27, applied to the magnitude of δ, such that |δ| < 0.11 indicates a negligible effect, 0.11 ≤ |δ| < 0.28 a small effect, 0.28 ≤ |δ| < 0.43 a medium effect, and |δ| ≥ 0.43 a large effect.
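For illustration, Cliff's δ and the thresholds above can be sketched as follows. This is the direct pairwise definition, which is quadratic in the sample sizes. The study's analysis was run in R, and the function names here are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def interpret_delta(delta):
    """Label the magnitude of delta using the thresholds quoted in the text."""
    size = abs(delta)
    if size < 0.11:
        return "negligible"
    if size < 0.28:
        return "small"
    if size < 0.43:
        return "medium"
    return "large"
```

For the full CGM dataset, a rank-based formulation would be needed to keep the computation tractable; the pairwise loop is shown only because it mirrors the definition.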
Changes in F0 and vF0 relative to glucose measurements were assessed using a linear mixed model (LMM), with glucose and biological sex as fixed effects and participant ID and time of day as random effects. Three additional F0 models were fit, one for each diabetic class. In total, five mixed models were fit, and the Bonferroni correction was applied such that all presented P-values obtained from LMMs were multiplied by five. The intraclass correlation (ICC) and the marginal and conditional coefficients of determination (R2) were calculated using the methodology of Nakagawa, Johnson and Schielzeth28, following Eqs. (2), (3), and (4).
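$$\mathrm{ICC} = \frac{\sigma_{\alpha}^{2}}{\sigma_{\alpha}^{2} + \sigma_{\varepsilon}^{2}} \tag{2}$$
$$R_{\mathrm{marginal}}^{2} = \frac{\sigma_{F}^{2}}{\sigma_{F}^{2} + \sigma_{\alpha}^{2} + \sigma_{\varepsilon}^{2}} \tag{3}$$
$$R_{\mathrm{conditional}}^{2} = \frac{\sigma_{F}^{2} + \sigma_{\alpha}^{2}}{\sigma_{F}^{2} + \sigma_{\alpha}^{2} + \sigma_{\varepsilon}^{2}} \tag{4}$$
where $\sigma_{\alpha}^{2}$ is the between-group variance, $\sigma_{\varepsilon}^{2}$ is the residual (within-group) variance, and $\sigma_{F}^{2}$ is the fixed effect variance. Voice recording time of day was labeled as morning (4:00–11:59 at the time of the recording), afternoon (12:00–19:59), or night (20:00–3:59) to account for a potential time-dependency in the vocal fold physiology (e.g., vocal fold edema from reflux while sleeping). Biological sex is a binary fixed effect variable (0 for female, 1 for male).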
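The quantities defined above can be sketched in a few helper functions. The variance ratios follow the standard definitions of Nakagawa, Johnson and Schielzeth. The function names are hypothetical, and capping corrected P-values at 1 is an assumption (the text only states that P-values were multiplied by five).

```python
def variance_summaries(var_fixed, var_between, var_residual):
    """Return (ICC, marginal R2, conditional R2) from LMM variance components."""
    total = var_fixed + var_between + var_residual
    icc = var_between / (var_between + var_residual)
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + var_between) / total
    return icc, r2_marginal, r2_conditional


def time_of_day(hour):
    """Label a recording by its local hour (0-23) using the bins in the text."""
    if 4 <= hour < 12:
        return "morning"
    if 12 <= hour < 20:
        return "afternoon"
    return "night"


def bonferroni(p_value, n_tests=5):
    """Multiply a P-value by the number of tests; capped at 1 (assumption)."""
    return min(1.0, p_value * n_tests)


# Hypothetical variance components, for illustration only.
example_icc, example_r2m, example_r2c = variance_summaries(1.0, 7.0, 2.0)
```

When the between-group variance dominates, as in this hypothetical example, the conditional R2 is far larger than the marginal R2. That pattern is what motivates per-individual modeling.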
All statistical analysis was performed in RStudio (Version 2023.06.1 + 524). Statistical significance is defined as P < 0.05.
Results
Patient and demographic information
Participant demographic data is displayed in Table 1. Recording and voice parameter information of participants is displayed in Table 2. Some participants recorded fewer than the instructed six recordings per day over 2 weeks, so ANOVA was used to ensure no significant differences existed between diabetic classes in terms of compliance with the recording protocol. Indeed, there were no significant differences in the average number of recordings per participant across diabetic groups (P = 0.94). Looking at the range of glucose values for each participant between diabetic groups (Table 2), all three groups (ND, PD, T2D) differed significantly from one another after post hoc analysis (ND and PD P = 0.002, all other P < 0.001). The interactions between sex and number of recordings, and between sex and the glucose range, were not significant (P = 0.85 and P = 0.77, respectively).
F0 and vF0 were significantly different between male and female ND, PD, and T2D (P < 0.001). Post hoc, F0 obtained from female ND voice samples was 13.3 and 17.2 Hz higher than PD and T2D F0, respectively (PD CI 10.0–16.5 Hz, P < 0.001; T2D CI 14.5–19.8 Hz, P < 0.001), and F0 obtained from male ND voice samples was 4.0 and 3.5 Hz higher than PD and T2D, respectively (PD CI 1.7–6.2 Hz, P < 0.001; T2D CI 1.6–5.3 Hz, P < 0.001). There was a smaller difference between F0 in PD and T2D females (3.9 Hz higher in PD, CI 0.3–7.4 Hz, P = 0.03), and the difference between F0 in PD and T2D males was not significant (P = 0.99). Female ND vF0 was 2.44 and 2.35 Hz higher than female PD vF0 (P = 0.02) and T2D vF0 (P = 0.003), respectively. There were no significant differences between vF0 in PD and T2D females (P > 0.99) or between any male diabetic groups (all P > 0.1).
To assess intra-individual variability, the interquartile range for each individual (iR) was calculated for F0, vF0, and glucose recordings (Fig. 1). There was no significant difference between the iRs of ND, PD, and T2D for either F0 or vF0 (Fig. 1a and b, ANOVA P = 0.39 for F0 and P = 0.94 for vF0). However, the median glucose iRs of PD and T2D were 1.33 and 2.52 times higher than that of ND, respectively (Fig. 1c, post hoc Tukey HSD P = 0.001 comparing ND and PD, all other P < 0.001). There was no interaction between biological sex and F0 iR, vF0 iR, or glucose iR (P = 0.56, P = 0.54, P = 0.43, respectively).
Data recording distributions
We wanted to ensure the sampled glucose values corresponding to collected voice recordings were representative of the entire glucose distributions from all CGM data points. We aggregated CGM glucose data across all participants within each diabetic group (ND, PD, and T2D), creating a comprehensive glucose profile for each group (Fig. 2a–c). The median glucose level for ND individuals was 76.0 mg/dL (IQR: 65.0–89.0 mg/dL), the median glucose value for PD was 86.0 mg/dL (IQR: 73.0–105.0 mg/dL), and the median glucose for T2D was 149.5 mg/dL (IQR: 105.0–211.0 mg/dL). The glucose values corresponding to voice samples were also compiled into glucose profiles (Fig. 2d–f). The median glucose value of sampled data corresponding to voice recordings for ND individuals was 79.0 mg/dL (IQR: 68.0–93.0 mg/dL), the median glucose value for PD was 92.0 mg/dL (IQR: 78.0–113.0 mg/dL), and the median glucose for T2D was 149.0 mg/dL (IQR: 107.0–206.0 mg/dL). Although the quartiles are slightly elevated in the sampled data compared to the comprehensive dataset in ND and PD, the overall differences between the datasets are small to negligible (Cliff’s δ of ND = 0.10, PD δ = 0.14, T2D δ = −0.004). Furthermore, when the distribution of glucose data associated with voice recordings was compared with glucose data from CGM, the two aligned very closely (Fig. 2g–i).
Relationship between F0 and glucose in spoken sentences
From the LMM analyses, we determined the relationship between blood glucose levels and voice pitch (F0 and vF0) within an individual. We found a significant association between F0 and blood glucose levels. As shown in Table 3, a 1 mg/dL increase in blood glucose level corresponded to a 0.02 Hz increase in pitch (CI 0.01–0.03 Hz, Conditional R2 = 0.80, P < 0.001). There was no association between glucose level and vF0 (P > 0.99, Table 3).
The relationship between glucose and F0 held across the ND, PD, and T2D groups. A mixed model with F0 as the outcome variable fit exclusively on ND individuals showed a significant association, in which a 1 mg/dL increase in glucose corresponded to a 0.03 Hz increase in pitch (CI 0.01–0.06 Hz, Conditional R2 = 0.82, P = 0.03, Table 4). F0 models fit exclusively to PD and T2D data showed similar associations, with a slope of 0.04 Hz per mg/dL (CI 0.02–0.07 Hz, Conditional R2 = 0.77, P = 0.01) for PD, and a slope of 0.02 Hz per mg/dL (CI 0.01–0.03 Hz, Conditional R2 = 0.78, P = 0.01, Table 4) for T2D. Since vF0 showed no significant association with glucose in the model of all participants, it was not assessed by T2D group, to reduce the number of statistical tests performed.
Discussion
This study provides foundational knowledge and insights for a better understanding of glucose-driven changes in voice signals. We assessed the modulation of fundamental frequency in relation to glucose levels on an individual basis and determined there was a small significant positive relationship between glucose levels and voice fundamental frequency. Furthermore, the sampling regime used to collect the voice data was able to recreate the entire CGM glucose distribution of ND, PD, and T2D individuals, allowing for an assessment of overall glucose behavior in a sampled dataset.
Overall, the proposed sampling regime was successful in recreating the glucose distributions for the ND, PD, and T2D populations. The corresponding glucose quartiles of the collected samples were slightly higher than those of the data collected from CGM, which could be a result of lower CGM values during non-recording periods such as sleep29. Nonetheless, there was a very small effect size between the sampled and comprehensive CGM data, and the shapes of the distributions were preserved in the sampled data. Future studies attempting to capture glucose levels in real-world scenarios could implement this sampling regime for discrete data sampling, provided they account for the slight elevation in sampled glucose.
Our findings indicate alterations in voice signals associated with changes in blood glucose levels. We observed a significant positive relationship between continuous glucose levels and mean fundamental frequency. For all participant diabetic groups (ND, PD, T2D), fundamental frequency increased when glucose levels increased. Research has indicated that F0 changes can rarely be attributed to changes in vocal fold mass alone18, so instead, we focus on potential changes to vocal fold length and tension to speculate on the physiological effects of glucose on vocal F0. Referring to Hooke’s law, either increased vocal fold tension or decreased vocal fold length could contribute to a rise in fundamental frequency. High blood glucose has been associated with decreased muscle strength30,31. Because F0 results from the interacting effects of the cricothyroid and thyroarytenoid muscles, an increase in F0 may be recorded if this decrease in strength were to manifest in the thyroarytenoid muscle20. However, such a hypothesis cannot be substantiated by the presented analysis and must be tested in future work. In terms of the vocal folds themselves, high blood glucose levels could cause cellular dehydration, similar to what occurs in severe metabolic conditions like hyperosmolar hyperglycemic state32. Studies on systemic dehydration in the vocal folds indicate that F0 may increase at high levels of dehydration due to vocal fold stiffness, although this effect was negligible in normal phonetic conditions33. Moreover, dehydration has been associated with a 5% reduction in muscle volume34. Reduced thickness of the folds (defined as the ratio between the width and length of the vocal folds) has been observed in hemodialysis patients experiencing a decrease in total body water content35. Thinner, shorter vocal folds are associated with increased F0, so this may be a potential mechanism to explain the observed results36.
Finally, increased subglottal pressure has been linked to a rise in F036. While this parameter does not directly affect the vocal folds, it could influence the ease with which the vocal folds vibrate, thereby impacting the observed fundamental frequency. However, increased glucose levels have been associated with decreased respiratory function37, so this hypothesis is less justified without direct observation. More research is needed to confirm these hypotheses, and the vocal folds may need to be observed via laryngoscopy or stroboscopy to validate these findings. Taken together, there could be an interplay between blood glucose levels, vocal fold physiology, and voice characteristics, which supports the potential for using voice signals as biomarkers to predict changes in blood glucose levels. Further investigation using a larger dataset is required.
Notably, individuals with T2D exhibited a decreased relationship between voice and glucose compared to their non-diabetic counterparts. This result may be attributed to potential interactions arising from complications associated with T2D. As seen in Table 2, fundamental frequency decreases between ND and T2D, a result supported in previous studies38,39. One hypothesis behind these acoustic alterations is that they are a result of common complications of T2D such as edema40,41. The interplay of T2D-related complications (decreasing F0) may counteract the impact of blood glucose effects (increasing F0) and lead to an observed decrease in effect. Furthermore, the persistent elevation of blood glucose levels characteristic of T2D might trigger an adaptive response, mitigating the discernible effects on fundamental frequency.
There is a statistically significant relationship between glucose values and fundamental frequency; however, there is considerable inherent variation in the values. Multiplying the average blood glucose range for each class by the respective rate of change of the fundamental frequency, the range of glucose values would only account for a 2–4 Hz change in F0 for all diabetic groups. This amounts to only around 15% of the median individual F0 interquartile range for all diabetic groups, so using F0 alone as a predictive marker for glucose levels would be insufficient. That being said, future research using F0 paired with other voice features (likely also lying in the frequency domain) may find success in prediction. The elevated ICC indicates that the influence of glucose on voice features is highly individualized, pointing to the need for personalized assessments in subsequent studies. Any future work involving glucose-related voice changes will likely require individual assessment or personalized prediction models to achieve high accuracy.
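The arithmetic behind this estimate can be sketched as follows. The slope is the reported value for all participants. The glucose range and F0 interquartile range are hypothetical placeholders, because the corresponding values in Table 2 and Fig. 1 are not reproduced in this text.

```python
def expected_f0_shift(slope_hz_per_mg_dl, glucose_range_mg_dl):
    """F0 change (Hz) implied by a linear slope across a glucose range."""
    return slope_hz_per_mg_dl * glucose_range_mg_dl


def share_of_f0_spread(f0_shift_hz, f0_iqr_hz):
    """Fraction of an individual's F0 interquartile range explained by the shift."""
    return f0_shift_hz / f0_iqr_hz


# Reported slope (all participants); the other two inputs are hypothetical.
shift = expected_f0_shift(0.02, 150.0)  # 0.02 Hz per mg/dL over a 150 mg/dL range
share = share_of_f0_spread(shift, 20.0)  # relative to a 20 Hz F0 interquartile range
```

With these placeholder inputs the shift is about 3 Hz, or roughly 15% of the assumed F0 spread, which shows why F0 alone is a weak predictor even when the slope is statistically significant.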
An important caveat to these findings is the potential of vocal parameters, specifically F0, to be affected by external factors. Emotional and psychological state can affect F042,43,44, and edema from conditions such as upper respiratory infections, allergies, or other irritants can lower F045,46,47. Conditions such as gastroesophageal reflux disease or pathologies related to the thyroid and its associated hormones can also result in F0 changes48,49. These factors must be taken into careful consideration while conducting future research, particularly due to the small reported effect of glucose on voice F0. Additionally, the chosen phrases likely exhibit increased F0 because of the rising intonation characteristic of questions. Future research should explore the effect of glucose on phrases that are not questions.
Overall, the frequency of the voice has a small but significant relationship to glucose levels when evaluated within an individual. For future studies intending to build a prediction model based on vocal features, results from this study indicate that a model must be built on a per-individual basis. Furthermore, F0 alone is unlikely to be able to predict blood glucose levels, although there is a distinct linear relationship. Other vocal features are likely necessary to build a successful prediction model.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Bano, G. Glucose homeostasis, obesity and diabetes. Best Pract. Res. Clin. Obstet. Gynaecol. 27(5), 715–726. https://doi.org/10.1016/j.bpobgyn.2013.02.007 (2013).
Ankışhan, H. Blood pressure prediction from speech recordings. Biomed. Signal Process. Control 58, 101842. https://doi.org/10.1016/j.bspc.2019.101842 (2020).
Shankar, O. & Lohiya, B. V. Cardiovocal syndrome—A rare presentation of primary pulmonary hypertension. Indian Heart J. 66(3), 375–377. https://doi.org/10.1016/j.ihj.2013.12.055 (2014).
Alam, M. Z. et al. Predicting pulmonary function from the analysis of voice: A machine learning approach. Front. Digit. Health 8(4), 750226. https://doi.org/10.3389/fdgth.2022.750226 (2022).
James, A. P. Heart rate monitoring using human speech spectral features. HCIS 5, 1–2. https://doi.org/10.1186/s13673-015-0052-z (2015).
Poleshenkov, D. & Basov, O. Application of method of extracting pulse rate from speech signal in absence of priori information about speaker to improve traffic safety. Transp. Res. Procedia 1(50), 545–551. https://doi.org/10.1016/j.trpro.2020.10.065 (2020).
Suppakitjanusant, P. et al. Predicting glycemic control status and high blood glucose levels through voice characteristic analysis in patients with cystic fibrosis-related diabetes (CFRD). Sci. Rep. 13(1), 8617. https://doi.org/10.1038/s41598-023-35416-w (2023).
Sidorova, J., Carbonell, P. & Čukić, M. Blood glucose estimation from voice: First review of successes and challenges. J. Voice 36(5), 737-e1. https://doi.org/10.1016/j.jvoice.2020.08.034 (2022).
Czupryniak, L. et al. 378-P: Human voice is modulated by hypoglycemia and hyperglycemia in type 1 diabetes. Diabetes https://doi.org/10.2337/db19-378-P (2019).
Michaelis, P. R. Detection of extreme hypoglycemia and hyperglycemia based on automatic analysis of speech patterns. US patent US 7,925,508 B1 (2011).
Tschöpe, C., Duckhorn, F., Wolff, M. & Saeltzer, G. Estimating blood sugar from voice samples: a preliminary study. In 2015 International Conference on Computational Science and Computational Intelligence (CSCI) 804–805 (IEEE, 2015). https://doi.org/10.1109/CSCI.2015.184
Rasmusson, J., Karlsson, P. C., Svensson, M., Nilsson, C. & Eklund, J. Inventors; Sony Group Corp, assignee. Method and device for blood glucose level monitoring. United States patent US 11,363,974 (2022).
Motorin, V. Scientific solutions for the parameter’s automation in biochemical and biomechanical processes of the operational estimation of blood glucose from human voice. Theory Pract. Mod. Sci. 7, 214–26 (2016).
Jeon, J., Palanica, A., Sarabadani, S., Lieberman, M. & Fossat, Y. Biomarker potential of real-world voice signals to predict abnormal blood glucose levels. bioRxiv. (2020).
Sidorova, J. & Anisimova, M. Impact of diabetes mellitus on voice: A methodological commentary. J. Voice 36(2), 294-e1. https://doi.org/10.1016/j.jvoice.2020.05.015 (2022).
Kaufman, J. M., Thommandram, A. & Fossat, Y. Acoustic analysis and prediction of type 2 diabetes mellitus using smartphone-recorded voice segments. Mayo Clin. Proc. Digit. Health 1(4), 534–544. https://doi.org/10.1016/j.mcpdig.2023.08.005 (2023).
Park, M. C. Understanding the multi-mass model and sound generation of vocal fold oscillation. AIP Adv. 9(10), 105002. https://doi.org/10.1063/1.5113911 (2019).
Titze, I. R. Vocal fold mass is not a useful quantity for describing F0 in vocalization. J. Speech Lang. Hear. Res. 54(2), 520–522 (2011).
Hirano, M. Morphological structure of the vocal cord as a vibrator and its variations. Folia phoniatrica et logopaedica 26(2), 89–94 (1974).
Chhetri, D. K., Neubauer, J., Sofer, E. & Berry, D. A. Influence and interactions of laryngeal adductors and cricothyroid muscles on fundamental frequency and glottal posture control. J. Acoust. Soc. Am. 135(4), 2052–2064. https://doi.org/10.1121/1.4865918 (2014).
Hasanloei, M. A. et al. Non-diabetic hyperglycemia and some of its correlates in ICU hospitalized patients receiving enteral nutrition. Maedica 12(3), 174 (2017).
American Diabetes Association Professional Practice Committee 2. Classification and diagnosis of diabetes: Standards of medical care in diabetes-2022. Diabetes Care 45, S17–S38. https://doi.org/10.2337/dc22-S002 (2022).
Fitch, J. L. Consistency of fundamental frequency and perturbation in repeated phonations of sustained vowels, reading, and connected speech. J. Speech Hear. Disord. 55(2), 360–3. https://doi.org/10.1044/jshd.5502.360 (1990).
Moon, K. R., Chung, S. M., Park, H. S. & Kim, H. S. Materials of acoustic analysis: sustained vowel versus sentence. J. Voice 26(5), 563–565. https://doi.org/10.1016/j.jvoice.2011.09.007 (2012).
Jadoul, Y., Thompson, B. & De Boer, B. Introducing parselmouth: A python interface to praat. J. Phon. 71, 1–15. https://doi.org/10.1016/j.wocn.2018.07.001 (2018).
Boersma, P. & Weenink, D. Praat: Doing phonetics by computer [Computer program]. http://www.praat.org/ (2011).
Bais, F. & van der Neut, J. Adapting the robust effect size Cliff's delta to compare behaviour profiles. Surv. Res. Methods. 16(3), 329–352. https://doi.org/10.18148/srm/2022.v16i2.7908 (2022).
Nakagawa, S., Johnson, P. C. & Schielzeth, H. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14(134), 20170213. https://doi.org/10.1098/rsif.2017.0213 (2017).
Liang, Z. Mining associations between glycemic variability in awake-time and in-sleep among non-diabetic adults. Front. Med. Technol. 4(4), 1026830. https://doi.org/10.3389/fmedt.2022.1026830 (2022).
Bavaresco, S. S. et al. Comparison between muscle strength and flexibility of the lower limbs of individuals with and without type 2 diabetes mellitus. Fisioter. Pesqui. 18(26), 137–44. https://doi.org/10.1590/1809-2950/17024826022019 (2019).
Aminuddin, A. et al. The association between arterial stiffness and muscle indices among healthy subjects and subjects with cardiovascular risk factors: An evidence-based review. Front. Physiol. 12, 742338. https://doi.org/10.3389/fphys.2021.742338 (2021).
Pasquel, F. J. & Umpierrez, G. E. Hyperosmolar hyperglycemic state: A historic review of the clinical presentation, diagnosis, and treatment. Diabetes Care 37(11), 3124–3131. https://doi.org/10.2337/dc14-0984 (2014).
Wu, L. & Zhang, Z. Computational study of the impact of dehydration-induced vocal fold stiffness changes on voice production. J. Voice 38(4), 836–843. https://doi.org/10.1016/j.jvoice.2022.02.001 (2022).
Hackney, K. J., Cook, S. B., Fairchild, T. J. & Ploutz-Snyder, L. L. Skeletal muscle volume following dehydration induced by exercise in heat. Extrem. Physiol. Med. 1(1), 3. https://doi.org/10.1186/2046-7648-1-3 (2012). PMID: 23849266; PMCID: PMC3707098.
Ori, Y. et al. Effect of hemodialysis on the thickness of vocal folds: A possible explanation for postdialysis hoarseness. Nephron Clin. Pract. 103(4), c144–c148. https://doi.org/10.1159/000092911 (2006). Epub 2006 Apr 24. PMID: 16636582.
Zhang, Z. Cause-effect relationship between vocal fold physiology and voice production in a three-dimensional phonation model. J. Acoust. Soc. Am. 139(4), 1493. https://doi.org/10.1121/1.4944754 (2016). PMID: 27106298; PMCID: PMC4818279.
Khafaie, M. A. et al. Role of blood glucose and fat profile in lung function pattern of Indian type 2 diabetic subjects. Multidiscip. Respir. Med. 14, 22. https://doi.org/10.1186/s40248-019-0184-5 (2019).
Pinyopodjanard, S. et al. Instrumental acoustic voice characteristics in adults with type 2 diabetes. J. Voice 35, 116–121. https://doi.org/10.1016/j.jvoice.2019.07.003 (2021).
Chitkara, D. & Sharma, R. Voice based detection of type 2 diabetes mellitus. In 2nd International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB) 83–87 (IEEE publications, 2016). https://doi.org/10.1109/AEEICB.2016.7538402
Low, S. et al. Higher ratio of extracellular water to total body water was associated with reduced cognitive function in type 2 diabetes. J. Diabetes 13, 222–231. https://doi.org/10.1111/1753-0407.13104 (2021).
Dewan, K., Chhetri, D. K. & Hoffman, H. Reinke’s edema management and voice outcomes. Laryngoscope Investig. Otolaryngol. 7, 1042–1050. https://doi.org/10.1002/lio2.840 (2022).
Protopapas, A. & Lieberman, P. Fundamental frequency of phonation and perceived emotional stress. J. Acoust. Soc. Am. 101(4), 2267–2277 (1997).
Bänziger, T. & Scherer, K. R. The role of intonation in emotional expressions. Speech Commun. 46(3–4), 252–267 (2005).
Guidi, A. et al. Automatic analysis of speech F0 contour for the characterization of mood changes in bipolar patients. Biomed. Signal Process. Control 17, 29–37 (2015).
Longo, L., Pipitone, L. L., Cilfone, A., Gobbi, L. & Mariani, L. Reinke’s edema: New insights into voice analysis, a retrospective study. J. Voice. https://doi.org/10.1016/j.jvoice.2023.08.008 (2023). Epub ahead of print. PMID: 37716890.
Dworkin-Valenti, J. P. et al. Laryngeal inflammation. Ann. Otol. Rhinol. 2, 1058–1066 (2015).
Jackson-Menaldi, C. A., Dzul, A. I. & Holland, R. W. Allergies and vocal fold edema: A preliminary report. J. Voice 13(1), 113–122 (1999).
Groenewald, N. E. et al. Reflux symptoms and vocal characteristics in adults with non-organic voice disorders. S. Afr. J. Commun. Disord. 69(1), e1–e9. https://doi.org/10.4102/sajcd.v69i1.935 (2022). PMID: 36331218; PMCID: PMC9634952.
Junuzović-Žunić, L., Ibrahimagić, A. & Altumbabić, S. Voice characteristics in patients with thyroid disorders. Eurasian J. Med. 51(2), 101 (2019).
Funding
This study was funded by Klick Inc.
Author information
Authors and Affiliations
Contributions
J.J. and Y.F. conceptualized and designed the study. J.K. analyzed the results and performed the statistical analysis. All authors (J.K., J.J., J.O., and Y.F.) contributed significantly to the drafting and editing of the manuscript, with Y.F. providing project administration and oversight. J.J. and Y.F. ensured compliance with ethical standards and obtained the necessary approvals for study conduct.
Corresponding author
Ethics declarations
Competing interests
J.K., J.J., J.O., and Y.F. are employees of Klick Inc., which funded the project. J.J. and Y.F. are listed as inventors on patents relating to the prediction of glucose from voice (Systems and methods for generating models for determining blood glucose levels using voice, WO2022109714A1, and Systems, devices and methods for blood glucose monitoring using voice, WO2022109713A1).
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kaufman, J., Jeon, J., Oreskovic, J. et al. Linear effects of glucose levels on voice fundamental frequency in type 2 diabetes and individuals with normoglycemia. Sci Rep 14, 19012 (2024). https://doi.org/10.1038/s41598-024-69620-z
DOI: https://doi.org/10.1038/s41598-024-69620-z