Introduction

Vital capacity is a fundamental parameter used to properly interpret lung function in clinical practice. The loss of functioning lung parenchyma contributes to decreased vital capacity in many nonobstructive lung disorders1. Moreover, vital capacity provides prognostic information and is associated with increased mortality in the elderly population2. Conventionally, a spirometer is generally used (Fig. 1a) to measure vital capacity; however, this approach is expensive and inconvenient because these devices must be used in hospital settings. Thus, home-based respiratory function monitoring has attracted considerable attention3,4. Moreover, this vital capacity measurement method has been improved and uses a smartphone connected to a device such as a flow sensor held in the mouth by the subject5,6. However, a flow sensor is required for measurement and requires a mouthpiece and a filter to prevent infection, which is costly. Thus, home respiratory function monitoring would be easier and cheaper if vital capacity could be estimated without requiring a device that touches the subject’s mouth.

Figure 1
figure 1

The current vital capacity measurement method and our cough sound-based method. (a) The current vital capacity measurement method. (b) The proposed vital capacity measurement method. (c) Data processing methods and neural networks analysis techniques for each model.

Vital capacity estimation has been studied for a long time. In a study published in 1948, Baldwin7 developed a multiple regression equation for predicting vital capacity, which depended on the individual’s characteristics, such as gender, age, and height. In 2006, the World Health Organization (WHO)8 deprecated the use of regression curves to predict references for biological measurements and recommended using the lambda-mu-sigma (LMS) method. This method allows simultaneous modelling of the skewness (lambda), which models the departure of the variables from normality using a Box‒Cox transformation, the mean (mu), and the coefficient of variation (sigma), for the analysis of its recently published growth standard. In 2012, the Global Lung Function Initiative announced the global lung function 2012 equations derived using the LMS method9. In this way, methods for calculating the reference value of vital capacity have been established and used all over the world10,11,12. The common point of each method is that aspects of the subject’s physical attributes, such as gender, age, and height, are used as explanatory variables. However, the vital capacity estimated using the LMS method (VCLMS) does not represent actual vital capacity for each individual and is merely a reference value based on the age and height of the individual13.

Previous studies have shown that vital capacity is related to cough strength, such as cough peak flow, which can be measured by a spirometer or peak flow metre14,15. The cough peak flow is an index of airway clearance ability and is related to the cough sound pressure level16,17, which can be easily measured by various microphones, such as condenser microphones, microphones in headsets17 and those built into smartphones 15. Moreover, the cough peak flow can be estimated via the cough sound pressure level17,18, which is referred to as the cough peak flow computed via cough sounds in this study. We hypothesize that the actual vital capacity of each individual could be estimated by using physical functions such as the cough peak flow computed via cough sound, which are related to both vital capacity and the subject’s physical attributes. If vital capacity can be estimated using cough sounds, flow sensors, mouthpieces, and filters would be unnecessary. More importantly, abnormal decreases in vital capacity could be detected by comparing the estimated vital capacity and the lower limit of the normal vital capacity9, which can be calculated by using the LMS method because pulmonary function varies with age, height, sex and ethnicity9.

Therefore, the purpose of this study was to estimate vital capacity using cough sounds (Fig. 1b). We employed an artificial neural network to estimate the vital capacity using the cough peak flow computed via cough sounds and VCLMS as inputs. Because it is well known that vital capacity changes nonlinearly with age and height, we hypothesized that the artificial neural network, which uses nonlinear transformations19 to estimate vital capacity, could be advantageous. The estimated vital capacity was then used to detect the decrease in vital capacity below the lower limit of the normal vital capacity.

Materials and methods

Participants and inclusion criteria

Table 1 shows the participants’ characteristics. A total of fifty-six participants were included. Twenty-five elderly (10 male and 15 female) and 31 young (19 male and 12 female) adults participated in the experiment. The elderly participants, aged 70 to 91, lived at home where they had been receiving routine healthcare services through private arrangements. The following were exclusion criteria: a history of lung disease, institutionalization, terminal illness, unstable acute or chronic disease, a score of less than 23 on the Mini-Mental State Examination20, inability to give informed consent, inability to walk independently or use of a cane, and neuromusculoskeletal impairment. The young group consisted of self-reported healthy participants with no previous cardiovascular or pulmonary diseases. Participants who failed to manoeuvre the respiratory function test were excluded.

Table 1 Characteristics of the participants.

Ethical approval

This study was conducted in accordance with the amended Declaration of Helsinki. The Hiroshima Cosmopolitan University Institutional Review Board (No. 20200305) approved the protocol, and written informed consent was obtained from all participants.

Pulmonary function testing

Pulmonary function tests, as shown in Fig. 1a, were performed using a spirometer (Autospiro AS-507; Minato Medical Science Co., Ltd., Osaka, Japan) with the participants in a sitting position according to ATS/ERS guidelines21. Vital capacity was determined as the largest value from at least three acceptable manoeuvres. This measured vital capacity was utilized as the estimation target for the proposed model, as explained in subsequent sections. In addition, before the respiratory function test and cough sound measurement, an interview was conducted to check for respiratory symptoms such as acute upper respiratory tract infection or changes in physical condition. Moreover, the order of the respiratory function test and cough test was randomly assigned to minimize the effects of bias due to the measurement order, and the interval between the two tests was one week.

Cough sound measurement system

Figures 1b,c show the cough sound measurement system. The experiment was performed as previously described17 using an in-ear microphone to measure cough sounds. A previous study on cough peak flow estimation via cough sound measurements reported that an in-ear microphone is suitable for measuring cough sounds due to the constant distance between the mouth of the sound source and the microphone17. The electret condenser microphone (in-ear microphone, ECM-TL3; Sony Corporation, Japan) was attached to the right ear canal. The measured sound signals were digitized using a 16-bit analogue-to-digital converter (PowerLab16/35, AD Instruments, Inc., Dunedin, New Zealand) at a 100 kHz sampling rate set by analysis software (LabChart version 8, AD Instruments, Inc.), and stored on a personal computer. The digitized cough sound signal was band-pass filtered between 140 and 2000 Hz to minimize artefacts caused by heart sounds and muscle interference (see Fig. 1c).

Cough sound measurement protocols

Following thorough instructions on the coughing method provided to participants, three trials of maximal voluntary coughing were performed during each 20-s measurement period. Adequate rest periods were provided between each trial to minimize the potential impact of fatigue.

Feature extraction

A respiratory physiotherapist with expertise in respiratory diseases extracted a 5-s segment of cough sound from each 20-s measurement period. The selected 5-s segment included the maximum cough sound in all cases. The sound pressure level, measured in dB, was subsequently determined using the following equation:

$$L_{P} \left( t \right) = 20\log_{10} \left\{ {\frac{{V_{r} \left( t \right)}}{{P_{0} V_{s} }}} \right\},$$
(1)

where \({V}_{r}\left(t\right)\) represents the measured voltage value, t is the discrete measurement time in the cough sound period, \({P}_{0}=20\) µ Pa is the reference sound pressure, and \({V}_{s}={10}^{\left(S/20\right)}\) is the voltage output per Pa. Here, \(S=-35.0\mathrm{ dB }(0\mathrm{ dB }= 1\mathrm{ V}/1\mathrm{ Pa})\) is the sensitivity of the in-ear microphone. The maximum sound pressure level was calculated for each acceptable trial as follows:

$$L_{p,}^{\left( i \right)} = \max \left( {L_{P} \left( t \right)} \right) ,$$
(2)

where the superscript (i) indicates the trial. Finally, the cough sound pressure level \(SPL\) was determined based on at least three acceptable trials as follows:

$$SPL = \max \left( {L_{{p,{\text{max}}}}^{\left( i \right)} } \right),$$
(3)

Cough peak flow is a cough strength parameter that can be estimated via cough sounds, namely, CPS16,17. Specifically, it is calculated based on the cough sound pressure level and participant age by using the following Equation18:

$$CPS = \left( {a_{1} + a_{2}\,age} \right) \left( {e^{\beta SPL} - 1} \right),$$
(4)

where \({a}_{1}, {a}_{2}\) and \(\beta\) are constant parameters determined based on a nonlinear optimization scheme18.

Proposed model

A previous study reported that vital capacity is related to cough strength15. We hypothesized that the measured vital capacity can be estimated by correcting the VCLMS value, which reflects the height and age of a subject, using the cough peak flow computed via cough sounds. Here, VCLMS in Liter can be estimated based on the previous literature12, as shown in Eqs. (5) and (6):

$${\text{Male}}:\,VC_{LMS} = {\text{exp}}\left( { - 8.8317 + 2.1043\,{\text{ln}}\left( h \right) - 0.1382\,{\text{ln}}\left( a \right) + m - s} \right),$$
(5)
$${\text{Female}}:\,VC_{LMS} = {\text{exp}}\left( { - 8.0707 + 1.9399\,{\text{ln}}\left( h \right) - 0.1678\,{\text{ln}}\left( a \right) + m - s} \right),$$
(6)

where h represents the participant’s height, a represents their age and m-s is the age-specific contribution from the spline function9.

To examine the linear relationships between the measured vital capacity and cough peak flow computed via cough sounds, partial correlation analysis was carried out. We then proposed the use of a neural network-based model to estimate the measured vital capacity from VCLMS and the cough peak flow computed via cough sounds (see Fig. 1c and Eq. 4). To estimate the measured vital capacity, a three-layer feedforward perceptron was employed, which was composed of an input layer using an identity activation function, a hidden layer using a hyperbolic tangent function, and an output layer using an identity activation function. The number of units in the input layer was equivalent to the dimension of the input vector \({\varvec{I}}={\left[CPS, {VC}_{\mathrm{LMS}}\right]}^{T}\in {\mathbb{R}}^{2}\). The output layer, which was used to estimate the vital capacity, was configured with a single unit. The number of units in the hidden layer was set as a hyperparameter, denoted by H. Accordingly, the models included a total of 4H + 1 weight and bias parameters, which were trained using an error backpropagation algorithm. The objective of the training process was to minimize the root mean squared error, which was calculated as follows:

$$\sqrt {\left( {\frac{1}{N}\sum\nolimits_{i}^{N} {\left( {\widehat{{VC_{i} }} - VC_{i} } \right)^{2} } } \right)} ,$$
(7)

where \({\widehat{VC}}_{i}\) and \(V{C}_{i}\) represent the estimated and measured vital capacity from observation i, respectively, and N is the total number of observations.

To determine the optimal number of units in the hidden layer, a nested cross-validation method was utilized22. The outer loop of this method involved training the model based on N − 1 observations and evaluating its accuracy based on the remaining observation. Moreover, the inner loop divided the N − 1 observations into two datasets, with one half of the data used to train the model using different unit numbers (H = 1, 2, 3) and the other half utilized to assess the estimation accuracy. This process was repeated for all possible combinations of training and test sets, and the optimal value of H was selected based on the highest accuracy achieved. The analyses were conducted using IBM Statistical Package for Neural Networks (SPSS) version 26.

Verification of the estimation accuracy

To validate the efficacy of the proposed model, we compared its estimation accuracy with two different methods. First, the VCLMS and the vital capacity estimated by the proposed model, NNVCCPS, were compared to verify the combined effectiveness of employing the neural network-based model and the cough peak flow computed via cough sounds. To evaluate the effectiveness of converting the cough sound pressure level to the cough peak flow computed via cough sounds with Eq. (4), the input vector was modified from \({\varvec{I}}={\left[CPS, {VC}_{\mathrm{LMS}}\right]}^{T}\) to \({\varvec{I}}={\left[SPL, {VC}_{\mathrm{LMS}}\right]}^{T}\). Hereafter, the vital capacity estimated based on the SPL and VCLMS inputs is referred to as NNVCSPL (see Fig. 1c). The estimation accuracy was evaluated using the mean square error \(\frac{1}{N}\sum_{i}^{N}{\left(\widehat{V{C}_{i}}-V{C}_{i}\right)}^{2}\), where \({\widehat{VC}}_{i}\) and \(V{C}_{i}\) represent the estimated and measured vital capacity from observation i, respectively, and N is the total number of observations, and the Spearman’s rank correlation coefficient between the measured vital capacity and the estimated vital capacities (VCLMS, NNVCSPL, and NNVCCPS). In addition, the absolute reliability of the model was investigated using regression analysis and the Bland‒Altman analysis method to detect systematic errors, such as fixed and proportional bias23,24. The Wilcoxon signed-rank test and, the Friedman and Holm tests25,26 were used for the comparison, and p < 0.05 was considered significant.

Detecting abnormal decreases in vital capacity

The LLN, which can be calculated using the LMS method, represents the lower limit of the normal vital capacity, and a vital capacity less than this limit is diagnosed as respiratory dysfunction. Therefore, we detected abnormal vital capacity when the estimated vital capacities (NNVCCPS and NNVCSPL) were less than the LNN and verified the discrimination accuracy using the area under the receiver operating characteristic27 curve (AUC). The AUCs resulting from NNVCCPS and NNVCSPL were compared using the DeLong test, where p < 0.05 was considered significant.

Statistical analyses other than the neural network analysis were performed with IBM SPSS version 26 and EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan)28, which is a graphical user interface for R (the R Foundation for Statistical Computing, Vienna, Austria).

Results

Relationships between vital capacity and the body structure parameters and cough peak flow

Table 2 shows that the partial correlations between the vital capacity and cough peak flow, age, and height were 0.286 (p = 0.038), − 0.718 (p < 0.001), and 0.683 (p < 0.001), respectively. Because there was no significant partial correlation between vital capacity and weight, we excluded this parameter from the input for the neural network-based model for estimating vital capacity.

Table 2 Partial correlation analysis results, n = 56.

Vital capacity estimation accuracy

Leave-one-out cross-validation analysis showed that the root mean squared errors of the NNVCSPL and NNVCCPS against the measured vital capacity were 0.165 L and 0.112 L, respectively. Figure 2 shows the relationships between the measured vital capacity and the estimated vital capacities, indicating correlation coefficients of 0.924 (p < 0.001) for VCLMS, 0.909 (p < 0.001) for NNVCSPL, and 0.944 (p < 0.001) for NNVCCPS. Figure 3 shows the corresponding Bland‒Altman plots. Neither NNVCSPL nor NNVCCPS showed systematic errors, but VCLMS showed a fixed bias (one sample t test; p < 0.001) and a proportional bias (r = − 0.414; p = 0.002). Furthermore, the Friedman and Holm tests showed significant differences in the squared error between VCLMS and NNVCSPL, VCLMS and NNVCCPS, NNVCSPL and NNVCCPS (median 0.308 L2 vs. 0.100 L; p = 0.001, 0.308 L2 vs. 0.052 L; p < 0.001, 0.100 L2 vs. 0.052 L2; p = 0.037, respectively) (see Fig. 4a). In young participants, the Friedman test showed no significant differences in the squared error (p = 0.198) (see Fig. 4b); however, among elderly participants, the Freidman and Holm tests showed significant differences in the squared error between the VCLMS and NNVCSPL, VCLMS and NNVCCPS (0.548 L2 vs. 0.110 L2; p < 0.001, 0.548 L2 vs. 0.034 L2; p < 0.001, respectively) (see Fig. 4c). Figure 5 demonstrates the results of comparing the squared error between generations. The Wilcoxon signed-rank test showed significant differences in the squared error of VCLMS between young and elderly participants (0.130 L2 vs. 0.548 L2; p < 0.001) (see Fig. 5a); however, there were no significant differences in the squared error for NNVCSPL between generations (see Fig. 5b). Although there was no significant difference in the NNVCCPS between generations, the squared error among the elderly participants was approximately 40% lower than that of the young participants (see Fig. 5c).

Figure 2
figure 2

Estimation accuracy of VCLMS, NNVCSPL and NNVCCPS. (a) Plot of VCLMS versus the measured vital capacity. (b) Plot of NNVCSPL versus the measured vital capacity. (c) Plot of NNVCCPS versus the measured vital capacity. The linear regression lines are drawn in red, and the corresponding equations are shown in the lower part of each figure. The left lower side shows the correlation coefficients and p values for each scatter plot.

Figure 3
figure 3

Bland‒Altman plots of the measured and estimated vital capacities. (a) The estimation accuracy of VCLMS. (b) The estimation accuracy of NNVCSPL. (c) The estimation accuracy of NNVCCPS. The horizontal line shows the mean of the measured and estimated vital capacities. The red line shows the mean of the measured and estimated vital capacities. The vertical line represents the difference between the measured and estimated vital capacities. The bold black solid lines represent the mean differences between the measured and estimated vital capacities, and the green dotted lines represent ± 2 standard deviations of the mean differences.

Figure 4
figure 4

Comparison of the squared error between the previous and proposed models. (a) Results of all participants, n = 56. (b) Results of young participants, n = 31. (c) Results of elderly participants, n = 25. The p value was adjusted using the Holm test25,26 for multiple testing.

Figure 5
figure 5

Comparison of the squared error between generations. (a) Results of VCLMS. (b) Results of NNVCSPL. (c) Results of NNVCCPS. Young participants, n = 31. Elderly participants, n = 25.

Detection accuracy of abnormal decreases in vital capacity

The DeLong test showed a significant difference in the AUC between the NNVCSPL and NNVCCPS (0.578 vs. 0.831; p = 0.002, respectively) (see Fig. 6). The true positive and false negative rates of the NNVCCPS were 0.731 and 0.269, respectively.

Figure 6
figure 6

Comparison of the AUCs of NNVCSPL and NNVCCPS using the DeLong test. AUC; area under curve. The purple line represents the NNVCSPL. The blue line represents the NNVCCPS.

Discussion

This study aimed to develop a simple vital capacity evaluation method. To the best of our knowledge, this was the first study that estimated vital capacity based on cough sound. The proposed method demonstrated that an accurate vital capacity can be estimated for different individuals by using VCLMS and the cough peak flow computed via cough sounds. In addition, we found that an abnormal decrease in vital capacity, which is associated with respiratory dysfunction, can be detected using the proposed vital capacity estimation method, with an AUC of 0.831.

First, to determine the input to the neural network-based model, the relationships between the vital capacity and different physical attributes and the cough peak flow were analysed using partial correlations. The results showed that cough peak flow, age, and height were significantly correlated with vital capacity. These relationships were consistent with those found in previous studies14,15. Height and age were used as independent variables for calculating the reference value of the vital capacity (VCLMS) via the LMS method. The LMS method has the advantage of reflecting age-dependent changes in respiratory function because a nonlinearly smooth fit of the vital capacity over the entire age range can be predicted9. Thus, VCLMS includes information about both age and height. In addition, a previous study reported that vital capacity is related to cough peak flow15, which can be estimated via the cough sound pressure level, and its estimated value is CPS17. For these reasons, we hypothesized that vital capacity can be estimated by correcting the VCLMS value using the cough peak flow computed via cough sounds or the cough sound pressure level. Thus, two neural network-based models were constructed: the first model uses VCLMS and the cough peak flow as inputs, and the other model uses VCLMS and the cough sound pressure level as inputs.

The experimental results showed that NNVCCPS led to the best estimation accuracy among the three methods, and no systematic error was observed (see Fig. 3c). Furthermore, Eq. (4) incorporates an age factor, indicating that an equation that uses the cough peak flow computed via cough sounds as an input is less susceptible to the effects of aging. While the cough peak flow computed via cough sounds and cough sound pressure level are related, as shown in Eq. (4), they differ in that the age factor is included in the formula for computing the cough peak flow computed via cough sounds (Eq. 4) but not in the cough sound pressure level formula (Eq. 3). Previous studies reported that vocal fold function, a crucial factor in coughing, is negatively impacted by aging29,30. Thus, it is plausible that NNVCCPS could effectively suppress the effects of aging on vocal cord function and enable the detection of decreased vital capacity. In spirometer measurements, if the difference in the vital capacity between the largest and second largest manoeuvre exceeds 0.150 L, the measurement is considered a failure, and additional trials should be performed21. In this study, the mean relative difference between the measured vital capacity and NNVCCPS was 0.008 L ml (95% CI − 0.082 to 0.098), which is lower than the standard value for additional trials. A recent investigation employing dynamic chest radiography estimated the forced vital capacity, yielding a correlation coefficient of 0.86 (95% CI 0.79 to 0.90) between the measured and estimated values31. Similarly, a recent study examining forced vital capacity estimation via vocal analysis in patients with amyotrophic lateral sclerosis reported a correlation coefficient of 0.8, with a mean absolute error of 0.54 L32. Our study focused on measuring slow vital capacity, which involves slow expiration, while previous studies measured forced vital capacity, which involves fast expiration with effort. However, despite the differences in the measurement methods and participant attributes, the estimation accuracy of our proposed model is expected to be better than or at least equivalent to that of methods proposed in prior investigations. Therefore, the proposed method could have sufficient accuracy and be useful in screening tests.

We also attempted to detect abnormal decreases in vital capacity using the estimated NNVCCPS. The efficacy of this approach was confirmed, with a high AUC of 0.831 (see Fig. 6). In a previous study that discriminates restrictive impairment of lung function, spirometry values were used to calculate the difference between lung age and actual age. This method showed an AUC of 0.89133, which is slightly higher than the proposed model. Nonetheless, the proposed model significantly outperforms previous studies in terms of ease of measurement.

It should be noted that the false-negative rate was high, and there was a possibility of missing respiratory function decline. This suggests the effect of cases with loud cough sounds but low lung capacity. Respiratory muscle strength and cough strength have been shown to be positively correlated15,34. Thus, respiratory muscle strength could affect the cough sound pressure level, which was used to calculate the estimated cough peak flow computed via cough sounds. The estimation accuracy could be improved further to reduce the false-negative rate, such as by adding variables related to respiratory muscle strength to the input layer in the neural network.

The findings of this study suggest that the vital capacity of individual participants can be estimated with a neural network analysis approach using VCLMS and the cough peak flow computed via cough sounds as inputs. Unlike linear models, the neural network-based model can handle nonlinear changes and was suggested to be promising for respiratory monitoring in a previous study35. However, the participants of this study were limited to young and elderly people without underlying diseases based on self-report. The cough peak flow computed via cough sounds used in this study, which reflects cough force, is calculated based on cough sounds. Because the cough sound may be affected by the accumulation of secretions such as sputum, narrowing of the airway due to some diseases, or inadequate closure of the glottis, it is unclear to what extent the accuracy of vital capacity estimation may be affected. Therefore, it is necessary to clarify the effects of secretions and diseases on the accuracy of vital capacity estimation in future studies. In addition, although this study estimated only vital capacity, it is necessary to estimate measures that reflect obstructive ventilation disorders, such as forced vital capacity, one-second volume, and peak flow, to construct a more comprehensive respiratory function estimation system. Moreover, to apply the proposed method to patients in home environments, it would be more convenient to measure cough sounds with a smartphone. However, it has been found that the measurement accuracy of smartphones is lower than that of in-ear microphones17. Thus, additional studies are needed to implement the proposed method for estimating vital capacity on smartphones. In addition, it is essential to improve the proposed method in the future so that the error between the measured vital capacity and the estimated vital capacity is minimized; then, the same cut-off reference value could be applied. Nonetheless, it should be noted that the proposed method is presented as a screening method, and a conclusive diagnosis must be based on a thorough examination at a medical institution.