Area under the expiratory flow-volume curve: predicted values by artificial neural networks

Area under the expiratory flow-volume curve (AEX) has recently been proposed as a useful spirometric tool for assessing ventilatory patterns and impairment severity. Here we derive normative reference values for AEX, based on age, gender, race, height and weight, using artificial neural network (ANN) algorithms. We analyzed 3567 normal spirometry tests with available AEX values, performed on subjects from two countries (United States and Spain). Standard linear or optimized regression models and ANN models were built using traditional predictors of lung function. With race, gender, age, height and weight as predictors, the ANN-based models outperformed the de novo regression-based equations for predicted AEX (AEXpredicted) and AEX z scores. We compared these reference values with previously developed equations for AEX (by gender and race) and found that the ANN models led to the most accurate predictions. When we compared the performance of ANN-based models in random derivation/training, internal validation/testing and external validation groups, the models based on pooled samples from different geographic areas outperformed the other models in both central tendency and dispersion of the residuals, mitigating cohort effects. In a geographically diverse cohort of subjects with normal spirometry, we used both regression and ANN models to compute several prediction equations and z scores for AEX, an alternative measurement of respiratory function. We found that the dynamic nature of ANN allows continuous improvement of the predictive models' performance, which suggests that AEX could become an essential tool in assessing respiratory impairment.


Results
We analyzed 3111 spirometry tests constituting the Cleveland group, which were randomly divided into a derivation/training set (66%) and an internal validation/testing set (33%). In this group of tests from the USA, approximately 66% of the subjects were women; 87% were White and 13% self-identified as Black. In addition, we analyzed 457 normal spirometry tests from Spain, which constituted the Madrid group. In this group, 61% were women, and all subjects were characterized as White. The main anthropometric characteristics and pulmonary function measurements of the two groups are shown in Table 1. Figure 1 shows the AEX distributions by gender and race, while Fig. 2 shows the relationship between AEX and the subject's age at the time of testing.
Next, we computed the AEX approximations AEX1 through AEX4 from FVC, peak expiratory flow (PEF), FEF25, FEF50 and FEF75, based on the areas of the triangles and trapezoids delineated by these flows and volumes, as described elsewhere 7. We then compared them with their predicted values, as derived from the main prediction equation sets for FVC, PEF and the respective isovolume flows (for the latter, we computed the same triangle and trapezoid areas from the predicted values of the instantaneous flows and volumes). For comparison, we used the European Community for Steel and Coal (ECSC), the third National Health and Nutrition Examination Survey (NHANES III) and the more recent Global Lung Initiative (GLI) equations (Fig. 3). The approximations AEX1, AEX2, AEX3 and AEX4, based on one, two, three or four flows, respectively, were very close to the actual AEX values (i.e., small deviance and dispersion; Fig. 3, dark grey box plots). First, as reported before, these approximations are valuable when the pulmonary function software does not provide the actual AEX. All between-group comparisons showed correlation coefficients > 0.97 and p < 0.0001 (Table 2), consistent with our prior investigations 7–10. Second, the predicted AEXk (k = 1–4) derived from the major equation sets overestimated, on average, the actual AEX and its approximations AEXk (k = 1–4) (Fig. 3, light grey box plots). Among the three predicted sets compared, the ECSC equations overestimated AEX1 through AEX4, and indirectly AEX, the most (their correlation coefficients were the lowest, ~ 0.80, p < 0.0001).
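The triangle/trapezoid idea behind AEX1 through AEX4 can be sketched as the trapezoidal area under a piecewise-linear flow-volume curve. This is a minimal sketch under stated assumptions: the exact vertex set of each approximation is defined in ref. 7, and here we simply assume the polygon runs through the origin, the PEF point (placed at the volume exhaled at PEF, FEVPEF), whichever isovolume flows are supplied, and the point (FVC, 0). The helper names (`area_under_polyline`, `aex_polygon`, `estimated_fev_pef`) are hypothetical; `estimated_fev_pef` encodes the Cleveland-derived formula given in the Table 1 caption, with volumes assumed to be in litres.

```python
def estimated_fev_pef(fev1):
    """Volume exhaled at PEF estimated from FEV1 (formula from the Table 1 caption).

    Assumption: FEV1 and the result are both in litres.
    """
    return 0.157174 + 0.176439 * fev1


def area_under_polyline(points):
    """Trapezoidal area under a piecewise-linear flow-volume curve.

    points: (volume exhaled in L, flow in L/s) pairs; they are sorted by
    volume so that the trapezoids are summed from left to right.
    """
    pts = sorted(points)
    area = 0.0
    for (v0, f0), (v1, f1) in zip(pts, pts[1:]):
        area += 0.5 * (f0 + f1) * (v1 - v0)
    return area  # L^2/s


def aex_polygon(fvc, pef, v_pef, fef=None):
    """Hypothetical helper: area of the polygon through (0, 0), (v_pef, PEF),
    the supplied isovolume flows and (FVC, 0).

    fef maps the fraction of FVC already exhaled (e.g. 0.50 for FEF50)
    to the measured flow at that point.
    """
    pts = [(0.0, 0.0), (v_pef, pef), (fvc, 0.0)]
    for frac, flow in (fef or {}).items():
        pts.append((frac * fvc, flow))
    return area_under_polyline(pts)


# PEF only: a single triangle, FVC * PEF / 2.
print(aex_polygon(4.0, 8.0, 0.8))               # 16.0
# Adding FEF50 turns part of the triangle into trapezoids.
print(aex_polygon(4.0, 8.0, 0.8, {0.5: 4.0}))
```

With PEF alone, the area reduces to FVC × PEF / 2 wherever PEF occurs on the volume axis, which is the single-triangle case; each additional isovolume flow adds one vertex and replaces part of that triangle with trapezoids.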
Scientific Reports | (2020) 10:16624 | https://doi.org/10.1038/s41598-020-73925-0 www.nature.com/scientificreports/

In a side-by-side bar graph, Fig. 4 shows the medians and interquartile ranges (IQR) of AEX1–4, actual AEX and the four predictive models for AEX, i.e., the formulas published by Vermaak et al. 6 and Garcia-Rio et al. 11, the current linear regression model and the ANN-based model. Standard least squares regression equations for AEX, developed de novo in the two groups combined, yielded R² values between 0.62 and 0.71, depending on the gender- and race-based subset. In these models, weight was a significant predictor only in White men, while race, gender, age and height remained significant predictors in all the other groups. Optimizing the regression, by transforming the AEX variable for normalization and variance reduction (logarithmic or gamma transformation) and by regularization ('generalized regression') such as ridge regression, single or double lasso (with or without adaptive features) and elastic net, led to only minor improvements in the corrected Akaike Information Criterion (AICc, maximal delta 2324), generalized R² (maximal delta 0.01, up to 0.75), average absolute error (AAE, delta 0.24, ~ 2.11) or the square root of the mean squared prediction error (RASE, likely one of the most important performance measurements here; maximal delta 0.02, > 3.39) in the random validation subsets of the entire population of tests, using either tenfold cross-validation or fixed-rate holdback validation. For the ANN, we used the same inputs (age, weight, height, gender and race), and the output was either AEX or its gender- and race-specific z score, computed as z = (X − Mean)/Standard Deviation.
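To illustrate what a ridge penalty does to a regression on log-transformed AEX, the sketch below fits log(AEX) on a single predictor (for example height) with an unpenalized intercept. It is a deliberately simplified, hypothetical example: the study used the generalized regression platform of JMP Pro 15 with several predictors and other penalties (lasso, elastic net), none of which are reproduced here. Centring both variables keeps the intercept out of the penalty and gives the closed-form slope Sxy / (Sxx + λ); `ridge_fit_log` and `predict_aex` are illustrative names.

```python
import math


def ridge_fit_log(x, y, lam):
    """Ridge fit of log(y) on one predictor x, intercept not penalized.

    Centring x and log(y) removes the intercept from the penalty, giving
    slope = Sxy / (Sxx + lam) and intercept = mean(log y) - slope * mean(x).
    """
    ly = [math.log(v) for v in y]
    n = len(x)
    mx = sum(x) / n
    my = sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, ly))
    slope = sxy / (sxx + lam)
    intercept = my - slope * mx
    return intercept, slope


def predict_aex(intercept, slope, x):
    # Back-transform from the log scale to AEX units (L^2/s).
    return math.exp(intercept + slope * x)


heights = [150.0, 160.0, 170.0, 180.0, 190.0]
aex = [math.exp(0.5 + 0.02 * h) for h in heights]  # synthetic, noise-free data
print(ridge_fit_log(heights, aex, lam=0.0))     # ordinary least squares on log scale
print(ridge_fit_log(heights, aex, lam=1000.0))  # slope shrunk toward zero
```

With λ = 0 the fit is ordinary least squares on the log scale; increasing λ shrinks the slope toward zero, trading a little bias for lower variance, which is the effect the regularized models are meant to exploit.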
Table 1. Demographic and functional characteristics of the study participants. *AEX2, AEX3 and AEX4 obtained using an estimated FEVPEF**, as this variable was not available in the Madrid group. The formula used was: estimated FEVPEF** = 0.157174 + 0.176439 × FEV1; it was derived in the Cleveland group by modeling the variable on FEV1 only. The correlation coefficient between actual FEVPEF and estimated FEVPEF** was 0.54 (standard deviation of the difference 0.12; p < 0.0001).

As described in the Methods (full details in Supplemental_Material_S1), the chosen neural network architecture included two 'hidden' layers, each containing three sigmoidal, three linear and three Gaussian activation function nodes. In our analyses, this architecture represented the best trade-off between performance and speed, bias and variance, and underfitting and overfitting (see also Table 3, which shows the results of the ANN ablation experiments). As expected, mean predicted AEX was larger in Whites than in African Americans, and in men than in women. The ANN-based model predicted AEX with the highest accuracy, with a median difference of −0.01 (IQR −1.66 to 1.30) L²/s and a correlation coefficient of 0.89. The residuals remained low in the external validation set (Madrid group, Fig. 5a): median difference −0.36 (IQR −1.66 to 1.30) L²/s, correlation coefficient 0.76. The model performed well owing to its small dispersion, without significant heteroscedasticity, i.e., residuals were not progressively larger at higher values. The model's R² ranged from 0.80 to 0.83 in the derivation/training and internal validation/testing sets, and was 0.55 in the external validation set (Fig. 5a). These values were much higher than the R² of prior regression-based models, which ranged from 0.39 to 0.42 11. More importantly, other measurements of model error (Fig. 5a) remained lower than with the other regression techniques used.
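A forward pass through the architecture described above (five inputs, two hidden layers of nine nodes, each with three sigmoidal, three linear and three Gaussian nodes, and a single linear output) can be sketched in plain Python. Assumptions, not taken from the study: the sigmoidal nodes are implemented as tanh, the Gaussian nodes as exp(−z²), gender and race are coded 0/1, continuous inputs are standardized, and the weights are random placeholders. Training (fitting the weights to observed AEX) is not shown, and the function names are hypothetical rather than the JMP Pro 15 implementation.

```python
import math
import random


def hidden_layer(inputs, weights, biases):
    """Nine-node hidden layer: nodes 0-2 sigmoidal (tanh), 3-5 linear, 6-8 Gaussian."""
    out = []
    for j, (w_row, b) in enumerate(zip(weights, biases)):
        z = b + sum(w * x for w, x in zip(w_row, inputs))
        if j < 3:
            out.append(math.tanh(z))      # sigmoidal activation (assumed tanh)
        elif j < 6:
            out.append(z)                 # linear activation
        else:
            out.append(math.exp(-z * z))  # Gaussian activation
    return out


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real model would learn these from data."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.gauss(0.0, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(x, params):
    """x: standardized [age, weight, height, gender, race] -> predicted AEX (or z score)."""
    (w1, b1), (w2, b2), (w_out, b_out) = params
    h1 = hidden_layer(x, w1, b1)
    h2 = hidden_layer(h1, w2, b2)
    return b_out + sum(w * h for w, h in zip(w_out, h2))  # linear output node


rng = random.Random(0)
layer1 = init_layer(5, 9, rng)
layer2 = init_layer(9, 9, rng)
out_w, out_b = init_layer(9, 1, rng)
params = [layer1, layer2, (out_w[0], out_b[0])]
prediction = forward([0.1, -0.3, 0.5, 1.0, 0.0], params)
print(prediction)
```

Mixing activation types within a layer lets the network represent saturating (tanh), proportional (linear) and bump-shaped (Gaussian) relationships with the inputs at the same time, which is one plausible reason such a mixed design can trade off bias and variance well.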
By contrast, in our analyses, the regression-based predicted AEX had a median difference of 0.12 (IQR −1.90 to 2.03) L²/s and a correlation coefficient of 0.86; in the external validation set (Madrid group), the median difference was −1.04 (IQR −2.73 to 1.21) L²/s and the correlation coefficient was 0.78. In our ANN models, the prediction of AEX z scores, which are important for determining the lower limit of normal (LLN), was also very robust (Fig. 5b). While all inputs were significant independent predictors, the most important factors (total effects, %) for predicted AEX were gender (28.6%), race (28.6%), height (21.6%) and age (20.5%), whereas for AEX z scores (which are computed by gender and race) they were height (50.3%), age (30.7%) and weight (18.8%).

Figures 5 and 6 show two possible ANN modelling approaches. In the first (Fig. 5a,b, for AEX predicted and AEX z scores, respectively), models were developed on two thirds of the Cleveland group (derivation/training set), verified on the remaining subjects (internal validation/testing set) and then validated externally in the Madrid group. In this case, one can observe the classic 'cohort effect': the model overfits the Cleveland group and loses precision when applied to another cohort of different subjects. The alternative approach (Fig. 6a,b) takes advantage of the adaptability of ANN models by mixing the two cohorts, deriving a model on ~ 50% of the subjects, testing it on 25% (internal validation) and validating it on the remaining tests from the two groups combined. This yielded better-fitting models, with larger R² (0.79–0.82) and improved precision of AEX predicted (consistently lower measurements of error/bias and dispersion).
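The gender- and race-specific z scores mentioned above follow z = (X − Mean)/SD within each gender-race stratum. A minimal sketch, assuming a hypothetical record layout with `gender`, `race` and `aex` keys and at least two tests per stratum (needed for the sample SD). The LLN cut-off used here, z < −1.645 (the 5th percentile of a standard normal distribution), is a common convention and an assumption; the threshold is not stated in this section.

```python
from collections import defaultdict
from statistics import mean, stdev

LLN_Z = -1.645  # assumption: LLN at the 5th percentile of a standard normal


def group_z_scores(records):
    """z = (AEX - stratum mean) / stratum SD, with strata defined by gender and race."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["gender"], r["race"])].append(r["aex"])
    stats = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}
    z = []
    for r in records:
        m, sd = stats[(r["gender"], r["race"])]
        z.append((r["aex"] - m) / sd)
    return z


def below_lln(z):
    """True when a z score falls below the assumed LLN threshold."""
    return z < LLN_Z
```

In the study the ANN predicts the z score directly from the inputs; the grouping above only shows how the target z scores themselves are defined.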
Figures 5 and 6 also show that the condition of homoscedasticity for the models is generally met, i.e., residuals remain roughly in the same range at higher values, with the exception of very few outliers.
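One simple numeric companion to the residual plots is to compare the spread of residuals in the lower and upper halves of the predicted values. This is a hypothetical heuristic for illustration (formal tests such as Breusch-Pagan exist), not the procedure used in the study: a ratio near 1 is consistent with homoscedasticity, while a ratio well above 1 means the residuals widen at higher predicted AEX. `residual_spread_ratio` is an illustrative name.

```python
from statistics import stdev


def residual_spread_ratio(fitted, residuals):
    """SD of residuals in the upper half of fitted values divided by that in the lower half."""
    pairs = sorted(zip(fitted, residuals))  # order residuals by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return stdev(high) / stdev(low)


fitted = [1, 2, 3, 4, 5, 6, 7, 8]
print(residual_spread_ratio(fitted, [1, -1, 1, -1, 1, -1, 1, -1]))  # even spread
print(residual_spread_ratio(fitted, [1, -1, 1, -1, 4, -4, 4, -4]))  # fanning out
```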
In a more detailed pairwise analysis, Table 2 lists the mean differences (with 95% confidence intervals, CI) between observed AEX, computed AEX1 through AEX4, AEX predicted by previously published formulas 6,11, and AEX predicted by the new regression and ANN-based models.
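A mean difference with a 95% CI, as reported in Table 2, can be sketched for paired values (e.g., observed AEX and a model's predicted AEX for the same tests). This sketch assumes a normal-approximation interval, mean ± 1.96 × SD/√n; the exact CI method used for Table 2 is not stated here, and because the residuals are not normally distributed a bootstrap interval may be preferable. `mean_difference_ci` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def mean_difference_ci(a, b, z=1.96):
    """Mean of paired differences (a - b) with a normal-approximation CI.

    z = 1.96 gives an approximate 95% interval.
    """
    d = [x - y for x, y in zip(a, b)]
    m = mean(d)
    half_width = z * stdev(d) / math.sqrt(len(d))
    return m, m - half_width, m + half_width


print(mean_difference_ci([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))
```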

Discussion
The main finding of this article is that artificial neural networks (ANN) can provide a strong alternative to traditional methods for deriving normal prediction equations, as well as z score-based LLNs, here applied to the area under the expiratory flow-volume curve (AEX). The adaptive machine learning model performed better than a de novo linear regression model (smaller dispersion) and was superior to two previously published equations for AEX 6,11.
Traditional regression-based models used for deriving prediction equations for pulmonary function have been limited by internal and external validity biases ('cohort effects') and by assumptions of normality, additivity or linearity that hold only to varying degrees 12. For these reasons, we used a more modern modelling method, the ANN, which can handle collinearity and non-linear relationships and can be applied to the derivation of spirometry reference equations. In addition, we found that this methodology outperformed more advanced regression regularization techniques in reducing the bias and the dispersion of the residuals. In an era of rapidly expanding computational capabilities, neural networks form the backbone of many emerging artificial intelligence techniques that could be applied successfully in our field 13–16.

We first compared measured AEX with its approximations AEX1, AEX2, AEX3 and AEX4. As described before 7, these parameters are computed based on FVC and PEF (AEX1); FVC, PEF and FEF50 (AEX2); FVC, PEF, FEF25 [...] the most common validated predictive equations, such as the ECSC 17,18, NHANES III 19 and GLI 2 sets, stratified by gender and race, to derive predicted values for AEX1 through AEX4. Fig. 3 illustrates several salient findings of our investigation. First, we confirmed our previously published findings 7, i.e., that AEX1–4 are acceptable approximations of AEX (with good metrics of central tendency and dispersion). The analyses were performed on a subset of subjects with normal lung function from the Cleveland group (in which inclusion was adjudicated by normal lung volume determinations), and on an external validation set of non-smoking elderly subjects with normal spirometry (the Madrid group). Second, we show that the ECSC equations tend to overestimate these spirometric parameters the most, while GLI-based predicted values for AEX1 through AEX4 are the closest to the actual normal AEX values.

Table 2. Mean differences (with 95% confidence intervals, CI) between actual AEX, the AEX approximations (AEX1 through AEX4) and AEX predicted by four different formulas (Vermaak et al. 6; Garcia-Rio et al. 11; regression and artificial neural networks or ANN, 2020) in the training, testing and validation sets. While the deviance (central tendency) seems slightly larger in the ANN-based model, its dispersion is smaller than that of the regression-based model using the same parameters (gender, race, age, height and weight), based on RMSE (root mean square error), RASE (square root of the mean squared prediction error, i.e., the square root of the error sum of squares divided by n) and AAE (average absolute error).

Whiskers represent 25th-75th interquartile ranges (IQR).

Table 3. Comparison of linear regression (LR) using the standard least squares method, a generalized regression (GR) model using a logarithmic transformation and the double-lasso method, and the main ablation experiments of the artificial neural network (ANN) methods tried. The ablation study identified the two-hidden-layer ANN design (each layer with three sigmoidal, three linear and three Gaussian activation functions) as the best compromise between improved performance and processing speed (bold characters). *Using an additive sequence of 100 models with a learning rate of 0.1. **Using a robust fit with a squared penalty method and transformed covariates for optimization.

In Fig. 4, we show both central tendency (medians) and dispersion (IQR) metrics for actual AEX, AEX1 through AEX4, and two AEX predicted values, as published before by Vermaak et al. 6 and Garcia-Rio et al. 11. Of note, the distributions of these parameters were non-Gaussian (sinusoidal or logarithmic-like). In addition, Fig. 4 includes the values derived from the linear regression and ANN-based models developed de novo in this article. The ANN-based median predicted AEX (dark blue bar in Fig. 4) was the closest to the actual median AEX (red, double-hashed bar), and the ANN-based model also had the smallest dispersion (IQR). Supplemental Figures S2 and S3 show the distributions of residuals (AEXpredicted − AEX) for both methods by gender and race, combining all tests from the two groups. In these figures, darker colors (dark green in Supplemental Figure S2 and dark blue in Supplemental Figure S3) represent men, while lighter colors show the distributions in women. The linear regression model tended to overestimate AEX in males, while the ANN model provided a more precise estimate of the central tendency in all subgroups. In Table 2, we show the average between-variable differences and their 95% CIs (although we caution the reader that the residuals are not normally distributed), together with RMSE (root mean square error), RASE (square root of the mean squared prediction error, i.e., the square root of the error sum of squares divided by n, considered by some as equivalent to an out-of-sample RMSE) and R². Overall, we confirmed the high correlations and small dispersion of the ANN-based model, both in aggregate and by cohort (data by cohort not shown).
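The error metrics named above can be computed from paired actual and predicted values. In this sketch, RASE follows the definition given in the text (the square root of the error sum of squares divided by n), AAE is the mean absolute residual, and R² is 1 − SSE/SST. RMSE is computed with the degrees-of-freedom denominator n − p (p = number of fitted parameters), which is the usual least squares convention and an assumption here; `fit_metrics` is a hypothetical helper.

```python
import math


def fit_metrics(actual, predicted, n_params):
    """RMSE, RASE, AAE and R^2 from paired actual and predicted values."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(sse / (n - n_params)),  # df-adjusted (assumed convention)
        "RASE": math.sqrt(sse / n),               # sqrt(SSE / n), as defined in the text
        "AAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - sse / sst,
    }


print(fit_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], n_params=2))
```

Because RASE divides by n rather than by the residual degrees of freedom, it does not reward models with many parameters, which is one reason it is convenient for comparing a regression with an ANN on held-out data.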
The ANN-based models described here used as inputs the traditional predictors of lung function (gender, race or ethnicity, height, weight and age), two layers of nine 'hidden' nodes (each with three sigmoidal, three linear and three Gaussian activation functions) and AEX as the output. The model developed in the Cleveland group was validated internally (dark green dots for males, light green for females) and then externally in the Madrid group (black dots for males, grey for females; Fig. 5a,b). As expected, with this traditional approach of derivation and internal validation in one population followed by external validation in another cohort, there was a significant 'step-down' in the model's performance, even with ANN methodology. Instead, taking advantage of the learning property of ANN models (Fig. 6a,b), pooling all tests from the two groups led to better predictive ability (better central tendency, smaller dispersion and a higher percentage of variance explained by the model). The additional online information (link: Supplemental_Material_S1) also provides the formulas and code used, for future validation or refinement of the models in other pulmonary function data sets.
Several limitations of this investigation deserve mention. First, the current predictive models for AEX do not consider the intra-individual, test-to-test variability of the AEX measurement, which needs to be explored in future investigations. It is conceivable that, similarly to the large variability of FEF25, FEF50 and FEF75, AEX could also vary substantially. This intrinsic variability can be explored and, if found to be high, could potentially be minimized by using AEX in concert with other spirometric measurements, an approach that could further refine the characterization of functional impairment. Second, the Madrid cohort included very different subjects, i.e., older, White, and from a small geographic footprint. This limitation could be overcome in the future by extending the geographic coverage and diversity of the pooled tests, which will allow the ANN models to continue to evolve (minimizing the loss function by gradient descent) and to further refine the node equations based on additional variation in the inputs. Third, additional predictors of lung function can be assessed, as modern computational techniques allow us to employ fast and powerful mathematical models and to leverage unprecedented access to big data, which was unavailable decades ago or with traditional modeling methods. Fourth, one disadvantage of ANN is the complexity of the equations in the hidden nodes, leading to a perceived lack of transparency or 'black box' effect, although these equations can easily be visualized at each node and in all layers. Fifth, the accuracy of the presented models or equations may not be optimal in a new study that considers different ranges of age, weight and height or other racial profiles. In future training, testing and validation sets, ANN-based models may differ mathematically and may handle new sources of variance from other factors differently, with the potential for higher systematic bias.
However, this is precisely the point we make here by contrasting modeling in one population with external validation in a different cohort versus 'pooling' all tests together and devising ANN models that use input variability from all demographic categories. Lastly, the utility of AEX needs to be explored in relation to specific conditions and outcomes, as most measurements in modern medicine need to be 'anchored' to prevention, early diagnosis and the development of personalized therapies.

Conclusion
In this investigation, we used neural network models in a pooled, geographically diverse cohort to compute the predicted area under the expiratory flow-volume curve, a spirometric measurement that may have great impact on how we define respiratory functional impairment in the future. In a large pool of normal spirometry tests, we found that the learning property of artificial neural networks allows continuous improvement of the predictive models that compute reference values for AEX, and that these models may outperform traditional methods and validation approaches.

Methods
Analyses were performed on a development cohort (the Cleveland group) of 3111 consecutive adult subjects who had normal spirometry and normal same-day lung volume testing in the Cleveland Clinic Pulmonary Function Laboratory over a 10-year span. A second cohort (the Madrid group) consisted of 457 never-smoking healthy volunteers who met the American Thoracic Society criteria for reference subjects and participated in a Spanish study aimed at deriving spirometry reference values for elderly European individuals 11. Spirometry was performed and interpreted according to the current joint American Thoracic Society (ATS) and European Respiratory Society (ERS) standards and recommendations 1,20–23. Lung volume assessments 4 were performed only in the Cleveland group, by either body plethysmography 24–26 or helium dilution 27,28. Normal lung volumes were defined as values between the lower and upper limits of normal for total lung capacity, functional residual capacity and residual volume. All tests were performed using a Jaeger-Viasys Master Lab Pro system (Würzburg, Germany). The most recent, validated and widely applicable reference values, developed in 'semi-parametric' regression-type models and published by the Global Lung Initiative (GLI), were used for spirometry interpretation and definition of normality 2,19. For lung volumes, we used the reference values published by Crapo et al. 29. We did not use the previously published lung volume reference values developed for 65–85-year-old Europeans 30, as the Cleveland group (the only group with lung volume determinations, which constitute the gold standard in pulmonary function testing) was overall younger and likely had different anthropometric characteristics.
We calculated the parameters AEX1 through AEX4 from FVC, PEF, FEF25, FEF50 and FEF75, as done elsewhere 7, and compared them with their predicted values using three of the most popular and widely used equation sets, i.e., European Community for Steel and Coal (ECSC) 18, the third National Health and Nutrition Examination Survey (NHANES III) 19 and Global Lung Initiative (GLI) 2. The largest AEX was selected from all pre-bronchodilator spirometry trials performed. In addition, predicted AEX was computed using two previously published predictive equations for AEX, by Vermaak et al. 6 and Garcia-Rio et al. 11. Descriptive statistical analysis of the available variables was performed. Categorical variables were summarized as frequencies or percentages. Continuous variables were characterized by mean, standard deviation, median and 25th-75th percentile interquartile range (IQR), as appropriate (most distributions were non-Gaussian).
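The descriptive summaries listed above (mean, SD, median and 25th-75th percentile IQR) can be produced with the Python standard `statistics` module. Note that `statistics.quantiles` defaults to the 'exclusive' interpolation method, and other statistical software may interpolate percentiles slightly differently; `describe` is a hypothetical helper, not the software used in the study.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Mean, SD, median and 25th-75th percentile IQR of a continuous variable."""
    q1, _, q3 = quantiles(values, n=4)  # default method='exclusive'
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": (q1, q3),
    }


print(describe([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

For skewed variables such as AEX, the median and IQR are the more informative pair, which is why the text reports them alongside the mean and SD.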
The GLI equations 2 were developed and made available as Generalized Additive Models for Location, Scale and Shape (GAMLSS) in the R software package. The methods are 'parametric' in the sense that they require a parametric distribution assumption for the response variable, and 'semi' because modelling the distribution parameters as functions of explanatory variables may involve non-parametric smoothing functions (link: GAMLSS).
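GLI-type GAMLSS equations are usually reported through three age- and height-dependent parameters per subject: L (Box-Cox power, describing skewness), M (median) and S (coefficient of variation). Given those, the z score and the LLN follow from the LMS formulas z = ((X/M)^L − 1)/(L·S) and X = M(1 + L·S·z)^(1/L), with the logarithmic form as L approaches 0. The sketch below implements only these two conversions; the L, M and S values used are illustrative numbers, not GLI coefficients, and the function names are hypothetical.

```python
import math


def lms_z(x, L, M, S):
    """z score of measurement x under the LMS (Box-Cox) parameterization."""
    if abs(L) < 1e-12:
        return math.log(x / M) / S  # limiting form as L -> 0
    return ((x / M) ** L - 1.0) / (L * S)


def lms_value_at_z(z, L, M, S):
    """Inverse of lms_z: the measurement corresponding to a given z score."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Illustrative parameters only (not taken from any published equation set).
L_, M_, S_ = 0.5, 10.0, 0.12
print(lms_z(8.2, L_, M_, S_))
print(lms_value_at_z(-1.645, L_, M_, S_))  # LLN under the 5th-percentile convention
```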
Some prior models for normal pulmonary function values used ordinary linear regression (standard least squares method) by gender and race, relying on predictors such as age, height and, occasionally, weight. In this work, ordinary regression models were improved by several types of optimization, e.g., generalized additive models defining splines for means, variance and skewness (as in the GLI equations 2), and regularization techniques such as ridge regression, lasso, elastic net and double lasso, with and without adaptive features, using both native values and logarithmic or gamma transformations (as these represented the closest distribution fits). We compared these models with deep learning algorithms or artificial intelligence (AI) methods. The latter were based on ANN, which can adjust for more complex relationships and interactions between variables and thus model complex response surfaces more efficiently. The machine learning models used here are described in more detail online (link: Supplemental_Material_S1). We tried different ANN architectures, with a variable number of nodes (3–5) in the first and second layers and different activation functions in the hidden nodes. During the ablation experiments, we selected the simplest models that provided the best balance between low variance (dispersion of the predicted values) and low bias, and the best trade-off between speed and performance and between fitting and overfitting.

In the first approach, we used a derivation (training) and an internal validation (testing) set from the Cleveland group, with a random holdback of 33% for internal validation; we then applied the model to an external validation set constituted by the Madrid group (Fig. 5a,b). In another approach (Fig. 6a,b), we pooled the data from the two cohorts and developed new ANN-based models, using a 50-25-25% random partition for training, testing and validation ('ongoing validation'), respectively. In the AI models, we analyzed the residuals (i.e., the differences between predicted and actual AEX), checking their normality, internal consistency by various parameters and homoscedasticity. The weight of each variable in the various models, independent of model type and fit, was assessed with the dependent resampled inputs method in JMP Pro 15, in which factor values are constructed from observed combinations using a k-nearest neighbors approach (k = 5) in order to account for correlation. This method is used mainly when the inputs (such as height, weight, gender, race and age) may be correlated, and it treats the observed variance and covariance as representative of the covariance structure of the factors 31. The performance of the standard least squares regression and ANN models was assessed on the JMP Pro 15 platform by comparing the means and residuals, as well as R², the square root of the mean squared prediction error (RASE) and the average absolute error (AAE).
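The two validation schemes described above amount to random partitions of test identifiers: a 67/33 holdback within the Cleveland group, and a 50/25/25 training/testing/validation split of the pooled cohorts. A minimal sketch, assuming integer test IDs and a fixed seed for reproducibility; JMP Pro 15 generates its own validation columns, and `random_partition` is a hypothetical helper, not that feature.

```python
import random


def random_partition(ids, fractions, seed=0):
    """Randomly split ids into consecutive subsets with the given fractions.

    The last subset takes whatever remains, so rounding never drops a test.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    sizes = [int(round(f * len(shuffled))) for f in fractions[:-1]]
    parts, start = [], 0
    for size in sizes:
        parts.append(shuffled[start:start + size])
        start += size
    parts.append(shuffled[start:])
    return parts


# Scheme 1: holdback within one cohort (Cleveland, 3111 tests).
train, test = random_partition(range(3111), (0.67, 0.33))
# Scheme 2: pooled cohorts (3111 + 457 tests), 50/25/25.
pooled_train, pooled_test, pooled_valid = random_partition(range(3111 + 457), (0.50, 0.25, 0.25))
print(len(train), len(test), len(pooled_train), len(pooled_test), len(pooled_valid))
```

Pooling before partitioning means every subset mixes tests from both cohorts, which is what lets the pooled model see the full range of input variation during training.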