Introduction

As per the definition provided by the Global Initiative for Asthma (GINA), asthma is considered to be a heterogeneous disease, with characteristics of chronic airway inflammation, clinical history of wheezing, shortness of breath, chest tightness and cough, together with reversible airflow limitation1. Chronic obstructive pulmonary disease (COPD), defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), is a disease characterized by persistent respiratory symptoms like dyspnea, cough or sputum production and irreversible airflow obstruction. Emphysema and chronic bronchitis are usually seen in these patients. The changes in COPD are brought about by inhalation of noxious particles or gases and is also influenced by host factors including abnormal lung development2,3.

A significant proportion of older patients with chronic airflow limitation (i.e. not completely reversible after bronchodilation) have diagnosis and/or features of both asthma and COPD, particularly amongst smokers4,5. Such patients are termed as ‘overlap’ cases. However, there is no generally agreed term or defining features for this category4,6,7. Asthma and COPD are both heterogeneous diseases, with a range of underlying mechanisms. Similarly, overlap itself does not represent a single disease or a single phenotype. Gibson and Simpson proposed the term asthma COPD overlap syndrome (ACOS) in 2009 to describe patients with features of both, asthma and COPD4. The GINA and GOLD global bodies later replaced the term ACOS by asthma COPD overlap (ACO) as the former was often misinterpreted as a single disease entity8. There is general agreement that patients with features of both asthma and COPD experience a more rapid decline in lung function, have poor quality of life, experience frequent exacerbations, have increased mortality and consume more resources than asthma or COPD alone9,10. These patients are generally treated with inhaled corticosteroids (ICS) with add-on treatment of long-acting beta-agonist (LABA) and/or long-acting muscarinic antagonists (LAMA), whenever necessary8. ACO patients have been excluded from randomised controlled trials (RCTs) till date and this population is also poorly characterised in most mechanistic studies. Precise diagnosis of an ACO patient is the first step towards development of an effective treatment plan.

Metabolomics is the large-scale study of small molecules, commonly known as metabolites produced during normal endogenous metabolism within biofluids, cells, tissues or organisms. The ever expanding metabolomic approach has moved on beyond biomarker discovery to provide insight into mechanistic changes associated with various physiological conditions, including diseases11. Exhaled breath condensate (EBC) has emerged as a popular clinical tool over the past two decades. EBC is a non-invasive method of sampling lung epithelial lining fluid, and is obtained by cooling the exhaled air from spontaneous breathing. It is predominantly composed of water vapour along with volatile and non-volatile substances from the lower airways. Collection of EBC involves minimal technical skills and is not associated with any discomfort or risk12. Nuclear magnetic resonance (NMR) spectroscopy is a promising approach for the identification of metabolites in EBC with prognostic and predictive significance. Though not very sensitive, NMR is characterized by inherent distinctive advantages, which include minimal sample preparation, rapid spectra acquisition time, and the possibility to perform an untargeted analysis limited to the chemical nature of metabolites. These advantages have promoted NMR-based metabolomics of EBC to the rank of a valuable method for an efficient investigation of a variety of lung diseases13.

It is evidenced that electronic-nose (eNose) and NMR metabolomics of EBC can distinguish patients with respiratory disorders, including asthma and COPD14,15. Also, EBC metabolic profiling using NMR can be effectively used for differential diagnosis between newly diagnosed asthma and COPD16. It is reported that NMR spectral signatures of EBC can be used for the discovery and characterization of different asthma endotypes17.

A number of encouraging reports related to NMR metabolomics of EBC in asthma and COPD motivates us to test the hypothesis that this method could be a helpful tool to explore ACO. For this purpose, the ability of this technique to identify metabolic patterns that discriminate the EBC metabolome of ACO from asthma and COPD is ascertained. Our previous studies using NMR, gas chromatography–mass spectrometry (GC–MS) and immunological profiling have indicated significantly altered expression of 36 metabolites and immunological markers in serum of ACO patients as compared with asthma and COPD18,19. We have also attempted to integrate these markers with the present EBC metabolomic signatures for accurate prediction of ACO.

Results

Discovery phase

The baseline clinical characteristics of all subjects recruited is tabulated in Table 1. For the initial part of the study, the discovery phase samples were assigned into two groups, obstructive lung diseases (asthma, COPD and ACO) and healthy controls. The supervised approach of partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) were applied to unit variance scaled NMR binned spectra of both the groups. PLS-DA models displayed class separation while OPLS-DA, when applied to the dataset with factors unrelated to group characters removed, resulted in improved discrimination between the two groups (R2 = 0.979 and Q2 = 0.881; analysis of variance testing of cross validated predictive residuals (CV ANOVA) score p = 0) (Fig. 1a). In the permutation test, R2 and Q2 values were found significantly higher than the 200 permutated models (Fig. 1b).

Table 1 Clinical characteristics of the recruited subjects.
Figure 1
figure 1

Orthogonal projections to latent structures discriminant analysis (OPLS-DA) model removes outliers which do not contribute to class separation. OPLS-DA model shows optimized discrimination between (a) obstructive lung diseases and healthy controls (R2Y = 0.979 and Q2 = 0.881, CV-ANOVA score p = 0), (c) ACO and asthma (R2Y = 0.982 and Q2 = 0.862, CV-ANOVA score p = 0) and (e) ACO and COPD (R2Y = 0.98 and Q2 = 0.862, CV-ANOVA score p = 0). Response permutation test (n = 200) to estimate the statistical significance of the PLS-DA models. (b) Healthy controls vs. diseases R2 = (0.0, 0.433), Q2 = (0.0, − 1.22), (d) ACO vs. asthma R2 = (0.0, 0.44), Q2 = (0.0, − 0.248) and (f) ACO vs. COPD R2 = (0.0, 0.706), Q2 = (0.0, − 0.541). All models were generated using SIMCA 13.0.2 (Umetrics, Sweden). COPD chronic obstructive pulmonary disease, ACO asthma COPD overlap.

Furthermore, OPLS-DA model optimized separation between ACO vs. asthma (R2 = 0.982 and Q2 = 0.862; CV ANOVA score p = 0) (Fig. 1c) and ACO vs. COPD (R2 = 0.98 and Q2 = 0.862; CV ANOVA score p = 0) (Fig. 1e). Permutation test indicated that the generated models could predict classes better than chance (Fig. 1d,f).

A typical representative 1H NMR EBC spectrum of an ACO patient comprising of signals arising from energy metabolites, organic acids and amino acids is shown in Fig. 2. Eighteen metabolites could be consistently identified using 2D NMR experiments, human metabolome database (HMDB) and literature. Analysis was based on these consistently identified metabolites.

Figure 2
figure 2

Representative 800 MHz 1H-NMR zgesgp spectra of exhaled breath condensate (EBC) collected from an ACO patient. 1.TSP 2. Fatty acid 3. Propionate 4. Isopropanol 5. Ethanol 6. Lactate 7. Threonine 8. Alanine 9. Acetate 10. Proline 11. Acetone 12. Valine 13. Pyruvate 14. Trimethylamine 15. Methanol 16. Glycerol 17. 2,3-butanediol 18. Phenylalanine 19. Formate. The NMR spectra was assigned using MestReNova version 7.1.0 (Mestrelab Research, Santiago de Compostela, Spain).

Based on variable-importance in projection (VIP) ≥ 1 and S-line plot correlation value |r|≥ 0.6, highly significant variables responsible for class separation between disease and controls were identified for ACO vs asthma and ACO vs COPD. Ten bins and twelve bins, respectively qualified the screening criteria described above. Eight common bins corresponding to metabolites from both ACO vs asthma and ACO vs COPD models were considered for further analysis. These bins corresponded to the metabolites propionate, lactate, valine, fatty acid, acetone, isopropanol, formate and methanol. To cross-validate the binning approach, integral values of these metabolites were subjected to univariate analysis (UVA). Metabolites such as propionate, isopropanol and acetone were found to be up-regulated in ACO as compared to both, asthma and COPD (Table 2). Only valine was observed to be downregulated in ACO as compared to both, asthma and COPD. Fatty acid, lactate and methanol were also found to be significantly dysregulated with contrasting trends in ACO with respect to asthma and COPD. The sample peaks identified in the chemical shift region 5.1–9.0 ppm (aromatic region) were also subjected to one way ANOVA (Dunnett’s post hoc test) or Kruskal–Wallis test (Dunn’s post hoc test), as applicable and only formate level was found to be significantly dysregulated in ACO.

Table 2 Human Metabolome Database identifiers (HMDB ID), multivariate data analysis (variable influence on projection (VIP) scores, false discovery rate (FDR) adjusted p value), fold changes and pairwise univariate (ANOVA/Kruskal Wallis test) values are provided for the 8 significantly altered metabolites common to ACO vs. asthma and ACO vs. COPD.

Receiver operating characteristic (ROC) curve analysis

Next, multivariate ROC curves were generated for the significantly altered 8 common metabolites of the two groups (ACO vs. asthma and ACO vs. COPD). In order to establish a predictive model that could differentiate ACO from a group of both asthma and COPD patients together, we merged the data set of asthma and COPD (referred to as 0) and ACO (referred to as 1). ROC curves were generated by Monte-Carlo cross validation (MCCV), a method involving balanced sub-sampling. In each MCCV, two thirds (2/3) of the samples were used to evaluate the feature importance. The classification models were generated using the top important features, subsequently the models were validated on the 1/3 of the samples left out. The procedure was repeated multiple times to calculate the performance and confidence interval of each model. The multivariate ROC curves based on cross validation (CV) performance are shown in Fig. 3a. The curves of all models are averaged across all CV runs. The highest AUC was obtained for model 5, generated using 6 features (Fig. 3b). The significant features ranked using mean importance measure for the model are shown in Fig. 3c. The metabolites acetone, isopropanol, valine, propionate, fatty acid and lactate show maximum importance in generation of the 5th model.

Figure 3
figure 3

(a) Comparison of various receiver operating characteristic (ROC) curves using different number of variables based on Monte Carlo cross validation (MCCV) performance at 95% confidence interval. The ROC curves were generated using Metaboanalyst 4.0 (https://www.metaboanalyst.ca/). Var. variable, AUC area under curve, CI confidence interval. (b) Model 5 ROC curve generated using 6 variables shows the highest AUC = 0.852 and good power in discriminating ACO from asthma and COPD. (c) Significant features plot ranked by frequencies of being selected for classification derived from 6 variable multivariate receiver operating characteristic (ROC) curve.

Validation phase

For validation of the 8 EBC metabolites identified in the discovery phase, UVA was performed on the data generated in the validation cohort. These metabolites viz propionate, lactate, valine, fatty acid, acetone, isopropanol, formate and methanol exhibited a similar trend yet again in the new cohort, thereby confirming our findings (Table 2). To strengthen our findings, MVA was also performed on the validation cohort and the results were found to be similar to that of the discovery cohort (“Supplementary Materials").

Correlations and predictive modelling

A combination of positive and negative correlations were observed between metabolites and immunological markers. This visualization further supports the hypothesis of tight interplay between metabolism and inflammation in ACO. Except interleukin-1beta (IL-1β) and IL-17E, the inflammatory markers exhibited negative correlation with the significantly altered metabolites. These metabolites predominantly showed a positive correlation with lung function parameters such as forced expiratory volume in one second (FEV1) and ratio of the forced expiratory volume in one second to the forced vital capacity of the lungs (FEV1/FVC). The inflammatory markers, on the other hand, mostly displayed negative correlation with lung function parameters (Fig. 4).

Figure 4
figure 4

Correlation heat map of 46 significantly altered markers in ACO using Pearson’s correlation coefficients. The analysis was performed using Scikit-learn package with Python 3.8 (https://scikit-learn.org/stable). (nmr)—depicts the altered metabolites detected in serum using NMR based metabolomics, (gc)—depicts the altered metabolites detected in serum using GC MS based metabolomics, (ebc)—depicts the altered metabolites detected in EBC using NMR based metabolomics.

Random forest (RF) classifier (number of trees = 50) was built via tenfold cross-validation and feature ranking obtained to map the feature importance scores which play an important role in predictive modelling, as shown in Fig. 5a. This provides an insight into the data as well as the model, and the basis for dimensionality reduction. Feature selection can improve the efficiency and effectiveness of a predictive model. A multivariate ROC analysis with the RF classifier was performed by adding the features sequentially in a descending order of relative importance (Fig. 5a). The mean and standard deviations (SDs) of ROC AUC and their corresponding true positive rates (TPR) and false positive rates (FPR) are presented in Fig. 5b,c, respectively.

Figure 5
figure 5

(a) Feature ranking obtained by mapping the feature importance scores (46 variables) which play an important role in predictive modelling using random forest (RF) classifier (number of trees = 50) which was built via tenfold cross-validation classifier and used to predict whether ACO can be effectively distinguished from asthma and COPD patients taken together. (b) The mean and standard deviations (stdev) in area under curve (AUC) of multivariate receiver operating characteristic (ROC) with the RF classifier of the most important features sequentially in a descending order of relative importance. (c) The corresponding true positive rate (TPR) and false positive rate (FPR) of multivariate ROC. (d) A combination of all platforms provides the highest mean AUC with least uncertainty and largest TPR with smallest FPR for robust classification of ACO from asthma and COPD. The analysis was performed using Scikit-learn package with Python 3.8 (https://scikit-learn.org/stable) (nmr)—depicts the altered metabolites detected in serum using NMR based metabolomics, (gc)—depicts the altered metabolites detected in serum using GC MS based metabolomics, (ebc)—depicts the altered metabolites detected in EBC using NMR based metabolomics, nmr_ftr—depicts the altered metabolites detected in serum using NMR based metabolomics, gc_ftr—depicts the altered metabolites detected in serum using GC MS based metabolomics, ebc_ftr—depicts the altered metabolites detected in EBC using NMR based metabolomics, im_ftr—depicts the altered immunological markers detected in serum, all_ftr—depicts the combination of features obtained from all the platforms.

Finally, a comparison of all platforms in terms of serum NMR metabolomic fingerprints, serum GC–MS fingerprints, EBC NMR metabolomic fingerprints and immunological mediator fingerprints using the same classifier was performed, both individually and taken together. It was observed that combination of all these platforms provide the highest mean AUC with least uncertainty and largest TPR with smallest FPR for robust classification of ACO from asthma and COPD (Fig. 5d).

Discussion

EBC profiling using NMR metabolomics has increasingly gained popularity for the identification of metabolic phenotypes in various respiratory diseases. The present study discusses the metabolite content of EBC collected from subjects with ACO and their potential to discriminate ACO from asthma and COPD. NMR could unequivocally identify distinct metabolic signatures in EBC of patients with ACO. Furthermore, the method unambiguously recognizes metabolites responsible for between-group differences (asthma, COPD and healthy controls), thereby strongly suggesting identification of a set of promising unbiased biomarkers characterizing ACO.

Valine, an essential glucogenic amino acid, was found to be downregulated in ACO with respect to both asthma and COPD. Studies by other groups have shown that plasma levels of the branched chain amino acids (BCAAs) are significantly lower in patients with severe COPD than in controls20,21. Since BCAAs are utilized for muscle protein synthesis, their reduced levels appear to be consistent with muscle wasting and weight loss that are known to occur in advanced COPD22. It is, therefore, suggested that the significantly decreased level of valine in ACO reflects an enhanced metabolic demand. A similar trend in the expression of valine was also observed in serum of these subjects in our earlier study18.

The increased expression of lactate in EBC may be attributed to increased glycolysis due to an imbalance in oxygen supplement and demand, as explained by the “Warburg effect”. The concentration levels of lactate have been extensively studied in asthma23,24 and COPD25,26. Our observation, yet again, suggests an increased energy demand of ACO subjects. Also, a similar trend in lactate expression has been observed in serum of ACO subjects19.

Numerous groups have reported altered expressions of a common volatile organic compound (VOC), acetone in EBC of various respiratory diseases16,21. Montuschi et al. (2012) have indicated an increased expression of acetone in cystic fibrosis patients as compared with heathy controls27. This observation suggests increased lipid catabolism to meet the added energy requirements of the disease28. Our findings too indicate that the levels of acetone in EBC are significantly higher in ACO patients when compared to both, asthma and COPD.

Isopropanol or 2-propanol, yet another VOC, exhibited a similar trend in expression, as observed with acetone. It was found to be significantly upregulated in ACO. Elevated 2-propanol concentration in EBC might be due to bacterial metabolism and/or increased lipolysis and lipid peroxidation29. Endogenous formation of 2-propanol can occur in humans from reduction of acetone by liver alcohol dehydrogenase, mainly when increased levels of acetone and high NADH/ NAD + ratio occurs, as observed in ketosis30. Increased levels of 2-propanol concentration is also documented in COPD and cystic fibrosis cases27,31.

Methanol, another VOC, was also found to be significantly higher in ACO with respect to asthma, but lower in comparison to COPD. It is reported that increased methanol may be associated with airway inflammation32. Methanol, present in human breath33, is a breakdown product of formaldehyde, which is demonstrated to exacerbate airways inflammation in A549 alveolar and BEAS-2B bronchial cells34. Methanol levels in ACO may be linked to the heightened inflammatory status of the patients.

Altered levels of fatty acids in EBC are extensively reported16,35,36,37. Fatty acid levels in ACO are significantly dysregulated with an expression pattern similar to that of serum observed in these patients19. Short-chain fatty acids (SCFAs) regulate several leukocyte functions linked to the production of cytokines, eicosanoids, and chemokines and seem to affect leukocyte migration to the foci of inflammation38. Since they are also involved in energy requirement, their decrease might reflect the tight interplay between inflammation and energy production36. SCFAs can also originate from different pyruvate metabolism pathways39. Various studies on cell/animal models40,41 suggest that fatty acid residues can be harvested from lipids and may serve as energetic substrates in conditions of inhibited glycolysis and up-regulated β-oxidation42.

An increased expression of propionate was observed in EBC of ACO when compared with asthma and COPD. Given the role of propionate in the inhibition of cholesterol synthesis, its increase may also suggest the involvement of lipid metabolism. This finding is in agreement with the reports of several research groups, where increased lipolysis has been attributed to this trend13,31. Few groups have also suggested that increased propionate level could be the effect of drugs regularly administered to these patients13.

Elevated levels of formate in EBC are reported in patients with COPD as compared with healthy subjects31,43. These findings have been confirmed in a recent 1H-NMR spectroscopy study which shows that formate concentration in EBC is 2.5 higher in patients with emphysema due to α1-antitrypsin deficiency than in healthy subjects13. Formate levels were higher in ACO with respect to asthma but lower than that of COPD in the present study.

To make the findings more robust and reproducible, expression levels of the 8 metabolites identified were validated in a fresh cohort of subjects. A similar trend in expression was observed in the validation cohort. These findings are in accordance with our earlier serum metabolomic findings18,19. Dysregulation of a number of metabolites in EBC of ACO patients suggests that the energy and metabolic requirement in ACO is more severe when compared to COPD or asthma alone. This could be attributed to the higher disease severity associated with ACO. Also, multivariate ROC curves could discriminate ACO from the combined group of asthma and COPD with good sensitivity and specificity. Furthermore, the models generated from these multivariate ROC curves suggest that the combination of these metabolites provide a robust classification model which is better than using any one or two metabolites as a biomarker for differentiating ACO from both asthma and COPD taken together. In conclusion, NMR metabolomics using EBC as a biofluid holds potential to be explored as a platform for development of non-invasive biomarkers responsible for identification of ACO from a cohort of asthma and COPD.

Earlier, metabolic and immunological signatures have been identified by our group in serum of these ACO subjects differentiating it from both asthma and COPD19. Here, a correlation heat map is generated using the 46 markers which comprise of a wide array of significantly altered metabolites, immunological mediators and clinical parameters from both the present and previous studies of our group. Using these variables, we have also attempted to use RF classifier to predict whether a combination of markers generated using various platforms is useful for optimum classification of ACO, and whether ACO can be effectively distinguished from asthma and COPD patients taken together. A random forest (RF) algorithm is an extremely reliable classifier and has become popular as a biomarker detection tool in various metabolomics studies. Using RF as a classifier has the following advantages: simple theory, fast speed, stable and insensitive to noise, little or no overfitting, and automatic compensation mechanism on biased sample numbers of groups44,45,46. It constructs an ensemble of decision trees, which is a combination of tree-structured predictors. Each tree is independently constructed using a bootstrap sample of the original data. This serves as the training data, which is used to build the classification model. Using predictive random forest algorithms on our present and earlier reported findings, it is concluded that a combination of 10 markers i.e. glucose, glutamate, D-mannose, IL-6, citrate, butanedioic acid, neutrophil gelatinase-associated lipocalin (NGAL), chitinase 3 like 1 (YKL-40), FEV1/FVC and IL-5 provide optimum classification of ACO. Based on these findings, a schematic representation of the various biochemical cycles found to be dysregulated in ACO subjects is shown in Fig. 6. It exhibits perturbation of key metabolism cycles such as tricarboxylic acid (TCA) cycle, glycolysis and lipid metabolism. This could possibly be responsible for the higher disease burden in ACO. Enhanced inflammatory response was also observed in patients with ACO. The dysregulated key immunological mediators hold promise in designing strategies for clinical management of ACO subjects in future.

Figure 6
figure 6

Proposed schematic diagram of the dysregulated biochemical pathways in ACO.

There are several limitations associated with this study. First, the ACO subjects in this study are selected using GINA/GOLD and American Thoracic Society (ATS) roundtable diagnostic criteria; hence, the present findings should not be extended to patients diagnosed using other guidelines7,47,48. Second, here only South Asian male subjects are recruited; genetic, race and sex differences adjustment is necessary before generalization of our findings49. Third, the present findings are related to ACO patients without active exacerbations and respiratory infections. Alterations in metabolic signatures are likely in presence of lung infections and exacerbations50,51,52. Fourth, validation of the models in a fresh patient cohort could not be performed. However, we did include a fresh cohort of patients for each sub-group, i.e. asthma, COPD and ACO. Fifth, stage IV COPD patients could not be included since most of them reported with co-morbidities and exacerbations. However, we propose to conduct multi-centric studies to validate our findings. This will increase the sample size considerably and facilitate inclusion of a group of severe COPD patients for comparison with ACO. Finally, limited sensitivity of NMR spectroscopy as a metabolomics tool is well realized. Integration of NMR with complementary GC-qTOF and LC–MS/MS would increase metabolome coverage and improve data quality significantly. Our group is presently involved in using these approaches on the same patient cohort for complementary information.

EBC analysis has garnered a lot of interest in recent times. It is undoubtedly an advantageous technique, being a non-invasive, effortless, and painless method of respiratory biological material collection. However, it is a less explored biofluid as compared to serum, plasma, urine etc. and has certain limitations which require attention. The diversity of EBC itself has prevented it from achieving clinical applicability12. It is a dilute, complex solution of diverse molecules with different chemical stabilities. EBC is possibly not the best matrix for exploring VOCs and studies should be limited to non/semi volatile compounds. The low concentration of the dissolved molecules in EBC is one of its major limitations. The measurement of these molecules is limited by the sensitivity of most assays53. Standardization of biomarker analysis in EBC remains a challenge. High variability and low reproducibility in exhaled markers may be explained by differences in a large number of pre-collection, collection and post-collection conditions. Conditions and duration of storage may also affect the assayed concentrations of biomarkers. It is also important to note that despite EBC sampling established to be a method of sampling the airways that is generally acceptable to patients12, a few patients reporting to our clinic, especially those reporting with severe disease condition, found this technique challenging. This poses a question regarding the clinical utility of the method, though extensive use of this method in research is well recognized.

Comparison of efficacy and reproducibility of sampling between different devices requires further investigation. Further studies are also required to establish if different mediators are evenly distributed in the expired air and if collection of all water vapour expired would decrease variability of EBC data. Flow-dependency and the potential for oral/upper airway/salivary contamination needs to be explored54. Inter-individual differences in the generation of particles in EBC during inhalation may exist. As of yet no fully validated method for calculating dilution of respiratory droplets is available and the anatomic origin of biomarkers is not precisely known. Therefore, it can be said that even though EBC is a promising biofluid, there are a number of challenges which needs better understanding and standardizations before its potential can be fully explored.

Though biomarkers are known to positively impact patient care by permitting early detection of disease which increases therapeutic efficacy considerably, it is well recognized that very few omics-derived biomarkers have made their way to the clinic so far. Disease heterogeneity, complexity of the involved cellular processes, non-ideal study design, and limited methodological robustness are the major bottlenecks that are responsible for the huge gap between the number of omics-based biomarkers reported in literature and those introduced to the clinical set-up. Identification of a panel of markers highly specific to the disease with a perfect diagnostic potential of 100% sensitivity and specificity is desirable and remains a challenge. There is a need to establish the exact concentration and cut-off levels of candidate markers for their use in clinical settings.

In recent years, there has been an increasing interest in understanding the group of individuals having features of both asthma and COPD. As the complexity of ACO as a disease entity is gradually unraveled and better understood, a further revision in ACO definition would likely be required. It is envisaged that the findings from this exploratory study will motivate researchers and direct their attention towards unravelling the intricacies of ACO pathophysiology which remains to be elucidated.

Material and methods

Subject selection and sample collection

Patients reporting to the Institute of Pulmocare and Research (IPCR), a tertiary respiratory care clinic at Kolkata, India were considered for this study. Based on the clinical history, questionnaire data, pulmonary function tests (PFTs) and diagnosis by the clinicians, the subjects were divided into four groups: (i) patients with asthma (ii) patients with COPD (iii) patients with ACO and (iv) healthy controls with normal lung function. Informed consent was obtained from all participants. Approval was obtained from the Institutional Ethics Committee of IPCR, Kolkata prior to commencement of the study. The detailed inclusion and exclusion criteria are mentioned elsewhere18,19.

Asthma was diagnosed as per GINA (2014) guidelines55. Subjects with wheezing in the past 12 months and reversibility (increase in post-bronchodilator (post-BD) FEV1 or FVC ≥ 200 ml and ≥ 12% baseline change) or asthma diagnosed earlier were included. All patients enrolled were moderate or severe asthma cases. Diagnosis of COPD was according to the GOLD (2014) criterion56. Subjects with FEV1 to FVC ratio < 70% post BD were included. All COPD subjects included were moderate (Stage II) and severe COPD (Stage III) cases.

Joint guidelines of GINA and GOLD and ATS roundtable discussion57,58,59,60 was used to diagnose ACO. The main criteria considered were (i) persistent airflow limitation [post-BD FEV1/FVC < 0.70] in individuals 40 years of age or older (ii) tobacco smoking ≥ 10 pack-years and (iii) bronchodilator response (BDR) of > 400 ml in FEV1 or history of asthma before 40 years of age. The minor criteria taken into consideration were (i) history of allergic rhinitis or atopy (ii) BDR of FEV1 ≥ 200 ml and 12% change from baseline values on two or more visits (iii) peripheral blood eosinophil count ≥ 300 cells/μl. All major criteria and at least one minor criterion were considered for inclusion of ACO subjects. History of alpha-1 antitrypsin deficiency (AATD) in the family of COPD and ACO subjects was excluded. In order to avoid gender and smoking induced bias, only present/former male smokers were considered. All volunteers were also matched for age and BMI to minimize bias.

For comparison purposes, healthy age-matched male smokers were considered as controls. Patients with history of exacerbations and those who had received oral corticosteroid (OCS) treatment during the last three months were excluded. Patients with co-morbidities including metabolic disorders were also excluded. For this pilot metabolomic study, two independent patient cohorts, the discovery and validation group having the same exclusion and inclusion criteria were considered. The discovery phase patient cohort comprised of (i) asthma = 34 (ii) COPD = 30 (iii) ACO = 35 and (iv) controls = 33. The validation phase cohort consisted of (i) asthma = 32 (ii) COPD = 32 and (iii) ACO = 40 subjects.

EBC was collected in single-use disposable collection circuits using TURBO-DECCS 14 system (Medivac, Parma, Italy), according to the manufacturer’s instructions. All samples were collected based on the recommendations of the ATS/ERS Task force12. Patients were requested to be in resting condition for 30 min prior to collection. Smoking was not permitted for a minimum of 12 h. All samples were collected following a minimum of 12 h overnight fasting. Tidal breathing into the mouthpiece was advised for about 15–20 min till 2–3 ml EBC was collected. EBC collection vials were immediately sealed and stored at − 80 °C. During initial standardization of the collection process, two samples of EBC were collected from each subject for a total of 12 subjects (3 healthy controls, 3 asthma, 3 COPD and 3 ACO) within the same day (at time 0 h and 8 h). The spectral regions (excluding the water region) were integrated and normalized to the total spectrum area to avoid possible variation in metabolite concentrations. SD was calculated to check for natural variance and was found to be within ± 1.96, which according to the Bland–Altman test indicates good within-day repeatability61.

Sample processing

Briefly, EBC samples were thawed and homogenized before performing the NMR experiments. A volume of 70 μl of D2O containing 1 mM TSP and 3 mM sodium azide was added to 630 μl of thawed condensate, thus making 700 μl of total volume and centrifuged at 8000 rpm for 5 min. The samples were transferred to 5 mm NMR tubes and subjected to NMR analysis.

NMR measurements

1D NMR spectra of EBC samples were acquired, as discussed elsewhere31,62,63. Briefly, 1D spectra were recorded on a Bruker AVANCE NEO 800 MHz spectrometer equipped with TCI CryoProbe operating at a frequency of 800 MHz (1H) and at a probe temperature of 27 °C. The water resonance was suppressed by using the excitation sculpting pulse sequence, zgesgp according to the manufacturer’s instructions. In the present study, the carrier frequency (01) value was set at the water resonance, the relaxation delay was 3.5 s, acquisition time 0.85 s, the spectral width was 9615 Hz, the time domain was 16 K and the number of transients was 256. This resulted in a total acquisition time of about 18 min per sample. For every sample, the probe was perfectly tuned and the 90 degree pulse width determined. For processing, a shifted sine bell (SSB) of 2 Hz was applied before Fourier transformation and a real spectrum size of 16 K used.

The resulting spectra were phased and baseline corrected offline using MestReNova version 7.1.0 (Mestrelab Research, Santiago de Compostela, Spain) software. TSP served as the chemical shift reference point (δ = 0.00 ppm) and a concentration standard for all samples. Each metabolite was identified from earlier published literature and cross verified with the HMDB 3.6 and 2D NMR experiments like correlation spectroscopy (COSY) and total correlation spectroscopy (TOCSY).

Quality control (QC) samples were prepared to monitor the analytical variability and reproducibility of NMR data acquisition over time. Equal volume of EBC (100 µl) was aliquoted from all subjects (discovery and validation cohort) and pooled together to prepare the QC samples. A total of 4 QC samples were included per day with each sample introduced at the beginning, after every 10 samples, and at the end of each day, respectively of NMR experimentation (~ 25 samples per day). CVs for each of the 18 consistently identified and quantified metabolites were calculated based on the QC samples (~ 40) and was found to be < 15%. In addition, pH of each sample (including QC samples) was checked and adjusted prior to each run.

Data pre-processing

Multivariate analysis was performed on the spectral region of 0.5–4.0 ppm (excluding residual water signal: 4.0–6.0 ppm). This region was bucketed/binned into various integrated segments of equal frequency window (0.03 ppm) using MestReNova. The binned data matrix was normalized and scaled to the working region of 0.5 to 4.0 ppm. Normalization (by sum) was performed on the binned data sets to compensate for the variation in concentration between the samples and to represent each data point as a fraction of the total integral value of the spectra. Following normalization, unit variance scaling was performed using Metaboanalyst 4.0. Data transformation and scaling are two different strategies to make features more comparable. The preprocessed data was subjected to MVA using SIMCA 13.0.2 (Umetrics, Sweden)64.

Multivariate analysis and statistical model validation

Various multivariate statistical approaches exist to understand the complex structures of metabolomic data. Supervised classification models including PLS-DA and OPLS-DA were generated using SIMCA 13.0.2 (Umetrics, Sweden). A coefficient of variation plot was used to represent differences in the metabolite concentration between the groups. The parameters including R2, Q2, and CV-ANOVA score were used to detect robustness of the OPLS-DA model. The most significantly altered bins (corresponding to dysregulated metabolites) were identified based on VIP score with a threshold of VIP ≥ 1 and S-line plot with correlation value r ≥ 0.6.

Univariate statistical analysis of selected metabolites

Following constant sum normalization using MestReNova, the selected altered metabolite signals in the discovery cohort were extracted and subjected to spectral integration. Statistical significance of the mean integral values for the corresponding metabolites between the groups was obtained using one way ANOVA (Dunnett’s post hoc test) or Kruskal–Wallis test (Dunn’s post hoc test), whichever applicable (GraphPad Prism version 7.00 for Windows, GraphPad Software, San Diego, CA, USA). The peaks identified in the chemical shift region 6.0–9.0 ppm (aromatic region) were also subjected to one way ANOVA (Dunnett’s post hoc test) or Kruskal–Wallis test (Dunn’s post hoc test), as applicable. Statistical significance was considered to be p < 0.05. Fold change analysis was also performed for ACO vs. asthma and ACO vs. COPD.

The most significantly altered metabolites identified in the discovery phase were further validated by integrating the spectra for quantitative measurements in a fresh cohort of subjects (validation cohort) using the same univariate techniques.

Multivariate ROC curve correlation analysis and predictive modelling

Multivariate ROC was performed on all significantly altered EBC metabolites and the AUC calculated (Metaboanlyst 4.0) using classification method support vector machine (SVM) and feature ranking as SVM built-in.

Our earlier studies using GC–MS metabolomics indicated 11 metabolites [serine, threonine, ethanolamine, glucose, cholesterol, 2-palmitoylglycerol, stearic acid, lactic acid, linoleic acid, D-mannose and succinic acid] to be significantly altered in serum of ACO patients as compared with asthma and COPD19. NMR based metabolomics evidenced dysregulation of 12 metabolites including lipids, isoleucine, N-acetylglycoproteins (NAG), valine, glutamate, citric acid, glucose, L-leucine, lysine, asparagine, phenylalanine and histidine in ACO patients18. Also, 13 immunological mediators including tumor necrosis factor alpha (TNF-α), IL-1β, IL-17E, granulocyte macrophage-colony stimulating factor (GM-CSF), IL-18, NGAL, IL-5, IL-10, monocyte chemoattractant protein- 1 (MCP-1), YKL-40, interferon gamma (IFN-γ), IL-6 and transforming growth factor (TGF-β) showed distinct expression patterns in ACO19.

A correlation heat map was generated using a total of 46 variables dysregulated in ACO. The wide array of significantly altered metabolites, immunological mediators and clinical parameters consisted of 8 EBC markers identified in the present study, 2 lung function parameters, and 36 serum markers generated from our previous studies. Pearson’s correlation coefficients were calculated between all the features using Scikit-learn package with Python 3.8. Using these 46 variables, RF classifier65 was used to predict whether ACO can be effectively distinguished from asthma and COPD patients taken together. To develop this binary classifier, ACO was defined as one group and asthma and COPD, taken together, as the other.

Ethics approval

All procedures performed in the study involving human participants were done in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments.

Consent to participate

Informed consent was obtained from all individual participants included in the study.